
1996 | Book

Mathematical Classification and Clustering

Author: Boris Mirkin

Publisher: Springer US

Book series: Nonconvex Optimization and Its Applications


About this book

I am very happy to have this opportunity to present the work of Boris Mirkin, a distinguished Russian scholar in the areas of data analysis and decision making methodologies. The monograph is devoted entirely to clustering, a discipline dispersed through many theoretical and application areas, from mathematical statistics and combinatorial optimization to biology, sociology and organizational structures. It compiles an immense amount of research done to date, including many original Russian developments never presented to the international community before (for instance, cluster-by-cluster versions of the K-Means method in Chapter 4 or uniform partitioning in Chapter 5). The author's approach, approximation clustering, allows him both to systematize a great part of the discipline and to develop many innovative methods in the framework of optimization problems. The optimization methods considered are proved to be meaningful in the contexts of data analysis and clustering. The material presented in this book is quite interesting and stimulating in paradigms, clustering and optimization. On the other hand, it has a substantial application appeal. The book will be useful both to specialists and students in the fields of data analysis and clustering as well as in biology, psychology, economics, marketing research, artificial intelligence, and other scientific disciplines.

Panos Pardalos, Series Editor

Table of contents

Frontmatter
Chapter 1. Classes and Clusters
Abstract
  • The concept of classification, along with its forms and purposes, is discussed.
  • A review of classification in the sciences is provided emphasizing the current extension-driven phase of its development.
  • Clustering is considered as data-based classification.
  • Three kinds of table data are defined: column-conditional, comparable, and aggregable.
  • A set of illustrative data sets is introduced, along with corresponding clustering problems.
Boris Mirkin
Chapter 2. Geometry of Data Sets
Abstract
  • An entity-to-variable data table can be represented geometrically in three different settings, of which one (row-points) pertains to conventional clustering, another (column-vectors) to conceptual clustering, and the third (matrix space) to approximation clustering.
  • Two principles for standardizing the conditional data tables are suggested as related to the data scatter.
  • A standardization of aggregable data is suggested, based on the newly introduced flow index concept.
  • Graph-theoretic concepts related to clustering are considered.
  • Low-rank approximation of data, including the popular Principal component and Correspondence analysis techniques, is discussed and extended into a general Sequential fitting procedure, SEFIT, which will be employed for approximation clustering.
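The SEFIT strategy mentioned above fits one simple term at a time: a one-dimensional solution is fitted to the current residual data, its contribution to the data scatter is recorded, and it is subtracted before the next term is sought. The Python sketch below illustrates only this one-term-at-a-time loop for rank-one (principal component) terms via power iteration; the function name, parameters, and stopping details are illustrative assumptions, not the book's procedure.

```python
import numpy as np

def sequential_fit(X, n_terms=2, n_iter=100, seed=0):
    """Illustrative one-term-at-a-time fitting loop (SEFIT-style strategy):
    fit a rank-one term to the current residual, record its contribution,
    subtract it, and repeat, so term contributions to the data scatter add up."""
    residual = np.asarray(X, dtype=float).copy()
    rng = np.random.default_rng(seed)
    terms = []
    for _ in range(n_terms):
        # power iteration for the leading singular triple of the residual
        v = rng.standard_normal(residual.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            u = residual @ v
            u /= np.linalg.norm(u)
            v = residual.T @ u
            v /= np.linalg.norm(v)
        s = float(u @ residual @ v)      # the term's contribution weight
        residual -= s * np.outer(u, v)   # remove the fitted part
        terms.append((s, u, v))
    return terms, residual
```

With cluster-type terms fitted in place of principal components, the same loop is the strategy the later chapters employ for approximation clustering.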
Boris Mirkin
Chapter 3. Clustering Algorithms: a Review
Abstract
  • A review of clustering concepts and algorithms is provided emphasizing: (a) output cluster structure, (b) input data kind, and (c) criterion.
  • A dozen cluster structures are considered, including those used in supervised learning, unsupervised learning, or both.
  • The techniques discussed cover such algorithms as nearest neighbor, K-Means (moving centers), agglomerative clustering, conceptual clustering, the EM algorithm, high-density clustering, and back-propagation (a baseline K-Means sketch is given after this list).
  • Interpretation is considered as achieving clustering goals (partly, via presentation of the same data with both extensional and intensional forms of cluster structures).
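Among the techniques listed above, K-Means (moving centers) is the one the approximation methods of later chapters connect to most directly. The sketch below is only the standard batch algorithm, given as a point of reference; it is not the cluster-by-cluster version developed in Chapter 4, and the function name and defaults are illustrative.

```python
import numpy as np

def k_means(X, k, n_iter=50, seed=0):
    """Standard batch K-Means: alternate between assigning each point to its
    nearest center and moving each center to the mean of its assigned points."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest center by squared Euclidean distance
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: each non-empty cluster's center moves to its mean
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Each iteration does not increase the within-cluster sum of squared distances; Chapter 6 develops a theory-motivated extension of this method to mixed data.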
Boris Mirkin
Chapter 4. Single Cluster Clustering
Abstract
  • Various approaches to comparing subsets are discussed.
  • Two approaches to direct single cluster clustering are described: seriation and moving center separation, which are reinterpreted as locally optimal algorithms for particular (mainly approximational) criteria. A seriation-style sketch is given after this list.
  • A moving center algorithm is based on a novel concept of reference point: the cluster size depends on its distance from the reference point.
  • Five single cluster structures are considered in detail:
    • Principal cluster as related to both seriation and moving center;
    • Ideal fuzzy type cluster as modeling “ideal type” concept;
    • Additive cluster as related to the average link seriation;
    • Star cluster as a kind of cluster in a “non-geometrical” environment;
    • Box cluster as a pair of interconnected subsets.
  • The approximation framework is shown to be quite convenient both for extending the algorithms to multi-cluster clustering (overlapping permitted) and for interpretation.
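To make the seriation idea above concrete, here is a minimal single-cluster sketch: entities are added one at a time by their average similarity to the current cluster, stopping when the best remaining candidate's average drops to a threshold. It illustrates average-link seriation in general rather than any of the book's specific criteria (principal, additive, star, or box clusters); the function name, arguments, and stopping rule are assumptions.

```python
import numpy as np

def seriation_cluster(A, seed, threshold=0.0):
    """Greedy single-cluster seriation over a similarity matrix A: grow the
    cluster from a seed entity by repeatedly adding the outside entity with
    the largest average similarity to the current cluster, and stop once that
    average falls to the threshold or below."""
    n = A.shape[0]
    cluster = [seed]
    outside = set(range(n)) - {seed}
    while outside:
        # average similarity of each outside candidate to the current cluster
        avg = {i: A[i, cluster].mean() for i in outside}
        best = max(avg, key=avg.get)
        if avg[best] <= threshold:
            break
        cluster.append(best)
        outside.remove(best)
    return cluster
```

In the approximation framework, a found cluster's contribution can be subtracted from the similarities and such a step repeated, which is roughly how the multi-cluster (possibly overlapping) extensions mentioned in the last item above become possible.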
Boris Mirkin
Chapter 5. Partition: Square Data Table
Abstract
  • Forms of representing and comparing partitions are reviewed.
  • Mathematical analysis of some of the agglomerative clustering axioms is presented.
  • Approximation clustering methods for aggregating square data tables are suggested along with associated mathematical theories:
    • Uniform partitioning as based on a “soft” similarity threshold (a scoring sketch is given after this list);
    • Structured partitioning (along with the structure of between-class associations);
    • Aggregation of mobility and other aggregable interaction data as based on chi-squared criterion and underlying substantive modeling.
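One way to read the “soft” similarity threshold behind uniform partitioning (referenced above) is that within-cluster pairs are scored against a subtracted threshold rather than filtered by a hard cut: pairs whose similarity exceeds the threshold add to the score, pairs below it subtract. The function below evaluates only that illustrative reading for a given candidate partition; the function name, arguments, and the exclusion of the diagonal are assumptions, not the book's exact criterion.

```python
import numpy as np

def soft_threshold_score(A, partition, pi):
    """Score a candidate partition of a similarity matrix A by summing
    (a_ij - pi) over all within-cluster pairs with i != j; the threshold pi
    is traded off ('soft') instead of being used as a hard cut-off."""
    score = 0.0
    for cluster in partition:
        idx = np.asarray(cluster)
        block = A[np.ix_(idx, idx)] - pi
        score += block.sum() - np.trace(block)   # drop the diagonal terms
    return score
```

Under this reading, a larger pi favors smaller, tighter clusters when the score is maximized over partitions, while a smaller pi favors fewer, larger clusters.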
Boris Mirkin
Chapter 6. Partition: Rectangular Data Table
Abstract
  • Bilinear clustering for mixed (quantitative, nominal, and binary) variables is proved to be a theory-motivated extension of the K-Means method.
  • Decomposition of the data scatter into “explained” and “residual” parts is provided (for each of the two norms: sum of squares and moduli).
  • Contribution weights are derived to attack machine learning problems (conceptual description, selecting and transforming the variables, and knowledge discovery).
  • The parts of the explained data scatter related to nominal variables appear to coincide with the Pearson chi-squared coefficient and some other popular indices (the chi-squared statistic is computed in the sketch after this list).
  • Approximation (bi)-partitioning for contingency tables substantiates and extends some popular clustering techniques.
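The Pearson chi-squared coefficient referred to above is computable directly from a contingency table as the usual comparison of observed counts with the counts expected under independence; the small function below shows that standard statistic (the function name is illustrative).

```python
import numpy as np

def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a two-way contingency table:
    sum of (observed - expected)^2 / expected, where the expected counts
    come from the independence model (row total * column total / grand total)."""
    observed = np.asarray(table, dtype=float)
    total = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total
    return ((observed - expected) ** 2 / expected).sum()
```

Dividing the statistic by the grand total gives Pearson's mean-square contingency coefficient, a normalized variant of the same quantity.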
Boris Mirkin
Chapter 7. Hierarchy as a Clustering Structure
Abstract
  • Directions for representing and comparing hierarchies are discussed.
  • Clustering methods that are invariant under monotone dissimilarity transformations are analyzed.
  • The most recent theories and methods concerning such concepts as ultrametric, tree metric, Robinson matrix, pyramid, and weak hierarchy are presented (an ultrametric check is sketched after this list).
  • A linear theory for binary hierarchy is proposed to allow decomposing the data entries, as well as covariances, by the clusters.
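Of the concepts listed above, the ultrametric is the one tied most directly to hierarchies: a dissimilarity d is an ultrametric when d(i,j) <= max(d(i,k), d(j,k)) for every triple, equivalently when the two largest of the three pairwise values in any triple are equal, and such dissimilarities are exactly those realizable by an indexed hierarchy (dendrogram). The triple-wise check below is a straightforward test of this condition; the function name and tolerance handling are illustrative.

```python
import numpy as np
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """Test whether a symmetric dissimilarity matrix D satisfies the
    ultrametric inequality: in every triple, the largest of the three
    pairwise values must not exceed the middle one (isosceles triangles
    with the two largest sides equal)."""
    n = D.shape[0]
    for i, j, k in combinations(range(n), 3):
        values = sorted([D[i, j], D[i, k], D[j, k]])
        if values[2] > values[1] + tol:
            return False
    return True
```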
Boris Mirkin
Backmatter
Metadata
Title
Mathematical Classification and Clustering
Author
Boris Mirkin
Copyright year
1996
Publisher
Springer US
Electronic ISBN
978-1-4613-0457-9
Print ISBN
978-1-4613-8057-3
DOI
https://doi.org/10.1007/978-1-4613-0457-9