
1996 | Book

Mathematical Classification and Clustering

Author: Boris Mirkin

Publisher: Springer US

Book series: Nonconvex Optimization and Its Applications


About this book

I am very happy to have this opportunity to present the work of Boris Mirkin, a distinguished Russian scholar in the areas of data analysis and decision making methodologies. The monograph is devoted entirely to clustering, a discipline dispersed through many theoretical and application areas, from mathematical statistics and combinatorial optimization to biology, sociology and organizational structures. It compiles an immense amount of research done to date, including many original Russian developments never presented to the international community before (for instance, cluster-by-cluster versions of the K-Means method in Chapter 4 or uniform partitioning in Chapter 5). The author's approach, approximation clustering, allows him both to systematize a great part of the discipline and to develop many innovative methods in the framework of optimization problems. The optimization methods considered are proved to be meaningful in the contexts of data analysis and clustering. The material presented in this book is quite interesting and stimulating in paradigms, clustering and optimization. On the other hand, it has a substantial application appeal. The book will be useful both to specialists and students in the fields of data analysis and clustering as well as in biology, psychology, economics, marketing research, artificial intelligence, and other scientific disciplines.

Panos Pardalos, Series Editor

Table of contents

Frontmatter
Chapter 1. Classes and Clusters
Abstract
  • The concept of classification, along with its forms and purposes, is discussed.
  • A review of classification in the sciences is provided emphasizing the current extension-driven phase of its development.
  • Clustering is considered as data-based classification.
  • Three kinds of table data are defined: column-conditional, comparable, and aggregable.
  • A set of illustrative data sets is introduced, along with corresponding clustering problems.
Boris Mirkin
Chapter 2. Geometry of Data Sets
Abstract
  • An entity-to-variable data table can be represented geometrically in three different settings, of which one (row-points) pertains to conventional clustering, another (column-vectors) to conceptual clustering, and the third (matrix space) to approximation clustering.
  • Two principles for standardizing the conditional data tables are suggested as related to the data scatter.
  • A standardization of aggregable data is suggested, based on the newly introduced flow index concept.
  • Graph-theoretic concepts related to clustering are considered.
  • Low-rank approximation of data, including the popular Principal component and Correspondence analysis techniques, is discussed and extended into a general Sequential fitting procedure, SEFIT, which will be employed for approximation clustering.
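The SEFIT strategy mentioned above fits one simple term at a time: a one-dimensional solution is fitted to the current residual data, its contribution to the data scatter is recorded, and it is subtracted before the next term is sought. The Python sketch below illustrates only this one-term-at-a-time loop for rank-one (principal component) terms via power iteration; the function name, parameters, and stopping details are illustrative assumptions, not the book's procedure.

```python
import numpy as np

def sequential_fit(X, n_terms=2, n_iter=100, seed=0):
    """Illustrative one-term-at-a-time fitting loop (SEFIT-style strategy):
    fit a rank-one term to the current residual, record its contribution,
    subtract it, and repeat, so term contributions to the data scatter add up."""
    residual = np.asarray(X, dtype=float).copy()
    rng = np.random.default_rng(seed)
    terms = []
    for _ in range(n_terms):
        # power iteration for the leading singular triple of the residual
        v = rng.standard_normal(residual.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            u = residual @ v
            u /= np.linalg.norm(u)
            v = residual.T @ u
            v /= np.linalg.norm(v)
        s = float(u @ residual @ v)      # the term's contribution weight
        residual -= s * np.outer(u, v)   # remove the fitted part
        terms.append((s, u, v))
    return terms, residual
```

With cluster-type terms fitted in place of principal components, the same loop is the strategy the later chapters employ for approximation clustering.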
Boris Mirkin
Chapter 3. Clustering Algorithms: a Review
Abstract
  • A review of clustering concepts and algorithms is provided emphasizing: (a) output cluster structure, (b) input data kind, and (c) criterion.
  • A dozen cluster structures are considered, including those used in supervised learning, unsupervised learning, or both.
  • The techniques discussed cover such algorithms as nearest neighbor, K-Means (moving centers), agglomerative clustering, conceptual clustering, the EM algorithm, high-density clustering, and back-propagation (a baseline K-Means sketch is given after this list).
  • Interpretation is considered as achieving clustering goals (partly, via presentation of the same data with both extensional and intensional forms of cluster structures).
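Among the techniques listed above, K-Means (moving centers) is the one the approximation methods of later chapters connect to most directly. The sketch below is only the standard batch algorithm, given as a point of reference; it is not the cluster-by-cluster version developed in Chapter 4, and the function name and defaults are illustrative.

```python
import numpy as np

def k_means(X, k, n_iter=50, seed=0):
    """Standard batch K-Means: alternate between assigning each point to its
    nearest center and moving each center to the mean of its assigned points."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest center by squared Euclidean distance
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: each non-empty cluster's center moves to its mean
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Each iteration does not increase the within-cluster sum of squared distances; Chapter 6 develops a theory-motivated extension of this method to mixed data.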
Boris Mirkin
Chapter 4. Single Cluster Clustering
Abstract
  • Various approaches to comparing subsets are discussed.
  • Two approaches to direct single cluster clustering are described: seriation and moving center separation, which are reinterpreted as locally optimal algorithms for particular (mainly approximational) criteria. A seriation-style sketch is given after this list.
  • A moving center algorithm is based on a novel concept of reference point: the cluster size depends on its distance from the reference point.
  • Five single cluster structures are considered in detail:
    • Principal cluster as related to both seriation and moving center;
    • Ideal fuzzy type cluster as modeling “ideal type” concept;
    • Additive cluster as related to the average link seriation;
    • Star cluster as a kind of cluster in a “non-geometrical” environment;
    • Box cluster as a pair of interconnected subsets.
  • The approximation framework is shown to be quite convenient both for extending the algorithms to multi-cluster clustering (overlapping permitted) and for interpretation.
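To make the seriation idea above concrete, here is a minimal single-cluster sketch: entities are added one at a time by their average similarity to the current cluster, stopping when the best remaining candidate's average drops to a threshold. It illustrates average-link seriation in general rather than any of the book's specific criteria (principal, additive, star, or box clusters); the function name, arguments, and stopping rule are assumptions.

```python
import numpy as np

def seriation_cluster(A, seed, threshold=0.0):
    """Greedy single-cluster seriation over a similarity matrix A: grow the
    cluster from a seed entity by repeatedly adding the outside entity with
    the largest average similarity to the current cluster, and stop once that
    average falls to the threshold or below."""
    n = A.shape[0]
    cluster = [seed]
    outside = set(range(n)) - {seed}
    while outside:
        # average similarity of each outside candidate to the current cluster
        avg = {i: A[i, cluster].mean() for i in outside}
        best = max(avg, key=avg.get)
        if avg[best] <= threshold:
            break
        cluster.append(best)
        outside.remove(best)
    return cluster
```

In the approximation framework, a found cluster's contribution can be subtracted from the similarities and such a step repeated, which is roughly how the multi-cluster (possibly overlapping) extensions mentioned in the last item above become possible.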
Boris Mirkin
Chapter 5. Partition: Square Data Table
Abstract
  • Forms of representing and comparing partitions are reviewed.
  • Mathematical analysis of some of the agglomerative clustering axioms is presented.
  • Approximation clustering methods for aggregating square data tables are suggested along with associated mathematical theories:
    • Uniform partitioning as based on a “soft” similarity threshold (a scoring sketch is given after this list);
    • Structured partitioning (along with the structure of between-class associations);
    • Aggregation of mobility and other aggregable interaction data as based on chi-squared criterion and underlying substantive modeling.
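One way to read the “soft” similarity threshold behind uniform partitioning (referenced above) is that within-cluster pairs are scored against a subtracted threshold rather than filtered by a hard cut: pairs whose similarity exceeds the threshold add to the score, pairs below it subtract. The function below evaluates only that illustrative reading for a given candidate partition; the function name, arguments, and the exclusion of the diagonal are assumptions, not the book's exact criterion.

```python
import numpy as np

def soft_threshold_score(A, partition, pi):
    """Score a candidate partition of a similarity matrix A by summing
    (a_ij - pi) over all within-cluster pairs with i != j; the threshold pi
    is traded off ('soft') instead of being used as a hard cut-off."""
    score = 0.0
    for cluster in partition:
        idx = np.asarray(cluster)
        block = A[np.ix_(idx, idx)] - pi
        score += block.sum() - np.trace(block)   # drop the diagonal terms
    return score
```

Under this reading, a larger pi favors smaller, tighter clusters when the score is maximized over partitions, while a smaller pi favors fewer, larger clusters.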
Boris Mirkin
Chapter 6. Partition: Rectangular Data Table
Abstract
  • Bilinear clustering for mixed (quantitative, nominal, and binary) variables is proved to be a theory-motivated extension of the K-Means method.
  • Decomposition of the data scatter into “explained” and “residual” parts is provided (for each of the two norms: sum of squares and moduli).
  • Contribution weights are derived to attack machine learning problems (conceptual description, selecting and transforming the variables, and knowledge discovery).
  • The parts of the explained data scatter related to nominal variables appear to coincide with the Pearson chi-squared coefficient and some other popular indices (the chi-squared statistic is computed in the sketch after this list).
  • Approximation (bi)-partitioning for contingency tables substantiates and extends some popular clustering techniques.
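The Pearson chi-squared coefficient referred to above is computable directly from a contingency table as the usual comparison of observed counts with the counts expected under independence; the small function below shows that standard statistic (the function name is illustrative).

```python
import numpy as np

def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a two-way contingency table:
    sum of (observed - expected)^2 / expected, where the expected counts
    come from the independence model (row total * column total / grand total)."""
    observed = np.asarray(table, dtype=float)
    total = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total
    return ((observed - expected) ** 2 / expected).sum()
```

Dividing the statistic by the grand total gives Pearson's mean-square contingency coefficient, a normalized variant of the same quantity.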
Boris Mirkin
Chapter 7. Hierarchy as a Clustering Structure
Abstract
  • Directions for representing and comparing hierarchies are discussed.
  • Clustering methods that are invariant under monotone dissimilarity transformations are analyzed.
  • The most recent theories and methods concerning such concepts as ultrametric, tree metric, Robinson matrix, pyramid, and weak hierarchy are presented (an ultrametric check is sketched after this list).
  • A linear theory for binary hierarchy is proposed to allow decomposing the data entries, as well as covariances, by the clusters.
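Of the concepts listed above, the ultrametric is the one tied most directly to hierarchies: a dissimilarity d is an ultrametric when d(i,j) <= max(d(i,k), d(j,k)) for every triple, equivalently when the two largest of the three pairwise values in any triple are equal, and such dissimilarities are exactly those realizable by an indexed hierarchy (dendrogram). The triple-wise check below is a straightforward test of this condition; the function name and tolerance handling are illustrative.

```python
import numpy as np
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """Test whether a symmetric dissimilarity matrix D satisfies the
    ultrametric inequality: in every triple, the largest of the three
    pairwise values must not exceed the middle one (isosceles triangles
    with the two largest sides equal)."""
    n = D.shape[0]
    for i, j, k in combinations(range(n), 3):
        values = sorted([D[i, j], D[i, k], D[j, k]])
        if values[2] > values[1] + tol:
            return False
    return True
```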
Boris Mirkin
Backmatter
Metadata
Title
Mathematical Classification and Clustering
Author
Boris Mirkin
Copyright year
1996
Publisher
Springer US
Electronic ISBN
978-1-4613-0457-9
Print ISBN
978-1-4613-8057-3
DOI
https://doi.org/10.1007/978-1-4613-0457-9