2016 | OriginalPaper | Chapter

9. Quality Measures in Clustering

Author: Israël César Lerman

Published in: Foundations and Methods in Combinatorial and Statistical Data Analysis and Clustering

Publisher: Springer London

Abstract

The construction of a clustering criterion depends on the nature of the data and the mathematical structure retained for its representation. We saw in Chap. 2 two criteria associated with two different methods: the “Central partition” and the “Dynamic adaptative” methods, respectively. A formal definition of a criterion in Data Analysis may be expressed as follows: “The structure \(\sigma \) on the set E concerned to which the data representation belongs (e.g. similarity coefficient on E) is more general than that \(\tau \) of the structure sought (partition or ordered chain of partitions on E). The general strategy consists of injecting the family \(\Theta \) of the \(\tau \) structures on E into the family \(\Sigma \) of the \(\sigma \) structures on E. The objective is then to determine that or those of the elements of \(\Theta \) which are the most comparable with \(\sigma (E)\) (\(\sigma \) calculated on E). For this purpose, a criterion is built. Quite generally, a criterion is a ranking (preorder) relation (possibly partial) on the set \(\Theta \). For the latter, \(\tau (E)\) is preferred to \(\tau '(E)\) if and only if, in a given sense, \(\tau (E)\) is nearer \(\sigma (E)\) than \(\tau '(E)\)”.
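
The definition of a criterion as a preorder on \(\Theta \) can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the chapter: \(\sigma (E)\) is taken to be a symmetric 0/1 similarity relation on E, each partition \(\tau (E)\) is injected into the same family as its set of co-clustered pairs, and "nearer" is measured by the number of pairs on which the two relations disagree. The names `comembership`, `distance` and `preferred`, and the toy data, are hypothetical.

```python
from itertools import combinations


def comembership(partition):
    """Inject a partition (tau) into the family of pair relations (sigma):
    return the set of unordered pairs whose elements share a class."""
    return {
        frozenset(pair)
        for block in partition
        for pair in combinations(sorted(block), 2)
    }


def distance(sigma, partition):
    """Assumed notion of nearness: number of pairs on which sigma and the
    relation induced by the partition disagree (symmetric difference)."""
    return len(sigma ^ comembership(partition))


def preferred(sigma, tau, tau_prime):
    """tau is preferred to tau_prime iff tau is at least as near to sigma."""
    return distance(sigma, tau) <= distance(sigma, tau_prime)


# Hypothetical data: E = {a, b, c, d}; sigma lists the pairs judged similar.
sigma = {frozenset(p) for p in [("a", "b"), ("c", "d"), ("b", "c")]}
tau = [{"a", "b"}, {"c", "d"}]      # two classes
tau_prime = [{"a", "b", "c", "d"}]  # a single class

print(distance(sigma, tau), distance(sigma, tau_prime))
print(preferred(sigma, tau, tau_prime))
```

Because this assumed distance is a single number, the induced preorder on \(\Theta \) is total; a criterion that compares partitions on several components at once would in general give only a partial preorder, as the definition allows.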

Footnotes
1. The rank notion used corresponds to that given in Sect. 3.3.3 of Chap. 3.

Metadata
Title: Quality Measures in Clustering
Author: Israël César Lerman
Copyright Year: 2016
Publisher: Springer London
DOI: https://doi.org/10.1007/978-1-4471-6793-8_9
