
2013 | OriginalPaper | Book chapter

Summary and Semi-average Similarity Criteria for Individual Clusters

Author: Boris Mirkin

Published in: Models, Algorithms, and Technologies for Network Analysis

Publisher: Springer New York


Abstract

There is much prejudice against the within-cluster summary similarity criterion, which supposedly leads to collecting all the entities in one cluster. This is not so if the similarity matrix is preprocessed by subtracting “noise”; two ways of doing this, uniform and modularity, are analyzed in the chapter. Another criterion under consideration is the semi-average within-cluster similarity, which has more versatile properties. As shown in the chapter, both types of criteria arise within the least-squares data approximation approach to clustering. A very simple local optimization algorithm, Add-and-Remove(S), yields a suboptimal cluster that satisfies certain tightness conditions. Three versions of an iterative extraction approach are considered, leading to a portrayal of the cluster structure of the data. Of these, the most promising is probably what is referred to as the injunctive clustering approach. Applications are considered to the analysis of semantics, to integrating different knowledge aspects, and to consensus clustering.
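One standard way to see how both criteria can arise from least squares is sketched below. This is an illustrative derivation under our own assumptions, not necessarily the chapter's: a single-cluster model a_ij ≈ λ s_i s_j with binary membership s_i, sums taken over all ordered pairs including i = j, and a(S) denoting the average within-cluster similarity.

```latex
% Assumed single-cluster model: a_{ij} \approx \lambda s_i s_j, s_i \in \{0,1\}, S = \{i : s_i = 1\}
L(S,\lambda) = \sum_{i,j} \bigl(a_{ij} - \lambda s_i s_j\bigr)^2
             = \sum_{i,j} a_{ij}^2 - 2\lambda \sum_{i,j \in S} a_{ij} + \lambda^2 |S|^2

% (1) Fitted intensity for a given S is the within-cluster average a(S)
\lambda^{*} = a(S) = \frac{1}{|S|^2} \sum_{i,j \in S} a_{ij},
\qquad
L(S,\lambda^{*}) = \sum_{i,j} a_{ij}^2 - \bigl(a(S)\,|S|\bigr)^2

% (2) Fixed intensity \lambda > 0 acts as a uniform noise threshold \pi = \lambda / 2
L(S,\lambda) = \sum_{i,j} a_{ij}^2 - 2\lambda \sum_{i,j \in S} \bigl(a_{ij} - \pi\bigr),
\qquad \pi = \lambda / 2
```

With λ fitted, minimizing L amounts to maximizing a(S)|S| = (1/|S|) Σ_{i,j∈S} a_ij (for a(S) > 0), which is the semi-average criterion. With λ fixed, it amounts to maximizing the summary criterion after the uniform noise level π has been subtracted from every similarity.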
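The abstract names the two noise-subtraction schemes, the two criteria and the Add-and-Remove(S) local search without giving formulas. The Python sketch below is one possible reading, under assumptions that are ours rather than the chapter's: the uniform scheme subtracts a single threshold `pi` from every similarity; the modularity scheme subtracts a_i+ a_+j / a_++ (row sum times column sum over the grand total), as in Newman's modularity; criteria sum over all pairs i, j in S including i = j; the semi-average criterion is the within-cluster sum divided by |S|; and Add-and-Remove toggles, at each step, the single entity whose addition or removal most increases the criterion, stopping when no such move helps. All function names and the toy matrix are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Matrix = List[List[float]]


def uniform_noise(a: Matrix, pi: float) -> Matrix:
    """Assumed uniform preprocessing: subtract the same threshold pi from every entry."""
    return [[x - pi for x in row] for row in a]


def modularity_noise(a: Matrix) -> Matrix:
    """Assumed modularity preprocessing: subtract a_i+ * a_+j / a_++ from a_ij."""
    n = len(a)
    row = [sum(a[i]) for i in range(n)]
    col = [sum(a[i][j] for i in range(n)) for j in range(n)]
    total = sum(row)
    return [[a[i][j] - row[i] * col[j] / total for j in range(n)] for i in range(n)]


def summary(b: Matrix, s: Set[int]) -> float:
    """Summary criterion: sum of b_ij over all i, j in S (diagonal included)."""
    return sum(b[i][j] for i in s for j in s)


def semi_average(a: Matrix, s: Set[int]) -> float:
    """Semi-average criterion: within-cluster sum / |S| = a(S) * |S|."""
    return summary(a, s) / len(s)


def add_and_remove(
    a: Matrix, start: Set[int], criterion: Callable[[Matrix, Set[int]], float]
) -> Tuple[Set[int], float]:
    """Hypothetical greedy local search: toggle the entity that most improves the criterion."""
    s = set(start)
    if not s:
        raise ValueError("start must be a non-empty set of entity indices")
    best = criterion(a, s)
    while True:
        move = None
        for k in range(len(a)):
            t = s ^ {k}  # add k if absent, remove it if present
            if not t:
                continue  # never return an empty cluster
            value = criterion(a, t)
            if value > best:  # keep only strict improvements
                best, move = value, t
        if move is None:
            return s, best  # no single addition or removal helps
        s = move


if __name__ == "__main__":
    # Toy symmetric similarities (illustrative only, zero diagonal):
    # entities 0-2 and 3-5 form two tight groups.
    A = [
        [0.0, 0.9, 0.9, 0.1, 0.1, 0.1],
        [0.9, 0.0, 0.9, 0.1, 0.1, 0.1],
        [0.9, 0.9, 0.0, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.0, 0.8, 0.8],
        [0.1, 0.1, 0.1, 0.8, 0.0, 0.8],
        [0.1, 0.1, 0.1, 0.8, 0.8, 0.0],
    ]
    print(add_and_remove(A, {0}, semi_average))
    print(add_and_remove(uniform_noise(A, 0.5), {0}, summary))
    print(add_and_remove(modularity_noise(A), {0}, summary))
```

Recomputing the criterion from scratch for every candidate keeps the sketch short but costs O(n·|S|²) per step; updating the within-cluster sums incrementally would be the natural optimization. On the toy matrix, each of the three calls in the `__main__` block ends at the cluster {0, 1, 2}.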


DOI
https://doi.org/10.1007/978-1-4614-8588-9_8