Skip to main content

2013 | OriginalPaper | Buchkapitel

7. Structure Preserving Embedding of Dissimilarity Data

verfasst von : Volker Roth, Thomas J. Fuchs, Julia E. Vogt, Sandhya Prabhakaran, Joachim M. Buhmann

Erschienen in: Similarity-Based Pattern Analysis and Recognition

Verlag: Springer London

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Partitioning methods for observations represented by pairwise dissimilarities are studied. Particular emphasis is put on their properties when applied to dissimilarity matrices that do not admit a loss-free embedding into a vector space. Specifically, the Pairwise Clustering cost function is shown to exhibit a shift invariance property which basically means that any symmetric dissimilarity matrix can be modified to allow a vector-space representation without distorting the optimal group structure. In an approximate sense, the same holds true for a probabilistic generalization of Pairwise Clustering, the so-called Wishart–Dirichlet Cluster Process. This shift-invariance property essentially means that these clustering methods are “blind” against Euclidean or metric violations. From the application side, such blindness against metric violations might be seen as a highly desired feature, since it broadens the applicability of certain algorithms. From the viewpoint of theory building, however, the same property might be viewed as a “negative” result, since studying these algorithms will not lead to any new insights on the role of metricity in clustering problems.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Jolliffe, I.T.: Principal Component Analysis. Springer, New York (1986) CrossRef Jolliffe, I.T.: Principal Component Analysis. Springer, New York (1986) CrossRef
2.
Zurück zum Zitat Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.: An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. 12(2), 181–201 (2001) CrossRef Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.: An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. 12(2), 181–201 (2001) CrossRef
3.
Zurück zum Zitat Roth, V., Laub, J., Kawanabe, M., Buhmann, J.M.: Optimal cluster preserving embedding of non-metric proximity data. IEEE Trans. Pattern Anal. Mach. Intell. 25(12) (2003) Roth, V., Laub, J., Kawanabe, M., Buhmann, J.M.: Optimal cluster preserving embedding of non-metric proximity data. IEEE Trans. Pattern Anal. Mach. Intell. 25(12) (2003)
4.
Zurück zum Zitat Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2001) MATH Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2001) MATH
5.
Zurück zum Zitat Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999) CrossRef Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999) CrossRef
6.
Zurück zum Zitat Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000) CrossRef Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000) CrossRef
7.
Zurück zum Zitat Puzicha, J., Hofmann, T., Buhmann, J.: A theory of proximity based clustering: structure detection by optimization. Pattern Recognit. 33(4), 617–634 (1999) CrossRef Puzicha, J., Hofmann, T., Buhmann, J.: A theory of proximity based clustering: structure detection by optimization. Pattern Recognit. 33(4), 617–634 (1999) CrossRef
8.
Zurück zum Zitat Hofmann, T., Buhmann, J.: Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. Mach. Intell. 19(1), 1–14 (1997) CrossRef Hofmann, T., Buhmann, J.: Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. Mach. Intell. 19(1), 1–14 (1997) CrossRef
9.
Zurück zum Zitat Brucker, P.: On the complexity of clustering problems. In: Beckman, M., Kunzi, H.P. (eds.) Optimization and Operations Research: Lecture Notes in Economics and Mathematical Systems, pp. 45–54. Springer, Berlin (1978) CrossRef Brucker, P.: On the complexity of clustering problems. In: Beckman, M., Kunzi, H.P. (eds.) Optimization and Operations Research: Lecture Notes in Economics and Mathematical Systems, pp. 45–54. Springer, Berlin (1978) CrossRef
10.
Zurück zum Zitat Torgerson, W.S.: Theory and Methods of Scaling. Wiley, New York (1958) Torgerson, W.S.: Theory and Methods of Scaling. Wiley, New York (1958)
11.
Zurück zum Zitat Young, G., Householder, A.S.: Discussion of a set of points in terms of their mutual distances. Psychometrika 3, 19–22 (1938) CrossRefMATH Young, G., Householder, A.S.: Discussion of a set of points in terms of their mutual distances. Psychometrika 3, 19–22 (1938) CrossRefMATH
12.
Zurück zum Zitat Schölkopf, B., Smola, A., Müller, K.-R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10(5), 1299–1319 (1998) CrossRef Schölkopf, B., Smola, A., Müller, K.-R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10(5), 1299–1319 (1998) CrossRef
14.
Zurück zum Zitat Vogt, J., Prabhakaran, S., Fuchs, T., Roth, V.: The translation-invariant Wishart–Dirichlet process for clustering distance data. In: Proceedings of the 27th International Conference on Machine Learning (2010) Vogt, J., Prabhakaran, S., Fuchs, T., Roth, V.: The translation-invariant Wishart–Dirichlet process for clustering distance data. In: Proceedings of the 27th International Conference on Machine Learning (2010)
15.
Zurück zum Zitat Prabhakaran, S., Boehm, A., Metzner, K.J., Roth, V.: Recovering networks from distance data. J. Mach. Learn. Res. 25, 349–364 (2012) Prabhakaran, S., Boehm, A., Metzner, K.J., Roth, V.: Recovering networks from distance data. J. Mach. Learn. Res. 25, 349–364 (2012)
16.
Zurück zum Zitat Pitman, J.: Combinatorial stochastic processes. In: Picard, J. (ed.) Ecole d’Ete de Probabilites de Saint-Flour XXXII-2002. Springer, Berlin (2006) Pitman, J.: Combinatorial stochastic processes. In: Picard, J. (ed.) Ecole d’Ete de Probabilites de Saint-Flour XXXII-2002. Springer, Berlin (2006)
17.
Zurück zum Zitat MacEachern, S.N.: Estimating normal means with a conjugate-style Dirichlet process prior. Commun. Stat., Simul. Comput. 23, 727–741 (1994) MathSciNetCrossRefMATH MacEachern, S.N.: Estimating normal means with a conjugate-style Dirichlet process prior. Commun. Stat., Simul. Comput. 23, 727–741 (1994) MathSciNetCrossRefMATH
18.
Zurück zum Zitat Dahl, D.B.: Sequentially-allocated merge-split sampler for conjugate and non-conjugate Dirichlet process mixture models. Technical report, Department of Statistics, Texas A&M University (2005) Dahl, D.B.: Sequentially-allocated merge-split sampler for conjugate and non-conjugate Dirichlet process mixture models. Technical report, Department of Statistics, Texas A&M University (2005)
20.
Zurück zum Zitat Neal, R.M.: Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 9, 249–265 (2000) MathSciNet Neal, R.M.: Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 9, 249–265 (2000) MathSciNet
21.
Zurück zum Zitat Blei, D., Jordan, M.: Variational inference for Dirichlet process mixtures. Bayesian Anal. 1, 121–144 (2006) MathSciNetCrossRef Blei, D., Jordan, M.: Variational inference for Dirichlet process mixtures. Bayesian Anal. 1, 121–144 (2006) MathSciNetCrossRef
22.
Zurück zum Zitat Srivastava, M.S.: Singular Wishart and multivariate beta distributions. Ann. Stat. (2003) Srivastava, M.S.: Singular Wishart and multivariate beta distributions. Ann. Stat. (2003)
23.
24.
Zurück zum Zitat Cox, T.F., Cox, M.A.A.: Multidimensional Scaling. Chapman & Hall, London (2001) MATH Cox, T.F., Cox, M.A.A.: Multidimensional Scaling. Chapman & Hall, London (2001) MATH
25.
Zurück zum Zitat Roth, V., Laub, J., Buhmann, J.M., Müller, K.-R.: Going metric: denoising pairwise data. In: Thrun, S., Becker, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems, vol. 15, pp. 817–824. MIT Press, Cambridge (2003) Roth, V., Laub, J., Buhmann, J.M., Müller, K.-R.: Going metric: denoising pairwise data. In: Thrun, S., Becker, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems, vol. 15, pp. 817–824. MIT Press, Cambridge (2003)
27.
Zurück zum Zitat Tannapfel, A., Hahn, H.A., Katalinic, A., Fietkau, R.J., Kühn, R., Wittekind, C.W.: Prognostic value of ploidy and proliferation markers in renal cell carcinoma. Cancer 77(1), 164–171 (1996) CrossRef Tannapfel, A., Hahn, H.A., Katalinic, A., Fietkau, R.J., Kühn, R., Wittekind, C.W.: Prognostic value of ploidy and proliferation markers in renal cell carcinoma. Cancer 77(1), 164–171 (1996) CrossRef
28.
Zurück zum Zitat Fuchs, T.J., Wild, P.J., Moch, H., Buhmann, J.M.: Computational pathology analysis of tissue microarrays predicts survival of renal clear cell carcinoma patients. In: Medical Image Computing and Computer-Assisted Intervention. MICCAI 2008. Lecture Notes in Computer Science, vol. 5242, pp. 1–8. Springer, Berlin (2008) CrossRef Fuchs, T.J., Wild, P.J., Moch, H., Buhmann, J.M.: Computational pathology analysis of tissue microarrays predicts survival of renal clear cell carcinoma patients. In: Medical Image Computing and Computer-Assisted Intervention. MICCAI 2008. Lecture Notes in Computer Science, vol. 5242, pp. 1–8. Springer, Berlin (2008) CrossRef
30.
Zurück zum Zitat Ahonen, T., Hadid, A., Pietikainen, M.: Face recognition with local binary patterns. In: ECCV 2004, vol. 3021, pp. 469–481 (2004) CrossRef Ahonen, T., Hadid, A., Pietikainen, M.: Face recognition with local binary patterns. In: ECCV 2004, vol. 3021, pp. 469–481 (2004) CrossRef
31.
Zurück zum Zitat Buhmann, J.M.: Information theoretic model validation for clustering. In: International Symposium on Information Theory, Austin Texas, pp. 1398–1402. IEEE Press, New York (2010). doi:10.1109/ISIT.2010.5513616 Buhmann, J.M.: Information theoretic model validation for clustering. In: International Symposium on Information Theory, Austin Texas, pp. 1398–1402. IEEE Press, New York (2010). doi:10.​1109/​ISIT.​2010.​5513616
Metadaten
Titel
Structure Preserving Embedding of Dissimilarity Data
verfasst von
Volker Roth
Thomas J. Fuchs
Julia E. Vogt
Sandhya Prabhakaran
Joachim M. Buhmann
Copyright-Jahr
2013
Verlag
Springer London
DOI
https://doi.org/10.1007/978-1-4471-5628-4_7