Pattern Analysis and Applications

1 August 2016 | Short Paper

Data visualization via latent variables and mixture models: a brief survey

Authors: Rodolphe Priam, Mohamed Nadif

Published in: Pattern Analysis and Applications | Issue 3/2016

Abstract

In the literature, data visualization is extensively studied via diverse parametric probabilistic distributions for the exploration of continuous, binary, and count data. An overview of the existing methods for non-symmetric data matrices is presented in a unified framework via the Bernoulli law and binary variables. An extension to continuous or count variables is available by using instead any other univariate distribution, such as the Poisson or the Gaussian. Several approaches are possible, depending on whether the model places a distribution on the rows, the columns, the row clusters, the column clusters, the cells, the blocks, or a transformed matrix of the distances between pairs of rows or columns. The objective functions are presented with their full expressions in separate sections, one for each method: Kohonen’s map and related self-organizing map methods, generative topographic mapping as a probabilistic self-organizing map, linear principal component analysis and related matrix methods (non-negative factorization, factorization), probabilistic parametric embedding, probabilistic latent semantic visualization, latent cluster position model, and t-distributed stochastic neighbor embedding. The conclusion discusses the contribution and offers perspectives.

Note that an optimal value of the parameter \(\lambda\) may be found by using a measure of the quality of the mapping, as proposed in [20].
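
As an illustration of how a quality measure could drive the choice of \(\lambda\), the following minimal Python sketch scores a map by the average fraction of k-nearest neighbors shared between the data space and the map, and keeps the candidate value with the highest score. This neighborhood-overlap score is one common local criterion and may differ in detail from the measure of [20]. The names `knn_overlap`, `select_lambda`, and `toy_embed`, as well as the toy data, are assumptions made for this example only.

```python
import math


def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point (Euclidean, self excluded)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result


def knn_overlap(high, low, k):
    """Average fraction of k-nearest neighbors shared by the data and the map."""
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    shared = sum(len(a & b) for a, b in zip(nn_high, nn_low))
    return shared / (k * len(high))


def select_lambda(high, embed, candidates, k):
    """Return the candidate whose map has the largest neighborhood overlap.

    `embed` is a hypothetical callable mapping a value of lambda to the list
    of map coordinates; it stands in for whatever model is being tuned.
    """
    return max(candidates, key=lambda lam: knn_overlap(high, embed(lam), k))


# Toy data: the second coordinate separates the points into two groups.
high = [(0.0, 0.0), (1.0, 10.0), (2.0, 0.0), (3.0, 10.0)]


def toy_embed(lam):
    """Hypothetical one-parameter mapping: lam scales the second coordinate."""
    return [(x, lam * y) for x, y in high]


best = select_lambda(high, toy_embed, candidates=[0.0, 1.0], k=1)
print(best, knn_overlap(high, toy_embed(best), k=1))
```

In practice the grid of candidate values and the neighborhood size k would have to be chosen for the data at hand; ties between candidates are resolved here in favor of the first one listed.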

This distribution illustrates well the double purpose of a visual representation. Here, only the central positions are visualized, and the corresponding sampled data are simply shown at the same coordinates. This can be summarized in two steps: (1) a hard (non-soft) clustering, followed by (2) a projection of the cluster means. This is the global part, which yields a skeleton of the data cloud. The local part comes with the fuzzification, when the data vectors scatter around the cluster means.
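
The two steps and the fuzzification can be sketched as follows. This is a minimal illustration under stated assumptions, not the model discussed in the text: k-means stands in for the hard clustering, a linear projection onto two principal axes stands in for the projection of the cluster means, and Gaussian responsibilities with a hypothetical width `sigma` stand in for the fuzzification. All function names and the toy data are invented for the example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def kmeans(data, k, n_iter=50, seed=0):
    """Step (1): hard clustering by plain Lloyd iterations."""
    centers = [list(p) for p in random.Random(seed).sample(data, k)]
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in data]
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(p, centers) for p in data]


def principal_axes(points, n_axes=2, n_iter=200, seed=0):
    """Leading principal axes by power iteration with deflation."""
    rng = random.Random(seed)
    mean = [sum(col) / len(points) for col in zip(*points)]
    centered = [[x - m for x, m in zip(p, mean)] for p in points]
    axes = []
    for _ in range(n_axes):
        v = [rng.gauss(0.0, 1.0) for _ in mean]
        for _ in range(n_iter):
            w = [0.0] * len(mean)
            for p in centered:  # w = (sum_p p p^T) v, the scatter matrix times v
                s = sum(a * b for a, b in zip(p, v))
                w = [wi + s * pi for wi, pi in zip(w, p)]
            for axis in axes:  # remove the directions already found
                s = sum(a * b for a, b in zip(w, axis))
                w = [wi - s * ai for wi, ai in zip(w, axis)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        axes.append(v)
    return mean, axes


def project(p, mean, axes):
    """Step (2): coordinates of p on the principal axes."""
    return [sum((x - m) * a for x, m, a in zip(p, mean, axis)) for axis in axes]


def soft_position(p, centers, coords, sigma):
    """Local part: responsibility-weighted average of the projected centers."""
    d = [sq_dist(p, c) for c in centers]
    d_min = min(d)  # subtracted for numerical stability
    w = [math.exp(-(di - d_min) / (2.0 * sigma ** 2)) for di in d]
    total = sum(w)
    return [sum(wi * z[j] for wi, z in zip(w, coords)) / total
            for j in range(len(coords[0]))]


# Toy data: three noisy groups in three dimensions.
rng = random.Random(1)
blobs = [(0.0, 0.0, 0.0), (5.0, 0.0, 1.0), (0.0, 5.0, -1.0)]
data = [[m + rng.gauss(0.0, 0.3) for m in b] for b in blobs for _ in range(20)]

centers, labels = kmeans(data, k=3)
mean, axes = principal_axes(centers)
coords = [project(c, mean, axes) for c in centers]           # skeleton of the cloud
hard = [coords[lab] for lab in labels]                       # data shown at the centers
soft = [soft_position(p, centers, coords, 1.0) for p in data]  # data scattered around them
```

With the hard assignment every data vector lands exactly on the projected mean of its cluster, whereas the soft positions are convex combinations of the projected means and therefore spread between them as `sigma` grows.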

The R package VBLPCM was used to train the model. Convergence was observed after fewer than 61 steps.

The representation has to preserve sufficiently (1) the local neighborhood relations in the data cloud, so that similar data are found in the same area of the map, and (2) the global relations that determine the shape of the data cloud, so as to give a suitable view of its appearance and of the relative distances between its different sub-structures. When classes exist, the highest possible separation between their projections may be preferred, provided that the visual information on the nearest neighbors of the projected points is not lost.
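
The global requirement (2) can be probed numerically. As a minimal sketch, assuming that agreement between pairwise distances is an acceptable proxy for "shape", the Python code below computes the Pearson correlation between all pairwise distances in the data space and on the map. The function names and the toy points are assumptions made for this example; the local requirement (1) would need a separate neighborhood-based score.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def pairwise_distance_pearson(high, low):
    """Pearson correlation between pairwise distances in the data and on the map.

    Values close to 1 indicate that relative distances, and hence the overall
    shape of the cloud, are well preserved. The score is undefined (division
    by zero) if all distances in one of the two spaces are equal.
    """
    dh = pairwise_distances(high)
    dl = pairwise_distances(low)
    mh = sum(dh) / len(dh)
    ml = sum(dl) / len(dl)
    cov = sum((a - mh) * (b - ml) for a, b in zip(dh, dl))
    var_h = sum((a - mh) ** 2 for a in dh)
    var_l = sum((b - ml) ** 2 for b in dl)
    return cov / math.sqrt(var_h * var_l)


# Toy check: a rescaled copy keeps the shape, a collapsed map does not.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
scaled = [(2 * x, 2 * y, 2 * z) for x, y, z in high]
collapsed = [(x,) for x, _, _ in high]

score_scaled = pairwise_distance_pearson(high, scaled)
score_collapsed = pairwise_distance_pearson(high, collapsed)
print(score_scaled, score_collapsed)
```

A uniform rescaling leaves the correlation at its maximum, while keeping only the first coordinate destroys most of the distance structure and gives a much lower score.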

1. Ambroise C, Govaert G (1996) Constrained clustering and Kohonen self-organizing maps. J Classif 13(2):299–313

2. Anouar F, Badran F, Thiria S (1997) Self organizing map: a probabilistic approach. In: Proceedings of the WSOM'97, Finland, pp 339–344

3. Bacciu D, Micheli A, Sperduti A (2012) Compositional generative mapping for tree-structured data - part I: bottom-up probabilistic modeling of trees. IEEE Trans Neural Netw Learn Syst 23(12):1987–2002

4. Baek J, McLachlan G, Flack LK (2009) Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Trans Pattern Anal Mach Intell 32(7):1298–1309

5. Bakker R, Poole KT (2013) Bayesian metric multidimensional scaling. Polit Anal 21(1):125–140

6. Banerjee A, Dhillon IS, Ghosh J, Sra S (2005) Clustering on the unit hypersphere using von Mises-Fisher distributions. J Mach Learn Res 6:1345–1382

7. Basseville M (2013) Divergence measures for statistical data processing: an annotated bibliography. Signal Process 93(4):621–633

8. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comp 15(6):1373–1396

9. Bilmes JA (1998) A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Tech. rep., ICSI, U.C. Berkeley

10. Bingham E, Kabán A, Fortelius M (2009) The aspect Bernoulli model: multiple causes of presences and absences. Patt Anal Appl 12(1):55–78

11. Bingham E, Mannila H (2001) Random projection in dimensionality reduction: applications to image and text data. In: KDD ’01: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, pp 245–250

12. Bishop C, Svensén M, Williams CKI (1997) Magnification factors for the GTM algorithm. In: Fifth international conference on artificial neural networks, pp 64–69

13. Bishop CM, Svensén M, Williams CKI (1997) GTM: a principled alternative to the self-organizing map. In: Advances in neural information processing systems 9, pp 354–360

14. Bishop CM, Svensén M, Williams CKI (1998) Developments of the generative topographic mapping. Neurocomputing 21:203–224

15. Carreira-Perpiñán MA, Lu Z (2007) The Laplacian eigenmaps latent variable model. In: Proceedings of the Eleventh international conference on artificial intelligence and statistics (AISTATS-7), pp 59–66

16. Carreira-Perpiñán MA (2010) The elastic embedding algorithm for dimensionality reduction. In: Proceedings of the 27th international conference on machine learning (ICML '10), pp 167–174

17. Carter KM, Raich R, Hero AO (2008) FINE: Information embedding for document classification. In: ICASSP, pp 1861–1864

18. Chaibi A, Lebbah M, Azzag H (2013) A new bi-clustering approach using topological maps. In: Neural Networks (IJCNN), pp 1–7

19. Chang KY, Ghosh J (2001) A unified model for probabilistic principal surfaces. IEEE Trans Pattern Anal Mach Intell 23(1):22–41

20. Chen L, Buja A (2009) Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. J Am Stat Assoc 104(485):209–219

21. Choi JY, Qiu J, Pierce M, Fox G (2010) Generative topographic mapping by deterministic annealing. Procedia Comp Sci 1(1):47–56

22. Cruz-Barbosa R, Vellido A (2010) Semi-supervised geodesic generative topographic mapping. Pattern Recogn Lett 31(3):202–209

23. Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comp 18(2):173–183

24. Deerwester S, Dumais ST, Furnas GW, Landauer T, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Info Sci 41(6):391–407

25. Dempster AP, Laird NM, Rubin DB (1977) Maximum-likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38

26. Ding C, Li T, Peng W (2008) On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Comp Stat Data Anal 52:3913–3927

27. Estévez PA, Figueroa CJ, Saito K (2005) Special issue: Cross-entropy embedding of high-dimensional data using the neural gas model. Neural Netw 18(5–6):727–737

28. Févotte C, Bertin N, Durrieu JL (2009) Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Comp 21(3):793–830

29. Fort JC, Letremy P, Cottrell M (2002) Advantages and drawbacks of the batch Kohonen algorithm. In: ESANN, pp 223–230

30. Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comp J 41(8):578–588

31. Girolami M (2001) The topographic organization and visualization of binary data using multivariate-Bernoulli latent variable models. IEEE Trans Neural Netw 12(6):1367–1374

32. Gisbrecht A, Mokbel B, Hammer B (2011) Relational generative topographic mapping. Neurocomputing 74(9):1359–1371

33. Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2005) Neighbourhood components analysis. In: Advances in neural information processing systems 17, pp 513–520

34. Gollini I, Murphy TB (2014) Mixture of latent trait analyzers for model-based clustering of categorical data. Stat Comp 24(4):569–588

35. Govaert G (1983) Classification croisée. Thèse d’état, Université Paris 6, France

36. Govaert G (1995) Simultaneous clustering of rows and columns. Control Cybern 24(4):437–458

37. Govaert G, Nadif M (2003) Clustering with block mixture models. Patt Recogn 36(2):463–473

38. Govaert G, Nadif M (2005) An EM algorithm for the block mixture model. IEEE Trans Pattern Anal Mach Intell 27(4):643–647

39. Gupta MR, Chen Y (2011) Theory and use of the EM algorithm. Found Trends Signal Process 4(3):223–296

40. Handcock MS, Raftery AE, Tantrum JM (2007) Model-based clustering for social networks. J Royal Stat Soc A 170(2):301–354

41. Hathaway RJ (1986) Another interpretation of the EM algorithm for mixture distributions. Stat Probab Lett 4(2):53–56

42. Hernández-Lobato JM, Houlsby N, Ghahramani Z (2014) Stochastic inference for scalable probabilistic modeling of binary matrices. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp 379–387

43. Hinton G, Roweis S (2003) Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems 15, pp 857–864

44. Hoff PD, Raftery AE, Handcock MS (2002) Latent space approaches to social network analysis. J Am Stat Assoc 97(460):1090–1098

45. Hofmann T (2000) ProbMap - a probabilistic approach for mapping large document collections. Intell Data Anal 4(2):149–164

46. Hofmann T, Puzicha J (1998) Statistical models for co-occurrence data. Tech. Rep. AIM-1625, MIT

47. Hofmann T, Schölkopf B, Smola AJ (2008) Kernel methods in machine learning. Ann Stat 36(3):1171–1220

48. Hsu CC (2006) Generalizing self-organizing map for categorical data. IEEE Trans Neural Netw 17(2):294–304

49. Iwata T, Saito K, Ueda N, Stromsten S, Griffiths TL, Tenenbaum JB (2007) Parametric embedding for class visualization. Neural Comp 19(9):2536–2556

50. Iwata T, Yamada T, Ueda N (2008) Probabilistic latent semantic visualization: topic model for visualizing documents. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’08, New York, pp 363–371

51. Jolliffe I (2002) Principal component analysis. Springer Verlag

52. Juan A, Vidal E (2002) On the use of Bernoulli mixture models for text classification. Patt Recogn 35(12):2705–2710

53. Kabán A (2007) Predictive modelling of heterogeneous sequence collections by topographic ordering of histories. Mach Learn 68(1):63–95

54. Kabán A, Bingham E, Hirsimaki T (2004) Learning to read between the lines: the aspect Bernoulli model. In: Proceedings of the 2004 SIAM International Conference on Data Mining (SIAM DM04), pp 462–466

55. Kabán A, Girolami M (2001) A combined latent class and trait model for analysis and visualisation of discrete data. IEEE Trans Pattern Anal Mach Intell 23(8):859–872

56. Kabán A, Sun J, Raychaudhury S, Nolan L (2006) On class visualisation for high dimensional data: Exploring scientific data sets. In: Proceedings of the 9th International Conference on Discovery Science, DS’06, Springer-Verlag, pp 125–136

57. Kiang MY (2001) Extending the Kohonen self-organizing map networks for clustering analysis. Comp Stat Data Anal 38(2):161–180

58. Klock H, Buhmann JM (2000) Data visualization by multidimensional scaling: a deterministic annealing approach. Patt Recogn 33(4):651–669

59. Kohonen T (1997) Self-organizing maps. Springer

60. Kozma L, Ilin A, Raiko T (2009) Binary principal component analysis in the Netflix collaborative filtering task. In: IEEE International Workshop on Machine Learning for Signal Processing, pp 1–6

61. Lawrence ND (2005) Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J Mach Learn Res 6:1783–1816

62. Lawrence ND (2012) A unifying probabilistic perspective for spectral dimensionality reduction: Insights and new models. J Mach Learn Res 13:1609–1638

63. Le TV, Lauw HW (2014) Manifold learning for jointly modeling topic and visualization. In: Twenty-Eighth AAAI Conference on Artificial Intelligence, pp 1960–1967

64. Le TV, Lauw HW (2014) Probabilistic latent document network embedding. In: ICDM, pp 270–279

65. Lebart L, Morineau A, Warwick K (1984) Multivariate descriptive statistical analysis. Wiley

66. Lebbah M, Rogovschi N, Bennani Y (2007) BeSOM: Bernoulli on self-organizing map. In: IJCNN, pp 631–636

67. Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems 13, pp 556–562

68. Lee JA, Renard E, Bernard G, Dupont P, Verleysen M (2013) Type 1 and 2 mixtures of Kullback-Leibler divergences as cost functions in dimensionality reduction based on similarity preservation. Neurocomputing 112:92–108

69. Lee JA, Verleysen M (2009) Quality assessment of dimensionality reduction: Rank-based criteria. Neurocomputing 72(7–9):1431–1443

70. Lee JA, Verleysen M (2010) Scale-independent quality criteria for dimensionality reduction. Pattern Recogn Lett 31(14):2248–2257

71. Lee S, Huang JZ, Hu J (2010) Sparse logistic principal components analysis for binary data. Ann Appl Stat 4(3):1579–1601

72. López-Rubio E (2010) Probabilistic self-organizing maps for qualitative data. Neural Netw 23(10):1208–1225

73. Luttrell SP (1994) A Bayesian analysis of self-organising maps. Neural Comp 6(5):767–794

74. van der Maaten L, Hinton G (2008) Visualizing Data using t-SNE. J Mach Learn Res 9:2579–2605

75. Makarenkov V, Legendre P (2002) Nonlinear Redundancy Analysis and Canonical Correspondence Analysis Based on Polynomial Regression. Ecology 83(4):1146–1161

76. Maniyar D, Nabney I (2006) Data visualization with simultaneous feature selection. In: Computational Intelligence and Bioinformatics and Computational Biology CIBCB ’06, pp 1–8

77. McLachlan GJ, Basford KE (1988) Mixture Models. Inference and applications to clustering. Marcel Dekker, New York

78. McLachlan GJ, Peel D (2000) Finite Mixture Models. Wiley, New York

79. Mirisaee SH, Gaussier E, Termier A (2015) Improved local search for binary matrix factorization. In: Twenty-Ninth AAAI Conference on Artificial Intelligence, pp 1198–1204

80. Noack A (2003) Energy models for drawing clustered small-world graphs. Tech. rep., BTU Cottbus

81. Oh MS, Raftery AE (2001) Bayesian Multidimensional Scaling and Choice of Dimension. J Am Stat Assoc 96(455):1031–1044

82. Olier I, Vellido A (2008) Advances in clustering and visualization of time series using GTM through time. Neural Netw 21(7):904–913

83. Olier I, Vellido A (2008) Variational Bayesian generative topographic mapping. J Math Model Algor 7(4):371–387

84. Olier I, Vellido A, Giraldo J (2010) Kernel generative topographic mapping. In: ESANN, pp 481–486

85. Paatero P (1997) Least squares formulation of robust non-negative factor analysis. Chemom Intell Lab Syst 37(1):23–35

86. Park M, Jitkrittum W, Qamar A, Szabo Z, Buesing L, Sahani M (2015) Bayesian Manifold Learning: Locally Linear Latent Variable Model (LL-LVM). Arxiv preprint. http://arxiv.org/pdf/1410.6791v3.pdf

87. Priam R (2005) CASOM: SOM for contingency tables and biplot. In: Proceedings of the WSOM'05, Paris, pp 379–386

88. Priam R, Nadif M (2012) Generative topographic mapping and factor analyzers. In: ICPRAM (1), pp 284–287

89. Priam R, Nadif M, Govaert G (2014) Topographic Bernoulli block mixture mapping for binary tables. Patt Anal Appl 17(4):839–847

90. Priam R, Nadif M, Govaert G (2015) Generalized topographic block model. Neurocomputing (in press)

91. Roweis S, Ghahramani Z (1999) A unifying review of linear Gaussian models. Neural Comp 11(2):305–345

92. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326

93. Salakhutdinov R, Mnih A (2008) Probabilistic matrix factorization. In: Advances in Neural Information Processing Systems 20, pp 1257–1264

94. Salter-Townshend M, Murphy TB (2013) Variational Bayesian inference for the latent position cluster model for network data. Comp Stat Data Anal 57(1):661–671

95. Salter-Townshend M, White A, Gollini I, Murphy TB (2012) Review of statistical network analysis: models, algorithms, and software. Stat Anal Data Mining 5(4):243–264

96. Sammon J (1969) A nonlinear mapping for data structure analysis. IEEE Trans Comp C-18(5):401–409

97. Schein AI, Saul LK, Ungar LH (2003) A Generalized Linear Model for Principal Component Analysis of Binary Data. In: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS-9), pp 14–21

98. Silvestre C, Cardoso M, Figueiredo M (2014) Identifying the number of clusters in discrete mixture models. Arxiv preprint. http://arxiv.org/pdf/1409.7419.pdf

99. Singh AP, Gordon GJ (2008) A unified view of matrix factorization models. In: ECML PKDD, LNAI 5212, pp 358–373

100. Stulp F, Sigaud O (2015) Many regression algorithms, one unified model: a review. Neural Netw 69:60–79

101. Sun S (2013) A review of deterministic approximate inference techniques for Bayesian machine learning. Neural Comp Appl 23(7):2039–2050

102. Symons MJ (1981) Clustering criteria and multivariate normal mixtures. Biometrics 37:35–43

103. Tino P, Nabney I (2002) Hierarchical GTM: constructing localized nonlinear projection manifolds in a principled way. IEEE Trans Pattern Anal Mach Intell 24(5):639–656

104. Tino P, Nabney I, Sun Y (2001) Using directional curvatures to visualize folding patterns of the GTM projection manifolds. In: ICANN, pp 421–428

105. Tipping ME (1999) Probabilistic visualisation of high-dimensional binary data. In: Advances in Neural Information Processing Systems 11, pp 592–598

106. Tipping ME, Bishop CM (1999) Mixtures of probabilistic principal component analyzers. Neural Comp 11(2):443–482

107. Tipping ME, Bishop CM (1999) Probabilistic principal component analysis. J Royal Stat Soc B 61(3):611–622

108. Titsias MK, Lawrence ND (2010) Bayesian Gaussian process latent variable model. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS-10), 9:844–851

109. Hastie T, Stuetzle W (1989) Principal Curves. J Am Stat Assoc 84(406):502–516

110. Utsugi A (1997) Hyperparameter selection for self-organizing maps. Neural Comp 9:623–635

111. Utsugi A (2000) Bayesian sampling and ensemble learning in generative topographic mapping. Neural Process Lett 12(3):277–290

112. Van Hulle M (2012) Self-organizing maps. In: Handbook of Natural Computing. Springer, Berlin, Heidelberg, pp 585–622

113. Vellido A (2006) Assessment of an unsupervised feature selection method for generative topographic mapping. In: ICANN, pp 361–370

114. Vellido A, El-Deredy W, Lisboa PJG (2003) Selective smoothing of the generative topographic mapping. IEEE Trans Neural Netw 14(4):847–852

115. Verbeek JJ, Vlassis N, Krose BJA (2002) The generative self-organizing map: a probabilistic generalization of Kohonen’s SOM. Tech. rep., IAS-UVA-02-03

116. Willenbockel CT, Schütte C (2015) A variational Bayesian algorithm for clustering of large and complex networks. Tech. Rep. 15-25, ZIB

117. Yamaguchi N (2012) Variational Bayesian inference with automatic relevance determination for generative topographic mapping. In: SCIS-ISIS, pp 2124–2129

118. Yin H (2008) The self-organizing maps: Background, theories, extensions and applications. In: Computational intelligence: a compendium, studies in computational intelligence, vol 115. Springer, Berlin, Heidelberg, pp 715–762

Title: Data visualization via latent variables and mixture models: a brief survey
Authors: Rodolphe Priam, Mohamed Nadif
Publication date: 1 August 2016
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 3/2016
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-015-0521-z
