Published in: Annals of Data Science 1/2018

20 February 2018

A Novel Multiview Topic Model to Compute Correlation of Heterogeneous Data

Authors: Jinsheng Shen, Mingmin Chi



Abstract

With the rapid development of Internet technologies and sensing techniques, it has become much easier to acquire data from different sources at different dates and times. Computing the correlation of such heterogeneous data, however, remains a major challenge for data mining and information retrieval. Here, the features of a data point obtained from one source are called a view, and the multiview features together describe the same data point. In this paper, the hidden correlation between two-view features is used to construct a Heterogeneous (multiview) Topic Model (HTM). In particular, a probabilistic topic model is used for each view, because generative models usually provide much richer features when handling high-dimensional data such as text. However, most existing probabilistic topic models, such as latent Dirichlet allocation, require the form of the probability distribution to be known. To avoid this limitation, the HTM is reduced to solving a non-negative matrix tri-factorization problem with certain constraints, so that the proposed approach can be used with an arbitrary model.
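To make the term "non-negative matrix tri-factorization" concrete, the sketch below factorizes a small non-negative matrix X into three non-negative factors, X ≈ F S Gᵀ, using standard Lee-Seung-style multiplicative updates. It is an illustration of the generic, unconstrained problem only: the abstract does not spell out the HTM constraints or its optimization procedure, so none of them are reproduced here. The function names (`nmtf`, `frobenius_error`), the toy matrix `x`, and the ranks `k1` and `k2` are all assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def multiplicative_step(m, numer, denom, eps=1e-12):
    """Element-wise m * numer / denom; non-negative inputs stay non-negative."""
    return [[v * n / (d + eps) for v, n, d in zip(rv, rn, rd)]
            for rv, rn, rd in zip(m, numer, denom)]


def frobenius_error(x, f, s, g):
    """Squared Frobenius norm of x - f s g^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((p - q) ** 2 for rp, rq in zip(x, approx) for p, q in zip(rp, rq))


def nmtf(x, k1, k2, iters=200, seed=0):
    """Generic unconstrained tri-factorization x ~ f s g^T (not the paper's HTM)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random initialization: f is n x k1, s is k1 x k2, g is m x k2.
    f = [[rng.random() + 0.1 for _ in range(k1)] for _ in range(n)]
    s = [[rng.random() + 0.1 for _ in range(k2)] for _ in range(k1)]
    g = [[rng.random() + 0.1 for _ in range(k2)] for _ in range(m)]
    xt = transpose(x)
    for _ in range(iters):
        # Update f with s and g fixed: f <- f * (x b^T) / (f b b^T), where b = s g^T.
        b = matmul(s, transpose(g))
        f = multiplicative_step(f, matmul(x, transpose(b)),
                                matmul(f, matmul(b, transpose(b))))
        # Update g with f and s fixed: g <- g * (x^T a) / (g a^T a), where a = f s.
        a = matmul(f, s)
        g = multiplicative_step(g, matmul(xt, a),
                                matmul(g, matmul(transpose(a), a)))
        # Update s with f and g fixed: s <- s * (f^T x g) / (f^T f s g^T g).
        ft = transpose(f)
        numer = matmul(matmul(ft, x), g)
        denom = matmul(matmul(matmul(ft, f), s), matmul(transpose(g), g))
        s = multiplicative_step(s, numer, denom)
    return f, s, g


# Hypothetical toy matrix with two rough blocks (rows and columns each form two groups).
x = [[5, 4, 0, 0],
     [4, 5, 0, 1],
     [0, 0, 3, 4],
     [1, 0, 4, 3]]

f, s, g = nmtf(x, k1=2, k2=2)
print("reconstruction error:", round(frobenius_error(x, f, s, g), 3))
```

In this generic form, F and G act as low-dimensional representations of the rows and columns, and the small middle matrix S couples the two sets of latent factors. How the HTM maps its two views and their hidden correlation onto these factors is defined in the paper itself, not in this sketch.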


Metadata
Title
A Novel Multiview Topic Model to Compute Correlation of Heterogeneous Data
Authors
Jinsheng Shen
Mingmin Chi
Publication date
20 February 2018
Publisher
Springer Berlin Heidelberg
Published in
Annals of Data Science, Issue 1/2018
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-017-0135-y
