
2016 | OriginalPaper | Chapter

Labeled Phrase Latent Dirichlet Allocation

Authors: Yi-Kun Tang, Xian-Ling Mao, Heyan Huang

Published in: Web Information Systems Engineering – WISE 2016

Publisher: Springer International Publishing


Abstract

In recent years, topic models such as Latent Dirichlet Allocation (LDA) and its variants have been widely used to discover abstract topics in text corpora. Two state-of-the-art topic models are Labeled LDA (LLDA) and PhraseLDA. LLDA is a supervised generative model that incorporates label information, but under the bag-of-words assumption it ignores word order. PhraseLDA, in contrast, treats each document as a mixture of phrases and thus partially captures word order; however, it cannot model supervised label information. To overcome the shortcomings of these two models while combining their merits, this paper proposes a novel topic model, Labeled Phrase LDA, which simultaneously considers supervised label information and word order. Extensive experiments comparing the proposed model with the two state-of-the-art models show that it significantly outperforms these baselines in the case study and in terms of perplexity and scalability.
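The abstract describes the combination only at a high level. As an illustrative sketch, assume (i) the LLDA convention of one topic per label, with each document restricted to the topics of its own labels, and (ii) the PhraseLDA convention that all words of a phrase share a single topic. A collapsed Gibbs sampler under these assumptions might look like the following. The function name `gibbs_labeled_phrase_lda`, the input layout, and the symmetric hyperparameters `alpha` and `beta` are hypothetical choices for illustration, not the authors' implementation. The per-phrase weight is the collapsed Dirichlet-multinomial conditional for assigning all words of the phrase to topic k at once, evaluated word by word.

```python
import random
from collections import defaultdict


def gibbs_labeled_phrase_lda(docs, labels, num_topics, vocab_size,
                             alpha=0.1, beta=0.01, iters=50, seed=0):
    """Hypothetical sketch, not the authors' code.

    docs:   list of documents; each document is a list of phrases,
            each phrase a non-empty list of word ids in [0, vocab_size).
    labels: list of non-empty label sets; labels[d] holds the topic ids
            document d may use (assumed one topic per label, as in LLDA).
    Returns z, where z[d][g] is the topic shared by every word of phrase g.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    n_wk = [[0] * num_topics for _ in range(vocab_size)]  # count of word w in topic k
    n_k = [0] * num_topics                                # total tokens in topic k
    z = []

    def update(d, phrase, k, delta):
        # Add (delta=+1) or remove (delta=-1) a whole phrase from topic k.
        n_dk[d][k] += delta * len(phrase)
        n_k[k] += delta * len(phrase)
        for w in phrase:
            n_wk[w][k] += delta

    # Initialise: each phrase gets one topic drawn from its document's labels.
    for d, doc in enumerate(docs):
        allowed = sorted(labels[d])
        z_d = []
        for phrase in doc:
            k = rng.choice(allowed)
            update(d, phrase, k, +1)
            z_d.append(k)
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            allowed = sorted(labels[d])       # label constraint (LLDA-style)
            for g, phrase in enumerate(doc):
                update(d, phrase, z[d][g], -1)  # exclude the current phrase
                weights = []
                for k in allowed:
                    p = 1.0
                    seen = defaultdict(int)     # repeats of a word inside this phrase
                    # Phrase constraint (PhraseLDA-style): every word joins topic k,
                    # so the counts grow by one per word as the product proceeds.
                    for j, w in enumerate(phrase):
                        p *= ((alpha + n_dk[d][k] + j)
                              * (beta + n_wk[w][k] + seen[w])
                              / (vocab_size * beta + n_k[k] + j))
                        seen[w] += 1
                    weights.append(p)
                k_new = rng.choices(allowed, weights=weights)[0]
                update(d, phrase, k_new, +1)
                z[d][g] = k_new
    return z


docs = [[[0, 1], [2]], [[3, 4], [0, 1]], [[2], [3, 4]]]
labels = [{0}, {0, 1}, {1}]
z = gibbs_labeled_phrase_lda(docs, labels, num_topics=2, vocab_size=5)
print(z)
```

Because documents 0 and 2 each carry a single label, every phrase in them is forced onto that label's topic; only document 1 involves a genuine choice between topics. For long phrases, the product is better accumulated in log space to avoid numerical underflow.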


Metadata
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-48740-3_39
