2015 | OriginalPaper | Book chapter

EHLLDA: A Supervised Hierarchical Topic Model

Written by: Xian-Ling Mao, Yixuan Xiao, Qiang Zhou, Jun Wang, Heyan Huang

Published in: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data

Publisher: Springer International Publishing

Abstract

In this paper, we consider the problem of modeling hierarchically labeled data, such as Web pages and their placement in hierarchical directories. The state-of-the-art model, hierarchical Labeled LDA (hLLDA), assumes that each child of a non-leaf label has equal importance, and that a document in the corpus cannot be located at a non-leaf node. However, in most cases, these assumptions do not hold in practice. Thus, in this paper, we introduce a supervised hierarchical topic model, Extended Hierarchical Labeled Latent Dirichlet Allocation (EHLLDA), which aims to relax the assumptions of hLLDA by incorporating prior information about labels into hLLDA. The experimental results show that EHLLDA achieves better perplexity than LLDA and hLLDA on all four datasets, and that the proposed model is also superior to hLLDA in terms of p@n.
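
The abstract reports results in terms of perplexity and p@n but does not define either metric. As a rough illustration only, the sketch below uses the standard textbook definitions: held-out perplexity as the exponentiated negative average per-token log-likelihood, and precision at n as the fraction of relevant items among the top n of a ranking. The function names, argument names, and input layout are assumptions made for this example and are not taken from the paper, whose exact evaluation protocol may differ.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Held-out perplexity under the standard definition (assumed here):
    exp(-(sum of per-document log-likelihoods) / (total number of tokens)).
    Lower values indicate a better fit to the held-out documents."""
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("need at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def precision_at_n(ranked_items, relevant_items, n):
    """Precision at n (p@n): share of the top-n ranked items that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for item in ranked_items[:n] if item in relevant_items)
    return hits / n


# Hypothetical toy inputs: one 4-token document where each token has probability 0.25.
toy_perplexity = perplexity([4 * math.log(0.25)], [4])

# Hypothetical ranking of four labels, two of which are relevant.
toy_p_at_2 = precision_at_n(["a", "b", "c", "d"], {"a", "c"}, 2)
```

A model that assigned every token probability 0.25 would be exactly as uncertain as a uniform choice among four words, which is why perplexity is often read as an effective vocabulary size.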

Metadata
Title
EHLLDA: A Supervised Hierarchical Topic Model
Written by
Xian-Ling Mao
Yixuan Xiao
Qiang Zhou
Jun Wang
Heyan Huang
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-25816-4_18