Skip to main content
Top

2016 | OriginalPaper | Chapter

Enriching Topic Models with DBpedia

Authors : Alexandru Todor, Wojciech Lukasiewicz, Tara Athan, Adrian Paschke

Published in: On the Move to Meaningful Internet Systems: OTM 2016 Conferences

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Traditional Topic Modeling approaches only consider the words in the document. By using an entity-topic modeling approach and including background knowledge about the entities such as the occupation of persons, the location of organizations, the band of a musician etc., we can better cluster related documents together, and produce semantic topic models that can be represented in a knowledge base. In our approach we first reduce the text documents to a set of entities and then enrich this set with background knowledge from DBpedia. Topic modeling is performed on the enriched set of entities and various feature combinations are evaluated in order to determine the combination that achieves the best classification precision or perplexity compared to using word-based topic models alone.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Andrzejewski, D., Zhu, X., Craven, M.: Incorporating domain knowledge into topic modeling via Dirichlet Forest priors. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 25–32. ACM (2009) Andrzejewski, D., Zhu, X., Craven, M.: Incorporating domain knowledge into topic modeling via Dirichlet Forest priors. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 25–32. ACM (2009)
2.
go back to reference Andrzejewski, D., Zhu, X., Craven, M., Recht, B.: A framework for incorporating general domain knowledge into latent dirichlet allocation using first-order logic. In: IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, p. 1171 (2011) Andrzejewski, D., Zhu, X., Craven, M., Recht, B.: A framework for incorporating general domain knowledge into latent dirichlet allocation using first-order logic. In: IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, p. 1171 (2011)
3.
go back to reference Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., Choi, K.-S., Noy, N., Allemang, D., Lee, K.-I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) ASWC/ISWC -2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). doi:10.1007/978-3-540-76298-0_52 CrossRef Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., Choi, K.-S., Noy, N., Allemang, D., Lee, K.-I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) ASWC/ISWC -2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). doi:10.​1007/​978-3-540-76298-0_​52 CrossRef
4.
go back to reference Baker, C.F., Fillmore, C.J., Lowe, J.B.: The berkeley framenet project. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics-Volume 1, pp. 86–90. Association for Computational Linguistics (1998) Baker, C.F., Fillmore, C.J., Lowe, J.B.: The berkeley framenet project. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics-Volume 1, pp. 86–90. Association for Computational Linguistics (1998)
5.
go back to reference Blei, D., Lafferty, J.: Correlated topic models. Adv. Neural Inf. Process. Syst. 18, 147 (2006) Blei, D., Lafferty, J.: Correlated topic models. Adv. Neural Inf. Process. Syst. 18, 147 (2006)
6.
go back to reference Blei, D.M., Lafferty, J.D.: Dynamic topic models. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 113–120. ACM (2006) Blei, D.M., Lafferty, J.D.: Dynamic topic models. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 113–120. ACM (2006)
7.
go back to reference David, M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)MATH David, M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)MATH
8.
go back to reference Chang, J., Blei, D.M.: Relational topic models for document networks. AIStats 9, 81–88 (2009) Chang, J., Blei, D.M.: Relational topic models for document networks. AIStats 9, 81–88 (2009)
9.
go back to reference Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J.L., Blei, D.M.: Reading tea leaves: how humans interpret topic models. In: Advances in Neural Information Processing Systems, pp. 288–296 (2009) Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J.L., Blei, D.M.: Reading tea leaves: how humans interpret topic models. In: Advances in Neural Information Processing Systems, pp. 288–296 (2009)
10.
go back to reference Chemudugunta, C., Holloway, A., Smyth, P., Steyvers, M.: Modeling documents by combining semantic concepts with unsupervised statistical learning. In: Sheth, A., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 229–244. Springer, Heidelberg (2008). doi:10.1007/978-3-540-88564-1_15 CrossRef Chemudugunta, C., Holloway, A., Smyth, P., Steyvers, M.: Modeling documents by combining semantic concepts with unsupervised statistical learning. In: Sheth, A., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 229–244. Springer, Heidelberg (2008). doi:10.​1007/​978-3-540-88564-1_​15 CrossRef
11.
go back to reference Daiber, J., Jakob, M., Hokamp, C., Mendes, P.N.: Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th International Conference on Semantic Systems (I-Semantics) (2013) Daiber, J., Jakob, M., Hokamp, C., Mendes, P.N.: Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th International Conference on Semantic Systems (I-Semantics) (2013)
12.
go back to reference Gabrilovich, E., Markovitch, S.: Wikipedia-based semantic interpretation for natural language processing. J. Artif. Intell. Res. 34, 443–498 (2009)MATH Gabrilovich, E., Markovitch, S.: Wikipedia-based semantic interpretation for natural language processing. J. Artif. Intell. Res. 34, 443–498 (2009)MATH
13.
go back to reference Blei, D.M., Griffiths, T.L., Jordan, M.I., Tenenbaum, J.B.: Hierarchical topic models and the nested chinese restaurant process. Adv. Neural Inf. Process. Syst. 16, 17 (2004) Blei, D.M., Griffiths, T.L., Jordan, M.I., Tenenbaum, J.B.: Hierarchical topic models and the nested chinese restaurant process. Adv. Neural Inf. Process. Syst. 16, 17 (2004)
14.
go back to reference Griffiths, T.L., Steyvers, M., Finding scientific topics. Proc. Natl. Acad. Sci. U.S.A. 101, 5228–5235 (2004) Griffiths, T.L., Steyvers, M., Finding scientific topics. Proc. Natl. Acad. Sci. U.S.A. 101, 5228–5235 (2004)
15.
go back to reference Hu, Z., Luo, G., Sachan, M., Xing, E., Nie, Z.: Grounding topic models with knowledge bases Hu, Z., Luo, G., Sachan, M., Xing, E., Nie, Z.: Grounding topic models with knowledge bases
16.
go back to reference Kliegr, T.: Linked hypernyms: enriching dbpedia with targeted hypernym discovery. Web Seman. Sci. Serv. Agents World Wide Web 31, 59–69 (2015)CrossRef Kliegr, T.: Linked hypernyms: enriching dbpedia with targeted hypernym discovery. Web Seman. Sci. Serv. Agents World Wide Web 31, 59–69 (2015)CrossRef
17.
go back to reference Movshovitz-Attias, D., Cohen, W.W.: KB-LDA: Jointly learning a knowledge base of hierarchy, relations, and facts. In: Proceedings of ACL (2015) Movshovitz-Attias, D., Cohen, W.W.: KB-LDA: Jointly learning a knowledge base of hierarchy, relations, and facts. In: Proceedings of ACL (2015)
18.
go back to reference Newman, D., Chemudugunta, C., Smyth, P.: Statistical entity-topic models. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 680–686. ACM (2006) Newman, D., Chemudugunta, C., Smyth, P.: Statistical entity-topic models. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 680–686. ACM (2006)
19.
go back to reference Hossein, S., Miller, D.J.: Parsimonious topic models with salient word discovery. IEEE Trans. Knowl. Data Eng. 27(3), 824–837 (2015) Hossein, S., Miller, D.J.: Parsimonious topic models with salient word discovery. IEEE Trans. Knowl. Data Eng. 27(3), 824–837 (2015)
Metadata
Title
Enriching Topic Models with DBpedia
Authors
Alexandru Todor
Wojciech Lukasiewicz
Tara Athan
Adrian Paschke
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-48472-3_46

Premium Partner