Published in: Soft Computing 15/2020

23.12.2019 | Methodologies and Application

A novel topic model for documents by incorporating semantic relations between words

Authors: Jihong Chen, Kai Zhang, Yuan Zhou, Zheng Chen, Yufei Liu, Zhuo Tang, Li Yin



Abstract

Topic models are widely used to infer latent topics in text documents. However, unsupervised topic models often produce incoherent topics, which confuse users in applications. Incorporating prior domain knowledge into topic models is an effective strategy for extracting coherent and meaningful topics. In this paper, we go one step further and explore how different forms of prior semantic relations between words can be encoded into models to improve the performance of the topic modeling process. We develop a novel topic model, called Mixed Word Correlation Knowledge-based Latent Dirichlet Allocation, to infer latent topics from a text corpus. Specifically, the proposed model mines two forms of lexical semantic knowledge based on recent progress in word embedding, which represents the semantic information of words in a continuous vector space. To incorporate the generated prior knowledge, a Mixed Markov Random Field is constructed over the latent topic layer to regularize the topic assignment of each word during the topic sampling process. Experimental results on two public benchmark datasets illustrate the superior performance of the proposed approach over several state-of-the-art baseline models.
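The abstract does not give the model's equations or say what the two forms of lexical knowledge are, so the following is only a rough sketch of the general idea and not the authors' method. It assumes that related word pairs are found by thresholding the cosine similarity of word embeddings, and that a Markov-random-field-style factor raises the sampling weight of topics already assigned to a word's linked neighbors. The function names, the threshold, the strength parameter `lam`, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_pairs(embeddings, threshold=0.8):
    """Return word pairs whose cosine similarity meets an assumed threshold."""
    words = sorted(embeddings)
    return [
        (w1, w2)
        for i, w1 in enumerate(words)
        for w2 in words[i + 1:]
        if cosine(embeddings[w1], embeddings[w2]) >= threshold
    ]


def regularized_topic_probs(base_weights, neighbor_topics, lam=1.0):
    """Reweight per-topic sampling weights with an MRF-style factor.

    base_weights: unnormalized per-topic weights from a standard LDA Gibbs step
    neighbor_topics: current topic ids of the words linked to the target word
    lam: hypothetical strength of the regularization
    """
    n = len(neighbor_topics)
    weights = []
    for k, base in enumerate(base_weights):
        # Count linked neighbors currently assigned to topic k.
        agree = sum(1 for z in neighbor_topics if z == k)
        bonus = math.exp(lam * agree / n) if n else 1.0
        weights.append(base * bonus)
    total = sum(weights)
    return [w / total for w in weights]


# Toy 2-d vectors invented for illustration; real embeddings would come
# from a pretrained word-embedding model.
toy = {"car": [1.0, 0.1], "auto": [0.9, 0.2], "banana": [0.0, 1.0]}
pairs = similar_pairs(toy)
probs = regularized_topic_probs([0.5, 0.5], neighbor_topics=[0, 0, 1])
```

In this toy setup, two of the three linked neighbors sit in topic 0, so topic 0 ends up with the larger sampling probability even though the base weights are equal. With no linked neighbors, the weights reduce to the normalized base weights.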

Metadata
Title: A novel topic model for documents by incorporating semantic relations between words
Authors: Jihong Chen, Kai Zhang, Yuan Zhou, Zheng Chen, Yufei Liu, Zhuo Tang, Li Yin
Publication date: 23.12.2019
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing, Issue 15/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-019-04604-0
