Published in: Data Mining and Knowledge Discovery 6/2022

30 September 2022

Joint dynamic topic model for recognition of lead-lag relationship in two text corpora

Written by: Yandi Zhu, Xiaoling Lu, Jingya Hong, Feifei Wang


Abstract

Topic evolution modeling has received significant attention in recent decades. Although various topic evolution models have been proposed, most studies focus on a single document corpus. In practice, however, data from multiple sources are easily accessible, and relationships between them can be observed. It is therefore of great interest to recognize the relationship between multiple text corpora and to use this relationship to improve topic modeling. In this work, we focus on a special type of relationship between two text corpora, which we define as the “lead-lag relationship”. This relationship characterizes the phenomenon that one text corpus influences the topics to be discussed in the other text corpus in the future. To discover the lead-lag relationship, we propose a joint dynamic topic model and also develop an embedding extension to address the problem of modeling large-scale text corpora. With the recognized lead-lag relationship, the similarities between the two text corpora can be identified and the quality of topic learning in both corpora can be improved. We numerically investigate the performance of the joint dynamic topic modeling approach using synthetic data. Finally, we apply the proposed model to two text corpora consisting of statistical papers and graduation theses. The results show that the proposed model recognizes the lead-lag relationship between the two corpora well, and that it also discovers the specific and shared topic patterns in the two corpora.
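To make the notion of a lead-lag relationship concrete, the following is a minimal sketch of one simple way to look for it. It is not the joint dynamic topic model proposed in the paper. It assumes that the share of a single topic per time slice has already been estimated for each corpus, and it applies a plain lagged Pearson correlation. The function names `pearson` and `best_lag` and all numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def best_lag(leader, follower, max_lag):
    """Return (lag, correlation) maximizing corr(leader[t], follower[t + lag]).

    Both series are assumed to have the same length and time grid.
    """
    scores = {}
    for lag in range(max_lag + 1):
        x = leader[: len(leader) - lag]  # leader at time t
        y = follower[lag:]               # follower at time t + lag
        scores[lag] = pearson(x, y)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Hypothetical share of one topic per time slice in each corpus
# (invented numbers, not results from the paper).
papers = [0.05, 0.10, 0.30, 0.20, 0.10, 0.05, 0.05, 0.05]
theses = [0.05, 0.05, 0.05, 0.10, 0.30, 0.20, 0.10, 0.05]

lag, corr = best_lag(papers, theses, max_lag=3)
print(f"best lag: {lag} time slices, correlation: {corr:.2f}")
```

In this toy input the second series repeats the first one two time slices later, so the heuristic picks a lag of 2. A heuristic like this only compares fixed topic trajectories; the approach described in the abstract instead learns the topics and the lead-lag relationship jointly.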

Metadata
Title
Joint dynamic topic model for recognition of lead-lag relationship in two text corpora
Authors
Yandi Zhu
Xiaoling Lu
Jingya Hong
Feifei Wang
Publication date
30 September 2022
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 6/2022
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-022-00873-w