Published in: Data Mining and Knowledge Discovery 1/2017

15 February 2016

Hierarchical evolving Dirichlet processes for modeling nonlinear evolutionary traces in temporal data

Authors: Peng Wang, Peng Zhang, Chuan Zhou, Zhao Li, Hong Yang


Abstract

Clustering analysis aims to group similar data objects into the same cluster. Topic models, a family of soft clustering methods, are powerful tools for discovering latent clusters (topics) in large data sets. Because temporal data are dynamic, clusters often exhibit complicated patterns such as birth, branching, and death. However, most existing temporal clustering models assume that clusters evolve as a linear chain, so they cannot model or detect cluster branching. In this paper, we present evolving Dirichlet processes (EDP for short) to model nonlinear evolutionary traces in temporal data, especially temporal text collections. In EDP, a temporal collection is divided into epochs. To model cluster branching over time, EDP lets each cluster in an epoch form its own Dirichlet process (DP) and uses a combination of these cluster-specific DPs as the prior for the cluster distributions in the next epoch. To model hierarchical temporal data, such as online document collections, we propose a new class of evolving hierarchical Dirichlet processes (EHDP for short), which extends hierarchical Dirichlet processes (HDP) to evolving temporal data. We design an online learning framework based on Gibbs sampling to infer the evolutionary traces of clusters over time. In experiments on both synthetic and real-world text collections, we validate that EDP and EHDP capture nonlinear evolutionary traces of clusters and achieve better results than their peers.
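
To make the idea of cluster-specific processes feeding the next epoch more concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions and is not the authors' construction or their Gibbs sampler: it uses the standard Chinese restaurant process (CRP) view of a DP, it assumes that parent clusters are weighted by their size, and it assumes a single concentration parameter `alpha` everywhere. All function names (`crp_partition`, `mixture_prior_weights`, `evolve_epoch`) are hypothetical.

```python
import random


def crp_partition(n_items, alpha, rng):
    """Sample cluster sizes for n_items from a Chinese restaurant process."""
    counts = []
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
    return counts


def mixture_prior_weights(parent_counts, alpha):
    """Assumed mixing weights: each parent cluster in proportion to its size,
    plus a final slot (weight alpha) for clusters with no parent (birth)."""
    total = sum(parent_counts) + alpha
    return [c / total for c in parent_counts] + [alpha / total]


def evolve_epoch(parent_counts, n_items, alpha, rng):
    """Simulate one epoch transition.

    Each new item first picks a parent cluster (or the 'birth' slot) from the
    mixture weights, then the items under each parent are partitioned by that
    parent's own CRP. Several child clusters under one parent correspond to a
    branch; a parent that attracts no items corresponds to a death.
    """
    weights = mixture_prior_weights(parent_counts, alpha)
    per_parent = [0] * len(weights)
    for _ in range(n_items):
        p = rng.choices(range(len(weights)), weights=weights)[0]
        per_parent[p] += 1
    return {p: crp_partition(m, alpha, rng) for p, m in enumerate(per_parent)}


rng = random.Random(0)
epoch_1 = crp_partition(50, alpha=1.0, rng=rng)          # cluster sizes in epoch 1
epoch_2 = evolve_epoch(epoch_1, 50, alpha=1.0, rng=rng)  # parent index -> child sizes
```

In `epoch_2`, the key `len(epoch_1)` holds clusters born without a parent, and any other key whose list has more than one entry is a parent that branched. A linear-chain model would instead force at most one child per parent.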


Metadata
Title
Hierarchical evolving Dirichlet processes for modeling nonlinear evolutionary traces in temporal data
Authors
Peng Wang
Peng Zhang
Chuan Zhou
Zhao Li
Hong Yang
Publication date
15 February 2016
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 1/2017
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-016-0454-1
