Published in: World Wide Web 2/2017

21-04-2016

A probabilistic method for emerging topic tracking in Microblog stream

Authors: Jiajia Huang, Min Peng, Hua Wang, Jinli Cao, Wang Gao, Xiuzhen Zhang

Abstract

Microblogs are a popular, open platform for discovering and sharing the latest news about social issues and daily life. Because microblog streams update so quickly, effective tools for monitoring them are urgently needed. Emerging topic tracking is one such tool: it reveals which new events are currently attracting the most online attention. However, because microblog feeds change fast, are noisy, and are short, emerging topic tracking must address two challenges. One is detecting emerging topics early, long before they become hot; the other is effectively monitoring how topics evolve over time. In this study, we propose a novel emerging topic tracking method that aligns emerging word detection from a temporal perspective with coherent topic mining from a spatial perspective. Specifically, we first design a metric to estimate word novelty and fading based on locally weighted linear regression (LWLR), which highlights the novelty of words that express an emerging topic and suppresses the novelty of words that express an existing topic. We then track emerging topics by leveraging topic novelty and fading probabilities, which are learnt by formulating and solving an optimization problem. We evaluate our method on a microblog stream containing over one million feeds. Experimental results show that the proposed method performs promisingly, in both effectiveness and efficiency, at detecting emerging topics and tracking topic evolution over time.
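
The abstract names LWLR but does not spell out the novelty and fading metric, so the following is only a minimal sketch of the underlying regression step, not the authors' method. It assumes a Gaussian kernel, a hypothetical `bandwidth` parameter, and per-time-slice word counts as input; the function names `lwlr_fit` and `trend_score` are illustrative. The local slope at the most recent time slice serves as a rough proxy: a positive slope suggests a rising (possibly novel) word, a negative slope a fading one.

```python
import math


def lwlr_fit(times, counts, query, bandwidth=2.0):
    """Fit a weighted straight line centred on `query`.

    Returns (intercept, slope). The Gaussian kernel and `bandwidth`
    are assumptions made for this sketch, not taken from the paper.
    """
    # Points close to the query time get weights near 1, distant ones near 0.
    weights = [
        math.exp(-((t - query) ** 2) / (2.0 * bandwidth ** 2)) for t in times
    ]
    total = sum(weights)
    mean_t = sum(w * t for w, t in zip(weights, times)) / total
    mean_c = sum(w * c for w, c in zip(weights, counts)) / total

    # Closed-form weighted least squares for a single predictor.
    var_t = sum(w * (t - mean_t) ** 2 for w, t in zip(weights, times))
    if var_t == 0.0:
        # A single time slice carries no trend information.
        return mean_c, 0.0
    cov = sum(
        w * (t - mean_t) * (c - mean_c)
        for w, t, c in zip(weights, times, counts)
    )
    slope = cov / var_t
    return mean_c - slope * mean_t, slope


def trend_score(counts, bandwidth=2.0):
    """Hypothetical novelty proxy: the local slope at the latest time slice."""
    times = list(range(len(counts)))
    _, slope = lwlr_fit(times, counts, query=times[-1], bandwidth=bandwidth)
    return slope


if __name__ == "__main__":
    rising = [0, 0, 1, 5, 20]      # word that is starting to burst
    steady = [10, 10, 10, 10, 10]  # word tied to an existing topic
    fading = [20, 5, 1, 0, 0]      # word that is dying out
    for name, series in (("rising", rising), ("steady", steady), ("fading", fading)):
        print(name, round(trend_score(series), 3))
```

Because the weights concentrate on recent slices, a word whose count has only just begun to climb receives a larger slope than it would under an ordinary least-squares fit over the whole history, which is the property that makes a local fit attractive for early detection.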


Footnotes
2. We use Jieba for Chinese word segmentation, which can be downloaded from https://github.com/fxsjy/jieba
 
Metadata
Title: A probabilistic method for emerging topic tracking in Microblog stream
Authors: Jiajia Huang, Min Peng, Hua Wang, Jinli Cao, Wang Gao, Xiuzhen Zhang
Publication date: 21-04-2016
Publisher: Springer US
Published in: World Wide Web, Issue 2/2017
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI: https://doi.org/10.1007/s11280-016-0390-4
