Published in: World Wide Web 2/2017

21 April 2016

A probabilistic method for emerging topic tracking in Microblog stream

Authors: Jiajia Huang, Min Peng, Hua Wang, Jinli Cao, Wang Gao, Xiuzhen Zhang


Abstract

Microblogs are a popular, open platform for discovering and sharing the latest news about social issues and daily life. Because microblog streams are updated so quickly, effective tools for monitoring them are urgently needed. Emerging topic tracking is one such tool: it reveals which new events are currently attracting the most online attention. However, because microblog feeds change fast, are noisy, and are short, emerging topic tracking must address two challenges. The first is detecting emerging topics early, long before they become hot; the second is monitoring evolving topics effectively over time. In this study, we propose a novel emerging topic tracking method that aligns emerging word detection from a temporal perspective with coherent topic mining from a spatial perspective. Specifically, we first design a metric, based on locally weighted linear regression (LWLR), that estimates word novelty and fading; it highlights the novelty of words that express an emerging topic and suppresses the novelty of words that express an existing topic. We then track emerging topics by leveraging topic novelty and fading probabilities, which are learnt by formulating and solving an optimization problem. We evaluate our method on a microblog stream containing over one million feeds. Experimental results show that the proposed method performs promisingly, in both effectiveness and efficiency, at detecting emerging topics and tracking topic evolution over time.
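To give a rough sense of what an LWLR-based trend estimate can look like, the sketch below fits a kernel-weighted line to a word's per-slot counts and reads off the local slope at the most recent time slot. It is an illustration under stated assumptions, not the metric defined in the paper: the Gaussian kernel, the `bandwidth` parameter, and the function names `lwlr_fit` and `local_trend` are all hypothetical choices made here.

```python
import math


def lwlr_fit(times, values, query, bandwidth=2.0):
    """Fit a kernel-weighted least-squares line around `query`.

    Returns (slope, intercept). The Gaussian kernel and `bandwidth`
    are illustrative assumptions, not taken from the paper.
    """
    # Points close to the query time receive larger weights.
    weights = [
        math.exp(-((t - query) ** 2) / (2.0 * bandwidth ** 2)) for t in times
    ]
    w_sum = sum(weights)
    t_mean = sum(w * t for w, t in zip(weights, times)) / w_sum
    v_mean = sum(w * v for w, v in zip(weights, values)) / w_sum
    # Weighted covariance and variance give the closed-form slope.
    cov = sum(
        w * (t - t_mean) * (v - v_mean)
        for w, t, v in zip(weights, times, values)
    )
    var = sum(w * (t - t_mean) ** 2 for w, t in zip(weights, times))
    if var == 0.0:
        # A single time slot carries no trend information.
        return 0.0, v_mean
    slope = cov / var
    return slope, v_mean - slope * t_mean


def local_trend(counts, bandwidth=2.0):
    """Local slope of a non-empty count series at its latest time slot.

    A positive value suggests rising usage and a negative value
    suggests fading (hypothetical reading, for illustration only).
    """
    times = list(range(len(counts)))
    slope, _ = lwlr_fit(times, counts, query=times[-1], bandwidth=bandwidth)
    return slope


if __name__ == "__main__":
    print(local_trend([1, 1, 2, 5, 12]))   # rising word: positive slope
    print(local_trend([12, 5, 2, 1, 1]))   # fading word: negative slope
```

Because the weights decay with distance from the latest slot, recent changes dominate the fitted slope, which is the property that makes a local fit attractive for spotting words that are just starting to rise or fall.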


Footnotes
2. We use Jieba for Chinese word segmentation, which can be downloaded from https://github.com/fxsjy/jieba

Metadata
Title
A probabilistic method for emerging topic tracking in Microblog stream
Authors
Jiajia Huang
Min Peng
Hua Wang
Jinli Cao
Wang Gao
Xiuzhen Zhang
Publication date
21 April 2016
Publisher
Springer US
Published in
World Wide Web / Issue 2/2017
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-016-0390-4
