
2017 | OriginalPaper | Book chapter

A CWTM Model of Topic Extraction for Short Text

Authors: Yunlan Diao, Yajun Du, Pan Xiao, Jia Liu

Published in: Knowledge Graph and Semantic Computing. Language, Knowledge, and Intelligence

Publisher: Springer Singapore

Abstract

Topic models aim to discover latent topics in massive micro-blog data. Extracting these topics supports subsequent analysis, but the particular nature of the data means that traditional topic model algorithms cannot be applied to it directly. Although traditional text topic mining has been widely studied in data mining, short texts such as micro-blog posts have distinctive characteristics, including network language and newly emerging words. Owing to the short length of messages, data sparsity, and incomplete descriptions, the topics of micro-blog posts cannot be obtained efficiently. In this paper, we propose a simple, fast, and effective topic model for short texts, named the couple-word topic model (CWTM). Based on the Dirichlet Multinomial Mixture (DMM) model, it leverages couple-word co-occurrence, instead of the traditional word co-occurrence, to distill better topics from short texts. The method alleviates the data sparsity problem, improves model performance, and uses the Gibbs sampling algorithm to infer parameters. In extensive experiments on two real-world short text collections, CWTM achieves topic representations comparable to or better than those of traditional topic models.
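
The abstract does not define how couple words are formed, so the following is only a minimal sketch under an assumption: a couple word is taken to be an unordered pair of distinct words that co-occur in the same short text. The function names `word_pairs` and `pair_counts` and the toy corpus are hypothetical and are not taken from the paper. The sketch illustrates why pair co-occurrence can ease sparsity: a text with n distinct words yields n(n-1)/2 pairs.

```python
from collections import Counter
from itertools import combinations


def word_pairs(tokens):
    """Return the unordered pairs of distinct words in one short text.

    Assumption: a "couple word" is any such pair; the paper may define it differently.
    """
    vocab = sorted(set(tokens))          # deduplicate and fix a canonical order
    return list(combinations(vocab, 2))  # n distinct words -> n*(n-1)/2 pairs


def pair_counts(corpus):
    """Count how often each word pair co-occurs across a corpus of short texts."""
    counts = Counter()
    for tokens in corpus:
        counts.update(word_pairs(tokens))
    return counts


# Hypothetical toy corpus of two tokenized short texts.
corpus = [
    ["topic", "model", "short", "text"],
    ["short", "text", "sparse"],
]
counts = pair_counts(corpus)
```

Counts like these could serve as the observations for a mixture model in place of single-word counts; how CWTM actually combines them with DMM and Gibbs sampling is described in the chapter itself, not here.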

Metadata
Title
A CWTM Model of Topic Extraction for Short Text
Authors
Yunlan Diao
Yajun Du
Pan Xiao
Jia Liu
Copyright year
2017
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-7359-5_9