
2016 | OriginalPaper | Chapter

SBTM: Topic Modeling over Short Texts

Authors: Jianhui Pang, Xiangsheng Li, Haoran Xie, Yanghui Rao

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Abstract

With the rapid development of social media services such as Twitter and Sina Weibo, short texts are becoming increasingly prevalent. However, inferring topics from short texts remains challenging for many content analysis tasks because word co-occurrence patterns in short texts are sparse. In this paper, we propose a classification model named the sentimental biterm topic model (SBTM), which is applied to sentiment classification over short texts. To alleviate the sparsity problem, the similarity between words and documents is first estimated by singular value decomposition. Then, the most similar words are added to each short document in the corpus. Extensive evaluations on sentiment detection of short texts validate the effectiveness of the proposed method.
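
The enrichment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the rank, or how many words are added, so the function name `enrich_documents`, the parameters `rank` and `top_n`, and the use of the rank-k reconstruction of the term-document count matrix as the word-document similarity are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def enrich_documents(docs, rank=2, top_n=1):
    """Append to each tokenized document the top_n words (not already in it)
    that score highest against it in a truncated-SVD reconstruction.

    Hypothetical helper: the scoring rule is an assumption, not the paper's.
    """
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document count matrix: rows are words, columns are documents.
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            counts[index[w], j] += 1.0

    # Truncated SVD; the rank-k reconstruction smooths the sparse counts,
    # so a word can receive a positive score for a document it never occurs in.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(rank, len(s))
    smoothed = (u[:, :k] * s[:k]) @ vt[:k, :]

    enriched = []
    for j, doc in enumerate(docs):
        present = set(doc)
        candidates = [w for w in vocab if w not in present]
        # Highest smoothed score first; ties keep alphabetical vocabulary order.
        candidates.sort(key=lambda w: -smoothed[index[w], j])
        enriched.append(list(doc) + candidates[:top_n])
    return enriched


if __name__ == "__main__":
    toy_corpus = [
        ["good", "movie"],
        ["good", "film"],
        ["bad", "traffic"],
        ["bad", "weather"],
    ]
    for original, expanded in zip(toy_corpus, enrich_documents(toy_corpus)):
        print(original, "->", expanded)
```

In this toy corpus, documents that share a word end up close in the low-rank space, so each one borrows a word from its neighbour. A real corpus would likely need a weighted matrix (for example TF-IDF) and a much larger rank, both of which are choices the abstract leaves open.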

Metadata
Title
SBTM: Topic Modeling over Short Texts
Authors
Jianhui Pang
Xiangsheng Li
Haoran Xie
Yanghui Rao
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-32055-7_4
