Published in: Journal of Intelligent Information Systems 1/2021

25.05.2020

Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm

Authors: Di Wu, Ruixin Yang, Chao Shen

Abstract

The Latent Dirichlet Allocation (LDA) topic model is a popular research topic in text mining. This paper proposes SKP-LDA, an LDA short text clustering algorithm based on sentiment word co-occurrence and knowledge pair feature extraction. First, a word bag based on sentiment word co-occurrence is defined. The co-occurrence of sentiment words is computed across the different short texts, and the short texts of a microblog are then assigned an emotional polarity. Next, knowledge pairs of topic special words and topic relation words are extracted and inserted into the LDA model for clustering, so that semantic information can be found more accurately. The n hidden topics and the Top30 special word set of each topic are then extracted from the knowledge pair set. Finally, the Top30 topic special word sets obtained from the primary LDA clustering are clustered again by K-means secondary clustering, with the cluster centers optimized iteratively. Compared with JST, LSM, LTM and ELDA, SKP-LDA performs better in terms of Accuracy, Precision, Recall and F-measure. The experimental results show that SKP-LDA has better semantic analysis ability and a better emotional topic clustering effect. It can be applied to microblogs to improve the accuracy of network public opinion analysis.
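
The sentiment word co-occurrence step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so the Python code below rests on assumptions: the lexicon SENTIMENT_LEXICON, the functions sentiment_cooccurrence and polarity, and the sample texts are all hypothetical, and the polarity rule (sign of the summed lexicon scores) is a simple stand-in rather than the SKP-LDA method itself.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sentiment lexicon (word -> polarity score); not taken from the paper.
SENTIMENT_LEXICON = {"good": 1, "great": 1, "happy": 1, "bad": -1, "awful": -1}


def sentiment_cooccurrence(texts, lexicon=SENTIMENT_LEXICON):
    """Count how often each unordered pair of sentiment words shares a short text."""
    pair_counts = Counter()
    for text in texts:
        # Distinct sentiment words in this text, sorted so every pair has one canonical order.
        words = sorted({w for w in text.lower().split() if w in lexicon})
        pair_counts.update(combinations(words, 2))
    return pair_counts


def polarity(text, lexicon=SENTIMENT_LEXICON):
    """Label a short text by the sign of its summed lexicon scores (assumed rule)."""
    score = sum(lexicon.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up short texts standing in for microblog posts.
texts = [
    "great phone good battery",
    "good screen great price",
    "bad service awful food",
    "new phone arrived today",
]
counts = sentiment_cooccurrence(texts)
labels = [polarity(t) for t in texts]
print(counts.most_common())
print(labels)
```

In a fuller pipeline, pair counts of this kind could serve as extra features for the short texts before topic modeling. How SKP-LDA actually weights and uses them is specified in the paper, not in this sketch.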

References
Blei, D.M., Ng, A.Y., & Jordan, M.I. (2003). Latent Dirichlet allocation[J]. Journal of Machine Learning Research Archive, 3, 993–1022.
Chang, P., & Ma, H. (2011). Efficient short texts keyword extraction method analysis[J]. Computer Engineering & Applications, 47(20), 126–128, 154.
Chen, Z., & Liu, B. Topic modeling using topics from many domains, lifelong learning and big data[C].
Hao, J., Xie, J., Su, J.Q., et al. (2016). An unsupervised approach for sentiment classification based on weighted latent Dirichlet allocation[J]. CAAI Transactions on Intelligent Systems, 11(4), 539–545.
He, Y. (2011). Latent sentiment model for weakly-supervised crosslingual sentiment classification[J]. Advances in Information Retrieval, 6611, 214–225.
Huang, F.L., Yu, G., Zhang, J.L., et al. (2017). Mining topic sentiment in micro-blogging based on micro-blogger social relation[J]. Journal of Software, 28(3), 694–707.
Kozlowski, M., & Rybinski, H. (2019). Clustering of semantically enriched short texts[J]. Journal of Intelligent Information Systems, 53(1), 69–92.
Lin, C., & He, Y. (2009). Joint sentiment topic model for sentiment analysis[C]. In Proceedings of the 18th ACM conference on information and knowledge management (pp. 375–384). New York: ACM Press.
Liu, B.Y., Wang, C.R., Wang, C., et al. (2017). Micro-blog community discovery algorithm based on dynamic topic model with multidimensional data fusion[J]. Journal of Software, 28(2), 246–261.
Liu, Z., Liu, C.Y., Xia, B., & Li, T. (2018). Multiple relational topic modeling for noisy short texts[J]. International Journal of Software Engineering and Knowledge Engineering, 28(11), 1559–1574.
Lu, L., Fuxi, Z., Rong, G., et al. (2018). Point of interest joint recommendation method based on user-content topic model[J]. Computer Engineering & Applications, 4, 154–159.
Peng, M., Huang, J.J., Zhu, J.H., et al. (2015). Mass of short texts clustering and topic extraction based on frequent itemsets[J]. Journal of Computer Research & Development, 52(9), 1941–1953.
Qi, J., Xun, L., Zhou, X., et al. (2018). Micro-blog user community discovery using generalized simrank edge weighting method[J]. PLoS ONE, 13(5).
Qu, J., Chen, Z., & Zheng, Y. (2018). Research on the text clustering method of science and technology reports based on the topic model[J]. Library & Information Service.
Shams, M., & Baraani-Dastjerdi, A. (2017). Enriched LDA (ELDA): combination of latent Dirichlet allocation with word co-occurrence analysis for aspect extraction[J]. Expert Systems with Applications, 80, 136–146.
Sun, Y., & Zhou, X.G. (2013). Unsupervised topic and sentiment unification model for sentiment analysis[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 49(1), 102–108.
Tago, K., & Jin, Q. (2018). Influence analysis of emotional behaviors and user relationships based on twitter data[J]. Tsinghua Science & Technology, 23(1), 104–113.
Wan, H.X., & Peng, Y. (2018). Topic words extraction of social media based on semantic constrained and time associated LDA[J]. Journal of Chinese Computer Systems, 39(4), 742–747.
Wang, X.W., & Zhang, K. (2012). Improved expansion algorithm based on co-occurrence relationship between short text feature[J]. Journal of Henan University of Urban Construction, 21(4), 48–50.
Xiong, S., Wang, K., Ji, D., et al. (2018). A Short text sentiment-topic model for product reviews[J]. Neurocomputing, 297, 94–102.
Yong, M.C., Qing, C., School, B, et al. (2018). Chinese short text topic analysis by latent Dirichlet allocation model with co-word network analysis[J]. Journal of the China Society for Scientific and Technical Information, 37(3), 305–317.
Metadata
Title
Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm
Authors
Di Wu
Ruixin Yang
Chao Shen
Publication date
25.05.2020
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 1/2021
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-020-00597-7
