Published in: Social Network Analysis and Mining 1/2020

1 December 2020 | Original Article

Topics extraction in incremental short texts based on LSTM

Authors: Xubo Zhang, Li Zhang

Abstract

With the growth of online social media, topic extraction from short texts has become an important research field. A central question is how to extract topics, especially new topics that have not yet been recognized, from a collection of short texts that keeps growing and being updated. This paper builds a system based on the long short-term memory (LSTM) model from deep learning. First, each short text is converted into a word-vector matrix by a word2vec model. Two LSTM-based models are then designed: one recognizes whether a text belongs to an existing topic or a new one, and the other identifies whether two text samples belong to the same topic. Finally, a hierarchical clustering model uses the outputs of the two LSTM models to determine the number of new topics. The experimental results show that the system identifies new text topics well and achieves good performance.
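The final clustering step can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the linkage rule or the stopping criterion, so average linkage and a fixed distance threshold are assumptions made here, and the function name `count_new_topics`, the `threshold` parameter, and the toy score matrix are hypothetical. The sketch only shows how pairwise "same topic" scores, of the kind the second LSTM model could produce, might be turned into a count of new topics.

```python
from itertools import combinations


def count_new_topics(same_topic_prob, threshold=0.5):
    """Group texts by average-linkage agglomerative clustering.

    same_topic_prob[i][j] is a hypothetical score in [0, 1] that texts i and j
    share a topic. Clusters are merged while the closest pair has a mean
    distance (1 - score) below `threshold` (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(same_topic_prob))]

    def distance(a, b):
        # Average linkage: mean of (1 - score) over all cross-cluster pairs.
        total = sum(1.0 - same_topic_prob[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the two clusters that are currently closest to each other.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        if distance(clusters[x], clusters[y]) >= threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy scores for five texts assumed to have been flagged as "new topic":
# texts 0, 1, 2 resemble each other, and texts 3, 4 resemble each other.
scores = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.85, 0.15, 0.1],
    [0.8, 0.85, 1.0, 0.2, 0.1],
    [0.1, 0.15, 0.2, 1.0, 0.95],
    [0.2, 0.1, 0.1, 0.95, 1.0],
]
topics = count_new_topics(scores)
print(len(topics), topics)
```

On this toy input the texts fall into two groups, so the sketch would report two new topics. The threshold controls how readily groups merge, and in practice it would have to be tuned on held-out data.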

Metadata
Title
Topics extraction in incremental short texts based on LSTM
Authors
Xubo Zhang
Li Zhang
Publication date
1 December 2020
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2020
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-020-00699-8
