Published in: Social Network Analysis and Mining 1/2019

01.12.2019 | Original Article

Character level embedding with deep convolutional neural network for text normalization of unstructured data for Twitter sentiment analysis

Authors: Monika Arora, Vineet Kansal

Abstract

On social media platforms such as Twitter and Facebook, people express their views, arguments, and emotions about many events in daily life. Twitter is an international microblogging service featuring short messages called “tweets” in different languages. These texts often contain noise in the form of incorrect grammar, abbreviations, freestyle writing, and typographical errors. Sentiment analysis (SA), a task within natural language processing (NLP), aims to predict the actual emotions that people express in raw text. The main aim of our work is to process raw sentences from the Twitter dataset and find the actual polarity of each message. This paper proposes a deep convolutional character-level embedding (Conv-char-Emb) neural network model with text normalization for SA of unstructured data. The model tackles three problems: (1) processing noisy sentences for sentiment detection, (2) handling the small memory space in word-level embedded learning, and (3) accurate sentiment analysis of unstructured data. The initial preprocessing stage performs text normalization in the following steps: tokenization, out-of-vocabulary (OOV) detection and replacement, lemmatization, and stemming. Character-based embedding in a convolutional neural network (CNN) is an effective and efficient technique for SA that uses fewer learnable parameters in feature representation. Thus, the proposed method performs both normalization and sentiment classification for unstructured sentences. The experimental results are evaluated on the Twitter dataset with respect to polarity classes (positive, negative, and neutral). As a result, our model performs well in normalization and sentiment analysis of raw Twitter data enriched with hidden information.
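The normalization steps and the character-level input described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slang table, the vocabulary, the alphabet, and all function names are hypothetical, and lemmatization, stemming, and the convolutional layers themselves are left out. It only shows how a noisy tweet might be tokenized, have its OOV tokens replaced, and be turned into the fixed-length sequence of character indices that a character embedding layer would consume.

```python
import re

# Hypothetical lookup table of noisy forms (not taken from the paper).
SLANG = {"u": "you", "gr8": "great", "luv": "love", "2day": "today"}
# Hypothetical tiny vocabulary used to flag out-of-vocabulary (OOV) tokens.
VOCAB = {"i", "you", "love", "this", "phone", "great", "today", "is", "a", "day"}
# Assumed character alphabet; index 0 is reserved for padding and unknown characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def squeeze_repeats(token):
    """Collapse runs of three or more identical characters to a single one."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def normalize(text):
    """Tokenize, detect OOV tokens, and replace them where a candidate exists."""
    tokens = []
    for tok in tokenize(text):
        if tok not in VOCAB:
            # OOV token: try the slang table first, then squeeze repeated letters.
            tok = SLANG.get(tok, squeeze_repeats(tok))
        tokens.append(tok)
    return tokens


def encode_chars(tokens, max_len=32):
    """Map the normalized text to a fixed-length list of character indices."""
    text = " ".join(tokens)[:max_len]
    ids = [CHAR_TO_ID.get(ch, 0) for ch in text]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    tokens = normalize("I luv this phone, gr8 2day!!!")
    print(tokens)
    print(encode_chars(tokens))
```

Because the alphabet here has only 37 symbols, an embedding table over character indices stays small regardless of how many distinct words appear in the tweets, which is the intuition behind the abstract's point about fewer learnable parameters.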

Metadata
Title
Character level embedding with deep convolutional neural network for text normalization of unstructured data for Twitter sentiment analysis
Authors
Monika Arora
Vineet Kansal
Publication date
01.12.2019
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2019
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-019-0557-y
