Social Network Analysis and Mining

1 December 2019 | Original Article

Character level embedding with deep convolutional neural network for text normalization of unstructured data for Twitter sentiment analysis

Authors: Monika Arora, Vineet Kansal

Published in: Social Network Analysis and Mining | Issue 1/2019

Abstract

On social media platforms such as Twitter and Facebook, people express their views, arguments, and emotions about many events in daily life. Twitter is an international microblogging service featuring short messages called “tweets” in different languages. These texts often contain noise in the form of incorrect grammar, abbreviations, freestyle writing, and typographical errors. Sentiment analysis (SA), a task within natural language processing (NLP), aims to predict the actual emotions people express in raw text. The main aim of our work is to process raw sentences from a Twitter dataset and find the actual polarity of each message. This paper proposes text normalization with a deep convolutional character-level embedding (Conv-char-Emb) neural network model for SA of unstructured data. The model tackles three problems: (1) processing noisy sentences for sentiment detection, (2) handling the small memory space available in word-level embedding learning, and (3) accurate sentiment analysis of unstructured data. The initial preprocessing stage, which performs text normalization, comprises tokenization, out-of-vocabulary (OOV) detection and replacement, lemmatization, and stemming. Character-based embedding in a convolutional neural network (CNN) is an effective and efficient technique for SA that uses fewer learnable parameters in feature representation. Thus, the proposed method performs both normalization and sentiment classification for unstructured sentences. The experimental results are evaluated on the Twitter dataset using three polarity classes (positive, negative, and neutral). The model performs well in both normalization and sentiment analysis of raw Twitter data enriched with hidden information.
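
The preprocessing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lookup table `OOV_MAP`, the toy vocabulary `VOCAB`, the character alphabet, and all function names are hypothetical and chosen only for illustration. The sketch covers tokenization, a simple OOV replacement, and the conversion of normalized text into character indices of the kind a character-level embedding layer would typically consume. Lemmatization, stemming, and the CNN itself are left out.

```python
import re

# Hypothetical lookup table for out-of-vocabulary (OOV) tokens; illustrative only.
OOV_MAP = {"u": "you", "gr8": "great", "luv": "love", "2day": "today"}

# Toy vocabulary; a real system would rely on a full dictionary.
VOCAB = {"you", "are", "great", "i", "love", "today", "the", "movie", "was"}

# Assumed character alphabet; index 0 is reserved for padding and unknown characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def squeeze_repeats(token):
    """Collapse runs of three or more identical characters to one ('soooo' -> 'so')."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def normalize(text):
    """Tokenize, then replace OOV tokens using the lookup table where possible."""
    tokens = []
    for tok in tokenize(text):
        if tok not in VOCAB:
            tok = squeeze_repeats(tok)
            tok = OOV_MAP.get(tok, tok)  # unknown tokens are kept unchanged
        tokens.append(tok)
    return tokens


def encode_chars(tokens, max_len=32):
    """Map the normalized text to a fixed-length list of character indices."""
    text = " ".join(tokens)[:max_len]
    ids = [CHAR_TO_ID.get(ch, 0) for ch in text]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    tokens = normalize("I luv the movie, u are gr8!!!")
    print(tokens)
    print(encode_chars(tokens))
```

Because the character alphabet is small and fixed, the index table stays tiny regardless of how many distinct words appear in the tweets, which is in line with the abstract's point that character-based embedding needs fewer learnable parameters than word-level embedding.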

Title: Character level embedding with deep convolutional neural network for text normalization of unstructured data for Twitter sentiment analysis
Authors: Monika Arora, Vineet Kansal
Publication date: 1 December 2019
Publisher: Springer Vienna
Published in: Social Network Analysis and Mining / Issue 1/2019
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-019-0557-y