Published in: Social Network Analysis and Mining 1/2018

01.12.2018 | Original Article

Incorporating pre-training in long short-term memory networks for tweet classification

Authors: Shuhan Yuan, Xintao Wu, Yang Xiang


Abstract

The paper presents deep learning models for tweet classification. Our approach is based on the long short-term memory (LSTM) recurrent neural network and is therefore expected to capture long-term dependencies among words. We first focus on the binary classification task. The basic model, called LSTM-TC, takes word embeddings as inputs, uses an LSTM to derive the semantic tweet representation, and applies logistic regression to predict the tweet label. The basic LSTM-TC model, like other deep learning models, requires a large amount of well-labeled training data to achieve good performance. To address this challenge, we develop an improved model, called LSTM-TC*, that incorporates a large amount of weakly labeled data for classifying tweets. Finally, we extend both models, as LSTM-Multi-TC and LSTM-Multi-TC*, to the multiclass classification task. We present two approaches to constructing the weakly labeled data: one is based on hashtag information, and the other is based on the prediction output of a traditional classifier that does not need a large amount of well-labeled training data. Our LSTM-TC* and LSTM-Multi-TC* models first learn tweet representations from the weakly labeled data and then train the classifiers on the small amount of well-labeled data. Experimental results show that (1) the proposed methods can be successfully used for tweet classification and outperform existing state-of-the-art methods, and (2) pre-training tweet representations on weakly labeled tweets significantly improves the accuracy of tweet classification.
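The hashtag-based construction of weakly labeled data described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the hashtag lists, label encoding, and function names are all assumptions. The key points it shows are that a tweet receives a weak label only when its hashtags give an unambiguous signal, and that the label-bearing hashtags are stripped from the text so the model cannot trivially read the label back out.

```python
# Illustrative sketch of hashtag-based weak labeling for tweets.
# Hashtag sets, label encoding (1/0), and function names are assumptions.
import re

# Hypothetical label-bearing hashtags, e.g. for a sentiment-style task.
POSITIVE_TAGS = {"#happy", "#love", "#great"}
NEGATIVE_TAGS = {"#sad", "#angry", "#fail"}

def weak_label(tweet: str):
    """Return a weak label (1/0) from hashtags, or None if absent/conflicting."""
    tags = {t.lower() for t in re.findall(r"#\w+", tweet)}
    pos = bool(tags & POSITIVE_TAGS)
    neg = bool(tags & NEGATIVE_TAGS)
    if pos and not neg:
        return 1
    if neg and not pos:
        return 0
    return None  # no hashtag signal, or conflicting signals

def strip_label_tags(tweet: str) -> str:
    """Remove hashtags so the label is not visible in the input text."""
    return re.sub(r"#\w+", "", tweet).strip()

tweets = [
    "What a sunny day #happy",
    "Missed my train again #fail",
    "Just a plain tweet",
]
weak = [(strip_label_tags(t), weak_label(t)) for t in tweets]
# Only tweets with an unambiguous hashtag signal enter pre-training.
pretraining_set = [(text, y) for text, y in weak if y is not None]
```

Under this scheme, the (much larger) `pretraining_set` would be used to pre-train the tweet representation, after which the classifier is fine-tuned on the small well-labeled set.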


Metadata
Title
Incorporating pre-training in long short-term memory networks for tweet classification
Authors
Shuhan Yuan
Xintao Wu
Yang Xiang
Publication date
01.12.2018
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2018
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-018-0530-1
