nach oben

Erschienen in:

2018 | OriginalPaper | Buchkapitel

On the Design and Tuning of Machine Learning Models for Language Toxicity Classification in Online Platforms

verfasst von : Maciej Rybinski, William Miller, Javier Del Ser, Miren Nekane Bilbao, José F. Aldana-Montes

Erschienen in: Intelligent Distributed Computing XII

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

One of the most concerning drawbacks derived from the lack of supervision in online platforms is their exploitation by misbehaving users to deliver offending (toxic) messages while remaining unknown themselves. Given the huge volumes of data handled by these platforms, the detection of toxicity in exchanged comments and messages has naturally called for the adoption of machine learning models to automate this task. In the last few years Deep Learning models and related techniques have played a major role in this regard due to their superior modeling capabilities, which have made them stand out as the prevailing choice in the related literature. By addressing a toxicity classification problem over a real dataset, this work aims at throwing light on two aspects of this noted dominance of Deep Learning models: (1) an empirical assessment of their predictive gains with respect to traditional Shallow Learning models; and (2) the impact of using different text embedding methods and data augmentation techniques in this classification task. Our findings reveal that in our case study the application of non-optimized Shallow and Deep Learning models attains very competitive accuracy scores, thus leaving a narrow improvement margin for the fine-grained refinement of the models or the addition of data augmentation techniques.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel A Genetic Algorithm with Local Search Based on Label Propagation for Detecting Dynamic Communities

Nächstes Kapitel Ensuring Availability of Wireless Mesh Networks for Crisis Management

See e.g. http://mlwave.com/kaggle-ensembling-guide/.

http://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge.

Frequent in terms of total term frequency in the entire collection of training documents.

https://www.kaggle.com/eashish/bidirectional-gru-with-convolution.

https://www.kaggle.com/chongjiujjin/capsule-net-with-gru.

https://github.com/PavelOstyakov/toxic/tree/master/tools.

Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)CrossRef

Escalante, H.J., Villatoro-Tello, E., Garza, S.E., López-Monroy, A.P., Montes-y Gómez, M., Villaseñor-Pineda, L.: Early detection of deception and aggressiveness using profile-based representations. Expert. Syst. Appl. 89, 99–111 (2017)CrossRef

Hashim, E.N., Nohuddin, P.N.: Data mining techniques for recidivism prediction: a survey paper. Adv. Sci. Lett. 24(3), 1616–1618 (2018)CrossRef

Lara-Cabrera, R., Gonzalez-Pardo, A., Camacho, D.: Statistical analysis of risk assessment factors and metrics to evaluate radicalisation in twitter. Futur. Gener. Comput. Syst. (2017). https://www.sciencedirect.com/science/article/pii/S0167739X17308348

LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)CrossRef

Miikkulainen, R., Liang, J., Meyerson, E., Rawal, A., Fink, D., Francon, O., Raju, B., Shahrzad, H., Navruzyan, A., Duffy, N., et al.: Evolving deep neural networks (2017). arXiv preprint arXiv:170300548

Nam, J., Kim, J., Mencía, E.L., Gurevych, I., Fürnkranz, J.: Large-scale multi-label text classification revisiting neural networks. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 437–452. Springer (2014)

Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y.: How to construct deep recurrent neural networks (2013). arXiv preprint arXiv:13126026

Pennington, J., Socher, R., Manning, C.: Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

10.

Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in Neural Information Processing Systems, pp. 3859–3869 (2017)

11.

Severyn, A., Moschitti, A.: Twitter sentiment analysis with deep convolutional neural networks. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 959–962. ACM (2015)

12.

Villar-Rodriguez, E., Del Ser, J., Bilbao, M.N., Salcedo-Sanz, S.: A feature selection method for author identification in interactive communications based on supervised learning and language typicality. Eng. Appl. Artif. Intell. 56, 175–184 (2016)CrossRef

13.

Villar-Rodríguez, E., Del Ser, J., Torre-Bastida, A.I., Bilbao, M.N., Salcedo-Sanz, S.: A novel machine learning approach to the detection of identity theft in social networks based on emulated attack instances and support vector machines. Concurr. Comput. Pract. Exp. 28(4), 1385–1395 (2016)CrossRef

14.

Zhang, Y., Wallace, B.: A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification (2015). arXiv preprint arXiv:151003820

15.

Zhou, P., Qi, Z., Zheng, S., Xu, J., Bao, H., Xu, B.: Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling (2016). arXiv preprint arXiv:161106639

Titel: On the Design and Tuning of Machine Learning Models for Language Toxicity Classification in Online Platforms
verfasst von: Maciej Rybinski
William Miller
Javier Del Ser
Miren Nekane Bilbao
José F. Aldana-Montes
Verlag: Springer International Publishing
Buch: Intelligent Distributed Computing XII
Print ISBN: 978-3-319-99625-7

Electronic ISBN: 978-3-319-99626-4

Copyright-Jahr: 2018
DOI: https://doi.org/10.1007/978-3-319-99626-4_29

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"