2018 | OriginalPaper | Chapter

Anomaly Detection for Short Texts: Identifying Whether Your Chatbot Should Switch from Goal-Oriented Conversation to Chit-Chatting

Authors : Amir Bakarov, Vasiliy Yadrintsev, Ilya Sochenkov

Published in: Digital Transformation and Global Society

Publisher: Springer International Publishing

Abstract

Goal-oriented conversational agents are systems able to converse with humans in natural language to help them reach a certain goal. The number of goals (or domains) about which an agent can converse is limited, and one of the problems is identifying whether a user is talking about an unknown domain (so that the agent can report a misunderstanding or switch to chit-chat mode). We argue that this problem can be resolved by treating it as an anomaly detection task, a task from the field of machine learning. The scientific community has developed a broad range of methods for this task, but their applicability to short text data has not been investigated before. The aim of this work is to compare the performance of 6 different anomaly detection methods on Russian and English short texts modeling conversational utterances, proposing the first evaluation framework for this task. We find that a simple threshold on cosine similarity works better than the other methods for both of the considered languages.
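To make the idea of a cosine similarity threshold concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how utterances are vectorized or what they are compared against, so the sketch assumes each utterance is already a dense vector and is compared with the centroid of known in-domain vectors. The function names, the toy vectors, and the threshold value of 0.5 are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def is_out_of_domain(vec, domain_centroid, threshold=0.5):
    """Flag an utterance as anomalous when it is too dissimilar to the domain.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    return cosine(vec, domain_centroid) < threshold


# Toy 3-dimensional "utterance vectors" (assumed, for illustration only).
in_domain = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
center = centroid(in_domain)

print(is_out_of_domain([0.85, 0.15, 0.05], center))  # close to the domain
print(is_out_of_domain([0.0, 0.1, 1.0], center))     # far from the domain
```

In a chatbot, a `True` result would be the signal to report a misunderstanding or hand the turn over to a chit-chat component.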

Actually, the collection of Reddit posts is based on an already crawled corpus available at https://github.com/linanqiu/reddit-dataset.

https://github.com/bakarov/conversational-anomaly.

Title: Anomaly Detection for Short Texts: Identifying Whether Your Chatbot Should Switch from Goal-Oriented Conversation to Chit-Chatting
Authors: Amir Bakarov, Vasiliy Yadrintsev, Ilya Sochenkov
Publisher: Springer International Publishing
Book: Digital Transformation and Global Society
Print ISBN: 978-3-030-02845-9

Electronic ISBN: 978-3-030-02846-6

Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-02846-6_23
