2018 | OriginalPaper | Chapter

Anomaly Detection for Short Texts: Identifying Whether Your Chatbot Should Switch from Goal-Oriented Conversation to Chit-Chatting

Authors: Amir Bakarov, Vasiliy Yadrintsev, Ilya Sochenkov

Published in: Digital Transformation and Global Society

Publisher: Springer International Publishing

Abstract

Goal-oriented conversational agents are systems that converse with humans in natural language to help them reach a certain goal. The number of goals (or domains) an agent can converse about is limited, so one open issue is identifying when a user is talking about an unknown domain, in order to report a misunderstanding or switch to chit-chatting mode. We argue that this issue can be addressed by treating it as an anomaly detection task, a well-studied problem in machine learning. The scientific community has developed a broad range of methods for this task, but their applicability to short text data has not been investigated before. The aim of this work is to compare the performance of 6 different anomaly detection methods on Russian and English short texts that model conversational utterances, and to propose the first evaluation framework for this task. The study finds that a simple threshold on cosine similarity works better than the other methods for both of the considered languages.
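To make the idea of a cosine-similarity threshold concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say how utterances are vectorized or what the similarity is measured against. The sketch assumes each utterance is already a dense vector and compares it with the centroid of known in-domain vectors. The function names, the toy 3-dimensional vectors, and the threshold of 0.5 are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def is_out_of_domain(utterance_vec, in_domain_vecs, threshold=0.5):
    """Flag an utterance as anomalous when its similarity to the in-domain
    centroid falls below the threshold (the value 0.5 is an assumption)."""
    return cosine_similarity(utterance_vec, centroid(in_domain_vecs)) < threshold


# Toy vectors standing in for utterance embeddings (hypothetical data).
in_domain = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.0, 0.0, 0.1]]

print(is_out_of_domain([0.95, 0.1, 0.05], in_domain))  # close to the centroid
print(is_out_of_domain([0.0, 0.1, 1.0], in_domain))    # points in a different direction
```

In a real system the threshold would be tuned on held-out data, and the vectors would come from whatever text representation is in use.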

Footnotes
1
Actually, the collection of Reddit posts is based on an already crawled corpus available at https://github.com/linanqiu/reddit-dataset.
 
Metadata
Title
Anomaly Detection for Short Texts: Identifying Whether Your Chatbot Should Switch from Goal-Oriented Conversation to Chit-Chatting
Authors
Amir Bakarov
Vasiliy Yadrintsev
Ilya Sochenkov
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-02846-6_23
