Cognitive Computation

Published: 23-03-2020

A Deep Multi-task Model for Dialogue Act Classification, Intent Detection and Slot Filling

Authors: Mauajama Firdaus, Hitesh Golchha, Asif Ekbal, Pushpak Bhattacharyya

Published in: Cognitive Computation | Issue 3/2021

Abstract

An essential component of any dialogue system is language understanding, known as spoken language understanding (SLU). Dialogue act classification (DAC), intent detection (ID) and slot filling (SF) are significant aspects of every dialogue system. In this paper, we propose a deep learning-based multi-task model that performs the DAC, ID and SF tasks jointly. We use a deep bi-directional recurrent neural network (RNN) with long short-term memory (LSTM) and gated recurrent unit (GRU) cells as the frameworks in our multi-task model. We apply attention to the LSTM/GRU output for DAC and ID, and feed the attention outputs to individual task-specific dense layers. For SF, the LSTM/GRU output is fed to a softmax layer. Experiments on three datasets, namely ATIS, TRAINS and FRAMES, show that the proposed multi-task model performs better than the individual models as well as all the pipeline models. The experimental results indicate that our attention-based multi-task model outperforms state-of-the-art approaches on the SLU tasks. For DAC, we achieve an improvement of more than 2% over the individual model on all the datasets. Similarly, for ID, we get an improvement of 1% on the ATIS dataset and a significant improvement of more than 3% on the TRAINS and FRAMES datasets compared to individual models. For SF, we get a 0.8% improvement on ATIS and a 4% improvement on TRAINS and FRAMES with respect to individual models. We validate the obtained results using statistical significance t-tests.
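The output wiring described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes the bi-directional LSTM/GRU has already produced one hidden vector per token (random stand-ins here), and it assumes dot-product attention with one learned query per task, a single dense layer per head, and small made-up label counts. All function names, parameter names and sizes are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """Affine layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def attention_pool(states, query):
    """Score each time step against a query, then take the weighted sum."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in states]
    alphas = softmax(scores)
    dim = len(states[0])
    context = [sum(a * state[i] for a, state in zip(alphas, states))
               for i in range(dim)]
    return context, alphas


def multitask_heads(states, params):
    """Map encoder states to DAC, ID and per-token SF distributions."""
    # Utterance-level tasks: attention over all time steps, then a
    # task-specific dense layer (separate attention per task is an assumption).
    dac_ctx, _ = attention_pool(states, params["dac_query"])
    id_ctx, _ = attention_pool(states, params["id_query"])
    dac_probs = softmax(dense(dac_ctx, params["dac_w"], params["dac_b"]))
    id_probs = softmax(dense(id_ctx, params["id_w"], params["id_b"]))
    # Token-level task: every encoder state gets its own softmax over slots.
    slot_probs = [softmax(dense(h, params["sf_w"], params["sf_b"]))
                  for h in states]
    return dac_probs, id_probs, slot_probs


def random_vector(rng, size):
    return [rng.uniform(-0.5, 0.5) for _ in range(size)]


def random_matrix(rng, rows, cols):
    return [random_vector(rng, cols) for _ in range(rows)]


# Hypothetical sizes: hidden width, label counts, and utterance length.
HIDDEN, N_DAC, N_ID, N_SLOTS, T = 8, 4, 5, 6, 7
rng = random.Random(0)

# Stand-in for the bi-directional LSTM/GRU output (one vector per token).
states = random_matrix(rng, T, HIDDEN)

params = {
    "dac_query": random_vector(rng, HIDDEN),
    "id_query": random_vector(rng, HIDDEN),
    "dac_w": random_matrix(rng, N_DAC, HIDDEN), "dac_b": [0.0] * N_DAC,
    "id_w": random_matrix(rng, N_ID, HIDDEN), "id_b": [0.0] * N_ID,
    "sf_w": random_matrix(rng, N_SLOTS, HIDDEN), "sf_b": [0.0] * N_SLOTS,
}

dac_probs, id_probs, slot_probs = multitask_heads(states, params)
print("DAC classes:", len(dac_probs))
print("ID classes:", len(id_probs))
print("slot distributions:", len(slot_probs), "x", len(slot_probs[0]))
```

In a trained model, the three heads would share the encoder and the per-task losses would be combined during training; how the paper weights those losses is not stated in the abstract, so it is left out here.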

Title: A Deep Multi-task Model for Dialogue Act Classification, Intent Detection and Slot Filling
Authors: Mauajama Firdaus, Hitesh Golchha, Asif Ekbal, Pushpak Bhattacharyya
Publication date: 23-03-2020
Publisher: Springer US
Published in: Cognitive Computation / Issue 3/2021
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-020-09718-4
