2012 | OriginalPaper | Book chapter

7. Metrics and Evaluation of Spoken Dialogue Systems

Author: Helen Hastie

Published in: Data-Driven Methods for Adaptive Spoken Dialogue Systems

Publisher: Springer New York

Abstract

The ultimate goal of an evaluation framework is to determine a dialogue system’s performance, which can be defined as “the ability of a system to provide the function it has been designed for” [32]. Also important, particularly for industrial systems, is dialogue quality or usability. To measure usability, one can use subjective measures such as User Satisfaction or likelihood of future use. These subjective metrics are difficult to measure and depend on the context and on the individual user, whose goals and values may differ from those of other users. This chapter surveys evaluation frameworks and discusses their advantages and disadvantages. We examine metrics for evaluating system performance and dialogue quality. We also discuss evaluation techniques that can automatically detect problems in the dialogue, thus filtering out good dialogues and leaving poor dialogues for further evaluation and investigation [62].

http://www.classic-project.org

http://www.parlance-project.eu

1. Ai, H., Litman, D.: Assessing dialog system user simulation evaluation measures using human judges. In: Proceedings of ACL, Columbus, Ohio (USA), pp. 622–629 (2008)

2. Araki, M., Doshita, S.: Automatic evaluation environment for spoken dialogue systems. In: ECAI Workshop on Dialogue Processing in Spoken Language Systems ’96, pp. 183–194 (1996)

3. Balentine, B., Morgan, D.P.: How to Build a Speech Recognition Application: A Style Guide for Telephony Dialogues. Enterprise Integration Group (2002)

4. Black, A.W., Burger, S., Conkie, A., Hastie, H., Keizer, S., Lemon, O., Merigaud, N., Parent, G., Schubiner, G., Thomson, B., Williams, J.D., Yu, K., Young, S., Eskenazi, M.: Spoken dialog challenge 2010: Comparison of live and control test results. In: Proceedings of SIGdial (2011)

5. Bonneau-Maynard, H., Devillers, L., Rosse, S.: Predictive performance of dialog systems. In: Proceedings of the Language Resources and Evaluation Conference (LREC) (2000)

6. Cohen, M.H., Giangola, J.P., Balogh, J.: Voice User Interface Design. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA (2004)

7. Cuayáhuitl, H., Renals, S., Lemon, O., Shimodaira, H.: Human-computer dialogue simulation using hidden Markov models. In: Proceedings of ASRU, pp. 290–295 (2005)

8. Danieli, M., Gerbino, E.: Metrics for evaluating dialogue strategies in a spoken language system. CoRR (1996)

9. Devillers, L., Bonneau-Maynard, H.: Evaluation of dialog strategies for a tourist information retrieval system. In: Proceedings of ICSLP, pp. 1187–1190 (1998)

10. Eckert, W., Levin, E., Pieraccini, R.: User modelling for spoken dialogue system evaluation. In: Proceedings of ASRU, pp. 80–87 (1997)

11. Engelbrecht, K.P., Gödde, F., Hartard, F., Ketabdar, H., Möller, S.: Modeling user satisfaction with hidden Markov model. In: Proceedings of SIGdial (2009)

12. Engelbrecht, K.P., Quade, M., Möller, S.: Analysis of a new simulation approach to dialog system evaluation. Speech Commun. 51, 1234–1252 (2009)

13. Frostad, K.: Best practices in designing speech interfaces (2004). http://msdn.microsoft.com/en-us/library/ms994646.aspx

14. Georgila, K., Henderson, J., Lemon, O.: User Simulation for Spoken Dialogue Systems: Learning and Evaluation. In: Proceedings of Interspeech (2006)

15. Gorin, A.L., Riccardi, G., Wright, J.H.: How may I help you? Speech Commun. 23, 113–127 (1997)

16. Grice, H.P.: Logic and conversation. Syntax Semant. 3 (Speech Acts), 41–58 (1975)

17. Hartikainen, M., Salonen, E.P., Turunen, M.: Subjective evaluation of spoken dialogue systems using SERVQUAL method. In: Proceedings of Interspeech (2004)

18. Henderson, J., Lemon, O., Georgila, K.: Hybrid reinforcement/supervised learning for dialogue policies from Communicator data. In: Proceedings of the IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems (2005)

19. Hirschman, L., Pao, C.: The cost of errors in a spoken language system. In: Proceedings of Eurospeech ’93 (1993)

20. Hone, K.S., Graham, R.: Towards a tool for the subjective assessment of speech system interfaces (SASSI). Nat. Lang. Eng. 6, 303–387 (2000)

21. ITU-T Supplement 24: Parameters describing the interaction with spoken dialogue systems. Technical report, International Telecommunication Union (2005)

22. ITU-T Rec. P.851: Subjective quality evaluation of telephone services based on spoken dialogue systems. Technical report, International Telecommunication Union (2003)

23. Janarthanam, S., Lemon, O.: Learning to adapt to unknown users: referring expression generation in spoken dialogue systems. In: Proceedings of ACL ’10 (2010)

24. Janarthanam, S., Lemon, O.: A Two-tier User Simulation Model for Reinforcement Learning of Adaptive Referring Expression Generation Policies. In: Proceedings of SIGdial (2009)

25. Kamm, C.: User Interfaces for voice applications, pp. 422–442. National Academy Press, Washington, DC, USA (1994)

26. Keeney, R.L., Raiffa, H.: Decisions with multiple objectives: Preferences and value tradeoffs. John Wiley and Sons, New York (1976)

27. Lamel, L., Rosset, S., Gauvain, J.L.: Considerations in the design and evaluation of spoken language dialog systems. In: Proceedings of ICSLP (2000)

28. Levin, E., Pieraccini, R., Eckert, W.: A stochastic model of human-machine interaction for learning dialog strategies. IEEE Trans. Speech Audio Process. 8(1), 11–23 (2000)

29. Lin, B.S., Lee, L.S.: Computer-aided analysis and design for spoken dialogue systems based on quantitative simulations. IEEE Trans. Speech Audio Process. 9(5), 534–548 (2001)

30. López-Cózar, R., Callejas, Z., McTear, M.F.: Testing the performance of spoken dialogue systems by means of an artificially simulated user. Artif. Intell. Rev. 26(4), 291–323 (2006)

31. Möller, S., Englert, R., Engelbrecht, K., Hafner, V., Jameson, A., Oulasvirta, A., Raake, E.R., Reithinger, N.: Memo: Towards automatic usability evaluation of spoken dialogue services by user error simulations (2006)

32. Möller, S.: Quality of Telephone-Based Spoken Dialogue Systems. Springer (2005)

33. Möller, S., Ward, N.G.: A framework for model-based evaluation of spoken dialog systems. In: Proceedings of SIGdial (2008)

34. Paek, T.: Empirical methods for evaluating dialog systems. In: Proceedings of the Second SIGdial Workshop on Discourse and Dialogue, Volume 16. Association for Computational Linguistics (2001)

35. Paek, T.: Toward evaluation that leads to best practices: reconciling dialog evaluation in research and industry. In: Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies, pp. 40–47. Association for Computational Linguistics (2007)

36. Pieraccini, R., Huerta, J.: Where do we go from here? Research and commercial spoken dialog systems. In: Proceedings of the 6th SIGdial Workshop on Discourse and Dialog (2005)

37. Pietquin, O.: A framework for unsupervised learning of dialogue strategies. Presses univ. de Louvain (2004)

38. Pietquin, O., Hastie, H.: A survey on metrics for the evaluation of user simulations. Knowledge Engineering Review (2013). Accepted for publication

39. Putois, G., Young, S., Henderson, J., Lemon, O., Rieser, V., Liu, X., Bretier, P., Laroche, R.: Initial communication architecture and module interface definitions. Technical report, Classic Deliverable D5.1.1 (2008)

40. Rahim, M., Fabbrizio, G.D., Kamm, C., Walker, M., Pokrovsky, A., Ruscitti, P., Levin, E., Lee, S., Syrdal, A., Schlosser, K.: Voice-if: A mixed-initiative spoken dialogue system for. In: Proceedings of Eurospeech (2001)

41. Rieser, V., Lemon, O.: Simulations for learning dialogue strategies. In: Proceedings of Interspeech, Pittsburgh (USA) (2006)

42. Rieser, V., Lemon, O.: Reinforcement Learning for Adaptive Dialogue Systems: A Data-driven Methodology for Dialogue Management and Natural Language Generation. Springer (2011)

43. Rieser, V., Lemon, O.: Automatic learning and evaluation of user-centered objective functions for dialogue system optimisation. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC) (2008)

44. Rieser, V., Lemon, O.: Learning effective multimodal dialogue strategies from Wizard-of-Oz data: bootstrapping and evaluation (2008)

45. Schatzmann, J., Georgila, K., Young, S.: Quantitative evaluation of user simulation techniques for spoken dialogue systems. In: Proceedings of SIGdial ’05 (2005)

46. Scheffler, T., Roller, R., Reithinger, N.: SpeechEval – evaluating spoken dialog systems by user simulation. In: Proceedings of the 6th IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems, Pasadena, CA, USA, pp. 93–98 (2009)

47. Schmitt, A., Schatz, B., Minker, W.: Modeling and Predicting Quality in Spoken Human-Computer Interaction. In: Proceedings of SIGdial (2011)

48. Shriberg, E., Wade, E., Price, P.: Human-machine problem solving using spoken language systems (SLS): factors affecting performance and user satisfaction. In: HLT ’91: Proceedings of the Workshop on Speech and Natural Language, pp. 49–54. Association for Computational Linguistics (1992)

49. Suendermann, D., Evanini, K., Liscombe, J., Hunter, P., Dayanidhi, K., Pieraccini, R.: From Rule-Based to Statistical Grammars: Continuous Improvement of Large-Scale Spoken Dialog Systems. In: Proceedings of the 2009 IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 19–24 (2009)

50. Suendermann, D., Liscombe, J., Pieraccini, R.: Contender. In: Proceedings of the SLT 2010 IEEE Workshop on Spoken Language Technology (2010)

51. Suendermann, D., Liscombe, J., Dayanidhi, K., Pieraccini, R.: A handsome set of metrics to measure utterance classification performance in spoken dialog systems. In: Proceedings of SIGdial, pp. 349–356 (2009)

52. Walker, M.A., Langkilde-Geary, I., Wright-Hastie, H., Wright, J., Gorin, A.: Automatically training a problematic dialogue predictor for a spoken dialogue system. J. Artif. Intell. Res. 16, 293–319 (2002)

53. Walker, M., Rudnicky, A., Aberdeen, J., Owen Bratt, E., Garofolo, J., Hastie, H., Le, A., Pellom, B., Potamianos, A., Passonneau, R., Prasad, R., Roukos, S., Greg, S., Stallard, S.D.: DARPA Communicator evaluation: Progress from 2000 to 2001. In: Proceedings of ICSLP 02, pp. 273–276 (2002)

54. Walker, M.A., Passonneau, R., Boland, J.E.: Quantitative and qualitative evaluation of DARPA Communicator spoken dialogue systems. In: Proceedings of ACL (2001)

55. Walker, M.A., Aberdeen, J., Boland, J., Bratt, E., Garofolo, J., Hirschman, L., Le, A., Lee, S., Narayanan, S., Papineni, K., Pellom, B., Polifroni, J., Potamianos, A., Prabhu, P., Rudnicky, A., Sanders, G., Seneff, S., Stallard, D., Whittaker, S.: DARPA Communicator dialog travel planning systems: The June 2000 data collection. In: Proceedings of Eurospeech (2001)

56. Walker, M.A., Rudnicky, A., Aberdeen, J., Bratt, E., Garofolo, J., Hastie, H., Le, A., Pellom, B., Potamianos, A., Passonneau, R., Prasad, R., Roukos, S., Sanders, G., Seneff, S., Stallard, D.: DARPA Communicator: Cross-system results for the 2001 evaluation. In: Proceedings of ICSLP (2002)

57. Walker, M.A., Kamm, C.A., Litman, D.J.: Towards Developing General Models of Usability with PARADISE. Nat. Lang. Eng. 6(3), 363–377 (2000)

58. Walker, M., Passonneau, R.: DATE: A dialogue act tagging scheme for evaluation. In: Proceedings of the Human Language Technology Conference (HLT) (2001)

59. Walker, M.A.: An application of reinforcement learning to dialogue strategy selection in a spoken dialogue system for email. J. Artif. Intell. Res. 12, 387–416 (2000)

60. Walker, M.A.: Can we talk? Methods for evaluation and training of spoken dialogue systems. Lang. Resour. Evaluation 39(1), 65–75 (2005)

61. Walker, M., Boland, J., Kamm, C.: The utility of elapsed time as a usability metric for spoken dialogue systems. In: Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU99) (1999)

62. Wright-Hastie, H., Prasad, R., Walker, M.: What’s the trouble: Automatically identifying problematic dialogues in. In: Proceedings of ACL, pp. 384–391 (2002)

63. Young, S., Gasic, M., Keizer, S., Mairesse, F., Schatzmann, J., Thomson, B., Yu, J.: The hidden information state model: a practical framework for POMDP-based spoken dialogue management. Computer Speech and Language 24(2), 150–174 (2010)

Title: Metrics and Evaluation of Spoken Dialogue Systems
Author: Helen Hastie
Publisher: Springer New York
Book: Data-Driven Methods for Adaptive Spoken Dialogue Systems
Print ISBN: 978-1-4614-4802-0

Electronic ISBN: 978-1-4614-4803-7

Copyright year: 2012
DOI: https://doi.org/10.1007/978-1-4614-4803-7_7
