PARADISE: a framework for evaluating spoken dialogue agents

Published: 07 July 1997

ABSTRACT

This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.

Published in

ACL '98/EACL '98: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
July 1997
543 pages

Publisher

Association for Computational Linguistics
United States

Acceptance Rates

Overall Acceptance Rate: 85 of 443 submissions, 19%
