PARADISE: a framework for evaluating spoken dialogue agents

Authors:
Marilyn A. Walker, Diane J. Litman, Candace A. Kamm, Alicia Abella

AT&T Labs-Research, Florham Park, NJ

ACL '98/EACL '98: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics

July 1997, pages 271–280
https://doi.org/10.3115/976909.979652

Published: 7 July 1997

ABSTRACT

This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.
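
As a rough illustration of what "specifies the relative contribution of various factors to performance" and "normalizing for task complexity" can look like in practice, the sketch below combines a task-success measure (a Kappa coefficient computed from a confusion matrix) with dialogue cost measures in a weighted sum of z-score-normalized values. The abstract itself does not give a formula, so everything here is an assumption made for illustration: the function names (`kappa`, `z_scores`, `performance`), the chance-agreement estimate from column totals, the use of the population standard deviation, and the weights `alpha` and `weights`, which would normally be estimated from data (for example by regression against user satisfaction) rather than set by hand.

```python
from statistics import mean, pstdev


def kappa(confusion):
    """Kappa coefficient for a square confusion matrix (rows: key, columns: observed).

    Chance agreement is estimated from column totals; this choice is an
    assumption of the sketch, not something stated in the abstract.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_agree = sum(confusion[i][i] for i in range(size)) / total
    col_totals = [sum(row[j] for row in confusion) for j in range(size)]
    p_chance = sum((t / total) ** 2 for t in col_totals)
    return (p_agree - p_chance) / (1 - p_chance)


def z_scores(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no spread to normalize, so return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def performance(kappas, costs, alpha, weights):
    """Per-dialogue score: alpha * N(kappa) minus the weighted sum of N(cost_i).

    kappas  : one task-success value per dialogue
    costs   : one list per cost measure, each with one value per dialogue
    alpha   : weight on task success (hypothetical value supplied by the caller)
    weights : one weight per cost measure (hypothetical values)
    """
    n_kappa = z_scores(kappas)
    n_costs = [z_scores(c) for c in costs]
    return [
        alpha * n_kappa[d] - sum(w * nc[d] for w, nc in zip(weights, n_costs))
        for d in range(len(kappas))
    ]


if __name__ == "__main__":
    # Three hypothetical dialogues: task success and number of utterances.
    scores = performance(
        kappas=[0.2, 0.5, 0.8],
        costs=[[30, 20, 10]],
        alpha=1.0,
        weights=[1.0],
    )
    print(scores)
```

Because both success and costs are normalized before weighting, the weights are comparable across measures with different units, and a Kappa-based success measure discounts agreement that would be expected by chance on a given task.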



Published in ACL '98/EACL '98, July 1997, 543 pages
Program Chairs: Philip R. Cohen (Oregon Graduate Institute), Wolfgang Wahlster (DFKI Saarbrücken, Germany)
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 85 of 443 submissions, 19%