Adaptation of an automotive dialogue system to users’ expertise and evaluation of the system

  • Original Paper
  • Language Resources and Evaluation

Abstract

Spoken dialogue systems (SDSs) can be used to operate devices, e.g. in the automotive environment. People using these systems usually have different levels of experience. However, most systems do not take this into account. In this paper, we present a method to build a dialogue system in an automotive environment that automatically adapts to the user’s experience with the system. We implemented the adaptation in a prototype and carried out exhaustive tests. Our usability tests show that adaptation increases both user performance and user satisfaction. We describe the tests that were performed, and the methods used to assess the test results. One of these methods is a modification of PARADISE, a framework for evaluating the performance of SDSs [Walker MA, Litman DJ, Kamm CA, Abella A (Comput Speech Lang 12(3):317–347, 1998)]. We discuss its drawbacks for the evaluation of SDSs like ours, the modifications we have carried out, and the test results.
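
For reference, the performance function at the core of the original PARADISE framework (Walker et al., 1998), which the evaluation described here modifies, predicts user satisfaction from a task-success measure and a set of dialogue cost measures:

$$\mathrm{Performance} = \alpha \cdot \mathcal{N}(\kappa) - \sum_{i=1}^{n} w_i \cdot \mathcal{N}(c_i)$$

where κ measures task success over the attribute value matrix (AVM), the cᵢ are cost measures such as the number of turns or the elapsed time, 𝒩 is a z-score normalisation, and the weights α and wᵢ are estimated by multiple linear regression against user satisfaction ratings.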

Notes

  1. For a detailed exposition of the calculation, see Hassel (2006).

  2. Prototype: more meta-commands were available, e.g. “back”, “suggestion”, etc.; the vocabulary was changed according to observed user expectations (Hassel, 2006; Hassel & Hagen, 2005); and the prompts were adapted to the user’s experience.

  3. κ is usually used to rate pairwise agreement among coders making category judgments, correcting for chance expected agreement (Siegel & Castellan, 1988).

  4. The z-score gives the relative position of a data value by indicating how many standard deviations it lies from the mean. A common rule of thumb is that any value with a z-score below −3 or above +3 should be considered an outlier. z-scores make it easy to check whether there are outliers that would distort the comparison (Rasch, Friese, Hofmann, & Naumann, 2004); a computational sketch follows these notes.

  5. In a directed graph, every edge is an ordered pair of nodes, i.e. it has a specified direction. In a connected graph, for every pair of nodes there exists a sequence of edges starting at one node and ending at the other.

  6. P(A) is the proportion of times the values in the dialogue AVM are correct, i.e. the values on the main diagonal of the confusion matrix; P(E) is the proportion of agreement expected to occur by chance (Carletta, 1996). See the sketch following these notes.

  7. Interruptions due to traffic conditions were documented during the test, and the measured times were then corrected accordingly.

  8. The correlation coefficient indicates the strength of the linear association between two variables: a coefficient of 1 means that the variables are perfectly (positively) linearly related, and a coefficient of 0 means that there is no linear relationship between them.

  9. For some questions there were differences between the answers of female and male subjects; the female test subjects seemed to be somewhat more critical of the system than the male subjects. Two examples: only ca. 14% of the women, compared with ca. 40% of the men, found the voice interface to be a very useful feature; and ca. 29% of the women, but only ca. 7% of the men, found the voice interface not useful at all. Despite these and other differences, gender had no significant effect on the performance function.

  10. Variance measures the spread of the observed values around their mean, i.e. the average squared deviation from the mean (Rasch et al., 2004).

  11. R² is the coefficient of determination, R² ∈ [0, 1], which measures the goodness of fit of the calculated linear function. Values closer to 1 indicate a better fit; values closer to 0 imply that there is no linear relationship between the dependent and independent variables. R² = 0.50 means that ca. 50% of the variance in the data is explained by the calculated function (Bühner, 2004; Rasch et al., 2004).

  12. 77% of the prototype test subjects declared that options should be prompted after every system utterance, at least at the beginning, whereas only 27% of the reference-system test subjects agreed.
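
Notes 3, 4, and 6 above describe the κ statistic and the z-score normalisation used in the evaluation. The following is a minimal illustrative sketch (not code from the paper; the function names and the NumPy dependency are our own assumptions) of how these quantities are typically computed from an AVM confusion matrix:

```python
import numpy as np

def kappa_from_avm(confusion: np.ndarray) -> float:
    """Carletta-style kappa for an attribute-value confusion matrix.

    confusion[i, j] counts how often key value i was realised as value j
    in the dialogues; the main diagonal holds the correct cases.
    """
    total = confusion.sum()
    p_a = np.trace(confusion) / total           # P(A): proportion of correct values
    marginals = confusion.sum(axis=0) / total   # relative frequency of each value
    p_e = np.sum(marginals ** 2)                # P(E): chance-expected agreement
    return (p_a - p_e) / (1.0 - p_e)

def z_scores(values: np.ndarray) -> np.ndarray:
    """Standardise a cost or quality measure; |z| > 3 flags likely outliers (cf. note 4)."""
    return (values - values.mean()) / values.std(ddof=1)

# Hypothetical example: two attribute values observed over 50 dialogues
avm = np.array([[20, 3],
                [2, 25]])
print(round(kappa_from_avm(avm), 3))  # ≈ 0.797, i.e. high agreement
```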

Abbreviations

SDS: Spoken dialogue system

ASR: Automatic speech recognition

GUI: Graphical user interface

PTT: Push to talk

AVM: Attribute value matrix

OOV: Out of vocabulary

US: User satisfaction

References

  • Aguilera, E. J. G., Bernsen, N. O., Bescós, S. R., Dybkjær, L., Fanard, F.-X., Hernandez, P. C., Macq, B., Martin, O., Nikolakis, G., de la Orden, P. L., Paternò, F., Santoro, C., Trevisan, D., Tzovaras, D., & Vanderdonckt, J. (2004). Usability evaluation issues in natural interactive and multimodal systems— State of the art and current practice (draft version). Technical report, NISLab, University of Southern Denmark. Project SIMILAR SIG7 on Usability and Evaluation, Deliverable D16.

  • Akyol, S., Libuda, L., & Kraiss, K.-F. (2001). Multimodale Benutzung adaptiver Kfz-Bordsysteme. In T. Jürgensohn & K.-P. Timpe (Eds.), Kraftfahrzeugführung (pp. 137–154). Berlin: Springer-Verlag.

  • Allen, J. F., & Core, M. G. (1997). Draft of DAMSL: Dialog Act Markup in Several Layers. http://www.cs.rochester.edu/research/cisd/resources/damsl.

  • Beringer, N., Kartal, U., Louka, K., Schiel, F., & Türk, U. (2002). PROMISE—A procedure for multimodal interactive system evaluation. Technical report, LMU München, Institut für Phonetik und sprachliche Kommunikation. Teilprojekt 1: Modalitätsspezifische Analysatoren, Report Nr. 23.

  • Bernsen, N. O., & Dybkjær, L. (2001). Exploring natural interaction in the car. In International workshop on information presentation and natural multimodal dialogue, Verona, Italy, pp. 75–79.

  • Bühner, M. (2004). Einführung in die Test- und Fragebogenkonstruktion. München: Pearson Studium.

  • Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2), 249–254.

  • Clark, H. H. (1997). Using language. Cambridge, New York, Melbourne: Cambridge University Press.

  • Cnossen, F., Meijman, T., & Rothengatter, T. (2004). Adaptive strategy changes as a function of task demands: A study of car drivers. Ergonomics, 47(2), 218–236.

  • Core, M. G., & Allen, J. F. (1997). Coding dialogs with the DAMSL annotation scheme. In AAAI Fall 1997 symposium on communicative action in humans and machines, American Association for Artificial Intelligence (AAAI) (pp. 28–35). URL: http://www.citeseer.nj.nec.com/core97coding.htm.

  • DIN EN ISO 9241-10 (1996). Ergonomische Anforderungen für Bürotätigkeiten mit Bildschirmgeräten, Teil 10: Grundsätze der Dialoggestaltung. DIN EN ISO 9241-10.

  • Edelmann, W. (1996). Lernpsychologie (5th ed.). Weinheim: Psychologie Verlagsunion.

  • Hagen, E., Said, T., & Eckert, J. (2004). Spracheingabe im neuen BMW 6er. Sonderheft ATZ/MTZ (Der neue BMW 6er), May, pp. 134–139.

  • Haller, R. (2003). The display and control concept iDrive—Quick access to all driving and comfort functions. ATZ/MTZ Extra (The New BMW 5-Series), August, pp. 51–53.

  • Hassel, L. (2006). Adaption eines Sprachbediensystems im Automobilbereich an den Erfahrungsgrad des Anwenders und Evaluation von Konzepten zur Verbesserung der Bedienbarkeit des Sprachsystems. PhD thesis, Ludwig Maximilian Universität, Abschlussarbeit für das Aufbaustudium Computerlinguistik.

  • Hassel, L., & Hagen, E. (2005). Evaluation of a dialogue system in an automotive environment. In Proceedings of the 6th SIGdial workshop on discourse and dialogue, Lisbon, Portugal, 2–3 September 2005, pp. 155–165.

  • Heisterkamp, P. (2001). Linguatronic—Product-level speech system for Mercedes-Benz cars. In Proceedings of the 1st international conference on human language technology research (HLT), San Diego, CA, USA.

  • Hjalmarsson, A. (2002). Evaluating AdApt, a multi-modal conversational, dialogue system using PARADISE. Master’s thesis, Department of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden.

  • Hof, A. (2007). Entwicklung eines adaptiven Hilfesystems für multimodale Anzeige-Bedienkonzepte im Fahrzeug. PhD thesis, Universität Regensburg, Philosophische Fakultät IV (Sprach- und Literaturwissenschaften), to appear 2007.

  • Jokinen, K., Kanto, K., Kerminen, A., & Rissanen, J. (2004). Evaluation of adaptivity and user expertise in a speech-based e-mail system. In B. Gambäck, & K. Jokinen (Eds.), Proceedings of the 20th international conference on computational linguistics (ACL): “Robust and adaptive information processing for mobile speech interfaces: DUMAS final workshop”, Geneva, Switzerland, pp. 44–52.

  • Landauer, T. K. (1997). Behavioral research methods in human–computer interaction. In M. A. Helander, T. K. Landauer, & P. V. Prabhu (Eds.), Handbook of human–computer interaction (2nd ed., pp. 203–227). Amsterdam, Lausanne, New York: North-Holland.

  • Larsen, L. B. (2003a). Evaluation methodologies for spoken and multi modal dialogue systems—Revision 2. May 2003 (draft version). Presented at the COST 278 MC-Meeting in Stockholm, Sweden.

  • Larsen, L. B. (2003b). Issues on the evaluation of spoken dialogue systems using objective and subjective measures. In Proceedings of the 8th IEEE workshop on automatic speech recognition and understanding (ASRU), St. Thomas, U.S. Virgin Islands, pp. 209–214.

  • Libuda, L. (2001). Improving clarification dialogs in speech command systems with the help of user modeling: A conceptualization for an in-car user interface. In Online-Proceedings des 9. GI-Workshops: ABIS-Adaptivität und Benutzermodellierung in interaktiven Softwaresystemen. GI-Fachgruppe: Adaptivität und Benutzermodellierung in Interaktiven Softwaresystemen (ABIS).

  • Mourant, R. R., Tsai, F.-J., Al-Shihabi, T., & Jaeger, B. K. (2001). Divided attention ability of young and older drivers. In Proceedings of the 80th annual meeting of the transportation research board. Available online at http://www.nrd.nhtsa.dot.gov/departments/nrd-13/driver-distraction/PDF/9.PD.

  • Nielsen, J. (1993). Usability Engineering. Boston, USA: Academic Press Professional.

  • NIST (2001). Common industry format for usability test reports. Technical report, National Institute of Standards and Technology. Version 2.0, 18 May 2001.

  • Paek, T. (2001). Empirical methods for evaluating dialog systems. In ACL 2001 workshop on evaluation methodologies for language and dialogue systems, Toulouse, France, pp. 1–9.

  • Piechulla, W., Mayser, C., Gehrke, H., & König, W. (2003). Reducing drivers’ mental workload by means of an adaptive man–machine interface. Transportation Research Part F: Traffic Psychology and Behaviour, 6(4), 233–248.

  • Rasch, B., Friese, M., Hofmann, W., & Naumann, E. (2004). Quantitative Methoden - Band 1. Berlin, Heidelberg: Springer-Verlag.

  • Rich, E. (1979). User modeling via stereotypes. Cognitive Science, 3, 329–354.

  • Rogers, S., Fiechter, C.-N., & Thompson, C. (2000). Adaptive user interfaces for automotive environments. In Proceedings of the IEEE intelligent vehicles (IV) symposium, Detroit, USA, pp. 662–667.

  • Schütz, W., & Schäfer, R. (2002). Towards more realistic modelling of a user’s evaluation process. In ABIS-workshop 2002: Personalization for the mobile world, 9th–11th October 2002, during a week of workshops “LLA02: Learning–teaching–adaptivity” (pp. 91–98). Hannover, Germany: Learning Lab Lower Saxony (L3S).

  • Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences. Singapore: McGraw-Hill International.

  • Walker, M. A., Litman, D. J., Kamm, C. A., & Abella, A. (1998). Evaluating spoken dialogue agents with PARADISE: Two case studies. Computer Speech and Language, 12(3), 317–347.

  • Whittaker, S., Terveen, L., & Nardi, B. A. (2000). Let’s stop pushing the envelope and start addressing it: A reference task agenda for HCI. Human Computer Interaction, 15, 75–106.

  • Wu, J. (2000). Accommodating both experts and novices in one interface. Universal Usability Guide. Department of Computer Science, University of Maryland, http://www.otal.umd.edu/UUGuide.

Acknowledgements

We thank Professor Klaus Schulz (LMU, Munich) for helpful discussions clarifying our ideas and for comments on earlier drafts. We would also like to express our gratitude to Stefan Pöhn (Berner & Mattner) for the programming and for helping to make our often chaotic ideas concrete. Thanks to Alexander Huber (BMW AG) for his continuing encouraging support. We are also indebted to the anonymous reviewers for their careful reading and helpful comments. And, last but not least, we thank Laura Ramirez-Polo for amending the drafts of this article.

Author information

Correspondence to Liza Hassel.

Cite this article

Hassel, L., Hagen, E. Adaptation of an automotive dialogue system to users’ expertise and evaluation of the system. Lang Resources & Evaluation 40, 67–85 (2006). https://doi.org/10.1007/s10579-006-9009-1
