This article is concerned with optimizing human-machine turn-taking. In particular, it presents an in-depth analysis of the timing of user responses to system queries in spoken dialog systems. The goal of this work is to obtain a broad understanding of such timing patterns, independent of dialog system type and dialog state context. The analysis was therefore based on a large volume of data from both a number of deployed spoken dialog systems and an experimental study. The data from the experimental study showed that timeout settings that are too short can cause the system to interrupt a user and thus cause turn-taking problems. Next, the response timing patterns both during a system prompt and after prompt-end were analyzed for a number of different question types. User responses that occur while the system is playing a prompt (so-called ‘barge-in’) are shown to account for roughly 10–25 % of all user responses, with the exact percentage depending on context. It was also found that the timing of user responses after the system finishes speaking always follows the same unimodal pattern, independent of system domain and question type. This pattern can be modeled with a rational distribution. Based on these findings, a probabilistic response time model is presented that allows the likelihood of a user response to be calculated at any point in time. This response timing model can be used for multiple purposes, among them the optimization of timeout settings.
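The abstract does not give the functional form or the fitted parameters of the response time model, so the following Python sketch is only an illustration of the idea, not the authors' model. It assumes a hypothetical unimodal density that is a rational function of the delay t after prompt-end, f(t) = 2 t s^2 / (s^2 + t^2)^2, whose cumulative form is t^2 / (s^2 + t^2). The names `scale`, `barge_in`, and `timeout_for_coverage` are invented for this example, and the barge-in fraction is simply treated as the share of responses that have already arrived when the prompt ends.

```python
import math


def response_pdf(t, scale):
    """Hypothetical unimodal rational density of the response delay t (seconds)
    after prompt-end. Assumed form, not taken from the paper."""
    if t <= 0:
        return 0.0
    return 2.0 * t * scale ** 2 / (scale ** 2 + t ** 2) ** 2


def response_cdf(t, scale):
    """Probability that a post-prompt response has started within t seconds."""
    if t <= 0:
        return 0.0
    return t * t / (scale * scale + t * t)


def response_probability(t, scale, barge_in=0.0):
    """Probability that the user has responded by t seconds after prompt-end,
    counting barge-in responses as already received at t = 0."""
    return barge_in + (1.0 - barge_in) * response_cdf(t, scale)


def timeout_for_coverage(coverage, scale, barge_in=0.0):
    """Smallest timeout (seconds after prompt-end) by which the given share
    of all user responses is expected to have started."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    if not 0.0 <= barge_in < 1.0:
        raise ValueError("barge_in must lie in [0, 1)")
    if coverage <= barge_in:
        return 0.0  # barge-in alone already reaches the target share
    # Share of the post-prompt responses that must be covered.
    p = (coverage - barge_in) / (1.0 - barge_in)
    # Invert the assumed cumulative form t^2 / (scale^2 + t^2) = p.
    return scale * math.sqrt(p / (1.0 - p))


if __name__ == "__main__":
    # Illustrative values only: scale and barge-in share are made up.
    timeout = timeout_for_coverage(coverage=0.95, scale=0.8, barge_in=0.2)
    print(f"timeout covering 95 % of responses: {timeout:.2f} s")
```

With a model of this kind, choosing a timeout becomes a matter of picking the share of responses the system should wait for and inverting the cumulative probability, rather than fixing a single value by hand. In practice the density and its parameters would have to be fitted to observed response timings.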
- Modeling user response timings in spoken dialog systems
- Springer US