2010 | OriginalPaper | Chapter
Expansion of WFST-Based Dialog Management for Handling Multiple ASR Hypotheses
Authors: Naoto Kimura, Chiori Hori, Teruhisa Misu, Kiyonori Ohtake, Hisashi Kawai, Satoshi Nakamura
Published in: Spoken Dialogue Systems for Ambient Environments
Publisher: Springer Berlin Heidelberg
We proposed a weighted finite-state transducer-based dialog manager (WFSTDM), a platform for expandable and adaptable dialog systems. In this platform, all rules and/or models for dialog management (DM) are expressed in WFST form, and the WFSTs are used to accomplish various tasks via multiple modalities. With this framework, we constructed a statistical dialog system using user concept tags and system action tags, acquired from an annotated corpus of human-to-human spoken dialogs, as the input and output labels of the WFST. We introduced a spoken language understanding (SLU) WFST for converting user utterances to user concept tags, a dialog scenario WFST for converting user concept tags to system action tags, and a sentence generation (SG) WFST for converting system action tags to system utterances. The tag sequence probabilities of the dialog scenario WFST were estimated from a spoken dialog corpus for hotel reservation. The SLU, scenario and SG WFSTs were then composed into a dialog management WFST, which determines the next action of the system in response to the user input. In our previous research, we evaluated the dialog strategy using manual transcriptions as input. In this paper, we present the performance of WFSTDM when speech recognition hypotheses are input. To alleviate the degradation of DM performance caused by speech recognition errors, we expand the WFSTDM to handle multiple speech recognition hypotheses and confidence scores, which indicate the acoustic and linguistic reliability of speech recognition. We also evaluated the accuracy of the SLU results and the correctness of the system actions selected by the dialog management WFST. We confirmed that the performance of dialog management was enhanced by choosing the optimal action among all the WFST paths for multiple hypotheses (N-best) of speech recognition while taking the confidence scores into account.
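The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two dictionaries stand in for the SLU and dialog scenario WFSTs, every tag name, probability and confidence value is invented for illustration, and the way the confidence score is combined with the path weight (adding negative log values and keeping the single cheapest path) is an assumption, since the abstract does not specify it.

```python
import math

# Hypothetical toy SLU model: recognized text -> {user concept tag: probability}.
SLU = {
    "i want a room": {"request_room": 0.9, "ask_price": 0.1},
    "i want a broom": {"other": 0.5, "request_room": 0.5},
}

# Hypothetical toy scenario model: concept tag -> {system action tag: probability}.
SCENARIO = {
    "request_room": {"ask_dates": 0.7, "confirm": 0.3},
    "ask_price": {"inform_price": 1.0},
    "other": {"ask_repeat": 1.0},
}


def cost(p):
    """Negative log probability, used as an additive path weight."""
    return -math.log(p)


def best_action(nbest):
    """Return (cost, hypothesis, concept, action) for the cheapest path.

    nbest is a list of (hypothesis_text, confidence) pairs with
    confidence in (0, 1]. Returns None if no path exists.
    """
    paths = []
    for text, conf in nbest:
        for concept, p_concept in SLU.get(text, {}).items():
            for action, p_action in SCENARIO.get(concept, {}).items():
                total = cost(conf) + cost(p_concept) + cost(p_action)
                paths.append((total, text, concept, action))
    return min(paths) if paths else None


# Assumed 2-best list: the top hypothesis contains a recognition error.
nbest = [("i want a broom", 0.45), ("i want a room", 0.40)]

one_best = best_action(nbest[:1])
n_best = best_action(nbest)
print("1-best only:", one_best[3])
print("N-best:", n_best[3])
```

In this toy setting, searching only the top hypothesis selects `ask_repeat`, while searching both hypotheses selects `ask_dates`, because the second hypothesis has a cheaper overall path despite its lower confidence. A real system would express the same search as composition and shortest-path operations over WFSTs rather than nested dictionary lookups.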