1992 | OriginalPaper | Chapter

Linguistic Processing in a Speech Understanding System

Authors: Egidio P. Giachin, Claudio Rullent

Published in: Speech Recognition and Understanding

Publisher: Springer Berlin Heidelberg

Included in: Professional Book Archive

The goal of a speech understanding system is to correctly identify the action to be taken in response to a user's spoken request. To this end, the system has to rely on some type of linguistic knowledge beyond merely recognizing words. Several approaches have been proposed for employing language modeling in speech understanding. They include unified architectures, which integrate modular knowledge sources accounting for every level of knowledge from acoustics to linguistics, and two-level architectures, in which the separation between recognition and linguistic processing is well defined. Within the two-level approach, two main methods may be conceived: either linguistic constraints are integrated into the recognizer, which decodes a single string of words that is then handled by a natural language interface; or the recognizer produces a scored word lattice that is subsequently processed by a suitable linguistic module. For the present study, the latter method was considered the most promising, provided that a satisfactory solution to efficient word lattice parsing could be found.

Parsing a word lattice is a search activity over an extremely large space. It may be performed in two basic modes: the left-to-right mode and the score-driven middle-out mode. Optimal algorithms based on the left-to-right mode require computation that grows polynomially with the lattice length, while those based on the middle-out mode require computation that grows exponentially with length. However, it is possible to devise score-driven middle-out methods whose amount of computation depends on the average likelihood score of the word sequence they are expected to output. Hence, if these words are recognized with a good score, computation may be lower than with left-to-right methods.

This paper describes in detail an algorithm that was experimentally shown to exhibit high parsing efficiency in the task it was designed for (1000-word continuous speech understanding, restricted semantic domain, and high syntactic freedom). Improved efficiency is achieved through heuristics which, by exploiting the redundancy of the middle-out parsing approach, make it possible to prune the search without appreciably compromising the optimality of the method. Problems such as the imperfect determination of word start and end points and the absence of short function words from the lattice are also taken into account.

Experimental results, evaluated on lattices produced by the speaker-dependent version of the recognizer available in 1988, show that high-speed speech understanding is feasible with habitable language models (for a specific application) and reasonable accuracy of comprehension. Parsing time is about 1.8 seconds on a Sun 4 workstation, and correct sentence understanding is 82% for a language model of perplexity 25.
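
To make the notion of a scored word lattice concrete, the following is a minimal sketch of the simpler left-to-right mode, not the score-driven middle-out algorithm of the chapter. Everything in it is an assumption made for illustration: the `WordHyp` record, the toy lattice, the use of log-likelihood scores that are summed along a path, and a set of allowed word pairs standing in for the linguistic knowledge. Its cost grows quadratically with the number of word hypotheses, which is one instance of the polynomial growth mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHyp:
    """One scored word hypothesis in the lattice (hypothetical layout)."""
    word: str
    start: int    # first frame covered by the word
    end: int      # frame just after the word (must be > start)
    score: float  # log-likelihood; higher is better


def best_path(lattice, allowed_pairs, n_frames):
    """Left-to-right dynamic programming over a word lattice.

    Returns (words, total_score) for the best-scoring sequence of
    adjacent hypotheses that covers frames 0..n_frames and uses only
    word pairs found in allowed_pairs, or None if no such sequence exists.
    """
    # Sorting by end frame guarantees every predecessor is visited first,
    # because a predecessor ends where its successor starts.
    hyps = sorted(lattice, key=lambda h: h.end)
    best = [None] * len(hyps)  # (total score, index of predecessor)
    for i, h in enumerate(hyps):
        if h.start == 0:
            best[i] = (h.score, None)
        for j in range(i):
            p = hyps[j]
            if best[j] is None or p.end != h.start:
                continue  # unreachable predecessor, or not adjacent in time
            if (p.word, h.word) not in allowed_pairs:
                continue  # rejected by the toy linguistic constraint
            candidate = best[j][0] + h.score
            if best[i] is None or candidate > best[i][0]:
                best[i] = (candidate, j)
    finals = [i for i, h in enumerate(hyps)
              if h.end == n_frames and best[i] is not None]
    if not finals:
        return None
    i = max(finals, key=lambda k: best[k][0])
    total = best[i][0]
    words = []
    while i is not None:  # follow back-pointers to recover the words
        words.append(hyps[i].word)
        i = best[i][1]
    return words[::-1], total


# Toy lattice: the short function word "the" is poorly scored, and a longer
# "flights" hypothesis absorbs its frames.
LATTICE = [
    WordHyp("show", 0, 3, -1.0),
    WordHyp("so", 0, 2, -0.5),
    WordHyp("me", 3, 5, -1.0),
    WordHyp("the", 5, 6, -2.0),
    WordHyp("flights", 6, 10, -1.5),
    WordHyp("lights", 6, 10, -1.0),
    WordHyp("flights", 5, 10, -3.0),
]
ALLOWED = {("show", "me"), ("me", "the"), ("the", "flights"), ("me", "flights")}

if __name__ == "__main__":
    print(best_path(LATTICE, ALLOWED, 10))
```

In this toy lattice the acoustically better "lights" is never selected, because no allowed pair leads to it. A score-driven middle-out parser would instead start from the best-scored hypotheses anywhere in the lattice and grow outward in both directions, which this sketch does not attempt.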

Springer Professional