Pattern Analysis and Applications

1 August 2015 | Theoretical Advances

Integration of complex language models in ASR and LU systems

Authors: Raquel Justo, M. Inés Torres

Published in: Pattern Analysis and Applications | Issue 3/2015

Abstract

In this work, we explore different methods for integrating a complex language model (LM), specifically a hierarchical LM based on classes of phrases, into an automatic speech recognition (ASR) system. First, we consider an integrated architecture, in which the integration is carried out by composing the different stochastic finite-state automata associated with the LM. Second, we employ a decoupled architecture with a two-pass decoder, in which the complex LM is used to reorder the N-best list. We provide a formal definition of both methods, which enables a theoretical comparison between them, and we report experiments that compare the two approaches empirically. The results show that, although the hierarchical LMs outperform a baseline word-based LM in both cases, the integrated architecture can provide better ASR system performance. However, the decoupled architecture could be more versatile: its two-pass strategy allows different models to be integrated using a standard decoder. The use of this kind of complex LM can also be extended to other NLP applications, such as language understanding (LU), by employing the proposed architectures.
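The decoupled, two-pass idea can be illustrated with a minimal sketch. It assumes that the first pass returns an N-best list with one log-score per hypothesis, and that the second pass adds a weighted log-probability from a stronger LM before re-sorting. The names `Hypothesis`, `toy_lm_logprob`, `rerank` and `lm_weight` are hypothetical and do not come from the paper. The toy bigram table merely stands in for the hierarchical phrase-class LM, whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Hypothesis:
    words: List[str]
    first_pass_logprob: float  # log-score assigned by the first-pass decoder (assumed)


def toy_lm_logprob(words: List[str],
                   bigrams: Dict[Tuple[str, str], float],
                   unseen: float = -10.0) -> float:
    """Stand-in for the complex LM: sum of bigram log-probabilities.

    Unseen bigrams receive a fixed penalty instead of proper smoothing.
    """
    total = 0.0
    prev = "<s>"
    for word in words + ["</s>"]:
        total += bigrams.get((prev, word), unseen)
        prev = word
    return total


def rerank(nbest: List[Hypothesis],
           lm_logprob: Callable[[List[str]], float],
           lm_weight: float = 1.0) -> List[Hypothesis]:
    """Second pass: reorder the N-best list by the combined log-score."""
    return sorted(
        nbest,
        key=lambda h: h.first_pass_logprob + lm_weight * lm_logprob(h.words),
        reverse=True,
    )


# Toy data (invented for illustration only).
BIGRAMS = {
    ("<s>", "i"): -1.0,
    ("i", "want"): -1.0,
    ("want", "a"): -1.0,
    ("a", "ticket"): -2.0,
    ("ticket", "</s>"): -1.0,
}
NBEST = [
    Hypothesis(["i", "want", "a", "thicket"], -48.0),  # best after the first pass
    Hypothesis(["i", "want", "a", "ticket"], -50.0),
]

reranked = rerank(NBEST, lambda words: toy_lm_logprob(words, BIGRAMS))
print(" ".join(reranked[0].words))
```

Because the second pass only needs a scoring function over word sequences, any model can be substituted for `toy_lm_logprob` without changing the first-pass decoder. This is the versatility the abstract attributes to the decoupled architecture. The integrated architecture instead applies the LM during the search itself, which this sketch does not attempt to reproduce.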

English: “With traditional hot plus fan function for 3 h and 20 min”.

Title: Integration of complex language models in ASR and LU systems
Authors: Raquel Justo, M. Inés Torres
Publication date: 1 August 2015
Publisher: Springer London
Published in: Pattern Analysis and Applications, Issue 3/2015
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-014-0436-0
