Published in: Pattern Analysis and Applications 3/2015

1 August 2015 | Theoretical Advances

Integration of complex language models in ASR and LU systems

Written by: Raquel Justo, M. Inés Torres

Abstract

In this work, we explore different methods for integrating a complex language model (LM), specifically a hierarchical LM based on classes of phrases, into an automatic speech recognition (ASR) system. First, we consider an integrated architecture, in which integration is carried out by composing the stochastic finite-state automata associated with the LM. Second, we employ a decoupled architecture with a two-pass decoder, in which the complex LM is used to reorder the N-best list. We provide a formal definition of both methods, which enables a theoretical comparison between them. We also carried out experiments to compare the proposed approaches empirically. The results show that, although the hierarchical LMs outperform a baseline word-based LM in both cases, the integrated architecture can provide better ASR system performance. The decoupled architecture, however, could be more versatile owing to its two-pass strategy, which allows different models to be integrated using a standard decoder. Finally, the proposed architectures allow this kind of complex LM to be extended to other NLP applications, such as language understanding.
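The abstract does not reproduce the formal definition of the decoupled architecture, so the following is only a minimal sketch of generic N-best reordering, not the authors' formulation. It assumes each first-pass hypothesis carries a single log-domain score and that the second-pass LM returns a log-probability for a word sequence. The names `Hypothesis`, `rescore_nbest`, `toy_lm_logprob`, `lm_weight` and the toy unigram table are hypothetical and invented for illustration; a real system would plug in the hierarchical phrase-class LM in place of the toy model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    """One entry of a first-pass N-best list (hypothetical structure)."""
    words: List[str]
    first_pass_logprob: float  # first-pass decoder score, log domain


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    lm_logprob: Callable[[List[str]], float],
    lm_weight: float = 1.0,
) -> List[Hypothesis]:
    """Sort hypotheses by first-pass score plus weighted second-pass LM score."""
    def total(h: Hypothesis) -> float:
        return h.first_pass_logprob + lm_weight * lm_logprob(h.words)

    return sorted(nbest, key=total, reverse=True)


# Toy stand-in for the complex LM: a unigram table with a floor for
# unseen words. All values are made up for the example.
TOY_LOGPROBS = {"turn": -1.0, "on": -1.0, "the": -1.0, "oven": -2.0, "often": -6.0}


def toy_lm_logprob(words: List[str]) -> float:
    return sum(TOY_LOGPROBS.get(w, -10.0) for w in words)


nbest = [
    Hypothesis(["turn", "on", "the", "often"], -20.0),
    Hypothesis(["turn", "on", "the", "oven"], -21.0),
]
reranked = rescore_nbest(nbest, toy_lm_logprob)
print(" ".join(reranked[0].words))
```

Because the second pass only needs a scoring function over complete word sequences, any model can be swapped in without touching the first-pass decoder, which is the versatility the abstract attributes to the decoupled architecture. The integrated architecture, by contrast, applies the LM during the search itself through automata composition.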

Footnotes
1. English: “With traditional hot plus fan function for 3 h and 20 min”.
 
Metadata
Title
Integration of complex language models in ASR and LU systems
Written by
Raquel Justo
M. Inés Torres
Publication date
1 August 2015
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2015
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-014-0436-0
