International Journal of Speech Technology

01-06-2012 | Original Paper

Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling

Authors: G. Bouselmi, D. Fohr, I. Illina

Published in: International Journal of Speech Technology | Issue 2/2012

Abstract

This article presents an approach for the automatic recognition of non-native speech. Some non-native speakers tend to pronounce phonemes as they would in their native language. Model adaptation can improve the recognition rate for non-native speakers, but it has difficulty handling pronunciation errors such as phoneme insertions or substitutions. For these pronunciation mismatches, pronunciation modeling can make the recognition system more robust. Our approach is based on acoustic model transformation and pronunciation modeling for multiple non-native accents. For acoustic model transformation, two approaches are evaluated: MAP and model re-estimation. For pronunciation modeling, confusion rules (alternate pronunciations) are automatically extracted from a small non-native speech corpus. This paper presents a novel approach for introducing these automatically learned confusion rules into the recognition system. The modified HMM of a foreign spoken language phoneme includes its canonical pronunciation along with all the alternate non-native pronunciations, so that spoken language phonemes pronounced correctly by a non-native speaker can still be recognized. We evaluate our approaches on the non-native corpus of the European project HIWIRE, which contains English sentences pronounced by French, Italian, Greek and Spanish speakers. Two cases are studied: the native language of the test speaker is either known or unknown. Our approach gives better recognition results than the classical acoustic adaptation of HMMs when the foreign origin of the speaker is known. We obtain a 22% WER reduction compared to the reference system.

Furthermore, we take into account the written form of the spoken words: non-native speakers may rely on the spelling of words in order to pronounce them. This approach does not provide any further improvement.
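The idea of confusion rules mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' extraction procedure: it assumes that an alignment step (not shown) has already paired each canonical phoneme with the phoneme sequence a speaker actually produced, and it keeps the alternates that occur often enough. The function name `extract_confusion_rules`, the `min_rel_freq` threshold, and the toy phoneme labels are all hypothetical.

```python
from collections import Counter, defaultdict


def extract_confusion_rules(aligned_pairs, min_rel_freq=0.1):
    """Map each canonical phoneme to its frequent alternate pronunciations.

    aligned_pairs: iterable of (canonical, realized) pairs, where `realized`
    is the sequence of phonemes the speaker produced for `canonical`.
    The alignment itself is assumed to come from elsewhere.
    """
    counts = defaultdict(Counter)
    for canonical, realized in aligned_pairs:
        counts[canonical][tuple(realized)] += 1

    rules = {}
    for canonical, realizations in counts.items():
        total = sum(realizations.values())
        # Keep non-canonical realizations whose relative frequency
        # reaches the (assumed) pruning threshold.
        alternates = [
            list(seq)
            for seq, n in realizations.most_common()
            if seq != (canonical,) and n / total >= min_rel_freq
        ]
        if alternates:
            rules[canonical] = alternates
    return rules


# Toy alignment data with invented phoneme labels.
pairs = [
    ("th", ("s",)), ("th", ("s",)), ("th", ("th",)), ("th", ("z",)),
    ("ih", ("iy",)), ("ih", ("ih",)), ("ih", ("ih",)), ("ih", ("ih",)),
]
rules = extract_confusion_rules(pairs, min_rel_freq=0.3)
print(rules)
```

With this toy data only the substitution of "th" by "s" passes the threshold. In a recognizer along the lines the abstract describes, each retained alternate would become an additional path next to the canonical one in the modified phoneme HMM.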

For native speakers it would be more efficient to use a native ASR.

Title: Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling
Authors: G. Bouselmi, D. Fohr, I. Illina
Publication date: 01-06-2012
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 2/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9134-8
