2024 | OriginalPaper | Book chapter

Mispronunciation Detection Using Feature Learning

Authors: Priyanka Chhabra, Shailja Chhillar, Riya Tanwar, Muskan Verma, Gaurav Indra

Published in: Proceedings of Third International Conference on Computing and Communication Networks

Publisher: Springer Nature Singapore

Abstract

This study investigates feature learning approaches for mispronunciation detection, a task that is essential in speech recognition and language learning applications. Mispronunciation detection has traditionally depended on handcrafted features and rule-based algorithms, which often generalise poorly and demand considerable manual work. Recent advances in deep learning and feature learning, however, have shown promising results in improving the accuracy and robustness of mispronunciation detection systems. The study proposes a method that combines Mel-frequency cepstral coefficients (MFCC) with a support vector machine (SVM) classifier, and it compares MFCC with TF-IDF feature extraction. Labelled audio recordings are classified as correct or incorrect pronunciation. The audio data is preprocessed, and MFCC features are extracted using the librosa library. During training, the SVM learns patterns between features and labels. Performance is evaluated on the Common Voice dataset: the model achieves 71% accuracy with MFCC and 70% with TF-IDF, and additional preprocessing and hyperparameter tuning raise the TF-IDF accuracy to 71%. Overall, the results indicate that MFCC feature extraction with an SVM classifier is effective for mispronunciation detection. By applying feature learning to the Common Voice dataset, this research advances mispronunciation detection systems, with the aim of improving language learning and supporting more precise speech processing applications.

Metadata
Title
Mispronunciation Detection Using Feature Learning
Authors
Priyanka Chhabra
Shailja Chhillar
Riya Tanwar
Muskan Verma
Gaurav Indra
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-0892-5_24