Published in: International Journal on Interactive Design and Manufacturing (IJIDeM) 2-3/2021

03-08-2021 | Original Paper

Automated evaluation of foreign language speaking performance with machine learning

Authors: Ramon F. Brena, Evelyn Zuvirie, Alan Preciado, Aristh Valdiviezo, Miguel Gonzalez-Mendoza, Carlos Zozaya-Gorostiza

Abstract

In a globalized world, speaking foreign languages, particularly English, is imperative. One challenge for teaching foreign languages at the scale of millions of learners is that, although teaching content is widely available, speaking skills are harder to develop than vocabulary, because feedback from a teacher is needed to correct pronunciation, intonation, and similar aspects. There are currently no automated tools to evaluate the fluency or pronunciation level of language students, so this evaluation, which is needed even to place a student at the right level, requires an interview with a language teacher. We propose a supervised machine-learning method that automatically evaluates both the fluency and the pronunciation of a language student and detects specific pronunciation mistakes, taking English as the target language. To train a classifier for the classes “low”, “intermediate” and “high”, we first built datasets of audio samples of English-learning students talking. Each audio sample was divided into small segments, and a set of features was calculated for each segment. We trained several classifiers, which predict the level of a given non-native English speaker. We then tested the trained classifiers on audio segments not included in the training dataset, measuring accuracy, precision, and other metrics of the predicted classes. Results were promising: for fluency and pronunciation we obtained accuracy values of 94% and 99.9%, the latter being the highest accuracy reported in the literature for such predictions.
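The pipeline outlined in the abstract (segment the audio, compute features per segment, train a classifier, evaluate on held-out segments) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the recordings are synthetic noisy sine tones, the two features (mean energy and zero-crossing rate) are generic stand-ins, and the nearest-centroid classifier is a deliberately simple substitute for the classifiers the authors trained.

```python
import math
import random

def split_into_segments(signal, segment_len):
    """Cut a list of samples into non-overlapping, equal-length segments."""
    return [signal[i:i + segment_len]
            for i in range(0, len(signal) - segment_len + 1, segment_len)]

def segment_features(segment):
    """Two illustrative features: mean energy and zero-crossing rate."""
    energy = sum(x * x for x in segment) / len(segment)
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / (len(segment) - 1))

def train_centroids(features, labels):
    """Nearest-centroid 'training': average the feature vectors of each class."""
    sums, counts = {}, {}
    for f, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(f))
        for k, value in enumerate(f):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def predict(centroids, f):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(f, centroids[label])))

def precision(y_true, y_pred, cls):
    """Fraction of segments predicted as `cls` that truly belong to `cls`."""
    truths = [t for t, p in zip(y_true, y_pred) if p == cls]
    return sum(1 for t in truths if t == cls) / len(truths) if truths else 0.0

def synthetic_recording(freq_hz, rng, n_samples=8000, rate=8000):
    """Hypothetical stand-in for a learner recording: a noisy sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / rate) + rng.gauss(0, 0.02)
            for t in range(n_samples)]

rng = random.Random(0)
# Assumption: each class is tied to an arbitrary tone frequency so the toy data is separable.
class_freqs = {"low": 100, "intermediate": 400, "high": 1200}

X, y = [], []
for label, freq in class_freqs.items():
    for _ in range(4):  # four synthetic "recordings" per class
        for seg in split_into_segments(synthetic_recording(freq, rng), 400):
            X.append(segment_features(seg))
            y.append(label)

# Hold out every fifth segment; the classifier never sees these during training.
test_idx = [i for i in range(len(X)) if i % 5 == 0]
train_idx = [i for i in range(len(X)) if i % 5 != 0]

centroids = train_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
y_true = [y[i] for i in test_idx]
y_pred = [predict(centroids, X[i]) for i in test_idx]

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print("accuracy:", accuracy)
for cls in class_freqs:
    print("precision[%s]:" % cls, precision(y_true, y_pred, cls))
```

The numbers this toy example prints say nothing about the results reported above; it only shows the shape of the workflow. Note also that holding out segments, rather than whole recordings or speakers, lets segments of the same recording appear in both training and test sets, which is a design choice worth checking in any real evaluation.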
Metadata
Title
Automated evaluation of foreign language speaking performance with machine learning
Authors
Ramon F. Brena
Evelyn Zuvirie
Alan Preciado
Aristh Valdiviezo
Miguel Gonzalez-Mendoza
Carlos Zozaya-Gorostiza
Publication date
03-08-2021
Publisher
Springer Paris
Published in
International Journal on Interactive Design and Manufacturing (IJIDeM) / Issue 2-3/2021
Print ISSN: 1955-2513
Electronic ISSN: 1955-2505
DOI
https://doi.org/10.1007/s12008-021-00759-z
