Published in: Cognitive Computation 5/2021

16 July 2021

Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems

Authors: Musatafa Abbas Abbood Albadr, Sabrina Tiun, Masri Ayob, Manal Mohammed, Fahad Taha AL-Dhief

Abstract

Spoken language identification (LID) is the process of determining and classifying the natural language of given spoken content. To perform LID, the data must first be processed to extract useful features. The mel-frequency cepstral coefficient (MFCC) is one of the most popular feature extraction techniques in LID, and the MFCC features serve as inputs to the classification stage. This study investigates reducing the MFCC feature dimension, because large data size increases computational time and resource use (i.e., memory space) and slows identification. It also evaluates data reduction techniques that retain the most important feature parameters. The data reduction is based on standard deviation (STD) calculation and principal component analysis (PCA). The MFCC features and the reduced-dimension features derived from the STD and PCA results are then used as inputs to an optimized extreme learning machine (ELM) classifier called the optimized genetic algorithm-ELM (OGA-ELM). Several sets of data samples with a single principal-component dimension (i.e., 119) are used for the evaluation. Results are reported on two datasets: the first is derived from eight separate languages, and the second is part of the National Institute of Standards and Technology (NIST) Language Recognition Evaluation 2009 dataset. The proposed method is assessed with several measures, namely accuracy, recall, precision, F-measure, G-mean, and identification time. The best LID performance is observed when the MFCC features based on STD and PCA, with 119 feature dimensions, are used with OGA-ELM as the classifier. The experimental results show that the proposed MFCC method achieves 99.38% accuracy on the first dataset and accuracies of up to 97.60%, 96.80%, and 91.20% on the second dataset for durations of 30, 10, and 3 s, respectively. The proposed MFCC method exhibits the fastest computational time in all experiments, requiring only a few seconds to identify languages. Using a data reduction technique can substantially reduce computational time, overcome resource limitations, and improve LID performance.
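The reduction step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the STD calculation is applied, so the code below makes an assumption: each utterance's frame-level MFCC matrix is collapsed to one vector of per-coefficient standard deviations, and PCA then projects those vectors onto 119 components (the dimension named in the abstract). The function names, the 160-coefficient input width, the utterance count, and the random data standing in for real MFCCs are all hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np


def std_summary(mfcc_frames: np.ndarray) -> np.ndarray:
    """Collapse a (n_frames, n_coeffs) matrix to per-coefficient STDs.

    Assumption: this is one plausible reading of "based on STD";
    the abstract does not specify the exact procedure.
    """
    return mfcc_frames.std(axis=0)


def pca_fit(X: np.ndarray, n_components: int):
    """Fit PCA via SVD; return the feature mean and top principal axes."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project centered data onto the retained principal axes."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)

# Synthetic stand-in for MFCCs: 200 utterances with varying frame counts,
# each frame holding 160 coefficients (hypothetical sizes).
utterances = [
    rng.normal(size=(int(rng.integers(100, 300)), 160)) for _ in range(200)
]

# One fixed-length vector per utterance, regardless of its duration.
X = np.stack([std_summary(u) for u in utterances])

# Reduce to 119 dimensions, the size reported in the abstract.
mean, components = pca_fit(X, n_components=119)
X_reduced = pca_transform(X, mean, components)

print(X.shape, "->", X_reduced.shape)
```

In practice the PCA mean and axes would be fitted on training utterances only and then reused unchanged for test utterances, so that the classifier sees features in the same coordinate system at both stages.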

Metadata
Title
Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems
Authors
Musatafa Abbas Abbood Albadr
Sabrina Tiun
Masri Ayob
Manal Mohammed
Fahad Taha AL-Dhief
Publication date
16 July 2021
Publisher
Springer US
Published in
Cognitive Computation, Issue 5/2021
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-021-09914-w
