Cognitive Computation

Published: 16 July 2021

Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems

Authors: Musatafa Abbas Abbood Albadr, Sabrina Tiun, Masri Ayob, Manal Mohammed, Fahad Taha AL-Dhief

Published in: Cognitive Computation | Issue 5/2021

Abstract

Spoken language identification (LID) is the process of determining and classifying natural language from given content and a given dataset. To perform LID, the data must first be processed to extract useful features. The mel-frequency cepstral coefficient (MFCC) is one of the most popular feature extraction techniques in LID, and the MFCC features serve as inputs to the classification stage. This study investigates reducing the MFCC feature dimension, because large data size increases computational time and resource use (i.e., memory space) and slows identification. It also evaluates data reduction techniques that retain the most important feature parameters. The data reduction is based on standard deviation (STD) calculation and principal component analysis (PCA). The MFCC features and the reduced-dimension features obtained from STD and PCA are then used as inputs to an optimized extreme learning machine (ELM) classifier called the optimized genetic algorithm-ELM (OGA-ELM). Several sets of data samples, all with a single principal-component dimension (i.e., 119), are used for the evaluation. The results are generated using two different datasets. The first dataset is derived from eight separate languages, whereas the second dataset is a part of the National Institute of Standards and Technology Language Recognition Evaluation 2009 dataset. To evaluate the performance of the proposed method, this study uses several assessment measures, namely accuracy, recall, precision, F-measure, G-mean, and identification time. The best LID performance is observed when the MFCC features based on STD and PCA, with 119 feature dimensions, are used with OGA-ELM as the classifier. The experimental results show that the proposed MFCC method achieves 99.38% accuracy on the first dataset. On the second dataset, it achieves accuracies of up to 97.60%, 96.80%, and 91.20% for utterance durations of 30, 10, and 3 s, respectively. The proposed MFCC method has the shortest computational time in all experiments, requiring only a few seconds to identify languages. Using a data reduction technique can substantially reduce computational time, overcome resource limitations, and improve LID performance.
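The abstract names the two reduction steps (STD and PCA) but does not state how they are combined. The following is therefore only a minimal sketch of one plausible reading, not the authors' implementation: each utterance's MFCC matrix is collapsed to a per-coefficient standard deviation vector, and PCA (computed here via SVD) then projects those vectors to fewer dimensions. The function names, the synthetic data, and the sizes used (39 coefficients, 10 components) are illustrative assumptions; they do not correspond to the paper's 119-dimensional features.

```python
import numpy as np


def std_features(mfcc_frames: np.ndarray) -> np.ndarray:
    """Collapse a (n_frames, n_coeffs) MFCC matrix into one vector:
    the standard deviation of each coefficient over all frames.
    (Assumed reading of the "STD" step; the abstract gives no details.)"""
    return mfcc_frames.std(axis=0)


def pca_fit(X: np.ndarray, n_components: int):
    """Fit PCA via SVD. Returns the feature mean and the top principal
    axes, shaped (n_components, n_features)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project centered data onto the principal axes."""
    return (X - mean) @ components.T


# Synthetic stand-in for real MFCCs: 50 utterances of varying length,
# each with 39 coefficients per frame (hypothetical sizes).
rng = np.random.default_rng(0)
utterances = [rng.normal(size=(int(rng.integers(80, 120)), 39)) for _ in range(50)]

# Step 1: one fixed-length STD vector per utterance -> shape (50, 39).
X = np.stack([std_features(u) for u in utterances])

# Step 2: PCA down to 10 dimensions -> shape (50, 10).
mean, components = pca_fit(X, n_components=10)
Z = pca_transform(X, mean, components)
```

In this sketch the reduced matrix `Z` would be what is passed to a classifier. Variable-length utterances become fixed-length vectors in step 1, which is one reason a statistic such as the standard deviation is convenient before PCA.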

Title: Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems
Authors: Musatafa Abbas Abbood Albadr
Sabrina Tiun
Masri Ayob
Manal Mohammed
Fahad Taha AL-Dhief
Publication date: 16 July 2021
Publisher: Springer US
Published in: Cognitive Computation / Issue 5/2021
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-021-09914-w
