International Journal of Machine Learning and Cybernetics

28-01-2019 | Original Article

A lazy learning-based language identification from speech using MFCC-2 features

Authors: Himadri Mukherjee, Sk Md Obaidullah, K. C. Santosh, Santanu Phadikar, Kaushik Roy

Published in: International Journal of Machine Learning and Cybernetics | Issue 1/2020

Abstract

Developing an automatic speech recognition system for multilingual countries such as India is challenging because speakers routinely use several languages while talking. Identifying the language of an utterance is therefore an essential step before the speech itself is recognized. This paper proposes a system for language identification from multilingual speech signals. A new second-level Mel frequency cepstral coefficient (MFCC)-based feature, named MFCC-2, which handles the large and uneven dimensionality of MFCC, is used to characterize three languages: English, Bangla and Hindi. The system was tested on 12,000 recorded utterances of numerals and on 41,884 clips extracted from YouTube videos. The data take into account background music, multiple recording environments, the absence of noise suppression, and phrases that mix keywords from different languages. For the YouTube data, the highest and average accuracies (over the top three classifiers from a pool of nine) were 98.09% and 95.54%, respectively.
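The idea of turning frame-level features of uneven length into a fixed-size descriptor and then classifying with a lazy learner can be illustrated with a minimal sketch. The abstract does not define MFCC-2 or name the classifiers, so everything here is an assumption made for illustration: the per-coefficient mean and standard deviation stand in for a second-level summary, a 1-nearest-neighbour rule stands in for the lazy learner, and the "clips" are synthetic numbers rather than real MFCC frames. The helper names (`summarize`, `nearest_neighbor`, `fake_clip`) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def summarize(frames):
    """Collapse an (n_frames x n_coeffs) list of frames into a fixed-length
    vector: per-coefficient mean followed by per-coefficient std deviation.
    This is an assumed stand-in, not the paper's MFCC-2 definition."""
    columns = list(zip(*frames))
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]


def nearest_neighbor(train, query):
    """Lazy 1-NN: no model is fitted; stored examples are scanned per query."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]


def fake_clip(rng, center, n_frames, n_coeffs=13):
    """Synthetic frames for one clip (illustrative only, not speech data)."""
    return [[rng.gauss(center, 1.0) for _ in range(n_coeffs)]
            for _ in range(n_frames)]


rng = random.Random(0)
# Hypothetical class centres so that the three labels are separable.
centers = {"English": -5.0, "Bangla": 0.0, "Hindi": 5.0}

# Clips have between 50 and 200 frames, yet every summary has 26 values.
train = [(summarize(fake_clip(rng, c, rng.randint(50, 200))), lang)
         for lang, c in centers.items() for _ in range(5)]

query = summarize(fake_clip(rng, centers["Hindi"], 120))
predicted = nearest_neighbor(train, query)
print(len(query), predicted)
```

Because the summary length depends only on the number of coefficients, clips of any duration map to vectors of the same size, which is what lets a distance-based lazy learner compare them directly.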

Title: A lazy learning-based language identification from speech using MFCC-2 features
Authors: Himadri Mukherjee, Sk Md Obaidullah, K. C. Santosh, Santanu Phadikar, Kaushik Roy
Publication date: 28-01-2019
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 1/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-019-00928-3
