Published in: International Journal of Machine Learning and Cybernetics 1/2020

28-01-2019 | Original Article

A lazy learning-based language identification from speech using MFCC-2 features

Authors: Himadri Mukherjee, Sk Md Obaidullah, K. C. Santosh, Santanu Phadikar, Kaushik Roy



Abstract

Developing an automatic speech recognition system for multilingual countries like India is challenging because people habitually use multiple languages when speaking. This makes language identification from speech an essential step prior to recognition. In this paper, a system is proposed for language identification from multilingual speech signals. A new second-level Mel frequency cepstral coefficient-based feature named MFCC-2, which handles the large and uneven dimensionality of MFCC, is used to distinguish between English, Bangla and Hindi. The system was tested on as many as 12,000 recorded utterances of numerals and 41,884 clips extracted from YouTube videos. The YouTube data include background music, recordings from multiple environments, no noise suppression, and phrases that mix keywords from different languages. For the YouTube data, the highest and average accuracies of the Top-3 classifiers (from a pool of nine classifiers) were 98.09% and 95.54%, respectively.
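The abstract does not specify how MFCC-2 is computed, so the sketch below does not reproduce it. It is a generic illustration of the two ideas in the title, under these assumptions: MFCC frames have already been extracted by some standard front end; each variable-length clip is collapsed into a fixed-length vector using per-coefficient mean and standard deviation (an assumed stand-in, not MFCC-2); and clips are classified with k-nearest neighbours, a textbook lazy learner that stores training examples and defers all computation to query time. The names `summarize` and `LazyKNN` and the toy data are hypothetical.

```python
import math
from collections import Counter


def summarize(frames):
    """Collapse a variable-length list of MFCC frames into a fixed-length vector.

    Assumption: per-coefficient mean and standard deviation are used purely for
    illustration; this is NOT the MFCC-2 feature described in the paper.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds  # length is 2 * dims regardless of clip length


class LazyKNN:
    """Hypothetical k-nearest-neighbour classifier (a lazy learner)."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, vectors, labels):
        # Lazy learning: no model is built, the examples are simply stored.
        self.examples = list(zip(vectors, labels))
        return self

    def predict(self, vector):
        # All work happens at query time: rank stored examples by Euclidean distance
        # and take a majority vote among the k closest.
        ranked = sorted(self.examples, key=lambda ex: math.dist(ex[0], vector))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Made-up 2-coefficient "MFCC" frames of varying length (real MFCC frames are longer).
train_clips = [
    ([[1.0, 0.0], [1.2, 0.1], [0.8, -0.1]], "English"),
    ([[1.1, 0.0], [0.9, 0.0]], "English"),
    ([[5.0, 2.0], [5.2, 2.1], [4.8, 1.9], [5.0, 2.0]], "Bangla"),
    ([[5.1, 2.0], [4.9, 2.0]], "Bangla"),
    ([[-3.0, 4.0], [-3.2, 4.1]], "Hindi"),
    ([[-2.9, 3.9], [-3.1, 4.0], [-3.0, 4.0]], "Hindi"),
]
vectors = [summarize(frames) for frames, _ in train_clips]
labels = [label for _, label in train_clips]
clf = LazyKNN(k=3).fit(vectors, labels)

query = [[5.0, 2.05], [4.95, 1.95], [5.05, 2.0]]
print(clf.predict(summarize(query)))  # prints "Bangla"
```

Summarizing every clip to the same length is one simple way to let a distance-based lazy learner compare clips of different durations; the paper's MFCC-2 addresses the same uneven-dimensionality problem with its own method.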

Metadata
Title
A lazy learning-based language identification from speech using MFCC-2 features
Authors
Himadri Mukherjee
Sk Md Obaidullah
K. C. Santosh
Santanu Phadikar
Kaushik Roy
Publication date
28-01-2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 1/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-019-00928-3