Published in: Wireless Personal Communications 4/2019

22.04.2019

Feature Extraction Methods in Language Identification: A Survey

By: Deepti Deshwal, Pardeep Sangwan, Divya Kumar



Abstract

Language Identification (LI) is a rapidly emerging area of speech processing concerned with accurately identifying the language of an utterance from features of the speech signal. LI technologies have a wide range of applications across domains, driven by advances in artificial intelligence and machine learning. Feature extraction is one of the fundamental and most significant processes in LI. This review presents the main research paradigms in feature extraction methods, giving researchers a deep insight into feature extraction techniques for future studies in LI. Broadly, it summarizes and compares feature extraction approaches with and without noise compensation, since the current trend is toward a robust, universal LI framework. The paper categorizes feature extraction approaches on the basis of the features used, the human speech production system or peripheral auditory system, spectral or cepstral analysis, and the underlying transform. Noise-compensation-based feature extraction techniques are also covered. The review shows that Mel-Frequency Cepstral Coefficients (MFCCs) are the most popular approach, and results indicate that MFCCs fused with other feature vectors and cleansing approaches outperform purely MFCC-based feature extraction. The study also describes, from a research point of view, the different categories at the front end of an LI system.
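Since the abstract singles out MFCCs as the dominant front-end feature, a minimal from-scratch sketch of the standard MFCC pipeline (power spectrum → mel filterbank → log compression → DCT) may help fix the idea; the function name, parameter defaults, and filter count below are illustrative choices, not the authors' implementation.

```python
import math

def mfcc_frame(frame, sr=8000, n_filters=10, n_coeffs=5):
    """Compute MFCCs for one pre-windowed frame (illustrative, unoptimized)."""
    N = len(frame)
    half = N // 2 + 1
    # 1. Power spectrum via a direct DFT (O(N^2); fine for a short demo frame).
    power = []
    for k in range(half):
        re = sum(frame[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = sum(-frame[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        power.append((re * re + im * im) / N)
    # 2. Triangular filters spaced evenly on the perceptual mel scale.
    hz_to_mel = lambda f: 2595 * math.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    top = hz_to_mel(sr / 2)
    bins = [int(mel_to_hz(i * top / (n_filters + 1)) * N / sr)
            for i in range(n_filters + 2)]
    log_e = []
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        e = 0.0
        for k in range(lo, hi):
            if k < c:
                w = (k - lo) / (c - lo) if c > lo else 0.0
            else:
                w = (hi - k) / (hi - c) if hi > c else 0.0
            e += w * power[k]
        log_e.append(math.log(e + 1e-10))  # log compresses the dynamic range
    # 3. DCT-II decorrelates log filterbank energies -> cepstral coefficients.
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n_filters)
                for j in range(n_filters))
            for i in range(n_coeffs)]

# Toy usage: a 440 Hz tone as a single 256-sample frame.
frame = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(256)]
coeffs = mfcc_frame(frame)
```

The resulting short cepstral vector summarizes the spectral envelope of the frame; production systems typically add pre-emphasis, windowing, delta features, and (as the survey notes) fusion with other feature streams or noise compensation.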


79.
Zurück zum Zitat Revathi, A., Jeyalakshmi, C., & Muruganantham, T. (2018). Perceptual features based rapid and robust language identification system for various indian classical languages. Computational vision and bio inspired computing (pp. 291–305). Cham: Springer. Revathi, A., Jeyalakshmi, C., & Muruganantham, T. (2018). Perceptual features based rapid and robust language identification system for various indian classical languages. Computational vision and bio inspired computing (pp. 291–305). Cham: Springer.
80.
Zurück zum Zitat Richardson, F., Reynolds, D., & Dehak, N. (2015). A unified deep neural network for speaker and language recognition. arXiv preprint arXiv:1504.00923. Richardson, F., Reynolds, D., & Dehak, N. (2015). A unified deep neural network for speaker and language recognition. arXiv preprint arXiv:​1504.​00923.
81.
Zurück zum Zitat Richardson, F., Reynolds, D., & Dehak, N. (2015). Deep neural network approaches to speaker and language recognition. IEEE Signal Processing Letters, 22(10), 1671–1675.CrossRef Richardson, F., Reynolds, D., & Dehak, N. (2015). Deep neural network approaches to speaker and language recognition. IEEE Signal Processing Letters, 22(10), 1671–1675.CrossRef
82.
Zurück zum Zitat Sadjadi, S. O., & Hansen, J. H. (2015). Mean Hilbert envelope coefficients (MHEC) for Robust speaker and language identification. Speech Communication, 72, 138–148.CrossRef Sadjadi, S. O., & Hansen, J. H. (2015). Mean Hilbert envelope coefficients (MHEC) for Robust speaker and language identification. Speech Communication, 72, 138–148.CrossRef
83.
Zurück zum Zitat Sahidullah, M., & Saha, G. (2012). Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition. Speech Communication, 54(4), 543–565.CrossRef Sahidullah, M., & Saha, G. (2012). Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition. Speech Communication, 54(4), 543–565.CrossRef
84.
Zurück zum Zitat Sangwan, P. (2017). Feature Extraction for Speaker Recognition: A Systematic Study. Global Journal of Enterprise Information System, 9(4), 19–26. Sangwan, P. (2017). Feature Extraction for Speaker Recognition: A Systematic Study. Global Journal of Enterprise Information System, 9(4), 19–26.
85.
Zurück zum Zitat Sangwan, P., & Bhardwaj, S. (2017). A structured approach towards robust database collection for speaker recognition. Global Journal of Enterprise Information System, 9(3), 53–58.CrossRef Sangwan, P., & Bhardwaj, S. (2017). A structured approach towards robust database collection for speaker recognition. Global Journal of Enterprise Information System, 9(3), 53–58.CrossRef
86.
Zurück zum Zitat Sarma, B. D., & Prasanna, S. M. (2018). Acoustic–phonetic analysis for speech recognition: A review. IETE Technical Review, 35(3), 305–327.CrossRef Sarma, B. D., & Prasanna, S. M. (2018). Acoustic–phonetic analysis for speech recognition: A review. IETE Technical Review, 35(3), 305–327.CrossRef
87.
Zurück zum Zitat Segbroeck, V., Travadi, M. R., & Narayanan, S. S. (2015). Rapid language identification. IEEE Transactions on Audio, Speech and Language Processing, 7, 1118–1129.CrossRef Segbroeck, V., Travadi, M. R., & Narayanan, S. S. (2015). Rapid language identification. IEEE Transactions on Audio, Speech and Language Processing, 7, 1118–1129.CrossRef
88.
Zurück zum Zitat Sharma, D. P., & Atkins, J. (2014). Automatic speech recognition systems: Challenges and recent implementation trends. International Journal of Signal and Imaging Systems Engineering, 7(4), 220–234.CrossRef Sharma, D. P., & Atkins, J. (2014). Automatic speech recognition systems: Challenges and recent implementation trends. International Journal of Signal and Imaging Systems Engineering, 7(4), 220–234.CrossRef
89.
Zurück zum Zitat Sheikhan, M., Gharavian, D., & Ashoftedel, F. (2012). Using DTW neural–based MFCC warping to improve emotional speech recognition. Neural Computer and Application, 21, 1765–1773.CrossRef Sheikhan, M., Gharavian, D., & Ashoftedel, F. (2012). Using DTW neural–based MFCC warping to improve emotional speech recognition. Neural Computer and Application, 21, 1765–1773.CrossRef
90.
Zurück zum Zitat Sim, K. C., & Li, H. (2008). On acoustic diversification front-end for spoken language identification. IEEE Transactions on Audio, Speech and Language Processing, 16(5), 1029–1037.CrossRef Sim, K. C., & Li, H. (2008). On acoustic diversification front-end for spoken language identification. IEEE Transactions on Audio, Speech and Language Processing, 16(5), 1029–1037.CrossRef
91.
Zurück zum Zitat Singer, E., et al. (2003). Acoustic phonetic and discriminative approaches to automatic language recognition. In Proceedings of the Eurospeech (pp. 1345–1348). Singer, E., et al. (2003). Acoustic phonetic and discriminative approaches to automatic language recognition. In Proceedings of the Eurospeech (pp. 1345–1348).
92.
Zurück zum Zitat Sivaraman, G., et al. (2016). Vocal tract length normalization for speaker independent acoustic-to-articulatory speech inversion. In INTERSPEECH (pp. 455–459). Sivaraman, G., et al. (2016). Vocal tract length normalization for speaker independent acoustic-to-articulatory speech inversion. In INTERSPEECH (pp. 455–459).
93.
Zurück zum Zitat Song, Y., et al. (2015). Improved language identification using deep bottleneck network. In IEEE international conference in acoustics, speech and signal processing (ICASSP) (pp. 4200–4204). IEEE. Song, Y., et al. (2015). Improved language identification using deep bottleneck network. In IEEE international conference in acoustics, speech and signal processing (ICASSP) (pp. 4200–4204). IEEE.
94.
Zurück zum Zitat Sundermeyer, M., Schlüter, R., & Ney, H. (2012). LSTM neural networks for language modeling. In Thirteenth annual conference of the international speech communication association, USA (pp. 194–197). Sundermeyer, M., Schlüter, R., & Ney, H. (2012). LSTM neural networks for language modeling. In Thirteenth annual conference of the international speech communication association, USA (pp. 194–197).
95.
Zurück zum Zitat Sun, Y., Wen, G., & Wang, J. (2015). Weighted spectral features based on local Hu moments for speech emotion recognition. Journal of Biomedical Signal Processing and Control, 18, 80–90.CrossRef Sun, Y., Wen, G., & Wang, J. (2015). Weighted spectral features based on local Hu moments for speech emotion recognition. Journal of Biomedical Signal Processing and Control, 18, 80–90.CrossRef
96.
Zurück zum Zitat Takçı, Hidayet, & Güngör, T. (2012). A high performance centroid-based classification approach for language identification. Pattern Recognition Letters, 33(16), 2077–2084.CrossRef Takçı, Hidayet, & Güngör, T. (2012). A high performance centroid-based classification approach for language identification. Pattern Recognition Letters, 33(16), 2077–2084.CrossRef
97.
Zurück zum Zitat Tang, Z., et al. (2018). Phonetic temporal neural model for language identification. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(1), 134–144.CrossRef Tang, Z., et al. (2018). Phonetic temporal neural model for language identification. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(1), 134–144.CrossRef
98.
Zurück zum Zitat Thirumuru, R., & Vuppala, A. K. (2018). Automatic detection of retroflex approximants in a continuous tamil speech. Circuits, Systems, and Signal Processing, 37(7), 2837–2851.MathSciNetCrossRef Thirumuru, R., & Vuppala, A. K. (2018). Automatic detection of retroflex approximants in a continuous tamil speech. Circuits, Systems, and Signal Processing, 37(7), 2837–2851.MathSciNetCrossRef
99.
Zurück zum Zitat Torres-Carrasquillo, P. A., et al. (2002). Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. In Seventh international conference on spoken language processing. Torres-Carrasquillo, P. A., et al. (2002). Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. In Seventh international conference on spoken language processing.
100.
Zurück zum Zitat Upadhyay, N., & Karmakar, A. (2015). Speech enhancement using spectral subtraction-type algorithms: A comparison and simulation study. Procedia Computer Science, 54, 574–584.CrossRef Upadhyay, N., & Karmakar, A. (2015). Speech enhancement using spectral subtraction-type algorithms: A comparison and simulation study. Procedia Computer Science, 54, 574–584.CrossRef
101.
Zurück zum Zitat Verma, P., & Das, P. K. (2015). i-Vectors in speech processing applications: A survey. International Journal of Speech Technology, 18(4), 529–546.CrossRef Verma, P., & Das, P. K. (2015). i-Vectors in speech processing applications: A survey. International Journal of Speech Technology, 18(4), 529–546.CrossRef
102.
Zurück zum Zitat Viana, H. O., & Mello, C. A. (2014). Speech description through MINERS: Model invariant to noise and environment robust for speech. In 2014 IEEE international conference on systems, man and cybernatics (SMC) (pp. 489–494). IEEE. Viana, H. O., & Mello, C. A. (2014). Speech description through MINERS: Model invariant to noise and environment robust for speech. In 2014 IEEE international conference on systems, man and cybernatics (SMC) (pp. 489–494). IEEE.
103.
Zurück zum Zitat Vu, N., Imseng, D., Povey, D., & Molicek, P., Schultz, T., & Bourlard, H. (2014). Multilingual deep neural network based acoustic modelling for rapid language adaptation. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 7639–7643). IEEE. Vu, N., Imseng, D., Povey, D., & Molicek, P., Schultz, T., & Bourlard, H. (2014). Multilingual deep neural network based acoustic modelling for rapid language adaptation. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 7639–7643). IEEE.
104.
Zurück zum Zitat Wang, H., et al. (2013). Shifted-delta mlp features for spoken language recognition. IEEE Signal Processing Letters, 20(1), 15–18.CrossRef Wang, H., et al. (2013). Shifted-delta mlp features for spoken language recognition. IEEE Signal Processing Letters, 20(1), 15–18.CrossRef
105.
Zurück zum Zitat Wang, H., Xu, Y., & Li, M. (2011). Study on the MFCC similarity-based voice activity detection algorithm. In 2nd international conference in artificial intelligence, management science and electronic commerce (AIMSEC) (pp. 4391–4394). IEEE. Wang, H., Xu, Y., & Li, M. (2011). Study on the MFCC similarity-based voice activity detection algorithm. In 2nd international conference in artificial intelligence, management science and electronic commerce (AIMSEC) (pp. 4391–4394). IEEE.
106.
Zurück zum Zitat Weninger, F., et al. (2014). Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments. Computer Speech and Language (Elsevier), 28(4), 888–902.CrossRef Weninger, F., et al. (2014). Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments. Computer Speech and Language (Elsevier), 28(4), 888–902.CrossRef
107.
Zurück zum Zitat Williamson, D. S., & Wang, D. (2017). Speech dereverberation and denoising using complex ratio masks. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5590–5594). IEEE. Williamson, D. S., & Wang, D. (2017). Speech dereverberation and denoising using complex ratio masks. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5590–5594). IEEE.
108.
Zurück zum Zitat Wong, K. Y. E. (2004). Automatic spoken language identification utilizing acoustic and phonetic speech information. Doctoral dissertation, Queensland University of Technology. Wong, K. Y. E. (2004). Automatic spoken language identification utilizing acoustic and phonetic speech information. Doctoral dissertation, Queensland University of Technology.
109.
Zurück zum Zitat Yapanel, U., & Hansen, J. (2008). A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition. Journal of Speech Communication, 50, 142–152.CrossRef Yapanel, U., & Hansen, J. (2008). A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition. Journal of Speech Communication, 50, 142–152.CrossRef
110.
Zurück zum Zitat Yilmaz, E., McLaren, M., van den Heuvel, H., & van Leeuwen, D. A. (2017). Language diarization for semi-supervised bilingual acoustic model training. Automatic speech recognition and understanding workshop (ASRU). 2017 IEEE (pp. 91–96). IEEE. Yilmaz, E., McLaren, M., van den Heuvel, H., & van Leeuwen, D. A. (2017). Language diarization for semi-supervised bilingual acoustic model training. Automatic speech recognition and understanding workshop (ASRU). 2017 IEEE (pp. 91–96). IEEE.
111.
Zurück zum Zitat Zazo, R., et al. (2016). Language identification in short utterances using long short-term memory (LSTM) recurrent neural networks. PloS One, 11(1), e0146917.CrossRef Zazo, R., et al. (2016). Language identification in short utterances using long short-term memory (LSTM) recurrent neural networks. PloS One, 11(1), e0146917.CrossRef
112.
Zurück zum Zitat Zhang, X. L., & Wu, J. (2013). Deep belief networks based voice activity detection. IEEE Transactions on Audio, Speech and Language Processing, 21(4), 697–710.CrossRef Zhang, X. L., & Wu, J. (2013). Deep belief networks based voice activity detection. IEEE Transactions on Audio, Speech and Language Processing, 21(4), 697–710.CrossRef
Metadata
Title: Feature Extraction Methods in Language Identification: A Survey
Authors: Deepti Deshwal, Pardeep Sangwan, Divya Kumar
Publication date: 22.04.2019
Publisher: Springer US
Published in: Wireless Personal Communications, Issue 4/2019
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI: https://doi.org/10.1007/s11277-019-06373-3