Published in: International Journal of Speech Technology 1/2016

28 November 2015

Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise

Authors: Phani Kumar Polasi, Kalva Sri Rama Krishna


Abstract

Language identification has gained significant importance in recent years, in both research and the commercial marketplace, demanding an improvement in the ability of machines to distinguish between languages. Although methods such as Gaussian mixture models, hidden Markov models and neural networks are used for identifying languages, the problem of language identification in noisy environments has not been addressed so far. This paper addresses the performance of an automatic language identification system in noisy environments. A comparative performance analysis of speech enhancement techniques, namely minimum mean-square error (MMSE) estimation, spectral subtraction and temporal processing, is presented for different types of noise at different SNRs. Although these individual enhancement techniques may not each yield good performance across different noise types and SNRs, it is proposed to combine the evidence from all of them to significantly improve the overall performance of the system. The language identification studies are performed using the IITKGP-MLILSC (IIT Kharagpur Multilingual Indian Language Speech Corpus) database, which consists of 27 languages.
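
The abstract does not state how the evidence from the three enhancement techniques is combined. The following is only a minimal sketch of one common option, score-level fusion, under stated assumptions: each enhanced version of an utterance is scored against every language model, each system's scores are min-max normalized, and a weighted sum picks the language. The function names, the weights and all score values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def normalize(scores: List[float]) -> List[float]:
    """Min-max normalize one system's per-language scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All languages scored equally: this system carries no evidence.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(system_scores: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted sum of normalized per-language scores across systems."""
    n_languages = len(next(iter(system_scores.values())))
    total_weight = sum(weights[name] for name in system_scores)
    fused = [0.0] * n_languages
    for name, scores in system_scores.items():
        if len(scores) != n_languages:
            raise ValueError(f"score length mismatch for system {name!r}")
        w = weights[name] / total_weight
        for i, s in enumerate(normalize(scores)):
            fused[i] += w * s
    return fused


def identify(fused: List[float]) -> int:
    """Return the index of the language with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical log-likelihood scores for three languages, one list per
# enhancement front-end (assumed values, for illustration only).
example_scores = {
    "mmse": [-41.0, -39.5, -40.2],
    "spectral_subtraction": [-40.0, -40.4, -39.0],
    "temporal": [-42.0, -38.0, -41.0],
}
example_weights = {"mmse": 1.0, "spectral_subtraction": 1.0, "temporal": 1.0}

best = identify(fuse_scores(example_scores, example_weights))
print("identified language index:", best)
```

In this made-up example the spectral subtraction system alone would choose the third language, while the fused decision follows the two systems that agree on the second. Equal weights are an arbitrary choice here; in practice they would have to be tuned on held-out data.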

Metadata
Title
Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise
Authors
Phani Kumar Polasi
Kalva Sri Rama Krishna
Publication date
28 November 2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9326-0
