Published in: International Journal of Speech Technology 3/2018

12 December 2017

Language identification using phase information

Abstract

This work investigates the importance of phase in language identification (LID). We propose three phase-based features for the language recognition task. An auto-regressive (AR) model with scale factor error augmentation is used to obtain a better representation of the phase-based features. We develop three group delay based systems: a normal group delay system, an AR-model group delay system, and an AR-model group delay system with scale factor augmentation. Because mel-frequency cepstral coefficients (MFCCs) are extracted from the magnitude of the Fourier transform, we combine the MFCC-based system with our phase-based systems to exploit the complete information contained in a speech signal. Our experiments use the IITKGP-MLILSC speech database and the OGI Multi-language Telephone Speech (OGI-MLTS) corpus. Gaussian mixture models are used to build the language models. The experimental results show that the LID accuracy obtained with the proposed phase-based features is comparable to that of MFCC features. We also observe some improvement in LID accuracy when the proposed phase-based systems are combined with the state-of-the-art MFCC-based system.
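
To make the notion of a group delay feature concrete, the following is a minimal sketch of the textbook group delay of a single speech frame, computed directly from the Fourier transform without phase unwrapping. It is not the feature pipeline of the paper, whose details (windowing, AR modelling, scale factor augmentation, cepstral post-processing) are not given in this abstract. The function names `dft` and `group_delay` and the parameter `eps` are assumptions chosen for illustration, and the naive transform stands in for an FFT routine.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) discrete Fourier transform; adequate for one short frame."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def group_delay(frame, eps=1e-12):
    """Group delay of one frame, in samples, at each DFT bin.

    Uses tau(k) = (X_R * Y_R + X_I * Y_I) / |X|^2, where X is the DFT of
    x[n] and Y is the DFT of n * x[n]. `eps` (an illustrative choice) guards
    against division by zero at spectral nulls.
    """
    x_spec = dft(frame)
    y_spec = dft([t * v for t, v in enumerate(frame)])
    delays = []
    for x, y in zip(x_spec, y_spec):
        power = x.real ** 2 + x.imag ** 2
        delays.append((x.real * y.real + x.imag * y.imag) / max(power, eps))
    return delays


# Sanity check: a unit impulse delayed by 3 samples has group delay 3 everywhere.
frame = [0.0] * 16
frame[3] = 1.0
delays = group_delay(frame)
```

For real speech the denominator |X|^2 becomes very small near spectral nulls, which makes the raw group delay spiky. This is one common motivation for smoothed or model-based variants such as the AR-model group delay named in the abstract.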


Metadata
Title
Language identification using phase information
Publication date
12 December 2017
Published in
International Journal of Speech Technology, Issue 3/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-017-9482-5
