International Journal of Speech Technology

Published: 12 December 2017

Language identification using phase information

Published in: International Journal of Speech Technology | Issue 3/2018

Abstract

The present work investigates the importance of phase in language identification (LID). We propose three phase-based features for the language recognition task. An auto-regressive model with scale factor error augmentation is used to represent the phase-based features better. We develop three group delay based systems: a normal group delay based system, an auto-regressive model group delay based system, and an auto-regressive group delay based system with scale factor augmentation. Because mel-frequency cepstral coefficients (MFCCs) are extracted from the magnitude of the Fourier transform, we combine the MFCC-based system with our phase-based systems to exploit the complete information contained in a speech signal. The experiments use the IITKGP-MLILSC speech database and the OGI Multi-language Telephone Speech (OGI-MLTS) corpus. Gaussian mixture models are used to build the language models. The experimental results show that the LID accuracy obtained with the proposed phase-based features is comparable to that obtained with MFCC features. Combining the proposed phase-based systems with the state-of-the-art MFCC-based system also yields some improvement in LID accuracy.
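To make the notion of a group delay feature concrete, the following is a minimal sketch of how a group delay spectrum can be computed directly from a signal and from an auto-regressive (all-pole) fit. It is not the authors' implementation: the abstract does not specify framing, model order, post-processing, or how the scale factor augmentation is done, so none of that is attempted here. The function names, the FFT size, the AR order, and the regularizer `eps` are all assumptions chosen for illustration. The sketch relies on the standard identity that the group delay equals (X_R Y_R + X_I Y_I) / |X|^2, where X is the Fourier transform of x[n] and Y is the Fourier transform of n·x[n], which avoids phase unwrapping.

```python
import numpy as np


def group_delay(x, n_fft=512, eps=1e-8):
    """Group delay spectrum of a sequence, without phase unwrapping.

    Uses tau(w) = (X_R*Y_R + X_I*Y_I) / |X(w)|^2, where X = FFT{x[n]}
    and Y = FFT{n * x[n]}. `eps` (an assumption) guards against division
    by zero at spectral nulls.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)


def ar_coefficients(frame, order=12):
    """Autocorrelation-method AR fit; returns [1, a_1, ..., a_p] of A(z)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    # Toeplitz normal equations R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))


def ar_group_delay(frame, order=12, n_fft=512):
    """Group delay of the all-pole model H(z) = 1 / A(z).

    The phase of H is the negative of the phase of A, so the group delay
    of H is the negative of the group delay of the coefficient sequence.
    """
    return -group_delay(ar_coefficients(frame, order), n_fft)


if __name__ == "__main__":
    # A unit impulse delayed by 5 samples has a flat group delay of 5.
    impulse = np.zeros(64)
    impulse[5] = 1.0
    print(group_delay(impulse)[:4])
```

In a full system, per-frame vectors like these would typically be turned into compact features and modeled per language, for example with Gaussian mixture models as the abstract describes; that stage is outside the scope of this sketch.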

Title: Language identification using phase information
Publication date: 12 December 2017
Published in: International Journal of Speech Technology / Issue 3/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-017-9482-5
