Published in: International Journal of Speech Technology 4/2013

01.12.2013

Identification of Indian languages using multi-level spectral and prosodic features

Authors: V. Ramu Reddy, Sudhamay Maity, K. Sreenivasa Rao


Abstract

In this paper, spectral and prosodic features extracted at different levels are explored for analyzing the language-specific information present in speech. Spectral features extracted from 20 ms frames (block processing), individual pitch cycles (pitch synchronous analysis) and glottal closure regions are used for discriminating the languages. Prosodic features extracted at the syllable, tri-syllable and multi-word (phrase) levels are proposed in addition to the spectral features for capturing language-specific information. Language-specific prosody is represented by intonation, rhythm and stress features at the syllable and tri-syllable (word) levels, whereas temporal variations in fundamental frequency (F0 contour), durations of syllables and temporal variations in intensity (energy contour) represent the prosody at the multi-word (phrase) level. For analyzing the language-specific information in the proposed features, the Indian language speech database IITKGP-MLILSC is used. Gaussian mixture models are used to capture the language-specific information from the proposed features. The evaluation results indicate that language identification performance improves when the features are combined. The performance of the proposed features is also analyzed on the standard Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) database.
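The sketch below illustrates the block-processing branch of such a system: spectral (MFCC) features computed from 20 ms frames, one Gaussian mixture model trained per language, and identification by maximum average log-likelihood. It is a minimal illustration only; the librosa and scikit-learn calls, the 8 kHz sampling rate and the 64-component mixtures are assumptions for the example, not the authors' toolchain or configuration.

# Minimal sketch: frame-level spectral features + per-language GMMs.
# librosa / scikit-learn are illustrative choices, not the paper's implementation.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def spectral_features(wav_path, sr=8000, frame_ms=20, shift_ms=10, n_mfcc=13):
    """MFCCs from 20 ms frames (block processing), shifted every 10 ms."""
    y, sr = librosa.load(wav_path, sr=sr)
    n_fft = int(sr * frame_ms / 1000)
    hop = int(sr * shift_ms / 1000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop)
    return mfcc.T  # shape: (frames, n_mfcc)

def train_language_models(train_files, n_components=64):
    """Fit one GMM per language on pooled frame-level features."""
    models = {}
    for lang, files in train_files.items():
        X = np.vstack([spectral_features(f) for f in files])
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type='diag',
                              max_iter=200, random_state=0)
        gmm.fit(X)
        models[lang] = gmm
    return models

def identify_language(wav_path, models):
    """Pick the language whose GMM gives the highest average log-likelihood."""
    X = spectral_features(wav_path)
    scores = {lang: gmm.score(X) for lang, gmm in models.items()}
    return max(scores, key=scores.get)

Pitch synchronous, glottal-closure-region and syllable/tri-syllable/phrase-level prosodic features would plug into the same scheme by replacing or augmenting spectral_features() with their own extractors; the per-language GMM scoring step is unchanged.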


Metadata
Title
Identification of Indian languages using multi-level spectral and prosodic features
Authors
V. Ramu Reddy
Sudhamay Maity
K. Sreenivasa Rao
Publication date
01.12.2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-013-9198-0
