Published in: International Journal of Speech Technology 3/2012

01.09.2012

Neural network based feature transformation for emotion independent speaker identification

Authors: Sreenivasa Rao Krothapalli, Jaynath Yadav, Sourjya Sarkar, Shashidhar G. Koolagudi, Anil Kumar Vuppala


Abstract

In this paper, we propose a neural network based feature transformation framework for developing an emotion independent speaker identification system. Most present speaker recognition systems do not perform well in emotional environments, yet in real life humans express emotions extensively during conversation to convey their messages effectively. We therefore propose a speaker recognition system that is robust to variations in the emotional moods of speakers. Neural network models are explored to transform the speaker specific spectral features from a given emotion to neutral. Eight emotions are considered: anger, sadness, disgust, fear, happiness, neutral, sarcasm and surprise. Emotional speech databases developed in Hindi, Telugu and German are used to analyze the effect of the proposed feature transformation on the performance of the speaker identification system. Spectral features are represented by mel-frequency cepstral coefficients (MFCCs), and speaker models are developed using Gaussian mixture models (GMMs). The performance of the speaker identification system is analyzed with various feature mapping techniques. The results demonstrate that the proposed neural network based feature transformation improves speaker identification performance by 20 %. Feature transformation at the syllable level performs better than transformation at the sentence level.
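
The following is a minimal sketch of the pipeline described in the abstract: a feedforward network maps emotional MFCC frames towards their neutral counterparts, and identification is then carried out by scoring the neutralized frames against per-speaker GMMs. The network topology, MFCC dimensionality, GMM size, and all data below are assumptions for illustration only (the abstract does not specify them); the synthetic arrays stand in for frame-aligned emotional/neutral MFCC pairs and per-speaker neutral training frames.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
DIM = 13          # assumed MFCC dimension
N_SPEAKERS = 4    # assumed number of enrolled speakers

# --- 1. Train the emotion-to-neutral feature mapping -----------------------
# X_emo / X_neu: frame-aligned MFCCs of the same utterance spoken in an
# emotional style and in neutral style (placeholder synthetic pairing here).
X_emo = rng.normal(size=(2000, DIM))
X_neu = 0.8 * X_emo + rng.normal(scale=0.1, size=(2000, DIM))

mapper = MLPRegressor(hidden_layer_sizes=(50,), max_iter=1000, random_state=0)
mapper.fit(X_emo, X_neu)   # learns the emotional -> neutral spectral mapping

# --- 2. Train one GMM speaker model on neutral MFCCs -----------------------
speaker_gmms = {}
for spk in range(N_SPEAKERS):
    frames = rng.normal(loc=spk, size=(1500, DIM))  # placeholder neutral frames
    gmm = GaussianMixture(n_components=16, covariance_type="diag",
                          max_iter=200, random_state=0)
    gmm.fit(frames)
    speaker_gmms[spk] = gmm

# --- 3. Identification: neutralize the test features, then score -----------
def identify(test_emotional_mfccs):
    """Map emotional MFCC frames towards neutral, then return the speaker
    whose GMM gives the highest average log-likelihood."""
    neutralized = mapper.predict(test_emotional_mfccs)
    scores = {spk: gmm.score(neutralized) for spk, gmm in speaker_gmms.items()}
    return max(scores, key=scores.get)

test_frames = rng.normal(loc=2, size=(300, DIM))
print("Identified speaker:", identify(test_frames))
```

In this sketch the transformation is applied frame by frame over a whole utterance; the syllable-level variant reported in the paper would apply the same mapping on syllable-sized segments (obtained from a separate segmentation step) before scoring.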


Metadata
Title
Neural network based feature transformation for emotion independent speaker identification
Authors
Sreenivasa Rao Krothapalli
Jaynath Yadav
Sourjya Sarkar
Shashidhar G. Koolagudi
Anil Kumar Vuppala
Publication date
01.09.2012
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9148-2
