Published in: International Journal of Speech Technology 3/2018

20.11.2017

Neural network and GMM based feature mappings for consonant–vowel recognition in emotional environment

Authors: Jainath Yadav, K. Sreenivasa Rao



Abstract

In this work, we propose a feature transformation framework based on mapping functions for developing a consonant–vowel (CV) recognition system in the emotional environment. Expressing emotions during conversation is an effective way for humans to convey a message, but the acoustic characteristics of CV units differ from one emotion to another, so the performance of existing CV recognition systems degrades in emotional environments. We therefore propose mapping functions based on artificial neural network (ANN) and GMM models to increase the accuracy of CV recognition in the emotional environment. The proposed mapping functions transform emotional features into neutral features at the CV and phone levels, minimizing the mismatch between training and testing environments. Vowel onset and offset points are used to identify the vowel, consonant, and transition segments; the transition segment is taken as the initial 15% of the speech samples between the vowel onset and offset points. With feature mapping at the phone level, the average performance of the CV recognition system increases significantly in three emotional environments (anger, happiness, and sadness).
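The segmentation rule in the abstract — treating the initial 15% of samples between the vowel onset point (VOP) and vowel offset point as the transition segment, and the remainder as the steady vowel region — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, interface, and the `transition_fraction` parameter are hypothetical, and only the 15% rule itself comes from the abstract.

```python
def split_cv_region(samples, vop, vep, transition_fraction=0.15):
    """Split the region between vowel onset (vop) and offset (vep) sample
    indices into a transition segment and a steady-vowel segment.

    The transition segment is the initial `transition_fraction` of the
    samples between the onset and offset points (15% in the paper).
    Returns (transition, steady_vowel).
    """
    if not 0 <= vop < vep <= len(samples):
        raise ValueError("onset/offset points out of range")
    region = samples[vop:vep]
    # Number of samples assigned to the transition segment.
    split = int(round(transition_fraction * len(region)))
    return region[:split], region[split:]


# Toy usage: a 40-sample vowel region yields a 6-sample transition segment.
signal = list(range(100))
transition, steady = split_cv_region(signal, vop=20, vep=60)
print(len(transition), len(steady))  # 6 34
```

In practice the onset and offset points would come from a VOP/offset detector (as in the epoch- and spectral-energy-based methods cited by the paper), and the two segments would then be parameterized separately for recognition.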


Metadata
Title: Neural network and GMM based feature mappings for consonant–vowel recognition in emotional environment
Authors: Jainath Yadav, K. Sreenivasa Rao
Publication date: 20.11.2017
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 3/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-017-9478-1
