
2017 | OriginalPaper | Book chapter

Enhanced Automatic Speech Recognition with Non-acoustic Parameters

Authors: N. S. Sreekanth, N. K. Narayanan

Published in: Proceedings of the International Conference on Signal, Networks, Computing, and Systems

Publisher: Springer India


Abstract

This paper discusses a novel method for improving the accuracy of automatic speech recognition (ASR) systems by adding non-acoustic parameters. Gestural features, which are commonly co-expressive with speech, are considered for improving ASR accuracy in noisy environments. Both dynamic and static gestures are integrated with the speech recognition system and tested under various environmental conditions, i.e., noise levels. The accuracy of a continuous speech recognition system and of an isolated word recognition system is tested with and without gestures under various noise conditions. The addition of visual features keeps recognition accuracy stable across different levels of environmental noise in the acoustic signal.
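The abstract does not state how the gesture and speech channels are combined. As an illustration only, the sketch below assumes weighted log-linear late fusion, a common choice in audio-visual recognition: each channel produces per-word scores, and a weight decides how much the acoustic channel is trusted. The function names, the three-word vocabulary, the scores, and the weight values are all hypothetical and are not taken from the paper.

```python
import math


def fuse_scores(speech_scores, gesture_scores, speech_weight):
    """Log-linear late fusion of two per-word probability dicts.

    speech_weight is in [0, 1]: the trust placed in the acoustic channel.
    (Hypothetical parameter; one would lower it as the noise level rises.)
    """
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must be in [0, 1]")
    eps = 1e-12  # floor that avoids log(0) for words missing from a channel
    words = set(speech_scores) | set(gesture_scores)
    log_fused = {}
    for w in words:
        log_speech = math.log(max(speech_scores.get(w, 0.0), eps))
        log_gesture = math.log(max(gesture_scores.get(w, 0.0), eps))
        log_fused[w] = (speech_weight * log_speech
                        + (1.0 - speech_weight) * log_gesture)
    # Normalise the fused log scores back to probabilities.
    peak = max(log_fused.values())
    unnormalised = {w: math.exp(v - peak) for w, v in log_fused.items()}
    total = sum(unnormalised.values())
    return {w: v / total for w, v in unnormalised.items()}


def recognise(speech_scores, gesture_scores, speech_weight):
    """Return the word with the highest fused score."""
    fused = fuse_scores(speech_scores, gesture_scores, speech_weight)
    return max(fused, key=fused.get)


# Made-up scores: noise has blurred the acoustic channel, the gesture is clear.
noisy_speech = {"left": 0.40, "right": 0.45, "stop": 0.15}
gesture = {"left": 0.80, "right": 0.10, "stop": 0.10}

speech_only = recognise(noisy_speech, gesture, speech_weight=1.0)
with_gesture = recognise(noisy_speech, gesture, speech_weight=0.5)
```

With these made-up numbers, the acoustic channel alone picks "right", while equal weighting of both channels picks "left". This is the kind of correction a gesture channel could supply when the audio is degraded.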


Metadata
Title
Enhanced Automatic Speech Recognition with Non-acoustic Parameters
Authors
N. S. Sreekanth
N. K. Narayanan
Copyright year
2017
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-3592-7_10
