2016 | OriginalPaper | Chapter

6. Where Speech Recognition Is Going: Conclusion and Future Scope

Abstract

Today, voice and natural language processing are at the forefront of human-machine interaction. The chapter emphasizes the tremendous progress in machine learning, statistical data mining, and pattern recognition approaches that can make speech interfaces more versatile and pervasive. It also warns of the impediments that the growing requirements placed on speech interfaces may pose to the successful implementation of acoustically robust natural interfaces. Finally, the chapter highlights the technical advances and research efforts still needed to achieve high-performance, real-time speech recognition that will fundamentally change the way humans interact with their computing devices.

Metadata
Title: Where Speech Recognition Is Going: Conclusion and Future Scope
Author: Swati Johar
Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-28047-9_6