2016 | OriginalPaper | Chapter

6. Where Speech Recognition Is Going: Conclusion and Future Scope

Author: Swati Johar

Published in: Emotion, Affect and Personality in Speech

Publisher: Springer International Publishing

Abstract

Today, voice and natural language processing are at the forefront of human-machine interaction. The chapter highlights the tremendous progress in machine learning, statistical data mining and pattern recognition approaches that can help make speech interfaces more versatile and pervasive. It also cautions that the growing demands placed on speech interfaces bring impediments that may stand in the way of successfully implementing acoustically robust natural interfaces. Finally, the chapter outlines the technical advances and research efforts still needed for high-performance real-time speech recognition that will change the way humans interact with their computing devices.

Title: Where Speech Recognition Is Going: Conclusion and Future Scope
Author: Swati Johar
Publisher: Springer International Publishing
Book: Emotion, Affect and Personality in Speech
Print ISBN: 978-3-319-28045-5

Electronic ISBN: 978-3-319-28047-9

Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-28047-9_6