2008 | OriginalPaper | Book chapter

34. The Business of Speech Technologies

Written by: Jay Wilpon, Mazin E. Gilbert, Jordan Cohen, Ph.D

Published in: Springer Handbook of Speech Processing

Publisher: Springer Berlin Heidelberg

Abstract

With the fast pace of development in communications networks and devices, immediate and easy access to information and services is now the expected norm. Several critical technologies have entered the marketplace as key enablers to help make this a reality. In particular, speech technologies, such as speech recognition and natural language understanding, have permanently changed how businesses provide services to consumers. In 30 short years, speech has progressed from an idea in research laboratories across the world to a multibillion-dollar industry of software, hardware, service hosting, and professional services. Speech is now almost ubiquitous in cell phones. Yet the industry is still very much in its infancy, focusing on simple, low-hanging-fruit applications where the current state of the technology actually fits a specific market need, such as voice-enabling call center services or voice dialing on a cell phone.
With broadband access to networks (and therefore data) anywhere, anytime, and on any device almost a reality, speech technologies will continue to be essential for unlocking the potential that such access provides. Realizing this potential, however, requires advances in basic speech technologies beyond the current state of the art. In this chapter, we review the business of speech technologies and its development since the 1980s. How did it start? What were the key inventions that got us where we are, and which service innovations supported the industry over the past few decades? What are the future trends in how speech technologies will be used? And what key technical challenges must researchers address and resolve for the industry to move forward and meet this vision of the future? This chapter is by no means meant to be exhaustive, but it gives the reader an understanding of speech technologies, the speech business, and the areas where continued technical invention and innovation will be needed before speech technologies can be used ubiquitously in the marketplace.


Metadata
Title
The Business of Speech Technologies
Written by
Jay Wilpon
Mazin E. Gilbert
Jordan Cohen, Ph.D
Copyright year
2008
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-540-49127-9_34
