2008 | Original Paper | Book Chapter

34. The Business of Speech Technologies

Authors: Jay Wilpon, Mazin E. Gilbert, Jordan Cohen, Ph.D

Published in: Springer Handbook of Speech Processing

Publisher: Springer Berlin Heidelberg

Abstract

With the rapid development of communications networks and devices, immediate and easy access to information and services is now the expected norm. Several critical technologies have entered the marketplace as key enablers of this shift. In particular, speech technologies such as speech recognition and natural language understanding have permanently changed how businesses provide services to consumers. In 30 short years, speech has progressed from an idea in research laboratories across the world to a multibillion-dollar industry of software, hardware, service hosting, and professional services. Speech is now almost ubiquitous in cell phones. Yet the industry is still very much in its infancy, focused on simple, low-hanging-fruit applications where the current state of the technology actually fits a specific market need, such as voice-enabling call center services or voice dialing on a cell phone.

With broadband access to networks (and therefore data) anywhere, anytime, and on any device almost a reality, speech technologies will continue to be essential for unlocking the potential that such access provides. Realizing this potential, however, requires advances in basic speech technologies beyond the current state of the art. In this chapter, we review the business of speech technologies and its development since the 1980s. How did it start? What were the key inventions that got us where we are, and the service innovations that supported the industry over the past few decades? What are the future trends in how speech technologies will be used? And what key technical challenges must researchers address and resolve for the industry to move forward and meet this vision of the future? This chapter is by no means meant to be exhaustive, but it gives the reader an understanding of speech technologies, the speech business, and the areas where continued technical invention and innovation will be needed before speech technologies see ubiquitous use in the marketplace.

Title: The Business of Speech Technologies
Authors: Jay Wilpon, Mazin E. Gilbert, Jordan Cohen, Ph.D
Publisher: Springer Berlin Heidelberg
Book: Springer Handbook of Speech Processing
Print ISBN: 978-3-540-49125-5
Electronic ISBN: 978-3-540-49127-9
Copyright year: 2008
DOI: https://doi.org/10.1007/978-3-540-49127-9_34
