Skip to main content
Erschienen in:
Buchtitelbild

2010 | OriginalPaper | Buchkapitel

1. History and Development of Speech Recognition

verfasst von : Sadaoki Furui

Erschienen in: Speech Technology

Verlag: Springer US

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Speech is the primary means of communication between humans. For reasons ranging from technological curiosity about the mechanisms for mechanical realization of human speech capabilities to the desire to automate simple tasks which necessitate human–machine interactions, research in automatic speech recognition by machines has attracted a great deal of attention for five decades.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat ATIS Technical Reports (1995). Proc. ARPA Spoken Language Systems Technology Workshop, Austin, TX, 241–280. ATIS Technical Reports (1995). Proc. ARPA Spoken Language Systems Technology Workshop, Austin, TX, 241–280.
3.
Zurück zum Zitat Beek, B., Neuberg, E., Hodge, D. (1977). An assessment of the technology of automatic speech recognition for military applications. IEEE Trans. Acoust., Speech, Signal Process., 25, 310–322. Beek, B., Neuberg, E., Hodge, D. (1977). An assessment of the technology of automatic speech recognition for military applications. IEEE Trans. Acoust., Speech, Signal Process., 25, 310–322.
4.
Zurück zum Zitat Bridle, J. S., Brown, M. D. (1979). Connected word recognition using whole word templates. In: Proc. Inst. Acoustics Autumn Conf., 25–28. Bridle, J. S., Brown, M. D. (1979). Connected word recognition using whole word templates. In: Proc. Inst. Acoustics Autumn Conf., 25–28.
5.
Zurück zum Zitat Chou, W. (2003). Minimum classification error (MCE) approach in pattern recognition. Chou, W., Juang, B.-H. (eds) Pattern Recognition in Speech and Language Processing. CRC Press, New York, 1–49. Chou, W. (2003). Minimum classification error (MCE) approach in pattern recognition. Chou, W., Juang, B.-H. (eds) Pattern Recognition in Speech and Language Processing. CRC Press, New York, 1–49.
6.
Zurück zum Zitat Chow, Y. L., Dunham, M. O., Kimball, O. A. (1987). BYBLOS, the BBN continuous speech recognition system. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Dallas, TX, 89–92. Chow, Y. L., Dunham, M. O., Kimball, O. A. (1987). BYBLOS, the BBN continuous speech recognition system. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Dallas, TX, 89–92.
7.
Zurück zum Zitat Davis, K. H., Biddulph, R., Balashek, S. (1952). Automatic recognition of spoken digits. J. Acoust. Soc. Am., 24 (6), 637–642. Davis, K. H., Biddulph, R., Balashek, S. (1952). Automatic recognition of spoken digits. J. Acoust. Soc. Am., 24 (6), 637–642.
8.
Zurück zum Zitat Ferguson, J. (ed) (1980). Hidden Markov Models for Speech. IDA, Princeton, NJ. Ferguson, J. (ed) (1980). Hidden Markov Models for Speech. IDA, Princeton, NJ.
9.
Zurück zum Zitat Forgie, J. W., Forgie, C. D. (1959). Results obtained from a vowel recognition computer program. J. Acoust. Soc. Am., 31 (11), 1480–1489. Forgie, J. W., Forgie, C. D. (1959). Results obtained from a vowel recognition computer program. J. Acoust. Soc. Am., 31 (11), 1480–1489.
10.
Zurück zum Zitat Fry, D. B., Denes, P. (1959). Theoretical aspects of mechanical speech recognition. The design and operation of the mechanical speech recognizer at University College London. J. British Inst. Radio Eng., 19 (4), 211–229. Fry, D. B., Denes, P. (1959). Theoretical aspects of mechanical speech recognition. The design and operation of the mechanical speech recognizer at University College London. J. British Inst. Radio Eng., 19 (4), 211–229.
11.
Zurück zum Zitat Furui, S. (1986). Speaker independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans. Acoust., Speech, Signal Process., 34, 52–59. Furui, S. (1986). Speaker independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans. Acoust., Speech, Signal Process., 34, 52–59.
12.
Zurück zum Zitat Furui, S. (2004). Speech-to-text and speech-to-speech summarization of spontaneous speech. IEEE Trans. Speech Audio Process., 12, 401–408. Furui, S. (2004). Speech-to-text and speech-to-speech summarization of spontaneous speech. IEEE Trans. Speech Audio Process., 12, 401–408.
13.
Zurück zum Zitat Furui, S. (2004). Fifty years of progress in speech and speaker recognition. In: Proc. 148th Acoustical Society of America Meeting, San Diego, CA, 2497. Furui, S. (2004). Fifty years of progress in speech and speaker recognition. In: Proc. 148th Acoustical Society of America Meeting, San Diego, CA, 2497.
14.
Zurück zum Zitat Furui, S. (2005). Recent progress in corpus-based spontaneous speech recognition. IEICE Trans. Inf. Syst., E88-D (3), 366–375. Furui, S. (2005). Recent progress in corpus-based spontaneous speech recognition. IEICE Trans. Inf. Syst., E88-D (3), 366–375.
15.
Zurück zum Zitat Gales, M. J. F., Young, S. J. (1993). Parallel model combination for speech recognition in noise. Technical Report, CUED/F-INFENG/TR135. Gales, M. J. F., Young, S. J. (1993). Parallel model combination for speech recognition in noise. Technical Report, CUED/F-INFENG/TR135.
16.
Zurück zum Zitat Itakura, F. (1975). Minimum prediction residual applied to speech recognition. IEEE Trans. Acoust., Speech, Signal Process., 23, 67–72. Itakura, F. (1975). Minimum prediction residual applied to speech recognition. IEEE Trans. Acoust., Speech, Signal Process., 23, 67–72.
17.
Zurück zum Zitat Jelinek, F. (1985). The development of an experimental discrete dictation recognizer. Proc. IEEE, 73 (11), 1616–1624. Jelinek, F. (1985). The development of an experimental discrete dictation recognizer. Proc. IEEE, 73 (11), 1616–1624.
18.
Zurück zum Zitat Jelinek, F., Bahl, L., Mercer, R. (1975). Design of a linguistic statistical decoder for the recognition of continuous speech. IEEE Trans. Inf. Theory, 21, 250–256.MATH Jelinek, F., Bahl, L., Mercer, R. (1975). Design of a linguistic statistical decoder for the recognition of continuous speech. IEEE Trans. Inf. Theory, 21, 250–256.MATH
19.
Zurück zum Zitat Juang, B. H., Furui, S. (2000). Automatic speech recognition and understanding: A first step toward natural human-machine communication. Proc. IEEE, 88 (8), 1142–1165. Juang, B. H., Furui, S. (2000). Automatic speech recognition and understanding: A first step toward natural human-machine communication. Proc. IEEE, 88 (8), 1142–1165.
20.
Zurück zum Zitat Juang, B. H., Rabiner, L. R. (2005). Automatic speech recognition: History. Brown, K. (ed) Encyclopedia of Language and Linguistics, Second Edition, Oxford: Elsevier, New York, 11, 806–819. Juang, B. H., Rabiner, L. R. (2005). Automatic speech recognition: History. Brown, K. (ed) Encyclopedia of Language and Linguistics, Second Edition, Oxford: Elsevier, New York, 11, 806–819.
21.
Zurück zum Zitat Junqua, J. C., Haton, J. P. (1996). Robustness in Automatic Speech Recognition. Kluwer, Boston. Junqua, J. C., Haton, J. P. (1996). Robustness in Automatic Speech Recognition. Kluwer, Boston.
22.
Zurück zum Zitat Katagiri, S. (2003). Speech pattern recognition using neural networks. Chou, W., Juang, B. H. (eds) Pattern Recognition in Speech and Language Processing. CRC Press, New York, 115–147. Katagiri, S. (2003). Speech pattern recognition using neural networks. Chou, W., Juang, B. H. (eds) Pattern Recognition in Speech and Language Processing. CRC Press, New York, 115–147.
23.
Zurück zum Zitat Kawahara, T., Lee, C. H., Juang, B. H. (1998). Key-phrase detection and verification for flexible speech understanding. IEEE Trans. Speech Audio Process, 6, 558–568. Kawahara, T., Lee, C. H., Juang, B. H. (1998). Key-phrase detection and verification for flexible speech understanding. IEEE Trans. Speech Audio Process, 6, 558–568.
24.
Zurück zum Zitat Klatt, D. (1977). Review of the ARPA speech understanding project. J. Acoust. Soc. Am., 62 (6), 1324–1366. Klatt, D. (1977). Review of the ARPA speech understanding project. J. Acoust. Soc. Am., 62 (6), 1324–1366.
25.
Zurück zum Zitat Koo, M. W., Lee, C. H., Juang, B. H. (2001). Speech recognition and utterance verification based on a generalized confidence score. IEEE Trans. Speech Audio Process, 9, 821–832. Koo, M. W., Lee, C. H., Juang, B. H. (2001). Speech recognition and utterance verification based on a generalized confidence score. IEEE Trans. Speech Audio Process, 9, 821–832.
26.
Zurück zum Zitat Lee, C. H., Giachin, E., Rabiner, L. R., Pieraccini, R., Rosenberg, A. E. (1990). Acoustic modeling for large vocabulary speech recognition. Comput. Speech Lang., 4, 127–165. Lee, C. H., Giachin, E., Rabiner, L. R., Pieraccini, R., Rosenberg, A. E. (1990). Acoustic modeling for large vocabulary speech recognition. Comput. Speech Lang., 4, 127–165.
27.
Zurück zum Zitat Lee, C. H., Rabiner, L. R. (1989). A frame synchronous network search algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Process, 37, 1649–1658. Lee, C. H., Rabiner, L. R. (1989). A frame synchronous network search algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Process, 37, 1649–1658.
28.
Zurück zum Zitat Lee, K. F., Hon, H., Reddy, R. (1990). An overview of the SPHINX speech recognition system. IEEE Trans. Acoust., Speech, Signal Process, 38, 600–610. Lee, K. F., Hon, H., Reddy, R. (1990). An overview of the SPHINX speech recognition system. IEEE Trans. Acoust., Speech, Signal Process, 38, 600–610.
29.
Zurück zum Zitat Leggetter, C. J., Woodland, P. C. (1995). Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Comput. Speech Lang., 9, 171–185. Leggetter, C. J., Woodland, P. C. (1995). Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Comput. Speech Lang., 9, 171–185.
30.
Zurück zum Zitat Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Mag., 4 (2), 4–22. Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Mag., 4 (2), 4–22.
31.
Zurück zum Zitat Lippmann, R. P. (1997). Speech recognition by machines and humans. Speech Communication, 22, 1–15. Lippmann, R. P. (1997). Speech recognition by machines and humans. Speech Communication, 22, 1–15.
32.
Zurück zum Zitat Liu, Y., Shriberg, E., Stolcke, A., Peskin, B., Ang, J., Hillard, D., Ostendorf, M., Tomalin, M., Woodland, P. C., Harper, M. (2005). Structural metadata research in the EARS program. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Montreal, Canada, V, 957–960. Liu, Y., Shriberg, E., Stolcke, A., Peskin, B., Ang, J., Hillard, D., Ostendorf, M., Tomalin, M., Woodland, P. C., Harper, M. (2005). Structural metadata research in the EARS program. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Montreal, Canada, V, 957–960.
33.
Zurück zum Zitat Lowerre, B. (1980). The HARPY speech understanding system. Lea, W (ed) Trends in Speech Recognition. Prentice Hall, NJ, 576–586. Lowerre, B. (1980). The HARPY speech understanding system. Lea, W (ed) Trends in Speech Recognition. Prentice Hall, NJ, 576–586.
34.
Zurück zum Zitat Martin, T. B., Nelson, A. L., Zadell, H. J. (1964). Speech recognition by feature abstraction techniques. Technical Report AL-TDR-64-176, Air Force Avionics Lab. Martin, T. B., Nelson, A. L., Zadell, H. J. (1964). Speech recognition by feature abstraction techniques. Technical Report AL-TDR-64-176, Air Force Avionics Lab.
35.
Zurück zum Zitat Moore, R. C. (1997). Using natural-language knowledge sources in speech recognition. Ponting, K. (ed) Computational Models of Speech Pattern Processing. Springer, Berlin, 304–327. Moore, R. C. (1997). Using natural-language knowledge sources in speech recognition. Ponting, K. (ed) Computational Models of Speech Pattern Processing. Springer, Berlin, 304–327.
36.
Zurück zum Zitat Myers, C. S., Rabiner, L. R. (1981). A level building dynamic time warping algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Process., 29, 284–297.MATH Myers, C. S., Rabiner, L. R. (1981). A level building dynamic time warping algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Process., 29, 284–297.MATH
37.
Zurück zum Zitat Nagata, K., Kato, Y., Chiba, S. (1963). Spoken digit recognizer for Japanese language. NEC Res. Develop., 6. Nagata, K., Kato, Y., Chiba, S. (1963). Spoken digit recognizer for Japanese language. NEC Res. Develop., 6.
38.
Zurück zum Zitat Olson, H. F., Belar, H. (1956). Phonetic typewriter. J. Acoust. Soc. Am., 28 (6), 1072–1081. Olson, H. F., Belar, H. (1956). Phonetic typewriter. J. Acoust. Soc. Am., 28 (6), 1072–1081.
39.
Zurück zum Zitat Paul, D. B. (1989). The Lincoln robust continuous speech recognizer. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Glasgow, Scotland, 449–452. Paul, D. B. (1989). The Lincoln robust continuous speech recognizer. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Glasgow, Scotland, 449–452.
40.
Zurück zum Zitat Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77 (2), 257–286. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77 (2), 257–286.
41.
Zurück zum Zitat Rabiner, L. R., Juang, B. H. (1993). Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliff, NJ. Rabiner, L. R., Juang, B. H. (1993). Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliff, NJ.
42.
Zurück zum Zitat Rabiner, L. R., Levinson, S. E., Rosenberg, A. E. (1979). Speaker independent recognition of isolated words using clustering techniques. IEEE Trans. Acoust., Speech, Signal Process., 27, 336–349.MATH Rabiner, L. R., Levinson, S. E., Rosenberg, A. E. (1979). Speaker independent recognition of isolated words using clustering techniques. IEEE Trans. Acoust., Speech, Signal Process., 27, 336–349.MATH
43.
Zurück zum Zitat Reddy, D. R. (1966). An approach to computer speech recognition by direct analysis of the speech wave. Technical Report No. C549, Computer Science Department, Stanford University, Stanford. Reddy, D. R. (1966). An approach to computer speech recognition by direct analysis of the speech wave. Technical Report No. C549, Computer Science Department, Stanford University, Stanford.
44.
Zurück zum Zitat Sakai, T., Doshita, S. (1962). The phonetic typewriter, information processing. In: Proc. IFIP Congress, Munich. Sakai, T., Doshita, S. (1962). The phonetic typewriter, information processing. In: Proc. IFIP Congress, Munich.
45.
Zurück zum Zitat Sakoe, H. (1979). Two level DP matching – a dynamic programming based pattern matching algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Process., 27, 588–595. Sakoe, H. (1979). Two level DP matching – a dynamic programming based pattern matching algorithm for connected word recognition. IEEE Trans. Acoust., Speech, Signal Process., 27, 588–595.
46.
Zurück zum Zitat Sakoe, H., Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust., Speech, Signal Process., 26, 43–49.MATH Sakoe, H., Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust., Speech, Signal Process., 26, 43–49.MATH
47.
Zurück zum Zitat Shinoda, K., Lee, C. H. (2001). A structural Bayes approach to speaker adaptation. IEEE Trans. Speech Audio Process., 9, 276–287. Shinoda, K., Lee, C. H. (2001). A structural Bayes approach to speaker adaptation. IEEE Trans. Speech Audio Process., 9, 276–287.
48.
Zurück zum Zitat Soltau, H., Kingsbury, B., Mangu, L., Povey, D., Saon, G., Zweig, G. (2005). The IBM 2004 conversational telephone system for rich transcription. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Montreal, Canada, I, 205–208. Soltau, H., Kingsbury, B., Mangu, L., Povey, D., Saon, G., Zweig, G. (2005). The IBM 2004 conversational telephone system for rich transcription. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Montreal, Canada, I, 205–208.
49.
Zurück zum Zitat Suzuki, J., Nakata, K. (1961). Recognition of Japanese vowels – preliminary to the recognition of speech. J. Radio Res. Lab., 37 (8), 193–212. Suzuki, J., Nakata, K. (1961). Recognition of Japanese vowels – preliminary to the recognition of speech. J. Radio Res. Lab., 37 (8), 193–212.
50.
Zurück zum Zitat Tappert, C., Dixon, N. R., Rabinowitz, A. S., Chapman, W. D. (1971). Automatic recognition of continuous speech utilizing dynamic segmentation, dual classification, sequential decoding and error recovery. Rome Air Dev. Cen, Rome, NY, Technical Report TR 71–146. Tappert, C., Dixon, N. R., Rabinowitz, A. S., Chapman, W. D. (1971). Automatic recognition of continuous speech utilizing dynamic segmentation, dual classification, sequential decoding and error recovery. Rome Air Dev. Cen, Rome, NY, Technical Report TR 71–146.
51.
Zurück zum Zitat Varga, P., Moore, R. K. (1990). Hidden Markov model decomposition of speech and noise. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Albuquerque, New Mexico, 845–848. Varga, P., Moore, R. K. (1990). Hidden Markov model decomposition of speech and noise. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Albuquerque, New Mexico, 845–848.
52.
Zurück zum Zitat Velichko, V. M., Zagoruyko, N. G. (1970). Automatic recognition of 200 words. Int. J. Man-Machine Studies, 2, 223–234. Velichko, V. M., Zagoruyko, N. G. (1970). Automatic recognition of 200 words. Int. J. Man-Machine Studies, 2, 223–234.
53.
Zurück zum Zitat Vintsyuk, T. K. (1968). Speech discrimination by dynamic programming. Kibernetika, 4 (2), 81–88.MathSciNet Vintsyuk, T. K. (1968). Speech discrimination by dynamic programming. Kibernetika, 4 (2), 81–88.MathSciNet
54.
Zurück zum Zitat Viterbi, J. (1967). Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Trans. Inf. Theory, 13, 260–269.MATH Viterbi, J. (1967). Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Trans. Inf. Theory, 13, 260–269.MATH
55.
Zurück zum Zitat Waibel, A., Hanazawa, T., Hinton, G., Shiano, K., Lang, K. (1989). Phoneme recognition using time-delay neural networks. IEEE Trans. Acoust., Speech, Signal Process., 37, 393–404. Waibel, A., Hanazawa, T., Hinton, G., Shiano, K., Lang, K. (1989). Phoneme recognition using time-delay neural networks. IEEE Trans. Acoust., Speech, Signal Process., 37, 393–404.
56.
Zurück zum Zitat Weintraub, M., Murveit, H., Cohen, M., Price, P., Bernstein, J., Bell, G. (1989). Linguistic constraints in hidden Markov model based speech recognition. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Glasgow, Scotland, 699–702. Weintraub, M., Murveit, H., Cohen, M., Price, P., Bernstein, J., Bell, G. (1989). Linguistic constraints in hidden Markov model based speech recognition. In: Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing, Glasgow, Scotland, 699–702.
57.
Zurück zum Zitat Zue, V., Glass, J., Phillips, M., Seneff, S. (1989). The MIT summit speech recognition system, a progress report. In: Proc. DARPA Speech and Natural Language Workshop, Philadelphia, PA, 179–189. Zue, V., Glass, J., Phillips, M., Seneff, S. (1989). The MIT summit speech recognition system, a progress report. In: Proc. DARPA Speech and Natural Language Workshop, Philadelphia, PA, 179–189.
58.
Zurück zum Zitat Zweig, G. (1998). Speech recognition with dynamic Bayesian networks. Ph.D. Thesis, University of California, Berkeley. Zweig, G. (1998). Speech recognition with dynamic Bayesian networks. Ph.D. Thesis, University of California, Berkeley.
Metadaten
Titel
History and Development of Speech Recognition
verfasst von
Sadaoki Furui
Copyright-Jahr
2010
Verlag
Springer US
DOI
https://doi.org/10.1007/978-0-387-73819-2_1

Neuer Inhalt