2015 | OriginalPaper | Book chapter

5. Ensemble Learning Approaches in Speech Recognition

Authors: Yunxin Zhao, Jian Xue, Xin Chen

Published in: Speech and Audio Processing for Coding, Enhancement and Recognition

Publisher: Springer New York

Abstract

This chapter gives an overview of the ensemble learning efforts that have emerged in automatic speech recognition in recent years. It describes approaches that build on different machine learning techniques and target various levels and components of speech recognition, and it discusses their effectiveness in terms of the direct performance measure of word error rate and the indirect measures of classification margin, diversity, and bias and variance. It also discusses methods for reducing the storage and computation costs of ensemble models in practical deployments of speech recognition systems. Ensemble learning for speech recognition has been largely fruitful, and it is expected to continue to progress along with advances in machine learning, speech and language modeling, and computing technology.
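
As a point of reference for the word error rate named in the abstract, the following is a minimal sketch of how that measure is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions and are not taken from the chapter.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For instance, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving 2/6. Because insertions are counted, the value can exceed 1 when the hypothesis is much longer than the reference.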

Title: Ensemble Learning Approaches in Speech Recognition
Authors: Yunxin Zhao, Jian Xue, Xin Chen
Publisher: Springer New York
Book: Speech and Audio Processing for Coding, Enhancement and Recognition
Print ISBN: 978-1-4939-1455-5
Electronic ISBN: 978-1-4939-1456-2
Copyright year: 2015
DOI: https://doi.org/10.1007/978-1-4939-1456-2_5
