
2015 | OriginalPaper | Book chapter

5. Ensemble Learning Approaches in Speech Recognition

Authors: Yunxin Zhao, Jian Xue, Xin Chen

Published in: Speech and Audio Processing for Coding, Enhancement and Recognition

Publisher: Springer New York


Abstract

An overview is given of the ensemble learning efforts that have emerged in automatic speech recognition in recent years. Approaches that are based on different machine learning techniques and that target various levels and components of speech recognition are described, and their effectiveness is discussed in terms of the direct performance measure of word error rate and the indirect measures of classification margin, diversity, and bias and variance. In addition, methods for reducing the storage and computation costs of ensemble models in practical deployments of speech recognition systems are discussed. Ensemble learning for speech recognition has been largely fruitful, and it is expected to continue to progress along with advances in machine learning, speech and language modeling, and computing technology.
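The abstract names word error rate as the direct performance measure. As a minimal illustrative sketch (not taken from the chapter), word error rate can be computed as the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch: tokenization is plain whitespace splitting,
    and substitutions, insertions and deletions all cost 1.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.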


Metadata
Title
Ensemble Learning Approaches in Speech Recognition
Authors
Yunxin Zhao
Jian Xue
Xin Chen
Copyright year
2015
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4939-1456-2_5
