2021 | OriginalPaper | Chapter

Progressive AutoSpeech: An Efficient and General Framework for Automatic Speech Classification

Authors: Guanghui Zhu, Feng Cheng, Mengchuan Qiu, Zhuoer Xu, Wenjie Wang, Chunfeng Yuan, Yihua Huang

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

Speech classification has been widely used in many speech-related applications. However, the complexity of speech classification tasks often exceeds the expertise of non-experts, so off-the-shelf speech classification methods are urgently needed. Recently, automatic speech classification (AutoSpeech), which requires no human intervention, has attracted increasing attention. A practical AutoSpeech solution should be general and able to handle classification tasks from different domains automatically. Moreover, AutoSpeech should improve not only the final performance but also the any-time performance, especially when the time budget is limited. To address these issues, we propose a three-stage any-time learning framework called Progressive AutoSpeech for automatic speech classification under a given time budget. Progressive AutoSpeech consists of a fast stage, an enhancement stage, and an exploration stage, each of which uses different models and features to ensure generalization. Additionally, we automatically construct ensembles of the top-k prediction results to improve robustness. The experimental results reveal that Progressive AutoSpeech is effective and efficient for a wide range of speech classification tasks and achieves the best ALC (Area under the Learning Curve) score.
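
The abstract describes the framework only at this level of detail. As an illustration, the following Python sketch shows one way such a budgeted, staged any-time loop with top-k ensembling could be organized; all names here (progressive_autospeech, top_k_ensemble, dummy_round) are hypothetical and do not come from the paper.

    import time

    import numpy as np

    def top_k_ensemble(history, k=3):
        # Average the test predictions of the k models with the best
        # validation scores seen so far.
        best = sorted(history, key=lambda h: h[0], reverse=True)[:k]
        return np.mean([preds for _, preds in best], axis=0)

    def progressive_autospeech(stages, time_budget, k=3):
        # Run the fast, enhancement, and exploration stages in order,
        # yielding an ensembled any-time prediction after every round.
        deadline = time.time() + time_budget
        history = []  # (validation_score, test_probabilities) per model
        for stage in stages:                 # fast -> enhancement -> exploration
            for train_round in stage:        # each round trains one model
                if time.time() >= deadline:  # respect the overall budget
                    return
                val_score, test_probs = train_round()
                history.append((val_score, test_probs))
                yield top_k_ensemble(history, k)

    # Toy usage: every "round" returns (validation score, test predictions).
    rng = np.random.default_rng(0)

    def dummy_round():
        return rng.random(), rng.random((10, 4))  # 10 test clips, 4 classes

    stages = [[dummy_round] * 2, [dummy_round] * 3, [dummy_round] * 3]
    for anytime_prediction in progressive_autospeech(stages, time_budget=5.0):
        print(anytime_prediction.shape)  # (10, 4) probability matrix

The design point mirrored in this sketch is that a usable ensemble prediction exists after every training round, so an any-time metric such as ALC is served even if the budget expires early.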

Footnotes
1
Progressive AutoSpeech won first place in the NeurIPS 2019 AutoSpeech Challenge and second place in the Interspeech 2020 AutoSpeech Challenge.
 
Metadata
Title
Progressive AutoSpeech: An Efficient and General Framework for Automatic Speech Classification
Authors
Guanghui Zhu
Feng Cheng
Mengchuan Qiu
Zhuoer Xu
Wenjie Wang
Chunfeng Yuan
Yihua Huang
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-75765-6_14
