2016 | OriginalPaper | Chapter

Bottleneck Based Front-End for Diarization Systems

Authors : Ignacio Viñals, Jesús Villalba, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

Published in: Advances in Speech and Language Technologies for Iberian Languages

Publisher: Springer International Publishing

Abstract

The goal of this paper is to study the inclusion of deep learning in the diarization task. We propose novel approaches at the feature extraction stage, replacing the classical short-term features, such as MFCCs and PLPs, with deep-learning-based ones. These new features come from the hidden states at the bottleneck layers of neural networks trained for ASR tasks.
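As a rough illustration of what "hidden states at bottleneck layers" means, the following minimal sketch forwards acoustic frames through a small feed-forward network and stops at a narrow layer. It is not the authors' implementation: the layer sizes, the random weights (standing in for a network trained for ASR), the use of the linear bottleneck output, and the names `make_layer` and `bottleneck_features` are all assumptions made for illustration.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights stand in for one layer of a network trained for ASR."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def affine(layer, x):
    """Compute W x + b for a single input vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def bottleneck_features(frames, layers, bottleneck_index):
    """Forward each frame up to the bottleneck layer and keep its linear output."""
    features = []
    for frame in frames:
        h = frame
        for i, layer in enumerate(layers):
            h = affine(layer, h)
            if i == bottleneck_index:
                break  # layers above the bottleneck are not needed for extraction
            h = sigmoid(h)
        features.append(h)
    return features

rng = random.Random(0)
# Hypothetical sizes: 39-dim input, 64 hidden, 8-dim bottleneck, 64 hidden, 20 outputs.
sizes = [39, 64, 8, 64, 20]
layers = [make_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
# Five random frames stand in for real short-term acoustic features.
frames = [[rng.gauss(0.0, 1.0) for _ in range(39)] for _ in range(5)]
bn = bottleneck_features(frames, layers, bottleneck_index=1)
print(len(bn), len(bn[0]))
```

Each input frame yields one low-dimensional vector, which would then take the place of the MFCC or PLP vector in the rest of the pipeline.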

These new features will be included in the University of Zaragoza ViVoLAB speaker diarization system, designed for the Multi-Genre Broadcast (MGB) challenge of the 2015 ASRU Workshop. This system, designed following the i-vector paradigm, uses the input features to segment the input audio and construct one i-vector per segment. These i-vectors will be clustered into speakers according to generative PLDA models.
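The clustering step can be sketched in a simplified form. The example below is an assumption-laden stand-in, not the system described above: it replaces PLDA scoring with cosine similarity on length-normalized vectors and uses greedy average-linkage agglomerative clustering with a hand-picked threshold. The toy 3-dimensional vectors stand in for per-segment i-vectors, and `cluster_segments` is a hypothetical name.

```python
import math

def length_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity; assumes both inputs are already unit length."""
    return sum(x * y for x, y in zip(a, b))

def cluster_segments(vectors, threshold):
    """Greedy average-linkage agglomerative clustering on cosine similarity."""
    vecs = [length_normalize(v) for v in vectors]
    clusters = [[i] for i in range(len(vecs))]
    while len(clusters) > 1:
        best_score, best_pair = -2.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                scores = [cosine(vecs[i], vecs[j])
                          for i in clusters[a] for j in clusters[b]]
                score = sum(scores) / len(scores)
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break  # no remaining pair is similar enough to merge
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Convert the cluster membership lists into one label per segment.
    labels = [0] * len(vecs)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy per-segment vectors: two segments from each of two hypothetical speakers.
segments = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 1.0, 0.9], [0.1, 0.8, 1.0]]
labels = cluster_segments(segments, threshold=0.5)
print(labels)
```

In this toy case the first two segments end up in one cluster and the last two in another. A generative PLDA model would replace the cosine score with a likelihood-based comparison of whether two segments share a speaker.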

The evaluation for our new approach will be carried out with broadcast audio from the 2015 MGB Challenge.



Title: Bottleneck Based Front-End for Diarization Systems
Authors: Ignacio Viñals, Jesús Villalba, Alfonso Ortega, Antonio Miguel, Eduardo Lleida
Publisher: Springer International Publishing
Book: Advances in Speech and Language Technologies for Iberian Languages
Print ISBN: 978-3-319-49168-4
Electronic ISBN: 978-3-319-49169-1
Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-49169-1_27
