2017 | OriginalPaper | Book chapter

Experiments with Segmentation in an Online Speaker Diarization System

Written by: Marie Kunešová, Zbyněk Zajíc, Vlasta Radová

Published in: Text, Speech, and Dialogue

Publisher: Springer International Publishing

Abstract

In offline speaker diarization systems, particularly those aimed at telephone speech, the accuracy of the initial segmentation of a conversation is often a secondary concern. Imprecise segment boundaries are typically corrected during resegmentation, which is performed as the final step of the diarization process. However, such resegmentation is generally not possible in online systems, where past decisions are usually unchangeable. In such situations, correct segmentation becomes critical. In this paper, we evaluate several different segmentation approaches in the context of online diarization by comparing the overall performance of an i-vector-based diarization system set to operate in a sequential manner.

Metadata
Title
Experiments with Segmentation in an Online Speaker Diarization System
Written by
Marie Kunešová
Zbyněk Zajíc
Vlasta Radová
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-64206-2_48