2016 | OriginalPaper | Book chapter

Investigation of Segmentation in i-Vector Based Speaker Diarization of Telephone Speech

Written by: Zbyněk Zajíc, Marie Kunešová, Vlasta Radová

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

The goal of this paper is to evaluate the contribution of speaker change detection (SCD) to the performance of a speaker diarization system in the telephone domain. We compare the overall performance of an i-vector based system using both SCD-based segmentation and a naive constant length segmentation with overlapping segments. The diarization system performs K-means clustering of i-vectors which represent the individual segments, followed by a resegmentation step. Experiments were done on the English part of the CallHome corpus. The final results indicate that the use of speaker change detection is beneficial, but the differences between the two segmentation approaches are diminished by the use of resegmentation.
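
The constant-length baseline and the clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the segment length, the overlap, the number of clusters, and the naive centroid initialization are all assumptions chosen for illustration, and plain toy vectors stand in for i-vectors (i-vector extraction, speaker change detection, and resegmentation are not shown).

```python
from typing import List, Sequence, Tuple


def constant_length_segments(
    duration: float, seg_len: float = 2.0, overlap: float = 1.0
) -> List[Tuple[float, float]]:
    """Cut [0, duration] into fixed-length windows that overlap.

    seg_len and overlap (in seconds) are illustrative defaults,
    not values taken from the paper.
    """
    if seg_len <= 0 or not 0 <= overlap < seg_len:
        raise ValueError("need seg_len > 0 and 0 <= overlap < seg_len")
    step = seg_len - overlap
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + seg_len, duration)  # last window may be shorter
        segments.append((start, end))
        if end >= duration:
            break
        start += step
    return segments


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Plain K-means with squared Euclidean distance.

    The centroids are initialized from the first k points, which is a
    simplification made for this sketch (assumption, not from the paper).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# One toy vector per segment stands in for a per-segment i-vector.
segments = constant_length_segments(5.0, seg_len=2.0, overlap=1.0)
toy_vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
speaker_labels = kmeans(toy_vectors, k=2)
```

In this sketch each segment receives one cluster label, which would serve as the initial speaker assignment before any resegmentation step refines the boundaries.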


Metadata
Title
Investigation of Segmentation in i-Vector Based Speaker Diarization of Telephone Speech
Written by
Zbyněk Zajíc
Marie Kunešová
Vlasta Radová
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-43958-7_49