2016 | OriginalPaper | Chapter

Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI

Authors: Koki Sakata, Shota Sakashita, Kazuya Matsuo, Shuichi Kurogi

Published in: Neural Information Processing

Publisher: Springer International Publishing

Abstract

This paper presents a speaker detection method that uses probabilistic prediction to avoid tuning thresholds when detecting a speaker in an audio stream. We introduce g-GEBI (generalized GEBI), a generalization of BI (Bayesian Inference) and GEBI (Gibbs-distribution-based Extended BI), to perform iterative detection of a speaker in an audio stream uttered by more than one speaker. We then show a method of probabilistic prediction in multiclass classification for classifying the results of speaker detection. Through numerical experiments on recorded real speech data, we examine the properties and effectiveness of the present method. In particular, we show that g-GEBI and g-BI (generalized BI) are more effective than the conventional BI and GEBI in the incremental speaker detection task.
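
The abstract does not give the update equations for g-GEBI, GEBI, or g-BI, so they are not reproduced here. As a rough illustration of the conventional BI baseline it mentions, the sketch below applies plain Bayes' rule repeatedly over successive audio segments and picks the speaker with the highest posterior, without a hand-tuned threshold. The function names, the three-speaker setup, and the per-segment likelihood values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def bayes_update(prior: Sequence[float], likelihood: Sequence[float]) -> List[float]:
    """One step of Bayes' rule: posterior_i is proportional to prior_i * likelihood_i."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("all speaker hypotheses have zero probability")
    return [u / total for u in unnormalized]


def sequential_posterior(
    prior: Sequence[float],
    likelihoods_per_segment: Sequence[Sequence[float]],
) -> List[float]:
    """Fold the per-segment likelihoods into the prior, one segment at a time."""
    posterior = list(prior)
    for likelihood in likelihoods_per_segment:
        posterior = bayes_update(posterior, likelihood)
    return posterior


# Hypothetical likelihoods p(segment | speaker) for three candidate speakers
# over three consecutive segments (illustrative numbers, not from the paper).
segments = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
]
uniform_prior = [1.0 / 3.0] * 3

posterior = sequential_posterior(uniform_prior, segments)
# Detection by maximum posterior probability instead of a tuned threshold.
detected = max(range(len(posterior)), key=posterior.__getitem__)
print(posterior, detected)
```

Because each step renormalizes, the result is a probability distribution over the candidate speakers, which is what lets a decision rule be stated in terms of probabilities. How the generalized variants modify this update is specific to the paper and is left out of this sketch.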

Metadata

Title: Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI
Authors: Koki Sakata, Shota Sakashita, Kazuya Matsuo, Shuichi Kurogi
Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-46681-1_37
