
2013 | Original Paper | Book chapter

An Unsupervised Approach to Multiple Speaker Tracking for Robust Multimedia Retrieval

Authors: M. Phanikumar, Lalan Kumar, Rajesh M. Hegde

Published in: The Era of Interactive Media

Publisher: Springer New York


Abstract

Tagging multimedia data by who is speaking at what time is important, especially for the intelligent retrieval of recordings of meetings and conferences. This paper proposes an unsupervised approach to tracking more than two speakers in multimedia data recorded from multiple visual sensors and a single audio sensor. The multi-speaker detection and tracking problem is first formulated as a multiple hypothesis testing problem. From this formulation, the problem is then recast as a condition on mutual information. The proposed method is evaluated on multimedia recordings of four speakers captured on a multimedia recording test bed. Experimental results on the CUAVE multimodal corpus are also discussed. The proposed method exhibits reasonably good performance, as demonstrated by the detection (ROC) curves. The results of the analysis based on the mutual information condition are also encouraging. A multiple speaker detection and tracking system implemented using this approach gives reasonable performance in actual meeting room scenarios.
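The idea of deciding who is speaking from mutual information can be illustrated with a small sketch. This is not the authors' implementation: the histogram estimator, the synthetic 1-D features, and the names `mutual_information` and `detect_speaker` are all assumptions made for illustration. The sketch scores each candidate visual stream by its estimated mutual information with the single audio stream and picks the highest-scoring one.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X; Y) in nats for two 1-D sequences."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def detect_speaker(audio, visual_streams, bins=8):
    """Return the index of the visual stream sharing the most information
    with the audio stream, together with all per-stream scores."""
    scores = [mutual_information(audio, v, bins) for v in visual_streams]
    return int(np.argmax(scores)), scores


# Synthetic stand-in data (assumption): four visual feature streams, of
# which only stream 2 is statistically dependent on the audio feature.
rng = np.random.default_rng(0)
n = 2000
audio = rng.normal(size=n)
visual_streams = [rng.normal(size=n) for _ in range(4)]
visual_streams[2] = audio + 0.3 * rng.normal(size=n)

speaker, scores = detect_speaker(audio, visual_streams)
```

In a real system the features would come from the audio track and from face or mouth regions in each camera view, and the decision would be repeated per time window to track speaker changes; those details are outside what the abstract specifies.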


Metadata
Title
An Unsupervised Approach to Multiple Speaker Tracking for Robust Multimedia Retrieval
Authors
M. Phanikumar
Lalan Kumar
Rajesh M. Hegde
Copyright year
2013
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4614-3501-3_43
