2013 | OriginalPaper | Book chapter

An Unsupervised Approach to Multiple Speaker Tracking for Robust Multimedia Retrieval

Authors: M. Phanikumar, Lalan Kumar, Rajesh M. Hegde

Published in: The Era of Interactive Media

Publisher: Springer New York

Abstract

Tagging multimedia data according to who is speaking at what time is important, especially for the intelligent retrieval of recordings of meetings and conferences. This paper proposes an unsupervised approach to tracking more than two speakers in multimedia data recorded from multiple visual sensors and a single audio sensor. The multi-speaker detection and tracking problem is first formulated as a multiple hypothesis testing problem. From this formulation, the multi-speaker detection and tracking problem is derived as a condition in mutual information. The proposed method is then evaluated on multimedia recordings of four speakers captured on a multimedia recording test bed. Experimental results on the CUAVE multimodal corpus are also discussed. The proposed method exhibits reasonably good performance, as demonstrated by the detection (ROC) curves. The results of the analysis based on the condition in mutual information are also encouraging. A multiple-speaker detection and tracking system implemented using this approach gives reasonable performance in actual meeting room scenarios.
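The abstract does not state the detection statistic, the audio and visual features, or the estimator used, so the following is only a loose illustration of the general idea of choosing among several speaker hypotheses with mutual information, not the authors' formulation. It assumes one scalar audio feature per frame and one scalar visual feature per frame for each candidate speaker, uses a simple histogram (plug-in) estimate of mutual information, and selects the candidate whose visual stream shares the most information with the audio. All function names, the bin count, and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def quantize(xs, bins):
    """Map each sample to an integer bin index in [0, bins - 1]."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant signal
    return [min(int((x - lo) / width), bins - 1) for x in xs]


def mutual_information(a, v, bins=8):
    """Histogram (plug-in) estimate of I(A; V) in nats."""
    qa, qv = quantize(a, bins), quantize(v, bins)
    n = len(qa)
    pa, pv, pav = Counter(qa), Counter(qv), Counter(zip(qa, qv))
    # sum over occupied cells of p(i, j) * log(p(i, j) / (p(i) * p(j)))
    return sum(
        (c / n) * math.log(c * n / (pa[i] * pv[j]))
        for (i, j), c in pav.items()
    )


def pick_speaker(audio, visuals, bins=8):
    """Return (index of the visual stream with the highest MI, all scores)."""
    scores = [mutual_information(audio, v, bins) for v in visuals]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Synthetic stand-in data (an assumption): four candidate speakers,
# with the audio feature driven by the visual feature of speaker 2.
rng = random.Random(0)
n_frames = 2000
visuals = [[rng.gauss(0, 1) for _ in range(n_frames)] for _ in range(4)]
audio = [x + 0.5 * rng.gauss(0, 1) for x in visuals[2]]

speaker, scores = pick_speaker(audio, visuals)
print("selected speaker index:", speaker)
print("MI estimates (nats):", [round(s, 3) for s in scores])
```

In a tracking setting, a decision of this kind would presumably be repeated over a sliding window of frames. The histogram estimator is chosen here only for brevity and is biased upward for small windows.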

Title: An Unsupervised Approach to Multiple Speaker Tracking for Robust Multimedia Retrieval
Authors: M. Phanikumar, Lalan Kumar, Rajesh M. Hegde
Publisher: Springer New York
Book: The Era of Interactive Media
Print ISBN: 978-1-4614-3500-6
Electronic ISBN: 978-1-4614-3501-3
Copyright year: 2013
DOI: https://doi.org/10.1007/978-1-4614-3501-3_43
