International Journal of Speech Technology

Published: 25.04.2016

Performance of speaker localization using microphone array

Authors: R. Visalakshi, P. Dhanalakshmi, S. Palanivel

Published in: International Journal of Speech Technology | Issue 3/2016

Abstract

Speaker localization is a technique for locating and tracking an active speaker among multiple acoustic sources using a microphone array. Microphone arrays are used to improve the quality of speech recorded in meeting rooms and other environments. In this work, the time delay between the source and each microphone is estimated using a localization method based on time differences of arrival (TDOA). TDOA localization consists of two steps: (a) a time delay estimator and (b) a localization estimator. For time delay estimation, four methods are chosen: generalized cross-correlation with phase transform, generalized cross-correlation with maximum likelihood weighting, the linear prediction (LP) residual, and the Hilbert envelope of the LP residual. A new speaker localization algorithm, known as the group search optimization (GSO) algorithm, is proposed. Its performance is analyzed and compared with the Gauss–Newton nonlinear least squares method and a genetic algorithm. Experimental results show that the proposed GSO method outperforms the other methods in terms of mean square error, root mean square error, mean absolute error, mean absolute percentage error, Euclidean distance and mean absolute relative error.
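As an illustration of the first TDOA step (the time delay estimator), the following is a minimal sketch of generalized cross-correlation with phase transform for one microphone pair. It is not the authors' implementation: the function name `gcc_phat`, its parameters, and the small regularization constant are assumptions made for this example, and it relies only on standard NumPy FFT routines.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref`.

    Hypothetical helper for illustration: a positive result means
    `sig` arrives later than `ref`.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep phase, drop magnitude.
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # assumed constant to avoid division by zero
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


if __name__ == "__main__":
    fs = 16000                      # assumed sampling rate in Hz
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1024)
    delay = 5                       # synthetic delay in samples
    sig = np.concatenate((np.zeros(delay), ref[:-delay]))
    tau = gcc_phat(sig, ref, fs)
    # Multiplying by the speed of sound (about 343 m/s in air) gives the
    # range difference that a localization estimator would consume.
    print("delay [s]:", tau, "range difference [m]:", tau * 343.0)
```

The delays estimated for each microphone pair would then be passed to the second step, the localization estimator, which in the paper is GSO, Gauss–Newton nonlinear least squares, or a genetic algorithm.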

Title: Performance of speaker localization using microphone array
Authors: R. Visalakshi, P. Dhanalakshmi, S. Palanivel
Publication date: 25.04.2016
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 3/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-016-9341-9
