Published in: International Journal of Speech Technology 3/2016

25.04.2016

Performance of speaker localization using microphone array

Authors: R. Visalakshi, P. Dhanalakshmi, S. Palanivel


Abstract

Speaker localization is a technique for locating and tracking an active speaker among multiple acoustic sources using a microphone array. Microphone arrays are used to improve the quality of speech recorded in meeting rooms and other environments. In this work, the time delay between the source and each microphone is estimated as part of a localization method called time difference of arrival (TDOA). TDOA localization consists of two steps: (a) a time delay estimator and (b) a localization estimator. For time delay estimation, four methods are used in estimating the location of a person: generalized cross-correlation with phase transform (GCC-PHAT), generalized cross-correlation with maximum likelihood weighting (GCC-ML), the linear prediction (LP) residual, and the Hilbert envelope of the LP residual. A new speaker localization algorithm based on group search optimization (GSO) is proposed. Its performance is analyzed and compared with the Gauss–Newton nonlinear least squares method and a genetic algorithm. Experimental results show that the proposed GSO method outperforms the other methods in terms of mean square error, root mean square error, mean absolute error, mean absolute percentage error, Euclidean distance, and mean absolute relative error.
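
As an illustration of the time delay estimation step described in the abstract, the following is a minimal sketch of generalized cross-correlation with phase transform for one microphone pair. It is not the authors' implementation: the function name `gcc_phat`, the parameter `max_tau`, the zero-padding choice, and the small regularization constant are all assumptions made for this example, and NumPy is assumed to be available.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` using GCC-PHAT.

    A positive result means `sig` lags behind `ref`.
    `max_tau` (hypothetical parameter) optionally bounds the search range in seconds.
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)

    # Zero-pad to the combined length so the circular correlation
    # computed through the FFT behaves like a linear one.
    n = len(sig) + len(ref)
    sig_spec = np.fft.rfft(sig, n=n)
    ref_spec = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep only the phase.
    cross = sig_spec * np.conj(ref_spec)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(cross, n=n)

    # Limit the lag search range if the caller supplied a physical bound.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


if __name__ == "__main__":
    fs = 16000                                   # sampling rate in Hz (assumed)
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4096)              # synthetic broadband source
    delay = 25                                   # true delay in samples
    sig = np.concatenate((np.zeros(delay), ref[:-delay]))
    tau = gcc_phat(sig, ref, fs)
    print(f"estimated delay: {tau * fs:.1f} samples")
```

In a TDOA pipeline of the kind the abstract outlines, delays estimated this way for several microphone pairs would be passed to the localization estimator. The PHAT weighting discards magnitude information, which tends to sharpen the correlation peak for broadband signals such as speech.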

Metadata
Title
Performance of speaker localization using microphone array
Authors
R. Visalakshi
P. Dhanalakshmi
S. Palanivel
Publication date
25.04.2016
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-016-9341-9
