
2016 | OriginalPaper | Book chapter

HAVRUS Corpus: High-Speed Recordings of Audio-Visual Russian Speech

Authors: Vasilisa Verkhodanova, Alexander Ronzhin, Irina Kipyatkova, Denis Ivanko, Alexey Karpov, Miloš Železný

Published in: Speech and Computer

Publisher: Springer International Publishing


Abstract

In this paper we present a software-hardware complex for collecting audio-visual speech databases with a high-speed camera and a dynamic microphone. We describe the architecture of the developed software and give some details of HAVRUS, the collected database of Russian audio-visual speech. The software synchronizes and fuses the audio and video channels, and it takes into account and processes a natural property of human speech: the asynchrony between the audio and visual speech modalities. The collected corpus comprises recordings of 20 native speakers of Russian and is intended for further research and experiments on audio-visual Russian speech recognition.



Title: HAVRUS Corpus: High-Speed Recordings of Audio-Visual Russian Speech
Authors: Vasilisa Verkhodanova, Alexander Ronzhin, Irina Kipyatkova, Denis Ivanko, Alexey Karpov, Miloš Železný
Publisher: Springer International Publishing
Book: Speech and Computer
Print ISBN: 978-3-319-43957-0
Electronic ISBN: 978-3-319-43958-7
Copyright year: 2016
DOI: https://doi.org/10.1007/978-3-319-43958-7_40
