2022 | OriginalPaper | Chapter

ANNPRO: A Desktop Module for Automatic Segmentation and Transcription

Authors: Katarzyna Klessa, Danijel Koržinek, Brygida Sawicka-Stępińska, Hanna Kasperek

Published in: Human Language Technology. Challenges for Computer Science and Linguistics

Publisher: Springer International Publishing

Abstract

This paper describes an automatic segmentation and transcription module for Polish and its integration with the Annotation Pro software tool. The module, named ANNPRO, is an extended desktop version of the CLARIN-PL online tool. It makes it possible to combine the functionality of the Annotation Pro desktop program with that of the web-based automatic aligner. The results can be used immediately as input for further acoustic-phonetic analyses with Annotation Pro's native functions or annotation mining plugins. Annotation Pro supports any number of external alignment modules, provided that certain basic format requirements are met. We discuss these requirements and illustrate them with the functionality of the ANNPRO module and the steps of its integration. As a further illustration, we briefly report on experience gained while annotating a multimodal corpus with ANNPRO. Both Annotation Pro and the ANNPRO module are publicly available for download and can be freely used for research.

Metadata
Title: ANNPRO: A Desktop Module for Automatic Segmentation and Transcription
Authors: Katarzyna Klessa, Danijel Koržinek, Brygida Sawicka-Stępińska, Hanna Kasperek
Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-031-05328-3_5
