2020 | OriginalPaper | Chapter

Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools

Authors: Ana Larissa Dias, Cassio Batista, Daniel Santana, Nelson Neto

Published in: Intelligent Systems

Publisher: Springer International Publishing

Abstract

Phonetic analysis of speech generally requires aligning audio samples with their phonetic transcription. This task can be performed manually for a handful of files, but as the corpus grows it becomes infeasibly time-consuming, which underscores the need for computational tools that perform forced alignment of speech to phonemes automatically. Given the scarce availability of phonetic alignment tools for Brazilian Portuguese (BP), this work describes the process of developing a free phonetic alignment tool for BP using Kaldi, a toolkit that has been the state of the art for open-source speech recognition. Five acoustic models were trained with Kaldi and tested on phonetic alignment, with evaluation based on the phone boundary metric. The results show that the tool's performance is similar to that of some Kaldi-based aligners for other languages, and superior to that of an outdated phonetic aligner for BP based on the HTK toolkit.
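The abstract does not define the phone boundary metric. As a rough illustration only, the sketch below assumes it is the absolute time difference between each phone's end boundary in a manual alignment and in the forced alignment, summarized as the share of boundaries that fall within a tolerance. The function names, the example timestamps, and the 25 ms tolerance are all hypothetical and are not taken from the chapter.

```python
from typing import List, Sequence


def boundary_errors(reference: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Absolute difference (in seconds) between paired phone end boundaries."""
    if len(reference) != len(predicted):
        raise ValueError("both alignments must contain the same number of phones")
    return [abs(r - p) for r, p in zip(reference, predicted)]


def fraction_within(errors: Sequence[float], tolerance: float) -> float:
    """Share of boundaries whose error is at or below the tolerance."""
    if not errors:
        raise ValueError("no boundaries to evaluate")
    return sum(e <= tolerance for e in errors) / len(errors)


# Hypothetical end times (seconds) of four phones: manual vs. forced alignment.
manual = [0.10, 0.25, 0.40, 0.62]
forced = [0.11, 0.27, 0.45, 0.60]

errors = boundary_errors(manual, forced)
share = fraction_within(errors, tolerance=0.025)  # assumed 25 ms tolerance
print(f"{share:.0%} of boundaries within 25 ms")
```

Reporting the share at several tolerances, rather than a single mean error, makes it easier to compare aligners that differ mostly in their worst-case boundaries.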

Title: Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools
Authors: Ana Larissa Dias, Cassio Batista, Daniel Santana, Nelson Neto
Publisher: Springer International Publishing
Book: Intelligent Systems
Print ISBN: 978-3-030-61376-1
Electronic ISBN: 978-3-030-61377-8

Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-61377-8_44
