2020 | OriginalPaper | Chapter

Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools

Authors: Ana Larissa Dias, Cassio Batista, Daniel Santana, Nelson Neto

Published in: Intelligent Systems

Publisher: Springer International Publishing

Abstract

Phonetic analysis of speech generally requires aligning audio samples with their phonetic transcription. This task can be performed manually for a handful of files, but as the corpus grows it becomes infeasibly time-consuming, which underlines the need for computational tools that perform this forced alignment of speech to phonemes automatically. Given the scarcity of phonetic alignment tools for Brazilian Portuguese (BP), this work describes the development of a free phonetic alignment tool for BP using Kaldi, a toolkit that has been the state of the art for open-source speech recognition. Five acoustic models were trained with Kaldi and tested on phonetic alignment, with evaluation based on the phone boundary metric. The results show that the tool's performance is similar to that of some Kaldi-based aligners for other languages and superior to that of an outdated phonetic aligner for BP based on the HTK toolkit.
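
The abstract does not spell out how the phone boundary metric is computed. One common formulation in forced-alignment evaluation, assumed here and not necessarily the one used in the chapter, takes the absolute time difference between each automatically placed boundary and its manually annotated counterpart, then reports the share of boundaries that fall within a tolerance. The function names and the boundary values below are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def boundary_errors(reference: Sequence[float], aligned: Sequence[float]) -> List[float]:
    """Absolute difference between each pair of matching phone boundaries.

    Assumes both sequences list the same boundaries in the same order
    and in the same time unit (milliseconds in the example below).
    """
    if len(reference) != len(aligned):
        raise ValueError("both sequences must contain the same number of boundaries")
    return [abs(r - a) for r, a in zip(reference, aligned)]


def fraction_within(errors: Sequence[float], tolerance: float) -> float:
    """Share of boundaries whose error does not exceed the tolerance."""
    if not errors:
        raise ValueError("no boundaries to evaluate")
    return sum(e <= tolerance for e in errors) / len(errors)


# Made-up boundary times in milliseconds, for illustration only.
manual_ms = [120, 310, 455, 700]      # hand-annotated boundaries
automatic_ms = [125, 340, 450, 695]   # boundaries from a forced aligner

errors_ms = boundary_errors(manual_ms, automatic_ms)
print(errors_ms)                       # per-boundary absolute error
print(fraction_within(errors_ms, 10))  # share of boundaries within 10 ms
```

Reporting the share within several tolerances (for example 10, 25, and 50 ms) gives a cumulative picture of alignment quality, which is how such results are often compared across aligners.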

Metadata
Title
Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools
Authors
Ana Larissa Dias
Cassio Batista
Daniel Santana
Nelson Neto
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-61377-8_44
