
2023 | Original Paper | Book Chapter

Literature Review of Research on Common Methods of Grapheme-To-Phoneme

Authors: Yating Zhang, Han Zhang, Shaozhong Cao

Published in: IEIS 2022

Publisher: Springer Nature Singapore


Abstract

Grapheme-to-phoneme (G2P) conversion techniques are used in many fields, most notably speech synthesis (text-to-speech, TTS). The continuous improvement of G2P conversion techniques has in turn accelerated the development of speech synthesis. The purpose of this paper is to provide a review of grapheme-to-phoneme conversion methods. First, the grapheme-to-phoneme conversion methods of recent years are surveyed; then the relevant datasets and evaluation metrics are listed; finally, the open problems and development trends of grapheme-to-phoneme conversion are described.
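The evaluation metrics mentioned above are conventionally phoneme error rate (PER) and word error rate (WER): PER is the Levenshtein edit distance between the predicted and reference phoneme sequences, normalized by the reference length, while WER counts a word as wrong if its predicted pronunciation differs from the reference at all. A minimal sketch of PER, using the classic dynamic-programming edit distance (the ARPAbet example phonemes are illustrative, not taken from the paper):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # deleting all of ref[:i]
    for j in range(n + 1):
        dp[0][j] = j  # inserting all of hyp[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]

def per(ref_phonemes, hyp_phonemes):
    """Phoneme error rate: edit distance over reference length."""
    return edit_distance(ref_phonemes, hyp_phonemes) / len(ref_phonemes)

# Example: ARPAbet-style phonemes; one substitution out of five phonemes.
ref = ["F", "OW1", "N", "IY0", "M"]
hyp = ["F", "OW1", "N", "EH1", "M"]
print(per(ref, hyp))  # 0.2
```

Because PER divides by the reference length, it can exceed 1.0 when the hypothesis contains many insertions; WER is then simply the fraction of test words with nonzero edit distance.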


References
1. Kaplan, R.M., Kay, M.: Regular models of phonological rule systems. Comput. Linguist. 20(3), 331–378 (1994)
2. Kominek, J., Black, A.W.: Learning pronunciation dictionaries: language complexity and word selection strategies. In: Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pp. 232–239 (2006)
3. Galescu, L., Allen, J.F.: Bi-directional conversion between graphemes and phonemes using a joint n-gram model. In: 4th ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis (2001)
4. Bisani, M., Ney, H.: Joint-sequence models for grapheme-to-phoneme conversion. Speech Commun. 50(5), 434–451 (2008)
5. Yang, D., Dixon, P., Sadahi, F.: Rapid development of a grapheme-to-phoneme system based on weighted finite state transducer (WFST) framework. In: The Japanese Acoustical Society 2009 Fall Lecture Proceedings, no. 3, pp. 111–112 (2009)
6. Novak, J.R., Dixon, P.R., Minematsu, N., et al.: Improving WFST-based G2P conversion with alignment constraints and RNNLM N-best rescoring. In: Thirteenth Annual Conference of the International Speech Communication Association (2012)
7. Novak, J.R., Minematsu, N., Hirose, K.: Failure transitions for joint n-gram models and G2P conversion. In: Interspeech, pp. 1821–1825 (2013)
8. Novak, J.R., Minematsu, N., Hirose, K.: WFST-based grapheme-to-phoneme conversion: open source tools for alignment, model-building and decoding. In: Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing, pp. 45–49 (2012)
9. Mikolov, T., Karafiát, M., Burget, L., et al.: Recurrent neural network based language model. In: Interspeech, vol. 2, no. 3, pp. 1045–1048 (2010)
10. Chen, S.F.: Conditional and joint models for grapheme-to-phoneme conversion. In: INTERSPEECH (2003)
11. Lehnen, P., Hahn, S., Guta, V.A., et al.: Hidden conditional random fields with m-to-n alignments for grapheme-to-phoneme conversion. In: Thirteenth Annual Conference of the International Speech Communication Association (2012)
12. Lehnen, P., Allauzen, A., Lavergne, T., et al.: Structure learning in hidden conditional random fields for grapheme-to-phoneme conversion. In: Annual Conference of the International Speech Communication Association (2013)
13. Taylor, P.: Hidden Markov models for grapheme to phoneme conversion. In: Ninth European Conference on Speech Communication and Technology (2005)
14. Ogbureke, K.U., Cahill, P., Carson-Berndsen, J.: Hidden Markov models with context-sensitive observations for grapheme-to-phoneme conversion. In: Eleventh Annual Conference of the International Speech Communication Association (2010)
15. Jiampojamarn, S., Kondrak, G., Sherif, T.: Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion. In: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pp. 372–379 (2007)
16. Hahn, S., Vozila, P., Bisani, M.: Comparison of grapheme-to-phoneme methods on large pronunciation dictionaries and LVCSR tasks. In: Thirteenth Annual Conference of the International Speech Communication Association (2012)
17. Wu, K., Allauzen, C., Hall, K., et al.: Encoding linear models as weighted finite-state transducers (2014)
18. Mnih, V., Heess, N., Graves, A.: Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, vol. 27 (2014)
19. Rao, K., Peng, F., Sak, H., et al.: Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4225–4229. IEEE (2015)
20. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
21. Yao, K., Zweig, G.: Sequence-to-sequence neural net models for grapheme-to-phoneme conversion. arXiv preprint arXiv:1506.00196 (2015)
22. Graves, A., Fernández, S., Gomez, F., et al.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376 (2006)
23. Mousa, A.E.D., Schuller, B.: Deep bidirectional long short-term memory recurrent neural networks for grapheme-to-phoneme conversion utilizing complex many-to-many alignments (2016)
24. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, vol. 27 (2014)
25. Robinson, A.J., Fallside, F.: The Utility Driven Dynamic Error Propagation Network. University of Cambridge Department of Engineering, Cambridge (1987)
26. Williams, R.J., Peng, J.: An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Comput. 2(4), 490–501 (1990)
27. Yolchuyeva, S., Németh, G., Gyires-Tóth, B.: Grapheme-to-phoneme conversion with convolutional neural networks. Appl. Sci. 9(6), 1143 (2019)
28. Chae, M.J., Park, K., Bang, J., et al.: Convolutional sequence to sequence model with non-sequential greedy decoding for grapheme to phoneme conversion. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2486–2490. IEEE (2018)
29. Toshniwal, S., Livescu, K.: Jointly learning to align and convert graphemes to phonemes with neural attention models. In: 2016 IEEE Spoken Language Technology Workshop (SLT), pp. 76–82. IEEE (2016)
30. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
31. Sun, H., Tan, X., Gan, J.W., et al.: Token-level ensemble distillation for grapheme-to-phoneme conversion. arXiv preprint arXiv:1904.03446 (2019)
32. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
33. Wu, Y., Schuster, M., Chen, Z., et al.: Google's neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)
34. Gehring, J., Auli, M., Grangier, D., et al.: Convolutional sequence to sequence learning. In: International Conference on Machine Learning, pp. 1243–1252. PMLR (2017)
35. Yolchuyeva, S., Németh, G., Gyires-Tóth, B.: Transformer based grapheme-to-phoneme conversion. arXiv preprint arXiv:2004.06338 (2020)
36. Řezáčková, M., Švec, J., Tihelka, D.: T5G2P: using text-to-text transfer transformer for grapheme-to-phoneme conversion (2021)
37. Dong, L., Guo, Z.Q., Tan, C.H., et al.: Neural grapheme-to-phoneme conversion with pre-trained grapheme models. In: ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6202–6206. IEEE (2022)
39. Peters, B., Dehdari, J., van Genabith, J.: Massively multilingual neural grapheme-to-phoneme conversion. arXiv preprint arXiv:1708.01464 (2017)
40. Sokolov, A., Rohlin, T., Rastrow, A.: Neural machine translation for multilingual grapheme-to-phoneme conversion. arXiv preprint arXiv:2006.14194 (2020)
41. Yu, M., Nguyen, H.D., Sokolov, A., et al.: Multilingual grapheme-to-phoneme conversion with byte representation. In: ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8234–8238. IEEE (2020)
42. Zhu, J., Zhang, C., Jurgens, D.: ByT5 model for massively multilingual grapheme-to-phoneme conversion. arXiv preprint arXiv:2204.03067 (2022)
43. Kim, H.Y., Kim, J.H., Kim, J.M.: Fast bilingual grapheme-to-phoneme conversion. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Industry Track, pp. 289–296 (2022)
45. CALLHOME American English Lexicon (PRONLEX). Linguistic Data Consortium
47. Sejnowski, T.J.: The NetTalk Corpus: Phonetic Transcription of 20,008 English Words (1988)
48. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10(8), 707–710 (1966)
49. Zhu, R., Huang, Y.: Efficient privacy-preserving general edit distance and beyond. Cryptology ePrint Archive (2017)
Metadata
Title
Literature Review of Research on Common Methods of Grapheme-To-Phoneme
Authors
Yating Zhang
Han Zhang
Shaozhong Cao
Copyright year
2023
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-99-3618-2_16