Published in: International Journal of Speech Technology 3/2018

30.05.2018

Text normalization with convolutional neural networks

Authors: Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth


Abstract

Text normalization is a critical step in a variety of tasks involving speech and language technologies. It is a vital component of natural language processing, text-to-speech synthesis and automatic speech recognition. Convolutional neural networks (CNNs) have demonstrated performance superior to recurrent architectures in various application scenarios, such as neural machine translation; however, their ability in text normalization has not yet been explored. In this paper we investigate and propose a novel CNN-based text normalization method. Training and inference times, accuracy, precision, recall, and F1-score were evaluated on an open-source dataset. The performance of CNNs is evaluated and compared with a variety of long short-term memory (LSTM) and Bi-LSTM architectures on the same dataset.
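The evaluation metrics named in the abstract (precision, recall, F1-score) can be computed from token-level predictions. The sketch below is an illustrative, self-contained implementation (not the authors' code); the label convention — `1` meaning "token requires normalization" — is an assumption for the example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one class of token-level labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: 1 = token needs normalization (e.g. "2018" -> "twenty eighteen")
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
p, r, f = precision_recall_f1(y_true, y_pred)  # each is approximately 0.667 here
```

With one missed positive and one false alarm out of three true positives, precision and recall are both 2/3, and F1 (their harmonic mean) is also 2/3.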


Metadata
Title
Text normalization with convolutional neural networks
Authors
Sevinj Yolchuyeva
Géza Németh
Bálint Gyires-Tóth
Publication date
30.05.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9521-x
