Published in: International Journal of Speech Technology 3/2018

30.05.2018

Text normalization with convolutional neural networks

Authors: Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth


Abstract

Text normalization is a critical step in a variety of tasks involving speech and language technologies. It is a vital component of natural language processing, text-to-speech synthesis and automatic speech recognition. Convolutional neural networks (CNNs) have demonstrated performance superior to recurrent architectures in various application scenarios, such as neural machine translation; however, their ability in text normalization has not yet been explored. In this paper we investigate and propose a novel CNN-based text normalization method. Training and inference times, accuracy, precision, recall, and F1-score were evaluated on an open-source dataset. The performance of CNNs is evaluated and compared with a variety of long short-term memory (LSTM) and Bi-LSTM architectures on the same dataset.
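The evaluation metrics named in the abstract (precision, recall, F1-score) can be computed from token-level predictions. The sketch below is an illustrative, self-contained implementation (not the authors' code); the label convention — `1` meaning "token requires normalization" — is an assumption for the example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one class of token-level labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: 1 = token needs normalization (e.g. "2018" -> "twenty eighteen")
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
p, r, f = precision_recall_f1(y_true, y_pred)  # each is approximately 0.667 here
```

With one missed positive and one false alarm out of three true positives, precision and recall are both 2/3, and F1 (their harmonic mean) is also 2/3.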


Metadata
Title
Text normalization with convolutional neural networks
Authors
Sevinj Yolchuyeva
Géza Németh
Bálint Gyires-Tóth
Publication date
30.05.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9521-x
