
2018 | OriginalPaper | Chapter

Overview of Character-Based Models for Natural Language Processing

Authors : Heike Adel, Ehsaneddin Asgari, Hinrich Schütze

Published in: Computational Linguistics and Intelligent Text Processing

Publisher: Springer International Publishing

Abstract

Character-based models have become increasingly popular for many natural language processing tasks, especially owing to the success of neural networks. They make it possible to model text sequences directly, without a tokenization step, and thereby simplify the traditional preprocessing pipeline. This paper provides an overview of character-based models for a variety of natural language processing tasks. We group existing work into three categories: tokenization-based approaches, bag-of-n-gram models, and end-to-end models. For each category, we present prominent examples of studies, with a particular focus on recent character-based deep learning work.
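As an illustration of the bag-of-n-gram idea mentioned in the abstract (a minimal sketch, not code from the paper), a character n-gram representation can be computed directly from the raw, untokenized string:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Return counts of character n-grams, operating directly on the
    raw string -- no tokenization step is required."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Trigrams of a short untokenized string
counts = char_ngrams("san francisco", n=3)
print(counts["san"])  # the trigram "san" occurs once
```

Such count vectors can then feed any standard classifier, which is the essence of the bag-of-n-gram approaches surveyed in the paper.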


Footnotes
1
There are also difficult cases in English, such as “Yahoo!” or “San Francisco-Los Angeles flights”.
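A quick sketch (illustrative only) of why such cases are hard: a naive tokenizer that splits on whitespace and hyphens tears the city names in this example apart:

```python
import re

text = "San Francisco-Los Angeles flights"
# A naive tokenizer that splits on whitespace and hyphens
naive_tokens = re.split(r"[\s\-]+", text)
print(naive_tokens)  # ['San', 'Francisco', 'Los', 'Angeles', 'flights']
# The multiword units "San Francisco" and "Los Angeles" are lost.
```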
 
2
In our view, morpheme-based models are not true instances of character-level models, since linguistically motivated morphological segmentation is a step equivalent to tokenization, only on a different level. We therefore do not cover most work on morphological segmentation in this paper.
 
85.
go back to reference Shannon, C.E.: Prediction and entropy of printed english. Bell Labs Tech. J. 30(1), 50–64 (1951)CrossRef Shannon, C.E.: Prediction and entropy of printed english. Bell Labs Tech. J. 30(1), 50–64 (1951)CrossRef
86.
go back to reference Sperr, H., Niehues, J., Waibel, A.: Letter n-gram-based input encoding for continuous space language models. In: Workshop on Continuous Vector Space Models and their Compositionality, pp. 30–39 (2013) Sperr, H., Niehues, J., Waibel, A.: Letter n-gram-based input encoding for continuous space language models. In: Workshop on Continuous Vector Space Models and their Compositionality, pp. 30–39 (2013)
87.
go back to reference Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. In: ICML 2015 Deep Learing Workshop (2015) Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. In: ICML 2015 Deep Learing Workshop (2015)
88.
go back to reference Sutskever, I., Martens, J., Hinton, G.E.: Generating text with recurrent neural networks. In: International Conference on Machine Learning, pp. 1017–1024 (2011) Sutskever, I., Martens, J., Hinton, G.E.: Generating text with recurrent neural networks. In: International Conference on Machine Learning, pp. 1017–1024 (2011)
89.
go back to reference Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014) Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
90.
go back to reference Tiedemann, J., Nakov, P.: Analyzing the use of character-level translation with sparse and noisy datasets. In: Recent Advances in Natural Language Processing, RANLP 2013, 9–11 September, 2013, Hissar, Bulgaria, pp. 676–684 (2013) Tiedemann, J., Nakov, P.: Analyzing the use of character-level translation with sparse and noisy datasets. In: Recent Advances in Natural Language Processing, RANLP 2013, 9–11 September, 2013, Hissar, Bulgaria, pp. 676–684 (2013)
91.
go back to reference Murthy, V., Khapra, M.M., Bhattacharyya, P.: Sharing network parameters for crosslingual named entity recognition. CoRR abs/1607.00198 (2016) Murthy, V., Khapra, M.M., Bhattacharyya, P.: Sharing network parameters for crosslingual named entity recognition. CoRR abs/1607.00198 (2016)
92.
go back to reference Vergyri, D., Kirchhoff, K., Duh, K., Stolcke, A.: Morphology-based language modeling for arabic speech recognition. In: Annual Conference of the International Speech Communication Association, 4, 2245–2248 (2004) Vergyri, D., Kirchhoff, K., Duh, K., Stolcke, A.: Morphology-based language modeling for arabic speech recognition. In: Annual Conference of the International Speech Communication Association, 4, 2245–2248 (2004)
93.
go back to reference Vilar, D., Peter, J.T., Ney, H.: Can we translate letters? In: Workshop on Statistical Machine Translation (2007) Vilar, D., Peter, J.T., Ney, H.: Can we translate letters? In: Workshop on Statistical Machine Translation (2007)
94.
go back to reference Vylomova, E., Cohn, T., He, X., Haffari, G.: Word representation models for morphologically rich languages in neural machine translation. CoRR abs/1606.04217 (2016) Vylomova, E., Cohn, T., He, X., Haffari, G.: Word representation models for morphologically rich languages in neural machine translation. CoRR abs/1606.04217 (2016)
95.
go back to reference Wang, L., Cao, Z., Xia, Y., de Melo, G.: Morphological segmentation with window LSTM neural networks. In: AAAI Conference on Artificial Intelligence (2016) Wang, L., Cao, Z., Xia, Y., de Melo, G.: Morphological segmentation with window LSTM neural networks. In: AAAI Conference on Artificial Intelligence (2016)
96.
go back to reference Wieting, J., Bansal, M., Gimpel, K., Livescu, K.: Charagram: embedding words and sentences via character n-grams. In: Conference on Empirical Methods in Natural Language Processing (2016) Wieting, J., Bansal, M., Gimpel, K., Livescu, K.: Charagram: embedding words and sentences via character n-grams. In: Conference on Empirical Methods in Natural Language Processing (2016)
97.
go back to reference Wu, Y., et al.: Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016) Wu, Y., et al.: Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016)
98.
go back to reference Xiao, Y., Cho, K.: Efficient character-level document classification by combining convolution and recurrent layers. CoRR abs/1602.00367 (2016) Xiao, Y., Cho, K.: Efficient character-level document classification by combining convolution and recurrent layers. CoRR abs/1602.00367 (2016)
99.
go back to reference Yaghoobzadeh, Y., Schütze, H.: Multi-level representations for fine-grained typing of knowledge base entities. In: Conference of the European Chapter of the Association for Computational Linguistics (2017) Yaghoobzadeh, Y., Schütze, H.: Multi-level representations for fine-grained typing of knowledge base entities. In: Conference of the European Chapter of the Association for Computational Linguistics (2017)
100.
go back to reference Yang, Z., Chen, W., Wang, F., Xu, B.: A character-aware encoder for neural machine translation. In: International Conference on Computational Linguistics, pp. 3063–3070 (2016) Yang, Z., Chen, W., Wang, F., Xu, B.: A character-aware encoder for neural machine translation. In: International Conference on Computational Linguistics, pp. 3063–3070 (2016)
101.
go back to reference Yang, Z., Salakhutdinov, R., Cohen, W.W.: Multi-task cross-lingual sequence tagging from scratch. CoRR abs/1603.06270 (2016) Yang, Z., Salakhutdinov, R., Cohen, W.W.: Multi-task cross-lingual sequence tagging from scratch. CoRR abs/1603.06270 (2016)
102.
go back to reference Yu, L., Buys, J., Blunsom, P.: Online segment to segment neural transduction. In: Conference on Empirical Methods in Natural Language Processing, pp. 1307–1316 (2016) Yu, L., Buys, J., Blunsom, P.: Online segment to segment neural transduction. In: Conference on Empirical Methods in Natural Language Processing, pp. 1307–1316 (2016)
103.
go back to reference Zhang, X., LeCun, Y.: Text understanding from scratch. CoRR abs/1502.01710 (2015) Zhang, X., LeCun, Y.: Text understanding from scratch. CoRR abs/1502.01710 (2015)
104.
go back to reference Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems, pp. 649–657 (2015) Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems, pp. 649–657 (2015)
Metadata
Title
Overview of Character-Based Models for Natural Language Processing
Authors
Heike Adel
Ehsaneddin Asgari
Hinrich Schütze
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-77113-7_1