Unknown words, whose translations are not listed in general dictionaries, have been a persistent problem in cross-language information retrieval and machine translation. Because new terms are coined continually, it is difficult to cover all of them with general bilingual dictionaries. Research on the automatic extraction of translations for unknown words has therefore been carried out with the aim of building bilingual dictionaries at low cost from Web corpora. In this paper, we focus on anime titles, which are commercially important, and propose a method that uses Conditional Random Fields (CRF) to extract Japanese candidate translations corresponding to English anime titles. Because Japanese anime titles are in many cases transliterated when they are rendered into English, we use transliteration features in addition to features such as bag of words and part of speech. The experiments used at most one hundred Web pages collected from a search engine, with Japanese-English anime title pairs extracted from Wikipedia as queries. The results showed that the number of acquired titles increased significantly when the transliteration features were used.
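The abstract does not say how the transliteration features are computed, so the following is only a minimal sketch of one way a per-token transliteration feature could be combined with word and part-of-speech features in a CRF-style feature dictionary. The romanization table, the function names (`romanize`, `translit_similarity`, `token_features`), the similarity measure (`difflib.SequenceMatcher`), and the threshold of 0.8 are all assumptions made for illustration, not the authors' implementation.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny katakana-to-romaji table (illustration only).
# A real system would need a complete table plus rules for long vowels,
# small kana, and geminate consonants.
KATAKANA_TO_ROMAJI = {
    "ナ": "na", "ル": "ru", "ト": "to",
    "ア": "a", "キ": "ki", "ラ": "ra",
}


def romanize(token: str) -> str:
    """Romanize the characters found in the table; all others are dropped."""
    return "".join(KATAKANA_TO_ROMAJI.get(ch, "") for ch in token)


def translit_similarity(ja_token: str, en_word: str) -> float:
    """Similarity in [0, 1] between a romanized Japanese token and an English word."""
    romaji = romanize(ja_token)
    if not romaji:
        # Nothing could be romanized, so there is no transliteration evidence.
        return 0.0
    return SequenceMatcher(None, romaji, en_word.lower()).ratio()


def token_features(ja_token: str, pos: str, en_title: str, threshold: float = 0.8) -> dict:
    """Build a feature dictionary for one Japanese token (assumed feature set)."""
    best = max(
        (translit_similarity(ja_token, word) for word in en_title.split()),
        default=0.0,
    )
    return {
        "word": ja_token,                      # bag-of-words style feature
        "pos": pos,                            # part-of-speech feature
        "translit_match": best >= threshold,   # transliteration feature
    }
```

For example, `token_features("ナルト", "NOUN", "Naruto Shippuden")` marks the token as a transliteration match, whereas a token with no romanizable characters, such as the particle `の`, does not. A sequence of such dictionaries, one per token on a Web page, is the kind of input that common CRF toolkits accept.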
- Extracting the Translation of Anime Titles from Web Corpora Using CRF
- Springer International Publishing