2011 | OriginalPaper | Book Chapter
Development of an English-Macedonian Machine Readable Dictionary by Using Parallel Corpora
Written by: Martin Saveski, Igor Trajkovski
Published in: ICT Innovations 2010
Publisher: Springer Berlin Heidelberg
Dictionaries are among the most useful lexical resources. However, most dictionaries today are not available in digital form, which makes them cumbersome for humans to use and impossible to integrate into computer programs. Digitizing an existing traditional dictionary is an expensive and labor-intensive task. In this paper, we present a method for developing Machine Readable Dictionaries from already available resources. A machine readable dictionary consists of simple word-to-word mappings, where a word from the source language can be mapped to several alternative words in the target language. We present a series of experiments in which, using the parallel corpora and open source Statistical Machine Translation tools at our disposal, we developed an English-Macedonian Machine Readable Dictionary containing 23,296 translation pairs (17,708 English and 18,343 Macedonian terms). A subset of the produced dictionary was manually evaluated and showed an accuracy of 79.8%.
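The word-to-word mapping described in the abstract can be pictured as a one-to-many lookup table. The following is only a minimal sketch under stated assumptions: the function names `build_dictionary` and `dictionary_stats` and the toy entries are hypothetical illustrations, not the authors' data or implementation. It shows how the three figures reported for a dictionary (translation pairs, distinct source terms, distinct target terms) relate to each other.

```python
from collections import defaultdict


def build_dictionary(pairs):
    """Group (source, target) translation pairs into a source -> set-of-targets map."""
    mrd = defaultdict(set)
    for source, target in pairs:
        mrd[source].add(target)
    return dict(mrd)


def dictionary_stats(mrd):
    """Return (translation pairs, distinct source terms, distinct target terms)."""
    n_pairs = sum(len(targets) for targets in mrd.values())
    all_targets = set().union(*mrd.values()) if mrd else set()
    return n_pairs, len(mrd), len(all_targets)


# Hypothetical toy entries, chosen only to illustrate the one-to-many structure.
toy_pairs = [
    ("house", "куќа"),
    ("home", "куќа"),
    ("home", "дом"),
    ("book", "книга"),
]

toy_mrd = build_dictionary(toy_pairs)
stats = dictionary_stats(toy_mrd)
print(stats)
```

Because one source word may map to several target words, and several source words may share one target word, the number of translation pairs can exceed both term counts, as it does in the figures reported in the abstract.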