About this book

This book constitutes the refereed proceedings of the Second International Workshop on Systems and Frameworks for Computational Morphology, SFCM 2011, held in Zurich, Switzerland in August 2011.

The eight revised full papers presented together with one invited paper were carefully reviewed and selected from 13 submissions. The papers address various topics in computational morphology and the relevance of morphology to computational linguistics more broadly.

Table of contents

Frontmatter

Beyond Morphology: Pattern Matching with FST

fst stands for Finite-State Toolkit. It is an enhanced version of the xfst tool described in the 2003 Beesley and Karttunen book Finite State Morphology. Like xfst, fst serves two purposes. It is a development tool for compiling finite-state networks and a runtime tool that applies networks to input strings or files. xfst is limited to morphological analysis and generation. fst can also be used for other applications. This paper describes the new features of the fst regular expression formalism and illustrates their use for named-entity recognition, relation extraction, tokenization and parsing. The fst pattern matching algorithm (pmatch) operates on a single pattern network but the network can be the union of any number of distinct pattern definitions. Many patterns can be matched simultaneously in one pass over a text. This is a distinct fst advantage over pattern matching facilities in languages such as Perl and Python.
Lauri Karttunen
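
The idea of matching many pattern definitions in one pass can be loosely illustrated as follows. This is only an analogy built on Python's standard `re` module, not fst's pmatch: the pattern names and regular expressions are hypothetical, and a backtracking alternation simply tries the alternatives in order rather than running a compiled union network.

```python
import re

# Hypothetical pattern definitions, for illustration only.
# A real pmatch grammar would be written in the fst formalism.
PATTERNS = {
    "YEAR": r"\b(?:19|20)\d{2}\b",
    "CITY": r"\b(?:Zurich|Helsinki)\b",
    "PERSON": r"\b[A-Z][a-z]+ [A-Z][a-z]+\b",
}

# Join the definitions into one alternation of named groups,
# so a single scan of the text reports which definition matched.
UNION = re.compile(
    "|".join(f"(?P<{name}>{regex})" for name, regex in PATTERNS.items())
)

def match_all(text):
    """Return (pattern name, matched text) pairs in order of occurrence."""
    return [(m.lastgroup, m.group()) for m in UNION.finditer(text)]
```

Calling `match_all("Lauri Karttunen spoke in Zurich in 2011.")` labels the person, the city and the year in a single scan of the string.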

Maximum Entropy Model for Disambiguation of Rich Morphological Tags

In this work we describe a statistical morphological tagger for the Latvian, Lithuanian and Estonian languages based on morphological tag disambiguation. These languages have rich tagsets and very high rates of morphological ambiguity. We model the distribution of possible tags with an exponential probabilistic model, which allows features from the surrounding context to be selected and used. Results show a significant improvement in error rates over the baseline, as was also the case for Czech. In comparison with the simplified parameter estimation method applied for Czech, we show that maximum entropy weight estimation achieves considerably better results.
Mārcis Pinnis, Kārlis Goba
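
An exponential (log-linear) model over candidate tags can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' tagger: the function name, the feature strings, the tag labels and the weights are all hypothetical, and weight estimation is not shown.

```python
import math

def tag_distribution(candidates, features, weights):
    """Return p(tag | context) over the candidate tags of one word.

    candidates: tags proposed by a morphological analyser (assumed input)
    features:   context features that are active for this word
    weights:    dict mapping (feature, tag) to a learned weight
    """
    # Score of a tag = sum of the weights of its active features.
    scores = {
        tag: sum(weights.get((feat, tag), 0.0) for feat in features)
        for tag in candidates
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {tag: math.exp(score - top) for tag, score in scores.items()}
    norm = sum(exps.values())
    return {tag: value / norm for tag, value in exps.items()}
```

Disambiguation then amounts to picking the candidate tag with the highest probability, for example `max(dist, key=dist.get)`.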

Non-canonical Inflection: Data, Formalisation and Complexity Measures

Non-canonical inflection (suppletion, deponency, heteroclisis, etc.) is extensively studied in theoretical approaches to morphology. However, these studies often lack practical implementations associated with large-scale lexica. Yet these are precisely the requirements for objective comparative studies on the complexity of morphological descriptions. We show how a model of inflectional morphology which can represent many non-canonical phenomena [67], as well as a formalisation and an implementation thereof, can be used to evaluate the complexity of competing morphological descriptions. After illustrating the properties of the model with data about French, Latin, Italian, Persian and Sorani Kurdish verbs and about noun classes from Croatian and Slovak, we present experiments conducted on the complexity of four competing descriptions of French verbal inflection. The complexity is evaluated using the information-theoretic concept of description length. We show that the new concepts introduced in the model by [67] reduce the complexity of morphological descriptions with respect to both traditional and more recent models.
Benoît Sagot, Géraldine Walther
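
As a rough illustration of description length, the sketch below computes the Shannon code length of a symbol sequence under its own empirical distribution. The abstract does not specify how the descriptions are encoded, so this encoding and the function name are assumptions; it only shows the general principle that a more regular description costs fewer bits.

```python
import math
from collections import Counter

def description_length(symbols):
    """Bits needed to encode `symbols` with an ideal code
    built from the empirical frequency of each symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    # Each occurrence of a symbol with relative frequency p costs -log2(p) bits.
    return -sum(n * math.log2(n / total) for n in counts.values())
```

Under this toy measure, a sequence that reuses a single symbol costs nothing, while a sequence that alternates between two equally frequent symbols costs one bit per symbol.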

A User-Oriented Approach to Evaluation and Documentation of a Morphological Analyzer

This article describes a user-oriented approach to evaluating and extensively documenting a morphological analyzer with a view to the normative descriptions of ISO and EAGLES. While current state-of-the-art work in this field often describes task-based evaluation, our users (presumably NLP non-experts, anonymously using the tool as part of a web service) expect extensive documentation of the tool itself, the test suite that was used to validate it and the results of the validation process. ISO and EAGLES offer a good starting point when attempting to find attributes that are to be evaluated. The documentation introduced in this article describes the analyzer in a way comparable to others by defining its features as attribute-value pairs (encoded in DocBook XML). Furthermore, the evaluation itself and its results are described. All documentation and the created test suites are online and free to use: http://www.ims.uni-stuttgart.de/projekte/dspin
Gertrud Faaß

HFST—Framework for Compiling and Applying Morphologies

HFST (Helsinki Finite-State Technology, hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. HFST also provides an opportunity to exchange transducers between different software providers in order to get the best out of each finite-state library.
Krister Lindén, Erik Axelson, Sam Hardwick, Tommi A. Pirinen, Miikka Silfverberg

Morphology to the Rescue Redux: Resolving Borrowings and Code-Mixing in Machine Translation

In the IBM LMT machine translation system, derivational morphological rules recognize and analyze words that are not found in its source lexicons, and generate default transfers for these unlisted words. Unfound words with no inflectional or derivational affixes are by default nouns. These rules are now expanded to provide lexical coverage of a particular set of words created on the fly in emails by bilingual Spanish-English speakers. What characterizes the approach is the generation of additional default parts of speech, and the use of morphological, semantic, and syntactic features from both source and target lexicons for analysis and transfer. A built-in rule-based strategy to handle language borrowing and code-mixing allows for the recognition of words with variable and unpredictable frequency of occurrence, which would otherwise remain unfound and thus affect the accuracy of parsing and the quality of translation output.
Esmé Manandise, Claudia Gdaniec

A Lexical Database for Modern Standard Arabic Interoperable with a Finite State Morphological Transducer

Current Arabic lexicons, whether computational or otherwise, make no distinction between entries from Modern Standard Arabic (MSA) and Classical Arabic (CA), and tend to include obsolete words that are not attested in current usage. We address this problem by building a large-scale, corpus-based lexical database that is representative of MSA. We use an MSA corpus of 1,089,111,204 words, a pre-annotation tool, machine learning techniques, and knowledge-based templatic matching to automatically acquire and filter lexical knowledge about morpho-syntactic attributes and inflection paradigms. Our lexical database is scalable, interoperable and suitable for constructing a morphological analyser, regardless of the design approach and programming language used. The database is formatted according to the international ISO standard in lexical resource representation, the Lexical Markup Framework (LMF). This lexical database is used in developing an open-source finite-state morphological processing toolkit. We build a web application, AraComLex (Arabic Computer Lexicon), for managing and curating the lexical database.
Mohammed Attia, Pavel Pecina, Antonio Toral, Lamia Tounsi, Josef van Genabith

Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus

This paper describes a robust finite-state morphology tool for Indonesian (MorphInd), which handles both morphological analysis and lemmatization for a given surface word form so that it is suitable for further language processing. MorphInd has wider coverage of Indonesian derivational and inflectional morphology than an existing Indonesian morphological analyzer [1], along with a more detailed tagset. MorphInd outputs the analysis in the form of segmented morphemes along with the morphological tags. The implementation was done using finite-state technology by adopting the two-level morphology approach implemented in Foma. It achieved 84.6% coverage on a preliminary-stage Indonesian corpus, where it mostly fails to capture proper nouns and foreign words, as initially expected.
Septina Dian Larasati, Vladislav Kuboň, Daniel Zeman

Morphology Generation for Swiss German Dialects

Most work in natural language processing is geared towards written, standardized language varieties. In this paper, we present a morphology generator that is able to handle continuous linguistic variation, as it is encountered in the dialect landscape of German-speaking Switzerland. The generator derives inflected dialect forms from Standard German input. Besides generation of inflectional affixes, this system also deals with the phonetic adaptation of cognate stems and with lexical substitution of non-cognate stems. Most of its rules are parametrized by probability maps extracted from a dialectological atlas, thereby providing a large dialectal coverage.
Yves Scherrer

Backmatter
