
2009 | Book


Aspects of Natural Language Processing

Essays Dedicated to Leonard Bolc on the Occasion of His 75th Birthday

Editors: Małgorzata Marciniak, Agnieszka Mykowiecka

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

For many years Leonard Bolc has played an important role in the Polish computer science community. He is especially known for his clear vision in the development of artificial intelligence and for his inspiring research, organizational, and editorial achievements in areas such as logic, automatic reasoning, natural language processing, and computer applications of natural language or human-like reasoning.

This Festschrift volume, published to honor Leonard Bolc on his 75th birthday, includes 17 refereed papers by leading researchers, his friends, former students, and colleagues to celebrate his scientific career. The essays present research in the areas which Leonard Bolc and his colleagues investigated during his long scientific career.

The volume is organized in three parts; the first is devoted to logic - the domain which was one of the most explored by Leonard Bolc himself. The second part contains papers focusing on different aspects of computational linguistics; the third part comprises papers describing different applications in which natural language processing or automatic reasoning plays an important role.

Frontmatter

Logic

Frontmatter

Wisdom Technology: A Rough-Granular Approach

Abstract

We discuss foundations for modern intelligent systems in the framework of Wisdom Technology (Wistech). The approach is based on the rough-granular approach.

Andrzej Jankowski, Andrzej Skowron

Paraconsistent Reasoning with Words

Abstract

Fuzzy logics are among the most frequent approaches to modeling uncertainty and vagueness. In the case of fuzzy modeling, degrees of belief and disbelief sum up to 1, which causes problems in modeling the lack of knowledge and inconsistency. Therefore, so-called paraconsistent intuitionistic fuzzy sets have been introduced, where the degrees of belief and disbelief are not required to sum up to 1. A sum smaller than 1 reflects a lack of knowledge, and a sum greater than 1 models inconsistency.

In many applications there is a strong need to guide and interpret fuzzy-like reasoning using qualitative approaches. To achieve this goal in the presence of uncertainty, lack of knowledge and inconsistency, we provide a framework for qualitative interpretation of the results of fuzzy-like reasoning by labeling numbers with words, like true, false, inconsistent, unknown, reflecting truth values of a suitable, usually finitely valued logical formalism.

Alicja S. Szalas, Andrzej Szałas
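
As a rough illustration of the idea in the abstract above, the following Python sketch labels a pair of belief and disbelief degrees with one of the four words. It is not the authors' formalism: the function name `label`, the tolerance `eps`, and the tie-breaking rule between "true" and "false" are assumptions made only for this example.

```python
def label(belief: float, disbelief: float, eps: float = 0.1) -> str:
    """Map a (belief, disbelief) pair to a qualitative word.

    Hypothetical rule, chosen for illustration only:
    - sum clearly above 1  -> "inconsistent"
    - sum clearly below 1  -> "unknown" (lack of knowledge)
    - otherwise            -> "true" or "false", whichever degree dominates
    """
    if not (0.0 <= belief <= 1.0 and 0.0 <= disbelief <= 1.0):
        raise ValueError("degrees must lie in [0, 1]")
    total = belief + disbelief
    if total > 1.0 + eps:
        return "inconsistent"
    if total < 1.0 - eps:
        return "unknown"
    # Ties are resolved in favor of "true" (an arbitrary assumption).
    return "true" if belief >= disbelief else "false"


if __name__ == "__main__":
    for pair in [(0.9, 0.1), (0.1, 0.9), (0.9, 0.8), (0.1, 0.2)]:
        print(pair, "->", label(*pair))
```

In classical fuzzy modeling only the last branch would ever be reached, because the two degrees always sum to 1; the first two branches are what the relaxed sum makes expressible.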

Language

Frontmatter

On the Root-Based Lexicon for Polish

Abstract

In this paper we present the concept of an electronic lexicon based on morphological roots. The idea of the root-based lexicon returns to the traditional linguistic division of a word into a stem and an inflectional suffix. The only difference from the pure linguistic description is that an electronic resource must adapt to the analyzed text. We assume that the lexicon will be used in written text analysis (or synthesis); therefore we operate on grapheme objects.

We used the lexicon of the inflectional analyzer AMOR as the empirical foundation for the root-based lexicon. In the second part of the paper we describe the process of the automatic conversion of the data from the analyzer into the assumed format. The conversion concerns the major inflecting parts of speech: nouns, adjectives and verbs. The results are two-level morphology based entries which bear the whole package of morphological information about lexemes. In the presented form, however, no generalization about Polish inflection or inner root alternations is available. Thus, we rebuilt the lexicon of roots. As a result we obtained a compressed lexicon which can serve not only for inflection analysis but also for applications of word-formation descriptions.

Joanna Rabiega-Wiśniewska

Representation of Uzbek Morphology in Prolog

Abstract

In the paper we address issues related to the morphology of the Uzbek language. In Uzbek, as in many other agglutinative languages, some single text-words correspond to sentences in non-agglutinative languages. Morphological processing is therefore a crucial operation in the automatic processing of Uzbek. We approach the theory of Uzbek morphology in terms of morphotactic and morphophonemic rules. We present the UZMORPP system of automatic morphological parsing for the Uzbek language. The Prolog implementation of this system is provided.

Gayrat Matlatipov, Zygmunt Vetulani

Inflection of Polish Multi-Word Proper Names with Morfeusz and Multiflex

Abstract

We discuss morphological properties of Polish multi-word proper names. We present a cooperating framework of two morphological tools: Morfeusz, a morphological analyser and generator for Polish simple words, and Multiflex, a cross-language morpho-syntactic generator of multi-word units. We discuss interface constraints required for the interoperability of these tools, and we show how the resulting platform allows one to describe the morpho-syntactic behaviour of some interesting examples of Warsaw multi-word toponyms.

Agata Savary, Joanna Rabiega-Wiśniewska, Marcin Woliński

A New Formal Definition of Polish Nominal Phrases

Abstract

In the paper, a new formal definition of Polish nominal phrases is presented. Based upon a certain formal grammar of Polish (FGP) that applies the formalism of metamorphosis grammar, it is the first step towards redesigning the entire grammar. It makes use of the results of experiments with implementation of the grammar. After a report on empirical data, a large set of parameters that formalize various grammatical features is introduced. Some of those parameters are entirely new; others are to be reinterpreted and improved. A number of rules are written down to illustrate the way empirical expressions are accounted for. The paper ends by formulating some postulates that the new version of FGP is expected to fulfil.

Marek Świdziński, Marcin Woliński

Morphosyntactic Constraints in the Acquisition of Linguistic Knowledge for Polish

Abstract

Many approaches to the construction of language tools and acquisition of linguistic knowledge from corpora assume the application of some robust shallow parser. Construction of such a parser is difficult in the case of inflective languages with relaxed word order like Polish. The goal of the work presented here is to analyse the extent of knowledge that can be expressed in the form of morphosyntactic constraints referring to morphological properties of word forms, and its applications in the automatic extraction of syntactic and semantic knowledge. Basic properties of an extended version of the language of morphosyntactic constraints called JOSKIPI are briefly presented. The application of morphosyntactic constraints as background knowledge for extraction of disambiguation rules for Polish is discussed. A new approach to extraction of lexical semantic relations is presented: it relies on the constraints in identifying lexico-morphosyntactic dependencies among word forms in the text. Finally, a combination of the constraints and statistical analysis in the acquisition of multiword expressions is outlined.

Maciej Piasecki, Adam Radziszewski

Towards the Automatic Acquisition of a Valence Dictionary for Polish

Abstract

This article presents the evaluation of a valence dictionary for Polish produced with the help of shallow parsing techniques and compares those results to earlier results involving deep parsing. We show that the valence dictionary obtained with the use of shallow parsing attains higher quality when it is measured on the basis of a corpus of valence frames, while the dictionary produced with the help of deep parsing seems superior when the results are compared to existing valence dictionaries.

Adam Przepiórkowski

Semantic Annotation of Verb Arguments in Shallow Parsed Polish Sentences by Means of the EM Selection Algorithm

Abstract

The ultimate goal of our work is to extend a syntactic valence dictionary of Polish verbs by adding some semantic information to verb arguments. This information consists of wordnet semantic categories of words. In order to provide syntactic slots of dictionary entries with lists of appropriate semantic categories of corresponding nouns, we need a treebank with all nouns semantically annotated with such categories, as both syntactic (i.e., argument structure) and semantic information is required.

We aim here at Word Sense Disambiguation (WSD). To solve this task for our specific application, we adapt the EM selection algorithm developed for extraction of syntactic valence frames.

In the paper, the whole process of data processing is shown. The main focus is put on the WSD task. Three versions of the EM selection algorithm are presented: the original one and its two modifications. Finally, the evaluation and comparison of the algorithms are performed.

Elżbieta Hajnicz

Adjectives: Constructions vs. Valence

Abstract

The paper approaches adjectives in French and Polish from two perspectives: a linguistic description and an automatic text analysis. In particular, we aim at specifying adjective valence and distinguish it from components with which they occasionally occur in various syntactic constructions. Then, we apply linguistic knowledge to annotated data and automatically extract valence lexicons for adjectives. For French, a richly annotated treebank is available whereas the Polish corpus we use currently contains only morphosyntactic information. The paper focuses on results obtained for French as valence extraction for Polish requires additional data processing.

Anna Kupść

Applications

Frontmatter

User-Centered Design for a Voice Portal

Abstract

After a brief overview of voice portal technology, with special attention paid to Polish, we discuss some aspects of user-centered design and its influence on the usability of the proposed solution. We describe the issues of voice portal preparation, using the example of the Warsaw city transportation hotline, and the main components of the voice portal. The system is effective and supports users in their needs: about 30% of users complete their requests through the automated system without talking to human operators.

Krzysztof Marasek, Łukasz Brocki, Danijel Koržinek, Krzysztof Szklanny, Ryszard Gubrynowicz

Speech Understanding System SUSY—A New Version of the Speech Synthesis Program

Abstract

The method of digital synthesis of speech presented below was worked out for Warsaw University in order to implement the acoustic output of an automated telephone information office. The paper presents a program which implements microphonemic synthesis of speech designed by the team of Professor Leonard Bolc in the mid-1970s. We had to build a program that would generate continuous, understandable speech with prosodic features similar to those of natural language.

Jerzy Cytowski

Exploring Curvature-Based Topic Development Analysis for Detecting Event Reporting Boundaries

Abstract

In the era of proliferation of electronic news media and an ever-growing demand for prompt and concise information, natural language text processing technologies which map free texts into structured data formats are becoming paramount. Recently, we have witnessed the emergence of publicly accessible news aggregation systems for facilitating navigation through news. This paper reports on some explorations of refining a real-time news event extraction system, which runs on top of the Europe Media Monitoring news aggregation system developed at the Joint Research Centre of the European Commission. Our experiments focus on the task of detecting new events in a given news story, i.e. tagging events extracted by the core event extraction system as new. Several methods are explored, ranging from simple similarity computation of event descriptions of adjacent events to more elaborate ones based on curvature-based topic development analysis, which utilize global knowledge. The paper first describes the particularities of the real-time news event extraction processing chain. Next, in order to get a better insight into how news stories evolve over time, some statistics on event dynamics are presented. Finally, the new event detection techniques are introduced and the results of the evaluation are given.

Jakub Piskorski
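
The simplest family of methods mentioned in the abstract above, comparing the descriptions of adjacent events, might be sketched as follows. This is only an illustration and not the system described in the paper: the whitespace tokenization, the Jaccard measure, the threshold of 0.5, and the names `jaccard` and `tag_new_events` are all assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tag_new_events(descriptions, threshold=0.5):
    """Tag each event description as new (True) or not (False).

    An event counts as new when its description is less similar than
    `threshold` to the description of the immediately preceding event.
    The first event in a story is always tagged as new.
    """
    tags = []
    prev_tokens = None
    for text in descriptions:
        tokens = set(text.lower().split())
        is_new = prev_tokens is None or jaccard(tokens, prev_tokens) < threshold
        tags.append(is_new)
        prev_tokens = tokens
    return tags


if __name__ == "__main__":
    story = [
        "bomb blast kills five in city",
        "bomb blast kills six in city",
        "flood hits coastal town",
    ]
    print(tag_new_events(story))
```

Because only adjacent events are compared, this variant uses no global knowledge about the story, which is the limitation the curvature-based analysis in the abstract is meant to address.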

Domain Model for Medical Information Extraction—The LightMedOnt Ontology

Abstract

The paper describes the creation of a domain model for an Information Extraction (IE) application in the medical domain. First, we present the texts for which IE systems were created: mammography reports and diabetology patients’ discharge documents. The methodology and results of terminology extraction for both domains are described. Next, the main features and the upper part of LightMedOnt, a medical ontology in the OWL formalism, are presented. In the final part of the paper we discuss the relationships between OWL ontologies and the domain model of the IE system used for our experiments.

Agnieszka Mykowiecka, Małgorzata Marciniak

A Survey of Text Processing Tools for the Automatic Analysis of Molecular Sequences

Abstract

Automatic analysis of molecular sequences is an interdisciplinary field of science, with many analogies to the methodologies of analysis and understanding of natural languages. In both these fields the object of study has a complex, hierarchical character, which results from natural evolution. In this paper we have presented a survey of text processing algorithms with respect to their applications to molecular sequences. We have shown methods for solving problems of exact and approximate pattern searching in texts, aligning and block aligning of molecular sequences, and analyzing molecular sequences by using indexed structures and transformations. We have covered some recent developments in these fields and we have provided some examples of inferring biological knowledge by using text processing algorithms for molecular sequences.

Andrzej Polański, Rafał Pokrzywa, Marek Kimmel
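
To make "approximate searches of patterns in texts" from the abstract above concrete, here is a minimal Python sketch of one standard technique, a dynamic-programming search under edit distance. It is a generic textbook method and not necessarily one of the algorithms surveyed in the paper; the function name `approx_find` and the example sequences are hypothetical.

```python
def approx_find(pattern: str, text: str, k: int) -> list:
    """Return end positions (exclusive) in `text` where some substring
    matches `pattern` with at most `k` edits (insert, delete, substitute).

    Only one column of the edit-distance table is kept per text character.
    """
    m = len(pattern)
    # Column for the empty text prefix: matching i pattern chars costs i.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        # Row 0 is always 0 because a match may start at any text position.
        curr = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr.append(min(prev[i - 1] + cost,  # match or substitution
                            prev[i] + 1,         # extra character in text
                            curr[i - 1] + 1))    # pattern character skipped
        if curr[m] <= k:
            hits.append(j)
        prev = curr
    return hits


if __name__ == "__main__":
    print(approx_find("ACGT", "TTACGTTT", 0))  # exact search
    print(approx_find("ACGT", "AGGT", 1))      # one substitution allowed
```

The running time is proportional to the product of the pattern and text lengths, which is why indexed structures of the kind the abstract mentions are typically preferred for genome-scale data.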

Intelligent Decision Support: A Fuzzy Stock Ranking System

Abstract

This paper presents an intelligent decision support system for financial portfolio management. An adaptive business intelligence approach combines optimization, forecasting and adaptation with application specific financial information processing and quantitative investment paradigms.

The methodology involves constructing a ranking of stocks by strength of a buy or sell recommendation which is inferred using an adapting forecasting model that considers a range of factors. These include company balance sheet information, market price and trading volume as well as the wider economy. The system adjusts its prediction model dynamically as market conditions change. An evolving fuzzy rule base mechanism encodes a model of relationships between model factors and a recommendation to buy, sell or hold securities.

Adam Ghandar, Zbigniew Michalewicz, Ralf Zurbruegg

COLLANE: An Experiment in Computer-Mediated Tacit Collaboration

Abstract

We introduce COLLANE, an experimental collaborative analytic environment that allows a group of professional analysts to work together effectively on complex, multifaceted information problems. COLLANE has been developed to investigate innovative ways of harnessing the power of collaboration so as to maximize the quality of the analytical product while at the same time controlling for its hidden costs: bias, groupthink, compromise, suppression of dissent and individual initiative. The key innovation that we are advancing in this project is the concept of ubiquitous tacit collaboration enabled through computer-mediated information sharing between the participants. By design, tacit collaboration requires no extraneous effort from the users since the information exchange is both automatic and targeted to what each analyst is currently doing. It also requires no specific “engagement” with subject matter experts since their continuous virtual presence assures ubiquity of collaborative opportunities. In this paper we describe an initial prototype of COLLANE, explaining its basic functions and components.

Tomek Strzalkowski, Sarah Taylor, Samira Shaikh, Ben-Ami Lipetz, Hilda Hardy, Nick Webb, Tony Cresswell, Ting Liu, Min Wu, Yu Zhan, Song Chen

Backmatter

Title: Aspects of Natural Language Processing
Editors: Małgorzata Marciniak, Agnieszka Mykowiecka
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-04735-0
Print ISBN: 978-3-642-04734-3
DOI: https://doi.org/10.1007/978-3-642-04735-0
