
2005 | Book

Progress in Artificial Intelligence

12th Portuguese Conference on Artificial Intelligence, EPIA 2005, Covilhã, Portugal, December 5-8, 2005. Proceedings

Edited by: Carlos Bento, Amílcar Cardoso, Gaël Dias

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Chapter 1 – General Artificial Intelligence (GAIW 2005)

Frontmatter
Introduction

Across the various editions of EPIA, the scientific program has comprised invited lectures, tutorials, parallel workshops, and paper presentations. The success of the workshop format since its adoption by the conference motivated the organizers of the previous and current editions to generalize this model for scientific presentations, leaving the plenary sessions for invited lectures, tutorials, posters and panels.

As expected, although a significant number of workshops are accepted in each edition of EPIA, they do not cover all areas of AI. Another peculiarity of the workshop format is that the areas addressed differ substantially from one edition to another.

Carlos Bento, Amílcar Cardoso, Gaël Dias
Reducing Propositional Theories in Equilibrium Logic to Logic Programs

The paper studies reductions of propositional theories in equilibrium logic to logic programs under answer set semantics. Specifically, we are concerned with the question of how to transform an arbitrary set of propositional formulas into an equivalent logic program and what the complexity constraints on this process are. We want the transformed program to be equivalent in a strong sense, so that theory parts can be transformed independently of the wider context in which they might be embedded. It was only recently established [1] that propositional theories are indeed equivalent (in a strong sense) to logic programs. Here this result is extended with the following contributions. (i) We show how to effectively obtain an equivalent program starting from an arbitrary theory. (ii) We show that in general there is no polynomial time transformation if we require the resulting program to share precisely the vocabulary or signature of the initial theory. (iii) Extending previous work, we show how polynomial transformations can be achieved if one allows the resulting program to contain new atoms. The program obtained is still equivalent in a strong sense to the original theory, and the answer sets of the theory can be retrieved from it.

Pedro Cabalar, David Pearce, Agustín Valverde
Preference Revision Via Declarative Debugging

Preference criteria are rarely static. Often they are subject to modification and aggregation. The resulting preference criteria may not satisfy the properties of the original ones and must therefore be revised. This paper investigates the problem of revising such preference criteria by means of declarative debugging techniques.

Pierangelo Dell’Acqua, Luís Moniz Pereira
Revised Stable Models – A Semantics for Logic Programs

This paper introduces an original 2-valued semantics for Normal Logic Programs (NLP), which conservatively extends the Stable Model semantics (SM) to all normal programs. The distinction consists in the revision of one feature of SM, namely its treatment of odd loops, and of infinitely long support chains, over default negation. This single revised aspect, addressed by means of a Reductio ad Absurdum approach, affords a number of fruitful consequences, namely regarding existence, relevance and top-down querying, cumulativity, and implementation.

The paper motivates and defines the Revised Stable Models semantics (rSM), justifying and exemplifying it. Properties of rSM are given and contrasted with those of SM. Furthermore, these results apply to SM whenever odd loops and infinitely long chains over negation are absent, thereby establishing significant, previously unknown, properties of SM. Conclusions and further work close the paper.

Luís Moniz Pereira, Alexandre Miguel Pinto
Operational Semantics for DyLPs

For some years, theoretical research has faced the problem of how to represent and provide semantics for updates of logic programs. This problem is relevant for addressing highly dynamic domains with logic programming techniques. Two of the most recent results are the definition of the refined stable and the well-founded semantics for dynamic logic programs, which extend the stable model and well-founded semantics to the dynamic case. We present here alternative, although equivalent, operational characterizations of these semantics by program transformations into normal logic programs. The transformations provide new insights on the computational complexity of these semantics, a way to better understand the meaning of the update programs, and also a methodology for the implementation of these semantics. In this sense, the equivalence theorems in this paper constitute soundness and completeness results for the implementations of these semantics.

F. Banti, J. J. Alferes, A. Brogi
Case Retrieval Nets for Heuristic Lexicalization in Natural Language Generation

In this paper we discuss the use of Case Retrieval Nets, a particular memory model for implementing case-based reasoning solutions, for implementing a heuristic lexicalization module within a natural language generation application. We describe a text generator for fairy tales implemented using a generic architecture, and we present examples of how the Case Retrieval Net solves the lexicalization task.

Raquel Hervás, Pablo Gervás
Partially Parametric SVM

In this paper we propose a simple and intuitive method for constructing partially linear models and, in general, partially parametric models, using support vector machines for regression and, in particular, using regularization networks (splines). The results are more satisfactory than those for classical nonparametric approaches. The method is based on a suitable approach to selecting the kernel by relying on the properties of positive definite functions. No modification of the standard SVM algorithms is required, and the approach is valid for the ε-insensitive loss. The approach described here can be immediately applied to SVMs for classification and to other methods that use the kernel as the inner product.

José M. Matías
Adapting Hausdorff Metrics to Face Detection Systems: A Scale-Normalized Hausdorff Distance Approach

Template matching face detection systems are often used as a preliminary step in several biometric applications. These biometric applications, like face recognition or video surveillance systems, need the face detection step to be efficient and robust enough to achieve good results. One family of template matching face detection methods uses the Hausdorff distance in order to search for the part of the image most similar to a face. Although the Hausdorff distance yields very accurate results and low error rates, overall robustness can be increased if we adapt it to our concrete application. In this paper we show how to adjust Hausdorff metrics to face detection systems, presenting a scale-normalized Hausdorff distance based face detection system. Experiments show that our approach can perform accurate face detection even with complex backgrounds or varying light conditions.

Pablo Suau
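As a rough illustration of the distance underlying such template-matching detectors (a sketch, not the authors' implementation), the classical symmetric Hausdorff distance between two edge-point sets can be normalized by a template scale; the bare division by `scale` below is an assumption of this sketch:

```python
import math

def directed_hausdorff(A, B):
    # h(A, B) = max over a in A of the distance from a to its nearest b in B
    return max(min(math.dist(a, b) for b in B) for a in A)

def scale_normalized_hausdorff(A, B, scale):
    # Symmetric Hausdorff distance divided by a template scale so that
    # scores from search windows of different sizes are comparable.
    h = max(directed_hausdorff(A, B), directed_hausdorff(B, A))
    return h / scale

# Toy edge-point sets standing in for a face template and an image window.
edge_template = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
window = [(0.1, 0.0), (1.0, 0.1), (0.0, 0.9)]
score = scale_normalized_hausdorff(edge_template, window, scale=1.0)
```

A detector of this kind would slide the window over the image and keep the position (and scale) with the lowest score.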
Robust Real-Time Human Activity Recognition from Tracked Face Displacements

We are interested in the challenging scientific pursuit of how to characterize human activities in any formal meeting situation by tracking people’s positions with a computer vision system. We present a human activity recognition algorithm that works within the framework of CAMEO (the Camera Assisted Meeting Event Observer), a panoramic vision system designed to operate in real-time and in uncalibrated environments. Human activity is difficult to characterize within the constraints that the CAMEO must operate, including uncalibrated deployment and unmodeled occlusions. This paper describes these challenges and how we address them by identifying invariant features and robust activity models. We present experimental results of our recognizer correctly classifying person data.

Paul E. Rybski, Manuela M. Veloso

Chapter 2 – Affective Computing (AC 2005)

Frontmatter
Introduction

Almost forty years ago, Herbert Simon emphasised the role of emotions in problem solving. Nevertheless, until recently, research on intelligent systems has traditionally focused on the development of theories and techniques mostly inspired by what were considered the “rational” aspects of human behaviour.

But findings from neuroscience (such as those of Damásio and LeDoux) and psychology, suggesting that emotions are a leading part of what is considered intelligent behaviour, have brought the role of emotions into the limelight. Furthermore, the work of R. Picard and the creation of the area of Affective Computing have provided the right frame for researching and developing new intelligent systems. Emotions can thus be considered not only as essential to problem solving techniques in intelligent systems, but also as allowing the construction of systems that interact with humans in a more natural and human-like manner. Also, the increasing attention given to agent-oriented programming makes the enhancement of agent deliberation on the grounds of both rationality and emotionality all the more relevant.

Ana Paiva, Carlos Martinho, Eugénio de Oliveira
Adaptation and Decision-Making Driven by Emotional Memories

The integration between emotion and cognition can provide an important support for adaptation and decision-making under resource-bounded conditions, typical of real-world domains. The ability to adjust cognitive activity and to take advantage of emotion-modulated memories are two main aspects resulting from that integration. In this paper we address those issues under the framework of the agent flow model, describing the formation of emotional memories and the regulation of their use through attention focusing. Experimental results from simulated rescue scenarios show how the proposed approach enables effective decision making and fast adaptation rates in completely unknown environments.

Luís Morgado, Graça Gaspar
Affective Revision

Moods and emotions influence human reasoning, most of the time in a positive way. One aspect of reasoning is the revision of beliefs, i.e., how to change a set of beliefs in order to incorporate new information that conflicts with the existing beliefs. We incorporate two influences of affective states on belief maintenance identified by psychologists into an AI belief revision operation. On the one hand, we present an alternative to conventional Belief Revision, Affective Revision, which determines the preference between new and old information based on the mood of the agent revising its beliefs. On the other hand, we show how beliefs can be automatically ordered, in terms of resistance to change, based on (among other aspects) the influence of emotion anticipations on the strength of beliefs.

César F. Pimentel, Maria R. Cravo
Feeling and Reasoning: A Computational Model for Emotional Characters

Interactive virtual environments (IVEs) are now seen as an engaging new way by which children learn experimental sciences and other disciplines. These environments are populated by synthetic characters that guide and stimulate the children's activities. In order to build such environments, one needs to address the problem of how to achieve believable and empathic characters that act autonomously. Inspired by the work of traditional character animators, this paper proposes an architectural model for building autonomous characters in which the agent's reasoning and behaviour are influenced by its emotional state and personality. We performed a small case evaluation, with positive results, to determine whether the characters evoke empathic reactions in the users.

João Dias, Ana Paiva

Chapter 3 – Artificial Life and Evolutionary Algorithms (ALEA 2005)

Frontmatter
Introduction

In this part we present the accepted communications to ALEA'05, which took place at the University of Covilhã, Portugal, on 5-8 December 2005. ALEA'05 was the second workshop on Artificial Life and Evolutionary Algorithms, organised as part of EPIA'05 (Portuguese Conference on Artificial Intelligence). ALEA is an event targeted at the Artificial Life (ALife) and Evolutionary Algorithms (EA) communities and at researchers working at the crossing of these two areas.

To a certain extent, ALife and EA aim at goals similar to those of classical Artificial Intelligence (AI): to build computer-based intelligent solutions. The path, however, is different, since ALife and EA are more concerned with the study of simple, bottom-up, biologically inspired modular solutions. Research on computer-based bio-inspired solutions, possibly with emergent properties, and on biology as computation, may therefore be the global characterisation of this workshop.

Luís Correia, Ernesto Costa
Evolutionary Computation Approaches for Shape Modelling and Fitting

This paper proposes and analyzes different evolutionary computation techniques for conjointly determining a model and its associated parameters. The context of 3D reconstruction of objects by a functional representation illustrates the ability of the proposed approaches to perform this task using real data, a set of 3D points on or near the surface of the real object. The final recovered model can then be used efficiently in further modelling, animation or analysis applications. The first approach is based on multiple genetic algorithms that find the correct model and parameters by successive approximations. The second approach is based on a standard strongly-typed implementation of genetic programming. This study shows radical differences between the results produced by each technique on a simple problem, and points toward future improvements to join the best features of both approaches.

Sara Silva, Pierre-Alain Fayolle, Johann Vincent, Guillaume Pauron, Christophe Rosenberger, Christian Toinard
Reaction-Agents: First Mathematical Validation of a Multi-agent System for Dynamical Biochemical Kinetics

In the context of multi-agent simulation of biological complex systems, we present a reaction-agent model for biological chemical kinetics that enables interaction with the simulation during execution. In a chemical reactor with no spatial dimension (e.g. a cell), a reaction-agent represents an autonomous chemical reaction between several reactants: it reads the concentration of the reactants, adapts its reaction speed, and modifies the concentration of the reaction products accordingly. This approach, in which the simulation engine makes agents intervene in a chaotic and asynchronous way, is an alternative to the classical model based on differential systems, which is not relevant when the boundary conditions change. We establish formal proofs of convergence, generally quadratic, for our reaction-agent methods. We illustrate our model with an example on the extrinsic pathway of blood coagulation.

Pascal Redou, Sébastien Kerdelo, Christophe Le Gal, Gabriel Querrec, Vincent Rodin, Jean-François Abgrall, Jacques Tisseau
A Hybrid Classification System for Cancer Diagnosis with Proteomic Bio-markers

A number of studies have been performed with the objective of applying various artificial intelligence techniques to the prediction and classification of cancer specific biomarkers for use in clinical diagnosis. Most biological data, such as that obtained from SELDI-TOF (Surface Enhanced Laser Desorption and Ionization-Time Of Flight) MS (Mass Spectrometry), is high dimensional, and therefore requires dimension reduction in order to limit the computational complexity and cost. The DT (Decision Tree) is an algorithm which allows for the fast classification and effective dimension reduction of high dimensional data. However, it does not guarantee the reliability of the features selected by the process of dimension reduction. Another approach is the MLP (Multi-Layer Perceptron), which is often more accurate at classifying data, but is not suitable for the processing of high dimensional data. In this paper, we propose a novel approach, which is able to accurately classify prostate cancer SELDI data into normal and abnormal classes and to identify the potential biomarkers. In this approach, we first select those features that have excellent discrimination power by using the DT. These selected features constitute the potential biomarkers. Next, we classify the selected features into normal and abnormal categories by using the MLP; at this stage we repeatedly perform cross validation to evaluate the propriety of the selected features. In this way, the proposed algorithm can take advantage of both the DT and the MLP, by hybridizing these two algorithms. The experimental results demonstrate that the proposed algorithm is able to identify multiple potential biomarkers that enhance the confidence of diagnosis, also showing better specificity, sensitivity and learning error rates than other algorithms. The proposed algorithm represents a promising approach to the identification of proteomic patterns in serum that can distinguish cancer from normal or benign conditions, and is applicable to clinical diagnosis and prognosis.

Jung-Ja Kim, Young-Ho Kim, Yonggwan Won
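The DT-driven dimension-reduction step described above can be approximated in a few lines: rank every feature by the best information gain a single decision-tree split on it would achieve, and keep the top k as candidate biomarkers. This is an illustrative stand-in (the `select_biomarkers` helper and its greedy single-split scoring are assumptions, not the authors' algorithm); the retained columns would then be fed to an MLP:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a label list.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels, threshold):
    # Information gain of a binary split on one feature -- the criterion a
    # decision tree evaluates at each node.
    left = [y for x, y in zip(column, labels) if x <= threshold]
    right = [y for x, y in zip(column, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

def select_biomarkers(X, y, k):
    # Rank features by their best single-split information gain and keep
    # the top k column indices (a stand-in for DT-based feature selection).
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        best = (max(info_gain(col, y, t) for t in sorted(set(col))[:-1])
                if len(set(col)) > 1 else 0.0)
        scores.append((best, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]
```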
Intelligent Multiobjective Particle Swarm Optimization Based on AER Model

How to find a sufficient number of uniformly distributed and representative Pareto optimal solutions is very important for Multiobjective Optimization (MO) problems. An Intelligent Particle Swarm Optimization (IPSO) algorithm for MO problems is proposed based on the AER (Agent-Environment-Rules) model, in which competition and a clonal selection operator are designed to provide an appropriate selection pressure to propel the swarm population towards the Pareto-optimal front. An improved uniformity measure is applied to the approximation of the Pareto-optimal set. Simulations and comparison with NSGA-II and MOPSO indicate that IPSO is highly competitive.

Hong-yun Meng, Xiao-hua Zhang, San-yang Liu
A Quantum Inspired Evolutionary Framework for Multi-objective Optimization

This paper provides a new proposal that aims to solve multi-objective optimization problems (MOPs) using the quantum evolutionary paradigm. Three main features characterize the proposed framework. On the one hand, it exploits the quantum concept of superposition of states to derive a probabilistic representation encoding the vector of decision variables for a given MOP. The advantage of this representation is its ability to encode the entire population of potential solutions within a single chromosome, instead of considering only a gene pool of individuals as in classical evolutionary algorithms. On the other hand, specific quantum operators are defined in order to reward good solutions while maintaining diversity. Finally, an evolutionary dynamics is applied to these quantum-based elements to allow stochastic guided exploration of the search space. Experimental results show not only the viability of the method but also its ability to achieve a good approximation of the Pareto Front when applied to the multi-objective knapsack problem.

Souham Meshoul, Karima Mahdi, Mohamed Batouche
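The qubit encoding described above can be sketched as follows: each decision bit is held as a pair of amplitudes (alpha, beta), "observing" the chromosome samples one concrete binary solution, and a rotation operator nudges amplitudes toward a guiding non-dominated solution. The fixed rotation angle is an assumption of this sketch, not the paper's tuned schedule:

```python
import math
import random

def make_qubit_chromosome(n):
    # Equal superposition: |alpha|^2 = |beta|^2 = 1/2, so one chromosome
    # implicitly represents every n-bit solution.
    a = 1 / math.sqrt(2)
    return [(a, a) for _ in range(n)]

def observe(chromosome, rng=random):
    # Collapse to a concrete binary solution: bit j is 1 with prob beta_j^2.
    return [1 if rng.random() < beta * beta else 0 for _, beta in chromosome]

def rotate(chromosome, guide, delta=0.05):
    # Rotate each qubit toward a guiding (non-dominated) solution -- the
    # "reward good solutions" operator, with an assumed fixed angle.
    out = []
    for (alpha, beta), bit in zip(chromosome, guide):
        theta = delta if bit == 1 else -delta
        out.append((alpha * math.cos(theta) - beta * math.sin(theta),
                    alpha * math.sin(theta) + beta * math.cos(theta)))
    return out
```

Iterating observe / evaluate / rotate gives the stochastic guided exploration the abstract refers to.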

Chapter 4 – Building and Applying Ontologies for the Semantic Web (BAOSW 2005)

Frontmatter
Introduction

The emergence of the Semantic Web has marked another stage in the evolution of the ontology field. According to Berners-Lee, the Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. This cooperation can be achieved by using shared knowledge-components. Therefore ontologies have become a key instrument in developing the Semantic Web. They interweave human understanding of symbols with their machine processability.

This workshop addressed the problems of building and applying ontologies in the semantic web as well as the theoretical and practical challenges arising from different applications. We invited and received contributions that enhance the state-of-the-art of creating, managing and using ontologies. The workshop received high quality submissions, which were peer-reviewed by two or three reviewers.

H. Sofia Pinto, Andreia Malucelli, Fred Freitas, Christoph Tempich
A Database Trigger Strategy to Maintain Knowledge Bases Developed Via Data Migration

The mapping between databases and ontologies is an issue of importance for the creation of the Semantic Web. This is mainly due to the large amount of web data stored in databases. Our approach tackles the dynamic aspects of relational databases in knowledge bases. This solution is of particular interest for “ontology-driven” information systems equipped with inference functionality and which require synchronization with the legacy database.

Olivier Curé, Raphaël Squelbut
The SWRC Ontology – Semantic Web for Research Communities

Representing knowledge about researchers and research communities is a prime use case for distributed, locally maintained, interlinked and highly structured information in the spirit of the Semantic Web. In this paper we describe the publicly available ‘Semantic Web for Research Communities’ (SWRC) ontology, in which research communities and relevant related concepts are modelled. We describe the design decisions that underlie the ontology and report on both experiences with and known usages of the SWRC Ontology. We believe that for making the Semantic Web reality the re-usage of ontologies and their continuous improvement by user communities is crucial. Our contribution aims to provide a description and usage guidelines to make the value of the SWRC explicit and to facilitate its re-use.

York Sure, Stephan Bloehdorn, Peter Haase, Jens Hartmann, Daniel Oberle

Chapter 5 – Computational Methods in Bioinformatics (CMB 2005)

Frontmatter
Introduction

The Workshop on Computational Methods in Bioinformatics was held in Covilhã between the 5th and 8th December 2005, as part of the 12th Portuguese Conference on Artificial Intelligence.

The success of bioinformatics in recent years has been prompted by research in molecular biology and molecular medicine in initiatives like the human genome project. These initiatives gave rise to an exponential increase in the volume and diversification of data, including protein and gene data, nucleotide sequences and biomedical literature. The accumulation and exploitation of large-scale databases calls for new computational technology and for research into these issues. In this context, many widely successful computational models and tools used by biologists in these initiatives, such as clustering and classification methods for gene expression data, are based on artificial intelligence (AI) techniques. Hence, this workshop brought the opportunity to discuss applications of AI with an interdisciplinary character, exploring the interactions between sub-areas of AI and Bioinformatics.

Rui Camacho, Alexessander Alves, Joaquim Pinto da Costa, Paulo Azevedo
Protein Sequence Classification Through Relevant Sequence Mining and Bayes Classifiers

We tackle the problem of sequence classification using relevant subsequences found in a dataset of labelled protein sequences. A subsequence is relevant if it is frequent and has a minimal length. For each query sequence a vector of features is obtained. The features consist of the number and average length of the relevant subsequences shared with each of the protein families. Classification is performed by combining these features in a Bayes Classifier. The combination of these characteristics results in a multi-class and multi-domain method that is exempt from data transformation and background knowledge. We illustrate the performance of our method using three collections of protein datasets. The tests performed showed that the method has performance equivalent to state-of-the-art methods in protein classification.

Pedro Gabriel Ferreira, Paulo J. Azevedo
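A minimal sketch of the feature extraction described above, treating relevant subsequences as contiguous substrings (a simplification of the paper's mining step; the helper names are invented for illustration):

```python
from collections import Counter

def relevant_subsequences(sequences, min_len=3, min_freq=2):
    # All substrings of length >= min_len occurring in at least min_freq
    # sequences of a family -- a simplified notion of "frequent with
    # minimal length".
    counts = Counter()
    for seq in sequences:
        seen = set()
        for i in range(len(seq)):
            for j in range(i + min_len, len(seq) + 1):
                seen.add(seq[i:j])
        counts.update(seen)
    return {s for s, c in counts.items() if c >= min_freq}

def features(query, family_subseqs):
    # Per-family feature pair: how many relevant subsequences the query
    # shares with the family, and their average length.
    shared = [s for s in family_subseqs if s in query]
    if not shared:
        return (0, 0.0)
    return (len(shared), sum(map(len, shared)) / len(shared))
```

Computing one such pair per protein family yields the feature vector that the Bayes classifier then combines.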
CONAN: An Integrative System for Biomedical Literature Mining

The amount of information about the genome, transcriptome and proteome poses a problem for the scientific community: how to find the right information in a reasonable amount of time. Most research aiming to solve this problem, however, concentrates on a certain organism or a very limited dataset. Complementary to those algorithms, we developed CONAN, a system which provides a full-scale approach, tailored to experimentalists, designed to combine several information extraction methods and connect the outcome of these methods to gather novel information. Its methods include tagging of gene/protein names, finding interaction and mutation data, tagging of biological concepts, and linking to MeSH and Gene Ontology terms, all of which can be retrieved by querying the system. We present a full-scale approach that will ultimately cover all of PubMed/MEDLINE. We show that this universality has no effect on quality: our system performs as well as existing systems.

Rainer Malik, Arno Siebes
A Quantum Evolutionary Algorithm for Effective Multiple Sequence Alignment

This paper describes a novel approach to multiple sequence alignment (MSA). MSA is an essential task in bioinformatics which is at the heart of denser and more complex tasks in biological sequence analysis. The MSA problem still attracts researchers' attention despite the significant research effort spent to solve it. We propose in this paper a quantum evolutionary algorithm to improve the solutions given by the CLUSTALX package. The contribution consists in defining an appropriate representation scheme that allows quantum computing principles such as qubit representation and superposition of states to be applied successfully to the MSA problem. This representation scheme is embedded within an evolutionary algorithm, leading to an efficient hybrid framework which achieves a better balance between the exploration and exploitation capabilities of the search process. Experiments on a wide range of data sets have shown the effectiveness of the proposed framework and its ability to improve CLUSTALX's solutions by many orders of magnitude.

Souham Meshoul, Abdessalem Layeb, Mohamed Batouche
Hierarchical Multi-classification with Predictive Clustering Trees in Functional Genomics

This paper investigates how predictive clustering trees can be used to predict gene function in the genome of the yeast Saccharomyces cerevisiae. We consider the MIPS FunCat classification scheme, in which each gene is annotated with one or more classes selected from a given functional class hierarchy. This setting presents two important challenges to machine learning: (1) each instance is labeled with a set of classes instead of just one class, and (2) the classes are structured in a hierarchy; ideally the learning algorithm should also take this hierarchical information into account. Predictive clustering trees generalize decision trees and can be applied to a wide range of prediction tasks by plugging in a suitable distance metric. We define an appropriate distance metric for hierarchical multi-classification and present experiments evaluating this approach on a number of data sets that are available for yeast.

Jan Struyf, Sašo Džeroski, Hendrik Blockeel, Amanda Clare
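One plausible shape for such a distance metric (an illustrative sketch in the spirit of the paper, with an assumed weight constant, not the authors' exact definition) is a weighted Euclidean distance over 0/1 class vectors in which each label implies all of its ancestors and deeper classes receive smaller weights:

```python
import math

def class_vector(labels, hierarchy):
    # 0/1 vector over all classes; a label implies all its ancestors.
    # `hierarchy` maps each class to its parent (None at a root).
    active = set()
    for c in labels:
        while c is not None:
            active.add(c)
            c = hierarchy[c]
    return [1.0 if c in active else 0.0 for c in sorted(hierarchy)]

def depth(c, hierarchy):
    d = 0
    while hierarchy[c] is not None:
        c = hierarchy[c]
        d += 1
    return d

def hier_distance(l1, l2, hierarchy, w0=0.75):
    # Disagreement on a deep class costs less than one near the root
    # (weight w0**depth); w0 = 0.75 is an assumed constant.
    v1, v2 = class_vector(l1, hierarchy), class_vector(l2, hierarchy)
    return math.sqrt(sum((w0 ** depth(c, hierarchy)) * (a - b) ** 2
                         for c, a, b in zip(sorted(hierarchy), v1, v2)))
```

Under this metric, two genes annotated with sibling classes deep in the hierarchy are closer than two genes annotated in different top-level branches.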

Chapter 6 – Extracting Knowledge from Databases and Warehouses (EKDB&W 2005)

Frontmatter
Introduction

The objective of the 2005 EKDB&W – Extracting Knowledge from Databases and Warehouses – workshop was to attract contributions related to methods for the nontrivial extraction of information from data. This book of proceedings includes 10 selected papers (each resulting from 3 reviews). We believe that the diversity of these papers illustrates the attainment of the EKDB&W objective.

Unsupervised Learning (clustering methods in particular) was addressed in 3 papers: (i) an extension of traditional SOM was proposed which considers specific distance measures for categorical attributes; (ii) an empirical ranking of information criteria was provided for determining the number of clusters when dealing with mixed attributes in Latent Segments Models; (iii) CLOPE was found particularly useful for binary basket data and provided the means to define web user group profiles. Supervised Learning was addressed in 3 papers: (i) multi-output nonparametric regression methods were presented, comparing alternative ways to integrate co-response observations; (ii) peepholing techniques were adapted for regression trees, providing means to reduce the number of continuous variables and the ranges considered for node splitting; (iii) a Multi-Layer Perceptron was used to classify vector structures derived using Laws' algorithm. Three papers address the issue of data and knowledge extraction dealing directly with databases and data warehouses: (i) a methodology to evaluate the quality of meta-data describing contents in web portals was proposed; (ii) a new approach to retrieve data from semi-structured text files and integrate them in a decision support system was proposed; (iii) an alternative approach for itemset mining over large transactional tables was presented. Finally, an alternative approach to the L* algorithm was proposed, trying to diminish the needless repetition of membership queries.

Application domains were very diverse and illustrated the practical utility of the presented methodologies. They ranged from web services and retail to the treatment of sea surface data, space weather and spacecraft data. Papers in these areas hopefully contributed to bridging the gap between research and practice. The EKDB&W workshop would not have been possible without the contributions of the authors, the Program Committee members and the EPIA 2005 organizers. All deserve our thanks and appreciation.

João Gama, João Moura-Pires, Margarida Cardoso, Nuno Cavalheiro Marques, Luís Cavique
Multi-output Nonparametric Regression

Several non-parametric regression methods with multiple, possibly related, dependent variables are explored. The techniques that produce the best results in the simulations are those that incorporate the observations of the other response variables in the estimator. Compared to analogous single-response techniques, this approach results in a significant reduction in the quadratic error of the response.

José M. Matías
Adapting Peepholing to Regression Trees

This paper presents an adaptation of the peepholing method to regression trees. Peepholing was described by Catlett [3] as a means to overcome the major computational bottleneck of growing classification trees. This method involves two major steps: shortlisting and blinkering. The former has the goal of eliminating some continuous variables from consideration when growing the tree, while the latter tries to restrict the range of values of the remaining continuous variables that should be considered when searching for the best cut-point split. Both are effective means of overcoming the most costly step of growing tree-based models: sorting the values of the continuous variables before selecting their best split. In this work we describe the adaptations that are necessary to use this method within regression trees. The major adaptations involve developing means to obtain biased estimates of the criterion used to select the best split in these models. We present some preliminary experiments that show the effectiveness of our proposal.

Luis Torgo, Joana Marques
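The blinkering step can be sketched as follows: estimate the best cut point on a cheap sample, then evaluate only the candidate cut points that fall within a margin of it. The fixed fractional margin below is an assumption of this sketch; Catlett derives the blinkers from confidence bounds:

```python
def best_split_sse(xs, ys, candidates):
    # Pick the cut point minimizing the total sum of squared errors of the
    # two resulting partitions -- the standard regression-tree criterion.
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    return min(candidates,
               key=lambda t: sse([y for x, y in zip(xs, ys) if x <= t]) +
                             sse([y for x, y in zip(xs, ys) if x > t]))

def blinkered_candidates(xs, sample_best, margin=0.2):
    # Blinkering: keep only cut points within a margin of the best split
    # estimated on a cheap sample (margin = assumed fraction of the range).
    span = (max(xs) - min(xs)) * margin
    return [x for x in sorted(set(xs)) if abs(x - sample_best) <= span]
```

The full search then sorts and scores only the blinkered candidates, which is where the savings come from.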
An Extension of Self-organizing Maps to Categorical Data

Self-organizing maps (SOM) have been recognized as a powerful tool in data exploration, especially for the task of clustering high dimensional data. However, clustering categorical data is still a challenge for SOM. This paper aims to extend the standard SOM to handle feature values of categorical type. A batch SOM algorithm (NCSOM) is presented concerning the dissimilarity measure and the update method of map evolution for both numeric and categorical features simultaneously.

Ning Chen, Nuno C. Marques
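A minimal sketch of a mixed-type dissimilarity of the kind such an extension needs: squared Euclidean distance on numeric features plus simple mismatch counting on categorical ones. This is a generic measure, not necessarily NCSOM's exact formula, and in practice numeric features should be standardized first.

```python
def mixed_distance(x, w, numeric_idx, categorical_idx):
    """Squared Euclidean distance on numeric features plus
    a 0/1 mismatch penalty on categorical features."""
    d = 0.0
    for i in numeric_idx:
        d += (x[i] - w[i]) ** 2          # numeric part (assumes scaled data)
    for i in categorical_idx:
        d += 0.0 if x[i] == w[i] else 1.0  # categorical mismatch
    return d

def best_matching_unit(x, units, numeric_idx, categorical_idx):
    """Index of the map unit (prototype) closest to sample x."""
    return min(range(len(units)),
               key=lambda j: mixed_distance(x, units[j],
                                            numeric_idx, categorical_idx))

# toy example: two prototype units, features = (age, colour)
units = [(20.0, "red"), (60.0, "blue")]
sample = (25.0, "red")
print(best_matching_unit(sample, units, [0], [1]))
```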
Programming Relational Databases for Itemset Mining over Large Transactional Tables

Most itemset mining approaches are memory-based and run outside of the database. When dealing with a data warehouse, however, tables are far too large to be copied into memory, and a pure SQL-like approach is quite inefficient. Existing implementations rarely take advantage of database programming, even though RDBMS vendors offer many features for controlling and managing data. We propose a pattern-growth mining approach, implemented by means of database programming, for finding all frequent itemsets. The main idea is to avoid one-at-a-time record retrieval from the database, saving both copying and process context switching, as well as expensive joins and table reconstruction. Our empirical evaluation shows that the approach runs competitively with the best-known SQL-based itemset mining implementations. The performance evaluation was carried out with SQL Server 2000 (v.8) and T-SQL over several synthetic datasets.

Ronnie Alves, Orlando Belo
Using a More Powerful Teacher to Reduce the Number of Queries of the L* Algorithm in Practical Applications

In this work we propose to use a more powerful teacher to effectively apply query learning algorithms to identify regular languages in practical, real-world problems. More specifically, we define a more powerful set of replies to the membership queries posed by the L* algorithm that reduces the number of such queries by several orders of magnitude in a practical application. The basic idea is to avoid the needless repetition of membership queries in cases where the reply will be negative as long as a particular condition is met by the string in the membership query. We present an example of the application of this method to a real problem, that of inferring a grammar for the structure of technical articles.

André L. Martins, H. Sofia Pinto, Arlindo L. Oliveira
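The abstract's core idea — answering membership queries negatively without consulting the expensive teacher whenever the string violates a known necessary condition — can be sketched as a wrapper around the oracle. Class and function names here are hypothetical, chosen only for illustration.

```python
class FilteringTeacher:
    """Wraps an expensive membership oracle. Strings failing a known
    necessary condition are rejected immediately, so the oracle is
    consulted only for plausible strings."""
    def __init__(self, oracle, necessary_condition):
        self.oracle = oracle
        self.ok = necessary_condition
        self.oracle_calls = 0

    def member(self, s):
        if not self.ok(s):
            return False          # cheap negative reply, oracle untouched
        self.oracle_calls += 1
        return self.oracle(s)

# toy target language: even-length strings over {a, b} starting with 'a';
# the teacher knows starting with 'a' is a necessary condition
oracle = lambda s: len(s) % 2 == 0 and s.startswith("a")
teacher = FilteringTeacher(oracle, lambda s: s.startswith("a"))

queries = ["", "a", "ab", "ba", "bb", "abab", "baba"]
answers = [teacher.member(q) for q in queries]
print(answers, teacher.oracle_calls)
```

Of the seven queries, only three reach the oracle; in an L* run the same filtering compounds across thousands of membership queries.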
User Group Profile Modeling Based on User Transactional Data for Personalized Systems

In this paper, we propose a framework named UMT (User-profile Modeling based on Transactional data) for modeling user group profiles from transactional data. UMT is a generic framework for application systems that keep the historical transactions of their users. In UMT, user group profiles consist of three types of attributes: basic information attributes, synthetic attributes and probability distribution attributes. User profiles are constructed by clustering user transaction data and integrating cluster attributes with domain information extracted from application systems and other external data sources. These characteristics make UMT suitable for the personalization of transaction-based commercial application systems. A case study is presented to illustrate how to use UMT to create a personalized tourism system capable of using domain information in intelligent ways and of reacting to external events.

Yiling Yang, Nuno C. Marques
Retail Clients Latent Segments

Latent Segments Models (LSM) are commonly used as an approach for market segmentation. When using LSM, several criteria are available to determine the number of segments, but it is not established which criteria are more adequate for a specific application. Since most market segmentation problems involve the simultaneous use of categorical and continuous base variables, it is particularly useful to select the best criteria for LSM with mixed-type base variables. We first present an empirical test, which ranks several information criteria for model selection based on ten mixed data sets. As a result, the ICL-BIC, BIC, CAIC and ${\mathcal L}$ criteria are selected as the best performing criteria in the estimation of mixed mixture models. We then present an application concerning the segmentation of a retail chain's clients. The best information criteria yield two segments: Preferential Clients and Occasional Clients.

Jaime R. S. Fonseca, Margarida G. M. S. Cardoso
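The standard BIC and CAIC formulas mentioned in the abstract are easy to state; a minimal sketch of criterion-based selection of the number of segments follows. The log-likelihood values are hypothetical, and the paper's ICL-BIC and ${\mathcal L}$ criteria are not reproduced here.

```python
import math

def bic(loglik, k, n):
    # Bayesian Information Criterion: -2 log L + k log n (lower is better)
    return -2.0 * loglik + k * math.log(n)

def caic(loglik, k, n):
    # Consistent AIC: -2 log L + k (log n + 1)
    return -2.0 * loglik + k * (math.log(n) + 1.0)

def select_segments(fits, n, criterion=bic):
    """fits: {num_segments: (loglik, num_free_params)}.
    Returns the number of segments minimizing the criterion."""
    return min(fits, key=lambda s: criterion(*fits[s], n))

# hypothetical fitted mixture models with 1..4 segments, n = 300 clients
fits = {1: (-1450.0, 5), 2: (-1380.0, 11), 3: (-1372.0, 17), 4: (-1370.0, 23)}
print(select_segments(fits, n=300))
```

The penalty term `k log n` makes the criterion prefer the two-segment model here even though three and four segments fit slightly better, which is exactly the trade-off the compared criteria formalize.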
Automatic Detection of Meddies Through Texture Analysis of Sea Surface Temperature Maps

A new machine learning approach is presented for automatic detection of Mediterranean water eddies from sea surface temperature (SST) maps of the Atlantic Ocean. A pre-processing step uses Laws’ convolution kernels to reveal microstructural patterns of water temperature. Given a map point, a numerical vector containing information on local structural properties is generated. This vector is forwarded to a multi-layer perceptron classifier that is trained to recognise texture patterns generated by positive and negative instances of eddy structures. The proposed system achieves high recognition accuracy with fast and robust learning results over a range of different combinations of statistical measures of texture properties. Detection results are characterised by a very low rate of false positives. The latter is particularly important since meddies occupy only a small portion of the SST map area.

Marco Castellani, Nuno C. Marques
Monitoring the Quality of Meta-data in Web Portals Using Statistics, Visualization and Data Mining

We propose a methodology to monitor the quality of the meta-data used to describe content in web portals. It is based on the analysis of the meta-data using statistics, visualization and data mining tools. The methodology enables the site’s editor to detect and correct problems in the description of contents, thus improving the quality of the web portal and the satisfaction of its users. We also define a general architecture for a platform to support the proposed methodology. We have implemented this platform and tested it on a Portuguese portal for management executives. The results validate the methodology proposed.

Carlos Soares, Alípio Mário Jorge, Marcos Aurélio Domingues
A Real Time Data Extraction, Transformation and Loading Solution for Semi-structured Text Files

For the past decades, users of Space applications have relied on custom-developed software tools capable of addressing short-term necessities during critical Spacecraft control periods. Advances in computing power and storage solutions have made possible the development of innovative decision support systems capable of providing high-quality integrated data to both near-real-time and historical data analysis applications. This paper describes the implementation of a new approach: a distributed and loosely coupled solution capable of extracting, transforming and loading relevant real-time and historical Space Weather and Spacecraft data from semi-structured text files into an integrated space-domain decision support system. The described solution takes advantage of XML and Web Service technologies and is currently working in an operational environment at the European Space Agency as part of the Space Environment Information System for Mission Control Purposes (SEIS) project.

Nuno Viana, Ricardo Raminhos, João Moura-Pires

Chapter 7 – Intelligent Robotics (IROBOT 2005)

Frontmatter
Introduction

Research in robotics has traditionally emphasized low-level sensing and control tasks, path planning, and actuator design and control. In contrast, several Artificial Intelligence (AI) researchers, generally using robotic simulators, are more concerned with providing real or simulated robots with higher-level cognitive functions that enable them to reason, act and perceive autonomously in dynamic, inaccessible, continuous and non-deterministic environments. Combining results from traditional robotics with those from AI and cognitive science will thus be essential for the future of intelligent robotics.

The purpose of the 1st International Workshop on Intelligent Robotics (IROBOT'05) was to bring together researchers, engineers and other professionals interested in the application of Artificial Intelligence techniques to real and simulated robotics, to discuss current work and future directions.

Luís Paulo Reis, Nuno Lau, Carlos Carreto, Eduardo Silva
Visual Based Human Motion Analysis: Mapping Gestures Using a Puppet Model

This paper presents a novel approach to analyzing the appearance of human motions with a simple model, i.e. mapping the motions using a virtual marionette model. The approach is based on a robot using a monocular camera to recognize the person interacting with it and to track that person's head and hands. We reconstruct 3-D trajectories from the 2-D image space (IS) by calibrating and fusing the camera images with data from an inertial sensor, applying general anthropometric data and restricting the motions to lie on a plane. Through a virtual marionette model we map 3-D trajectories to a feature vector in the marionette control space (MCS). This implies, inversely, that a certain set of 3-D motions can be performed by the (virtual) marionette system. A subset of these motions is considered to convey information (i.e. gestures). Thus, we aim to build a database that keeps the vocabulary of gestures represented as signals in the MCS. The main contribution of this work is the computational model of the IS-MCS mapping. We introduce the guide robot “Nicole” to place our system in an embodied context. We sketch two novel approaches to represent human motion (i.e. Marionette Space and Labananalysis). We define a gesture vocabulary organized in three sets (i.e. Cohen’s Gesture Lexicon, Pointing Gestures and Other Gestures).

Jörg Rett, Jorge Dias
Acquiring Observation Models Through Reverse Plan Monitoring

We present a general-purpose framework for updating a robot’s observation model within the context of planning and execution. Traditional plan execution relies on monitoring plan step transitions through accurate state observations obtained from sensory data. In order to gather meaningful state data from sensors, tedious and time-consuming calibration methods are often required. To address this problem we introduce Reverse Monitoring, a process of learning an observation model through the use of plans composed of scripted actions. The automatically acquired observation models allow the robot to adapt to changes in the environment and robustly execute arbitrary plans. We have fully implemented the method on our AIBO robots, and our empirical results demonstrate its effectiveness.

Sonia Chernova, Elisabeth Crawford, Manuela Veloso
Applying Biological Paradigms to Emerge Behaviour in RoboCup Rescue Team

This paper presents a hybrid behaviour process for performing collaborative tasks and coordination in a rescue team. The RoboCup Rescue simulator and its associated international competition are used as the testbed for our proposal. Unlike other published work in this field, one of our main concerns is achieving good results in RoboCup Rescue championships by emerging behaviour in agents using a biological paradigm. The benefit comes from the hierarchic and parallel organisation of the mammalian brain. In our behaviour process, Artificial Neural Networks are used to make agents capable of learning information from the environment. This allows agents to improve several algorithms, such as their path-finding algorithm for locating the shortest path between two points. We also aim to filter the most important messages arising from the environment, so as to choose the best path plan among many alternatives in a short time. An action policy was implemented using Kohonen's network and the Dijkstra and D* algorithms. This policy achieved good results in our tests, getting our team classified for the RoboCup Rescue Simulation League 2005.

Francisco Reinaldo, Joao Certo, Nuno Cordeiro, Luis P. Reis, Rui Camacho, Nuno Lau
Survival Kit: A Constraint-Based Behavioural Architecture for Robot Navigation

This article presents a constraint-based behavioural architecture for low-level safe navigation, the Survival Kit. Instead of approaching the problem by customising a generic Behaviour-Based architecture, the Survival Kit embodies a dedicated semantics for safe navigation, which augments its expressiveness for the task. An instantiation of the architecture for goal-oriented obstacle avoidance in unstructured indoor environments is proposed. Special attention is given to an environmental feature, the gap, which allows paths to be optimised based on immediate ranging data. Experimental results in simulation confirm the capabilities of the approach.

Pedro Santana, Luís Correia
Heuristic Algorithm for Robot Path Planning Based on a Growing Elastic Net

A simple, effective method for path planning based on a growing self-organizing elastic neural network, enhanced with a heuristic for the exploration of local directions, is presented. The general problem is to find a collision-free path for moving objects among a set of obstacles. A path is represented by an interconnected set of processing units in the elastic self-organizing network. The algorithm is initiated with a straight path defined by a small number of processing units between the start and goal positions. The two units at the extremes of the network are static, located at the start and goal positions; the remaining units are adaptive. Using a local sampling strategy of the points around each processing unit, Kohonen-type learning and a simple unit-growing rule, the initial straight path evolves into a collision-free path. The proposed algorithm was experimentally tested for 2-DOF and 3-DOF robots in workspaces cluttered with both randomly and non-randomly distributed obstacles. It is shown that with very little computational effort a satisfactory collision-free path is computed.

José Alí Moreno, Miguel Castro
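A minimal 2-D sketch of the ingredients the abstract describes — fixed extreme units, Kohonen-style relaxation of the adaptive units, and a unit-growing rule — under simplifying assumptions: obstacles are circles, and instead of the paper's local direction-sampling heuristic, units that fall inside an obstacle are simply projected out of it. All names and parameters are illustrative.

```python
import math

def inside(p, obs):
    # obs: list of (cx, cy, r) circular obstacles
    return any(math.hypot(p[0] - cx, p[1] - cy) < r for cx, cy, r in obs)

def push_out(p, obs, margin=0.1):
    """Radially project a point out of any circle it falls inside
    (simplification of the paper's local sampling of directions)."""
    x, y = p
    for cx, cy, r in obs:
        d = math.hypot(x - cx, y - cy)
        if d < r:
            if d == 0:
                x, y = cx + r + margin, cy
            else:
                s = (r + margin) / d
                x, y = cx + (x - cx) * s, cy + (y - cy) * s
    return (x, y)

def elastic_path(start, goal, obs, n_units=8, iters=50, lr=0.5, grow_dist=1.5):
    # initial straight path between the two static extreme units
    path = [(start[0] + (goal[0] - start[0]) * t / (n_units - 1),
             start[1] + (goal[1] - start[1]) * t / (n_units - 1))
            for t in range(n_units)]
    for _ in range(iters):
        # Kohonen-type relaxation of the adaptive (interior) units
        for i in range(1, len(path) - 1):
            mx = (path[i - 1][0] + path[i + 1][0]) / 2
            my = (path[i - 1][1] + path[i + 1][1]) / 2
            x = path[i][0] + lr * (mx - path[i][0])
            y = path[i][1] + lr * (my - path[i][1])
            path[i] = push_out((x, y), obs)
        # growing rule: insert a unit where neighbours drift too far apart
        grown = [path[0]]
        for a, b in zip(path, path[1:]):
            if math.hypot(b[0] - a[0], b[1] - a[1]) > grow_dist:
                grown.append(push_out(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2), obs))
            grown.append(b)
        path = grown
    return path

obs = [(5.0, 0.0, 1.0)]
path = elastic_path((0.0, 0.0), (10.0, 0.0), obs)
```

After relaxation, the extreme units remain at the start and goal positions and every adaptive unit lies outside the obstacle.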
Robust Artificial Landmark Recognition Using Polar Histograms

New results on our artificial landmark recognition approach are presented, along with new experiments that demonstrate the robustness of our method. The objective of our work is the localization and recognition of artificial landmarks to help in the navigation of a mobile robot. Recognition is based on the interpretation of histograms obtained from polar coordinates of the landmark symbol. Experiments show that our approach is fast and robust even when the database contains a high number of landmarks to compare against.

Pablo Suau
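To make the histogram idea concrete, here is a minimal sketch of building a polar (angular) histogram of a symbol's points around its centre and comparing two such histograms; the bin count, the L1 comparison, and the toy symbol are assumptions for illustration, not the paper's exact representation.

```python
import math

def polar_histogram(points, centre, n_bins=8):
    """Normalized histogram of point angles around the symbol centre."""
    hist = [0.0] * n_bins
    for x, y in points:
        ang = math.atan2(y - centre[1], x - centre[0]) % (2 * math.pi)
        hist[int(ang / (2 * math.pi) * n_bins) % n_bins] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def histogram_distance(h1, h2):
    # L1 distance between two polar histograms (0 = identical)
    return sum(abs(a - b) for a, b in zip(h1, h2))

# toy landmark symbol: four points spread around the centre
symbol = [(2, 1), (-1, 2), (-2, -1), (1, -2)]
h = polar_histogram(symbol, (0, 0))
print(h)
```

Because the histogram depends only on angles around the centre, it is invariant to the symbol's scale, which is one reason this kind of signature is robust for landmark matching.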
An Architecture of Sensor Fusion for Spatial Location of Objects in Mobile Robotics

Each part of a mobile robot has particular aspects of its own, which must be integrated in order to successfully conclude a specific task. Among these parts, sensing enables the robot to construct a representation of landmarks in its surroundings, supplying relevant information for navigation. The present work describes the architecture of a perception system based on the fusion of data from a CMOS camera and distance sensors. The aim of the proposed architecture is the spatial location of objects on a soccer field. An SVM is used for both recognition and object location, and the fusion process is carried out by a fuzzy system using a TSK model.

Luciano Oliveira, Augusto Costa, Leizer Schnitman, J. Felippe Souza
CATRAPILAS – A Simple Robotic Platform

This paper describes Catrapilas, a small robotic platform designed to solve some well-known robot problems, among them some of the most popular robotic contests, such as Micro Mouse, Fire Fighting and Autonomous Driving. It describes the major decisions and details of the physical architecture of the robot, but emphasizes the high-level approach used to control the robotic agent. This approach is based on the creation of a 2-D map of the agent's environment, which should contain all the information needed to solve the current problem. The paper also describes the implementation used for the Autonomous Driving Competition of the 2005 Portuguese National Robotics Festival and the results that were obtained, focusing on the robot's ability to accomplish the objectives of the contest and on how this validated the concept and ideas behind Catrapilas.

Nuno Cerqueira

Chapter 8 – Multi-agent Systems: Theory and Applications (MASTA 2005)

Frontmatter
Introduction

Multi-Agent Systems (MAS) is now one of the most relevant and attractive research areas in the field of computer science. Since 1993 the area of Multi-Agent Systems/Distributed Artificial Intelligence has been present in the EPIA conferences, both as individual tracks in the main conference and as autonomous workshops.

The

3rd Workshop on Multi-Agent Systems: Theory and Applications

(MASTA 2005) took place in the University of Beira Interior, Covilhã, Portugal, December 6-8, 2005, as part of EPIA 2005 – 12th Portuguese Conference on Artificial Intelligence. Focusing on a fundamental area of research in Artificial Intelligence, the 3rd MASTA workshop was the forum for presenting and discussing the most recent and innovative work in the areas of multi-agent systems and autonomous agents.

João Balsa, Luís Moniz, Luís Paulo Reis
A Model of Pedagogical Negotiation

This paper presents a model of pedagogical negotiation developed for AMPLIA, an Intelligent Probabilistic Multi-agent Learning Environment. Three intelligent software agents (the Domain Agent, Learner Agent and Mediator Agent) were developed using Bayesian Networks and Influence Diagrams. The goal of the negotiation model is to increase, as much as possible: (a) the performance of the models the students build; (b) the confidence that teachers and tutors have in the students' ability to diagnose cases; and (c) the students' confidence in their own ability to diagnose cases.

Cecilia D. Flores, Louise J. Seixas, João C. Gluz, Rosa M. Vicari
Towards a Market Mechanism for Airport Traffic Control

We present a multiagent decision mechanism for the airport traffic control domain. It enables airlines to jointly decide on proposals for plan conflict solutions. The mechanism uses weighted voting for maximizing global utility and Clarke Tax to discourage manipulation. We introduce accounts to ensure that all agents are treated fairly, to some extent. The mechanism allows an airport to determine the pay-off between optimality and fairness of schedules. Also, it compensates for agents that happen to be in practically unfavourable positions.

Geert Jonker, John-Jules Meyer, Frank Dignum
Intentions and Strategies in Game-Like Scenarios

In this paper, we investigate the link between logics of games and “mentalistic” logics of rational agency, in which agents are characterized in terms of attitudes such as belief, desire and intention. In particular, we investigate the possibility of extending the logics of games with the notion of agents’ intentions (in the sense of Cohen and Levesque’s BDI theory). We propose a new operator (str_a σ) that can be used to formalize reasoning about outcomes of strategies in game-like scenarios. We briefly discuss the relationship between intentions and goals in this new framework, and show how to capture dynamic logic-like constructs. Finally, we demonstrate how game-theoretical concepts like Nash equilibrium can be expressed to reason about rational intentions and their consequences.

Wojciech Jamroga, Wiebe van der Hoek, Michael Wooldridge
Semantics and Pragmatics for Agent Communication

For the successful management of interactions in open multi-agent systems, a social framework is needed to complement a standard semantics and interaction protocols for agent communication. In this paper a rights-based framework in which interaction protocols and conversation policies acquire their meaning is presented. Rights improve interaction and facilitate social action in multi-agent domains. Rights allow agents enough freedom, and at the same time constrain them (prohibiting specific actions). A general framework for agent communication languages (ACLs) is proposed, defining a set of performatives (semantics) and showing why a set of conversation policies to guide agents' interactions (pragmatics) is needed. Finally, we show how it is possible to model interaction protocols within a rights-based normative open MAS.

Rodrigo Agerri, Eduardo Alonso
Logical Implementation of Uncertain Agents

We consider the representation and execution of agents specified using temporal logics. Previous work in this area has provided a basis for the direct execution of agent specifications, and has been extended to allow the handling of agent beliefs, deliberation and multi-agent groups. However, the key problem of uncertainty has not been tackled. Given that agents work in unknown environments, and interact with other agents that may, in turn, be unpredictable, it is essential for any formal agent description to incorporate some mechanism for capturing this aspect. Within the framework of executable specifications, formal descriptions involving uncertainty must also be executable. The contribution of this paper is to extend executable temporal logic in order to allow the representation and execution of uncertain statements within agents. In particular, we extend the basis of the MetateM temporal framework with a probabilistic belief dimension captured by the recently introduced PFKD45 logic. We provide a description of the extended logic, the translation procedure for formulae in this extended logic to an executable normal form, and the execution algorithm for such formulae. We also outline technical results concerning the correctness of the translation to the normal form and the completeness of the execution mechanism.

Nivea de C. Ferreira, Michael Fisher, Wiebe van der Hoek
Subgoal Semantics in Agent Programming

This paper investigates the notion of subgoals as used in plans in cognitive agent programming languages. These subgoals form an abstract representation of more concrete courses of action or plans. Subgoals can have a procedural interpretation (directly linked to a concrete plan) or a declarative one (the state to be reached as represented by the subgoal is taken into account). We propose a formal semantics for subgoals that interprets these declaratively, and study the relation between this semantics and the procedural subgoal semantics of the cognitive agent programming language 3APL. We prove that subgoals of 3APL can be programmed to behave declaratively, although the semantics is defined procedurally.

M. Birna van Riemsdijk, Mehdi Dastani, John-Jules Ch. Meyer
The Multi-team Formation Precursor of Teamwork

We formulate the domain-independent multi-team formation (M-TF) problem and describe a generic solution for it. The M-TF problem is the precursor of teamwork that explicitly addresses the achievement of several short-time-period goals, where the work to achieve the complete set of goals overwhelms the working capacity of the team formation space (all teams formed from the finite set of available agents). Decisions regarding team formation are made by the agents considering their own probabilistic beliefs and utility preferences about the whole (known) set of goals to achieve. The RoboCupRescue simulated large-scale disaster domain is used to illustrate the design of the domain-specific M-TF preference relation component.

Paulo Trigo, Helder Coelho
Seeking Multiobjective Optimization in Uncertain, Dynamic Games

If the decisions of agents arise from the solution of general unconstrained problems, altruistic agents can implement effective problem transformations to promote convergence to attractors and draw these fixed points toward Pareto optimal points. In the literature, algorithms have been developed to compute optimal parameters for problem transformations in the seemingly more restrictive scenario of uncertain, quadratic games in which an agent’s response is induced by one of a set of potential problems. This paper reviews these developments briefly and proposes a convergent algorithm that enables altruistic agents to relocate the attractor at a point at which all agents are better off, rather than optimizing a weighted function of the agents’ objectives.

Eduardo Camponogara, Haoyu Zhou
Learning to Select Negotiation Strategies in Multi-agent Meeting Scheduling

In this paper, we look at the Multi-Agent Meeting Scheduling problem, where distributed agents negotiate meeting times on behalf of their users. While many negotiation approaches have been proposed for scheduling meetings, it is not well understood how agents can negotiate strategically in order to maximize their users’ utility. To negotiate strategically, agents need to learn to pick good strategies for negotiating with other agents. We show how the playbook approach, introduced by [1] for team plan selection in small-size robot soccer, can be used to select strategies. Selecting strategies in this way gives some theoretical guarantees about regret. We also show experimental results demonstrating the effectiveness of the approach.

Elisabeth Crawford, Manuela Veloso
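The regret guarantee mentioned in the abstract comes from weighted expert-selection schemes. The sketch below is a generic Hedge (multiplicative-weights) selector with full feedback, not the paper's exact playbook adaptation; the reward function and parameters are illustrative.

```python
import math

def hedge(n_strategies, reward_vector, rounds, eta=0.2):
    """Hedge / multiplicative-weights strategy selection.
    reward_vector(t) -> list of rewards in [0, 1], one per strategy.
    Expected regret versus the best fixed strategy is O(sqrt(T log N))."""
    w = [1.0] * n_strategies
    expected_total = 0.0
    for t in range(rounds):
        s = sum(w)
        probs = [wi / s for wi in w]                  # play distribution
        rs = reward_vector(t)
        expected_total += sum(p * r for p, r in zip(probs, rs))
        w = [wi * math.exp(eta * r) for wi, r in zip(w, rs)]  # reweight
    return expected_total, probs

# toy negotiation: strategy 1 consistently succeeds, strategy 0 does not
total, probs = hedge(2, lambda t: [0.0, 1.0], rounds=200)
print(round(probs[1], 3))
```

The weight of the successful strategy grows exponentially, so the agent quickly concentrates its play on it while still guaranteeing bounded regret had the rewards been adversarial.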

Chapter 9 – Text Mining and Applications (TEMA 2005)

Frontmatter
Introduction

This chapter contains the papers presented at the workshop on Text Mining and Applications (TeMA 2005), organized within the framework of the conference of the Portuguese Association for Artificial Intelligence (EPIA). The workshop aimed at attracting quality papers and enhancing knowledge in this area.

27 papers were submitted, of which 9 were selected for publication in this Springer volume. These numbers show the current importance of this field in AI and suggest that the organization of equivalent events should be pursued in future EPIA editions.

The first paper addresses bilingual lexical acquisition from non-parallel corpora, applicable in Machine Translation. The second paper describes work on Text Summarization. The third uses Transformation-Based Learning for NP identification applied to Portuguese. The fourth uses linguistic knowledge for passage retrieval and question answering. The fifth describes the use of weakly supervised learning for the extraction of semantic patterns. The sixth paper proposes a method for semantic indexing and evaluates it on traditional Information Retrieval tasks. The seventh works on unsupervised, language-independent extraction of multi-word terms, applicable to multiple domains, and evaluates the results for Slovene and English. The eighth paper presents a variant of a known method for anaphora resolution, adapted to Portuguese. The last paper proposes a stemmer for Brazilian Portuguese.

Gabriel Pereira Lopes, Joaquim Ferreira da Silva, Victor Rocio, Paulo Quaresma
An Approach to Acquire Word Translations from Non-parallel Texts

Few approaches to extract word translations from non-parallel texts have been proposed so far. Researchers have not been encouraged to work on this topic because extracting information from non-parallel corpora is a difficult task producing poor results. Whereas word translation extraction from parallel texts can reach an accuracy of about 99%, the accuracy for non-parallel texts has been around 72% up to now. The current approach, which relies on the previous extraction of bilingual pairs of lexico-syntactic templates from parallel corpora, makes a significant improvement, with about 89% of word translations identified correctly.

Pablo Gamallo Otero, José Ramom Pichel Campos
Experiments on Statistical and Pattern-Based Biographical Summarization

We describe experiments on content selection for producing biographical summaries from multiple documents. The method relies on a set of patterns to identify descriptive phrases, an available co-reference resolution algorithm, and a greedy, corpus-based sentence deletion procedure for document compression. We show that in an automatic evaluation of content using ROUGE, the proposed method obtains very good performance.

Horacio Saggion, Robert Gaizauskas
Constrained Atomic Term: Widening the Reach of Rule Templates in Transformation Based Learning

Within the framework of Transformation Based Learning (TBL), the rule template is one of the most important elements in the learning process. This paper presents a new model for TBL templates, in which the basic unit, denominated here as an atomic term (AT), encodes a variable sized window and a test that precedes the capture of a feature’s value. A case study of Portuguese NP identification is described and the experimental results obtained are presented.

Cícero Nogueira dos Santos, Claudia Oliveira
Improving Passage Retrieval in Question Answering Using NLP

This paper describes an approach for the integration of linguistic information in passage retrieval in an open-source question answering system for Dutch. Annotation produced by the wide-coverage dependency parser Alpino is stored in multiple index layers to be matched with natural language questions that have been analyzed by the same parser. We present a genetic algorithm to select features to be included in retrieval queries and to optimize keyword weights. The system is trained on questions annotated with their answers from the Dutch question answering competition within the Cross-Language Evaluation Forum (CLEF). The optimization yielded a significant improvement of about 19% in mean reciprocal rank on unseen evaluation data compared to the baseline using traditional information retrieval with plain-text keywords.

Jörg Tiedemann
Mining the Semantics of Text Via Counter-Training

We report on a set of experiments in text mining, specifically, finding semantic patterns given only a few keywords. The experiments employ the Counter-training framework for discovery of semantic knowledge from raw text in a weakly supervised fashion. The experiments indicate that the framework is suitable for efficient acquisition of semantic word classes and collocation patterns, which may be used for Information Extraction.

Roman Yangarber
Minimum Redundancy Cut in Ontologies for Semantic Indexing

This paper presents a new method that aims at improving semantic indexing while reducing the number of indexing terms. Indexing terms are determined using a minimum redundancy cut in a hierarchy of conceptual hypernyms provided by an ontology (e.g. WordNet, EDR). The results of information retrieval experiments carried out on several standard document collections using the EDR ontology are presented, illustrating the benefit of the method.

Florian Seydoux, Jean-Cédric Chappelier
Unsupervised Learning of Multiword Units from Part-of-Speech Tagged Corpora: Does Quantity Mean Quality?

This paper describes an original hybrid system that extracts multiword unit candidates from part-of-speech tagged corpora. While classical hybrid systems manually define local part-of-speech patterns that lead to the identification of well-known multiword units (mainly compound nouns), we automatically identify relevant syntactical patterns from the corpus. Word statistics are then combined with the endogenously acquired linguistic information in order to extract the most relevant sequences of words. As a result, (1) human intervention is avoided providing total flexibility of use of the system and (2) different multiword units like phrasal verbs, adverbial locutions and prepositional locutions may be identified. Finally, we propose an exhaustive evaluation of our architecture based on the multi-domain, bilingual Slovene-English IJS-ELAN corpus where surprising results are evidenced. To our knowledge, this challenge has never been attempted before.

Gaël Dias, Špela Vintar
Lappin and Leass’ Algorithm for Pronoun Resolution in Portuguese

This paper presents a variant of Lappin and Leass’ Algorithm for pronoun resolution in Portuguese texts; the algorithm resolves third person pronominal anaphora, as well as reflexive and reciprocal pronouns. It relies on salience measures, derived from the syntactic structure of the sentence, and on a simple discourse representation model. The algorithm is presented, together with its evaluation on legal and literary corpora.

Thiago Thomes Coelho, Ariadne Maria Brito Rizzoni Carvalho
STEMBR: A Stemming Algorithm for the Brazilian Portuguese Language

Stemming algorithms have traditionally been utilized in information retrieval systems as they generate a more concise word representation. However, the efficiency of these algorithms varies according to the language they are used with. This paper presents STEMBR, a stemmer for Brazilian Portuguese whereby the suffix treatment is based on a statistical study of the frequency of the last letter for words found in Brazilian web pages. The proposed stemmer is compared with another algorithm specifically developed for Portuguese. The results show the efficiency of our stemmer.

Reinaldo Viana Alvares, Ana Cristina Bicharra Garcia, Inhaúma Ferraz
Backmatter
Metadata
Title
Progress in Artificial Intelligence
Edited by
Carlos Bento
Amílcar Cardoso
Gaël Dias
Copyright year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-31646-6
Print ISBN
978-3-540-30737-2
DOI
https://doi.org/10.1007/11595014