
2006 | Book

Foundations of Intelligent Systems

16th International Symposium, ISMIS 2006, Bari, Italy, September 27-29, 2006. Proceedings

Edited by: Floriana Esposito, Zbigniew W. Raś, Donato Malerba, Giovanni Semeraro

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Invited Talks

Lifecycle Knowledge Management: Getting the Semantics Across in X-Media

Knowledge and information spanning multiple information sources, multiple media, multiple versions and multiple communities challenge the capabilities of existing knowledge and information management infrastructures by far — primarily in terms of intellectually exploiting the stored knowledge and information. In this paper we present some semantic web technologies of the EU integrated project X-Media that build bridges between the various information sources, the different media, the stations of knowledge management and the different communities. Core to this endeavour is the combination of information extraction with formal ontologies as well as with semantically lightweight folksonomies.

Steffen Staab, Thomas Franz, Olaf Görlitz, Carsten Saathoff, Simon Schenk, Sergej Sizov
Argument-Based Machine Learning

In this paper, some recent ideas are presented about making machine learning (ML) more effective through mechanisms of argumentation. In this sense, argument-based machine learning (ABML) is defined as a refinement of the usual definition of ML. In ABML, some learning examples are accompanied by arguments: an expert's reasons for believing that these examples are as they are. ABML thus provides a natural way of introducing domain-specific prior knowledge that differs from traditional, general background knowledge. The task of ABML is to find a theory that explains the "argumented" examples by making reference to the given reasons. ABML, so defined, is motivated by the following advantages over standard learning from examples: (1) arguments impose constraints over the space of possible hypotheses, thus reducing search complexity, and (2) induced theories should make more sense to the expert. Ways of realising ABML by extending some existing ML techniques are discussed, and the aforementioned advantages of ABML are demonstrated experimentally.

Ivan Bratko, Martin Možina, Jure Žabkar
Play It Again: A Case-Based Approach to Expressivity-Preserving Tempo Transformations in Music

It has long been established that when humans perform music, the result is never a literal, mechanical rendering of the score. That is, humans deviate from the score. Insofar as these performance deviations are intentional, they are commonly thought of as conveying musical expressivity, which is a fundamental aspect of music. Two main functions of musical expressivity are generally recognized. Firstly, expressivity is used to clarify the musical structure of the piece (metrical structure, phrasing, harmonic structure). Secondly, expressivity is also used as a way of communicating, or accentuating, affective content.

An important issue when performing music is the effect of tempo on expressivity. It has been argued that temporal aspects of performance scale uniformly when tempo changes. That is, the durations of all performed notes maintain their relative proportions. This hypothesis is called relational invariance (of timing under tempo changes). However, counter-evidence for this hypothesis has been provided, and a recent study shows that listeners are able to determine above chance-level whether audio recordings of jazz and classical performances are uniformly time stretched or original recordings, based solely on expressive aspects of the performances. In my talk I address this issue by focusing on our research on tempo transformations of audio recordings of saxophone jazz performances. More concretely, we have investigated the problem of how a performance played at a particular tempo can be automatically rendered at another tempo while preserving its expressivity. To do so we have developed a case-based reasoning system called TempoExpress. Our approach also experimentally refutes the relational invariance hypothesis by comparing the automatic transformations generated by TempoExpress against uniform time stretching.

Ramon López de Mántaras

Active Media Human-Computer Interaction

Decision Fusion of Shape and Motion Information Based on Bayesian Framework for Moving Object Classification in Image Sequences

This paper proposes a decision fusion method that combines shape and motion information within a Bayesian framework for object classification in image sequences. The method is designed for intelligent surveillance guard robots that detect and track suspicious persons and vehicles within a security region. Reliable and stable classification of targets requires multiple invariant feature vectors that discriminate between targets with high certainty. To this end, shape and motion information is extracted using Fourier descriptors, gradients, and motion feature variation on spatial and temporal images, and local decisions are made on each. Finally, a global decision is made using a decision fusion method based on the Bayesian framework. Experimental results on different test sequences show that the proposed method achieves better classification results than methods using neural networks and other fusion schemes.
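The global decision step can be illustrated as a naive Bayesian fusion of two local classifiers. This is a minimal sketch under an assumed conditional-independence model; the class names, priors, and likelihood values are purely illustrative, not the paper's actual system.

```python
# Hypothetical sketch of Bayesian decision fusion: local shape and motion
# classifiers each emit class likelihoods; assuming they are conditionally
# independent given the class, the global decision multiplies them with the
# class prior and normalizes.

def bayes_fuse(prior, shape_lik, motion_lik):
    """Each argument maps class -> probability; returns the posterior."""
    post = {c: prior[c] * shape_lik[c] * motion_lik[c] for c in prior}
    z = sum(post.values())
    return {c: p / z for c, p in post.items()}

prior = {"person": 0.5, "vehicle": 0.5}
shape = {"person": 0.7, "vehicle": 0.3}   # local decision from shape features
motion = {"person": 0.6, "vehicle": 0.4}  # local decision from motion features

fused = bayes_fuse(prior, shape, motion)
decision = max(fused, key=fused.get)      # the class with highest posterior
```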

Heungkyu Lee, JungHo Kim, June Kim
A Two-Stage Visual Turkish Sign Language Recognition System Based on Global and Local Features

In order to provide communication between deaf people and hearing people, a two-stage vision-based system translating Turkish Sign Language into Turkish is developed. Hidden Markov models are utilized to determine the global feature group in the dynamic gesture recognition stage, and the k nearest neighbor algorithm is used to compare local features in the static gesture recognition stage. The system can perform person-dependent recognition of 172 isolated signs.

Hakan Haberdar, Songül Albayrak
Speech Emotion Recognition Using Spiking Neural Networks

Human social communication depends largely on exchanges of non-verbal signals, including the non-lexical expression of emotions in speech. In this work, we propose a biologically plausible methodology for the problem of emotion recognition, based on the extraction of vowel information from an input speech signal and on the classification of the extracted information by a spiking neural network. Initially, the speech signal is segmented into vowel parts, which are represented by a set of salient features related to the Mel-frequency cepstrum. A spiking neural network then classifies the extracted information into five different emotion classes.

Cosimo A. Buscicchio, Przemysław Górecki, Laura Caponetti
Visualizing Transactional Data with Multiple Clusterings for Knowledge Discovery

Information visualization is gaining importance in data mining, and transactional data has long been an important target for data miners. We propose a novel approach for visualizing transactional data using multiple clustering results for knowledge discovery. This scheme requires relating different clustering results in a comprehensive manner. Thus we have invented a method for attributing colors to clusters of different clustering results based on minimal transversals. The effectiveness of our method, VisuMClust, has been confirmed with experiments using artificial and real-world data sets.

Nicolas Durand, Bruno Crémilleux, Einoshin Suzuki

Computational Intelligence

An Optimization Model for Visual Cryptography Schemes with Unexpanded Shares

Visual cryptography schemes encrypt a secret image into n shares so that any qualified set of shares enables one to visually decrypt the hidden secret, whereas any forbidden set of shares cannot leak out any secret information. In the study of visual cryptography, pixel expansion and contrast are two important issues. Since pixel-expansion-based methods encode a pixel as many pixels on each share, the size of the share is larger than that of the secret image; such methods therefore distort the shares and consume more storage space. In this paper, we propose a method that achieves better contrast without pixel expansion. The concept of probability is used to construct an optimization model for general access structures, and the solution space is searched by genetic algorithms. Experimental results show that the proposed method achieves better contrast and blackness of black pixels in comparison with Ateniese et al.'s scheme.
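How probability yields contrast without pixel expansion can be seen in a toy probabilistic (2,2) scheme. This is a minimal sketch of the general idea only, not the paper's GA-optimized construction for general access structures.

```python
import random

# Minimal probabilistic (2,2) visual cryptography sketch: one share pixel
# per secret pixel (no expansion). 1 = black, 0 = white.

def encrypt_pixel(secret_bit, rng):
    r = rng.randint(0, 1)
    if secret_bit == 0:      # white secret: both shares get the same pixel
        return r, r
    return r, 1 - r          # black secret: shares get complementary pixels

def stack(a, b):
    return a | b             # physically stacking shares ORs the black pixels

rng = random.Random(42)
# Every black secret pixel stacks to black; a white one stacks to black only
# half the time on average, so the expected contrast is 1/2.
blacks = [stack(*encrypt_pixel(1, rng)) for _ in range(1000)]
whites = [stack(*encrypt_pixel(0, rng)) for _ in range(1000)]
```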

Ching-Sheng Hsu, Shu-Fen Tu, Young-Chang Hou
A Fast Temporal Texture Synthesis Algorithm Using Segment Genetic Algorithm

Texture synthesis is a very active research area in computer vision and graphics, and temporal texture synthesis is a subset of it. We present a new temporal texture synthesis algorithm in which a segment genetic algorithm is introduced into the process of synthesizing videos. By analyzing and processing a finite source video clip, the algorithm can produce infinite video sequences that play smoothly. Compared with many current temporal texture synthesis algorithms, this algorithm obtains high-quality video results without complicated pre-processing of the source video while improving the efficiency of synthesis. The paper also analyzes how the population size and the maximum number of generations influence the speed and quality of synthesis.

Li Wen-hui, Meng Yu, Zhang Zhen-hua, Liu Dong-fei, Wang Jian-yuan
Quantum-Behaved Particle Swarm Optimization with Immune Operator

In previous work, we proposed Quantum-behaved Particle Swarm Optimization (QPSO), which outperforms traditional standard Particle Swarm Optimization (SPSO) in search ability while having fewer parameters to control. However, although QPSO is a globally convergent search method, it lacks mechanisms that simulate human-like intelligence. In this paper, an immune operator, which uses vector distance to calculate antibody density, is introduced into QPSO. The proposed algorithm combines the immune mechanism from the life sciences with the global search method QPSO to improve the intelligence and performance of the algorithm and to effectively restrain degeneration during optimization. Results on typical optimization functions show that QPSO with the immune operator performs better than SPSO and than QPSO without it.
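For reference, the core QPSO position update (without the immune operator that the paper adds) can be sketched as follows; the parameter values and test vectors are illustrative assumptions.

```python
import math
import random

# Sketch of the standard QPSO position update: each particle is attracted
# to a point p between its personal best and the global best, with a jump
# whose size scales with its distance to the mean best position (mbest).

def qpso_step(x, pbest, gbest, mbest, beta, rng):
    new_x = []
    for xi, pi, gi, mi in zip(x, pbest, gbest, mbest):
        phi = rng.random()
        p = phi * pi + (1 - phi) * gi            # local attractor
        u = 1.0 - rng.random()                   # in (0, 1], keeps log finite
        step = beta * abs(mi - xi) * math.log(1.0 / u)
        new_x.append(p + step if rng.random() < 0.5 else p - step)
    return new_x

rng = random.Random(7)
x = [3.0, -2.0]
moved = qpso_step(x, pbest=[1.0, -1.0], gbest=[0.0, 0.0],
                  mbest=[0.5, -0.5], beta=0.75, rng=rng)
```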

Jing Liu, Jun Sun, Wenbo Xu
Particle Swarm Optimization-Based SVM for Incipient Fault Classification of Power Transformers

A successful adoption and adaptation of the particle swarm optimization (PSO) algorithm is presented in this paper. It improves the performance of the Support Vector Machine (SVM) in the classification of incipient faults of power transformers. A PSO-based encoding technique is developed to improve classification accuracy. The proposed scheme is capable of removing misleading input features and optimizing the kernel parameters at the same time. Experiments on real operational data have demonstrated the effectiveness and efficiency of the proposed approach. The power system industry can benefit from our system through both accelerated operational speed and improved accuracy in the classification of incipient faults.

Tsair-Fwu Lee, Ming-Yuan Cho, Chin-Shiuh Shieh, Hong-Jen Lee, Fu-Min Fang
AntTrend: Stigmergetic Discovery of Spatial Trends

Large amounts of spatially referenced data have been aggregated in various application domains, such as Geographic Information Systems (GIS), banking, and retailing, motivating the highly demanding field of spatial data mining. Many beneficial optimization solutions inspired by the foraging behavior of ant colonies have been introduced. In this paper, a novel algorithm named AntTrend is proposed for the efficient discovery of spatial trends. AntTrend applies the emergent intelligent behavior of ant colonies to handle the huge search space encountered in the discovery of this valuable knowledge. Ant agents in AntTrend share their individual experience of trend detection by exploiting the phenomenon of stigmergy. Many experiments were run on a real banking spatial database to investigate the properties of the algorithm. The results show that AntTrend is far more efficient than non-intelligent methods, both in the performance of the discovery process and in the quality of the patterns discovered.

Ashkan Zarnani, Masoud Rahgozar, Caro Lucas, Azizollah Memariani
Genetic Algorithm Based Approach for Multi-UAV Cooperative Reconnaissance Mission Planning Problem

Multiple-UAV cooperative reconnaissance is one of the most important aspects of UAV operations. This paper presents a genetic algorithm (GA) based approach to the multiple-UAV cooperative reconnaissance mission planning problem. The objective is to conduct reconnaissance on a set of targets within predefined time windows at minimum cost, while satisfying the reconnaissance resolution demands of the targets and without violating the maximum travel time for each UAV. A mathematical formulation is presented for the problem that takes the targets' reconnaissance resolution demands and time window constraints into account, which previous approaches have ignored. A GA-based approach is then put forward to solve the problem. Our GA implementation uses an integer string as the chromosome representation and incorporates novel evolutionary operators, including a subsequence crossover operator and a forward insertion mutation operator. Finally, simulation results show the efficiency of our algorithm.
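A forward insertion mutation on an integer-string chromosome might look like the following. This is an illustrative sketch of the general operator family; the paper's exact operator and encoding may differ.

```python
import random

# Illustrative forward insertion mutation for an integer-string chromosome
# encoding a target visit sequence: remove one gene and re-insert it at a
# strictly later position, so the set of targets is preserved.

def forward_insertion(chrom, rng):
    c = list(chrom)
    if len(c) < 2:
        return c
    i = rng.randrange(len(c) - 1)        # gene to move (not the last one)
    j = rng.randrange(i + 1, len(c))     # later insertion point
    gene = c.pop(i)
    c.insert(j, gene)
    return c

rng = random.Random(0)
mutated = forward_insertion([1, 2, 3, 4, 5], rng)
```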

Jing Tian, Lincheng Shen, Yanxing Zheng
Improving SVM Training by Means of NTIL When the Data Sets Are Imbalanced

This paper deals with the problem of training a discriminative classifier when the data sets are imbalanced. More specifically, this work is concerned with the problem of classifying a sample as belonging, or not, to a Target Class (TC), when the number of examples from the "Non-Target Class" (NTC) is much higher than that of the TC. The effectiveness of the heuristic method called Non-Target Incremental Learning (NTIL) in the task of extracting, from the pool of NTC representatives, the training subset most discriminant with regard to the TC was previously demonstrated with an Artificial Neural Network as the classifier (ISMIS 2003). In this paper, the effectiveness of this method is also shown for Support Vector Machines.

Carlos E. Vivaracho
Evolutionary Induction of Cost-Sensitive Decision Trees

In this paper, a new method for cost-sensitive learning of decision trees is proposed. Our approach extends an existing evolutionary algorithm (EA) for the global induction of decision trees. In contrast to classical top-down methods, our system searches for the whole tree at once. We propose a new fitness function that allows the algorithm to minimize the expected cost of classification, defined as the sum of the misclassification cost and the cost of the tests. The remaining components of the EA, i.e., the representation of solutions and the specialized genetic search operators, are unchanged. The proposed method is experimentally validated, and preliminary results show that the global approach is able to effectively induce cost-sensitive decision trees.

Marek Krętowski, Marek Grześ
Triangulation of Bayesian Networks Using an Adaptive Genetic Algorithm

The search for an optimal node elimination sequence for the triangulation of Bayesian networks is an NP-hard problem. In this paper, a new method, called the TAGA algorithm, is proposed to search for the optimal node elimination sequence. TAGA adjusts the probabilities of the crossover and mutation operators by itself, and provides an adaptive ranking-based selection operator that adjusts the selection pressure according to the evolution of the population. The algorithm therefore not only maintains the diversity of the population and avoids premature convergence, but also improves on-line and off-line performance. Experimental results show that the TAGA algorithm outperforms a simple genetic algorithm, an existing adaptive genetic algorithm, and simulated annealing on three Bayesian networks.
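The cost that such a search minimizes can be illustrated by counting the fill-in edges a candidate elimination order creates. This is a generic sketch of node elimination on an undirected graph; TAGA's actual fitness and the example graphs here are assumptions for illustration.

```python
# Sketch of the fill-in cost of a node elimination order: eliminating a node
# connects all of its remaining neighbors pairwise, and every edge added this
# way counts toward the cost a triangulation search tries to minimize.

def fill_in(adj, order):
    g = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    added = 0
    for v in order:
        nbrs = list(g[v])
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                a, b = nbrs[i], nbrs[j]
                if b not in g[a]:               # chord missing: add it
                    g[a].add(b)
                    g[b].add(a)
                    added += 1
        for n in nbrs:                          # eliminate v from the graph
            g[n].discard(v)
        del g[v]
    return added

# 4-cycle a-b-c-d: eliminating any node first adds exactly one chord.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
# A path is already triangulated: no fill-in for any order that trims ends.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```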

Hao Wang, Kui Yu, Xindong Wu, Hongliang Yao

Intelligent Agent Technology

Intelligent Agents That Make Informed Decisions

Electronic markets with access to the Internet and the World Wide Web are information-rich and require agents that can assimilate and use real-time information flows wisely. A new breed of "information-based" agents aims to meet this requirement. They are founded on concepts from information theory and are designed to operate with information flows of varying and questionable integrity. These agents are part of a larger project that aims to make informed automated trading in applications such as eProcurement a reality.

John Debenham, Elaine Lawrence
Using Intelligent Agents in e-Government for Supporting Decision Making About Service Proposals

This paper studies the possibility of exploiting Intelligent Agent technology in e-government to support the decision making of government agencies. Specifically, it proposes a system that assists managers of a government agency who plan to propose a new service in identifying those citizens who could gain the highest benefit from it. The paper illustrates the proposed system and reports some experimental results.

Pasquale De Meo, Giovanni Quattrone, Domenico Ursino
A Butler Agent for Personalized House Control

This paper illustrates our work on the development of an agent-based architecture for the control of a smart home environment. In particular, we focus on one component of the system: the Butler Interactor Agent (BIA). The BIA mediates between the user and the agents controlling environment devices. Like any good butler, it is able to observe and learn about its user's preferences, but it leaves to its "owner" the last word on critical decisions. This is achieved by employing user and context modeling techniques to dynamically adapt the interaction with the environment, in keeping with the vision of ambient intelligence. Moreover, in order to support trust, the agent is able to adapt its autonomy on the basis of the delegation it receives from the user.

Berardina De Carolis, Giovanni Cozzolongo, Sebastiano Pizzutilo
Incremental Aggregation on Multiple Continuous Queries

Continuously monitoring large-scale aggregates over data streams is important for many stream processing applications, e.g. collaborative intelligence analysis, and presents new challenges to data management systems. The first challenge is to efficiently generate updated aggregate values and provide the new results to users as new tuples arrive. We implemented an incremental aggregation mechanism that does so for arbitrary algebraic aggregate functions, including user-defined ones, by keeping up-to-date finite data summaries. The second challenge is to construct shared query evaluation plans that support large-scale queries effectively. Since multiple query optimization is NP-complete and queries generally arrive asynchronously, we apply an incremental sharing approach to obtain shared plans that perform reasonably well. The system is built as part of ARGUS, a stream processing system atop a DBMS. The evaluation study shows that our approaches are effective and efficient on typical collaborative intelligence analysis data and queries.
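Incremental maintenance of an algebraic aggregate through a finite summary can be sketched as follows. This is an illustrative standalone example of the technique, not the ARGUS implementation, which sits atop a DBMS.

```python
# Sketch of incremental aggregation with a finite summary: AVG and VAR are
# algebraic aggregates, so a (count, sum, sum-of-squares) summary lets each
# new tuple update the result in O(1) without rescanning the stream.

class IncAgg:
    def __init__(self):
        self.n = 0
        self.s = 0.0
        self.ss = 0.0

    def add(self, x):
        # Fold one newly arrived tuple into the summary.
        self.n += 1
        self.s += x
        self.ss += x * x

    def avg(self):
        return self.s / self.n

    def var(self):
        # Population variance derived purely from the summary.
        m = self.avg()
        return self.ss / self.n - m * m

agg = IncAgg()
for x in [1.0, 2.0, 3.0, 4.0]:
    agg.add(x)
```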

Chun Jin, Jaime Carbonell
Location-Aware Multi-agent Based Intelligent Services in Home Networks

The development of intelligent multi-agent systems involves a number of concerns, including mobility, context-awareness, reasoning, and mining. Moving towards ubiquitous intelligence, this area of research addresses the intersection of mobile agents, heterogeneous networks, and ubiquitous intelligence. This paper presents the development of hardware and software systems that combine these interests as a Location-Aware Service. Our architecture performs intelligent services that meet the respective requirements. By adding autonomous mobility to the agents, the system becomes better able to dynamically localize around areas of interest and adapt to changes in the ubiquitous intelligence landscape. We also discuss lessons learned from our experience in using location-aware multi-agent techniques and methods.

Minsoo Lee, Yong Kim, Yoonsik Uhm, Zion Hwang, Gwanyeon Kim, Sehyun Park, Ohyoung Song
A Verifiable Logic-Based Agent Architecture

In this paper, we present the SCIFF platform for multi-agent systems.

The platform is based on Abductive Logic Programming, with a uniform language for specifying agent policies and interaction protocols. A significant advantage of the computational logic foundation of the SCIFF framework is that the declarative specifications of agent policies and interaction protocols can be used directly, at runtime, as the programs for the agent instances and for the verification of compliance.

We also provide a definition of conformance of an agent policy to an interaction protocol (i.e., a property that guarantees that an agent will comply with a given protocol) and an operational procedure to test conformance.

Marco Alberti, Federico Chesani, Marco Gavanelli, Evelina Lamma, Paola Mello

Intelligent Information Retrieval

Flexible Querying of XML Documents

Text search engines are inadequate for indexing and searching XML documents because they ignore the metadata and aggregation structure implicit in XML documents. On the other hand, the query languages supported by specialized XML search engines are very complex. In this paper, we present a simple yet flexible query language and develop its semantics to enable intuitively appealing extraction of relevant fragments of information while simultaneously falling back on retrieval through plain text search when necessary. We also present a simple yet robust relevance ranking for heterogeneous document-centric XML.

Krishnaprasad Thirunarayan, Trivikram Immaneni
VIRMA: Visual Image Retrieval by Shape MAtching

The huge number of image collections in multimedia applications has brought forth several approaches to content-based image retrieval, that is, retrieving images based on their visual content instead of textual descriptions. In this paper, we present a system called VIRMA (Visual Image Retrieval by Shape MAtching), which combines different Computer Vision techniques to perform content-based image retrieval based on shape matching. The architecture of the VIRMA system is portrayed, and the algorithms underpinning the developed prototype are briefly described. Application of VIRMA to a database of real-world pictorial images shows its effectiveness in visual image retrieval.

G. Castellano, C. Castiello, A. M. Fanelli
Asymmetric Page Split Generalized Index Search Trees for Formal Concept Analysis

Formal Concept Analysis is an unsupervised machine learning technique that has been successfully applied to document organisation by considering documents as objects and keywords as attributes. The basic algorithms of Formal Concept Analysis then allow an intelligent information retrieval system to cluster documents according to keyword views. This paper investigates the scalability of this idea. In particular, we present the results of applying spatial data structures to large datasets in Formal Concept Analysis. Our experiments are motivated by the application of the Formal Concept Analysis idea to virtual filesystems [11,17,15], in particular the libferris [1] Semantic File System. This paper presents customizations to an RD-Tree Generalized Index Search Tree based index structure to better support the application of Formal Concept Analysis to large data sources.

Ben Martin, Peter Eklund
Blind Signal Separation of Similar Pitches and Instruments in a Noisy Polyphonic Domain

Continuing our work on Blind Signal Separation, this paper extends our previous work [1] by creating a data set on which blind separation of polyphonic signals containing similar instruments playing similar notes in a noisy environment can be performed successfully. Upon isolating and subtracting the dominant signal from a base signal containing varying types and amounts of noise, and even though we purposefully excluded any identical matches from the dataset, the signal separation system successfully built a foreign set of synthesized sounds that the classifier correctly recognized. This paper thus presents a system that classifies and separates two harmonic signals with added noise. The methodology incorporates Knowledge Discovery, MPEG-7-based segmentation, and Inverse Fourier Transforms.

Rory A. Lewis, Xin Zhang, Zbigniew W. Raś
Score Distribution Approach to Automatic Kernel Selection for Image Retrieval Systems

This paper introduces a kernel selection method to automatically choose the best kernel type for a query by using the score distributions of the relevant and non-relevant images given by the user as feedback. When applied to our data, the method selects the same best kernel (out of the 12 kernels tried) for a particular query as the kernel obtained from our extensive experimental results.

Anca Doloc-Mihu, Vijay V. Raghavan
Business Intelligence in Large Organizations: Integrating Which Data?

This paper describes a novel approach to business intelligence and program management for large technology enterprises like the U.S. National Aeronautics and Space Administration (NASA). Two key distinctions of the approach are that 1) standard business documents are the user interface, and 2) a “schema-less” XML database enables flexible integration of technology information for use by both humans and machines in a highly dynamic environment. The implementation utilizes patent-pending NASA software called the NASA Program Management Tool (PMT) and its underlying “schema-less” XML database called Netmark. Initial benefits of PMT include elimination of discrepancies between business documents that use the same information and “paperwork reduction” for program and project management in the form of reducing the effort required to understand standard reporting requirements and to comply with those reporting requirements. We project that the underlying approach to business intelligence will enable significant benefits in the timeliness, integrity and depth of business information available to decision makers on all organizational levels.

David Maluf, David Bell, Naveen Ashish, Peter Putz, Yuri Gawdiak

Intelligent Information Systems

Intelligent Methodologies for Scientific Conference Management

This paper presents the advantages that knowledge-intensive activities, such as Scientific Conference Management, can gain from exploiting expert components in key tasks. Typically, in this domain, the tasks of scheduling activities and resources and of assigning reviewers to papers are performed manually, leading to time-consuming procedures with a high degree of inconsistency due to the many parameters and constraints to be considered. The proposed systems, evaluated on real conference datasets, show good results compared to manual scheduling and assignment, in terms of both accuracy and reduction of runtime.

Marenglen Biba, Stefano Ferilli, Nicola Di Mauro, Teresa M. A. Basile
The Consistent Data Replication Service for Distributed Computing Environments

This paper describes a data replication service for large-scale, data intensive applications whose results can be shared among geographically distributed scientists. We introduce two kinds of data replication techniques, called owner-initiated data replication and client-initiated data replication, that are developed to support data replica consistency without requiring any special file system-level locking functions. Also, we present performance results on Linux clusters located at Sejong University.

Jaechun No, Chang Won Park, Sung Soon Park
OntoBayes Approach to Corporate Knowledge

In this paper, we investigate the integration of virtual knowledge communities (VKC) into an ontology-driven uncertainty model (OntoBayes). The overall framework selected for OntoBayes is the multiagent paradigm. Agents modeled with OntoBayes have two parts: a knowledge part and a decision-making part. The former is the ontology knowledge, while the latter is based upon Bayesian Networks (BN). OntoBayes is thus designed in agreement with the Agent Oriented Abstraction (AOA) paradigm. Agents modeled with OntoBayes possess a common community layer that makes it possible to define, describe, and implement corporate knowledge. This layer consists of virtual knowledge communities.

Yi Yang, Jacques Calmet
About Inclusion-Based Generalized Yes/No Queries in a Possibilistic Database Context

This paper deals with the querying of possibilistic relational databases by means of queries called generalized yes/no queries, whose general form is: "to what extent is it possible that the answer to Q satisfies property P". Here, property P concerns the inclusion, in the result, of a set of tuples specified by the user. A processing technique for such queries is proposed that avoids computing the worlds attached to the possibilistic database.

Patrick Bosc, Nadia Lietard, Olivier Pivert
Flexible Querying of an Intelligent Information System for EU Joint Project Proposals in a Specific Topic

This paper presents a semantics-driven web portal allowing the browsing and flexible querying of state-of-the-art data on organizations, national and international projects, current research topics, technologies, and software solutions in the domain of European research in the FP6/IST program. The backbone of this portal is an ontology-based knowledge base structuring information about the above-described area. The portal incorporates advanced information retrieval methods based on ontologies and natural language processing.

Stefan Trausan-Matu, Ruxandra Bantea, Vlad Posea, Diana Petrescu, Alexandru Gartner
Multi-expression Face Recognition Using Neural Networks and Feature Approximation

A human face is a complex object with features that can vary over time. Face recognition systems have been investigated in the course of developing biometric technologies. This paper presents a face recognition system that uses eye, nose, and mouth approximations to train a neural network to recognize faces in different expressions, such as natural, smiling, sad, and surprised. The developed system is implemented using our own face database and the ORL face database. A comparison is drawn between our method and two other face recognition methods, namely PCA and LDA. Experimental results suggest that our method performs well and provides a fast, efficient system for recognizing faces with different expressions.

Adnan Khashman, Akram A. Garad
A Methodology for Building Semantic Web Mining Systems

In this paper we present a methodology based on interoperability for building Semantic Web Mining systems. In particular we consider the still poorly investigated case of mining the Semantic Web layers of ontologies and rules. We argue that Inductive Logic Programming systems could serve the purpose if they were more compliant with the standards of representation for ontologies and rules in the Semantic Web and/or interoperable with well-established Ontological Engineering tools that support these standards.

Francesca A. Lisi
Content-Based Image Filtering for Recommendation

Content-based filtering can reflect content information and provide recommendations by comparing various feature-based information about an item. However, this method suffers from the shortcomings of superficial content analysis, over-specialization of recommendations, and prediction accuracy that varies with the learning method. In order to resolve these problems, this paper presents content-based image filtering, seamlessly combining content-based filtering and image-based filtering for recommendation. The filtering techniques are combined in a weighted mix in order to achieve better results. To evaluate the performance of the proposed method, this study uses the EachMovie dataset and compares the method's performance with that of previous recommendation studies. The results demonstrate that the proposed method significantly outperforms previous methods.

Kyung-Yong Jung
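The weighted mix of two filtering scores described in the abstract above can be sketched in a few lines. This is an illustrative sketch only: the weight, the score values, and the item names are made up, not taken from the paper.

```python
def hybrid_score(content_score, image_score, w=0.6):
    """Combine a content-based and an image-based score in a weighted mix.

    `w` is a hypothetical mixing weight in [0, 1]; the paper tunes its own
    combination, which is not reproduced here.
    """
    return w * content_score + (1 - w) * image_score

# Rank candidate items by the combined score.
items = {"movie_a": (0.8, 0.4), "movie_b": (0.5, 0.9)}
ranked = sorted(items, key=lambda i: hybrid_score(*items[i]), reverse=True)
# movie_a: 0.6*0.8 + 0.4*0.4 = 0.64;  movie_b: 0.6*0.5 + 0.4*0.9 = 0.66
```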

Knowledge Representation and Integration

A Declarative Kernel for $\mathcal{ALC}$ Concept Descriptions

This work investigates kernels applicable to semantic annotations expressed in Description Logics, the theoretical counterpart of the standard representations for the Semantic Web. Namely, the focus is on the definition of a kernel for the $\mathcal{ALC}$ logic, based both on the syntax and on the semantics of concept descriptions. The kernel is proved to be valid. Furthermore, semantic distance measures are induced from the kernel function.

Nicola Fanizzi, Claudia d’Amato
RDF as Graph-Based, Diagrammatic Logic

The Resource Description Framework (RDF) is the basic standard for representing information in the Semantic Web. It is mainly designed to be machine-readable and machine-processable. This paper takes the opposite point of view: RDF is investigated as a logic system designed for the needs of humans. RDF is developed as a logic system based on mathematical graphs, i.e., as a diagrammatic reasoning system. As such, it has human-readable, diagrammatic representations. Moreover, a sound and complete calculus is provided, whose rules are suited to act on the diagrammatic representations. Finally, some normal forms for the graphs are introduced, and the calculus is modified to suit them.

Frithjof Dau
A Framework for Defining and Verifying Clinical Guidelines: A Case Study on Cancer Screening

Medical guidelines are clinical behaviour recommendations used to help and support physicians in choosing the most appropriate diagnosis and/or therapy in given clinical circumstances. Due to the intrinsic complexity of such guidelines, applying them is not a trivial task; hence it is important to verify whether health-care workers behave in conformance with the intended model, and to evaluate how much their behaviour differs.

In this paper we present the GPROVE framework, which we are developing within a regional project, to describe medical guidelines visually and to perform the conformance verification automatically.

Federico Chesani, Pietro De Matteis, Paola Mello, Marco Montali, Sergio Storari
Preferred Generalized Answers for Inconsistent Databases

This paper investigates the problem of managing inconsistent databases, i.e. databases violating integrity constraints. A flurry of research on this topic has shown that the presence of inconsistent data can be resolved by “repairing” the database, i.e. by providing a computational mechanism that ensures consistent “scenarios” of the information, or by consistently answering queries posed on an inconsistent set of data. This paper considers preferences among repairs and possible answers, introducing a partial order among them on the basis of preference criteria. Moreover, the paper extends the notion of preferred consistent answer by extracting, from a set of preferred repaired databases, the maximal consistent overlapping portion of the information, i.e. the information supported by every preferred repaired database.

L. Caroprese, S. Greco, I. Trubitsyna, E. Zumpano
Representation Interest Point Using Empirical Mode Decomposition and Independent Components Analysis

This paper presents a new representation method for interest point descriptors based on empirical mode decomposition (EMD) and independent component analysis (ICA). The proposed algorithm first finds the characteristic scale and the location of the interest points using the Harris-Laplacian interest point detector. We then apply the Hilbert transform to each EMD component and take the amplitude and the instantaneous frequency as feature vectors. Independent component analysis is then used to model the image subspace and to reduce the dimension of the feature vectors. The aim of the algorithm is to find a meaningful image subspace and more compact descriptors. Combining the proposed descriptors with an effective interest point detector, the algorithm achieves a more accurate matching rate as well as robustness to image deformations.

Dongfeng Han, Wenhui Li, Xiaosuo Lu, Yi Wang, Ming Li
Integration of Graph Based Authorization Policies

In a distributed environment, authorizations are usually physically stored in several computers connected by a network. Each computer may have its own local policies, which could conflict with the others’. Therefore, how to make a global decision from the local authorization policies is a crucial and practical problem for a distributed system. In this paper, three general integration models based on the degrees of node autonomy are proposed, and different strategies for integrating the local policies into global policies in each model are systematically discussed. The discussion is based on the weighted authorization graph model that we proposed previously.

Chun Ruan, Vijay Varadharajan

Knowledge Discovery and Data Mining

Supporting Visual Exploration of Discovered Association Rules Through Multi-Dimensional Scaling

Association rules are typically evaluated in terms of support and confidence measures, which ensure that discovered rules have enough positive evidence. However, in real-world applications, not even all rules with high confidence and support are interesting, and presenting all discovered rules can discourage users from interpreting them in order to find nuggets of knowledge. Association rule interpretation can benefit from discovering groups of “similar” rules, where (dis)similarity is estimated on the basis of syntactic or semantic characteristics. In this paper, we resort to multi-dimensional scaling to support a visual exploration of association rules by means of bi-dimensional scatter plots. An application in the domain of biomedical literature is reported. Results show that the use of this visualization technique is beneficial.

Margherita Berardi, Annalisa Appice, Corrado Loglisci, Pietro Leo
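The support and confidence measures mentioned in the abstract above are standard and can be computed in a few lines. This sketch uses a toy, made-up transaction set, not data from the paper.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A ∪ B) / support(A) for the rule A → B."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
s = support({"bread", "milk"}, transactions)       # 2/4 = 0.5
c = confidence({"bread"}, {"milk"}, transactions)  # 0.5 / 0.75 = 2/3
```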
Evaluating Learning Algorithms for a Rule Evaluation Support Method Based on Objective Rule Evaluation Indices

In this paper, we present an evaluation of learning algorithms for a novel rule evaluation support method for post-processing of mined results with rule evaluation models based on objective indices. Post-processing of mined results is one of the key steps in a data mining process. However, it is difficult for human experts to completely evaluate several thousands of rules mined from a large, noisy dataset. To reduce the cost of such rule evaluation tasks, we have developed a rule evaluation support method with rule evaluation models that learn from a dataset comprising objective indices for mined classification rules and a human expert’s evaluation of each rule. To evaluate the performance of learning algorithms for constructing the rule evaluation models, we have conducted a case study on meningitis data mining as a real-world problem. Furthermore, we have also evaluated our method on five rule sets obtained from five UCI datasets.

Hidenao Abe, Shusaku Tsumoto, Miho Ohsaki, Takahira Yamaguchi
Quality Assessment of k-NN Multi-label Classification for Music Data

This paper investigates problems related to quality assessment in the case of multi-label automatic classification of data using a k-Nearest Neighbor classifier. Various methods of assigning classes, as well as measures for assessing the quality of classification results, are proposed and investigated both theoretically and in practical tests. In our experiments, audio data representing short music excerpts of various emotional content were parameterized and then used for training and testing. Class labels represented emotions assigned to a given audio excerpt. The experiments show how various measures influence quality assessment of automatic classification of multi-label data.

Alicja Wieczorkowska, Piotr Synak
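One common way of assigning multiple labels with a k-NN classifier, as discussed in the abstract above, is thresholded neighbour voting. This is a generic sketch of that scheme with made-up feature vectors and emotion labels, not necessarily the method evaluated in the paper.

```python
from collections import Counter

def knn_multilabel(query, train, k=3, threshold=0.5):
    """Assign every label carried by at least `threshold` of the k
    nearest neighbours (Euclidean distance)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    neighbours = sorted(train, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    return {label for label, n in votes.items() if n / k >= threshold}

# Toy training set: (feature vector, set of emotion labels).
train = [((0.0, 0.0), {"sad"}), ((0.1, 0.0), {"sad", "calm"}),
         ((1.0, 1.0), {"joy"}), ((0.2, 0.1), {"calm"})]
print(knn_multilabel((0.05, 0.05), train, k=3))  # {'sad', 'calm'}
```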
Effective Mining of Fuzzy Multi-Cross-Level Weighted Association Rules

This paper addresses fuzzy weighted multi-cross-level association rule mining. We define a fuzzy data cube, which facilitates handling quantitative values of dimensional attributes and hence allows mining fuzzy association rules at different levels. A method is introduced for single-dimension fuzzy weighted association rule mining. To the best of our knowledge, none of the studies described in the literature considers weighting the internal nodes of such a taxonomy: only items appearing in transactions are weighted, to find more specific and important knowledge. However, weighting internal nodes of the tree may sometimes be more meaningful and sufficient. We compared the proposed approach to an existing approach that does not utilize fuzziness. The reported experimental results demonstrate the effectiveness and applicability of the proposed fuzzy weighted multi-cross-level mining approach.

Mehmet Kaya, Reda Alhajj
A Methodological Contribution to Music Sequences Analysis

In this paper we present a stepwise method for the analysis of musical sequences. The starting point is either a MIDI file or the score of a piece of music. The result is a set of likely themes and motifs. The method relies on a pitch intervals representation of music and an event discovery system that extracts significant and repeated patterns from sequences. We report and discuss the results of a preliminary experimentation, and outline future enhancements.

Daniele P. Radicioni, Marco Botta
Towards a Framework for Inductive Querying

Despite many recent developments, there are still a number of central issues in inductive databases that need more research. In this paper we address two of them. The first issue is about how the discovery of patterns can use existing patterns. We will give a concrete example showing an advantage of mining both the patterns and the data. The second issue we consider is the actual implementation of inductive databases. We will propose an architectural framework for inductive databases and show how existing databases can be incorporated.

Jeroen S. de Bruin
Towards Constrained Co-clustering in Ordered 0/1 Data Sets

Within 0/1 data, co-clustering provides a collection of bi-clusters, i.e., linked clusters for both objects and Boolean properties. Besides the classical need for optimizing grouping quality, one can also use user-defined constraints to capture subjective interestingness and thus improve bi-cluster relevancy. We consider the case of 0/1 data where at least one dimension is ordered, e.g., where objects denote time points, and we introduce co-clustering constrained by interval constraints. Exploiting such constraints during the intrinsically heuristic clustering process is challenging. We propose one major step in this direction, in which bi-clusters are computed from collections of local patterns. We provide an experimental validation on two temporal gene expression data sets.

Ruggero G. Pensa, Céline Robardet, Jean-François Boulicaut
A Comparative Analysis of Clustering Methodology and Application for Market Segmentation: K-Means, SOM and a Two-Level SOM

The purpose of our research is to identify the critical variables, to evaluate the performance of variable selection, to evaluate the performance of a two-level SOM, and to apply this methodology to Asian online game market segmentation. In conclusion, our results suggest that weight-based variable selection is more useful for market segmentation than full-based and SEM-based variable selection. Additionally, a two-level SOM is more accurate in classification than K-means and SOM. The critical segmentation variables and the characteristics of target customers differed among countries. Therefore, online game companies should develop diverse marketing strategies based on the characteristics of their target customers using the research framework we propose.

Sang-Chul Lee, Ja-Chul Gu, Yung-Ho Suh
Action Rules Discovery, a New Simplified Strategy

A new strategy for discovering action rules (or interventions) is presented in this paper. The current methods [14], [12], [8] require discovering classification rules before any action rule can be constructed from them. Several definitions of action rules [8], [13], [9], [3] have been proposed; they differ in the generality of their classification parts, but they are always constructed from certain pairs of classification rules. Our new strategy defines the classification part of an action rule in a unique way, and action rules are constructed from single classification rules. We show how to compute their confidence and support. Action rules are used to reclassify objects. In this paper, we propose a method for measuring the level of reclassification freedom for objects in a decision system.

Zbigniew W. Raś, Agnieszka Dardzińska
Characteristics of Indiscernibility Degree in Rough Clustering Examined Using Perfect Initial Equivalence Relations

In this paper, we analyze the influence of the indiscernibility degree, a primary parameter in rough clustering, on cluster formation. Rough clustering consists of two steps: (1) assignment of initial equivalence relations and (2) iterative refinement of the initial relations. The indiscernibility degree plays a key role in the second step, but it is not easy to analyze its characteristics independently because the second step inherits the results of the first. In this paper, we employ perfect initial equivalence relations, generated according to the class labels of the data, to isolate the influence of step 1. We first examine the relationship between the threshold value of the indiscernibility degree and the resultant clusters. We then apply random disturbance to the perfect relations and examine how the result changes. The results demonstrate that the relationship between the indiscernibility degree and the number of clusters draws a globally convex but multi-modal curve, and that the range of indiscernibility degree yielding the best cluster validity may lie at a local minimum near the global one, which generates a single cluster.

Shoji Hirano, Shusaku Tsumoto
Implication Strength of Classification Rules

This paper highlights the relevance of implicative statistics for classification trees. We start by showing how Gras’ implication index may be defined for the rules derived from an induced decision tree. Then, we show that the residuals used in modeling contingency tables provide interesting alternatives to Gras’ index. We then consider two main usages of these indexes. The first is purely descriptive and concerns the a posteriori individual evaluation of the classification rules. The second usage, considered for instance by [15], relies upon the intensity of implication to define the conclusion in each leaf of the induced tree.

Gilbert Ritschard, Djamel A. Zighed
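Gras’ implication index mentioned in the abstract above measures how far the observed number of counter-examples to a rule falls below the count expected under independence. The sketch below follows the standard definition from the implicative-statistics literature; the notation and the toy counts are illustrative and may differ from the paper’s.

```python
import math

def implication_index(n, n_a, n_not_b, n_a_not_b):
    """Approximately standardized deviation of the observed number of
    counter-examples n(a ∧ ¬b) of the rule a → b from the count
    expected under independence, n_a * n_not_b / n.  The more negative
    the index, the stronger the implication."""
    expected = n_a * n_not_b / n
    return (n_a_not_b - expected) / math.sqrt(expected)

# 100 cases, 40 satisfying a, 30 falsifying b, only 2 counter-examples:
q = implication_index(100, 40, 30, 2)
print(round(q, 3))  # -2.887: far fewer counter-examples than expected
```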
A New Clustering Approach for Symbolic Data and Its Validation: Application to the Healthcare Data

Graph coloring is used to characterize certain properties of graphs. A b-coloring of a graph G (using colors 1, 2, ..., k) is a coloring of the vertices of G such that (i) any two neighbors have different colors (proper coloring) and (ii) each color class contains a dominating vertex adjacent to vertices of all the other k-1 color classes. In this paper, based on the b-coloring of a graph, we propose a new clustering technique. Additionally, we provide a cluster validation algorithm, which aims at finding the optimal number of clusters by evaluating the color dominating vertex property. We adopt this clustering technique to discover a new typology of hospital stays in the French healthcare system.

Haytham Elghazel, Véronique Deslandres, Mohand-Said Hacid, Alain Dussauchoy, Hamamache Kheddouci
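The two conditions of the b-coloring definition given above are easy to check mechanically. This sketch verifies them on a tiny adjacency-list graph; it illustrates the definition only, not the clustering algorithm of the paper.

```python
def is_b_coloring(adj, color):
    """Check the b-coloring property: the coloring is proper and every
    color class contains a dominating vertex whose neighbourhood meets
    all the other color classes."""
    # (i) proper coloring: neighbours get different colors
    if any(color[u] == color[v] for u in adj for v in adj[u]):
        return False
    classes = set(color.values())
    # (ii) each class has a vertex adjacent to all other classes
    for c in classes:
        members = [v for v in color if color[v] == c]
        if not any({color[w] for w in adj[v]} >= classes - {c}
                   for v in members):
            return False
    return True

# A triangle with three colors: every vertex dominates the other classes.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(is_b_coloring(adj, {"a": 1, "b": 2, "c": 3}))  # True
```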
Action Rules Discovery System DEAR_3

E-action rules, introduced in [8], represent actionability knowledge hidden in a decision system. They enhance action rules [3] and extended action rules [4], [6], [7] by assuming that data can be either symbolic or nominal. Several efficient strategies for mining e-action rules have been developed [6], [7], [5], [8]. All of them assume that data are complete. Clearly, this constraint has to be relaxed, since information about attribute values for some objects can be missing or represented as multiple values. To address this, we present DEAR_3, an e-action rule generating algorithm. It offers three major improvements over DEAR_2: handling data with missing attribute values, handling uncertain attribute values, and pruning outliers at an early stage.

Li-Shiang Tsay, Zbigniew W. Raś
Mining and Modeling Database User Access Patterns

We present our approach to mining and modeling the behavior of database users. In particular, we propose graphic models to capture the database user’s dynamic behavior and focus on applying data mining techniques to the problem of mining and modeling database user behaviors from database trace logs. The experimental results show that our approach can discover and model user behaviors successfully.

Qingsong Yao, Aijun An, Xiangji Huang

Logic for AI and Logic Programming

Belief Revision in the Situation Calculus Without Plausibility Levels

The Situation Calculus has been used by Scherl and Levesque to represent beliefs and belief change without modal operators, thanks to a predicate that plays the role of an accessibility relation. Their approach has been extended by Shapiro et al. to support belief revision. In this extension, plausibility levels are assigned to each situation, and the believed propositions are those that are true in all the most plausible accessible situations.

Their solution is quite elegant from a theoretical point of view, but the definition of the plausibility assignment for a given application domain raises practical problems.

This paper presents a new proposal that does not make use of plausibilities. The idea is to include the knowledge-producing actions in the successor state axioms. In this framework each agent may have a different successor state axiom for a given fluent, so each agent may have his own subjective view of the evolution of the world. Also, agents may or may not know that a given action has been performed; that is, actions are not necessarily public.

Robert Demolombe, Pilar Pozos Parra
Norms, Institutional Power and Roles: Towards a Logical Framework

In the design of the organisation of a multiagent system the concept of role is fundamental. We informally analyse this concept through examples. Then we propose a more formal definition that can be decomposed into: the conditions that have to be satisfied to hold a role, the norms and institutional powers that apply to a role holder. Finally, we present a modal logical framework to represent these concepts.

Robert Demolombe, Vincent Louis
Adding Efficient Data Management to Logic Programming Systems

This paper considers the problem of reasoning on massive amounts of (possibly distributed) data. Existing proposals show some limitations: (i) the quantity of data that can be handled simultaneously is limited, because reasoning is generally carried out in main memory; (ii) the interaction with external (and independent) DBMSs is not trivial and, in several cases, not allowed at all; (iii) the efficiency of present implementations is still not sufficient for complex reasoning tasks involving massive amounts of data. This paper provides a contribution in this setting: it presents a new system, called DLVDB, which aims to solve all these problems.

G. Terracina, N. Leone, V. Lio, C. Panetta
A Logic-Based Approach to Model Supervisory Control Systems

We present an approach to model supervisory control systems based on extended behaviour networks. In particular, we employ them to formalize the control theory of the supervisor. By separating the reasoning in the supervisor and the action implementation in the controller, the overall system architecture becomes modular, and therefore easily changeable and modifiable.

Pierangelo Dell’Acqua, Anna Lombardi, Luís Moniz Pereira
SAT as an Effective Solving Technology for Constraint Problems

In this paper we investigate the use of SAT technology for solving constraint problems. In particular, we solve many instances of several common benchmark problems for CP with different SAT solvers, by exploiting the declarative modelling language NPSpec and Spec2Sat, an application that compiles NPSpec specifications into SAT instances. Furthermore, we begin investigating whether some reformulation techniques already used in CP are effective when using SAT as the solving engine. We present encouraging experimental results in this direction, showing that this approach can be appealing.

Marco Cadoli, Toni Mancini, Fabio Patrizi
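Compiling a constraint into SAT, as the abstract above describes, can be illustrated on the textbook pairwise encoding of an at-most-one constraint. This generic example is not taken from Spec2Sat or the paper; literals are integers, with a negative sign denoting negation, and a tiny brute-force checker stands in for a real SAT solver.

```python
from itertools import combinations, product

def at_most_one(vars_):
    """Pairwise CNF encoding: for every pair of variables, at least
    one of the two must be false."""
    return [(-a, -b) for a, b in combinations(vars_, 2)]

def satisfies(clauses, assignment):
    """True if every clause contains a literal made true by the assignment."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)

clauses = at_most_one([1, 2, 3])  # [(-1, -2), (-1, -3), (-2, -3)]
# Brute-force check: models are exactly the assignments with <= 1 true var.
models = [a for a in product([False, True], repeat=3)
          if satisfies(clauses, dict(zip([1, 2, 3], a)))]
print(len(models))  # 4 (all-false plus the three single-true assignments)
```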
Dependency Tree Semantics

This paper presents Dependency Tree Semantics (DTS), an underspecified logic for representing quantifier scope ambiguities. DTS features a direct interface with a Dependency Grammar, easy management of partial disambiguations, and the ability to represent branching quantifier readings. This paper focuses on the syntax of DTS and does not address the model-theoretic interpretation of its well-formed structures.

Leonardo Lesmo, Livio Robaldo

Machine Learning

Mining Tolerance Regions with Model Trees

Many problems encountered in practice involve the prediction of a continuous attribute associated with an example. This problem, known as regression, requires that samples of past experience with known continuous answers be examined and generalized into a regression model to be used in predicting future examples. Regression algorithms deeply investigated in statistics, machine learning and data mining usually lack measures that indicate how “good” the predictions are. Tolerance regions, i.e., ranges of possible predictive values, can provide a measure of reliability for every bare prediction. In this paper, we focus on tree-based prediction models, i.e., model trees, and resort to inductive inference to output tolerance regions in addition to bare predictions. In particular, we consider model trees mined by SMOTI (Stepwise Model Tree Induction), a system for data-driven stepwise construction of model trees with regression and splitting nodes, and we extend the definition of the trees to build tolerance regions associated with each leaf. Experiments evaluate the validity and quality of the output tolerance regions.

Annalisa Appice, Michelangelo Ceci
Lazy Learning from Terminological Knowledge Bases

This work presents a method founded on instance-based learning algorithms for inductive (memory-based) reasoning on ABoxes. The method, which exploits a semantic dissimilarity measure between concepts and instances, can be employed both to infer class membership of instances and to predict hidden assertions that are not logically entailed by the knowledge base and need to be subsequently validated by humans (e.g. a knowledge engineer or a domain expert). In the experimentation, we show that the method can effectively help populate an ontology with likely assertions that could not be logically derived.

Claudia d’Amato, Nicola Fanizzi
Diagnosis of Incipient Fault of Power Transformers Using SVM with Clonal Selection Algorithms Optimization

In this study we explore the feasibility of applying Artificial Neural Networks (ANN) and Support Vector Machines (SVM) to the prediction of incipient power transformer faults. A clonal selection algorithm (CSA) is introduced for the first time in the literature to select optimal input features and RBF kernel parameters. CSA is shown to be capable of improving the speed and accuracy of classification systems by removing redundant and potentially confusing input features, while optimizing the kernel parameters simultaneously. Simulation results on real-world data demonstrate the effectiveness and high efficiency of the proposed approach.

Tsair-Fwu Lee, Ming-Yuan Cho, Chin-Shiuh Shieh, Hong-Jen Lee, Fu-Min Fang
An Overview of Alternative Rule Evaluation Criteria and Their Use in Separate-and-Conquer Classifiers

Separate-and-conquer classifiers strongly depend on the criteria used to choose which rules will be included in the classification model. When association rules are employed to build such classifiers (as in ART [3]), rule evaluation can be performed attending to different criteria (other than the traditional confidence measure used in association rule mining). In this paper, we analyze the desirable properties of such alternative criteria and their effect in building rule-based classifiers using a separate-and-conquer strategy.

Fernando Berzal, Juan-Carlos Cubero, Nicolás Marín, José-Luis Polo
Learning Students’ Learning Patterns with Support Vector Machines

Using Bayesian networks as the representation language for student modeling has become a common practice. Many computer-assisted learning systems rely exclusively on human experts to provide information for constructing the network structures, however. We explore the possibility of applying mutual information-based heuristics and support vector machines to learn how students learn composite concepts, based on students’ item responses to test items. The problem is challenging because it is well known that students’ performances in taking tests do not reflect their competences faithfully. Experimental results indicate that the difficulty of identifying the true learning patterns varies with the degree of uncertainty in the relationship between students’ performances in tests and their abilities in concepts. When the degree of uncertainty is moderate, it is possible to infer the unobservable learning patterns from students’ external performances with computational techniques.

Chao-Lin Liu
Practical Approximation of Optimal Multivariate Discretization

Discretization of the value range of a numerical feature is a common task in data mining and machine learning. Optimal multivariate discretization is in general computationally intractable. We have proposed approximation algorithms with performance guarantees for training error minimization by axis-parallel hyperplanes. This work studies their efficiency and practicability. We give efficient implementations to both greedy set covering and linear programming approximation of optimal multivariate discretization. We also contrast the algorithms empirically to an efficient heuristic discretization method.

Tapio Elomaa, Jussi Kujala, Juho Rousu
Optimisation and Evaluation of Random Forests for Imbalanced Datasets

This paper deals with an optimization of Random Forests that aims both at adapting the forest to learning from imbalanced data and at taking the user’s wishes regarding recall and precision rates into account. We propose to adapt Random Forests on two levels. First, during forest creation, through an asymmetric entropy measure associated with specific leaf class assignation rules. Then, during the voting step, by using an alternative to the classical majority voting strategy. The automation of this second step requires a specific methodology for assessing result quality, which allows the user to specify (1) the desired recall and precision rates for each class of the concept to learn, and (2) the importance to confer to each of those classes. Finally, results of experimental evaluations are presented.

Julien Thomas, Pierre-Emmanuel Jouve, Nicolas Nicoloyannis
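One simple alternative to plain majority voting, in the spirit of the voting step described above, is to weight each tree’s vote by a per-class weight so that a rare class can win with fewer raw votes. The weights and labels below are illustrative; this is not the paper’s strategy.

```python
from collections import defaultdict

def weighted_vote(tree_predictions, class_weights):
    """Aggregate the trees' class votes with per-class weights instead
    of a plain majority count; unlisted classes default to weight 1."""
    score = defaultdict(float)
    for label in tree_predictions:
        score[label] += class_weights.get(label, 1.0)
    return max(score, key=score.get)

votes = ["neg", "neg", "neg", "pos", "pos"]            # 3 vs 2 raw votes
print(weighted_vote(votes, {"pos": 2.0, "neg": 1.0}))  # 'pos' (4.0 vs 3.0)
```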
Improving SVM-Linear Predictions Using CART for Example Selection

This paper describes a study on example selection in regression problems using μ-SVM (Support Vector Machine) linear as the prediction algorithm. The motivating case is a study on real data for a problem of bus trip time prediction. In this study we use three different training sets: all the examples; examples from past days similar to the day for which a prediction is needed; and examples selected by a CART regression tree. We then verify whether the CART-based example selection approach is appropriate on different regression data sets. The experimental results obtained are promising.

João M. Moreira, Alípio M. Jorge, Carlos Soares, Jorge Freire de Sousa
Simulated Annealing Algorithm with Biased Neighborhood Distribution for Training Profile Models

Functional biological sequences, which typically come in families, have retained some level of similarity and function during evolution. Finding consensus regions, aligning sequences, and identifying the relationship between a sequence and a family allow inferences about the function of the sequences. Profile hidden Markov models (HMMs) are generally used to identify those relationships. A profile HMM can be trained on unaligned members of the family using conventional algorithms such as Baum-Welch, Viterbi, and their modifications. The overall quality of the alignment depends on the quality of the trained model. Unfortunately, the conventional training algorithms converge to suboptimal models most of the time. This work proposes a training algorithm that identifies many imperfect models early. The method is based on the simulated annealing approach widely used in discrete optimization problems. The training algorithm is implemented as a component in HMMER. The performance of the algorithm is discussed on protein sequence data.

Anton Bezuglov, Juan E. Vargas
A Conditional Model for Tonal Analysis

Tonal harmony analysis is arguably one of the most sophisticated tasks that musicians deal with. It combines general knowledge with contextual cues, and is ingrained with multi-faceted and evolving objects such as musical language, execution style, or even taste. In the present work we introduce breve, a system for tonal analysis. breve automatically learns to analyse music using the recently developed framework of conditional models. The system is presented and assessed on a corpus of Western classical pieces from the 18th to the late 19th century repertoire. The results are discussed, and interesting issues in modeling this problem are drawn.

Daniele P. Radicioni, Roberto Esposito
Hypothesis Diversity in Ensemble Classification

The paper discusses the issue of hypothesis diversity in ensemble classifiers. The measures of diversity previously proposed in the literature are analyzed within a unifying framework based on Monte Carlo stochastic algorithms. The paper shows that no measure is useful for predicting ensemble performance, because all of them have only a very loose relation with the expected accuracy of the classifier.

Lorenza Saitta
Complex Adaptive Systems: Using a Free-Market Simulation to Estimate Attribute Relevance

The authors have implemented a complex adaptive simulation of an agent-based exchange to estimate the relative importance of attributes in a data set. The simulation uses an individual, transaction-based voting mechanism to help the system estimate the importance of each variable at the system/aggregate level. Two variations of information gain, one using entropy and one using similarity, were used to demonstrate that the resulting estimates can be computed from a smaller subset of the data, with greater accommodation for missing and erroneous data than traditional methods.

Christopher N. Eichelberger, Mirsad Hadžikadić

Text Mining

Exploring Phrase-Based Classification of Judicial Documents for Criminal Charges in Chinese

Phrases provide a better foundation for indexing and retrieving documents than individual words: the constituents of a phrase make its other component words less ambiguous than when the words appear separately. Intuitively, classifiers that employ phrases for indexing should perform better than those that use words. Although pioneers explored the possibility of indexing English documents decades ago, there are relatively few similar attempts for Chinese documents, partially because correctly segmenting Chinese text into words is itself not easy. We build a domain-dependent word list with the help of Chien’s PAT tree-based method and HowNet, and use the resulting word list to define relevant phrases for classifying Chinese judicial documents. Experimental results indicate that indexing with phrases indeed allows us to classify judicial documents that are closely similar to each other. With a relatively more efficient algorithm, our classifier offers better performance than those reported in related works.

Chao-Lin Liu, Chwen-Dar Hsieh
Regularization for Unsupervised Classification on Taxonomies

We study unsupervised classification of text documents into a taxonomy of concepts annotated by only a few keywords. Our central claim is that the structure of the taxonomy encapsulates background knowledge that can be exploited to improve classification accuracy. Under our hierarchical Dirichlet generative model for the document corpus, we show that the unsupervised classification algorithm provides robust estimates of the classification parameters by performing regularization, and that our algorithm can be interpreted as a regularized EM algorithm. We also propose a technique for the automatic choice of the regularization parameter. In addition, we propose a regularization scheme for K-means on hierarchies. We experimentally demonstrate that both our regularized clustering algorithms achieve higher classification accuracy than simple models such as minimum distance, Naïve Bayes, EM and K-means.

Diego Sona, Sriharsha Veeramachaneni, Nicola Polettini, Paolo Avesani
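
One simple way to exploit a taxonomy when clustering, in the general spirit described above, is to shrink each node's centroid toward its parent's, so sparsely populated nodes borrow strength from the hierarchy. The blending form and the parameter `lam` below are assumptions for illustration, not the paper's exact scheme:

```python
# Illustrative sketch: parent-shrinkage regularization of centroids
# in a taxonomy. The shrinkage rule (1 - lam) * own + lam * parent
# is an assumed, simplified stand-in for the paper's scheme.
def blend(c, p, lam):
    return tuple((1 - lam) * ci + lam * pi for ci, pi in zip(c, p))

def regularize_centroids(centroids, parent, lam=0.3):
    """Shrink each node's centroid toward its parent's centroid."""
    return {node: c if parent.get(node) is None
            else blend(c, centroids[parent[node]], lam)
            for node, c in centroids.items()}

centroids = {"root": (0.0, 0.0),
             "sports": (1.0, 0.0),
             "football": (1.0, 1.0)}
parent = {"sports": "root", "football": "sports"}
reg = regularize_centroids(centroids, parent, lam=0.5)
print(reg["football"])  # (1.0, 0.5): pulled halfway toward "sports"
```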
A Proximity Measure and a Clustering Method for Concept Extraction in an Ontology Building Perspective

In this paper, we study the problem of clustering textual units in the framework of helping an expert to build a specialized ontology. This work was carried out in the context of a French project, called Biotim, handling botany corpora. Building an ontology, either automatically or semi-automatically, is a difficult task. We focus on one of the main steps of that process, namely structuring the textual units occurring in the texts into classes that are likely to represent concepts of the domain. The approach that we propose relies on the definition of a new non-symmetrical measure for evaluating the semantic proximity between lemmas, taking into account the contexts in which they occur in the documents. Moreover, we present an unsupervised classification algorithm designed for the task at hand and for that kind of data. The first experiments, performed on botanical data, have given relevant results.

Guillaume Cleuziou, Sylvie Billot, Stanislas Lew, Lionel Martin, Christel Vrain
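
A minimal example of a non-symmetrical context-based proximity of the general kind described above is asymmetric context overlap (the paper's actual measure differs; this only illustrates why asymmetry matters, e.g. when one lemma's contexts are a subset of another's):

```python
# Illustrative non-symmetric proximity: the fraction of a's contexts
# that are shared with b. Generally prox(a, b) != prox(b, a).
# Context sets here are invented toy data.
def proximity(contexts, a, b):
    ca, cb = contexts[a], contexts[b]
    return len(ca & cb) / len(ca)

contexts = {"petal": {"flower", "corolla", "pink"},
            "leaf": {"flower", "green", "stem", "margin"}}
print(proximity(contexts, "petal", "leaf"))  # 1/3
print(proximity(contexts, "leaf", "petal"))  # 1/4
```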
An Intelligent Personalized Service for Conference Participants

This paper presents the integration of linguistic knowledge into the learning of semantic user profiles, which represent user interests more effectively than classical keyword-based profiles. Semantic profiles are obtained by integrating a naïve Bayes approach for text categorization with a word sense disambiguation strategy based on the WordNet lexical database (Section 2). Semantic profiles are exploited by the "conference participant advisor" service in order to suggest to a conference participant papers to read and talks to attend. Experiments on a real dataset show the effectiveness of the service (Section 3).

Marco Degemmis, Pasquale Lops, Pierpaolo Basile
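
The text-categorization backbone of such a profile learner is multinomial naïve Bayes over word counts. The sketch below is a generic formulation with Laplace smoothing; the service's actual features also include WordNet senses, which are omitted here, and the documents and labels are invented:

```python
# Generic multinomial naive Bayes with Laplace smoothing.
# Toy sketch; not the paper's implementation.
from collections import Counter
from math import log

def train(docs):
    """docs: list of (word_list, label). Returns the model parameters."""
    counts, totals, priors = {}, Counter(), Counter()
    for words, label in docs:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(words)
        totals[label] += len(words)
    vocab = {w for words, _ in docs for w in words}
    return counts, totals, priors, vocab

def classify(words, model):
    counts, totals, priors, vocab = model
    n = sum(priors.values())
    def score(label):
        s = log(priors[label] / n)
        for w in words:  # Laplace smoothing over the vocabulary
            s += log((counts[label][w] + 1) / (totals[label] + len(vocab)))
        return s
    return max(priors, key=score)

docs = [("machine learning model".split(), "interesting"),
        ("hotel booking deadline".split(), "not")]
model = train(docs)
print(classify("learning model".split(), model))  # "interesting"
```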
Contextual Maps for Browsing Huge Document Collections

The increasing number of documents returned by search engines for typical requests makes it necessary to look for new methods of representing the contents of the results, such as document maps. Though visually impressive, document maps (e.g. WebSOM) are extremely resource-consuming and hard to use for huge collections.

In this paper, we present a novel approach that does not require the creation of a complex, global map-based model for the whole document collection. Instead, a hierarchy of topic-sensitive maps is created. We argue that such an approach is not only much less complex in terms of processing time and memory requirements, but also leads to robust map-based browsing of the document collection.

Krzysztof Ciesielski, Mieczysław A. Kłopotek

Web Intelligence

Classification of Polish Email Messages: Experiments with Various Data Representations

Machine classification of Polish-language emails into user-specific folders is considered. We experimentally evaluate the impact of different approaches to constructing the data representation of emails on the accuracy of classifiers. Our results show that language processing techniques have less influence than an appropriate selection of features, in particular features coming from the email header or its attachments.

Jerzy Stefanowski, Marcin Zienkowicz
Combining Multiple Email Filters Based on Multivariate Statistical Analysis

In this paper, we investigate how to combine multiple e-mail filters based on multivariate statistical analysis, providing a barrier to spam that is stronger than any single filter alone. Three evaluation criteria are suggested for cost-sensitive filters, and their rationale is discussed. Furthermore, a principle that minimizes the error cost is described, which avoids filtering a "Legitimate" e-mail into "Spam". Compared with other major methods, the experimental results show that our method of combining multiple filters performs better when appropriate running parameters are adopted.

Wenbin Li, Ning Zhong, Chunnian Liu
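
A cost-sensitive decision rule in the spirit described above can be sketched as follows: classify a message as spam only when the expected cost of doing so is lower than the cost of letting it through. The cost values and the simple averaging of filter scores are illustrative assumptions, not the paper's combination method:

```python
# Illustrative cost-sensitive combination of spam filters.
# The cost ratio (cost_fp = 9, cost_fn = 1) and the averaging rule
# are assumed for illustration.
def combine_filters(spam_probs):
    """Naive combination: average the filters' spam probabilities."""
    return sum(spam_probs) / len(spam_probs)

def decide(spam_probs, cost_fp=9.0, cost_fn=1.0):
    """Label spam only if the expected misclassification cost favors it.

    cost_fp: cost of filtering a "Legitimate" mail into "Spam";
    cost_fn: cost of letting a spam mail through.
    """
    p = combine_filters(spam_probs)
    # Expected cost of saying "Spam" is (1 - p) * cost_fp;
    # expected cost of saying "Legitimate" is p * cost_fn.
    return "Spam" if (1 - p) * cost_fp < p * cost_fn else "Legitimate"

print(decide([0.95, 0.99, 0.97]))  # Spam: filters agree confidently
print(decide([0.6, 0.7, 0.5]))     # Legitimate: too costly to block
```

With this asymmetric cost ratio, a message needs a combined spam probability of at least 0.9 before blocking it becomes the cheaper decision.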
Employee Profiling in the Total Reward Management

Human Resource departments are now facing a new challenge: how to contribute to the definition of incentive plans and professional development? The participation of line managers in answering this question is fundamental, since they are the ones who best know the single individuals; but they do not have the necessary background. In this paper, we present the Team Advisor project, whose goal is to enable line managers to be in charge of their own development plans by providing them with a personalized and contextualized set of information about their teams. Several experiments are reported, together with a discussion of the results.

Silverio Petruzzellis, Oriana Licchelli, Ignazio Palmisano, Valeria Bavaro, Cosimo Palmisano
Mining Association Rules in Temporal Document Collections

In this paper we describe how to mine association rules in temporal document collections. We describe how to perform the various steps in the temporal text mining process, including data cleaning, text refinement, temporal association rule mining and rule post-processing. We also describe the Temporal Text Mining Testbench, which is a user-friendly and versatile tool for performing temporal text mining, and some results from using this tool.

Kjetil Nørvåg, Trond Øivind Eriksen, Kjell-Inge Skogstad
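
The temporal association rule step can be illustrated with a toy sketch: mine rules term_a → term_b whose support and confidence are computed only over the documents inside a given time window. The thresholds, field layout and example terms below are assumptions for illustration, not the tool's actual interface:

```python
# Toy sketch of temporal association rule mining over time-stamped
# term sets. Thresholds and data are invented for illustration.
from datetime import date
from itertools import permutations

def temporal_rules(docs, start, end, min_support=0.5, min_conf=0.8):
    """docs: list of (date, set_of_terms). Returns (a, b, conf) rules."""
    window = [terms for d, terms in docs if start <= d <= end]
    n = len(window)
    terms = {t for doc in window for t in doc}
    rules = []
    for a, b in permutations(sorted(terms), 2):
        both = sum(1 for doc in window if a in doc and b in doc)
        only_a = sum(1 for doc in window if a in doc)
        if only_a and both / n >= min_support and both / only_a >= min_conf:
            rules.append((a, b, both / only_a))
    return rules

docs = [(date(2006, 1, 5), {"election", "poll"}),
        (date(2006, 1, 9), {"election", "poll", "debate"}),
        (date(2006, 3, 1), {"budget"})]
print(temporal_rules(docs, date(2006, 1, 1), date(2006, 1, 31)))
```

Running the same query over a different window yields different rules, which is exactly what makes the temporal dimension informative.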
Self-supervised Relation Extraction from the Web

Web extraction systems attempt to use the immense amount of unlabeled text on the Web in order to create large lists of entities and relations. Unlike traditional IE methods, Web extraction systems do not label every mention of the target entity or relation, focusing instead on extracting as many different instances as possible while keeping the precision of the resulting list reasonably high. SRES is a self-supervised Web relation extraction system that learns powerful extraction patterns from unlabeled text, using short descriptions of the target relations and their attributes. SRES automatically generates the training data needed for its pattern-learning component. We also compare the performance of SRES to that of the state-of-the-art KnowItAll system, and to that of its pattern-learning component, which uses a simpler and less powerful pattern language than SRES.

Ronen Feldman, Benjamin Rosenfeld, Stephen Soderland, Oren Etzioni
Backmatter
Metadata
Title
Foundations of Intelligent Systems
edited by
Floriana Esposito
Zbigniew W. Raś
Donato Malerba
Giovanni Semeraro
Copyright year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-45766-4
Print ISBN
978-3-540-45764-0
DOI
https://doi.org/10.1007/11875604