
## About this book

This book constitutes the refereed proceedings of the 13th International Conference on Flexible Query Answering Systems, FQAS 2019, held in Amantea, Italy, in July 2019.

The 27 full papers and 10 short papers presented were carefully reviewed and selected from 43 submissions. The papers present emerging research trends with a special focus on flexible querying and analytics for smart cities and smart societies in the age of big data. They are organized in the following topical sections: flexible database management and querying; ontologies and knowledge bases; social networks and social media; argumentation-based query answering; data mining and knowledge discovery; advanced flexible query answering methodologies and techniques; flexible query answering methods and techniques; flexible intelligent information-oriented and network-oriented approaches; big data veracity and soft computing; flexibility in tools; and systems and miscellanea.

## Table of contents

### Logic, Machine Learning, and Security

Logic stands at the very heart of computer science. In this talk, I will argue that logic is also an essential part of machine learning and that it has a fundamental role to play in both international security and counter-terrorism. I will first briefly describe the use of logic for high-level reasoning in counter-terrorism applications and then describe the BEEF system to explain the forecasts generated by virtually any machine learning classifier. Finally, I will describe one use of logic in deceiving cyber-adversaries who may have successfully compromised an enterprise network.

V. S. Subrahmanian

### Personal Big Data, GDPR and Anonymization

Big data are analyzed to reveal patterns, trends and associations, especially relating to human behavior and interactions. However, according to the European General Data Protection Regulation (GDPR), which is becoming a de facto global data protection standard, any intended uses of personally identifiable information (PII) must be clearly specified and explicitly accepted by the data subjects. Furthermore, PII cannot be accumulated for secondary use. Thus, can exploratory data uses on PII be GDPR-compliant? Hardly so. Resorting to anonymized data sets instead of PII is a natural way around, for anonymized data fall outside the scope of GDPR. The problem is that anonymization techniques, based on statistical disclosure control and privacy models, use algorithms and assumptions from the time of small data that must be thoroughly revised, updated or even replaced to deal with big data. Upgrading big data anonymization to address the previous challenge needs to empower users (by giving them useful anonymized data), subjects (by giving them control over anonymization) and controllers (by simplifying anonymization and making it more flexible).

Josep Domingo-Ferrer

### Tutorial: Supervised Learning for Prevalence Estimation

Quantification is the task of estimating, given a set $$\sigma$$ of unlabelled items and a set of classes $$\mathcal{C}$$, the relative frequency (or “prevalence”) $$p(c_{i})$$ of each class $$c_{i}\in \mathcal{C}$$. Quantification is important in many disciplines (such as market research, political science, the social sciences, and epidemiology) which usually deal with aggregate (as opposed to individual) data. In these contexts, classifying individual unlabelled instances is usually not a primary goal, while estimating the prevalence of the classes of interest in the data is. Quantification may in principle be solved via classification, i.e., by classifying each item in $$\sigma$$ and counting, for all $$c_{i}\in \mathcal{C}$$, how many such items have been labelled with $$c_{i}$$. However, it has been shown in a multitude of works that this “classify and count” (CC) method yields suboptimal quantification accuracy, one of the reasons being that most classifiers are optimized for classification accuracy, and not for quantification accuracy. As a result, quantification has come to be no longer considered a mere byproduct of classification, and has evolved as a task of its own, devoted to designing methods and algorithms that deliver better prevalence estimates than CC. The goal of this tutorial is to introduce the main supervised learning techniques that have been proposed for solving quantification, the metrics used to evaluate them, and the most promising directions for further research.

Alejandro Moreo, Fabrizio Sebastiani
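The "classify and count" baseline described in the abstract above can be sketched in a few lines. The snippet below is only an illustration: the label names, the example predictions, and the `tpr`/`fpr` values are made up, and the adjusted variant is one well-known correction from the quantification literature (adjusted classify and count), which the abstract itself does not name.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """Estimate each class prevalence as the fraction of items predicted in that class."""
    counts = Counter(predicted_labels)
    n = len(predicted_labels)
    return {c: counts[c] / n for c in classes}


def adjusted_classify_and_count(cc_estimate, tpr, fpr):
    """Binary correction of a CC estimate using the classifier's (assumed known)
    true-positive and false-positive rates: (cc - fpr) / (tpr - fpr), clipped to [0, 1]."""
    if tpr == fpr:
        # The correction is undefined in this case; fall back to the raw estimate.
        return cc_estimate
    return min(1.0, max(0.0, (cc_estimate - fpr) / (tpr - fpr)))


# Hypothetical classifier output for ten unlabelled items.
predictions = ["pos", "neg", "neg", "pos", "neg", "neg", "neg", "neg", "pos", "neg"]
cc = classify_and_count(predictions, ["pos", "neg"])

# Hypothetical error rates, e.g. estimated on held-out labelled data.
acc_pos = adjusted_classify_and_count(cc["pos"], tpr=0.8, fpr=0.1)
```

With these made-up numbers the raw CC estimate for `pos` is 0.3, while the corrected estimate is slightly lower, which shows how a biased classifier can distort prevalence estimates even when its individual predictions are mostly right.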

### Algorithmic Approaches to Computational Models of Argumentation

Computational models of argumentation [1] are approaches for non-monotonic reasoning that focus on the interplay between arguments and counterarguments in order to reach conclusions.

Matthias Thimm

### Flexible Querying and Analytics for Smart Cities and Smart Societies in the Age of Big Data: Overview of the FQAS 2019 International Conference

This paper contains a brief overview of the issues and challenges of the emerging topic recognized as flexible querying and analytics for smart cities and smart societies, which is strictly related to the current big data management and analytics research trend, along with a brief overview of the FQAS 2019 international conference.

Alfredo Cuzzocrea, Sergio Greco

### Indexing for Skyline Computation: A Comparison Study

Skyline queries enable satisfying search results by delivering best matches, even if the filter criteria are conflicting. Skyline algorithms are often classified into generic and index-based approaches. While there are countless papers comparing generic algorithms, there exist only a few publications on the effect of index-based Skyline computation. In this paper, we give an overview of the most recent index-based Skyline algorithms BBS, ZSky, and SkyMap. We conducted comprehensive experiments on different data sets and present some interesting outcomes.

Markus Endres, Erich Glaser
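To make the notion of a skyline concrete, here is a minimal sketch of the naive, quadratic computation based on Pareto dominance. It is not BBS, ZSky, or SkyMap, and it uses no index; the hotel data and the convention that smaller values are better in every dimension are assumptions made for the example.

```python
def dominates(p, q):
    """Return True if p Pareto-dominates q: p is no worse in every dimension
    and strictly better in at least one (smaller values are better here)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to the beach) tuples.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
result = skyline(hotels)
```

Here `(70, 5)` is dropped because `(60, 2)` is both cheaper and closer, and `(90, 9)` is dropped because `(50, 8)` beats it on both criteria. Index-based algorithms aim to reach the same result without comparing every pair of points.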

### A Simple Data Structure for Optimal Two-Sided 2D Orthogonal Range Queries

Given an arbitrary set A of two-dimensional points over a totally-ordered domain, a two-sided planar range query consists of finding all points of A within an arbitrary quadrant. In this paper we present a novel data structure that uses linear space in |A| while allowing for two-dimensional orthogonal range queries with logarithmic pre-processing and constant-delay enumeration.

Alejandro Grez, Andrea Calí, Martín Ugarte

### Optimizing the Computation of Approximate Certain Query Answers over Incomplete Databases

In many database applications there is a need to extract information from incomplete data. In such scenarios, certain answers are the most widely adopted semantics of query answering. Unfortunately, the computation of certain query answers is a coNP-hard problem. To make query answering feasible in practice, recent research has focused on developing polynomial time algorithms computing sound (but possibly incomplete) sets of certain answers. In this paper, we propose a novel technique that allows us to improve recently proposed approximation algorithms, obtaining a good balance between running time and quality of the results. We report experimental results confirming the effectiveness of the new technique.

Nicola Fiorentino, Cristian Molinaro, Irina Trubitsyna

### Leveraging Ontology to Enable Indoor Comfort Customization in the Smart Home

This paper introduces the Future Home for Future Communities’ Smart Home, a semantic-based framework for indoor comfort metrics customization inside a living environment. The Smart Home merges Ambient Intelligence, Ambient Assisted Living and Context Awareness perspectives to provide a customized comfort experience to the dwellers, also leveraging a ubiquitous interface. The Smart Home leverages ontological representations of inhabitants’ health conditions, comfort metrics and available devices to provide dwellers with indoor temperature, humidity rate, CO2 concentration and illuminance suitable for their health conditions and for the activities they want to perform inside the house. Dwellers’ interactions with the Smart Home are performed via the interface, while the ontologies composing the knowledge base are hosted on, and reasoned over in, a semantic repository. Two use cases depict the framework’s functioning in two typical scenarios: adjusting indoor temperature and providing illuminance comfort while preparing a meal.

Daniele Spoladore, Atieh Mahroo, Marco Sacco

### Efficient Ontological Query Answering by Rewriting into Graph Queries

The OWL 2 QL profile of the OWL 2 Web Ontology Language, based on the family of description logics called DL-Lite, allows for answering queries by rewriting, i.e. by reformulating a given query into another query that is then directly processed by an RDBMS by pure querying, without materialising new data or updating existing data. In this paper we propose a new language whose expressive power goes beyond that of DL-Lite (in particular, our language extends both OWL 2 QL and linear $$\mathcal{ELH}$$, two well known DL ontology languages) while still allowing query answering via rewriting of queries into conjunctive two-way regular path queries (C2RPQs). Our language is identified by a syntactic property that can be efficiently checked. After defining our new language, we propose a novel rewriting technique for conjunctive queries (CQs) that makes use of nondeterministic finite state automata. CQ answering in our setting is NLogSpace-complete in data complexity and NP-complete in combined complexity; answering instance queries is NLogSpace-complete in data complexity and in $$\textsc{PTime}$$ in combined complexity.

Mirko Michele Dimartino, Andrea Calì, Alexandra Poulovassilis, Peter T. Wood

### WeLink: A Named Entity Disambiguation Approach for a QAS over Knowledge Bases

Question Answering Systems (QASs) are usually built around queries described by short texts. The explosion of knowledge graphs and Linked Open Data motivates researchers to construct QASs over these rich data resources. The brevity of user questions complicates the problem of Entity Linking, which has been widely studied for long texts. In this paper, we propose an approach, called WeLink, based on the context and types of entities of a given query. The context of an entity is described by synonyms of the words used in the question and the definition of the named entity, whereas the type describes the category of the entity. During the named entity recognition step, we first identify different entities, their types, and contexts (by means of WordNet). The expanded query is then executed on the target knowledge base, where several candidates are obtained with their contexts and types. Similarity distances among these different contexts and types are computed in order to select the appropriate candidate. Finally, our system is evaluated on a dataset with 5000 questions and compared with some well-known Entity Linking systems.

Wissem Bouarroudj, Zizette Boufaida, Ladjel Bellatreche

### A Heuristic Pruning Technique for Dialectical Trees on Argumentation-Based Query-Answering Systems

Arguments in argumentation-based query-answering systems can be associated with a set of evidence required for their construction. This evidence might have to be retrieved from external sources such as databases or the web, and each attempt to retrieve a piece of evidence comes with an associated cost. Moreover, a piece of evidence may be available at one moment but not at others, and this is not known beforehand. As a result, the set of active arguments (whose entire set of evidence is available) that can be used by the argumentation machinery of the system may vary from one scenario to another. In this work we propose a heuristic pruning technique for building dialectical trees in argumentation-based query-answering systems, with the aim of minimizing the cost of retrieving the pieces of evidence associated with the arguments that need to be accounted for in the reasoning process.

Andrea Cohen, Sebastian Gottifredi, Alejandro J. García

### A Method for Efficient Argument-Based Inquiry

In this paper we describe a method for efficient argument-based inquiry. In this method, an agent creates arguments for and against a particular topic by matching argumentation rules with observations gathered by querying the environment. To avoid making superfluous queries, the agent needs to determine if the acceptability status of the topic can change given more information. We define a notion of stability, where a structured argumentation setup is stable if no new arguments can be added, or if adding new arguments will not change the status of the topic. Because determining stability requires hypothesizing over all future argumentation setups, which is computationally very expensive, we define a less complex approximation algorithm and show that this is a sound approximation of stability. Finally, we show how stability (or our approximation of it) can be used in determining an optimal inquiry policy, and discuss how this policy can be used to, for example, determine a strategy in an argument-based inquiry dialogue.

Bas Testerink, Daphne Odekerken, Floris Bex

### DAQAP: Defeasible Argumentation Query Answering Platform

In this paper we present the DAQAP, a Web platform for Defeasible Argumentation Query Answering, which offers a visual interface that facilitates the analysis of the argumentative process defined in the Defeasible Logic Programming (DeLP) formalism. The tool presents graphs that show the interaction of the arguments generated from a DeLP program; this is done in two different ways: the first focuses on the structures obtained from the DeLP program, while the second presents the defeat relationships from the point of view of abstract argumentation frameworks, with the possibility of calculating the extensions using Dung’s semantics. Using all this data, the platform provides support for answering queries regarding the states of literals of the input program.

Mario A. Leiva, Gerardo I. Simari, Sebastian Gottifredi, Alejandro J. García, Guillermo R. Simari

### An Efficient Algorithm for Computing the Set of Semi-stable Extensions

Argumentation is one of the most relevant fields in the sphere of Artificial Intelligence. In particular, Dung’s abstract argumentation framework (AF) has received much attention in the last twenty years, and many computational issues have been investigated for different argumentation semantics. Specifically, enumerating the sets of arguments prescribed by an argumentation semantics (i.e., extensions) is arguably one of the most challenging problems for AFs, and this is the case also for the well-known semi-stable semantics. In this paper, we propose an algorithm for efficiently computing the set of semi-stable extensions of a given AF. Our technique relies on exploiting the computation of the grounded extension to snip some arguments in order to obtain a smaller framework (called cut-AF) over which state-of-the-art solvers for enumerating the semi-stable extensions are called, as needed to return the extensions of the input AF. We experimentally evaluated our technique and found that our approach is orders of magnitude faster than the computation over the whole AF.

Gianvincenzo Alfano
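The grounded extension mentioned in the abstract above can be computed by a simple fixpoint iteration over a Dung-style AF. The sketch below shows only that standard step, on a made-up framework; how the paper derives its cut-AF from the result is not described in the abstract and is not reproduced here.

```python
def grounded_extension(arguments, attacks):
    """Compute the grounded extension of an abstract argumentation framework.

    arguments: iterable of argument names
    attacks:   iterable of (attacker, target) pairs
    """
    attackers = {a: set() for a in arguments}
    for src, dst in attacks:
        attackers[dst].add(src)

    accepted, rejected = set(), set()
    changed = True
    while changed:
        changed = False
        # Accept every argument whose attackers have all been rejected
        # (unattacked arguments qualify immediately).
        for a in arguments:
            if a not in accepted and a not in rejected and attackers[a] <= rejected:
                accepted.add(a)
                changed = True
        # Reject every argument attacked by an accepted argument.
        for src, dst in attacks:
            if src in accepted and dst not in rejected:
                rejected.add(dst)
                changed = True
    return accepted


# Hypothetical framework: a attacks b, b attacks c, and d and e attack each other.
args = ["a", "b", "c", "d", "e"]
atts = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "d")]
grounded = grounded_extension(args, atts)
```

In this example `a` is unattacked and therefore accepted, `b` is rejected, and `c` is accepted in turn; the mutually attacking `d` and `e` stay undecided, which is the kind of residual part a solver for semi-stable extensions would still have to examine.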

### Using Word Embeddings and Deep Learning for Supervised Topic Detection in Social Networks

In this paper we show how word embeddings can be used to semantically evaluate the topic detection process in social networks. We propose to create and train word embeddings with the word2vec model, to be used in the text classification process. Then, once the documents are classified, we use pre-trained word embeddings and two similarity measures for the semantic evaluation of the classification process. In particular, we perform experiments with two Twitter datasets, using both bag-of-words with conventional classification algorithms and word embeddings with deep learning-based classification algorithms. Finally, we perform a benchmark and draw some inferences from the results.

Karel Gutiérrez-Batista, Jesús R. Campaña, Maria-Amparo Vila, Maria J. Martin-Bautista

### Generalized Association Rules for Sentiment Analysis in Twitter

Association rules have been widely applied in a variety of fields over the last few years, given their potential for descriptive problems. One of the areas where association rules have been most prominent in recent years is social media mining. In this paper, we propose the use of association rules and a novel generalization of these based on emotions to analyze data from the social network Twitter. With this, it is possible to summarize a large set of tweets in rules based on 8 basic emotions. These rules can be used to categorize the feelings of the social network according to, for example, a specific character.

J. Angel Diaz-Garcia, M. Dolores Ruiz, Maria J. Martin-Bautista
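As a rough illustration of how generalizing words to emotions can strengthen association rules, the sketch below computes the usual support and confidence measures before and after such a generalization. The tiny lexicon, the tweets, and the emotion names are all hypothetical; the abstract does not list its 8 emotions or describe its generalization procedure in detail.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by the antecedent's support."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)


# Hypothetical word-to-emotion lexicon and tweets represented as sets of terms.
lexicon = {"happy": "joy", "great": "joy", "afraid": "fear", "scared": "fear"}
tweets = [
    {"candidate", "happy"},
    {"candidate", "great"},
    {"candidate", "afraid"},
    {"weather", "scared"},
]

# Generalize each tweet by replacing emotion words with their emotion label.
generalized = [{lexicon.get(word, word) for word in tweet} for tweet in tweets]

word_level = support(tweets, {"candidate", "happy"})
emotion_level = support(generalized, {"candidate", "joy"})
rule_confidence = confidence(generalized, {"candidate"}, {"joy"})
```

At the word level the itemset `{candidate, happy}` covers a quarter of the tweets, whereas the generalized itemset `{candidate, joy}` covers half of them, so a rule such as `candidate → joy` can pass a support threshold that the individual word-level rules would miss.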

### Data Exploration in the HIFUN Language

When big data sets are stored in databases and data warehouses, data exploration usually involves ad hoc querying and data visualization to identify potential relationships or insights that may be hidden in the data. The objective of this work is to provide support for these activities in the context of HIFUN, a high level functional language of analytic queries proposed recently by the authors [5]. Our contributions are: (a) we show that HIFUN queries can be partially ordered and this allows the analyst to drill down or roll up from a given query during data exploration, and (b) we introduce a visualization algebra that allows the analyst to specify desirable visualizations of query results.

Nicolas Spyratos, Tsuyoshi Sugibuchi

### Reducing Skyline Query Results: An Approach Based on Fuzzy Satisfaction of Concepts

Querying databases to search for the best objects matching users’ preferences is a fundamental problem in multi-criteria databases. Skyline queries are an important tool for solving such problems. Based on the concept of Pareto dominance, the skyline process extracts the most interesting (not dominated in the Pareto sense) objects from a set of data. However, this process may lead to a huge skyline, as the size of the skyline result grows with the number of criteria (dimensions). In this case, the skyline is less informative for the end-users. In this paper, we propose an efficient approach to refine the skyline and reduce its size, using some advanced techniques borrowed from formal concept analysis. The basic idea is to build the fuzzy lattice of skyline objects based on the satisfaction rate of concepts. Then, the refined skyline is given by the concept that contains k objects (where k is a user-defined parameter) and has the greatest satisfaction rate w.r.t. the target concept. An experimental study shows the efficiency and the effectiveness of our approach compared to the naive approach.

### Quantify the Variability of Time Series of Imprecise Data

In order to analyze the quality of web data harvests, it is important to consider the variability of the volumes of data harvested over time. However, because the harvests are non-exhaustive and the harvesting strategies evolve over time, these volumes convey trend information rather than exact values; they form imprecise time series data. Therefore, due to the characteristics of the data, the variability of a particular series must be considered in relation to the other series. The purpose of this paper is to propose a fuzzy approach to measure the variability of time series of imprecise data represented by intervals. Our approach is based (1) on the construction of fuzzy clusters on all data at each time-stamp and (2) on the difference in the positioning of data in clusters at each time-stamp.

Zied Ben Othmane, Cyril de Runz, Amine Aït Younes, Vincent Mercelot

### Semantic Understanding of Natural Language Stories for Near Human Question Answering

Machine understanding of natural language stories is complex, and automated question answering based on them requires careful knowledge engineering involving knowledge representation, deduction, context recognition and sentiment analysis. In this paper, we present an approach to near human question answering based on natural language stories. We show that translating stories into knowledge graphs in RDF, and then restating the natural language questions in SPARQL to answer queries, can be successful if the RDF graph is augmented with an ontology and an inference engine. By leveraging existing knowledge processing engines such as FRED and NLQuery, we propose the contours of an open-ended and online flexible query answering system, called Omniscient, that is able to accept a natural language user story and respond to questions also framed in natural language. The novelty of Omniscient is in its ability to recognize context and respond deductively, which most current knowledge processing systems are unable to do.

Hasan M. Jamil, Joel Oduro-Afriyie

### Deductive Querying of Natural Logic Bases

We introduce a dedicated form of natural logic intended for representation of sentences in a knowledge base. Natural logic is a version of formal logic whose sentences cover a stylized fragment of natural language. Thus, the sentences in the knowledge base can be read and understood directly by a domain expert, unlike, say, predicate logic and description logic. The paper describes the inference rules enabling deductive querying of the knowledge base. The natural logic sentences and the inference rules are represented in Datalog providing a convenient graph form. As such, the natural logic knowledge base may be viewed as an enriched formal ontology structure. We describe various query facilities including pathway finding accommodated by this setup.

Troels Andreasen, Henrik Bulskov, Per Anker Jensen, Jørgen Fischer Nilsson

### Clustering of Intercriteria Analysis Data Using a Health-Related Quality of Life Data

Determination of Inter Criteria Analysis (ICA) dependence very often uses large amounts of data. In this paper, the large amount of data is reduced using Self-Organizing Map neural networks, so that only the cluster representative vectors are used. The data used are intuitionistic fuzzy estimations of quality of life. To obtain the data, a population study on health-related quality of life is used.

Sotir Sotirov, Desislava Vankova, Valentin Vasilev, Evdokia Sotirova

### A Flexible Query Answering System for Movie Analytics

With advances in technologies, huge volumes of a wide variety of valuable data (which may be of different levels of veracity) are easily generated or collected at a high velocity from a homogeneous data source or various heterogeneous data sources in numerous real-life applications. Embedded in these big data are rich sources of information and knowledge. This calls for data science solutions to mine and analyze various types of big data for useful information and valuable knowledge. Movies are examples of big data. In this paper, we present a flexible query answering system (FQAS) for movie analytics. To elaborate, nowadays, data about movies are easily accessible. Movie analytics helps to give useful insights about revenues, trends, and marketing related to movies. In particular, we analyze movie datasets from data sources like the Internet Movie Database (IMDb). Our FQAS makes use of our candidate matching process to generate a prediction of a movie's IMDb rating as a response to a user query on a movie. Users also have the flexibility to tune querying parameters. Evaluation results show the effectiveness of our data science approach, and in particular of our FQAS, for movie analytics.

Carson K. Leung, Lucas B. Eckhardt, Amanjyot Singh Sainbhi, Cong Thanh Kevin Tran, Qi Wen, Wookey Lee

### Can BlockChain Technology Provide Information Systems with Trusted Database? The Case of HyperLedger Fabric

BlockChain technology has imposed a new perspective in the area of data management, i.e., the possibility of realizing immutable and distributed ledgers. Furthermore, the introduction of the concept of smart contract has further extended the potential applicability of this potentially disruptive technology. Although BlockChain was developed to support virtual currencies and is usually associated with them, novel platforms are under development that are not at all related to the original application context. An example is HyperLedger Fabric. Developed by the Linux Foundation, it aims to provide information systems with distributed databases where the transaction log is immutable. This should ensure trusted cooperation among many parties. In this paper, we briefly present the main concepts and functionalities provided by HyperLedger Fabric. We then discuss its potential applicability and current limitations.

Pablo Garcia Bringas, Iker Pastor, Giuseppe Psaila

### Anticipating Depression Based on Online Social Media Behaviour

Mental disorders are major concerns in societies all over the world, and in spite of the improved diagnosis rates of such disorders in recent years, many cases still go undetected. The popularity of online social media websites has resulted in new opportunities for innovative methods of detecting such mental disorders. In this paper, we present our research towards developing a cutting-edge automatic screening assistant based on social media textual posts for detecting depression. Specifically, we envision an automatic prognosis tool that can anticipate when an individual is developing depression, thus offering low-cost unobtrusive mechanisms for large-scale early screening. Our experimental results on a real-world dataset reveal evidence that developing such systems is viable and can produce promising results. Moreover, we show the results of a case study on real users revealing signs that a person is vulnerable to depression.

Esteban A. Ríssola, Seyed Ali Bahrainian, Fabio Crestani

### Method for Modeling and Simulation of Parallel Data Integration Processes in Wireless Sensor Networks

Parallel local processing for sensor data integration in Wireless Sensor Networks (WSNs) is one of the possible solutions to reduce communication between neighboring sensor nodes and to save energy. At the same time, the process of local sensor node integration needs additional processor and energy resources. Therefore the development of a realistic and reliable model of data integration processes in WSNs is critical in many aspects. The proposed method, based on Generalized Nets (GN), and the related modeling process cover most of the aspects of parallel sensor data integration in WSNs based on 802.15.4 protocols. The WSNet simulator and some additional software libraries are used as the simulation and analysis tools. The article presents a new method for modeling and simulation of parallel sensor data integration processing in WSNs. The proposed method uses modeling based on the GN approach, which is a new and advanced way of analyzing parallel data processing in Wireless Sensor Systems (WSS).

Alexander Alexandrov, Rumen Andreev, D. Batchvarov, A. Boneva, L. Ilchev, S. Ivanov, J. Doshev

### Find the Right Peers: Building and Querying Multi-IoT Networks Based on Contexts

With the evolution of the features smart devices are equipped with, the IoT realm is becoming more and more intertwined with people's daily-life activities. This, of course, affects the way objects are used, causing a strong increase in both the dynamism of their contexts and the diversification of their objectives. This results in an evolution of the IoT towards a more complex environment composed of multiple overlapping networks, called Multi-IoTs (MIoT). The low applicability of classical cooperation mechanisms among objects leads to the necessity of developing more complex and refined strategies that take the peculiarity of such a new environment into consideration. In this paper, we address this problem by proposing a new model for devices and their contexts following a knowledge representation approach. It borrows ideas from OLAP systems and leverages a multidimensional perspective by defining dimension hierarchies. In this way, it enables roll-up and drill-down operations on the values of the considered dimensions. This allows for the design of more compact object networks and the definition of new strategies for the retrieval of relevant devices.

Claudia Diamantini, Antonino Nocera, Domenico Potena, Emanuele Storti, Domenico Ursino
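The roll-up idea borrowed from OLAP can be illustrated with a single, made-up context dimension. In the sketch below, the location hierarchy, the device names, and the helper functions are all hypothetical and only show how a dimension hierarchy lets a query for a coarse context value retrieve devices annotated with finer values; the model in the paper is richer than this.

```python
# Hypothetical "location" dimension: each value points to its parent level.
PARENT = {
    "kitchen": "home",
    "living_room": "home",
    "home": "rome",
    "office_12": "hq",
    "hq": "rome",
}

# Hypothetical devices annotated with their most specific location.
DEVICES = {"thermostat": "kitchen", "tv": "living_room", "printer": "office_12"}


def roll_up(value, levels=1):
    """Move a dimension value up the hierarchy by the given number of levels."""
    for _ in range(levels):
        value = PARENT.get(value, value)  # the top level maps to itself
    return value


def ancestors(value):
    """Return the value followed by all of its ancestors in the hierarchy."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def find_devices(location):
    """Retrieve devices whose location equals the query value or rolls up to it."""
    return sorted(d for d, loc in DEVICES.items() if location in ancestors(loc))


at_home = find_devices("home")
in_rome = find_devices("rome")
```

Querying for `home` returns the thermostat and the TV even though neither is annotated with `home` directly, and querying for `rome` returns all three devices; drill-down is the inverse walk from a coarse value to its children.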

### Handling Veracity of Nominal Data in Big Data: A Multipolar Approach

With this paper we aim to contribute to the proper handling of veracity, which is generally recognized as one of the main problems related to ‘Big’ data. Veracity refers to the extent to which the used data adequately reflect real world information and hence can be trusted. More specifically we describe a novel computational intelligence technique for handling veracity aspects of nominal data, which are often encountered when users have to select one or more items from a list. First, we discuss the use of fuzzy sets for modelling nominal data and specifying search criteria on nominal data. Second, we introduce the novel concept of a multipolar satisfaction degree as a tool to handle criteria evaluation. Third, we discuss aggregation of multipolar satisfaction degrees. Finally, we demonstrate the proposed technique and discuss its benefits using a film genre example.

Guy De Tré, Toon Boeckling, Yoram Timmerman, Sławomir Zadrożny

### InterCriteria Analysis with Interval-Valued Intuitionistic Fuzzy Evaluations

The InterCriteria Analysis (ICA) is a new tool for decision making that is similar to, but different from, correlation analysis. In the present paper, a new form of ICA, based on interval-valued intuitionistic fuzzy evaluations, is described for the first time.

Krassimir Atanassov, Pencho Marinov, Vassia Atanassova

### Business Dynamism and Innovation Capability in the European Union Member States in 2018 Through the Prism of InterCriteria Analysis

Here we apply the intuitionistic fuzzy sets-based InterCriteria Analysis to the data from the Global Competitiveness Index of 2018 about the two best correlating pillars of competitiveness, ‘11 Business Dynamism’ and ‘12 Innovation Capability’, based on the data of the 28 European Union Member States. We take a deeper look at how the eight subindicators of the countries’ business dynamism and the ten subindicators of their innovation capability correlate, both within each pillar and between the two pillars.

Vassia Atanassova, Lyubka Doukovska

### InterCriteria Analysis of the Most Problematic Factors for Doing Business in the European Union, 2017–2018

In this paper, we use the method of the InterCriteria Analysis, based on the concepts of intuitionistic fuzzy sets and index matrices, to analyze a dataset extracted from the Global Competitiveness Index, concerning the most problematic factors for doing business in the European Union member states. The method is applied on the data of these 28 countries extracted from the Global Competitiveness Report 2017–2018.

Lyubka Doukovska, Vassia Atanassova
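For readers unfamiliar with InterCriteria Analysis, the sketch below shows the pairwise counting that is commonly used to obtain an intuitionistic fuzzy pair (degree of agreement, degree of disagreement) for two criteria evaluated over the same objects. It is a minimal reading of the basic method with made-up scores, not the computation performed in the papers listed here, and it ignores the index-matrix machinery.

```python
from itertools import combinations


def intercriteria_pair(x, y):
    """Return (agreement, disagreement) for two criteria scored on the same objects.

    For every pair of objects, the two criteria agree if they order the pair the
    same way and disagree if they order it in opposite ways; pairs tied under
    either criterion count towards neither, leaving a degree of uncertainty.
    """
    n = len(x)
    agree = disagree = 0
    for i, j in combinations(range(n), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the comparison under x
        dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the comparison under y
        if dx != 0 and dx == dy:
            agree += 1
        elif dx != 0 and dy != 0:
            disagree += 1
    total = n * (n - 1) / 2
    return agree / total, disagree / total


# Hypothetical scores of four objects (for example, countries) on three criteria.
c1 = [1, 2, 3, 4]
c2 = [2, 4, 6, 8]   # same ordering as c1
c3 = [4, 3, 2, 1]   # reversed ordering
c4 = [1, 2, 2, 4]   # one tie

same = intercriteria_pair(c1, c2)
opposite = intercriteria_pair(c1, c3)
with_tie = intercriteria_pair(c1, c4)
```

The pair `(c1, c2)` yields full agreement and the pair `(c1, c3)` full disagreement, while the tie in `c4` leaves one of the six object pairs undecided, so agreement and disagreement no longer sum to one.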

### An Effective System for User Queries Assistance

The Big Data paradigm has recently come onto the scene in a quite pervasive manner. Sifting through massive amounts of this kind of data, parsing them, transferring them from a source to a target database, and analyzing them to improve business decision-making processes is too complex for traditional approaches. In this respect, there have been recent proposals that enrich data while exchanging them, such as the Data Posting framework. This framework requires the ability to use domain relations and count constraints, which may be difficult to manage for non-expert users. In this paper, we propose Smart Data Posting, a framework using intuitive constructs that are automatically translated into the standard Data Posting framework. In particular, we allow the use of smart mapping rules extended with additional selection criteria and the direct use of tuple generating dependencies and equality generating dependencies. We present a complexity analysis of the framework and describe the architecture of a system for advanced search, tailored for Big Data, that implements the Smart Data Posting framework.

Elio Masciari, Domenico Saccà, Irina Trubitsyna

### On the Usefulness of Pre-Processing Step in Melanoma Detection Using Multiple Instance Learning

Although skin cancers, and melanoma in particular, are characterized by a high mortality rate, they can be treated effectively when diagnosed at an early stage. Research in this field is attempting to design systems that automatically detect melanomas on the basis of dermoscopic images. The interest is also motivated by the opportunity to implement solutions that favor self-diagnosis in the population. Determining effective detection methods to reduce the error rate in diagnosis is a crucial challenge.

Computer Vision Systems are characterized by several basic steps. Pre-processing is the first phase and plays a fundamental role in improving image quality by eliminating noise and irrelevant parts of the background of the skin. In [1] we presented an application of a Multiple Instance Learning (MIL) approach to image classification, with the aim of discriminating between positive and negative images. In [3] we subsequently applied this method to clinical data consisting of non-pre-processed melanoma dermoscopic images. In [2] we also investigated some pre-processing techniques useful for the automatic analysis of melanoma images.

In this work we propose to use, after applying a pre-processing step, the MIL approach presented in [1] on the same melanoma data set adopted in [3]. The preliminary results appear promising for defining automatic systems that act as a “filter” mechanism to support physicians in detecting melanoma.

Eugenio Vocaturo, Ester Zumpano, Pierangelo Veltri
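
For readers unfamiliar with Multiple Instance Learning, the following sketch illustrates the standard multiple-instance assumption only. It is not the approach of [1], whose details are not given here. Each image is treated as a bag of instances (for example, image patches), and a bag is labelled positive when at least one instance is positive. The function `classify_bag`, the threshold, and the per-patch scores are all hypothetical.

```python
def classify_bag(instance_scores, threshold=0.5):
    """Label a bag positive if any instance score reaches the threshold.

    Assumption: instance_scores are outputs of some instance-level
    scorer in [0, 1]; the scorer itself is outside this sketch.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    # Under the standard MIL assumption the bag label is decided by
    # the most suspicious instance.
    return max(instance_scores) >= threshold


# Hypothetical per-patch scores for two dermoscopic images.
image_with_suspicious_patch = [0.10, 0.20, 0.90]
image_without_suspicious_patch = [0.10, 0.20, 0.30]

print(classify_bag(image_with_suspicious_patch))
print(classify_bag(image_without_suspicious_patch))
```

Under this assumption only image-level labels are needed for training, which is the usual motivation for the multiple-instance setting.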

### Towards Flexible Energy Supply in European Smart Environments

Electricity is nowadays the most widely used form of energy, and it is generated from a mix of traditional fossil sources and renewable energies. The use of renewable energies is increasingly incentivized, but because they depend on factors such as climatic conditions, they can be temporarily unavailable. To function properly and guarantee a standard level of energy, production plants must cope flexibly with this problem. In this article, we present the main technologies and solutions related to this problem, and we introduce the architecture of a flexible affiliation system that can optimize the use of electricity distribution networks, reducing energy waste.

Stefania Marrara, Amir Topalovíc, Marco Viviani

### Intelligent Voice Agent and Service (iVAS) for Interactive and Multimodal Question and Answers

This paper describes MITRE’s Intelligent Voice Agent and Service (iVAS) research and prototype system that provides personalized answers to government customer service questions through intelligent and multimodal interactions with citizens. We report our novel approach to interpreting a user’s voice or text query through Natural Language Understanding combined with a Machine Learning model trained on domain-specific data and interactive conversations to disambiguate and confirm user intent. We also describe the integration of iVAS with a voice or text chatbot interface.

James Lockett, Sanith Wijesinghe, Jasper Phillips, Ian Gross, Michael Schoenfeld, Walter T. Hiranpat, Phillip J. Marlow, Matt Coarr, Qian Hu

### A Study on Topic Modeling for Feature Space Reduction in Text Classification

We examine two topic modeling approaches as feature space reduction techniques for text classification and compare their performance with two standard feature selection techniques, namely Information Gain (IG) and Document Frequency (DF). Feature selection techniques are commonly applied in order to avoid the well-known “curse of dimensionality” in machine learning. Regarding text classification, traditional techniques achieve this by selecting words from the training vocabulary. In contrast, topic models compute topics as multinomial distributions over words and reduce each document to a distribution over such topics. Corresponding topic-to-document distributions may act as input data to train a document classifier. Our comparison includes two topic modeling approaches: Latent Dirichlet Allocation (LDA) and Topic Grouper. Our results are based on classification accuracy and suggest that topic modeling is far superior to IG and DF at a very low number of reduced features. However, if the number of reduced features is still large, IG becomes competitive and the cost of computing topic models is considerable. We conclude by giving basic recommendations on when to consider which type of method.

Daniel Pfeifer, Jochen L. Leidner
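
The two word-selection baselines named in the abstract can be sketched as follows. This is a minimal illustration on a toy corpus, not the authors' experimental setup. The function names, the four tokenized documents, and the alphabetical tie-breaking are assumptions. Document Frequency ranks a word by the number of documents containing it. Information Gain ranks a word by how much knowing its presence or absence reduces the entropy of the class label.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def document_frequency(docs):
    """Map each word to the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return dict(df)


def information_gain(docs, labels, word):
    """Reduction in label entropy from knowing whether `word` occurs."""
    present = Counter(l for d, l in zip(docs, labels) if word in d)
    absent = Counter(l for d, l in zip(docs, labels) if word not in d)
    n = len(docs)
    prior = entropy(list(Counter(labels).values()))
    conditional = (
        sum(present.values()) / n * entropy(list(present.values()))
        + sum(absent.values()) / n * entropy(list(absent.values()))
    )
    return prior - conditional


def select_top_k(scores, k):
    """Keep the k best-scoring words; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical toy corpus: tokenized documents with class labels.
docs = [
    ["goal", "match", "team"],
    ["team", "goal"],
    ["vote", "party", "team"],
    ["party", "vote"],
]
labels = ["sport", "sport", "politics", "politics"]

df_scores = document_frequency(docs)
ig_scores = {w: information_gain(docs, labels, w) for w in df_scores}

print(select_top_k(df_scores, 2))
print(select_top_k(ig_scores, 2))
```

On this toy corpus, DF favors the frequent word "team", which occurs in both classes, whereas IG favors words that occur in only one class. Both methods keep a subset of the original vocabulary, unlike topic models, which replace words with topic proportions.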

### Backmatter
