
About this book

This two-volume set (LNAI 11055 and LNAI 11056) constitutes the refereed proceedings of the 10th International Conference on Collective Intelligence, ICCCI 2018, held in Bristol, UK, in September 2018.
The 98 full papers presented were carefully reviewed and selected from 240 submissions. The conference focuses on knowledge engineering and semantic web, social network analysis, recommendation methods and recommender systems, agents and multi-agent systems, text processing and information retrieval, data mining methods and applications, decision support and control systems, sensor networks and internet of things, as well as computer vision techniques.

Table of Contents

Frontmatter

Knowledge Engineering and Semantic Web

Frontmatter

ViewpointS: Towards a Collective Brain

Tracing knowledge acquisition and linking learning events to interaction between peers is a major challenge of our times. We have conceived, designed and evaluated a new paradigm for constructing and using collective knowledge by Web interactions that we called ViewpointS. By exploiting the similarity with Edelman’s Theory of Neuronal Group Selection (TNGS), we conjecture that it may be metaphorically considered a Collective Brain, especially effective in the case of trans-disciplinary representations. Far from being without doubts, in the paper we present the reasons (and the limits) of our proposal that aims to become a useful integrating tool for future quantitative explorations of individual as well as collective learning at different degrees of granularity. We are therefore challenging each of the current approaches: the logical one in the semantic Web, the statistical one in mining and deep learning, the social one in recommender systems based on authority and trust; not in each of their own preferred field of operation, rather in their integration weaknesses far from the holistic and dynamic behavior of the human brain.

Philippe Lemoisson, Stefano A. Cerri

Intelligent Collectives: Impact of Diversity on Susceptibility to Consensus and Collective Performance

In this paper, we present an approach to analyzing the impact of diversity, one of the most crucial determinants of intelligent collectives, on susceptibility to consensus and collective performance. In the common understanding, susceptibility to consensus refers to the situation in which the collective prediction determined on the basis of individual predictions can be accepted as representative of the collective as a whole. Computational experiments have indicated that when collectives are small, it is difficult to obtain a high probability of susceptibility to consensus. For large collectives, however, diversity seems not to matter for susceptibility to consensus. Furthermore, the findings have also shown that higher collective performance can be the consequence of more diverse collectives. In other words, diversity is positively correlated with collective performance.

Van Du Nguyen, Hai Bang Truong, Mercedes G. Merayo, Ngoc Thanh Nguyen

The Increasing Bias of Non-uniform Collectives

In this paper we make an initial study of the influence of initial bias in a collective of agents on its knowledge or opinion, after taking into account internal communication between agents. We provide details about the model of the collective that we use, with different levels of communication and different strategies utilized by agents to integrate messages into their internal knowledge base. We then perform a simulation of such a collective, introducing different numbers of biased agents. We observe how these agents influence the overall knowledge of the collective over time. The experiment shows that even a small percentage of biased agents changes the views of the whole collective. We discuss the implications of this result for possible practical applications.

Marcin Maleszka

Framework for Merging Probabilistic Knowledge Bases

Knowledge merging is of major concern in developing probabilistic expert systems. Each system provides consistent probabilistic knowledge, while the merged knowledge base is often inconsistent. For this reason, a wide range of approaches has been put forward to merge probabilistic knowledge bases. However, the input of the models is the set of possible probabilistic functions representing the original probabilistic knowledge bases. In this paper, we investigate a framework for merging probabilistic knowledge bases represented in a new form. To this end, a process to merge probabilistic knowledge bases is introduced, several transformation methods for the representation of the original probabilistic knowledge base are presented, a set of merging operators is proposed, and several desirable logical properties are investigated and discussed.

Van Tham Nguyen, Ngoc Thanh Nguyen, Trong Hieu Tran

Representation of Autoimmune Diseases with RDFS

Complex systems are systems consisting of many diverse and autonomous subsystems interacting with each other. A huge number of interactions with many feedback loops complicates their investigation. The immune system is a typical complex system that attracts medical experts and also non-professionals, especially because of its “ambient” nature and amazing complexity. Understandable information about immunity is required not only by the experts but also by the non-professionals. This paper focuses on the development of an ontology providing fundamental facts about autoimmune diseases, because these facts are not well structured and presented for end users on the web. The ontology should improve navigation among diverse pieces of information about these diseases and decrease information overload.

Martina Husáková

EVENTSKG: A Knowledge Graph Representation for Top-Prestigious Computer Science Events Metadata

Digitization has made the preparation of manuscripts as well as the organization of scientific events considerably easier and more efficient. In addition, data about scientific events is increasingly published on the Web, albeit often as raw dumps in unstructured formats, sacrificing its semantics and relationships to other data and thus restricting the reusability of the data for, e.g., subsequent analyses. Therefore, there is a great demand to represent this data semantically using Semantic Web technologies. In this paper, we present the EVENTSKG dataset, which offers comprehensive semantic descriptions of scientific events of six computer science communities for 40 top-prestigious event series over the last five decades. We created a new, publicly available and improved release of the EVENTSKG dataset as a unified knowledge graph based on our Scientific Events Ontology (SEO). It is of primary interest to event organizers, as it helps them to assess the progress of their event over time and compare it to competing events. Furthermore, it helps potential authors looking for venues to publish their work. We shed light on these events by analyzing the EVENTSKG data.

Said Fathalla, Christoph Lange

Assessing the Performance of a New Semantic Similarity Measure Designed for Schema Matching for Mediation Systems

The rising adoption of mediation systems leaves plenty of room for improvement. As a matter of fact, schema matching is one of the most pressing challenges facing the improvement of mediation systems. The purpose of this research paper is to address schema matching for mediation systems with a semantic similarity measure that is expected to outperform the most well-known semantic similarity measures. Using WordNet information, this research study introduces a new semantic similarity measure (as well as a pre-matching strategy) and compares its performance against the following measures: Resnik’s measure, Jiang and Conrath’s distance, Lin’s measure, and Nababteh’s measure. The results indicate that the new measure provides much better results than the aforementioned measures. This paper definitely solves the problem regarding schema matching for mediation systems, and even though it is still at an early stage, the new measure will benefit both researchers and organizations.

Aola Yousfi, Moulay Hafid Elyazidi, Ahmed Zellou
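The baseline measures named in the abstract above (Resnik, Lin, Jiang and Conrath) are all defined in terms of information content (IC). The following is a minimal sketch of their standard textbook definitions, not of the new measure the paper proposes, which the abstract does not specify. The function names and the concept probabilities used in the usage note are hypothetical; in practice the probabilities would be estimated from a corpus over the WordNet hierarchy.

```python
import math


def information_content(probability):
    """IC(c) = -log p(c), where p(c) is the corpus probability of concept c."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(probability)


def resnik_similarity(ic_lcs):
    """Resnik: the IC of the least common subsumer (LCS) of the two concepts."""
    return ic_lcs


def lin_similarity(ic_a, ic_b, ic_lcs):
    """Lin: 2 * IC(lcs) / (IC(a) + IC(b))."""
    denominator = ic_a + ic_b
    if denominator == 0.0:
        # Undefined when both concepts carry no information;
        # returning 0.0 is a convention chosen for this sketch only.
        return 0.0
    return 2.0 * ic_lcs / denominator


def jiang_conrath_distance(ic_a, ic_b, ic_lcs):
    """Jiang and Conrath: IC(a) + IC(b) - 2 * IC(lcs); smaller means closer."""
    return ic_a + ic_b - 2.0 * ic_lcs
```

For instance, with hypothetical probabilities p(a) = 0.01, p(b) = 0.02 and p(lcs) = 0.1, one would compute the three IC values with `information_content` and pass them to the functions above. Note that Resnik and Lin are similarities (higher is closer) while Jiang and Conrath is a distance (lower is closer), so the three cannot be compared on a single scale without a conversion.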

The Impact of Data Dispersion on the Accuracy of the Data Warehouse Federation’s Response

This article is a preliminary attempt to verify how the values of the metrics that characterize data warehouses affect the accuracy of the data warehouse federation’s response. The federation can almost always be built, but sometimes the heterogeneity of the data stored in the component data warehouses is too large to properly handle the user’s request. If that happens, the effort put into the integration is wasted: some work was done, but the federation does not give an accurate response. In this paper some dependencies between the accuracy of the federation’s response and data warehouse metrics are discussed.

Rafał Kern

Social Network Analysis

Frontmatter

Physical Activity Contagion and Homophily in an Adaptive Social Network Model

Regular physical activity contributes to higher levels of well-being, healthy aging and prevention of several chronic diseases such as depression. To establish or change behaviours concerning physical activity, social contagion may play a role. The aim of this study was to model the contagion of physical activity based on empirical Twitter data and to assess the role of homophily within this contagion. To model the contagion of physical activity, an adaptive temporal-causal network model was designed, and accordingly, the parameters of the model were tuned using empirical data obtained from Twitter. Two variants of the adaptive temporal-causal network model were created, in which one calculated the weights of the connections between the nodes based on follow relations on Twitter, while in the other the connection weights were modulated by the homophily principle. The results indicate that within the considered social network of already active persons homophily does not play an important role in the physical activity behaviour.

Marit van Dijk, Jan Treur

e-School: Design and Implementation of Web Based Teaching Institution for Enhancing E-Learning Experiences

Numerous technological improvements have found their way into the academic setting, freeing educators and students from the constraints of time and space. Day by day, the drop-out rate is increasing as students become separated from learning. The main aim of our project is to allow individuals to learn or teach without attending physically. In this paper, we build an innovative web-based application that provides teachers and students with numerous educational exercises on computers and smart devices. Using this application, teachers, students and parents can collaborate on a single platform, while teachers can counsel students in real time and share performance and activities with parents as well as administrators. It is a method for modifying our traditional education system, not a replacement for teaching; it is only an enhancement that helps students learn easily and according to their preferences.

Md. Shohel Rana, Touhid Bhuiyan, A. K. M. Zaidi Satter

Categorizing Air Quality Information Flow on Twitter Using Deep Learning Tools

Environmental health is an emerging and hotly debated topic that covers several fields of study, such as pollution in urban or rural environments and the consequences of these changes for the health of populations. In this field of intersectoral forces, the complexity of stakeholders’ logics is realized in the production, use and communication of data and information on air quality. The Twitter platform is a “partial public space” that can throw light on the different types of stakeholders involved, the information and issues discussed and the dynamics of articulation between these different aspects. We present a methodology that aims to describe and represent, on the one hand, the modes of circulation and distribution of message flows on this social medium and, on the other hand, the content exchanged between stakeholders. To achieve this, we developed a classifier based on Deep Learning approaches in order to categorize messages from scratch. The conceptual and instrumented methodology presented is part of a broader interdisciplinary methodology, based on quantitative and qualitative methods, for the study of communication in environmental health.

Brigitte Juanals, Jean-Luc Minel

A Computational Network Model for the Effects of Certain Types of Dementia on Social Functioning

This paper introduces a temporal-causal network model that describes the recognition of emotions shown by others. The model can show both normal functioning and cases of dysfunctioning, such as can be the case for persons with certain types of dementia. Simulations have been performed to test the model in both these types of behaviours. A mathematical analysis was done which gave evidence that the model as implemented does what it is meant to do. The model can be applied to obtain a virtual patient model to study the way in which recognition of emotions can deviate for certain types of persons.

Charlotte Commu, Jan Treur, Annemieke Dols, Yolande A. L. Pijnenburg

Homophily Independent Cascade Diffusion Model Based on Textual Information

In this research, we propose a homophily independent cascade model based on textual information, namely Textual-Homo-IC. This model is based on the standard independent cascade model; however, the infection probability is estimated from homophily. In particular, homophily is measured on textual content by means of topic modeling. Propagation takes place on an agent network where each agent is represented by a node. In addition to expressing the Textual-Homo-IC model on a static network, we also apply it to a dynamic agent network in which not only the structure but also the nodes’ properties change during the spreading process. We conducted experiments on two collected data sets, from NIPS and from the social network platform Twitter, and attained satisfactory results.

Thi Kim Thoa Ho, Quang Vu Bui, Marc Bui
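To illustrate the idea described in the abstract above, here is a minimal sketch of an independent cascade on a static network in which the activation probability of an edge depends on the textual similarity of its endpoints. The abstract does not give the exact homophily formula, so this sketch assumes cosine similarity between per-node topic distributions; the function names, the graph encoding and the similarity choice are all hypothetical and are not the authors' Textual-Homo-IC implementation.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def homophily_independent_cascade(graph, topics, seeds, rng=None):
    """Run one cascade and return the set of activated nodes.

    graph:  dict mapping each node to an iterable of its out-neighbours
    topics: dict mapping each node to its topic distribution (list of floats)
    seeds:  initially active nodes
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v in active:
                    continue
                # Assumption: infection probability = topic similarity of u and v.
                probability = cosine_similarity(topics[u], topics[v])
                # As in the standard model, u gets a single attempt on v.
                if rng.random() < probability:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```

Passing a seeded `random.Random` makes a run reproducible; averaging the size of the returned set over many runs gives an estimate of the expected spread. The dynamic-network variant mentioned in the abstract would additionally update `graph` and `topics` between steps, which this sketch does not attempt.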

A Semi-automated Security Advisory System to Resist Cyber-Attack in Social Networks

Social networking sites often witness various types of social engineering (SE) attacks. Yet, limited research has addressed the most severe types of social engineering in social networks (SNs). The present study investigates the extent to which people respond differently to different types of attack in a social network context and how we can segment users based on their vulnerability. In turn, this leads to the prospect of a personalised security advisory system. A total of 316 participants completed an online questionnaire that included a scenario-based experiment. The results reveal that people respond to cyber-attacks differently based on their demographics. Furthermore, people’s competence, social network experience, and their limited connections with strangers in social networks can decrease their likelihood of falling victim to some types of attacks more than others.

Samar Muslah Albladi, George R. S. Weir

The Role of Mapping Curve in Swarm-Like Opinion Formation

Recently [1] we proposed a scheme for simulating opinion formation based on a popular global optimization mechanism, Particle Swarm Optimization. The basic idea was to use the interaction between two potential directions of the agents’ heading: those forced by the global opinion and those forced by the opinion of neighbors/colleagues. In this paper an enhancement of the proposed model is shown. We assume that, when performing the binary PSO-like update of the system, we use the generalized version of the logistic function. The results are promising in the sense that the introduced change explicitly increases the number of possible solutions.

Tomasz M. Gwizdałła
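The mechanism outlined in the abstract above can be sketched as a binary PSO-style update in which an agent's velocity is pulled towards the neighbours' opinion and the global opinion, and a mapping curve turns the velocity into the probability of holding opinion 1. The abstract does not state which parameterisation of the generalized logistic function is used, so the form below (with hypothetical parameters `growth`, `q` and `nu`, reducing to the standard logistic function when all are 1) and the coefficients `c_local`, `c_global` and `v_max` are assumptions made for illustration only.

```python
import math
import random


def generalized_logistic(v, growth=1.0, q=1.0, nu=1.0):
    """Assumed mapping curve: 1 / (1 + q * exp(-growth * v)) ** (1 / nu).

    With growth = q = nu = 1 this is the standard logistic function.
    """
    return 1.0 / (1.0 + q * math.exp(-growth * v)) ** (1.0 / nu)


def update_agent(opinion, velocity, neighbor_opinion, global_opinion, rng,
                 c_local=2.0, c_global=2.0, v_max=4.0, **curve):
    """One binary PSO-like step for a single agent with opinion 0 or 1."""
    # Two pulls: towards the neighbours' opinion and towards the global opinion.
    velocity += c_local * rng.random() * (neighbor_opinion - opinion)
    velocity += c_global * rng.random() * (global_opinion - opinion)
    # Clamp the velocity, as is usual in binary PSO, to keep probabilities
    # away from 0 and 1.
    velocity = max(-v_max, min(v_max, velocity))
    # The mapping curve converts velocity into the probability of opinion 1.
    probability = generalized_logistic(velocity, **curve)
    new_opinion = 1 if rng.random() < probability else 0
    return new_opinion, velocity
```

Changing `nu` or `q` shifts and skews the curve, so the same velocity maps to a different probability of adopting opinion 1; for example, `generalized_logistic(0.0, nu=0.5)` gives 0.25 instead of the 0.5 produced by the standard logistic function.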

Popularity and Geospatial Spread of Trends on Twitter: A Middle Eastern Case Study

Thousands of topics trend on Twitter across the world every day, making it increasingly challenging to provide real-time analysis of current issues, topics and themes being discussed across various locations and jurisdictions. There is thus a demand for simple and extensible approaches to provide deeper insight into these trends and how they propagate across locales. This paper represents one of the first studies to look at the geospatial spread of trends on Twitter, presenting various techniques to provide increased understanding of how trends on social networks can spread across various regions and nations. It is based on a year-long data collection (N = 2,307,163) and analysis between 2016 and 2017 of seven Middle Eastern countries (Bahrain, Egypt, Kuwait, Lebanon, Qatar, Saudi Arabia, and the United Arab Emirates). Using this year-long dataset, the project investigates the popularity and geospatial spread of trends, focusing on trend information but not processing individual topics. The findings show that the likelihood of a trend spreading to other locales is to a large extent influenced by the place in which it first appeared.

Nabeel Albishry, Tom Crick, Tesleem Fagade, Theo Tryfonas

On the Emergence of Segregation in Society: Network-Oriented Analysis of the Effect of Evolving Friendships

Segregation is a widely observed phenomenon throughout history, across different cultures and around the world. This paper addresses how, in a network of immigrants, segregation emerges from friendship homophily. The simulation results show that homophily results in clusters of lower and higher local language use. A mathematical analysis provides a more in-depth understanding of the phenomena observed in the simulations.

Christianne Kappert, Rosalyn Rus, Jan Treur

Computational Analysis of Bullying Behavior in the Social Media Era

The Internet and cyber-technology have played an important positive role, but they have also served as a venue for cyber-bullying. This paper presents a computational model for online bullying. It was evaluated by simulation experiments and mathematical analysis, in comparison with expected patterns from the empirical literature. This model may provide useful input for building a support system to avoid this negative social behavior within society.

Fakhra Jabeen, Jan Treur

Recommendation Methods and Recommender Systems

Frontmatter

A Hybrid Feature Combination Method that Improves Recommendations

Recommender systems help users find relevant items efficiently based on their interests and historical interactions. They can also be beneficial to businesses by promoting the sale of products. Recommender systems can be modelled by applying different approaches, including collaborative filtering (CF), demographic filtering (DF), content-based filtering (CBF) and knowledge-based filtering (KBF). However, large amounts of data can produce recommendations that are limited in accuracy because of diversity and sparsity issues. In this paper, we propose a novel hybrid approach that combines user-user CF with the attributes of DF to indicate the nearest users, and compare the Random Forest classifier against the kNN classifier, developed through an investigation of ways to reduce the errors in rating predictions based on users’ past interactions. Our combined method leads to improved prediction accuracy in two different classification algorithms. The main goal of this paper is to identify the impact of DF on CF and compare the two classifiers. We apply a feature combination hybrid method that can improve prediction accuracy and achieve lower mean absolute error values compared with the results of CF or DF alone. To test our approach, we ran an offline evaluation using the MovieLens 1M data set.

Gharbi Alshammari, Stelios Kapetanakis, Abduallah Alshammari, Nikolaos Polatidis, Miltos Petridis
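As a rough illustration of the feature combination idea in the abstract above, the sketch below concatenates each user's rating vector with demographic attributes, finds the nearest users by cosine similarity over the combined vector, predicts a rating as a similarity-weighted average, and scores predictions with mean absolute error (MAE). This is a simplified neighbourhood predictor under stated assumptions, not the authors' Random Forest or kNN classifier setup; the encoding (0 for "not rated", one-hot demographics) and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_vector(ratings, demographics):
    """Feature combination: CF ratings followed by DF attributes."""
    return list(ratings) + list(demographics)


def predict_rating(target, others, item, k=2):
    """Predict the target user's rating of `item` from the k most similar users.

    target, others[i]: (ratings, demographics) pairs; a rating of 0 means
    'not rated' (an assumption of this sketch).
    """
    target_vector = combined_vector(*target)
    scored = []
    for ratings, demographics in others:
        if ratings[item] == 0:
            continue  # this user cannot inform the prediction
        similarity = cosine_similarity(
            target_vector, combined_vector(ratings, demographics))
        scored.append((similarity, ratings[item]))
    scored.sort(reverse=True)
    top = scored[:k]
    total = sum(similarity for similarity, _ in top)
    if total <= 0.0:
        return None  # no usable neighbours
    return sum(similarity * rating for similarity, rating in top) / total


def mean_absolute_error(predicted, actual):
    """MAE = average of |predicted - actual| over all test ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

In this layout the demographic attributes influence which users count as neighbours even when two users share few rated items, which is one way a hybrid of this kind could ease sparsity. How the features should be weighted against each other is left open here.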

A Neural Learning-Based Clustering Model for Collaborative Filtering

In this paper we present a neural learning-based clustering method for collaborative filtering. Collaborative filtering is an important task in recommender systems and has been investigated extensively in the past. Traditional approaches often require preprocessing steps, standard conditions or manually set gain. Our method is automatic, fast and robust to the cold start often seen in recommender systems. Furthermore, it can easily be trained for use with any kind of data. The recommendation task is formulated as a hybrid learning problem over two levels: artificial neural networks and clustering. Following the learning paradigm, the detection on each level is performed by a trained classifier. First results of collaborative filtering using neural networks and clustering are presented and future additions are discussed.

Grzegorz P. Mika, Grzegorz Dziczkowski

Influence Power Factor for User Interface Recommendation System

The user interface is an important element of a software system, since it provides the means for utilizing an application’s functionalities. There are a number of publications that propose guidance for proper interface design, including an adaptive approach. This paper introduces a general idea for defining interface design in a way that allows for easy computation of user interface effectiveness. The introduced factor can be used for recommending interface changes and adjustments.

Marek Krótkiewicz, Krystian Wojtkiewicz, Denis Martins

A Generic Framework for Collaborative Filtering Based on Social Collective Recommendation

Collaborative filtering has been considered the most used approach for recommender systems in both practice and research. Unfortunately, traditional collaborative filtering suffers from the so-called cold-start problem, which is the challenge of recommending items to an unknown user. In this paper, we introduce a generic framework for social collective recommendations that aims to support and complement traditional recommender systems to achieve better results. Our framework is composed of three modules, namely, a User Clustering module, a Representative module, and an Adaption module. The User Clustering module aims to find groups of users, the Representative module is responsible for determining a representative of each group, and the Adaption module handles new users and assigns them appropriately. Through this composition of the framework, the cold-start problem is alleviated.

Leschek Homann, Bernadetta Maleszka, Denis Mayr Lima Martins, Gottfried Vossen

Recommender System Based on Fuzzy Reasoning and Information Systems

In this research a recommender system with possible applications in e-commerce, based on a rule induction mechanism and fuzzy reasoning, is presented. The proposed theoretical concept assumes the application of fuzzy sets in a rule induction procedure, as an information generalization, in order to predict the degree of subjective customer satisfaction with respect to the customer’s previous reviews. The innovative idea lies in the transformation of decision rules into fuzzy rules, following the basic Mamdani reasoning model. The research was verified on real data, i.e. customer reviews of different products.

Martin Tabakov

Proposal of a Recommendation System for Complex Topic Learning Based on a Sustainable Design Approach

There are several issues compromising the educational role of social networks, particularly in the case of video-based online content. Among them, individual (cognitive and emotional), social (privacy and ethics) and structural (algorithmic bias) challenges can be found. To cope with such issues, we propose a recommendation system for online video content, applying principles of sustainable design. Precision and recall in English were slightly lower for the system in comparison to YouTube, but the variety of recommended items increased, while in Spanish, precision and recall were higher. Expected results include fostering learning and adoption of complex thinking by taking into account a user’s objective and subjective context.

Xanat Vargas Meza, Toshimasa Yamanaka

A Group Recommender System for Selecting Experts to Review a Specific Problem

With the increase in the number of publications and scientific projects, quality requirements are increasingly important. Reviewing is the most important step in accrediting the quality of scientific work. Criteria such as independence, competence, and lack of conflicts of interest in an expert are essential in the reviewer selection process. However, we also know that experts have limited knowledge, experience, and opinions about the work of others, so they might misunderstand the viewpoints of the authors, which may lead to rejection of an excellent scientific work or an implicitly successful project proposal. Manually selecting reviewers can be a biased and time-consuming process. In order to solve these problems, we developed a recommender system to choose a group of experts to evaluate a specific problem, such as a research proposal or paper. Our recommender system consists of three main modules: data collection, expert detection, and expert prediction. The data collection module collects data from various sources to create a database of scientist profiles. The expert detection module determines the experts on each particular topic. The expert prediction module provides a list of experts to answer the query. We conducted experiments with the DBLP Computer Science Bibliography dataset, and the results show that our system offers a promising selection process.

Dinh Tuyen Hoang, Ngoc Thanh Nguyen, Dosam Hwang

Agents and Multi-Agent Systems

Frontmatter

An Agent-Based Collective Model to Simulate Peer Pressure Effect on Energy Consumption

This paper presents a novel model for simulating peer pressure effect on energy awareness and consumption of families. The model is built on two well-established theories of human behaviour to obtain realistic peer effect: the collective behaviour theory and the theory of cognitive dissonance. These theories are implemented in a collective agent-based model that produces fine-grained behaviour and consumption data based on social parameters. The model enables the application of different energy efficiency interventions which aim to obtain more aware occupants and achieve more energy saving. The presented experiments show that the implemented model reflects the human behaviour theories. They also provide examples of how the model can be used as an analytical tool to interpret the effect of energy interventions in the given social parameters and decide the optimal intervention needed in different cases.

Fatima Abdallah, Shadi Basurra, Mohamed Medhat Gaber

Agents’ Knowledge Conflicts’ Resolving in Cognitive Integrated Management Information System – Case of Budgeting Module

Nowadays management is supported by integrated management information systems, including multi-agent systems, in which relational or object databases are most often used. However, it becomes necessary for IT systems not only to register the values of the attributes of economic phenomena but also to automatically analyze their meaning. These functions can be realized by using, among others, cognitive agents running within a multi-agent system. Increasingly often, the knowledge of such agents is represented by semantic methods. However, it often occurs that a multi-agent integrated management information system generates conflicts of knowledge among the agents. These conflicts result from the fact that agents may generate different decisions or solutions for the user, which, in turn, may result from different methods of decision making employed by the agents, different, heterogeneous data sources or different agents’ goals. The aim of this paper is to analyze the knowledge conflicts of cognitive agents and to develop a heuristic algorithm for resolving these conflicts in Cognitive Integrated Management Information Systems.

Marcin Hernes, Anna Chojnacka-Komorowska, Adrianna Kozierkiewicz, Marcin Pietranik

Agent-Based Decision-Information System Supporting Effective Resource Management of Companies

The aim of the work is to propose a universal multi-agent environment for resource management in the enterprise. The system being developed is to be useful for employees of various divisions of the company: device operators, engineering staff optimizing the production process and senior management. The paper describes the architecture of the solution, which has a layered structure. The environment uses advanced techniques of artificial intelligence, including machine learning and negotiation algorithms. In the evaluation part, an implementation of a pilot version of the foundry management system is presented and a study of selected test scenarios is carried out.

Jarosław Koźlak, Bartłomiej Śnieżyński, Dorota Wilk-Kołodziejczyk, Stanisława Kluska-Nawarecka, Krzysztof Jaśkowiec, Małgorzata Żabińska

Evolutionary Multi-Agent System in Planning of Marine Trajectories

The paper considers the application of an agent-based computing system, namely the Evolutionary Multi-Agent System (EMAS), to solving a difficult yet interesting problem of marine glider path planning. Different versions of mutation are compared both for EMAS and for an evolutionary algorithm parametrized as similarly as possible to EMAS, and the observed results show that EMAS is better in most of the experiments.

Maciej Gawel, Tomasz Jakubek, Aleksander Byrski, Marek Kisiel-Dorohinicki, Kamil Pietak, Daniel Hernandez

Airplane Boarding Strategies Using Agent-Based Modeling and Grey Analysis

Cost pressure is still one of the main concerns of airline companies, and one possible means of reducing costs is minimizing the turn time of their fleet. Three processes are included in the turn time: deplaning, aircraft cleaning and passenger boarding. Among these, passenger boarding takes the longest and is therefore the most important one when reducing the turn time and its associated cost. To minimize the time needed for boarding, a series of boarding techniques have been developed. As no complete agreement has been reached in the literature on the best boarding technique, the present paper considers some of the most used techniques and simulates them on an A320 aircraft. To this end, a NetLogo program is created and several situations are considered. Some of them, such as whether the passengers are traveling with no luggage or with hand luggage, are often considered in the literature. In addition, a third case, in which passengers delay other passengers while loading their luggage, is implemented, as we believe it is closer to reality. Different passenger loads are also considered, ranging from 60% to 100% aircraft occupancy, in order to determine the boarding time. Starting from the determined durations, grey incidence analysis is used to determine the main factors influencing passenger boarding time, which could allow each company to decide on the most appropriate boarding method.

Camelia Delcea, Liviu-Adrian Cotfas, Ramona Paun

Agent-Based Optimization of the Emergency Exits and Desks Placement in Classrooms

Even though the average number of structure fires in educational properties has fallen by 67% since 1980, the National Fire Protection Association has still recorded an average of 4980 structure fires (2011–2015), causing annually 1 death, 70 injuries and $70 million in direct property damage. A series of studies have been conducted over time in order to minimize the losses, with a particular focus on saving human lives. Thus, in order to reduce the evacuation time and the casualties, factors such as the distance to the exit, the density around the exit, room information, the presence of individuals with disabilities, heterogeneous population and obstacles have been considered. In this context, the present paper aims to determine whether there is any connection between the structure of the classroom in terms of the placement of the exits and the desk placement. For this, a simulation is made using heterogeneous agents and a classroom with two exits. While the position of the exits is hard to change in real life, as it depends directly on the building’s characteristics, the desk placement can be easily modified inside the classroom, with effects on the evacuation time.

Camelia Delcea, Liviu-Adrian Cotfas, Ramona Paun

Agents Interaction and Queueing System Model of Real Time Control of Students Service Center Load Balancing

The problem of effectively organizing the activities of a Students Service Center (SSC) is considered. This paper proposes combining agent interaction with a queueing system model to create real-time control of SSC load balancing. The developed combined model makes it possible to minimize the number of required personnel and their idle time, and to create an adaptive, modular, well-scalable system.

Malika Abdrakhmanova, Galimkair Mutanov, Zhanl Mamykova, Ualsher Tukeyev

Text Processing and Information Retrieval

Frontmatter

Handling Concept Drift and Feature Evolution in Textual Data Stream Using the Artificial Immune System

Data stream mining is an active research area that has attracted the attention of many researchers in the machine learning community. Discovering knowledge from large amounts of continuously generated data from online services and real-time applications constitutes a challenging task for data analytics, where robust and efficient online algorithms are required. This paper presents a novel method for data stream mining. In particular, two main challenges of data stream processing are addressed, namely concept drift and feature evolution in textual data streams. To address these issues, the proposed method uses the Artificial Immune System (AIS) metaheuristic. AIS has powerful adaptive capabilities which make it robust even in changing environments. Our proposed algorithm, AIS-Clus, can adapt its model to handle concept drift and feature evolution in textual data streams. Experiments were performed on a textual dataset, and efficient and promising results were obtained.

Amal Abid, Salma Jamoussi, Abdelmajid Ben Hamadou

A Tweet Summarization Method Based on Maximal Association Rules

A large amount of information on different topics is posted by users on Twitter every second. People want a way to get short, complete and accurate content on the topics they are interested in. Tweet summarization, which creates such short texts, is a convenient solution to this problem. Many previous works have tried to solve the tweet summarization problem. However, they generated short texts based on the frequency of words in tweets, ignored the word order in each tweet, and rarely considered the semantics of the words. This study tries to address these shortcomings. Its significant contribution is a new method for summarizing the semantics of tweets based on mining maximal association rules on a set of real data. The experimental results show that this proposal improves the accuracy of a summary in comparison with other methods.

Huyen Trang Phan, Ngoc Thanh Nguyen, Dosam Hwang

DBpedia and YAGO Based System for Answering Questions in Natural Language

In this paper we propose a method for answering class 1 and class 2 questions (out of the 5 classes defined by Moldovan for the TREC conference) based on DBpedia and YAGO. Our method is based on generating dependency trees for the query. In the dependency tree we look for paths leading from the root to the named entity of interest. These paths (referred to below as fibers) are candidates for representing the actual user intention. The analysis of the question consists of three stages: query analysis, query breakdown and information retrieval. During these stages the entities of interest, their attributes and the question domain are detected, and the question is converted into a SPARQL query against the DBpedia and YAGO databases. Improvements to the methods are presented and we discuss the quality of the modified solution. We present a system for evaluating the implemented methods, showing that the methods are viable for use in real applications. We discuss the results and indicate future directions of the work.

Tomasz Boiński, Julian Szymański, Bartłomiej Dudek, Paweł Zalewski, Szymon Dompke, Maria Czarnecka

Bidirectional LSTM for Author Gender Identification

Author profiling consists in inferring an author’s gender, age, native language, dialect or personality by examining their written text. This important task is a very active research area because of its utility in crime, marketing and business. In this paper, we address the problem of gender identification by applying the Long Short-Term Memory neural network architecture, a novel type of recurrent network architecture that implements an appropriate gradient-based learning algorithm to overcome the vanishing-gradient problem. Experimental results show that our composition outperformed the traditional machine learning methods on gender identification.

Bassem Bsir, Mounir Zrigui

A New Text Semi-supervised Multi-label Learning Model Based on Using the Label-Feature Relations

Multi-label learning has become popular and omnipresent in many real-world problems, especially in text classification applications, in which an instance can belong to several classes simultaneously. Because of these label constraints, building multi-label data is challenging. Semi-supervised learning is one possible approach to exploiting abundant unlabeled data in order to enhance classification performance with a small labeled dataset. In this paper, we propose a solution that selects the most influential label, based on the relations among the labels and features, for a semi-supervised multi-label classification algorithm on texts. Experiments on two datasets, Vietnamese reviews and English emails from Enron, show the positive effects of the proposal.

Quang-Thuy Ha, Thi-Ngan Pham, Van-Quang Nguyen, Minh-Chau Nguyen, Thanh-Huyen Pham, Tri-Thanh Nguyen

Sensor Networks and Internet of Things

Frontmatter

A DC Programming Approach for Worst-Case Secrecy Rate Maximization Problem

This paper is concerned with the problem of secure transmission for amplify-and-forward multi-antenna relay systems in the presence of multiple eavesdroppers. Specifically, spatial beamforming and artificial noise broadcasting are chosen as the strategy for secure transmission with robustness against imperfect channel state information of the intended receiver and the eavesdroppers. In such a scenario, the objective is to maximize the worst-case secrecy rate while guaranteeing the transmit power constraint at the relay and the norm-bounded channel uncertainty. We reformulate the problem as a general DC (Difference-of-Convex functions) program (i.e. minimizing a DC function under DC constraints) and develop a very inexpensive DCA-based algorithm for solving it. Numerical results illustrate the effectiveness of the proposed algorithm and its superiority over the existing approach.

Phuong Anh Nguyen, Hoai An Le Thi

System for Detailed Monitoring of Dog’s Vital Functions

Epilepsy is the most common neurological disorder, affecting 0.6% to 0.75% of dogs. However, it is quite difficult to recognize the start of epileptic seizures. Consequently, the purpose of this article is to explore the devices that may be able to help detect epileptic seizures in dogs, including a discussion of their benefits and limitations. We have designed a new solution because there are no suitable commercial devices or systems that can detect epileptic seizures in dogs, nor is there a solution for a potential detailed analysis of a dog’s vital functions. The vision for future research is to use the data obtained from the created system for monitoring epileptic seizures in dogs. Several commercial sensors have been compared to determine their ability to monitor vital functions, with a focus on the possible monitoring of epileptic seizures. Our system consists of a wearable sensor, a base station running Windows IoT, and a cloud server. This system enables us to monitor breath frequency and heartbeat, which might be used to detect an epileptic seizure.

David Sec, Jan Matyska, Blanka Klimova, Richard Cimler, Jitka Kuhnova, Filip Studnicka

Driver Supervisor System with Telegram Bot Platform

The biggest cause of traffic accidents is human error. Bad driving behavior increases the risk of accidents and also causes the vehicle to wear out quickly. Such behavior can stem from the driver’s lack of awareness or lack of knowledge about good and safe driving. Therefore, in this research, a system is developed to monitor the driver’s behavior while driving, intended especially for car rental agencies. The system works as a supervisor which notifies the driver and a car rental administrator when the driver makes a mistake according to the desired rules, and tells the driver what should be done. The driver receives supervision through text messages over a Telegram Channel and voice messages over a speaker. Using vehicle diagnostic data retrieved directly from the vehicle through the On-Board Diagnostic (OBD) II unit, the driver’s behavior can be analyzed in real time. The OBD II unit is connected to a Raspberry Pi 2 and then integrated with the Telegram Bot platform, adopting Internet of Things (IoT) technology. The results of this research show the complete driving log based on selected parameters and the supervision messages received by the driver related to his/her mistakes. The field test generated 7098 lines of data in a log file, of which 1145 exceeded the limit. At least 71 supervision messages were received by the driver, within the limit of 4 messages per minute. The calculated bandwidth used is 68.8 bytes per minute per message, and a monthly data plan of at least 12.1 MB is recommended to accommodate 175,200 supervision messages per month. A stable Internet connection matters more than a high-speed one for keeping the real-time connection alive.

Emir Husni, Faisal Hasibuan
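
The data-plan figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' calculation: it assumes the 68.8 figure is the size of one supervision message in bytes, that a month is one twelfth of a 365-day year, and that 1 MB is 1,000,000 bytes. The function names are hypothetical.

```python
# Figures quoted in the abstract.
BYTES_PER_MESSAGE = 68.8   # assumption: read as bytes per supervision message
MESSAGES_PER_MINUTE = 4    # stated upper limit on supervision messages

# Assumption: a "month" is one twelfth of a 365-day year.
MINUTES_PER_MONTH = 365 * 24 * 60 / 12


def monthly_messages(per_minute=MESSAGES_PER_MINUTE):
    """Worst-case number of supervision messages sent in one month."""
    return per_minute * MINUTES_PER_MONTH


def monthly_megabytes(messages, bytes_per_message=BYTES_PER_MESSAGE):
    """Data volume in MB, assuming 1 MB = 1,000,000 bytes."""
    return messages * bytes_per_message / 1_000_000


messages = monthly_messages()
print(messages)                               # 175200.0
print(round(monthly_megabytes(messages), 2))  # 12.05
```

Under these assumptions the worst case comes to 175,200 messages and roughly 12.05 MB per month, which is consistent with the recommended plan of at least 12.1 MB.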

Multi-agent Base Evacuation Support System Using MANET

In this paper, we propose an evacuation support system that provides evacuation routes in the case of a disaster, and verify the usefulness of the system. In recent years, with the development of communication and portable device technologies, people can collect and disseminate information using the Internet regardless of time and place. The currently popular wireless communication infrastructure is supported by a series of base stations, and a single piece of communication equipment in such a base station handles a large volume of traffic. Therefore, when a problem occurs in the equipment of such a base station, it may be difficult, if not impossible, for smartphones to use the Internet. In fact, in the 2011 off the Pacific coast of Tohoku Earthquake in Japan, a large-scale communication failure was observed due to damage to the communication equipment and traffic congestion. The paralyzed communication infrastructure made it difficult for people to use smartphones to collect information about the state of transportation and about the safety of family and friends. Our proposed system addresses this problem by using multiple kinds of mobile agents as well as static agents on smartphones that use a mobile ad hoc network (MANET). The proposed system both collects and diffuses information by means of mobile agents, so that it provides an optimal evacuation route for each user in a dynamically changing disaster environment.

Shohei Taga, Tomofumi Matsuzawa, Munehiro Takimoto, Yasushi Kambayashi

Analysis of Software Routing Solution Based on Mini PC Platform for IoT

This article presents the results of research into the feasibility of software routing implemented on a Mini PC platform in the IoT area. The article presents the basic principles of the Quagga and Bird software routers. For each software router, its routing architecture in relation to the operating system is presented, including a draft of the complex architecture of the software routing solution. In the research, emphasis was put on the usability of the given platforms for software routing on the Mini PC platform. An analysis of the respective system resource requirements of the Quagga and Bird platforms, focusing mainly on CPU usage during routing on the Humming-Board Gate and Raspberry Pi – Model B platforms, is also included. Furthermore, the routing effectiveness is analyzed on the given testing topology, with both hardware solutions used simultaneously, for static and dynamic routing. The acquired results are presented using box plots, and the long-term behavior during the individual routings is depicted using a moving average with a period of 2, which gives a relevant picture of the course of the routing and of both routing daemons.

Josef Horalek, Vladimir Sobeslav
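
The moving average with a period of 2 mentioned in the abstract above is a simple smoothing step: each plotted point is the mean of a measurement and the one before it. A minimal sketch follows; the function name and the CPU usage samples are hypothetical and are not taken from the article.

```python
def moving_average(values, period=2):
    """Simple moving average: mean of each run of `period` consecutive values."""
    if period < 1:
        raise ValueError("period must be at least 1")
    return [
        sum(values[i - period + 1 : i + 1]) / period
        for i in range(period - 1, len(values))
    ]


# Hypothetical CPU usage samples in percent, for illustration only.
cpu_usage = [10.0, 14.0, 12.0, 20.0]
print(moving_average(cpu_usage))  # [12.0, 13.0, 16.0]
```

With a period of 2 the smoothed series is one point shorter than the input, because the first sample has no predecessor to average with.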

Data Mining Methods and Applications

Frontmatter

SVM Parameter Optimization Using Swarm Intelligence for Learning from Big Data

Support vector machine (SVM) is one of the most successful machine learning algorithms for solving practical pattern classification problems. The selection of the kernel function and its parameters plays a vital role in the results. The radial basis function (RBF) is a widely used kernel. For an RBF-SVM, two parameters, c and $$\gamma $$, control the SVM performance. In this paper, we present an SVM parameter learning algorithm, DL&BA, that is effective for learning from big data. The DL&BA algorithm has two stages. In the first stage, we use distributed learning (DL) to search for a region which promises optimal parameter pairs. In the second stage, a swarm intelligence optimization algorithm, the Bees Algorithm (BA), is used to search for an optimal pair of c and $$\gamma $$. We applied the DL&BA algorithm to an important automotive safety problem, driver fatigue detection, which involves a large amount of real-world driving data. Our experimental results show that DL&BA is not only computationally efficient but also effective in finding an optimal pair of c and $$\gamma $$.

Yongquan Xie, Yi Lu Murphey, Dev S. Kochhar
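
For orientation, the roles of the two parameters named in the abstract above can be read off the standard textbook form of a soft-margin SVM with an RBF kernel. The formulation below is that generic form, stated as an assumption; the paper's exact notation and formulation may differ.

```latex
% RBF kernel: \gamma > 0 sets how quickly similarity decays with distance.
K(x, x') = \exp\!\left(-\gamma \,\lVert x - x' \rVert^{2}\right)

% Soft-margin SVM primal: c > 0 weights the penalty on margin violations \xi_i.
\min_{w,\, b,\, \xi} \;\; \frac{1}{2}\lVert w \rVert^{2} + c \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0 .
```

Here $$\phi$$ is the feature map induced by the kernel, so that $$K(x, x') = \phi(x)^{\top} \phi(x')$$. A larger $$\gamma $$ makes the decision boundary more local, and a larger c tolerates fewer training errors, which is why the pair has to be tuned jointly.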

A CNN Model with Data Imbalance Handling for Course-Level Student Prediction Based on Forum Texts

Nowadays, teaching and learning activities in a course are greatly supported by information technologies. Forums are among the information technologies used in a course to encourage students to communicate more with lecturers outside a traditional class. Free-style textual posts in those communications express the problems that the students are facing as well as the interest and activeness of the students with respect to each topic of a course. Our work considers exploiting such textual data in a course forum for course-level student prediction. Because of the hierarchical structures in course forum texts, we propose a method which combines a deep convolutional neural network (CNN) with an adopted and adapted loss function for more correct recognition of instances of the minority class, which consists of the students who fail. A CNN model with data imbalance handling is a novel method appropriate for the course-level student prediction task. Indeed, an empirical evaluation has confirmed that our method is an effective solution. Compared to other methods such as C4.5, Support Vector Machines, and Long Short-Term Memory networks, the proposed method provides higher Accuracy, Precision, Recall, and F-measure on average for early predictions of students’ success or failure in two different real courses. Such better predictions can help both students and lecturers be aware of the students’ study progress and support them in time for ultimate success in a course.

Phuc Hua Gia Nguyen, Chau Thi Ngoc Vo

Purity and Out of Bag Confidence Metrics for Random Forest Weighting

Random Forests (RFs) are an ensemble classification technique that employs a committee of diverse decision trees to make predictive decisions based on training set observations. In the conventional RF algorithm, individual tree decisions are aggregated with equal weighting to arrive at a majority vote. Recent initiatives have found merit in the use of leaf node purity and out-of-bag sets for estimating the probability of an individual tree’s classification accuracy on unseen instances. This study proposes the concepts of Purity Gap Gain (PGG) and Relative Tree Confidence (RTC) as new ways of rating a decision tree’s classification competence and ultimately influencing the quality of the resulting ensemble decision. PGG extends the idea of leaf node purity by taking into account the rate of purity convergence and the depth at which it takes place. RTC is a comprehensive score which takes into account the confidence with which a tree makes both correct and incorrect out-of-bag classifications. Statistical tests based on UCI datasets demonstrate the significant relationship between an RF’s strength and the relative confidence of its decision trees. When applied to RFs with high strength, the proposed weighting methods demonstrate classification accuracy results that are predominantly comparable but at times superior to conventional approaches.

Mandlenkosi Victor Gwetu, Serestina Viriri, Jules-Raymond Tapamo
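
The difference between the conventional equal-weight majority vote and a weighted vote, as contrasted in the abstract above, can be shown in a short sketch. This is only a generic illustration of weighted voting: the weights are hypothetical placeholders and are not computed with the paper's PGG or RTC scores, and the function name is an assumption.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Return the label with the largest total weight.

    With weights=None every tree counts equally, which is the
    conventional majority vote. Ties go to the label seen first.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Three trees classify one instance; the weights are hypothetical scores.
tree_predictions = ["a", "b", "b"]
tree_weights = [0.9, 0.3, 0.4]

print(weighted_vote(tree_predictions))                # b
print(weighted_vote(tree_predictions, tree_weights))  # a
```

In this toy case a single high-confidence tree overrides two low-confidence ones, which is the kind of effect a per-tree confidence score is meant to produce.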

A New Computational Method for Solving Fully Fuzzy Nonlinear Systems

Predicting the solution of complex systems is a significant challenge. Complexity is caused mainly by uncertainty and nonlinearity. The nonlinear nature of many complex systems leaves uncertainty irreducible in many cases. In this work, a novel iterative strategy based on a feedback neural network is proposed for obtaining approximate solutions of a fully fuzzy nonlinear system (FFNS). To obtain the estimated solutions, a gradient descent algorithm is suggested for training the feedback neural network. An example is presented to demonstrate the high accuracy of the suggested technique.

Raheleh Jafari, Sina Razvarz, Alexander Gegov

Facial Expression Recognition: A Survey on Local Binary and Local Directional Patterns

Automated facial and emotional recognition has been extensively applied in computer science, medical neuroscience, law enforcement and crowd monitoring. The study evaluates the use of popular feature descriptors, namely Local Binary Pattern (LBP) and Local Directional Pattern (LDP) variants, for feature extraction in facial expression recognition. It then performs feature extraction and expression identification on the local facial features of major emotional states, such as neutral, anger and fear, using a weighted combination of classifiers called a Voting Classifier. The databases used in the experiments were the Cohn-Kanade Database and the Googleset datasets, and an expression classification rate of around 99.13% was achieved. The proposed solution combines a hybrid of Local Binary Pattern (LBP) and Local Directional Pattern (LDP) as the feature extraction algorithms with a weighted ensemble of classifiers, the voting classifier, as the classification algorithm.

Kennedy Chengeta, Serestina Viriri
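
As background for the descriptors named in the abstract above, the basic Local Binary Pattern code of one pixel is obtained by comparing its eight neighbours with the centre value and packing the results into one byte. The sketch below shows only this basic operator, not the LBP or LDP variants the survey evaluates. The clockwise neighbour order starting at the top-left corner and the use of `>=` as the threshold are assumptions, since conventions differ between implementations.

```python
def lbp_code(patch):
    """Basic LBP code of the centre pixel of a 3x3 grey-level patch.

    Assumption: neighbours are read clockwise from the top-left corner,
    and a neighbour contributes a 1 bit when it is >= the centre value.
    """
    center = patch[1][1]
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],  # top row, left to right
        patch[1][2],                            # right
        patch[2][2], patch[2][1], patch[2][0],  # bottom row, right to left
        patch[1][0],                            # left
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code


# Hypothetical grey values, for illustration only.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))  # 241
```

A histogram of such codes over an image region is what typically serves as the texture feature passed to a classifier.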

Energy-Based Centroid Identification and Cluster Propagation with Noise Detection

Clustering algorithms are used to partition an existing set of objects into groups according to the similarity of their attributes. Parametric algorithms for determining initial points (centroids) and for subsequent cluster propagation are proposed. The principle of competitive growth of clusters through the absorption of boundary (contiguous) objects is used. An object is absorbed by a cluster, or transferred to it from an adjacent cluster, if doing so maximizes the total energy of the cluster. The remaining objects that have not been clustered are classified as noise. The parameter identification problem for the algorithm is then considered. Preliminary results on clustering and parameter identification are obtained on several public test data sets.

Alexander Krassovitskiy, Rustam Mussabayev

An Approach to Property Valuation Based on Market Segmentation with Crisp and Fuzzy Clustering

Property valuation is a complex and time-consuming process which is carried out by qualified real estate appraisers. The number of properties and the number of purchase-sale transactions grow year by year. Mass real estate appraisal is another big problem. These issues are connected with a shortage of human and time resources. Therefore, numerous studies are carried out on computer systems which can support real estate appraisers, and automated property valuation systems are also being developed. This paper proposes a method that uses clustering algorithms to automate property valuation according to the sales comparison approach. Crisp and fuzzy clustering algorithms were employed to divide the properties located in a given city into a number of clusters. These clusters established the basis for the property valuation process. The effectiveness of the proposed method was examined and compared with real estate appraisal based on the spatial partition of the city area into cadastral regions and expert zones.

Adrian Malinowski, Mateusz Piwowarczyk, Zbigniew Telec, Bogdan Trawiński, Olgierd Kempa, Tadeusz Lasota

Predicting Solar Intensity Using Cluster Analysis

A key goal of smart grid initiatives is to significantly increase the fraction of grid energy contributed by renewable sources, especially solar power. One challenge in integrating solar power into the grid is that its power generation is stochastic and depends on various environmental factors. Thus, predicting future energy generation is important for moderating the overall energy requirements. In recent years, the use of machine learning approaches to solar power forecasting has become very popular. In this paper, a clustering-based data segmentation approach is used to find natural subgroups in the data. These subgroups are then used to construct forecasting models using various machine learning algorithms. The effectiveness of the approach is demonstrated by comparing the accuracy of clustering-based forecasting to that of the standard forecasting models. The experimental results demonstrate that the proposed clustering-based models produce more accurate forecasts.

Waseem Ahmad, Sahil Sahil, Aftab Mughal

Backmatter
