
2014 | Book

Mining Intelligence and Knowledge Exploration

Second International Conference, MIKE 2014, Cork, Ireland, December 10-12, 2014. Proceedings

Edited by: Rajendra Prasath, Philip O’Reilly, T. Kathirvalavakumar

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this Book

This book constitutes the proceedings of the Second International Conference on Mining Intelligence and Knowledge Exploration, MIKE 2014, held in Cork, Ireland, in December 2014. The 40 papers presented were carefully reviewed and selected from 69 submissions. The papers cover topics such as information retrieval, feature selection, classification, clustering, image processing, network security, speech processing, machine learning, recommender systems, natural language processing, language, cognition and computation, and business intelligence.

Table of Contents

Frontmatter
An Effective Term-Ranking Function for Query Expansion Based on Information Foraging Assessment

The exponential growth of information on the Internet and the significant increase in the number of pages published each day have led to the emergence of new words on the Web. Owing to the difficulty of determining the meaning of these new terms, it becomes important to give more weight to the subjects and sites where they appear, or rather, to give value to the words that frequently occur with them. For this reason, in this work we propose an effective term-ranking function for query expansion, based on the co-occurrence and proximity of words, to enhance retrieval effectiveness. A novel efficiency/effectiveness measure based on the principle of the optimal information forager is also proposed in order to evaluate the quality of the obtained results. Our experiments were conducted on the OHSUMED test collection and show significant performance improvements over the state-of-the-art.

Ilyes Khennak, Habiba Drias, Hadia Mosteghanemi
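As a rough illustration of the kind of co-occurrence-and-proximity term ranking the abstract describes (the authors' exact function, weights and window size are not given here and are assumed), a candidate expansion term can be scored by how often, and how closely, it co-occurs with the query terms:

```python
from collections import Counter

def rank_expansion_terms(docs, query_terms, window=5):
    """Score candidate expansion terms by co-occurrence with query
    terms, weighting closer co-occurrences more heavily (1/distance)."""
    scores = Counter()
    qset = set(query_terms)
    for doc in docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok in qset:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i and tokens[j] not in qset:
                        scores[tokens[j]] += 1.0 / abs(j - i)
    # highest-scoring candidates come first
    return [term for term, _ in scores.most_common()]
```

The top-ranked terms would then be appended to the original query before re-retrieval.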
Personalized Search over Medical Forums

The Internet is used extensively to obtain health and medical information. Many forums, blogs, news articles, specialized websites, research journals and other sources contain such information. However, although current general-purpose and special-purpose search engines cater to medical information retrieval, there are many gaps in this domain because the retrieved information is not tailored to the specific needs of the user. We argue that personalization needs to take into account the medical history and demographics of the user in addition to the medical content the user desires. For example, one user may desire factual information, whereas another may desire a vicarious account of another person's experience with the same medical condition. Medical forums contain information on several medical topics and are rich in demographics and in the types of information available, owing to the large number of people participating throughout the world. However, present search engines on medical forums incorporate neither the type of information the user is looking for nor their demographics. In this paper we propose a novel approach for facilitating search on medical forums that provides medical information specially tailored to the user's needs. Our experiments show that such an approach considerably outperforms the search presently available on medical forums…

Arohi Kumar, Amit Kumar Meher, Sudeshna Sarkar
Using Multi-armed Bandit to Solve Cold-Start Problems in Recommender Systems at Telco

Recommending best-fit rate-plans for new users is a challenge for the Telco industry. Rate-plans differ from most traditional products in that a user normally has only one product at any given time. This, combined with the lack of background knowledge on new users, hinders traditional recommender systems. Many Telcos today use trivial approaches, such as picking a random plan or the most common plan in use. The work presented here shows that these methods perform poorly. We propose a new approach based on multi-armed bandit algorithms to automatically recommend rate-plans for new users. An experiment conducted on two different real-world datasets from two brands of a major international Telco operator shows promising results.

Hai Thanh Nguyen, Anders Kofod-Petersen
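A minimal sketch of how a multi-armed bandit can drive rate-plan recommendation for new users; the epsilon-greedy strategy, reward definition and parameter values below are illustrative assumptions, not the authors' exact algorithm:

```python
import random

class EpsilonGreedyPlanRecommender:
    """Epsilon-greedy bandit: each arm is a rate-plan; the reward is
    1 if the recommended plan is accepted (kept) by the new user."""
    def __init__(self, plans, epsilon=0.1, seed=0):
        self.plans = list(plans)
        self.epsilon = epsilon
        self.counts = {p: 0 for p in self.plans}
        self.values = {p: 0.0 for p in self.plans}
        self.rng = random.Random(seed)

    def recommend(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.plans)            # explore
        return max(self.plans, key=lambda p: self.values[p])  # exploit

    def update(self, plan, reward):
        """Incrementally update the arm's running-mean reward."""
        self.counts[plan] += 1
        n = self.counts[plan]
        self.values[plan] += (reward - self.values[plan]) / n
```

Unlike picking a random or most-common plan, the bandit keeps learning from acceptance feedback while still serving users from day one.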
An Improved Collaborative Filtering Model Based on Rough Set

Collaborative filtering has proved to be one of the most successful techniques in recommender systems. However, the rapid expansion of the Internet and e-commerce systems has brought many challenges. In order to alleviate the sparsity problem and recommend more accurately, a collaborative filtering model based on rough set theory is proposed. The model first uses rough set theory to fill vacant ratings, then adopts a rough user-clustering algorithm to assign each user to a lower or upper approximation based on similarity, and finally searches the target user's nearest neighbourhoods and makes top-N recommendations. Well-designed experiments show that the proposed model has a smaller MAE than traditional collaborative filtering and collaborative filtering based on user clustering, which indicates that the proposed model performs better and can improve recommendation accuracy effectively.

Xiaoyun Wang, Lu Qian
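The sketch below illustrates only the surrounding collaborative-filtering pipeline (fill vacant ratings, find nearest neighbours, make top-N recommendations); for simplicity it fills missing ratings with item means, a stand-in for the paper's rough-set-based filling and approximation steps:

```python
import math

def fill_vacant(ratings):
    """Replace missing ratings (None) with the item's mean rating;
    this is a simplification of the paper's rough-set filling step."""
    n_items = len(ratings[0])
    means = []
    for j in range(n_items):
        col = [r[j] for r in ratings if r[j] is not None]
        means.append(sum(col) / len(col) if col else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_items)]
            for r in ratings]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_n(ratings, user, n=2, k=2):
    """Recommend the n unrated items scored by the k most similar users."""
    filled = fill_vacant(ratings)
    sims = sorted((i for i in range(len(ratings)) if i != user),
                  key=lambda i: cosine(filled[user], filled[i]),
                  reverse=True)[:k]
    unrated = [j for j, r in enumerate(ratings[user]) if r is None]
    scores = {j: sum(filled[i][j] for i in sims) for j in unrated}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```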
Exploring Folksonomy Structure for Personalizing the Result Merging Process in Distributed Information Retrieval

In this paper we present a new personalized approach that integrates a social profile into a distributed search system. The proposed approach exploits the social profile and the different relations between social entities to: (i) perform query expansion, and (ii) personalize and improve the result merging process in distributed information retrieval.

Zakaria Saoud, Samir Kechid, Radia Amrouni
Learning to Rank for Personalised Fashion Recommender Systems via Implicit Feedback

Fashion e-commerce is a fast-growing area in online shopping. The fashion domain has several interesting properties which make personalised recommendations more difficult than in more traditional domains. To avoid potential bias when using explicit user ratings, which are also expensive to obtain, this work approaches fashion recommendations by analysing implicit feedback from users in an app. A user's actual behaviour, such as Clicks, Wants and Purchases, is used to infer her implicit preference score for an item she has interacted with. This score is then blended with information about the item's price and popularity, as well as the recentness of the user's action with respect to the item. Based on these implicit preference scores, we infer the user's ranking of other fashion items by applying different recommendation algorithms. Experimental results show that the proposed method outperforms the most popular baseline approach, thus demonstrating its effectiveness and viability.

Hai Thanh Nguyen, Thomas Almenningen, Martin Havig, Herman Schistad, Anders Kofod-Petersen, Helge Langseth, Heri Ramampiaro
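One plausible way to blend an action-based implicit score with price, popularity and recentness is sketched below; the action weights, exponential decay and blending coefficient are assumptions for illustration, not the paper's actual formula:

```python
import math

# assumed relative strengths of the observed actions
ACTION_WEIGHTS = {"click": 1.0, "want": 2.0, "purchase": 4.0}

def implicit_score(action, days_ago, price, popularity,
                   half_life=30.0, alpha=0.7):
    """Blend an action-based preference with item price/popularity,
    exponentially decaying older actions (assumed blending scheme)."""
    recency = math.exp(-math.log(2) * days_ago / half_life)
    behaviour = ACTION_WEIGHTS[action] * recency
    # cheaper, more popular items get a slightly higher prior
    item_prior = popularity / (1.0 + math.log1p(price))
    return alpha * behaviour + (1 - alpha) * item_prior
```

Such scores can then stand in for explicit ratings in any standard ranking or collaborative-filtering algorithm.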
Convergence Problem in GMM Related Robot Learning from Demonstration

Convergence problems can occur in some practical situations when using Gaussian Mixture Model (GMM) based robot Learning from Demonstration (LfD). Theoretically, Expectation Maximization (EM) is a good technique for estimating the parameters of a GMM, but it can suffer problems when used in a practical situation. The contribution of this paper is a more complete analysis of the theoretical problems which arise in a particular experiment. The research question answered in this paper is how a partial solution can be found for such practical problems. Simulation results and practical results from laboratory experiments verify the theoretical analysis. The two issues covered are repeated sampling on other models and the influence of outliers (abnormal data) on the policy/kernel generation in GMM LfD. Moreover, an analysis of the impact of repeated samples on the CHMM, together with experimental results, is also presented.

Fenglu Ge, Wayne Moore, Michael Antolovich
Hybridization of Ensemble Kalman Filter and Non-linear Auto-regressive Neural Network for Financial Forecasting

Financial data is non-linear, chaotic in nature and volatile, which makes forecasting cumbersome. A successful forecasting model must therefore be able to capture long-term dependencies from past chaotic data. In this study, a novel hybrid model, called UKF-NARX, consisting of an unscented Kalman filter and a non-linear auto-regressive network with exogenous input trained with the Bayesian regularization algorithm, is modelled for chaotic financial forecasting. The proposed hybrid model is compared with the commonly used Elman-NARX and the static forecasting model employed by financial analysts. Experimental results on Bursa Malaysia KLCI data show that the proposed hybrid model outperforms the other two commonly used models.

Said Jadid Abdulkadir, Suet-Peng Yong, Maran Marimuthu, Fong-Woon Lai
Forecast of Traffic Accidents Based on Components Extraction and an Autoregressive Neural Network with Levenberg-Marquardt

In this paper an improved one-step-ahead strategy is proposed for forecasting traffic accidents and injuries in Concepción, Chile, from the year 2000 to 2012 with a weekly sample period. The strategy is based on the extraction and estimation of the components of a time series: the Hankel matrix is used to map the time series, Singular Value Decomposition (SVD) extracts the singular values and the orthogonal matrix, and the components are forecast with an Autoregressive Neural Network (ANN) based on the Levenberg-Marquardt (LM) algorithm. The forecast accuracy of the proposed strategy is compared with the conventional process: SVD-ANN-LM achieved a MAPE of 1.9% for the Accidents time series and a MAPE of 2.8% for the Injured time series, against the 14.3% and 21.1% obtained with the conventional process.

Lida Barba, Nibaldo Rodríguez
Top-k Parametrized Boost

Ensemble methods such as AdaBoost are popular machine learning methods that create a highly accurate classifier by combining the predictions of several classifiers. We present a parametrized variant of AdaBoost that we call Top-k Parametrized Boost. We evaluate our method and other popular ensemble methods from a classification perspective on several real datasets. Our empirical study shows that our method gives the minimum average error, with statistical significance, on these datasets.

Turki Turki, Muhammad Ihsan, Nouf Turki, Jie Zhang, Usman Roshan, Zhi Wei
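The Top-k parametrization itself is not detailed in the abstract, but the AdaBoost base procedure it builds on can be sketched with weighted decision stumps:

```python
import math

def train_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[f] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def adaboost(X, y, rounds=5):
    """Classic AdaBoost: reweight samples toward the mistakes of each
    round's weak learner, then combine learners by weighted vote."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, thr, sign = train_stump(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        pred = [sign if x[f] >= thr else -sign for x in X]
        w = [wi * math.exp(-alpha * p * yi) for wi, p, yi in zip(w, pred, y)]
        s = sum(w)
        w = [wi / s for wi in w]
        ensemble.append((alpha, f, thr, sign))
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x[f] >= thr else -s) for a, f, thr, s in ensemble)
    return 1 if score >= 0 else -1
```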
Unsupervised Feature Learning for Human Activity Recognition Using Smartphone Sensors

Feature representation has a significant impact on human activity recognition. The commonly used hand-crafted features rely heavily on specific domain knowledge and may not adapt well to a particular dataset. To alleviate these problems, we present a feature extraction framework which exploits different unsupervised feature learning techniques to learn useful feature representations from accelerometer and gyroscope sensor data for human activity recognition. The unsupervised learning techniques we investigate include the sparse auto-encoder, the denoising auto-encoder and PCA. We evaluate performance on a public human activity recognition dataset and also compare our method with traditional features and another approach to unsupervised feature learning. The results show that the features learned by our framework outperform the other two methods, and that the sparse auto-encoder achieves better results than the other two techniques within our framework.

Yongmou Li, Dianxi Shi, Bo Ding, Dongbo Liu
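Of the three unsupervised feature learners the abstract compares, PCA is the simplest to sketch; a projection learned from raw sensor windows might look roughly like this:

```python
import numpy as np

def pca_features(windows, n_components=3):
    """Learn a PCA projection from raw sensor windows (one window per
    row) and return the low-dimensional features. This is one of the
    unsupervised feature learners the paper compares."""
    X = np.asarray(windows, dtype=float)
    X = X - X.mean(axis=0)                    # center the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:n_components]            # principal directions
    return X @ components.T
```

The resulting features would then feed a standard supervised classifier for activity recognition.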
Influence of Weak Labels for Emotion Recognition of Tweets

Research on emotion recognition of tweets focuses on feature engineering or algorithm design, while dataset labels are barely questioned. Datasets of tweets are often labelled manually or via crowdsourcing, which results in strong labels. These methods are time intensive and can be expensive. Alternatively, tweet hashtags can be used as free, inexpensive weak labels. This paper investigates the impact of using weak labels compared to strong labels. The study uses two label sets for a corpus of tweets. The weakly annotated label set is created employing the hashtags of the tweets, while the strong label set is created by the use of crowdsourcing. Both label sets are used separately as input for five classification algorithms to determine the classification performance of the weak labels. The results indicate only a 9.25% decrease in f1-score when using weak labels. This performance decrease does not outweigh the benefits of having free labels.

Olivier Janssens, Steven Verstockt, Erik Mannens, Sofie Van Hoecke, Rik Van de Walle
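Deriving weak labels from hashtags can be as simple as the sketch below; the hashtag-to-emotion mapping here is an illustrative assumption:

```python
import re

# assumed mapping from hashtags to emotion classes
HASHTAG_EMOTIONS = {"happy": "joy", "joy": "joy", "sad": "sadness",
                    "angry": "anger", "fear": "fear"}

def weak_label(tweet):
    """Derive a free 'weak' emotion label from a tweet's hashtags;
    returns None when no known emotion hashtag is present."""
    for tag in re.findall(r"#(\w+)", tweet.lower()):
        if tag in HASHTAG_EMOTIONS:
            return HASHTAG_EMOTIONS[tag]
    return None

def strip_hashtags(tweet):
    """Remove the label-bearing hashtags so a classifier cannot
    trivially read the label back from the input text."""
    return re.sub(r"#\w+", "", tweet).strip()
```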
Focused Information Retrieval & English Language Instruction: A New Text Complexity Algorithm for Automatic Text Classification

The purpose of the present study was to delineate a range of linguistic features that characterize the English reading texts used at the B2 (Independent User) and C1 (Advanced User) levels of the Greek State Certificate of English Language Proficiency (KPG) exams, in order to better define text complexity per level of competence. The main outcome of the research was the L.A.S.T. Text Difficulty Index, which makes possible the automatic classification of B2 and C1 English reading texts based on four in-depth linguistic features, i.e. lexical density, syntactic structure similarity, tokens per word family and academic vocabulary. Given that the predictive accuracy of the formula reached 80% on a new set of reading comprehension texts, with 32 out of the 40 new texts assigned to similar levels by both raters, the practical usefulness of the index might extend to EFL testers and materials writers, who are in constant need of calibrated texts.

Trisevgeni Liontou
Efficient Handwritten Numeral Recognition System Using Leaders of Separated Digit and RBF Network

In this paper an efficient method is proposed to classify handwritten numerals using the leader algorithm and a Radial Basis Function (RBF) network. Handwritten numerals are represented in matrix form, and clusters with leaders are formed for each row of each digit separately. Each leader is associated with a single target digit. Duplicate patterns among the cluster leaders are avoided by combining them into a single pattern whose target vector has the corresponding bits set on. The resultant target vectors thus have 10 bits, corresponding to the number of digits considered for classification. The constructed leaders are trained using the Radial Basis Function network. Experimental results show that a minimum number of patterns is enough for training, compared to the total number of patterns, and convergence is observed to be fast during training. Also, the number of resultant leaders after removing duplicate patterns is smaller, and the number of bits in each resultant pattern is 12.

Thangairulappan Kathirvalavakumar, M. Karthigai Selvi, R. Palaniappan
Iterative Clustering Method for Metagenomic Sequences

Metagenomics studies the microbial DNA of environmental samples. Sequencing tools produce a set of genome fragments, making it a challenge for metagenomics to associate them with the corresponding phylogenetic group. To solve this problem there are binning methods, which are classified into two categories: similarity-based and composition-based. This paper proposes an iterative clustering method which aims at achieving a low sensitivity of clusters. The approach consists of iteratively running k-means, reducing the training data in each step. The selection of data for the next iteration depends on the result obtained in the previous one, based on a compactness measure. The final clustering performance is evaluated according to the sensitivity of the clusters. The results demonstrate that the proposed model is better than simple k-means for metagenome databases.

Isis Bonet, Widerman Montoya, Andrea Mesa-Múnera, Juan Fernando Alzate
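The iteration scheme described above (run k-means, keep compact clusters, re-cluster the rest) can be sketched as follows; the 1-D distance, the compactness measure (mean absolute deviation) and its threshold are simplifying assumptions:

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Plain 1-D k-means (Lloyd's algorithm) over a list of numbers."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: abs(p - centroids[c]))].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

def iterative_kmeans(points, k, compactness=1.0, max_rounds=5):
    """Accept only compact clusters and re-run k-means on the rest,
    mirroring the paper's iteration scheme; the compactness threshold
    used here is an assumed parameter."""
    accepted, remaining = [], list(points)
    for _ in range(max_rounds):
        if len(remaining) <= k:
            break
        loose = []
        for cluster in kmeans_1d(remaining, k):
            if not cluster:
                continue
            mu = sum(cluster) / len(cluster)
            mad = sum(abs(p - mu) for p in cluster) / len(cluster)
            if mad <= compactness:
                accepted.append(cluster)   # compact: keep it
            else:
                loose.extend(cluster)      # loose: re-cluster next round
        if len(loose) == len(remaining):
            break
        remaining = loose
    return accepted, remaining
```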
Is There a Crowd? Experiences in Using Density-Based Clustering and Outlier Detection

The massive growth of GPS-equipped smartphones, coupled with the increasing importance of social media, has led to the emergence of new location-based services over LBSNs (Location-Based Social Networks) which allow citizens to act as social sensors reporting on their locations. This proactive social reporting can benefit researchers in a wide number of scenarios, like the one addressed in this paper: monitoring crowds in the city, where a crowd involves an assembly of individuals in terms of size, duration, motivation, cohesion and proximity. We introduce a methodology for crowd detection that combines social data mining, density-based clustering and outlier detection into a solution that can operate on-the-fly to predict public crowds, i.e. to foresee, in the short term, the formation of potential multitudes based on a prior analysis of the region. Twitter is mined to analyze geo-tagged data in New York on New Year's Eve, so that predictable public crowds are discovered.

Mohamed Ben Kalifa, Rebeca P. Díaz Redondo, Ana Fernández Vilas, Rafael López Serrano, Sandra Servia Rodríguez
Detecting Background Line as Preprocessing for Offline Signature Verification

Handwritten signatures are commonly used for authenticating a person. Extraction of the desired features from the captured signature image is crucial for automated signature verification. The presence of a background line (which may be part of a paper form) is common in an offline signature, and its removal is necessary for the correct extraction of features. Sometimes, however, a signature contains a line as part of it. This paper shows how intensity distributions can distinguish a background line from a line which is part of a signature.

K. Rakesh, Rajarshi Pal
Application of Zero-Frequency Filtering for Vowel Onset Point Detection

Vowel onset points in speech signals are the instants where the voicing of a vowel begins. These points serve as important landmarks for the analysis as well as the synthesis of speech signals, helping to identify information about the transitions of different sounds into and out of the vowel regions. In this paper, we propose a new method to identify vowel onset points in a speech signal using the zero-frequency filtered (ZFF) speech signal and its frequency spectrum. The ZFF signal is obtained by passing the speech signal through a resonator with a centre frequency of 0 Hz; it therefore essentially contains the low-pass components of the speech signal, and vowels are mostly characterized by significant energy content in the relatively low frequency bands. Significant improvement in VOP detection performance is observed using the proposed method compared to existing methods.

Anil Kumar Vuppala
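A zero-frequency filter of the kind described can be sketched as two cascaded 0-Hz resonators followed by local-mean trend removal; the trend-removal window below is an assumption (in practice it is tied to the average pitch period):

```python
def zero_frequency_filter(signal, trend_window=80):
    """Difference the signal, pass it through two cascaded
    zero-frequency (0 Hz) resonators, then remove the slowly varying
    trend by subtracting a local mean over a fixed window."""
    # first difference removes any DC offset
    x = [signal[0]] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]
    # each pass applies the resonator y[n] = 2y[n-1] - y[n-2] + x[n]
    for _ in range(2):
        y = [0.0, 0.0]
        for n in range(len(x)):
            y.append(2 * y[-1] - y[-2] + x[n])
        x = y[2:]
    # trend removal: subtract the mean over a sliding window
    w = trend_window
    out = []
    for n in range(len(x)):
        lo, hi = max(0, n - w), min(len(x), n + w + 1)
        out.append(x[n] - sum(x[lo:hi]) / (hi - lo))
    return out
```

Positive zero crossings of the resulting signal are conventionally taken as glottal landmarks, from which vowel onsets can be located.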
A Hybrid PSO Model for Solving Continuous p-median Problem

The p-median problem is one of the most applicable problems in the areas of supply chain management and operations research. There are various versions of this problem; the continuous p-median is one in which the facility points and the demand points lie in an n-dimensional hyperspace. It has been proved that this problem is NP-complete, and most of the algorithms that have been defined are mere approximations. In this paper, we present a meta-heuristic-based approach that calculates the median points given a set of demand points with arbitrary demands. The algorithm is a combination of genetic algorithms, particle swarm optimization and a number of novel techniques that aim to further improve the result. The algorithm is tested on known datasets, and we show its performance in comparison to other known algorithms applied to the same problem.

Silpi Borah, Hrishikesh Dewan
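A plain particle swarm optimizer for the continuous p-median objective is sketched below as a simplified stand-in for the paper's hybrid GA/PSO algorithm; the inertia and acceleration coefficients are conventional assumed values, and demands are taken as unit weights:

```python
import math
import random

def cost(medians, demands):
    """p-median objective: total distance from each demand point to
    its nearest median (unit demands assumed)."""
    return sum(min(math.dist(d, m) for m in medians) for d in demands)

def pso_p_median(demands, p, particles=20, iters=100, seed=0):
    """Plain PSO over the 2*p median coordinates in the plane."""
    rng = random.Random(seed)
    dim = 2 * p
    lo = min(min(d) for d in demands)
    hi = max(max(d) for d in demands)

    def unflatten(x):
        return [(x[2 * i], x[2 * i + 1]) for i in range(p)]

    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    V = [[0.0] * dim for _ in range(particles)]
    pbest = [x[:] for x in X]
    pcost = [cost(unflatten(x), demands) for x in X]
    g = pbest[min(range(particles), key=lambda i: pcost[i])][:]
    gcost = min(pcost)
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                V[i][d] = (0.7 * V[i][d]
                           + 1.5 * rng.random() * (pbest[i][d] - X[i][d])
                           + 1.5 * rng.random() * (g[d] - X[i][d]))
                X[i][d] += V[i][d]
            c = cost(unflatten(X[i]), demands)
            if c < pcost[i]:
                pbest[i], pcost[i] = X[i][:], c
                if c < gcost:
                    g, gcost = X[i][:], c
    return unflatten(g), gcost
```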
Bees Swarm Optimization for Web Information Foraging

The present work is related to Web intelligence and, more precisely, to Wisdom Web foraging. The idea is to learn the localization of the most relevant Web surfing paths that might interest the user. We propose a novel approach based on bee behaviour for information foraging, implemented as a colony of cooperative reactive agents. In order to validate our proposal, experiments were conducted on MedlinePlus, a benchmark dedicated to research in the health domain. The results are promising, both with respect to some Web regularities and in terms of response time, which is very short and hence complies with the real-time constraint.

Yassine Drias, Samir Kechid
Modeling Cardinal Direction Relations in 3D for Qualitative Spatial Reasoning

Many fundamental geoscience concepts and tasks require advanced spatial knowledge about the topology, orientation, shape, and size of spatial objects. Besides topological and distance relations, cardinal directions also can play a prominent role in the determination of qualitative spatial relations; one of the facets of spatial objects is the determination of relative positioning of objects. In this paper, we present an efficient approach to representing and determining cardinal directions between free form regions. The development is mathematically sound and can be implemented more efficiently than the existing models. Our approach preserves converseness of direction relations between pairs of objects, while determining directional relations between gridded parts of the complex regions. All the essential details are in 2D. Yet the extension to 3D is seamless; it needs no additional formulation for transition from 2D to 3D. Furthermore, the extension to 3D and construction of a composition table has no adverse impact on the computational efficiency, as the technique is akin to 2D.

Chaman L. Sabharwal, Jennifer L. Leopold
Qualitative Spatial Reasoning in 3D: Spatial Metrics for Topological Connectivity in a Region Connection Calculus

In qualitative spatial reasoning, there are three distinct properties for reasoning about spatial objects: connectivity, size, and direction. Reasoning over combinations of these properties can provide additional useful knowledge. To facilitate end-user spatial querying, it also is important to associate natural language with these relations. Some work has been done in this regard for line-region and region-region topological relations in 2D, and very recent work has initiated the association between natural language, topology, and metrics for 3D objects. However, prior efforts have lacked rigorous analysis, expressive power, and completeness of the associated metrics. Herein we present new metrics to bridge the gap required for integration between topological connectivity and size information for spatial reasoning. The new set of metrics that we present should be useful for a variety of applications dealing with 3D objects.

Chaman L. Sabharwal, Jennifer L. Leopold
Context-Aware Case-Based Reasoning

In recent years, there has been increasing interest in ubiquitous computing. This paradigm is based on the idea that software should act according to the context where it is executed, in what is known as context-awareness. The goal of this paper is to integrate context-awareness into case-based reasoning (CBR). To this end we propose three methods which condition the retrieval and the reuse of information in CBR depending on the context of the query case. The methodology is tested using a breast-cancer diagnosis database enriched with geospatial context. Results show that context-awareness can improve CBR.

Albert Pla, Jordi Coll, Natalia Mordvaniuk, Beatriz López
Determining the Customer Satisfaction in Automobile Sector Using the Intuitionistic Fuzzy Analytical Hierarchy Process

Customer satisfaction is an important factor in sustaining a business and in the further development of an organization, and retaining customers is one of the important tasks in production industries. In these days of high competition, customer satisfaction is essential, but uncertainty creeps in. The Analytical Hierarchy Process (AHP) is an important theory in decision-making problems, and the intuitionistic fuzzy set is able to handle uncertainty and vagueness well. The objective of this work is therefore to determine customer satisfaction using the Intuitionistic Fuzzy Analytical Hierarchy Process (IFAHP), which combines the two.

S. Rajaprakash, R. Ponnusamy, J. Pandurangan
Pattern Based Bootstrapping Technique for Tamil POS Tagging

Part-of-speech (POS) tagging is one of the basic preprocessing techniques for any text-processing NLP application. It is a difficult task for morphologically rich and partially free word order languages. This paper describes a POS tagger for one such morphologically rich language, Tamil. The main issue in POS tagging is the ambiguity that arises because different POS tags can have the same inflections, which have to be disambiguated using the context. This paper presents a pattern-based bootstrapping approach using only a small set of POS-labelled suffix context patterns. A pattern consists of a stem and a sequence of suffixes, obtained by segmentation using a suffix list. The bootstrapping technique generates new patterns by iteratively masking suffixes with a low probability of occurrence in the suffix context and replacing them with other co-occurring suffixes. We have tested our system on a corpus containing 20,000 Tamil documents with 271,933 unique words. Our system achieves a precision of 87.74%.

Jayabal Ganesh, Ranjani Parthasarathi, T. V. Geetha, J. Balaji
Anaphora Resolution in Tamil Novels

We present a robust anaphora resolution system for Tamil, one of the Indian languages belonging to the Dravidian language family. We use Conditional Random Fields (CRFs), a machine learning technique, with linguistically motivated features. We have performed exhaustive experiments using data from different genres and domains, and evaluated the system for portability and scalability. We obtained an average accuracy of 64.83% across texts of different domains/genres. The results are encouraging and comparable with earlier reported work.

A. Akilandeswari, Sobha Lalitha Devi
An Efficient Tool for Syntactic Processing of English Query Text

A large amount of work has been done on the syntactic analysis of English texts. However, for analyzing short phrases without structured contexts such as capitalization and subject-verb-object order, these techniques have not yet proved appropriate. In this paper we attempt the syntactic analysis of phrases where contextual information is not available. We have developed a stemmer, POS tagger, chunker and Named Entity tagger for English short phrases such as chats, messages and queries, using a root dictionary and language-specific rules. We have evaluated the technique on English queries and observed that our system outperforms some commonly used NLP tools.

Sanjay Chatterji, G. S. Sreedhara, Maunendra Sankar Desarkar
A Tool for Converting Different Data Representation Formats

Data analysis and processing is currently one of the most interesting and demanding fields in both academia and industry. There are large numbers of tools openly available on the web, but different tools take inputs and return outputs in different data representation formats. Building an appropriate converter for a pair of data representation formats requires both sufficient time and in-depth knowledge of the formats. Here, we discuss the CoNLL, SSF, XML and JSON data representation formats and develop a tool for conversion between them. Other conversions will be included in an extended version.

Sanjay Chatterji, Subrangshu Sengupta, Bagadhi Gopal Rao, Debarghya Banerjee
Generating Object-Oriented Semantic Graph for Text Summarisation

In this paper we propose to extend the semantic graph representation of natural language text to an object-oriented semantic graph representation, and to generate a summary of the original text from this graph. We provide rules to construct the object-oriented semantic graph and rules to generate the text summary from it. The process is elaborated through a case study on a news story. An evaluation of the generated summary shows the effectiveness of the proposed approach. This work is a new direction in single-document text summarisation from a semantic perspective and requires further analysis and exploration.

Monika Joshi, Hui Wang, Sally McClean
Ontology-Based Information Extraction from the Configuration Command Line of Network Routers

Knowledge extraction is increasingly attracting the attention of researchers from different disciplines as a means to automate complex tasks that rely on bulk textual resources. However, the configuration of many devices in the networking field continues to be a labor-intensive task, based on the human interpretation and manual entry of commands through a text-based user interface. Typically, these Command-Line Interfaces (CLIs) are both device- and vendor-specific, and thus commands differ syntactically and semantically for each configuration space. Because of this heterogeneity, CLIs always provide a "help" feature (short command descriptions encoded in natural language) aimed at unveiling the semantics of configuration commands for network administrators. In this paper, we exploit this feature with the aim of automating the abstraction of device configurations in heterogeneous settings. In particular, we introduce an Ontology-Based Information Extraction (OBIE) system for the Command-Line Interface of network routers. We also present ORCONF, a domain ontology for the Router CONFiguration domain, and introduce a semantic relatedness measure that quantifies the degree of interrelation among candidate concepts. The results obtained over the configuration spaces of two widely used network routers demonstrate that this is a promising line of research, with overall precision and recall of 93% and 91%, respectively.

Anny Martínez, Marcelo Yannuzzi, René Serral-Gracià, Wilson Ramírez
Using Association Rule Mining to Find the Effect of Course Selection on Academic Performance in Computer Science I

It is important for first-year students in higher educational institutions to get the best advice and information with regard to course selection and registration. During registration, students select the courses, and the number of courses, they would like to enrol in; these decisions are made with the assistance of academics and course coordinators. This study focuses on first-year Computer Science students and their overall academic performance in the first year. Computer Science I has Mathematics as a compulsory co-requisite, so after selecting Computer Science I, students must enrol in Mathematics and then select two additional courses. Can data mining techniques assist in identifying the additional courses that yield the best academic performance? Using a modified version of the CRISP-DM methodology, this work applies an association rule mining algorithm, the Apriori algorithm from the WEKA toolkit, to first-year Computer Science data from 2006 to 2012 in order to find the best course combinations with Computer Science I and Mathematics I. The results showed a good relationship between Computer Science I and Biology on its own, Biology with Chemistry, and Psychology with Economics; most of the rules produced also had good accuracy. These results are consistent with related literature in areas such as bioinformatics, which combines Biology and Computer Science.

Lebogang Mashiloane
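A minimal frequent-itemset pass over course-enrolment records is sketched below; it enumerates combinations directly rather than using Apriori's candidate pruning (which WEKA's implementation performs), so it is an illustration of the idea rather than the tool used in the study:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support=0.5, max_size=3):
    """Count course combinations that co-occur in at least
    min_support of the student records."""
    n = len(transactions)
    freq = {}
    for size in range(1, max_size + 1):
        counts = {}
        for t in transactions:
            for combo in combinations(sorted(t), size):
                counts[combo] = counts.get(combo, 0) + 1
        for combo, c in counts.items():
            if c / n >= min_support:
                freq[combo] = c / n   # itemset -> support
    return freq
```

Association rules are then read off the frequent itemsets, e.g. the support of {CS1, Math1, Bio} relative to {CS1, Math1} gives the confidence of "CS1, Math1 => Bio".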
Information Extraction from Hungarian, English and German CVs for a Career Portal

Recruiting employees is a serious issue for many enterprises. We propose a procedure to automatically analyse uploaded CVs and then prefill the application form, which can save applicants a considerable amount of time and thus increases user satisfaction. For this purpose, we introduce a high-recall CV parsing system for Hungarian, English and German. We comparatively evaluate two approaches for providing training data to our machine learning machinery and discuss other experiences gained.

Richárd Farkas, András Dobó, Zoltán Kurai, István Miklós, Ágoston Nagy, Veronika Vincze, János Zsibrita
Fuzzy Cognitive Map of Research Team Activity

A cognitive model of the activity of a research team is considered. The object of research is an R&D department (223 persons) in a large scientific and industrial enterprise for sea prospecting works. The model (a fuzzy cognitive map) that represents the activity of this department is based on the results of applied sociological research. The fuzzy cognitive map contains 14 concepts, divided into 3 groups (Personal, Group and Organizational). The list of concepts, their initial values and the weight matrix are based on an assessment by several experts from the studied organization. The behavior of the target concepts is studied at various values of the model parameters; the results testify to favorable tendencies in the department's activity.

Evgenii Evseev, Ivan Kovalev
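The dynamics of such a map can be sketched as follows. This is a minimal simulation of fuzzy cognitive map inference with a toy 3-concept map (the paper's map has 14 concepts; the concept names, weights and initial activations here are illustrative assumptions, not the authors' expert-derived values):

```python
import math

# Toy fuzzy cognitive map with 3 concepts; weights[i][j] is the causal
# influence of concept i on concept j, in [-1, 1] (illustrative values).
concepts = ["motivation", "group_cohesion", "productivity"]
weights = [
    [0.0, 0.4, 0.6],
    [0.3, 0.0, 0.5],
    [0.2, 0.1, 0.0],
]
state = [0.5, 0.5, 0.5]  # initial activation of each concept

def sigmoid(x, lam=1.0):
    """Squashing function keeping activations in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def step(state):
    """One synchronous FCM update with self-memory:
    A_j(t+1) = f(A_j(t) + sum_i A_i(t) * w_ij)."""
    n = len(state)
    return [sigmoid(state[j] + sum(state[i] * weights[i][j]
                                   for i in range(n)))
            for j in range(n)]

# Iterate until the map settles into a fixed point (or a step limit).
for _ in range(100):
    new_state = step(state)
    if max(abs(a - b) for a, b in zip(state, new_state)) < 1e-6:
        break
    state = new_state

for name, value in zip(concepts, state):
    print(f"{name}: {value:.3f}")
```

Studying the behavior of target concepts then amounts to varying the initial values or weight matrix and observing the fixed point (or cycle) the iteration settles into.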
Knowledge Acquisition for Automation in IT Infrastructure Support

In today's IT-driven world, the IT Infrastructure Support (ITIS) unit aims for effective and efficient management of the IT infrastructure of large, modern organizations. Automatic issue resolution is crucial for the operational efficiency and agility of ITIS. Manually creating such automatic issue resolution processes requires a Subject Matter Expert (SME). Our focus is on acquiring SME knowledge for automation. Additionally, the number of distinct issues is large, and resolving issue instances requires repetitive application of resolver knowledge. The operational logs generated during issue resolution are resolver knowledge available in tangible form.

We identify functional blocks from the operational logs as potential standard operators, which the SME validates and approves. We algorithmically consolidate all the steps the resolvers have performed historically during the resolution of a particular issue, and present the SME with a graphical view of the consolidation for assessment and approval. We transform the graphical view into a set of rules along with the associated standard operators, and finally assemble them into a parametrized service operation in a tool-agnostic language. For an ITIS automation system, this is transformed into a configuration file for a targeted orchestrator tool. Bash and PowerShell script transformations of service operations are executed by resolvers manually or via an automation web portal.

Sandeep Chougule, Trupti Dhat, Veena Deshmukh, Rahul Kelkar
Developing eXtensible mHealth Solutions for Low Resource Settings

Over the last ten years there has been a proliferation of mHealth solutions to support patient diagnosis and treatment. Coupled with this, increased attention and resources have been devoted to developing technologies that improve patient health care outcomes in low resource settings. Most significantly, it is highly extensible, portable and scalable technologies that have received the most attention. As part of an mHealth intervention in Malawi, Africa, an agnostic clinical guideline decision-support rule engine has been developed that uses classification and treatment rules, defined in XML, for assessing a sick child; namely, Integrated Management of Childhood Illness (IMCI) and Community Case Management (CCM). A two-phased approach was used: 1) the rules underpinning the cloud-based mobile eCCM application were devised from the widely accepted WHO/UNICEF paper-based guidelines, and 2) they were subsequently validated and extended through a user workshop conducted in Malawi, Africa.

Yvonne O’Connor, Timothy O’Sullivan, Joe Gallagher, Ciara Heavin, John O’Donoghue
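The idea of a rule engine driven by XML-defined classification rules can be sketched as follows. The XML schema, symptom names and thresholds below are guesses for illustration only, not the actual IMCI/CCM encoding used by the authors:

```python
import xml.etree.ElementTree as ET

# Illustrative sketch: a guess at how CCM-style classification rules
# might be encoded in XML, not the paper's actual rule format.
RULES_XML = """
<guideline name="CCM-sketch">
  <rule classification="severe" action="refer urgently">
    <condition symptom="breaths_per_minute" op="ge" value="50"/>
  </rule>
  <rule classification="moderate" action="treat and follow up">
    <condition symptom="temperature_c" op="ge" value="38.5"/>
  </rule>
</guideline>
"""

# Comparison operators the rule conditions may reference.
OPS = {"ge": lambda a, b: a >= b, "lt": lambda a, b: a < b}

def classify(observations, rules_xml=RULES_XML):
    """Return (classification, action) of the first rule whose conditions
    all hold for the observed values, or a default if none match."""
    root = ET.fromstring(rules_xml)
    for rule in root.findall("rule"):
        conditions = rule.findall("condition")
        if all(OPS[c.get("op")](observations.get(c.get("symptom"), 0.0),
                                float(c.get("value")))
               for c in conditions):
            return rule.get("classification"), rule.get("action")
    return "no classification", "routine advice"

print(classify({"breaths_per_minute": 54, "temperature_c": 37.0}))
```

Keeping the rules in external XML is what makes the engine extensible: swapping IMCI for CCM (or a locally validated variant) changes only the data file, not the application code.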
Observations of Non-linear Information Consumption in Crowdfunding

The number and scale of crowdfunding platforms has grown rapidly within a short period of time. While some of these platforms offer donors specific financial or material rewards, others ask donors to contribute to campaigns for charitable or philanthropic reasons. This fundamental difference means it is difficult to model interpersonal communication and information consumption in simple linear economic terms. Yet to date the dominant research paradigm for charitable crowdfunding has done exactly that. This study seeks to investigate and model non-linear information consumption based upon a field study of Pledgie.com, an established charitable crowdfunding platform. Quantitative analyses of data from over 5,000 individual crowdfunding campaigns reveal several curvilinear relationships between the information provided and the level of funding received. This suggests that information consumption by the donor community is more accurately modelled across two distinct stages, the first of which is discussion-based, the second donation-based.

Rob Gleasure, Joseph Feller
The Linked Data AppStore
A Software-as-a-Service Platform Prototype for Data Integration on the Web

This paper introduces The Linked Data AppStore (LD-AppStore) – a Software-as-a-Service platform prototype for data integration on the Web. Building upon emerging Linked Data technologies, the LD-AppStore targets data scientists/engineers (interested in simplifying tasks such as data cleaning, transformation, entity extraction, data visualization, crawling, etc.) as well as data integration tool developers (interested in exploiting the use of their tools by data engineers). This paper provides an overview of the architecture of the LD-AppStore, the APIs of the basic data operations supported by the platform, presents a set of data integration workflows, and discusses the current status of the implementation.

Dumitru Roman, Claudia Daniela Pop, Roxana I. Roman, Bjørn Magnus Mathisen, Leendert Wienhofen, Brian Elvesæter, Arne J. Berre
Geospatial Decision Support Systems: Use of Criteria Based Spatial Layers for Decision Support in Monitoring of Operations

This paper presents a conceptual approach to geospatial analysis for the decision making process while monitoring operations. It capitalizes on the ability of a GDSS to support GIS layers. The decisions, decision making processes and the time involved are all broken down into a function of criteria over a set of information. This information is recommended to be stored in layers in a special criteria-based spatial form, which reduces the processing requirements to simple manipulation of specific independent layers. This enables incremental decisions based on an adequate set of data as and when required, akin to an anytime algorithm. The human and machine analyses are integrated in a geo-visual analytical model. A case study of the evacuation of an injured person from a mine by helicopter is presented, in which the advantages of the concept are demonstrated.

Shanmugavelan Velan
Malware Detection in Big Data Using Fast Pattern Matching: A Hadoop Based Comparison on GPU

In a big data environment, Hadoop stores data in a distributed file system, the Hadoop Distributed File System (HDFS), and processes it using a parallel approach. When cloud users store unstructured data in cloud storage, it becomes very important for cloud providers to secure those data. To provide malware security, cloud service providers should scan the whole contents of the database, which is a very time-intensive job; it may even take days to complete. The main aim of the proposed work is to reduce the processing time by introducing a Graphics Processing Unit (GPU) into the Hadoop cluster. The proposed work integrates two text pattern matching algorithms with the map-reduce programming model for faster detection of malware in big data. The results of our study indicate that the use of a GPU decreases the processing time of text pattern matching algorithms in Hadoop-based big data environments.

Chhabi Rani Panigrahi, Mayank Tiwari, Bibudhendu Pati, Rajendra Prasath
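The map-reduce structure of such a scan can be sketched in a few lines. This is an illustrative single-process simulation with made-up signature strings and data; a real deployment would run the mapper over HDFS blocks on Hadoop nodes, optionally offloading the pattern matching itself to a GPU:

```python
from collections import Counter
from functools import reduce

# Hypothetical malware signatures and file chunks (illustrative only).
SIGNATURES = ["evilpayload", "dropper.exe", "xor_loader"]

chunks = [
    "user data ... dropper.exe ... more data",
    "clean content with nothing suspicious",
    "header evilpayload trailer dropper.exe",
]

def mapper(chunk):
    """Emit per-signature match counts for one chunk (one HDFS block).
    This substring scan stands in for the paper's pattern matching step."""
    return Counter({sig: chunk.count(sig) for sig in SIGNATURES if sig in chunk})

def reducer(acc, counts):
    """Sum per-signature counts across all mapper outputs."""
    acc.update(counts)
    return acc

totals = reduce(reducer, map(mapper, chunks), Counter())
print(dict(totals))
```

The mapper is embarrassingly parallel, which is why both the Hadoop cluster and the GPU help: each chunk (or each pattern-position pair on the GPU) can be scanned independently, and only the small count dictionaries flow into the reduce phase.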
Design and Implementation of Key Distribution Algorithms of Public Key Cryptography for Group Communication in Grid Computing

Group communication involves the association of various nodes to perform tasks that depend on communication and resource sharing. In group communication, secured resource sharing should be ensured, and simple yet competent security methods are vital for secure communication. Centralized authentication is required to prevent hackers from intruding on the group. Key management algorithms such as the access control polynomial are already in use to prevent various attacks in the network, and their effectiveness has been studied. This paper proposes the design and implementation of a key distribution algorithm for secure communication in grid computing, and gives details of the implementation of algorithms using Euclidean and prime-number-based key generation. The time and security analysis of group communication using the proposed algorithms has been carried out and reported.

M. Ragunathan, P. Vijayavel
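Euclidean and prime-number-based key generation can be illustrated with the textbook building blocks. The sketch below generates a toy RSA-style key pair from two tiny primes, using the extended Euclidean algorithm for the modular inverse; it shows the general technique only, and the paper's actual algorithms are not reproduced here:

```python
# Toy RSA-style key generation from primes, with the extended Euclidean
# algorithm computing the private exponent (illustrative sketch only;
# real deployments use large random primes and vetted crypto libraries).

def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def make_keypair(p, q, e=17):
    """Build a toy RSA key pair from two primes p and q."""
    n, phi = p * q, (p - 1) * (q - 1)
    g, d, _ = extended_gcd(e, phi)   # d is the inverse of e mod phi
    assert g == 1, "e must be coprime to phi(n)"
    return (e, n), (d % phi, n)      # (public key, private key)

public, private = make_keypair(61, 53)        # tiny demo primes
message = 42
cipher = pow(message, public[0], public[1])   # encrypt with public key
plain = pow(cipher, private[0], private[1])   # decrypt with private key
print(message, cipher, plain)
```

In a group setting, a central authenticator could distribute such per-member public keys, so that session keys sent to the group are readable only by authenticated members.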
Backmatter
Metadata
Title
Mining Intelligence and Knowledge Exploration
Edited by
Rajendra Prasath
Philip O’Reilly
T. Kathirvalavakumar
Copyright year
2014
Publisher
Springer International Publishing
Electronic ISBN
978-3-319-13817-6
Print ISBN
978-3-319-13816-9
DOI
https://doi.org/10.1007/978-3-319-13817-6
