
About this book

This book constitutes the thoroughly refereed proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2017, held in Funchal, Madeira, Portugal, in November 2017.

The 19 full papers presented were carefully reviewed and selected from 157 submissions. The papers are organized in topical sections on knowledge discovery and information retrieval; knowledge engineering and ontology development; and knowledge management and information sharing.

Table of Contents


Knowledge Discovery and Information Retrieval


Transfer Learning in Sentiment Classification with Deep Neural Networks

Cross-domain sentiment classifiers aim to predict the polarity (i.e. sentiment orientation) of target text documents by reusing a knowledge model learnt from a different source domain. Distinct domains are typically heterogeneous in language, so transfer learning techniques are advisable to support knowledge transfer from source to target. Deep neural networks have recently reached the state of the art in many NLP tasks, including in-domain sentiment classification, but few of them involve transfer learning and cross-domain sentiment solutions. This paper advances the investigation started in a previous work [1], where an unsupervised deep approach for text mining, called Paragraph Vector (PV), achieved cross-domain accuracy equivalent to that of a method based on Markov Chains (MC), developed ad hoc for cross-domain sentiment classification. In this work, the Gated Recurrent Unit (GRU) is included in the investigation, showing that memory units are beneficial for cross-domain classification when enough training data are available. Moreover, the knowledge models learnt from the source domain are tuned on small samples of target instances to foster transfer learning. PV is almost unaffected by fine-tuning, because it is already able to capture word semantics without supervision. On the other hand, fine-tuning boosts the cross-domain performance of GRU: the smaller the training set used, the greater the improvement in accuracy.
Andrea Pagliarani, Gianluca Moro, Roberto Pasolini, Giacomo Domeniconi

Prediction and Trading of Dow Jones from Twitter: A Boosting Text Mining Method with Relevant Tweets Identification

Previous studies claim that financial news influences the movements of stock prices almost instantaneously; however, the poor foreseeability of news limits its usefulness for predicting stock price changes and trading actions. Recently, complex sentiment analysis techniques have also shown that large amounts of social network posts can predict the price movements of the Dow Jones Industrial Average (DJIA) within a less stringent timescale. Starting from the idea that the contents of social posts can forecast future stock trading actions, in this paper we present a text mining method simpler than the sentiment analysis approaches, which extracts the predictive knowledge of DJIA movements from a large dataset of tweets, also boosting the prediction accuracy by identifying and filtering out irrelevant/noisy tweets. The noise detection technique we introduce improves the initial effectiveness by more than 10%. We tested our method on 10 million Twitter posts spanning one year, achieving an accuracy of 88.9% in the daily Dow Jones predictions, which, to the best of our knowledge, improves on the best literature result based on social networks. Finally, we used the prediction method to drive the DJIA buy/sell actions of a trading protocol; the achieved return on investment (ROI) outperforms the state of the art.
Gianluca Moro, Roberto Pasolini, Giacomo Domeniconi, Andrea Pagliarani, Andrea Roli

Behavioural Biometric Continuous User Authentication Using Multivariate Keystroke Streams in the Spectral Domain

Continuous authentication is significant with respect to many online applications where it is desirable to monitor a user’s identity throughout an entire session, not just at its beginning. One example application domain where this is a requirement is Massive Open Online Courses (MOOCs), where users wish to obtain some kind of certification as evidence that they have successfully completed a course. Such continuous authentication can best be realised using some form of biometric checking; traditional user credential checking methods, for example username and password checking, only provide for “entry” authentication. In this paper, we introduce a novel method for the continuous authentication of computer users founded on keystroke dynamics (keyboard behaviour patterns), a form of behavioural biometric. The proposed method conceptualises keyboard dynamics in terms of a multivariate keystroke time series which in turn can be transformed into the spectral domain. The time series can then be monitored dynamically for typing patterns that are indicative of a claimed user. Two transforms are considered, the Discrete Fourier Transform and the Discrete Wavelet Transform. The proposed method is fully described and evaluated, in the context of impersonation detection, using real keystroke datasets. The reported results indicate that the proposed time series mechanism produces excellent performance, outperforming the comparator approaches by a significant margin.
Abdullah Alshehri, Frans Coenen, Danushka Bollegala
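The spectral-domain idea above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a univariate keystroke hold-time series and compares two typists by the Euclidean distance between Discrete Fourier Transform magnitude spectra (the paper additionally handles multivariate streams and the Discrete Wavelet Transform).

```python
import numpy as np

def spectral_signature(hold_times, n=64):
    """Magnitude spectrum of a keystroke hold-time series, fixed to n samples."""
    x = np.resize(np.asarray(hold_times, dtype=float), n)
    x = x - x.mean()                 # remove the DC offset so the spectrum reflects rhythm
    return np.abs(np.fft.rfft(x))    # length n//2 + 1 magnitude spectrum

def spectral_distance(a, b):
    """Euclidean distance between two spectral signatures."""
    return float(np.linalg.norm(spectral_signature(a) - spectral_signature(b)))

# Synthetic hold times (seconds): the claimed user types with one rhythm,
# an imposter with a noticeably different one.
rng = np.random.default_rng(0)
claimed  = 0.10 + 0.01 * rng.standard_normal(64)
session  = 0.10 + 0.01 * rng.standard_normal(64)   # same typist, new session
imposter = 0.18 + 0.04 * rng.standard_normal(64)

assert spectral_distance(claimed, session) < spectral_distance(claimed, imposter)
```

A real deployment would slide this window over the live keystroke stream and raise an alert whenever the distance to the claimed user's stored signature exceeds a threshold.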

Constructing Language Models from Online Forms to Aid Better Document Representation for More Effective Clustering

Clustering is the practice of finding tacit patterns in datasets by grouping a corpus by similarity. When clustering documents, this is achieved by converting the corpus into a numeric format and applying clustering techniques to this new format. Values are assigned to terms based on their frequency within a particular document, weighed against their general occurrence in the corpus. One obstacle to achieving this aim arises from the polysemic nature of terms: words have multiple meanings, and each intended meaning is only discernible by examining the context in which it is used. Thus, disambiguating the intended meaning of a term can greatly improve the efficacy of a clustering algorithm. One approach to this end uses an ontology, WordNet, which can act as a look-up for the intended meaning of a term. WordNet, however, is a static source and does not keep pace with the changing nature of language. The aim of this paper is to show that while WordNet can be effective, its static nature means that it does not capture some contemporary usage of terms, particularly when the dataset is taken from online conversation forums, which are not structured in a standard document format. Our proposed solution involves using Reddit as a contemporary source which moves with new trends in word usage. To better illustrate this point, we cluster comments found in online threads such as Reddit and compare the efficacy of different representations of these document sets.
Stephen Bradshaw, Colm O’Riordan, Daragh Bradshaw

Identification of Relevant Hashtags for Planned Events Using Learning to Rank

Numerous planned events (e.g. concerts, sports matches, festivals) take place across the world every day. In various applications, such as event recommendation and event reporting, it can be useful to find user discussions related to such events on social media. Identification of event-related hashtags can serve this purpose. In this paper, we focus on identifying the top hashtags related to a given event. We define a set of features for (event, hashtag) pairs and discuss ways to obtain these feature scores. A linear aggregation of these scores is used to output a ranked list of top hashtags for the event. The aggregation weights of the features are obtained using a learning-to-rank algorithm. We establish the superiority of our method by performing detailed experiments on a large dataset containing multiple categories of events and related tweets.
Sreekanth Madisetty, Maunendra Sankar Desarkar
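As a toy illustration of the linear aggregation step described above, the sketch below ranks candidate hashtags by a weighted sum of per-pair feature scores. The feature names, scores and weights are invented for the example; in the paper the weights are learnt by a learning-to-rank algorithm.

```python
def rank_hashtags(feature_scores, weights):
    """Rank hashtags by a weighted linear combination of their feature scores.

    feature_scores: {hashtag: [score_1, ..., score_k]} for one event
    weights:        [w_1, ..., w_k], e.g. learnt by a learning-to-rank algorithm
    """
    aggregate = {
        tag: sum(w * s for w, s in zip(weights, scores))
        for tag, scores in feature_scores.items()
    }
    return sorted(aggregate, key=aggregate.get, reverse=True)

# Hypothetical features: [similarity to event title, tweet frequency, recency]
scores = {
    "#uefachampions": [0.9, 0.8, 0.7],
    "#football":      [0.4, 0.9, 0.6],
    "#random":        [0.1, 0.2, 0.1],
}
weights = [0.5, 0.3, 0.2]
print(rank_hashtags(scores, weights))  # most relevant hashtag first
```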

Investigation of Passage Based Ranking Models to Improve Document Retrieval

Passage retrieval deals with identifying and retrieving small but explanatory portions of a document that answer a user’s query. In this paper, we focus on improving document ranking by using different passage-based evidence. Several similarity measures were evaluated, and a more in-depth analysis was undertaken into the effect of varying specific parameters. We have also explored the notion of query difficulty to understand whether or not the best-performing passage-based approach helps the performance of certain queries. Experimental results indicate that, for the passage-level technique, the worst-performing queries are damaged slightly and those that perform well are boosted for the WebAp collection. However, our rank-based similarity function boosted the performance of the difficult queries in the Ohsumed collection.
Ghulam Sarwar, Colm O’Riordan, John Newell

Robust Single-Document Summarizations and a Semantic Measurement of Quality

The goal of this paper is to generate an effective summary for a given document under specific real-time requirements. We use the softplus function to enhance keyword rankings to favor important sentences, and on this basis we present a number of extractive summarization algorithms using various keyword extraction and topic clustering methods. We show that our algorithms not only meet the real-time requirements but also yield the best ROUGE scores on DUC-02 over all previously known algorithms. We also evaluate our summarization methods over the SummBank dataset and other datasets to ensure that our methods are robust. Experiments show that summaries generated by our methods achieve ROUGE scores higher than, or about the same as, those of extractive summaries generated by human evaluators. Moreover, we define a semantic measure based on word embeddings, using Word Mover’s Distance, to evaluate the quality of summaries without human-generated benchmarks. We show that for our algorithms the orderings of the ROUGE scores and the scores under the new measure are highly comparable, suggesting that this new measure may serve as a viable alternative for measuring the quality of a summary.
Liqun Shao, Hao Zhang, Jie Wang
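The role of softplus in re-weighting keywords can be illustrated with a small sketch. The sentence-scoring formula and the keyword weights here are hypothetical, not the authors' method; the sketch only shows why softplus is attractive for this purpose: it keeps every weight strictly positive while smoothly amplifying highly ranked keywords.

```python
import math

def softplus(x):
    """Smooth, strictly positive approximation of max(0, x)."""
    return math.log1p(math.exp(x))

def sentence_score(sentence, keyword_weight):
    """Hypothetical score: sum of softplus-enhanced weights of matched keywords."""
    return sum(softplus(keyword_weight[w])
               for w in sentence.split() if w in keyword_weight)

weights = {"summary": 2.0, "rouge": 1.5, "the": -3.0}   # invented keyword weights
s1 = "the rouge summary"   # contains two strong keywords
s2 = "the the the"         # only weak stop-word matches
print(sentence_score(s1, weights) > sentence_score(s2, weights))
```

Because softplus(-3.0) is small but non-zero, weak terms still contribute a little, while the strong keywords dominate the ranking of sentences.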

Adaptive Cluster Based Discovery of High Utility Itemsets

Utility Itemset Mining (UIM) is a key analysis technique for data modeled by the transactional data model. While improving the computational time and space efficiency of itemset mining is important, it is also critically important to predict future itemsets accurately. Today, when both scientific and business competitive edge is commonly derived from first access to knowledge via advanced predictive ability, this problem becomes increasingly relevant. We established in our most recent work that having prior knowledge of the approximate cluster structure of the dataset, and using it implicitly in the mining process, can lend itself to accurate prediction of future itemsets. We evaluate the individual strength of each transaction with a focus on itemset prediction, and reshape the transaction utilities accordingly. We extend our work by identifying that such reshaping of transaction utilities should be adaptive to the anticipated cluster structure if there is a specific intended prediction window. We define novel concepts for making such an anticipation and integrate time series forecasting into the evaluation. We perform additional illustrative experiments to demonstrate the application of our improved technique and also discuss future directions for this work.
Piyush Lakhawat, Arun Somani

Knowledge Based System for Composing Sentences to Summarize Documents

This chapter provides the details of how to build a knowledge-based system that is capable of composing new sentences to summarize multiple documents. The system is also capable of identifying the main topics of the given documents and is able to derive new concepts from the given text data. In order to process the documents conceptually to create abstractive summaries, the system makes use of the Cyc development platform, which consists of the world’s largest knowledge base and one of the most powerful inference engines. The resultant knowledge-based system first uses natural language processing techniques to extract the syntactic structure of the documents and then maps the words of the sentences onto related concepts in the knowledge base. It then uses the inference engine to generalize and fuse concepts to form more abstract concepts. Since a word can be mapped onto multiple concepts, the system also includes new techniques to handle word-sense disambiguation by using concept weights. After the generalization, the system is able to identify the main topics and the key concepts of the documents. The system then composes new sentences based on the key concepts by linking subject concepts with their related predicate concepts. The syntactic structure of the newly created sentences extends beyond simple subject-predicate-object triples by incorporating adjective and adverb modifiers. The final stage is to map the linked concepts back to words to form the abstractive sentences. The system has been implemented and tested. The implementation encodes a process that consists of seven stages: syntactic analysis, word mapping, concept propagation, concept weight and relation accumulation, topic derivation, subject identification, and new sentence generation. The implementation has been tested on various documents and webpages.
The test results showed that the system is capable of creating new sentences that include abstracted concepts not explicitly mentioned in the original documents and that contain information synthesized from different parts of the documents to compose a summary.
Andrey Timofeyev, Ben Choi

A Modified Version of AlQuAnS: An Arabic Language Question Answering System

The challenges of the Arabic language and the lack of resources have made it difficult to provide Arabic Question Answering (QA) systems with high accuracy. These challenges motivated us to propose AlQuAnS, an Arabic Language Question Answering System that gives promising accuracy results. This paper proposes a modified version of AlQuAnS with higher accuracy. The proposed system enhances the accuracy of the question classification, semantic interpreter, and answer extraction modules. The provided performance evaluation study shows that our modified system outperforms other existing Arabic QA systems, especially with the newly introduced answer extraction module.
Ahmed Abdelmegied, Yasmin Ayman, Ahmad Eid, Nagwa El-Makky, Ahmed Fathy, Ghada Khairy, Khaled Nagi, Mohamed Nabil, Mohammed Yousri

Knowledge Engineering and Ontology Development


Ontology in Holonic Cooperative Manufacturing: A Solution to Share and Exchange the Knowledge

Cooperative manufacturing is a new trend in industry which depends on the existence of a collaborative robot. A collaborative robot is usually a lightweight robot that is capable of operating safely alongside a human co-worker in a shared work environment. During this cooperation, a vast amount of information is exchanged between the collaborative robot and the worker. This information constitutes the cooperative manufacturing knowledge, which describes the production components and environment. In this research, we propose a holonic control solution that uses the ontology concept to represent the cooperative manufacturing knowledge. The holonic control solution is implemented as an autonomous multi-agent system that exchanges the manufacturing knowledge based on an ontology model. Ultimately, the research illustrates and implements the proposed solution over a cooperative assembly scenario involving two workers and one collaborative robot, who cooperate to assemble a customized product.
Ahmed R. Sadik, Bodo Urban

Integrating Local and Global Data View for Bilingual Sense Correspondences

This paper presents a method of linking and creating bilingual sense correspondences between English and Japanese noun word dictionaries. We used local and global data views to identify bilingual sense correspondences. Locally, we extracted bilingual noun words by using simple sentence-based similarity. Globally, for each monolingual dictionary, we estimated domain-specific senses by using a textual corpus with category information. The extraction method is based on sense similarities obtained by word embedding learning. We integrated these data views: more precisely, we assigned a sense to each noun word of the extracted bilingual words while keeping domain (category) consistency. We used the WordNet 3.0 and EDR Japanese dictionaries, with the Reuters and Mainichi Japanese newspaper corpora, to evaluate our method. The results showed that the integration of local and global data views improved overall performance, and we obtained 318 correct correspondences within the topmost 1,000 bilingual noun senses. Moreover, we found that the extracted bilingual noun senses can be used as a lexical resource for machine translation, as the translation results obtained by using our method were better than those obtained by a bilingual dictionary and slightly better than the results obtained by SYSTRANet.
Fumiyo Fukumoto, Yoshimi Suzuki, Attaporn Wangpoonsarp, Meng Ji
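Sense similarity from word embeddings, as used above, typically comes down to cosine similarity between embedding vectors. A minimal sketch with invented 3-dimensional toy vectors (real embeddings would have hundreds of dimensions and come from a trained model):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy embeddings: the financial sense of "bank" should sit
# closer to "money" than to "river".
bank_finance = [0.9, 0.1, 0.2]
money        = [0.8, 0.2, 0.1]
river        = [0.1, 0.9, 0.7]
assert cosine(bank_finance, money) > cosine(bank_finance, river)
```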

Associative Representation and Processing of Databases Using DASNG and AVB+trees for Efficient Data Access

Today, we have to cope with a great amount of data: big data problems. The main issues concerning big data are sparse representation, the time efficiency of data access and processing, and data mining and knowledge discovery. When dealing with large amounts of data, time is crucial. Most of the data processing time in contemporary computer science is lost in the various search operations needed to access appropriate data. This paper presents how data collected in relational databases can be transformed into associative neuronal graph structures, and how search operations can be accelerated thanks to the aggregation and association of the stored data. To achieve extraordinary efficiency in data access, this paper introduces new AVB+trees which, together with Deep Associative Semantic Neuronal Graphs, typically allow for constant-time access to the stored data. The presented solution allows representing horizontal and vertical relations between data and stored objects, expanding the possibilities of relational databases and replacing various search operations by a specific graph structure. Another contribution is the expansion of the aggregation of duplicates to all data tables which contain the same attributes. In this way, the presented associative structures simplify and speed up all search operations in comparison to classic solutions.
Adrian Horzyk
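The duplicate-aggregation idea can be illustrated, very loosely and not with actual AVB+trees (whose structure the paper itself defines), by a hash index in which each distinct attribute value is stored once and linked to every record that uses it, so a value lookup replaces a table scan:

```python
from collections import defaultdict

# Hypothetical table: record id -> attribute value, with duplicated values.
records = {1: "red", 2: "blue", 3: "red", 4: "green", 5: "red"}

# Aggregate duplicates: each distinct value is stored once and holds
# links back to all records that use it.
index = defaultdict(list)
for rid, value in records.items():
    index[value].append(rid)

# Average constant-time (hashed) access by value replaces a linear scan.
print(index["red"])   # all records sharing the value "red"
```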

GeCoLan: A Constraint Language for Reasoning About Ecological Networks in the Semantic Web

Ecological Networks (ENs) describe the structure of existing real ecosystems and help in planning their expansion, conservation and improvement. While various mathematical models of ENs have been defined, to our knowledge they focus on simulating ecosystems, and none of them deals with verifying whether transformation proposals, such as those collected in participatory decision-making processes for public policy making, are consistent with land usage restrictions.
As an attempt to fill this gap, we developed a model to represent the specifications for the local planning of ENs in a way that can support both the detection of constraint violations within new expansion proposals and reasoning about improvements to the networks. In line with the Geospatial Semantic Web, our model is based on an OWL ontology for the representation of ENs. Moreover, we define a language, GeCoLan, supporting constraint-based reasoning on semantic data. Even though this paper focuses on EN validation, our language can be employed to enable more complex tasks, such as the generation of proposals for improving ENs.
The present paper describes our ontological specification of ENs, the GeCoLan language for reasoning about specifications, and the tools we developed to support data acquisition and constraint verification on ENs.
Gianluca Torta, Liliana Ardissono, Marco Corona, Luigi La Riccia, Adriano Savoca, Angioletta Voghera

The Linked Data Wiki: Leveraging Organizational Knowledge Bases with Linked Open Data

Building meaningful knowledge bases for organizations such as enterprises, NGOs or civil services is still labor-intensive and therefore expensive work, although semantic wiki approaches are already adopted in organizational contexts and corporate environments. One reason is that exploiting knowledge from external sources, such as other organizational knowledge bases or Linked Open Data, as well as sharing knowledge in a meaningful way, is difficult due to the lack of a common, shared schema definition. Therefore, redundant work has to be carried out for each new context. To overcome this issue, we introduce the Linked Data Wiki, an approach that combines the power of Linked Open Vocabularies and Linked Open Data with established organizational semantic wiki systems for knowledge management, in order to leverage the knowledge represented in organizational knowledge bases with Linked Open Data. Our approach includes a recommendation system to link concepts of an organizational context to openly published concepts and to extract statements from those concepts that enrich the concept definition within the organizational context. The inclusion of potentially uncertain, incomplete, inconsistent or redundant public statements within an organization’s knowledge base poses the challenge of interpreting such data correctly within the respective context.
Matthias T. Frank, Stefan Zander

Social and Community Related Themes in Ontology Evaluation: Findings from an Interview Study

A deep exploration of what the term “quality” implies in the field of ontology selection and reuse takes us much further than what the literature has mostly focused on, that is, the internal characteristics of ontologies. A qualitative study with interviews of ontologists and knowledge engineers in different domains, ranging from the biomedical field to the manufacturing industry, reveals novel social and community-related themes that have long been neglected. These themes include the responsiveness of the developer team or organization, knowing and trusting the developer team, regular updates and maintenance, and many others. This paper explores such connections, arguing that the community and social aspects of ontologies are generally linked to their quality. We believe that this work represents a significant contribution to the field of ontology evaluation, and we hope that the research community can further draw on these initial findings in developing relevant social quality metrics for ontology evaluation and selection.
Marzieh Talebpour, Martin Sykora, Tom Jackson

Knowledge Management and Information Sharing


Empowering IT Organizations Through a Confluence of Knowledge for Value Integration into the IT Services Firm’s Business Model

Challenges in operationalizing business innovation based on information technology (i.e. advancing new technology from the lab to business operations) affect the ability of IT organizations to implement and effectively exploit these technologies. In IT services firms, these challenges are often linked to conflicting priorities, integration issues, inadequate infrastructure capabilities, and the availability of the required knowledge and skills. Sometimes insurmountable, these challenges leave the firm unable to incorporate emerging information technologies into its business model. At the intersection of the knowledge-based theory of the firm and the theory of dynamic capabilities, the study draws insight from two cases in IT services companies. We seek to understand the mechanisms required to manage the flow of knowledge assets for successful integration of innovation, while assimilating the tacit knowledge of the customer as a major component in the value integration. The study has far-reaching implications for practice and opens interesting opportunities for further research.
Nabil Georges Badr

How Do Japanese SMEs Generate Digital Business Value from SMACIT Technologies with Knowledge Creation?

This study provides further evidence from Japanese Small and Medium Enterprises (SMEs) on the capabilities organizations need in order to take advantage of the opportunities that digital technologies (such as Social, Mobile, Analytics, Cloud and IoT, or SMACIT) offer. Quantitative data is used to validate and expand previous findings on the relationship between IT, digital business value and Knowledge Creation Capabilities (KCC). KCC is explored as an organizational capability that moderates the value obtained from IT and digital technologies. The level of achievement of business objectives through IT and digital technologies was analyzed using four categories of business objectives from the Balanced Scorecard. The evidence shows that organizations that are able to efficiently apply IT to achieve business objectives can also experience similar results with digital technologies. This implies that a foundation for success with digital technologies is the successful delivery of IT. A deeper analysis was conducted on the knowledge creation process, because a preliminary study had yielded inconclusive findings suggesting that KCC had a negative impact on business objectives. This is in opposition to what the knowledge-based view would suggest, and contrary to the observation that SMACIT technologies are highly dependent on information and could be considered to go hand in hand with how new information and knowledge are combined to create new products and services.
Christian Riera, Junichi Iijima

Common Information Systems Maturity Validation Resilience Readiness Levels (ResRLs)

This revised study expands a series of investigations of the resilience readiness levels (ResRLs) of information systems, including their aspects, factors, definitions, criteria, references, and questionnaires. The aim is to contribute to the combined total maturity measures approach and the pre-operational validation of shared and adaptive information services and systems. The overall research question is: how can ResRLs be understood in the domain of shared operative information systems and services? The purpose of the study is to improve information systems acceptance, operational validation, pre-order validation, risk assessment, the development of adaptive mechanisms, and the integration of information systems and services by actors and authorities across national borders. The main contribution of the study is in the validation of the maturity of operative information systems regarding their resilience, including the examination of several factors and descriptions of technical resilience. In addition to the validation of maturity, the study expands the revised compatibility of maturity levels by upgrading the ResRLs seven-level model to a nine-level model, in line with technology readiness levels (TRLs) and integration readiness levels (IRLs), to improve responsiveness to the European Operational Concept Validation framework.
Rauno Pirinen
