These transactions publish research in computer-based methods of computational collective intelligence (CCI) and their applications in a wide range of fields such as the semantic Web, social networks, and multi-agent systems. TCCI strives to cover new methodological, theoretical and practical aspects of CCI understood as the form of intelligence that emerges from the collaboration and competition of many individuals (artificial and/or natural). The application of multiple computational intelligence technologies, such as fuzzy systems, evolutionary computation, neural systems, consensus theory, etc., aims to support human and other collective intelligence and to create new forms of CCI in natural and/or artificial systems. This twenty-sixth issue is a special issue with selected papers from the First International KEYSTONE Conference 2015 (IKC 2015), part of the KEYSTONE COST Action IC1302.

### Professional Collaborative Information Seeking: Towards Traceable Search and Creative Sensemaking

Abstract
The development of systems to support collaborative information seeking is a challenging issue for many reasons. Besides the expected support of an individual user in tasks such as keyword-based query formulation, relevance judgement, result set organization and summarization, the smooth exchange of search-related information within a team of users seeking information has to be supported. This imposes strong requirements on visualization and interaction to enable users to easily trace and interpret the search activities of other team members and to jointly make sense of gathered information in order to satisfy an initial information need. In this paper, we briefly motivate specific requirements with a focus on collaborative professional search, review existing work and propose an adapted model for professional collaborative information seeking. In addition, we discuss the results of a use case study and point out major challenges in professional collaborative search. Finally, we briefly introduce a system that has been specifically developed to support collaborative technology search.
Dominic Stange, Michael Kotzyba, Andreas Nürnberger

### Exploiting Linguistic Analysis on URLs for Recommending Web Pages: A Comparative Study

Abstract
Nowadays, citizens require high-quality information from public institutions in order to guarantee their transparency. Institutional websites of governmental and public bodies must publish and keep updated a large amount of information stored in thousands of web pages in order to satisfy the demands of their users. Due to the amount of information, the “search form” that is typically available on most such websites has proven limited in supporting users, since it requires them to explicitly express their information needs through keywords. The sites are also affected by the so-called “long tail” phenomenon, which is typically observed in e-commerce portals: not all pages are considered highly important and, as a consequence, users searching for information located in pages that are not considered important have a hard time locating these pages.
The development of a recommender system that can guess the next best page that a user would like to see on the website has gained a lot of attention. Complex models and approaches have been proposed for recommending web pages to individual users. These approaches typically require personal preferences and other kinds of user information in order to make successful predictions.
In this paper, we analyze and compare three different approaches to leverage information embedded in the structure of web sites and the logs of their web servers to improve the effectiveness of web page recommendation. Our proposals exploit the context of the users’ navigations, i.e., their current sessions when surfing a specific web site. These approaches require neither information about the personal preferences of the users to be stored and processed, nor complex structures to be created and maintained. They can be easily incorporated into current large websites to facilitate the users’ navigation experience. Last but not least, the paper reports some comparative experiments using a real-world website to analyze the performance of the proposed approaches.
Sara Cadegnani, Francesco Guerra, Sergio Ilarri, María del Carmen Rodríguez-Hernández, Raquel Trillo-Lado, Yannis Velegrakis, Raquel Amaro

### Large Scale Knowledge Matching with Balanced Efficiency-Effectiveness Using LSH Forest

Abstract
Evolving Knowledge Ecosystems were proposed to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner working of the knowledge ecosystem can be spotted from natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, which, in turn, originate from the streams of information flowing into the ecosystem. In this article we investigate the use of LSH Forest (a self-tuning indexing schema based on locality-sensitive hashing) for solving the problem of placing new knowledge tokens in the right contexts of the environment. We argue and show experimentally that LSH Forest possesses the required properties and could be used for large distributed set-ups. Further, we show experimentally that for our type of data minhashing works better than random hyperplane hashing. This paper is an extension of the paper “Balanced Large Scale Knowledge Matching Using LSH Forest” presented at the International Keystone Conference 2015.
Michael Cochez, Vagan Terziyan, Vadim Ermolayev
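
As a rough illustration of the minhashing mentioned in the abstract above, the following minimal sketch estimates the Jaccard similarity of two token sets from short signatures. It is not the authors' implementation and it does not build an LSH Forest index; the function names, the word-shingle representation, and the use of salted BLAKE2b as a family of hash functions are all assumptions made for this example.

```python
import hashlib


def shingles(text, k=2):
    """Return the set of word k-grams in text (hypothetical token representation)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items, num_hashes=128):
    """Build a minhash signature: for each seeded hash function, keep the minimum hash value.

    items must be a non-empty set of strings. Salted BLAKE2b stands in for a
    family of independent hash functions (an assumption of this sketch).
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of signature positions that agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    token_a = shingles("knowledge tokens are placed in the right contexts")
    token_b = shingles("knowledge tokens are placed in suitable contexts")
    print(estimate_jaccard(minhash_signature(token_a), minhash_signature(token_b)))
```

An index such as LSH Forest would store these signatures so that candidate contexts can be retrieved without comparing a new token against every existing one; that indexing step is deliberately left out here.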

### Keyword-Based Search of Workflow Fragments and Their Composition

Abstract
Workflow specification, in science as in business, can be a difficult task, since it requires a deep knowledge of the domain to be able to model the chaining of the steps that compose the process of interest, as well as awareness of the computational tools, e.g., services, that can be utilized to enact such steps. To assist designers in this task, we investigate in this paper a methodology that consists in exploiting existing workflow specifications that are stored and shared in repositories, to identify workflow fragments that can be re-utilized and re-purposed by designers when specifying new workflows. Specifically, we present a method for identifying fragments that are frequently used across workflows in existing repositories, and therefore are likely to incarnate patterns that can be reused in new workflows. We present a keyword-based search method for identifying the fragments that are relevant for the needs of a given workflow designer. We go on to present an algorithm for composing the retrieved fragments with the initial (incomplete) workflow that the user designed, based on compatibility rules that we identified, and showcase how the algorithm operates using an example from eScience.
Khalid Belhajjame, Daniela Grigori, Mariem Harmassi, Manel Ben Yahia

### Scientific Footprints in Digital Libraries

Abstract
In recent years, members of the academic community have increasingly turned to digital libraries to follow the latest work within their own field and to estimate papers’, journals’ and researchers’ impact. Yet, despite the powerful indexing and searching tools available, identifying the most important works and authors in a field remains a challenging task, for which a wealth of prior information is needed; existing systems fail to identify and incorporate in their results information regarding connections between publications of different disciplines. In this paper we analyze citation lists in order to not only quantify but also understand impact, by tracing the “footprints” that authors have left, i.e. the specific areas in which they have made an impact. We use the publication medium (specific journal or conference) to identify the thematic scope of each paper and feed from existing digital libraries that index scientific activity, namely Google Scholar and DBLP. This allows us to design and develop a system, the Footprint Analyzer, that can be used to successfully identify the most prominent works and authors for each scientific field, regardless of whether their own research is limited to or even focused on the specific field. Various real life examples demonstrate the proposed concepts and actual results from the developed system’s operation prove the applicability and validity.
Claudia Ifrim, Xenia Koulouri, Manolis Wallace, Florin Pop, Mariana Mocanu, Valentin Cristea

### Mining and Using Key-Words and Key-Phrases to Identify the Era of an Anonymous Text

Abstract
This study attempts to determine the time-frame in which the author of a given document lived. The documents are rabbinic documents written in Hebrew-Aramaic languages. The documents are undated and do not contain a bibliographic section, which leaves us with an interesting challenge. To address it, we define a set of key-phrases and formulate various types of rules: “Iron-clad”, Heuristic and Greedy, to define the time-frame. These rules are based on key-phrases and key-words in the documents of the authors. Identifying the time-frame of an author can help us determine the generation in which specific documents were written, can help in the examination of documents, i.e., to conclude if documents were edited, and can also help us identify an anonymous author. We tested these rules on two corpora containing responsa documents. The results are promising and are better for the larger corpus than for the smaller corpus.
Dror Mughaz, Yaakov HaCohen-Kerner, Dov Gabbay

### Toward Optimized Multimodal Concept Indexing

Abstract
Information retrieval on the (social) web moves from a pure term-frequency-based approach to an enhanced method that includes conceptual multimodal features on a semantic level. In this paper, we present an approach for semantic-based keyword search and focus especially on its optimization to scale it to real-world sized collections in the social media domain. Furthermore, we present a faceted indexing framework and architecture that relates content to semantic concepts to be indexed and searched semantically. We study the use of textual concepts in a social media domain and observe a significant improvement from using a concept-based solution for keyword searching. We address the problem of time-complexity that is a critical issue for concept-based methods by focusing on optimization to enable larger and more real-world style applications.
Navid Rekabsaz, Ralf Bierig, Mihai Lupu, Allan Hanbury

### Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

Abstract
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named entity recognition. Documents in this geological database are described by a summary report, and other data, such as title, domain, keywords, abstract, and geographical location. These metadata were used for generating a bag of words for each document with the aid of morphological dictionaries and transducers. Named entities within metadata were also recognized with the help of a rule-based system. Both the bag of words and the metadata were then used for pre-indexing each document. A combination of several tf-idf based measures was applied for selecting and ranking retrieval results of indexed documents for a specific query and the results were compared with the initial retrieval system that was already in place. In general, a significant improvement has been achieved according to the standard information retrieval performance measures, where the InQuery method performed the best.
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović
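
To make the tf-idf weighting mentioned in the abstract above concrete, here is a minimal sketch of indexing bags of words and ranking documents for a query. It uses the plain textbook tf-idf formula, not the combination of measures or the InQuery method evaluated in the paper; the function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_index(docs):
    """Map each doc_id to {term: tf-idf weight}.

    docs is a dict of doc_id -> list of tokens (a bag of words).
    tf is the relative term frequency; idf is log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    index = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        index[doc_id] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return index


def rank(query_tokens, index):
    """Return doc_ids ordered by the summed tf-idf weight of the query terms."""
    scores = {
        doc_id: sum(weights.get(term, 0.0) for term in query_tokens)
        for doc_id, weights in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy bags of words, invented for illustration only.
    docs = {
        "a": ["report", "copper", "ore", "deposit"],
        "b": ["report", "groundwater", "survey"],
        "c": ["report", "copper", "survey", "map"],
    }
    print(rank(["groundwater", "survey"], tf_idf_index(docs)))
```

A term that occurs in every document, such as "report" in the toy data, receives a weight of zero and therefore does not influence the ranking.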

### Domain-Specific Modeling: A Food and Drink Gazetteer

Abstract
Our goal is to build a Food and Drink (FD) gazetteer that can serve for classification of general, FD-related concepts, efficient faceted search or automated semantic enrichment. Fully supervised design of domain-specific models ex novo is not scalable. Integration of several ready knowledge bases is tedious and does not ensure coverage. Completely data-driven approaches require a large amount of training data, which is not always available. For general domains (such as the FD domain), re-using encyclopedic knowledge bases like Wikipedia may be a good idea. We propose here a semi-supervised approach that uses a restricted Wikipedia as a base for the modeling, achieved by selecting a domain-relevant Wikipedia category as root for the model and all its subcategories, combined with expert and data-driven pruning of irrelevant categories.
Andrey Tagarev, Laura Toloşi, Vladimir Alexiev

### What’s New? Analysing Language-Specific Wikipedia Entity Contexts to Support Entity-Centric News Retrieval

Abstract
Representation of influential entities, such as celebrities and multinational corporations on the web can vary across languages, reflecting language-specific entity aspects, as well as divergent views on these entities in different communities. An important source of multilingual background knowledge about influential entities is Wikipedia—an online community-created encyclopaedia—containing more than 280 language editions. Such language-specific information could be applied in entity-centric information retrieval applications, in which users utilise very simple queries, mostly just the entity names, for the relevant documents. In this article we focus on the problem of creating language-specific entity contexts to support entity-centric, language-specific information retrieval applications. First, we discuss alternative ways such contexts can be built, including Graph-based and Article-based approaches. Second, we analyse the similarities and the differences in these contexts in a case study including 219 entities and five Wikipedia language editions. Third, we propose a context-based entity-centric information retrieval model that maps documents to aspect space, and apply language-specific entity contexts to perform query expansion. Last, we perform a case study to demonstrate the impact of this model in a news retrieval application. Our study illustrates that the proposed model can effectively improve the recall of entity-centric information retrieval while keeping high precision, and provide language-specific results.
Yiwei Zhou, Elena Demidova, Alexandra I. Cristea