About this book

This book constitutes the proceedings of the 19th International Conference on Theory and Practice of Digital Libraries, TPDL 2015, held in Poznań, Poland, in September 2015.

The 22 full papers and 14 poster and demo papers presented in this volume were carefully reviewed and selected from 61 submissions. They were organized in topical sections named: interoperability and information integration; multimedia information management and retrieval and digital curation; personal information management and personal digital libraries; exploring semantic web and linked data; user studies for and evaluation of digital library systems and applications; applications of digital libraries; digital humanities; and social-technical perspectives of digital information.



Interoperability and Information Integration


Web Archive Profiling Through CDX Summarization

With the proliferation of public web archives, it is becoming more important to better profile their contents, both to understand their immense holdings and to support routing of requests in the Memento aggregator. To save time, the Memento aggregator should only poll the archives that are likely to have a copy of the requested URI. Using the CDX files produced after crawling, we can generate profiles of the archives that summarize their holdings and can be used to inform routing of the Memento aggregator’s URI requests. Previous work in profiling ranged from using full URIs (no false positives, but with large profiles) to using only top-level domains (TLDs) (smaller profiles, but with many false positives). This work explores strategies in between these two extremes. In our experiments, we gained up to 22% routing precision with less than 5% relative cost as compared to the complete knowledge profile without any false negatives. With respect to the TLD-only profile, the registered domain profile doubled the routing precision, while complete hostname and one path segment gave a fivefold increase in routing precision.
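The range of profile granularities described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names are hypothetical, and the registered domain is approximated as the last two hostname labels, which is an assumption that fails for suffixes such as co.uk (a real profiler would consult the Public Suffix List).

```python
from urllib.parse import urlsplit


def profile_keys(uri: str, path_segments: int = 1) -> dict:
    """Return candidate profile keys for one URI at increasing levels of detail."""
    # urlsplit only recognizes the host when a scheme is present.
    parts = urlsplit(uri if "://" in uri else "http://" + uri)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    # Assumption: registered domain = last two labels (wrong for e.g. co.uk).
    registered = ".".join(labels[-2:])
    segments = [s for s in parts.path.split("/") if s][:path_segments]
    return {
        "tld": labels[-1],
        "registered_domain": registered,
        "hostname": host,
        "host_path": "/".join([host] + segments),
    }


def build_profile(uris, level: str) -> set:
    """Summarize an archive's holdings as the set of keys at one level."""
    return {profile_keys(u)[level] for u in uris}


def should_poll(profile: set, uri: str, level: str) -> bool:
    """Poll the archive only if the requested URI's key is in its profile."""
    return profile_keys(uri)[level] in profile
```

Because every held URI contributes its own key, a lookup at the same level cannot miss a URI the archive holds (no false negatives); coarser levels give smaller profiles at the price of more false positives.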

Sawood Alam, Michael L. Nelson, Herbert Van de Sompel, Lyudmila L. Balakireva, Harihar Shankar, David S. H. Rosenthal

Quantifying Orphaned Annotations in

Web annotation has been receiving increased attention recently with the organization of the Open Annotation Collaboration and new tools for open annotation, such as In this paper, we investigate the prevalence of orphaned annotations, where a live Web page no longer contains the text that had previously been annotated in the annotation system (containing 6281 highlighted text annotations). We found that about 27% of highlighted text annotations can no longer be attached to their live Web pages. Unfortunately, only about 3.5% of these orphaned annotations can be reattached using the holdings of current public web archives. For those annotations that are still attached, 61% are in danger of becoming orphans if the live Web page changes. This points to the need for archiving the target of annotations at the time the annotation is created.

Mohamed Aturban, Michael L. Nelson, Michele C. Weigle

Query Expansion for Survey Question Retrieval in the Social Sciences

In recent years, the importance of research data and the need to archive and to share it in the scientific community have increased enormously. This introduces a whole new set of challenges for digital libraries. In the social sciences, typical research data sets consist of surveys and questionnaires. In this paper we focus on the use case of social science survey question reuse and on mechanisms to support users in the query formulation for data sets. We describe and evaluate thesaurus- and co-occurrence-based approaches for query expansion to improve retrieval quality in digital libraries and research data archives. The challenge here is to translate the information need and the underlying sociological phenomena into proper queries. As we can show, retrieval quality can be improved by adding related terms to the queries. In a direct comparison, automatically expanded queries using extracted co-occurring terms can provide better results than queries manually reformulated by a domain expert and better results than a keyword-based BM25 baseline.
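A minimal sketch of co-occurrence-based query expansion, to make the idea concrete. The abstract does not say which association measure the authors use, so raw document-level co-occurrence counts are assumed here, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_counts(documents):
    """Count, for each term, how many documents it shares with every other term."""
    counts = defaultdict(Counter)
    for doc in documents:
        for a, b in permutations(set(doc), 2):
            counts[a][b] += 1
    return counts


def expand_query(query_terms, counts, k=2):
    """Append up to k of the most frequently co-occurring terms per query term."""
    expanded = list(query_terms)
    for term in query_terms:
        # Sort by descending count; ties are broken alphabetically.
        ranked = sorted(counts.get(term, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        for candidate, _ in ranked[:k]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

The expanded term list would then be passed to the retrieval function (for example BM25) in place of the original query.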

Nadine Dulisch, Andreas Oskar Kempf, Philipp Schaer

Multimedia Information Management and Retrieval and Digital Curation


Practice-Oriented Evaluation of Unsupervised Labeling of Audiovisual Content in an Archive Production Environment

In this paper we report on an evaluation of unsupervised labeling of audiovisual content using collateral text data sources to investigate how such an approach can provide acceptable results given requirements with respect to archival quality, authority and service levels to external users. We conclude that with parameter settings that are optimized using a rigorous evaluation of precision and accuracy, the quality of automatic term suggestion is sufficiently high. Having implemented the procedure in our production workflow allows us to gradually develop the system further and also assess the effect of the transformation from manual to automatic from an end-user perspective. Additional future work will be on deploying different information sources including annotations based on multimodal video analysis such as speaker recognition and computer vision.

Victor de Boer, Roeland J. F. Ordelman, Josefien Schuurman

Measuring Quality in Metadata Repositories

The need for good quality metadata records becomes a necessity given the large quantities of digital content that is available through digital repositories and the increasing number of web services that use this content. The context in which metadata are generated and used affects the problem in question and therefore a flexible metadata quality evaluation model that can be easily and widely used has yet to be presented. This paper proposes a robust multidimensional metadata quality evaluation model that measures metadata quality based on five metrics and by taking into account contextual parameters concerning metadata generation and use. An implementation of this metadata quality evaluation model is presented and tested against a large number of real metadata records from the humanities domain and for different applications.

Dimitris Gavrilis, Dimitra-Nefeli Makri, Leonidas Papachristopoulos, Stavros Angelis, Konstantinos Kravvaritis, Christos Papatheodorou, Panos Constantopoulos

Personal Information Management and Personal Digital Libraries


Memsy: Keeping Track of Personal Digital Resources Across Devices and Services

It is becoming increasingly difficult for users to keep track of their personal digital resources given the number of devices and hosting services used to create, process, manage and share them. As a result, personal resources are replicated at different locations and it is often not feasible to keep everything synchronised. In such a distributed setting, the types of questions that users want answers to are: Where is the latest version of this document located? How many versions of this image exist and where are they stored? We introduce the concept of global file histories that can provide users with a unified view of their personal information space across devices and services. As proof-of-concept, we present Memsy, an environment that helps users keep track of their resources. We discuss the technical challenges and present the results of a lab study used to evaluate Memsy’s proposed workflow.

Matthias Geel, Moira C. Norrie

Digital News Resources: An Autoethnographic Study of News Encounters

We analyze a set of 35 autoethnographies of news encounters, created by students in New Zealand. These comprise rich descriptions of the news sources, modalities, topics of interest, and news ‘routines’ by which the students keep in touch with friends and maintain awareness of personal, local, national, and international events. We explore the implications of these insights into news behavior for further research to support digital news systems.

Sally Jo Cunningham, David M. Nichols, Annika Hinze, Judy Bowen

Exploring Semantic Web and Linked Data


On a Linked Data Platform for Irish Historical Vital Records

The Irish Record Linkage 1864-1913 is a multi-disciplinary project aiming to create a platform for analyzing events captured in historical birth, marriage and death records by applying semantic technologies for annotating, storing and inferring information from the data contained in those records. This enables researchers to, for instance, investigate to what extent maternal and infant mortality rates were underreported. We report on the semantic architecture, provide motivation for the adoption of RDF and Linked Data principles, and elaborate on the ontology construction process that was influenced by both the requirements of the digital archivists and historians. Concerns of digital archivists include the preservation of the archival record and following best practices in preservation, cataloguing and data protection. The historians in this project wish to discover certain patterns in those vital records. An important aspect of the semantic architecture is the clear separation of concerns that reflects those requirements - the transcription and archival authenticity of the register pages and the interpretation of the transcribed data - that led to the creation of two distinct ontologies and knowledge bases.

Christophe Debruyne, Oya Deniz Beyan, Rebecca Grant, Sandra Collins, Stefan Decker

Keywords-To-SPARQL Translation for RDF Data Search and Exploration

Linked Data is the most common practice for publishing and sharing information in the Data Web. As new data become available, their exploration is a fundamental step towards integration and interoperability. However, typical search methods such as SPARQL queries require knowing both the SPARQL syntax and the vocabulary used in the data. For this reason, keyword-based search has been proposed, allowing an intuitive way of searching an RDF dataset. In this paper, we present a novel approach for keyword search on graph-structured data, and in particular temporal RDF graphs, i.e. RDF data that involve temporal properties. Our method, instead of providing answers directly from the RDF data graph, automatically generates a set of candidate SPARQL queries that try to capture users’ information need as expressed by the keywords used. To support temporal exploration, our method is enriched with temporal operators allowing the user to explore data within predefined time ranges. To evaluate our approach, we perform an effectiveness study using two real-world datasets.

Katerina Gkirtzou, Kostis Karozos, Vasilis Vassalos, Theodore Dalamagas

Author Profile Enrichment for Cross-Linking Digital Libraries

This work aims at enriching author profiles with additional information to better support search and retrieval of publications across different digital libraries. To achieve this objective we exploit concepts for cross-linking data to identify correlations between one author and other authors, publications or other related information. We will introduce a profile enrichment approach which adds additional information (e.g. biographic information) from different sources to existing author profiles. Within this context, the linked open data repository DBpedia serves as a valuable source for our profile enrichment approach. Still, one of several challenges in this context is the identification of the same author in different sources. To address this challenge we will exploit VIAF (Virtual International Authority File) for author identification. Technically we apply data mining and clustering techniques to uniquely identify authors.

Arben Hajra, Vladimir Radevski, Klaus Tochtermann

User Studies for and Evaluation of Digital Library Systems and Applications


On the Impact of Academic Factors on Scholar Popularity: A Cross-Area Study

In this paper we assess the relative importance of key academic factors (conference papers, journal articles and student supervisions) on the popularity of scholars in various knowledge areas, including areas of exact and biological sciences. To that end, we rely on curriculum vitae data of almost 700 scholars affiliated with 17 top-quality graduate programs of two of the largest universities in Brazil, as well as popularity measures crawled from a large digital library, covering a 16-year period. We use correlation analysis to assess the relative importance of each factor to the popularity of individual scholars and groups of scholars affiliated with the same program. We contrast our results with those of two top programs of a major international institution, namely the Computer Science and Medicine departments of Stanford University.

Pablo Figueira, Gabriel Pacheco, Jussara M. Almeida, Marcos A. Gonçalves

A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems

The evaluation of recommender systems is key to the successful application of recommender systems in practice. However, recommender-systems evaluation has received too little attention in the recommender-system community, in particular in the community of research-paper recommender systems. In this paper, we examine and discuss the appropriateness of different evaluation methods, i.e. offline evaluations, online evaluations, and user studies, in the context of research-paper recommender systems. We implemented different content-based filtering approaches in the research-paper recommender system of Docear. The approaches differed by the features to utilize (terms or citations), by user model size, whether stop-words were removed, and several other factors. The evaluations show that results from offline evaluations sometimes contradict results from online evaluations and user studies. We discuss potential reasons for the non-predictive power of offline evaluations, and discuss whether results of offline evaluations might have some inherent value. In the latter case, results of offline evaluations would be worth publishing, even if they contradict results of user studies and online evaluations. However, although offline evaluations theoretically might have some inherent value, we conclude that in practice, offline evaluations are probably not suitable to evaluate recommender systems, particularly in the domain of research-paper recommendations. We further analyze and discuss the appropriateness of several online evaluation metrics such as click-through rate, link-through rate, and cite-through rate.

Joeran Beel, Stefan Langer

Connecting Emotionally: Effectiveness and Acceptance of an Affective Information Literacy Tutorial

Recent developments in affective computing have provided more options in the way online education can be delivered. However, research on how affective computing can be used in online education is lacking. The research objectives are twofold: to investigate the influence of affective EAs on students’ motivation, enjoyment, learning efficacy and intention to use, and to uncover factors influencing their intention to use an online tutorial with affective EAs. To achieve this, 190 tertiary students participated in a between-subjects experiment (text-only vs. affective-EAs). Students benefited from the affective EAs in the tutorial as indicated by the increased learning motivation and enjoyment. Moreover, relevance, confidence, satisfaction, affective enjoyment, and behavioral enjoyment were found to be significant predictors for intention to use.

Yan Ru Guo, Dion Hoe-Lian Goh

Applications of Digital Libraries


A Survey of FRBRization Techniques

The Functional Requirements for Bibliographic Records (FRBR), an emerging model in the bibliographic domain, provide interesting possibilities in terms of cataloguing, representation and semantic enrichment of bibliographic data. However, the automated transformation of existing catalogs to fit this model is a requirement towards a wide adoption of FRBR in libraries. The cultural heritage community proposed a notable amount of FRBRization tools and projects, thus making it difficult for practitioners to compare and evaluate them. In this paper, we propose a synthetic and relevant classification of the FRBRization techniques according to specific criteria of comparison such as model expressiveness or specific enhancements.

Joffrey Decourselle, Fabien Duchateau, Nicolas Lumineau

Are There Any Differences in Data Set Retrieval Compared to Well-Known Literature Retrieval?

Digital libraries are nowadays expected to contain more than books and articles. All relevant sources of information for a scholar should be available, including research data. However, does literature retrieval work for data sets as well? In the context of a requirement analysis of a data catalogue for quantitative Social Science research data, we tried to find answers to this question. We conducted two user studies with a total of 53 participants and found similarities and important differences in the users’ needs when searching for data sets in comparison to those already known in literature search. In particular, quantity and quality of metadata are far more important in data set search than in literature search, where convenience is most important. In this paper, we present the methodology of these two user studies, their results and the challenges for data set retrieval systems that can be derived from them. One of our key findings is that for empirical social scientists, the choice of research data is more relevant than the choice of literature; therefore they are willing to put more effort into the retrieval process. Due to our choice of use case, our initial findings are limited to the field of Social Sciences. However, because data sets in other research areas, such as Economics, have similar characteristics, we assume that our results are applicable to them as well.

Dagmar Kern, Brigitte Mathiak

tc-index: A New Research Productivity Index Based on Evolving Communities

Digital Libraries are used in contexts beyond organization, archival and search. Here, we use them to extract bibliography data for proposing a new productivity index that emphasizes the venue and the year of the publication. Also, it changes the evaluation perspective from a researcher alone (index based on one’s own publications) to one’s contribution to a whole community. Overall, our results show that the new index considers researchers’ features that other well-known indexes disregard, which allows a broader analysis of researchers’ productivity.

Thiago H. P. Silva, Ana Paula Couto da Silva, Mirella M. Moro

Digital Humanities


Detecting Off-Topic Pages in Web Archives

Web archives have become a significant repository of our recent history and cultural heritage. Archival integrity and accuracy is a precondition for future cultural research. Currently, there are no quantitative or content-based tools that allow archivists to judge the quality of the Web archive captures. In this paper, we address the problems of detecting off-topic pages in Web archive collections. We evaluate six different methods to detect when the page has gone off-topic through subsequent captures. Those predicted off-topic pages will be presented to the collection’s curator for possible elimination from the collection or cessation of crawling. We created a gold standard data set from three Archive-It collections to evaluate the proposed methods at different thresholds. We found that combining cosine similarity at threshold 0.10 and change in size using word count at threshold -0.85 performs the best with accuracy = 0.987, F1 score = 0.906, and AUC = 0.968. We evaluated the performance of the proposed method on several Archive-It collections. The average precision of detecting the off-topic pages is 0.92.
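The two best-performing measures named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: raw term-frequency vectors stand in for whatever term weighting the paper uses, the comparison is against the first capture, the two tests are combined with a logical OR, and the function names are hypothetical. Only the thresholds 0.10 and -0.85 come from the abstract.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(first, later):
    """Cosine similarity of raw term-frequency vectors (assumed weighting)."""
    va, vb = Counter(tokens(first)), Counter(tokens(later))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def word_count_change(first, later):
    """Relative change in word count; -1.0 means all text disappeared."""
    n_first = len(tokens(first))
    if n_first == 0:
        return 0.0
    return (len(tokens(later)) - n_first) / n_first


def is_off_topic(first, later, cos_threshold=0.10, wc_threshold=-0.85):
    """Flag a later capture if either measure falls below its threshold (assumed OR)."""
    return (
        cosine_similarity(first, later) < cos_threshold
        or word_count_change(first, later) < wc_threshold
    )
```

A flagged capture would be shown to the curator for review rather than removed automatically, as the abstract describes.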

Yasmin AlNoamany, Michele C. Weigle, Michael L. Nelson

Supporting Exploration of Historical Perspectives Across Collections

The ever-growing number of textual historical collections calls for methods that can meaningfully connect and explore these. Different collections offer different perspectives, expressing views at the time of writing or even a subjective view of the author. We propose to connect heterogeneous digital collections through temporal references found in documents as well as their textual content. We evaluate our approach and find that it works very well on digital-native collections. Digitized collections pose interesting challenges and with improved preprocessing our approach performs well. We introduce a novel search interface to explore and analyze the connected collections that highlights different perspectives and requires little domain knowledge. In our approach, perspectives are expressed as complex queries. Our approach supports humanities scholars in exploring collections in a novel way and allows for digital collections to be more accessible by adding new connections and new means to access collections.

Daan Odijk, Cristina Gârbacea, Thomas Schoegje, Laura Hollink, Victor de Boer, Kees Ribbens, Jacco van Ossenbruggen

Impact Analysis of OCR Quality on Research Tasks in Digital Archives

Humanities scholars increasingly rely on digital archives for their research instead of time-consuming visits to physical archives. This shift in research method has the hidden cost of working with digitally processed historical documents: how much trust can a scholar place in noisy representations of source texts? In a series of interviews with historians about their use of digital archives, we found that scholars are aware that optical character recognition (OCR) errors may bias their results. They were, however, unable to quantify this bias or to indicate what information they would need to estimate it. Such an estimate would be important for assessing whether the results are publishable. Based on the interviews and a literature study, we provide a classification of scholarly research tasks that gives account of their susceptibility to specific OCR-induced biases and the data required for uncertainty estimations. We conducted a use case study on a national newspaper archive with example research tasks. From this we learned what data is typically available in digital archives and how it could be used to reduce and/or assess the uncertainty in result sets. We conclude that the current knowledge situation on the users’ side as well as on the tool makers’ and data providers’ side is insufficient and needs to be improved.

Myriam C. Traub, Jacco van Ossenbruggen, Lynda Hardman

Social-Technical Perspectives of Digital Information


Characteristics of Social Media Stories

An emerging trend in social media is for users to create and publish “stories”, or curated lists of web resources with the purpose of creating a particular narrative of interest to the user. While some stories on the web are automatically generated, such as Facebook’s “Year in Review”, one of the most popular storytelling services is “Storify”, which provides users with curation tools to select, arrange, and annotate stories with content from social media and the web at large. We would like to use tools like Storify to present automatically created summaries of archival collections. To support automatic story creation, we need to better understand as a baseline the structural characteristics of popular (i.e., receiving the most views) human-generated stories. We investigated 14,568 stories from Storify, comprising 1,251,160 individual resources, and found that popular stories (i.e., top 25% of views normalized by time available on the web) have the following characteristics: 2/28/1,950 elements (min/median/max), a median of 12 multimedia resources (e.g., images, video), 38% receive continuing edits, and 11% of the elements are missing from the live web.

Yasmin AlNoamany, Michele C. Weigle, Michael L. Nelson

Tyranny of Distance: Understanding Academic Library Browsing by Refining the Neighbour Effect

Browsing is a part of book seeking that is important to readers, poorly understood, and ill supported in digital libraries. In earlier work, we attempted to understand the impact of browsing on book borrowing by examining whether books near other loaned books were more likely to be loaned themselves, a phenomenon we termed the neighbour effect. In this paper we further examine the neighbour effect, looking specifically at size, interaction with search and topic boundaries, increasing our understanding of browsing behaviour.

Dana McKay, George Buchanan, Shanton Chang

The Influence and Interrelationships Among Chinese Library and Information Science Journals in Taiwan

This study aims to investigate the influences and interrelationships between journals of library and information science in Taiwan, in terms of information flow. Eleven Chinese journals and 2,031 articles published from 2001 to 2012 were selected as subjects, and an 11 × 11 matrix was generated to conduct journal-to-journal analysis. Several bibliometric indicators proposed by Xhignesse and Osgood [16] have been examined, including indegree, outdegree, sending-receiving and self-feeding ratios. Degree and betweenness centrality from social network analysis have also been employed to investigate the central and brokerage positions of the eleven journals in terms of network structure. In addition to an overall structural analysis of the twelve years, this study has further separated the 12 years into three individual periods of four years to conduct both a synchronic and a diachronic journal-to-journal citation analysis. Finally, this study discusses its implications and limitations for Chinese journals of library and information science in Taiwan.

Ya-Ning Chen, Hui-Hsin Yeh, Po-Jui Lai

Poster and Demo Papers


An Experimental Evaluation of Collaborative Search Result Division Strategies

Collaboration during information retrieval has been identified by many empirical studies as a common pattern of teams in everyday work, e.g. [5]. This collaboration is characterized by two or more individuals who set out together to resolve a shared information need [4]. In this paper, we present an experimental evaluation of different search result division strategies in simulated collaborative search tasks. We compare our proposed approach, which defines optimum collaboration strategies as an integer linear problem, with proven principles such as the PRP.

Thilo Böhm, Claus-Peter Klas, Matthias Hemmje

State-of-the-Art of Open Access Textbooks and Their Implications for Information Provision

The skyrocketing price of textbooks is beyond students’ affordability. Educators and teachers have taken an open access approach to textbooks to address this issue. This study aimed to investigate open access textbooks from the perspectives of source of provision, use license, mode of use, file format, and business model in terms of information provision. Eighteen use cases of open access textbooks were selected for systematic review by two researchers. Future suggestions for open access textbooks are also discussed.

Ya-Ning Chen

Adaptive Information Retrieval Support for Multi-session Information Tasks

Goals and corresponding tasks are both major drivers of information needs which are satisfied within information behaviours and lead to information tasks and finally information retrieval. Changing goals or a changing information need and task interruption during execution are the two most challenging barriers of people working on complex and longitudinal goals or tasks. People often have problems re-locating already visited pages or recapturing previous queries and their results. Thus the support of task-based information retrieval is still not sufficient today. Due to this we examined new concepts that make task-based information retrieval more efficient and useful.

Daniel Backhausen, Claus-Peter Klas, Matthias Hemmje

Transformation of a Library Catalogue into RDA Linked Open Data

The 200,000 records in the catalogue of the Biblioteca Virtual Miguel de Cervantes have been migrated to a new relational database whose data model adheres to the FRBR and FRAD specifications. The database content has later been mapped to RDF triples which employ the RDA vocabulary to describe the entities, as well as their properties and relationships. The intermediate relational model (ensuring, for example, referential integrity) provides tighter control over the process and, therefore, enhanced validation of the output. This RDF-based semantic description of the catalogue is now accessible online and supports browsing and searching the information.

Gustavo Candela, Pilar Escobar, Manuel Marco-Such, Rafael C. Carrasco

Segmenting Oral History Transcripts

Dividing oral histories into topically coherent segments can make them more accessible online. People regularly make judgments about where coherent segments can be extracted from oral histories. But when different people are asked to extract coherent segments from the same oral histories, they often do not agree about where such segments begin and end.

Ryan Shaw

Digital Libraries Unfurled: Supporting the New Zealand Flag Debate

This article reports on the development of an interactive web environment, backed by a digital library, that supports the creation of new flag designs. Specifically, it supports the user through an iterative design process, guided by principles drawn from the field of Vexillology. The work has been motivated by a legally binding referendum on the issue in New Zealand, planned to occur in late 2015/early 2016.

Brandon M. Thomas, Joanna M. Stewart, David Bainbridge, David M. Nichols, William J. Rogers, Geoff Holmes

Evaluating Auction Mechanisms for the Preservation of Cost-Aware Digital Objects Under Constrained Digital Preservation Budgets

This is a novel approach to managing costs in digital preservation, one that takes advantage of an object-centric approach developed by means of self-preserving digital objects whereby the objects manage their preservation not only by maximizing the chances of avoiding obsolescence but also doing it at a minimum cost. To accomplish this, we assign a budget that the objects manage to achieve the necessary preservation services at a given cost. Several strategies apply, such as maximal preservation service at all costs or burn low even if the preservation is not perfect. We explore optimizing the budget of self-preserving digital objects through micro-negotiations of objects and services, expecting accurate balance of costs and quality of preservation. Specifically, in negotiation, we will explore the price-based algorithms that are the electronic auctions, notably the combinatorial and multi-unit auctions. We compare the expected lifetime of digital objects with the two electronic auction algorithms with the aim of deciding in what conditions these algorithms apply and deliver good results. In all, this work, exploratory in nature, studies a bottom-up approach of cost management in digital preservation, contrary to the prevailing top-down approach of the state of the art, using e-auctions.

Jose Antonio Olvera, Paulo Nicolás Carrillo, Josep Lluis de la Rosa

Mobile Annotation of Geo-locations in Digital Books

This demo paper introduces an editor for manual annotation of locations in digital books, using a crowd-sourcing approach. It is the first of its kind and allows book lovers and literary travel enthusiasts to annotate the locations in their digital books on-the-go. We show both a mobile and a desktop version, and briefly explain the linkage to the Digital Library that is holding the digital books.

Annika Hinze, Haley Littlewood, David Bainbridge

Teaching Machine Learning: A Geometric View of Naïve Bayes

In this demo, we present two applications which allow users to ‘see’ a geometric interpretation of the Bayes’ rule and interact with a Naïve Bayes text classifier on a real dataset, namely the Reuters-21578 newswire collection. The main objective of this demo is to show how the pattern recognition capabilities of the human increase the effectiveness of the classifier even when technical details are not known in advance or the user is not an expert in the field. These two applications were developed with the R package Shiny; they have been deployed online and they are freely accessible from the links indicated in the paper.

Giorgio Maria Di Nunzio
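
As an illustration of the geometric reading of Bayes' rule mentioned in the abstract above, the following minimal sketch maps each document to two coordinates, its log-likelihood under a class and under the complement, so that classification amounts to checking on which side of a straight line the point falls. This is one common way to picture a Naïve Bayes decision and is not necessarily the construction used in the demo. It is written in Python rather than R/Shiny, and the toy documents, the function names and the use of Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (tokens, label); True marks the target class.
docs = [
    (["oil", "price", "barrel"], True),
    (["crude", "oil", "export"], True),
    (["wheat", "harvest", "grain"], False),
    (["grain", "export", "tonnes"], False),
]

def train(labelled_docs):
    """Collect per-class token counts, document counts and the vocabulary."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for tokens, label in labelled_docs:
        counts[label].update(tokens)
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab

def log_likelihood(tokens, label, counts, vocab):
    """Multinomial log-likelihood with Laplace (add-one) smoothing.
    Tokens outside the training vocabulary are ignored."""
    total = sum(counts[label].values())
    return sum(
        math.log((counts[label][t] + 1) / (total + len(vocab)))
        for t in tokens
        if t in vocab
    )

def coordinates(tokens, model):
    """Map a document to the point (x, y) = (log P(d|c), log P(d|not c))."""
    counts, _, vocab = model
    return (
        log_likelihood(tokens, True, counts, vocab),
        log_likelihood(tokens, False, counts, vocab),
    )

def classify(tokens, model):
    """Assign class c when the point lies below the line y = x + log prior odds."""
    _, n_docs, _ = model
    x, y = coordinates(tokens, model)
    offset = math.log(n_docs[True] / n_docs[False])
    return x + offset > y

model = train(docs)
print(coordinates(["oil", "barrel"], model), classify(["oil", "barrel"], model))
```

Plotting the points returned by `coordinates` for a whole collection gives a scatter plot in which the decision boundary is a line of slope one; shifting its intercept corresponds to changing the prior odds, which is the kind of adjustment a user could make by eye.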

Study About the Capes Portal of E-Journals Non-users

This study investigated the non-users of the CAPES Portal of E-Journals, a governmental initiative to offer free access to e-journals to consortium-member federal educational and research institutions in Brazil. The research used a mostly quantitative methodology, which also collected some qualitative data through a Web survey. 16.1% [...] that they did not use the Portal. These non-users were asked (1) what reasons led to non-use, (2) whether they used other electronic information sources, and (3) whether they would use the Portal if the barriers identified for non-use were remedied. The results show that non-use of the Portal is caused mainly by respondents lacking information about its existence (24.5% [...] responses) and preferring printed journals (11.6% [...] finding is that 82.1% [...] indicated by them as the cause of non-use were solved. The study contributes to the scarce literature on non-users of digital libraries and presents recommendations to improve use based on the results obtained.

Wesley Rodrigo Fernandes, Beatriz Valadares Cendón

Czech Digital Library - Big Step to the Aggregation of Digital Content in the Czech Republic

Securing the aggregation of digital content at the national level requires solving interoperability, compatibility of standards and even legal issues. Activities in the Czech Republic aim to create an environment based on open source software to achieve these goals. New tools and systems are being developed to support complex digitization processes, including digital data processing, workflow monitoring, archiving and providing access to e-content in digital libraries. Providing open source solutions for digital data production and dissemination at the national level is very cost effective. Using the same solutions across cultural heritage institutions also leads to shared data and metadata standards, which is an advantage in the aggregation process.

Tomas Foltyn, Martin Lhotak

MirPub v2: Towards Ranking and Refining miRNA Publication Search Results

In recent years, many articles studying microRNA (miRNA) molecules and their connection to diseases have been published. However, the sheer volume of life sciences literature makes it hard to extract useful information from it. MirPub is a search engine that addresses this issue, providing lists of articles related to particular miRNA terms and useful filters to customise them. In this work, we extend mirPub by utilising publication ranking methods to provide insights about the importance of each publication. Moreover, we automatically identify the species referred to in each publication to serve researchers studying particular species.

Ilias Kanellos, Vasiliki Vlachokyriakou, Thanasis Vergoulis, Georgios Georgakilas, Yannis Vassiliou, Artemis K. Hatzigeorgiou, Theodore Dalamagas

A Proposal for Autonomous Scientific Publishing Agent

A proposal for implementing an autonomous agent model in scientific publishing is presented, along with a general description of the business framework for its operations and ideas for the underlying technological solutions. Drawing on general ideas that have been put forth in other industries, the proposal deals with the specifics of contemporary scientific publishing and pinpoints the aspects that may be improved by applying an autonomous agent model. The proposed approach to scientific publishing is based on available technologies that make a new ownership model possible. Software that owns itself, acts as an independent economic entity, runs the operations of a scientific publisher and hires the necessary human workforce is technologically viable. Some aspects of bitcoin and other concepts important for building the system described in the proposal are explained. The proposal also deals with various economic and ethical issues that may arise in the use of this novel approach.

Adam Sofronijevic, Aleksandar Jerkov, Dejana Kavaja Stanisic

Extracting a Topic Specific Dataset from a Twitter Archive

Datasets extracted from the microblogging service Twitter are often generated using specific query terms or hashtags. We describe how a dataset produced using the query term ‘syria’ can be increased in size to include tweets on the topic of Syria that do not contain that query term. We compare three methods for this task: using the top hashtags from the set as search terms, using a hand-selected set of hashtags as search terms, and using LDA topic modelling to cluster tweets and select appropriate clusters. We describe an evaluation method for assessing the relevance and accuracy of the tweets returned.

Clare Llewellyn, Claire Grover, Beatrice Alex, Jon Oberlander, Richard Tobin
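
The first of the three expansion methods named in the abstract above, using the top hashtags of the seed set as search terms, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy tweets, the function names, the hashtag regular expression and the choice of `k` are all hypothetical.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")

def hashtags(tweet):
    """Return the lower-cased hashtags of one tweet as a set."""
    return {tag.lower() for tag in HASHTAG.findall(tweet)}

def top_hashtags(tweets, k, exclude=()):
    """Return the k most frequent hashtags in tweets, skipping excluded ones."""
    counts = Counter(tag for t in tweets for tag in hashtags(t))
    for tag in exclude:
        counts.pop(tag, None)
    return [tag for tag, _ in counts.most_common(k)]

def expand(seed, archive, k, query):
    """Add archive tweets that share one of the seed set's top-k hashtags."""
    tags = set(top_hashtags(seed, k, exclude={query}))
    seen = set(seed)
    extra = [t for t in archive if t not in seen and tags & hashtags(t)]
    return seed + extra

# Hypothetical toy archive; a real archive would be far larger.
archive = [
    "Fighting reported near the border #syria #aleppo",
    "Aid convoy arrives #syria #aleppo #aid",
    "Shelling continues overnight #aleppo",
    "New phone released today #tech",
    "Talks resume in Geneva on syria #geneva",
]

# Seed set: tweets matching the original query term.
seed = [t for t in archive if "syria" in t.lower()]
expanded = expand(seed, archive, k=1, query="syria")
print(expanded)
```

Excluding the query term itself from the hashtag ranking keeps the expansion from simply re-selecting the seed tweets. A larger `k` pulls in more tweets at the risk of admitting off-topic ones, which is why an evaluation of relevance, as the abstract describes, is needed.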

