This book constitutes the thoroughly refereed proceedings of the 11th Italian Research Conference on Digital Libraries, IRCDL 2015, held in Bozen-Bolzano, Italy, in January 2015.

The 13 full papers, 4 short papers and 2 invited poster papers presented were carefully selected from 19 submissions. They are organized in the following five categories: semantic modeling; projects; models and applications; content analysis; and digital library infrastructures. The papers deal with numerous multidisciplinary aspects, ranging from computer science to the humanities in the broader sense, including research areas such as archival and library information sciences; information management systems; semantic technologies; information retrieval; new knowledge environments; and new organizational/business models.



Invited Talk


10 Years of IRCDL: 2005–2014 (Invited Paper)

This paper reports on the presentation given at the “11th Italian Research Conference on Digital Libraries (IRCDL 2015)”, held in Bolzano-Bozen on 29–30 January 2015 in the main building of the Free University of Bozen-Bolzano. The presentation was part of the conference special session devoted to reporting on the first ten editions of the Italian Research Conference on Digital Libraries (IRCDL). This paper gives an account of the general aspects of the history of IRCDL, whereas a report on the research topics and their trends over time, and on the research community that has grown around IRCDL over ten years, can be found in [4].
Maristella Agosti

Semantic Modeling


Description Logics for Documentation

Much of the activity in a digital library revolves around collecting, organizing and publishing knowledge about the resources of the library in the form of metadata records. In order to document this activity, digital librarians need to express knowledge about the metadata records they produce. This knowledge, which we call documentation knowledge, may express, e.g., the provenance, trustworthiness, or access restrictions of the records. Today, documentation knowledge is mostly represented in digital libraries via RDF. We propose a new type of information system, called a documentation system, as a basic component of a digital library that allows both domain and documentation knowledge to be represented and reasoned about in an expressive language such as OWL.
Carlo Meghini

Towards a Semantic Web Enabled Representation of DL Foundational Models: The Quality Domain Example

The convergence of Libraries, Archives and Museums (LAM) has been a topic of much discussion in the Digital Library (DL) research field, but their similarities and common points are not yet fully exploited in existing formal models for DL such as the Streams, Structures, Spaces, Scenarios, Societies (5S) model or the DELOS Reference Model.
On the other hand, Semantic Web and Linked Data technologies are nowadays mostly used for interoperability at the data level, but they would be a viable option for building a semantic representation of, and interoperability between, the DL models themselves.
To this end, we discuss a rather ambitious goal that should be part of the DL agenda: expressing the foundational models of DL by means of ontologies that leverage Semantic Web and Linked Data technologies and that link these models to the ontologies currently used for publishing cultural heritage data. This would pave the way for deeper interoperability among DL systems and lower the barriers between LAMs.
In this paper we exemplify this proposal by focusing on the quality domain, which is a fundamental aspect of the DL universe, and we show how this part of the DELOS Reference Model can be expressed via a Resource Description Framework (RDF) model ready to be used in a Semantic Web environment for interoperability at the DL model level and not only at the data level.
Nicola Ferro, Gianmaria Silvello

A Semantic Model for Content Description in the Sapienza Digital Library

This paper presents the semantic model defined for the descriptive metadata of the resources managed by the Sapienza Digital Library. The semantic model is derived from the Metadata Object Description Schema (MODS), a descriptive standard for digital library applications. It can be used as a top-level conceptual reference model to support the implementation of Semantic Web technologies for a digital library’s descriptive metadata, and it is intended to be agnostic about the technology system to be adopted. Linking resources to the Linked Data cloud, as well as exploiting the potential of ontology-based services, will rely on a well-defined semantic model, which has been extensively tested through the implementation of a descriptive metadata profile.
Angela Di Iorio, Marco Schaerf

Text Encoding Initiative Semantic Modeling. A Conceptual Workflow Proposal

In this paper we present a proposal for the semantic enhancement of XML TEI through ontological modelling based on a three-level approach: an ontological generalization of the TEI schema; an intensional semantics of TEI elements; and an extensional semantics of the markup content. A possible TEI enhancement will result from the dialogue and combination of these three levels. We conclude by discussing the ontology mapping issue and suggesting a Linked Open Data approach for digital libraries based on a semantically enriched XML TEI model.
Fabio Ciotti, Marilena Daquino, Francesca Tomasi

Structured Descriptions of Roles, Activities, and Procedures in the Roman Constitution

A highly structured description of entities and events in histories can support flexible exploration of those histories by users and, ultimately, support richly-linked full-text digital libraries. Here, we apply the Basic Formal Ontology (BFO) to structure a passage about the Roman Constitution from Gibbon’s Decline and Fall of the Roman Empire. Specifically, we consider the specification of Roles such as Consul, Activities associated with those Roles, and Procedures for accomplishing those Activities.
Yoonmi Chu, Robert B. Allen

Projects

The AstroBID: Preserving and Sharing the Italian Astronomical Heritage

The cultural heritage of the National Institute for Astrophysics (INAF), the AstroBID, made up of rare and modern Books, Instruments and archival Documents, marks the milestones in the history of astronomy in Italy. INAF, in collaboration with the Department of Physics and Astronomy of the University of Bologna, has developed a project to preserve, digitize, and enhance this heritage by creating the web portal Polvere di Stelle. The portal presents the cultural heritage of 12 libraries and historical archives and 13 instrument collections, and allows both academics and a wider audience to search the AstroBID materials simultaneously.
Mauro Gargano, Antonella Gasperini, Emilia Olostro Cirella, Riccardo Smareglia, Valeria Zanini

The EAGLE Europeana Network of Ancient Greek and Latin Epigraphy: A Technical Perspective

The project EAGLE (Europeana network of Ancient Greek and Latin Epigraphy, a Best Practice Network partially funded by the European Commission) aims to aggregate epigraphic material provided by some 15 different epigraphic archives (about 80% of the classified epigraphic material from the Mediterranean area) for ingestion into Europeana. The collected material will also be made available to the scholarly community and to the general public for research and cultural dissemination. This paper briefly presents the main services provided by EAGLE and the challenges encountered in aggregating material from heterogeneous archives (different data models, metadata schemas, and exchange formats). EAGLE has defined a common data model for epigraphic information, onto which the data models of the different archives can be optimally mapped. The data infrastructure is based on the D-NET software toolkit, which handles data collection, mapping, cleaning, indexing, and access provisioning through web portals or standard access protocols.
Andrea Mannocci, Vittore Casarosa, Paolo Manghi, Franco Zoppi

The TRAME Project – Text and Manuscript Transmission of the Middle Ages in Europe

TRAME is a research infrastructure for medieval manuscripts. The TRAME engine scans a set of sources for the searched terms and retrieves links to a wide range of information, from simple references to detailed manuscript records to full-text transcriptions. Currently, queries can be performed by free text, shelfmark, author, title, date, copyist or incipit on more than 80 selected scholarly digital resources across the EU and the USA. Since 1 September 2014, TRAME has entered a new phase, and current work focuses on: extending the meta-search approach to other web resources; leveraging user interaction to define an ontology for medieval manuscripts; and redesigning the front end according to a new UX approach.
Emiliano Degl’Innocenti, Alfredo Cosco, Fabrizio Butini, Roberta Giacomi, Vinicio Serafini

The PREFORMA Project: Federating Memory Institutions for Better Compliance of Preservation Formats

In this paper, we describe the motivations, objectives and organization of the PREservation FORMAts for culture information/e-archives (PREFORMA) project, a Pre-Commercial Procurement (PCP) project focused on the conformance checking of files ingested for long-term preservation.
Linda Cappellato, Nicola Ferro, Antonella Fresa, Magnus Geber, Börje Justrell, Bert Lemmens, Claudio Prandoni, Gianmaria Silvello

Models and Applications


Keep, Change or Delete? Setting up a Low Resource OCR Post-correction Framework for a Digitized Old Finnish Newspaper Collection

There has been huge interest in the digitization of both handwritten and printed historical material over the last 10–15 years, and this interest will most probably only increase in the ongoing Digital Humanities era. As a result, many digital historical document collections are available, and more will be in the future.
The National Library of Finland has digitized a large proportion of the historical newspapers published in Finland between 1771 and 1910 [13]; the collection, Digi, can be reached at http://digi.kansalliskirjasto.fi/. This collection contains approximately 1.95 million pages in Finnish and Swedish, the Finnish part amounting to about 2.385 billion words. Errors are common in the output of the Optical Character Recognition (OCR) process, especially when the texts are printed in the Gothic (Fraktur, blackletter) typeface. The errors lower the usability of the corpus both for human users and for more elaborate text mining applications. Automatic spell checking and correction of the data is also difficult due to historical spelling variants and the low OCR quality of the material.
This paper discusses the overall situation of the planned post-correction of the Digi content and of its evaluation. We present the results of our post-correction trials and discuss some methodological aspects of the evaluation. These are the first reported evaluation results for post-correction of the data, and the experience gained will be used in planning the post-correction of the whole collection.
Kimmo Kettunen
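The abstract mentions the methodology of evaluating post-correction without giving details. As a generic illustration only (not the paper's evaluation scheme), a common word-level approach compares OCR output, corrected output and ground truth: precision is the share of changed words that became correct, recall is the share of erroneous OCR words that were fixed. The sketch assumes the three token lists are already aligned one-to-one, so deletions and splits are not modelled; the names `evaluate` and `CorrectionScores` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CorrectionScores:
    precision: float  # share of changed words that are now correct
    recall: float     # share of erroneous OCR words that were fixed
    harmed: int       # correct OCR words that the corrector broke


def evaluate(ocr, corrected, truth):
    """Word-level evaluation over pre-aligned token lists (alignment is assumed)."""
    if not (len(ocr) == len(corrected) == len(truth)):
        raise ValueError("token lists must be aligned and of equal length")
    changed = fixed = errors = harmed = 0
    for o, c, t in zip(ocr, corrected, truth):
        if o != t:
            errors += 1          # the OCR word was wrong
        if c != o:
            changed += 1         # the corrector touched this word
            if c == t:
                fixed += 1       # ...and got it right
            elif o == t:
                harmed += 1      # ...and broke a correct word
    precision = fixed / changed if changed else 0.0
    recall = fixed / errors if errors else 0.0
    return CorrectionScores(precision, recall, harmed)
```

Reporting `harmed` separately matters for historical text, where a corrector that "modernizes" a legitimate spelling variant lowers precision even though the word was not an OCR error.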

Collaborative Information Seeking with Ant Colony Ranking in Real-Time

In this paper we propose a new ranking algorithm based on Swarm Intelligence, more specifically on the Ant Colony Optimization technique, to improve search engine performance and reduce information overload by exploiting users’ collective behavior. We designed an online evaluation involving end users to test our algorithm in a real-world scenario dealing with informational queries. A fully working prototype, based on the Wikipedia search engine, yielded promising preliminary results.
Tommaso Turchi, Alessio Malizia, Paola Castellucci, Kai Olsen
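The abstract does not describe the algorithm itself, so the following is only a generic sketch of the ant-colony idea applied to ranking, under our own assumptions: each user click acts like an ant depositing pheromone on a (query, document) trail, trails evaporate over time so stale behaviour fades, and results are re-ranked by the engine's base score plus weighted pheromone. The class `PheromoneRanker` and the parameters `rho` (evaporation rate), `deposit` and `alpha` are hypothetical.

```python
from collections import defaultdict


class PheromoneRanker:
    """Hypothetical ACO-inspired re-ranker: clicks deposit pheromone, which evaporates."""

    def __init__(self, rho=0.1, deposit=1.0, alpha=0.5):
        self.rho = rho          # evaporation rate per time step (assumed)
        self.deposit = deposit  # pheromone added per click (assumed)
        self.alpha = alpha      # weight of pheromone vs. base score (assumed)
        # pheromone[query][doc_id] -> trail strength
        self.pheromone = defaultdict(lambda: defaultdict(float))

    def click(self, query, doc_id):
        """A user selecting a result reinforces the (query, doc) trail."""
        self.pheromone[query][doc_id] += self.deposit

    def evaporate(self):
        """Decay all trails so that old collective behaviour fades out."""
        for trails in self.pheromone.values():
            for doc_id in trails:
                trails[doc_id] *= 1.0 - self.rho

    def rank(self, query, base_scores):
        """Re-rank documents by base score plus weighted pheromone."""
        trails = self.pheromone[query]
        combined = {
            doc: score + self.alpha * trails.get(doc, 0.0)
            for doc, score in base_scores.items()
        }
        return sorted(combined, key=combined.get, reverse=True)
```

For example, with base scores `{"A": 1.0, "B": 0.9, "C": 0.5}` and `alpha=0.5`, two clicks on `C` raise its combined score to 1.5 and move it to the top; evaporation then lets it sink back if users stop selecting it. Evaporation is what keeps the ranking responsive in real time rather than freezing early popularity.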

Modeling the Concept of Movie in a Software Architecture for Film-Induced Tourism

Film-induced tourism is a recent phenomenon that is attracting increasing interest in tourism management and promotion. A research project on this topic is currently under way at the Department of Cultural Heritage of the University of Padova, with the aim of developing a software architecture for promoting film-induced tourism. One of the challenges in developing such a system was the design of a suitable model to capture the concept of movie and all the related information. This paper presents the design and implementation of this model: how the movie entity and its related information have been represented, how the design reflects the special needs and purposes of the system, how a database was implemented and populated, and the outcomes of the developed software.
Giulia Lavarone, Nicola Orio, Farah Polato, Sandro Savino

Content Analysis


Unsupervised Author Identification and Characterization

Author identification is a hot topic, especially in the Internet age. Following our previous work, in which we proposed a novel approach to this problem based on relational representations that take into account the structure of sentences, here we present a tool that computes and visualizes a numerical and graphical characterization of authors/texts based on several linguistic features. This tool, which extends a previous language analysis tool, is the ideal complement to the author identification technique, which is based on a clustering procedure whose outcomes (i.e., the authors’ models) are not human-readable. Both approaches are unsupervised, which allows them to tackle problems to which other state-of-the-art systems are not applicable.
Stefano Ferilli, Domenico Redavid, Floriana Esposito

A Content-Based Approach to Social Network Analysis: A Case Study on Research Communities

Several works in the literature have investigated the activities of research communities using big data analysis, but the large majority of them focus on papers and co-authorship relations, ignoring that most of the available scientific literature is already clustered into journals and conferences with a well-defined domain of interest. We are interested in bringing out the implicit relationships underlying such containers. More specifically, we focus on conference and workshop proceedings available in open access, and we exploit a semantic/conceptual analysis of the full text of each paper. We claim that such content-based analysis may lead to a better understanding of research communities’ activities and their emerging trends. In this work we present a novel method for analyzing research community activity, based on combining the results of a Social Network Analysis phase and a Content-Based one. The major innovative contribution of this work is the use of knowledge-based techniques to meaningfully extract, from each of the considered papers, the main topics discussed by its authors.
Dario De Nart, Dante Degl’Innocenti, Marco Basaldella, Maristella Agosti, Carlo Tasso

Analysis and Re-Use of Videos in Educational Digital Libraries with Automatic Scene Detection

The advent of modern approaches to education, such as Massive Open Online Courses (MOOCs), has made video the basic medium for education and knowledge transmission. However, IT tools are still not adequate to support video content reuse, tagging, annotation and personalization. In this paper we analyze the problem of identifying coherent sequences, called scenes, in order to provide users with a more manageable editing unit. A simple spectral clustering technique is proposed and compared with state-of-the-art results. We also discuss correct ways to evaluate the performance of automatic scene detection algorithms.
Lorenzo Baraldi, Costantino Grana, Rita Cucchiara

Digital Library Infrastructures


An Interoperability Infrastructure for Digital Identifiers in e-Science

The rapid increase of scientific digital assets in recent years has made clear that digital identifiers are crucial for effectively publishing, accessing and managing digital information in e-science contexts. Originally conceived as persistent keys for accessing digital objects in network environments, persistent identifiers have more recently been extended to also identify physical objects such as people, institutions and any other relevant entity in the e-Science domain, opening the way to an integrated information space in which a network of resources can be resolved, linked, navigated and analyzed, as the Linked Open Data approach envisions for the Web. However, the creation and full exploitation of this valuable network of connections is currently hindered by the fragmentation and lack of coordination of the digital identifier ecosystem. This paper proposes an open, distributed and scalable infrastructure for interoperating existing Persistent Identifiers and other digital identifier systems (such as Cool URIs) in e-science, overcoming geographical, disciplinary and organizational boundaries. The Digital Identifier interoperability infrastructure is presented as a cross-cutting solution of core services enabling interoperability at three different levels: identifier, co-reference and semantic.
Barbara Bazzanella, Paolo Bouquet
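To make the co-reference level more concrete, here is a minimal sketch under our own assumptions, not the design of the proposed infrastructure: given pairwise assertions that two identifiers (for instance a Persistent Identifier and a Cool URI) denote the same entity, a union-find structure groups them into equivalence classes, so that any identifier can be resolved to all of its co-referent identifiers. The class `CoReferenceIndex` and all sample identifiers are hypothetical.

```python
class CoReferenceIndex:
    """Hypothetical co-reference service grouping identifiers that denote the same entity."""

    def __init__(self):
        self.parent = {}  # identifier -> parent in the union-find forest

    def _find(self, ident):
        # Register unseen identifiers as singleton classes.
        self.parent.setdefault(ident, ident)
        root = ident
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[ident] != root:
            self.parent[ident], ident = root, self.parent[ident]
        return root

    def same_as(self, a, b):
        """Record an assertion that identifiers a and b co-refer."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra

    def resolve(self, ident):
        """Return every identifier known to co-refer with ident, including itself."""
        root = self._find(ident)
        return {i for i in list(self.parent) if self._find(i) == root}


idx = CoReferenceIndex()
idx.same_as("http://example.org/id/alice", "hdl:12345/alice")
idx.same_as("hdl:12345/alice", "orcid:0000-0000-0000-0001")
print(sorted(idx.resolve("orcid:0000-0000-0000-0001")))
```

Because co-reference is transitive, the two assertions above link all three identifiers into one class, even though no single assertion relates the Cool URI to the ORCID-style key directly.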

Finding a Needle in a Haystack. The BEIC Digital Library in Search of Its Space on the Web: A Case Study

The paper describes the strategies undertaken by the BEIC Digital Library to find its identity and space on the Web. It will be of interest and value to other digital libraries facing the same challenges or seeking new strategies to promote their collections and monitor their use.
Chiara Consonni, Paul Gabriele Weston


