
About this book

This book constitutes the thoroughly refereed proceedings of the 16th Italian Research Conference on Digital Libraries, IRCDL 2020, held in Bari, Italy, in January 2020.

The 12 full papers and 6 short papers presented were carefully selected from 26 submissions. The papers are organized in topical sections on information retrieval; big data and data science in DL; cultural heritage; open science.



Correction to: Identifying, Classifying and Searching Graphic Symbols in the NOTAE System

Maria Boccuzzi, Tiziana Catarci, Luca Deodati, Andrea Fantoli, Antonella Ghignoli, Francesco Leotta, Massimo Mecella, Anna Monte, Nina Sietis

Information Retrieval


Reproducibility of the Neural Vector Space Model via Docker

In this work we describe how Docker images can be used to enhance the reproducibility of Neural IR models. We report our results in reproducing the Neural Vector Space Model (NVSM), and we release a CPU-based and a GPU-based Docker image. Finally, we present some insights about reproducing Neural IR models.
Nicola Ferro, Stefano Marchesin, Alberto Purpura, Gianmaria Silvello

Towards a Decision Support Framework for Forensic Analysis of Dynamic Signatures

This paper presents a preliminary framework, easy to explain and effective, for supporting dynamic signature analysis in forensic settings. The proposed approach measures similarities among signatures by applying Dynamic Time Warping to easy-to-derive dynamic measures. The long-term goal of our research is to provide forensic handwriting examiners with a decision support tool for performing reproducible and less questionable inference.
Daniela Mazzolini, Patrizia Pavan, Giuseppe Pirlo, Gennaro Vessio
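
The Dynamic Time Warping step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each signature is reduced to a one-dimensional sequence of some dynamic measure (for example pen speed over time) and uses the absolute difference as the local cost; the function name `dtw_distance` and these choices are illustrative assumptions, not the authors' implementation.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two numeric sequences.

    Assumption: the local cost is the absolute difference between samples;
    the paper may use other measures or normalisations.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a sample
                cost[i][j - 1],      # b advances, a repeats a sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A lower distance suggests more similar signing dynamics. Because the alignment may stretch or compress time, two signatures written at different speeds can still be compared sample by sample.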

An Information Visualization Tool for the Interactive Component-Based Evaluation of Search Engines

In this paper, we present an InfoVis tool based on Sankey diagrams for exploring the large sets of combinations of IR components known as the Grid of Points (GoP).
The goal of this tool is to ease the comprehension of the behavior of single IR components within fully functioning off-the-shelf IR systems without resorting to complex statistical tools. To assess the quality of the proposed Sankey-based InfoVis tool, we conducted an initial user study that led to interesting conclusions, yet to be validated in a future and more comprehensive study.
Giacomo Rocco, Gianmaria Silvello

3D Average Common Submatrix Measure

This paper introduces a new measure for computing the similarity among 3D objects as the average volume of the largest sub-cubes matching in the objects. The match is approximate and only verified within a neighbourhood from the position of the sub-cubes. Preliminary tests performed on random and synthetic datasets prove the efficacy of the similarity measure in capturing the visual similarity among the 3D objects and a reduction in the execution time when the neighbourhood is considered.
Federica Franco, Alessia Amelio, Sergio Greco

Big Data and Data Science in DL


Lost in Translation: Can We Talk About Big Data Fairly?

Big data and data science are global: there is no alternative in our connected, digital world. Yet, for a truly open and fair science, cultural biases and different opportunities across different countries must be taken into consideration.
English has become the international language for the scientific debate: a single language is most convenient, moreover it is undergoing a process of refinement and adaptation to the science register. On the other hand, laboratories are populated by researchers from all over the world, and much research takes place in non-English-speaking countries, where research tradition often develops moving from different perspectives, influenced by the cultural context.
A fair and open science would miss an opportunity if it did not take into consideration the multilingualism and multiculturalism of the researchers as individuals and members of specific communities, and could also waste precious time and energies, as language barriers prevent cooperation.
The paper will discuss the above-mentioned issues with examples and reflect on the changing role of librarians and information specialists within a global scientific community.
Matilde Fontanin, Paola Castellucci

An Ontology and Knowledge Graph Infrastructure for Digital Library Knowledge Representation

New technologies for storing and handling knowledge provide unprecedented opportunities for enhanced fruition of digital libraries and archives. Going beyond document retrieval based on lexical content or metadata, using the context of documents, and/or of their content, may provide very new ways to put them in perspective and grasp a deeper understanding thereof, also for non-technical users.
Several components are needed to support this new perspective: suitable ontological resources to describe such varied knowledge; collaborative tools to collect the precious knowledge scattered across many scholars and practitioners all over the world and to store it in a knowledge base; and fruition tools to make the collected knowledge available to all interested stakeholders (scholars, researchers, but also the general public).
This paper proposes the GraphBRAIN environment as a possible infrastructure. It is a general-purpose tool that allows its users to design and populate knowledge graphs, to collaboratively enrich them, and to exploit advanced fruition, consultation and analysis tools. Its functionality may also be provided as a set of Web services to end-user applications. An initial version of the ontology and knowledge graph for digital libraries and archives is also presented and discussed in the paper.
Stefano Ferilli, Domenico Redavid

Text-to-Image Synthesis Based on Machine Generated Captions

Text-to-Image Synthesis refers to the automatic generation of a photo-realistic image starting from a given text and is revolutionizing many real-world applications. Performing such a process requires datasets containing captioned images, meaning that each image is associated with one (or more) captions describing it. Despite the abundance of uncaptioned image datasets, the number of captioned datasets is limited. To address this issue, in this paper we propose an approach capable of generating images starting from a given text using a conditional generative adversarial network (GAN) trained on an uncaptioned image dataset. In particular, uncaptioned images are fed to an Image Captioning Module to generate the descriptions. Then, the GAN Module is trained on both the input image and the “machine-generated” caption. To evaluate the results, the performance of our solution is compared with the results obtained by the unconditional GAN. For the experiments, we chose to use the uncaptioned dataset LSUN-bedroom. The results obtained in our study are preliminary but still promising.
Marco Menardi, Alex Falcon, Saida S. Mohamed, Lorenzo Seidenari, Giuseppe Serra, Alberto Del Bimbo, Carlo Tasso

A Streamlined Pipeline to Enable the Semantic Exploration of a Bookstore

Searching in a library or book catalog is a recurrent task for researchers and common users alike. Thanks to semantic enrichment techniques, such as named-entity recognition and linking, texts may be automatically associated with entities in some reference knowledge graph(s). The association of a corpus of texts with a knowledge graph opens up the way to searching and exploring using novel paradigms. We present a pipeline that uses semantic enrichment and knowledge graph visualization techniques to enable the semantic exploration of an existing text corpus. The pipeline is meant to be ready for use and consists of existing free software tools and free software code contributed by us. We are developing and testing the pipeline in the field, by using it to access the catalog of a bookstore specialized in ancient Roman history.
Miguel Ceriani, Eleonora Bernasconi, Massimo Mecella

Re-implementing and Extending Relation Network for R-CBIR

Relational reasoning is an emerging theme in Machine Learning in general and in Computer Vision in particular. DeepMind has recently proposed a module called Relation Network (RN) that has shown impressive results on visual question answering tasks. Unfortunately, the implementation of the proposed approach was not public. To reproduce their experiments and extend their approach in the context of Information Retrieval, we had to re-implement everything, testing many parameters and conducting many experiments. Our implementation is now public on GitHub and it is already used by a large community of researchers. Furthermore, we recently presented a variant of the relation network module that we called Aggregated Visual Features RN (AVF-RN). This network can produce and aggregate at inference time compact visual relationship-aware features for the Relational-CBIR (R-CBIR) task. R-CBIR consists of retrieving images with given relationships among objects. In this paper, we discuss the details of our Relation Network implementation and report more experimental results than the original paper. Relational reasoning is a very promising topic for better understanding and retrieving inter-object relationships, especially in digital libraries.
Nicola Messina, Giuseppe Amato, Fabrizio Falchi

Actual Researcher Contribution (ARC) Versus the Perceived Contribution to the Scientific Body of Knowledge

The aim of this paper is to propose a new quantitative metric that can be used to measure the total actual researcher contribution (ARC) to a body of knowledge. The proposed ARC metric is a fair measure that is needed to address the abuse of research collaboration and issues arising from honorary authorship, which both lead to the inflation of the total number of published articles by a researcher. This inflation can provide misleading information about a researcher’s expertise and competence based on their perceived contribution. Research ranking agencies, database indexes, universities, and other decision makers can rely on the ARC metric to rank and evaluate university and researcher contributions to a body of knowledge and thus make more informed decisions and allocate research resources more efficiently.
Mohanad Halaweh

Cultural Heritage


Towards a Tool for Visual Link Retrieval and Knowledge Discovery in Painting Datasets

This paper presents a preliminary investigation aimed at developing a tool for visual link retrieval and knowledge discovery in painting datasets. The proposed framework is based on a deep convolutional network to perform feature extraction and on a fully-unsupervised nearest neighbor approach to retrieve visual links among digitized paintings. Moreover, the proposed method makes it possible to study influences among artists by means of graph analysis. The tool is intended to help art historians better understand visual arts.
Giovanna Castellano, Gennaro Vessio
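
The retrieval step described in the abstract above can be sketched as a nearest-neighbour search over feature vectors. The sketch assumes the features have already been extracted by a convolutional network and are held in a dictionary keyed by painting identifier; the names `cosine_similarity` and `nearest_links`, and the choice of cosine similarity, are illustrative assumptions rather than the authors' method.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors, chosen for this sketch
    return dot / (norm_u * norm_v)


def nearest_links(features, query_id, k=2):
    """Return the ids of the k paintings most similar to `query_id`.

    `features` maps a hypothetical painting id to its feature vector.
    """
    query = features[query_id]
    scored = [
        (cosine_similarity(query, vec), pid)
        for pid, vec in features.items()
        if pid != query_id
    ]
    # Highest similarity first; brute force is enough for a small example.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pid for _, pid in scored[:k]]
```

Each retrieved pair can be treated as an edge between two paintings, which is one way a graph for studying influences among artists could be assembled.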

Open Access

Identifying, Classifying and Searching Graphic Symbols in the NOTAE System

The use of graphic symbols in documentary records from the 5th to the 9th century has so far received scant attention. What we mean by graphic symbols are graphic signs (including alphabetical ones) drawn as a visual unit in a written text and representing something other or something more than a word of that text. The Project NOTAE represents the first attempt to investigate these graphic entities as a historical phenomenon from Late Antiquity to early medieval Europe in any written sources containing texts generated for pragmatic purposes (contracts, petitions, official and private letters, lists etc.). Identifying and classifying graphic symbols in such documents is a task that requires experience and knowledge of the field, but software applications can help by learning to recognize symbols from previously annotated documents and by suggesting to experts, for validation, potential symbols and their likely classification in newly acquired documents, thus easing the task. This contribution introduces the NOTAE system that, in addition to supporting the aforementioned task, provides non-expert users with tools to explore the documents annotated by experts.
Maria Boccuzzi, Tiziana Catarci, Luca Deodati, Andrea Fantoli, Antonella Ghignoli, Francesco Leotta, Massimo Mecella, Anna Monte, Nina Sietis

TindArt, an Experiment on User Profiling for Museum Applications

In this paper an Android application called TindArt is presented. It has been developed to investigate a way to profile users in cultural contexts, with a view to applying Recommender Systems to museum visits in the future. The purpose of the research also includes the study of the User Experience with TindArt to understand how it could be used in a real museum context. Two pilot studies are also presented.
Daniel Zilio, Nicola Orio, Camilla Toniolo

Recognition of Concordances for Indexing in Digital Libraries

We describe a system for the automatic transcription of books with concordances. Even if the recognition of printed text with OCR tools is nearly solved for high quality documents, the recognition of structured text, where dictionaries and other linguistic tools can be of little help, is still a difficult task. In this work, we propose to use several techniques for correcting the imperfect text recognized by the OCR software by taking into account both physical features of the documents and the redundancy of information implicit in concordances.
Simone Marinai, Samuele Capobianco, Zahra Ziran, Andrea Giuntini, Pierluigi Mansueto

Open Science


RepOSGate: Open Science Gateways for Institutional Repositories

Most repository platforms used to operate Institutional Repositories fail to deliver the complete set of functionalities required by institutions and researchers to fully comply with Open Science publishing practices. This paper presents RepOSGate, software that implements an overlay application capable of collecting metadata records from a repository and transparently delivering search, statistics, and Open Access version upload functionalities over an enhanced version of the metadata collection, which includes links to datasets, Open Access versions of the artifacts, links to projects from several funders, subjects, citations, etc. The paper will also present two instantiations of RepOSGate, used to enhance the publication metadata collections of two CNR institutes: Institute of Information Science and Technologies (ISTI) and Institute of Marine Sciences (ISMAR).
Michele Artini, Leonardo Candela, Paolo Manghi, Silvia Giannini

Training Data Stewards in Italy: Reflection on the FAIR RDM Summer School

The “Fair Research Data Management” Summer School in Parma focused on the skills gap in Italy for data stewards. A distinct feature of the Summer School was its aim to bring together participants from different backgrounds and from different countries. The paper is a reflection on the organization of the Summer School and the evaluation received from the participants.
Anna Maria Tammaro, Stefano Caselli

Creating Digital Cultural Heritage with Open Data: From FAIR to FAIR5 Principles

Art. 2 of the EU Council Conclusions of 21 May 2014 on cultural heritage as a strategic resource for a sustainable Europe (2014/C 183/08) states the existence of the new Digital Cultural Heritage (born digital and digitized). Starting from this assumption, we must rethink digitization, digitalization and digital transformation as recording and representing the processes of contemporary life cycles, no longer as simple tools to improve access to reality. So, we must define clear and homogeneous criteria to validate and certify what, within the contemporary digital magma, we can identify as Digital Cultural Heritage (DCH). This paper outlines a proposal in this direction, starting from the extension of the R (Reusable) requirement of the FAIR Principles to R5 by adding the requirements Readable, Relevant, Reliable and Resilient. These requirements should guide the design and creation of descriptive metadata in open format for indexing and managing digital cultural resources. The project “Terra delle Gravine between sharing economy and experiential tourism” served as a case study for testing this proposal. Three digital libraries of the municipal libraries of Massafra, Mottola and Grottaglie were designed and implemented by creating an open data schema for indexing and describing the digital resources.
Nicola Barbuti

Nanocitation: Complete and Interoperable Citations of Nanopublications

Nanopublication is a data publishing model with great potential for the representation of scientific results, allowing interoperability, data integration and exchange of scientific findings. However, this model suffers from the lack of an appropriate standard methodology to produce complete and interoperable citations providing both data identification and access. In this paper we introduce nanocitation, a framework to automatically get human-readable text-snippet and machine-readable citations of nanopublications.
Erika Fabris, Tobias Kuhn, Gianmaria Silvello

