This book constitutes the thoroughly refereed proceedings of the 7th Italian Research Conference on Digital Libraries held in Pisa, Italy, in January 2011. The 20 revised full papers presented were carefully reviewed and cover topics of interest such as system interoperability and data integration; formal and methodological foundations of digital libraries; semantic web and linked data for digital libraries; multilingual information access; digital library infrastructures; metadata creation and management; search engines for digital library systems; evaluation and log data; handling audio/visual and non-traditional objects; user interfaces and visualization; digital library quality.



Selected Papers

Probabilistic Inference over Image Networks

Digital libraries contain collections of multimedia objects and provide services for their management, sharing and retrieval. The objects involved have two levels of complexity: the first refers to the inner complexity of each object, while the second concerns the implicit or explicit relationships among objects. Traditional machine learning classifiers ignore these relationships, assuming objects to be independent and identically distributed. Link-based classification methods, which classify objects by exploiting their relationships (links), have recently been proposed. In this paper, we deal with objects corresponding to digital images, although the proposed approach can naturally be applied to other kinds of multimedia objects. Relationships can be expressed among the features of the same image or among features belonging to different images. The aim of this work is to verify whether a link-based classifier based on a Statistical Relational Learning (SRL) language can improve the accuracy of a classical k-nearest neighbour approach. Experiments show that modelling the relationships in a real-world dataset with an SRL model reduces the classification error.
Claudio Taranto, Nicola Di Mauro, Floriana Esposito
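To make the comparison in this abstract concrete, here is a minimal Python sketch of a classical k-nearest-neighbour classifier over image feature vectors, together with a deliberately simple link-aware variant that adds votes from the labels of explicitly linked images. The link-aware function (`link_aware_predict` and its `link_weight` parameter) is a hypothetical heuristic introduced only for illustration; it is not the Statistical Relational Learning model evaluated by the authors. Feature vectors are assumed to be plain lists of numbers.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_labels(train, query, k):
    """Labels of the k training examples closest to the query.

    train is a list of (feature_vector, label) pairs.
    """
    ranked = sorted(train, key=lambda item: euclidean(item[0], query))
    return [label for _, label in ranked[:k]]


def knn_predict(train, query, k=3):
    """Classical k-NN: majority vote among the k nearest neighbours."""
    votes = Counter(nearest_labels(train, query, k))
    return votes.most_common(1)[0][0]


def link_aware_predict(train, query, linked_labels, k=3, link_weight=1.0):
    """Hypothetical link-aware variant (NOT the paper's SRL model).

    Starts from the k-NN votes, then adds link_weight votes for the label
    of every object explicitly linked to the query image.
    """
    votes = Counter()
    for label in nearest_labels(train, query, k):
        votes[label] += 1.0
    for label in linked_labels:
        votes[label] += link_weight
    return votes.most_common(1)[0][0]


# Toy data: 2-D feature vectors for two image classes.
train = [([0, 0], "a"), ([0, 1], "a"), ([5, 5], "b"), ([5, 6], "b"), ([6, 5], "b")]
print(knn_predict(train, [1, 1]))                                   # a
print(link_aware_predict(train, [1, 1], linked_labels=["b", "b"]))  # b
```

With `link_weight=0`, the link-aware variant reduces to plain k-NN voting, which makes it easy to see how much the links alone change a decision.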

A Keyphrase-Based Paper Recommender System

Current digital libraries suffer from the information overload problem, which hinders effective access to knowledge. This is particularly true for scientific digital libraries, in which a growing number of scientific articles can be explored by users with different needs, backgrounds, and interests. Recommender systems can tackle this limitation by filtering resources according to specific user needs. This paper introduces a content-based recommendation approach for enhancing access to scientific digital libraries, in which a keyphrase extraction module is used to produce a rich description of both the content of papers and user interests.
Felice Ferrara, Nirmala Pudota, Carlo Tasso
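The matching step of a content-based recommender like the one described here can be sketched as follows, assuming that keyphrase extraction has already produced a weighted keyphrase set for each paper. In this sketch the user profile is built by summing the keyphrase weights of papers the user liked, and candidate papers are ranked by cosine similarity to that profile. The function names, the summing strategy and the cosine measure are illustrative assumptions, not the authors' actual model.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse keyphrase-weight dictionaries."""
    dot = sum(w * q[k] for k, w in p.items() if k in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def build_profile(liked_papers):
    """Hypothetical profile: sum the keyphrase weights of papers the user liked."""
    profile = {}
    for keyphrases in liked_papers:
        for phrase, weight in keyphrases.items():
            profile[phrase] = profile.get(phrase, 0.0) + weight
    return profile


def recommend(profile, candidates, top_n=2):
    """Return ids of the top_n candidates most similar to the profile.

    Candidates with zero similarity (no shared keyphrase) are never returned.
    """
    scored = [(pid, cosine(profile, kp)) for pid, kp in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pid for pid, score in scored[:top_n] if score > 0]


# Keyphrases are assumed to come from an upstream extraction step.
profile = build_profile([{"digital library": 1.0, "recommender system": 0.8}])
candidates = {
    "p1": {"recommender system": 0.9, "user model": 0.5},
    "p2": {"hidden markov model": 1.0},
    "p3": {"digital library": 0.7, "metadata": 0.3},
}
print(recommend(profile, candidates))  # ['p3', 'p1']
```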

Accessing Music Digital Libraries by Combining Semantic Tags and Audio Content

An interesting problem in accessing music digital libraries is how to combine information from different sources in order to improve retrieval effectiveness. This paper introduces an approach that represents a collection of tagged songs through a hidden Markov model, with the purpose of developing a system that merges acoustic similarity and semantic descriptions in the same framework. The former provides content-based information on song similarity, the latter provides context-aware information about individual songs. Experimental results show that the proposed model outperforms approaches that rank songs using either a single information source or a linear combination of the two.
Riccardo Miotto, Nicola Orio
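One way to merge the two sources in a single hidden Markov model is sketched below under stated assumptions: each song is a hidden state, transition probabilities are obtained by row-normalizing a pairwise acoustic similarity matrix, emission probabilities are each song's tag weights, and a tag query is treated as an observation sequence decoded with the Viterbi algorithm. This is an illustrative sketch only; the actual model, its parameter estimation and the way it produces a ranked list are described in the paper itself.

```python
import math

EPS = 1e-9  # smoothing so that log() never sees an exact zero


def normalize_rows(matrix):
    """Turn a non-negative similarity matrix into row-stochastic transitions."""
    rows = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            rows.append([1.0 / len(row)] * len(row))  # no information: uniform
        else:
            rows.append([v / total for v in row])
    return rows


def viterbi_songs(similarity, tag_probs, query_tags):
    """Most likely sequence of songs (hidden states) for a sequence of query tags.

    similarity[i][j]: acoustic similarity between songs i and j (assumption)
    tag_probs[i]: dict tag -> probability that song i emits that tag (assumption)
    """
    n = len(tag_probs)
    trans = normalize_rows(similarity)

    def log_emit(i, tag):
        return math.log(tag_probs[i].get(tag, 0.0) + EPS)

    # Uniform initial distribution over songs.
    delta = [math.log(1.0 / n) + log_emit(i, query_tags[0]) for i in range(n)]
    backpointers = []
    for tag in query_tags[1:]:
        new_delta, pointers = [], []
        for j in range(n):
            scores = [delta[i] + math.log(trans[i][j] + EPS) for i in range(n)]
            best = max(range(n), key=lambda i: scores[i])
            new_delta.append(scores[best] + log_emit(j, tag))
            pointers.append(best)
        delta = new_delta
        backpointers.append(pointers)

    # Backtrack from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy collection: songs 0 and 1 sound alike; song 2 is acoustically distant.
similarity = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]]
tag_probs = [{"rock": 0.9, "guitar": 0.1}, {"rock": 0.2, "guitar": 0.8}, {"jazz": 1.0}]
print(viterbi_songs(similarity, tag_probs, ["rock", "guitar"]))  # [0, 1]
```

Here acoustic similarity decides which songs can follow each other, while tags decide which songs fit each query term, so both sources shape the decoded sequence.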

Improving User Stereotypes through Machine Learning Techniques

Users of digital libraries require more intelligent interaction functionality to satisfy their needs. From this perspective, the most important features are flexibility and the capability of adapting these functionalities to specific users. However, the main problem of current systems is their inability to support the different needs of individual users, due both to their inability to identify those needs and, more importantly, to an insufficient mapping of those needs to the available resources and services. The approaches considered in this paper tackle such problems by using Machine Learning techniques to adapt the set of user stereotypes, with the aim of modelling user interests and behaviour in order to provide the most suitable service. A purpose-designed simulation scenario was used to show the applicability of the proposal.
Teresa M. A. Basile, Floriana Esposito, Stefano Ferilli

Displaying Phonological Diachronic Changes through a Database Application

This paper presents a project that aims to provide a new digital instrument for linguistic research. The tool will be able to show the historical evolution of a language into one or more daughter languages, and it will allow users to perform a comparative and typological analysis of diachronic processes. The originality of this project lies in two factors: first, its developers are linguists with a background in computer science, which avoids the communication issues that arise between separate teams of experts; second, the data feeding the database, though derived from well-known corpora, have been processed in a specialist way to display the evolution of words from a mother language to its daughter languages. The instrument will account for all the diachronic phonological rules that apply as a word changes.
Marta Manfioletti, Mattia Nicchio

A Digital Library of Grammatical Resources for European Dialects

The paper illustrates the methodology underlying the design of a digital library system that enables the management of linguistic resources of curated dialect data. Since dialects are rarely recognized as official languages, linguists first of all need a dedicated information management system providing the unambiguous identification of each dialect on the basis of geographical, administrative and geolinguistic parameters. Secondly, the information management system has to be designed to allow users to search for the occurrences of a specific grammatical structure (e.g. a relative clause or a particular word order). Thirdly, user-friendly graphical interfaces must give easy access to language resources and make the building of these resources easier and distributed. This work, which stems from a project named ASIt (Atlante Sintattico d’Italia), is a first step towards the creation of a European digital library for recording and studying linguistic micro-variation.
Maristella Agosti, Birgit Alber, Giorgio Maria Di Nunzio, Marco Dussin, Diego Pescarini, Stefan Rabanus, Alessandra Tomaselli

Taxonomy Based Notification Service for the ASSETS Digital Library Platform

In this paper, we report on our taxonomy-based notification service for the ASSETS digital library platform, which is being developed in an EU co-funded project. Notification is a fundamental functionality for any living digital library that is continuously updated and interacts dynamically with users. The ASSETS platform provides a common notification service, and extensions of it, based on the publish/subscribe pattern as a message notification infrastructure. Our taxonomy-based notification service is one of those extensions: it enables users to define subscriptions for receiving notifications by using a hierarchically organized controlled vocabulary, namely a taxonomy. Through this service, users can easily subscribe to messages about specific domains of interest with a small number of taxonomy terms. The system can then efficiently filter a stream of published messages and deliver notifications to the proper subscribers by taking the taxonomy into account. This service is an important building block for various advanced features of the ASSETS platform, such as personalized new-item lists and a digital preservation service. We outline the ASSETS notification architecture and describe the model for taxonomy-based notification implemented in our service.
Jitao Yang, Tsuyoshi Sugibuchi, Nicolas Spyratos
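A minimal sketch of taxonomy-aware publish/subscribe matching follows. It assumes a tree-shaped taxonomy given as a child-to-parent map, subscriptions that are conjunctions of taxonomy terms, and a matching rule under which a message described by a narrower term also matches subscriptions on its broader terms. The class and method names are hypothetical, and the matching rule is an assumption rather than the exact model implemented in ASSETS.

```python
class Taxonomy:
    """Tree-shaped controlled vocabulary given as a child -> parent map."""

    def __init__(self, parent_of):
        self.parent_of = parent_of  # root terms map to None

    def ancestors_or_self(self, term):
        """The term itself plus every broader term up to the root."""
        result = set()
        while term is not None:
            result.add(term)
            term = self.parent_of.get(term)
        return result


class TaxonomyNotifier:
    """Hypothetical taxonomy-aware publish/subscribe matcher (illustrative only)."""

    def __init__(self, taxonomy):
        self.taxonomy = taxonomy
        self.subscriptions = {}  # subscriber -> frozenset of required terms

    def subscribe(self, subscriber, terms):
        self.subscriptions[subscriber] = frozenset(terms)

    def publish(self, message_terms):
        """Return, sorted, the subscribers to notify for a message.

        A message described by a narrow term is also "about" each broader term,
        so its description is expanded upwards before matching; a subscription
        matches when all of its terms occur in the expanded description.
        """
        expanded = set()
        for term in message_terms:
            expanded |= self.taxonomy.ancestors_or_self(term)
        return sorted(s for s, terms in self.subscriptions.items() if terms <= expanded)


taxonomy = Taxonomy({
    "music": None, "classical": "music", "baroque": "classical",
    "jazz": "music", "painting": None,
})
notifier = TaxonomyNotifier(taxonomy)
notifier.subscribe("alice", {"music"})
notifier.subscribe("bob", {"classical"})
notifier.subscribe("carol", {"jazz"})
notifier.subscribe("dave", {"music", "painting"})
print(notifier.publish({"baroque"}))           # ['alice', 'bob']
print(notifier.publish({"jazz", "painting"}))  # ['alice', 'carol', 'dave']
```

This sketch scans every subscription for each published message; an efficient filter of the kind the abstract mentions would instead index subscriptions by term so that only candidate subscribers are checked.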

SIAR: A User-Centric Digital Archive System

This paper presents the SIAR (Sistema Informativo Archivistico Regionale) project, supported by the Italian Veneto Region, whose aim is to design and develop a digital archive system. The main goal of the SIAR project is to develop a system for managing and sharing archive metadata in a distributed environment. In this paper we report the activities that led to the design and development of the SIAR system, underlining the fundamental role played by the user in this process. Indeed, in the SIAR project archival users provide continuous feedback that allows us to shape the system around their needs.
Maristella Agosti, Nicola Ferro, Andreina Rigon, Gianmaria Silvello, Erilde Terenzoni, Cristina Tommasi

Relevant Projects

ASIt: A Grammatical Survey of Italian Dialects and Cimbrian: Fieldwork, Data Management, and Linguistic Analysis

ASIt aims to observe, collect and analyse the linguistic variation displayed by the dialects of a language. The main theoretical hypothesis is that linguistic variation is not due to chance, but depends on the combination of a finite number of parameters. It is a first step towards the creation of a European digital library for recording and studying linguistic micro-variation.
Maristella Agosti, Birgit Alber, Paola Benincà, Giorgio Maria Di Nunzio, Marco Dussin, Riccardo Miotto, Diego Pescarini, Stefan Rabanus, Alessandra Tomaselli

ASSETS: Advanced Service Search and Enhancing Technological Solutions for the European Digital Library

ASSETS is a two-year project co-funded by the CIP Policy Support Programme. The main goal of the project is to improve the usability of Europeana (the European Digital Library) by developing, implementing and deploying large-scale services focusing on search, browsing and user interfaces. ASSETS also strives to make more digital items available on Europeana by involving content providers across different cultural environments.
Nicola Aloia, Cesare Concordia, Carlo Meghini

Computational Models Enhancing Semantic Access to Digital Repositories

The growing number of heterogeneous digital repositories has created a demand for effective and flexible techniques for automatic multimedia data retrieval. While the primary type of information available in documents is usually text, other types of information, such as images, play a very important role because they pictorially describe concepts dealt with in the document. Unfortunately, the semantic gap separating visual content from its underlying meaning is wide.
The main goal of the project is to investigate machine learning approaches that improve semantic access to multimedia repositories by combining information gathered from the textual content with information coming from the pictorial representation. Furthermore, these approaches have to be scalable, efficient and robust with respect to the inherent high dimensionality and noise of the data collection.
Floriana Esposito, Nicola Di Mauro, Claudio Taranto, Stefano Ferilli

The CULTURA Project: CULTivating Understanding and Research through Adaptivity

CULTURA aims at personalisation and community-aware adaptivity for Digital Humanities through the implementation of innovative adaptive services in an interactive environment. The intention is to offer genuine user empowerment and different levels of engagement with digital cultural heritage collections and communities.
Maristella Agosti, Nicola Orio

Project D.A.M.A.: Document Acquisition, Management and Archiving

A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing this information capture process is based on knowledge of the specific layout and logical structures of the documents. In this project we design a framework which combines technologies for the acquisition and storage of printed documents with knowledge-based techniques to represent and understand the information they contain. The innovative aspects of this work strengthen its applicability to tools that have been developed for building digital libraries.
Michelangelo Ceci, Corrado Loglisci, Stefano Ferilli, Donato Malerba

DDTA - Digitalisation of Districts in the Textile and Clothing Sector

The main goal of the project was the development of a District Service Center for the SMEs of the Textile and Clothing sector. In particular, it investigated the introduction of innovative technologies to improve process and product innovation in the sector. In this direction, the research unit's proposal consisted of introducing document processing and indexing techniques for document formats varying in both structure and content, with the aim of improving data exchange among companies and semantic content-based retrieval for the companies' real needs.
Floriana Esposito, Stefano Ferilli, Nicola Di Mauro, Teresa M. A. Basile, Marenglen Biba

DOMINUS plus - DOcument Management INtelligent Universal System (plus)

Activities of most organizations, and of universities in particular, involve the need to store, process and manage collections of different kinds of documents. Examples that require advanced solutions to such issues include the management of libraries, scientific conferences and research projects. DOMINUS plus is an open project born with the aim of harmonizing the Artificial Intelligence approaches developed at the LACAM laboratory with research on Digital Libraries in a general software backbone for document processing and management, extensible with ad-hoc solutions for specific problems and contexts (such as universities).
Stefano Ferilli, Floriana Esposito, Teresa M. A. Basile, Domenico Redavid, Incoronata Villani

Europeana v1.0

Europeana v1.0 is a Thematic Network project funded under the Commission’s eContentplus programme 2008 and is the successor to the EDLnet thematic network that created the EDL Foundation and the Europeana prototype. The goal of the project is to develop an operational service and solve key operational issues related to the implementation and functioning of the European Digital Library. The work will also include a business development operation to ensure that a steady stream of content is made available in the Digital Library.
Nicola Aloia, Cesare Concordia, Carlo Meghini

EuropeanaConnect

EuropeanaConnect delivers core components which are essential for the realisation of the European Digital Library (Europeana) as a truly interoperable, multilingual and user-oriented service for all European citizens.
Franco Crivellari, Graziano Deambrosis, Giorgio Maria Di Nunzio, Marco Dussin, Nicola Ferro

MBlab: Molecular Biodiversity Laboratory

Technologies in available biomedical repositories do not yet provide adequate mechanisms to support the understanding and analysis of the stored content. In this project we investigate this problem under different perspectives. Our contribution is the design of computational solutions for the analysis of biomedical documents and images. These integrate sophisticated technologies and innovative approaches of Information Extraction, Data Mining and Machine Learning to perform descriptive tasks of knowledge discovery from biomedical repositories.
Corrado Loglisci, Annalisa Appice, Michelangelo Ceci, Donato Malerba, Floriana Esposito

A Personalized Intelligent Recommender and Annotator TEStbed for Text-Based Content Retrieval and Classification: The PIRATES Project

This paper presents the PIRATES (Personalized Intelligent Recommender and Annotator TEStbed for text-based content retrieval and classification) Project. This project tackles the information overload problem by taking semantic and social issues into account: an integrated set of tools allows users to customize and personalize the way they retrieve, filter, and organize Web resources.
Felice Ferrara, Carlo Tasso

PROMISE – Participative Research labOratory for Multimedia and Multilingual Information Systems Evaluation

Measurement is key to scientific progress. This is particularly true for research concerning complex systems, whether natural or human-built. PROMISE will provide a virtual laboratory for conducting participative research and experimentation to carry out, advance and bring automation into the evaluation and benchmarking of complex multilingual and multimedia information systems.
Emanuele Di Buccio, Marco Dussin, Nicola Ferro, Ivano Masiero, Gianmaria Silvello

Cooperative Digital Asset Management in the Scientific Field: Strategies, Policies, Interoperability and Persistent Identifiers

In this paper we present a series of activities carried out within the National Research Council of Italy (CNR) and aimed at developing a unique, certified and open archive of CNR’s digital research products. Starting from a description of CNR’s distributed library system, we briefly discuss CNR’s involvement in OA initiatives and the role played by CNR’s Information System Office in providing technological tools for digital asset management. We then point out some critical issues of OA archives and describe the solution we propose for developing a unique, certified and open archive using a cooperative approach that takes into account previous experiences, existing repositories, policies and organizational issues. We also present the processes we designed for content ingestion and validation and the strategies for persistent identification. Finally, we illustrate the technical solutions we have developed as prototype proposals for the community.
Maurizio Lancia, Roberto Puccinelli, Massimiliano Saccone, Marco Spasiano, Luciana Trufelli


