
2011 | Book


Research and Advanced Technology for Digital Libraries

International Conference on Theory and Practice of Digital Libraries, TPDL 2011, Berlin, Germany, September 26-28, 2011. Proceedings

Edited by: Stefan Gradmann, Francesca Borri, Carlo Meghini, Heiko Schuldt

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

This book constitutes the refereed proceedings of the International Conference on Theory and Practice of Digital Libraries, TPDL 2011 (formerly known as ECDL, the European Conference on Research and Advanced Technology for Digital Libraries), held in Berlin, Germany, in September 2011. The 27 full papers, 13 short papers, 9 posters and 9 demos presented in this volume were carefully reviewed and selected from 162 initial submissions. In addition, the book contains the abstracts of 2 keynote speeches and an appendix with information on the doctoral consortium and the panel held at the conference. The papers are grouped in topical sections on networked information, semantics and interoperability, systems and architectures, text and multimedia retrieval, collaborative information spaces, DL applications and legal aspects, user interaction and information visualization, user studies, archives and repositories, Europeana, and preservation.

Table of Contents

Frontmatter

Keynotes

Paper, Pen and Touch

It has long been recognised by researchers that the affordances of paper are likely to ensure that it will continue to be in widespread use in the workplace, homes and public spaces. Consequently, numerous research projects have investigated ways of integrating paper with digital media and services. In recent years, much of this research has revolved around the digital pen and paper technology developed by the Swedish company Anoto, since it offers a robust solution for tracking the position of a pen on paper. While the commercial sector has tended to focus on applications related to the capture of handwriting, many of these research projects have investigated the use of the pen for real-time interaction and the possibility of turning paper into an interactive medium.

Researchers were also quick to realise that digital pen and paper technology could be adapted to support other forms of pen-based interaction and have developed digital whiteboards and tabletops based on the technology. In addition, some systems have combined the technology with touch devices to support bimanual pen and touch interfaces. In the case of document manipulation, this means that touch could be used to perform actions such as moving a document or turning pages, while the pen could be used to select elements within a document or to annotate it. Further, there are projects which have integrated the work on interactive paper and pen-based interaction on digital tabletops, investigating ways of allowing users to transfer document elements back and forth between paper and digital surfaces.

Despite the success of these research projects in demonstrating the capabilities of digital pen and paper technology and how it could be exploited to support a wide variety of everyday tasks, there are still some technical and non-technical issues that need to be addressed if there are to be major breakthroughs in terms of widespread adoption. The first part of the talk will review research in the field, while the second part will examine these issues and the way ahead.

Moira C. Norrie

The Futures of Digital Libraries: The Evolution of an Idea

The construction of digital libraries has certainly framed technological challenges, particularly with regard to various aspects of scale and the complexities of dealing with human languages, and has indeed given rise to substantial progress in these and other technical fields. But I believe that the greatest significance of digital libraries has been at a more profound intellectual level, inviting us to envision new kinds of environments for knowledge discovery, formulation, and dissemination, and new approaches to defining, managing and interacting with the cultural and intellectual record of our societies. We have repeatedly been forced to revisit questions of what constitutes a digital library, and how (indeed, even if) this differs from simply a collection of digitized or born-digital materials.

Clifford Lynch

Technical Sessions

Networked Information

Connecting Archival Collections: The Social Networks and Archival Context Project

This paper describes the Social Networks and Archival Context (SNAC) project, built on a database of merged Encoded Archival Context - Corporate Bodies, Persons, and Families (EAC-CPF) records derived from Encoded Archival Description (EAD) records held by the Library of Congress, the California Digital Library, the Northwest Digital Archives, and Virginia Heritage, combined with information from name authority files from the Library of Congress (Library of Congress Name Authority File), OCLC Research (The Virtual International Authority File), and the Getty Vocabulary Program (Union List of Artist Names). The database merges information from each instance of an individual name found in the EAD resources, along with variant names, biographical notes and their topical descriptions. The SNAC prototype interface makes this information searchable and browsable while retaining links to the various data sources.

Ray R. Larson, Krishna Janakiraman

How to Become a Group Leader? or Modeling Author Types Based on Graph Mining

Bibliographic databases are a rich field for data mining research and social network analysis. The representation and visualization of bibliographic databases as graphs and the application of data mining techniques can help us uncover interesting knowledge about how the publication records of authors evolve over time. In this paper we propose a novel methodology to model bibliographical databases as Power Graphs and mine them in an unsupervised manner, in order to learn basic author types and their properties through clustering. The methodology takes into account the evolution of co-authorship information, the volume of published papers over time, and the impact factors of the venues hosting the respective publications. As a proof of concept of the applicability and scalability of our approach, we present experimental results on the DBLP data.

George Tsatsaronis, Iraklis Varlamis, Sunna Torge, Matthias Reimann, Kjetil Nørvåg, Michael Schroeder, Matthias Zschunke

Find, New, Copy, Web, Page - Tagging for the (Re-)Discovery of Web Pages

The World Wide Web has a very dynamic character, with resources constantly disappearing and (re-)surfacing. A ubiquitous symptom is the “404 Page Not Found” error returned for requests to missing web pages. We investigate tags obtained from Delicious for the purpose of rediscovering such missing web pages with the help of search engines. We determine the best-performing tag-based query length, quantify the relevance of the results and compare tags to retrieval methods based on a page’s content. We find that tags are only useful in addition to content-based methods. We further introduce the notion of “ghost tags”: terms used as tags that do not occur in the current version of the web page but did occur in a previous version. One third of these ghost tags are ranked high in Delicious and also occurred frequently in the document, which indicates their importance to both the user and the content of the document.
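The ghost-tag definition above amounts to a set operation over page versions. The sketch below assumes each version is reduced to a set of lowercase terms by a simple regular-expression tokenizer; both `terms` and `ghost_tags` are hypothetical names for illustration, not the authors' implementation.

```python
import re


def terms(text: str) -> set[str]:
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ghost_tags(tags: set[str], previous_text: str, current_text: str) -> set[str]:
    """Tags that occur in a previous page version but not in the current one."""
    normalized = {t.lower() for t in tags}
    return (normalized & terms(previous_text)) - terms(current_text)


# Toy example: "recipes" was dropped from the page but is still used as a tag.
print(ghost_tags({"python", "tutorial", "recipes"},
                 "A Python tutorial with recipes",
                 "A Python tutorial"))  # {'recipes'}
```

A real pipeline would also need stemming and a way to obtain earlier page versions (e.g. from a web archive); the sketch leaves both out.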

Martin Klein, Michael L. Nelson

Semantics and Interoperability I

Mapping MPEG-7 to CIDOC/CRM

MPEG-7 is the dominant standard for multimedia content description; thus, audiovisual Digital Library contents should be described in terms of MPEG-7. Since there is a huge amount of audiovisual content in the cultural heritage domain, it is expected that many cultural heritage objects, as well as entities related to them (e.g. people, places, events), have been described using MPEG-7. On the other hand, the dominant standard in the cultural heritage domain is CIDOC/CRM; consequently, MPEG-7 descriptions cannot be directly integrated into cultural heritage digital libraries.

In this paper we present a mapping model and a system that allow the transformation of MPEG-7 descriptions into CIDOC/CRM descriptions, thus allowing the exploitation of multimedia content annotations in cultural heritage digital libraries. In addition, the proposed mapping model allows linking MPEG-7 descriptions to CIDOC/CRM descriptions in a Linked Data scenario.

Anastasia Angelopoulou, Chrisa Tsinaraki, Stavros Christodoulakis

A Language Independent Approach for Named Entity Recognition in Subject Headings

Subject heading systems are tools for the organization of knowledge that have been developed over the years by libraries. The Simple Knowledge Organization System (SKOS) has provided a practical way to represent subject heading systems using the Resource Description Framework, and several libraries have taken the initiative to make subject heading systems widely available as open linked data. Each individual subject heading describes a concept; however, in the majority of cases, one subject heading is actually a combination of several concepts, such as a topic bounded in geographical and temporal scopes. In these cases, the label of the concept actually carries several concepts which are not represented in structured form. Our work explores machine learning techniques to recognize the sub-concepts represented in the labels of SKOS subject headings. This paper describes a language-independent named entity recognition technique based on conditional random fields, a machine learning algorithm for sequence labelling. This technique was evaluated on a subset of the Library of Congress Subject Headings, where we measured the recognition of geographic concepts, topics, time periods and historical periods. Our technique achieved an overall F score of 0.98.
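For readers unfamiliar with the metric, the sketch below assumes the reported F score is the standard F-measure, which with beta = 1 is the harmonic mean of precision and recall. The function name `f_beta` is illustrative, and the inputs in the example are hypothetical, since the abstract does not report precision and recall separately.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure; beta = 1 gives the balanced F1 score (harmonic mean of P and R)."""
    if precision == 0 and recall == 0:
        return 0.0  # convention: define F as 0 when both P and R are 0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_beta(0.5, 1.0))  # ≈ 0.667
```

Because the harmonic mean is dominated by the smaller of the two values, a score as high as 0.98 requires both precision and recall to be close to 0.98.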

Nuno Freire, José Borbinha, Pável Calado

Towards Cross-Organizational Interoperability: The LIDO XML Schema as a National Level Integration Tool for the National Digital Library of Finland

The Finnish National Digital Library (NDL) project aims to improve the online accessibility and usability of digital content held by libraries, museums and archives. The lack of standardized metadata, together with numerous different collection management systems and an insufficient set of technical standards in the museum sector, led us to create a set of instructions and a template mapping for the Lightweight Information Describing Objects (LIDO) XML schema. The national LIDO schema for the museum sector described in our paper is unique both in its coverage of museum object types and in the number of institutions using it. A common schema presents heterogeneous metadata uniformly, thus enabling easy retrieval, browsing and versatile linking between different object types as well as data fields. In the pilot phase we mapped the three most commonly used Finnish collection management systems, with three different metadata formats, to the top-level LIDO schema.

Riitta Autere, Mikael Vakkari

Supporting FRBRization of Web Product Descriptions

The FRBR model has the potential for new services and discovery techniques for cultural items such as books, movies and music. In this paper, we present an approach to interpret descriptions found in Web resources and identify the FRBR entities these pertain to. To verify the resulting set of FRBR entities, we used Linked Open Data, and the verifications were validated by a group of experts. The results of this work demonstrate the applicability of FRBR in a new context and establish a firm basis for further exploitation.

Naimdjon Takhirov, Fabien Duchateau, Trond Aalberg

Systems and Architectures

Assessing Use Intention and Usability of Mobile Devices in a Hybrid Environment

During the last decades many information providers, such as libraries, have been collecting, organizing and delivering information in both print and digital format, forming a hybrid information environment. However, exploring a hybrid information environment does not result in a unified seeking experience that exploits the available resources most effectively. This paper aims to identify the main factors that influence the adoption of wireless, mobile devices (e.g., smartphones) as a means of integrating the information seeking process in hybrid environments. To this end, it presents a prototype system and an evaluation study that provides insight into the design of such services.

Spyros Veronikis, Giannis Tsakonas, Christos Papatheodorou

Digital Library 2.0 for Educational Resources

We report on focus group feedback regarding the services provided by existing education-related Digital Libraries (DL). Participants provided insight into how they seek educational resources online, and what they perceive to be the shortcomings of existing educational DLs. Along with useful content, social interactions were viewed as important supplements for educational DLs. Such interactions lead to both an online community and new forms of content such as reviews and ratings. Based on our analysis of the focus group feedback, we propose DL 2.0, the next generation of digital library, which integrates social knowledge with DL content.

Monika Akbar, Weiguo Fan, Clifford A. Shaffer, Yinlin Chen, Lillian Cassel, Lois Delcambre, Daniel D. Garcia, Gregory W. Hislop, Frank Shipman, Richard Furuta, B. Stephen Carpenter II, Haowei Hsieh, Bob Siegfried, Edward A. Fox

An Approach to Virtual Research Environment User Interfaces Dynamic Construction

Virtual Research Environments are internet-based working environments tailored to serve the needs of diverse and evolving user communities. These environments are oriented to promote new ways of dealing with modern research tasks. Their realization requires user interfaces that are dynamically built to provide their clients with organised views on the data and services aggregated to meet specific community needs. This paper presents an approach to the problem of dynamically constructing Virtual Research Environment user interfaces. This approach is characterized by user interfaces built through a component-oriented strategy and a heuristic for arranging user interface constituents on the screen. The implementation and exploitation of the proposed approach in the context of the EU-funded D4Science-II project are discussed, and future plans are presented.

Massimiliano Assante, Pasquale Pagano, Leonardo Candela, Federico De Faveri, Lucio Lelii

CloudCAP: A Case Study in Capacity Planning Using the Cloud

Emory University Library teamed with a commercial firm to develop a prototype system for using Amazon’s EC2 to properly size web application server deployment environments. This approach has been successfully applied both to high-transaction commercial environments with hundreds of thousands of users and to lower-transaction digital library environments with hundreds of users. Starting with the same EC2-based product, our goal was to assess whether a similar strategy is practical for an academic library as well as for commercial systems. We examined cloud configuration and deployment costs, test preparation and analysis, and the overall feasibility of this approach. Typically, for digital libraries, the user levels are significantly lower, the deployment costs are lower, and the return on investment (ROI) is not as immediately obvious. We conclude that the effort is worth the investment only (a) when there are significant repercussions from under-sizing a newly deployed digital library and (b) when sufficient engineering staff are on hand to develop and debug the deployment scenarios.

Joan A. Smith, John F. Owen, James R. Gray

Text and Multimedia Retrieval

Query Operators Shown Beneficial for Improving Search Results

Search engines allow users to retrieve documents with respect to a given query. They provide advanced search options, such as query operators (e.g., +term, term^10). Previous work studied how query operators are employed by end-users. In this paper, we study the extent to which using query operators may lead to improved results, regardless of specific users. We hypothesize that the proper use of query operators improves search results. To validate this hypothesis, we present a methodology relying on standard IR test collections. We applied this methodology to the TREC-7 and TREC-8 test collections with five IR models implemented in the Terrier search engine. Experiments show that queries enriched with operators improve effectiveness by up to 35.1% over regular queries. This result suggests that end-users would benefit from using operators more often.
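To make the two example operators concrete, the sketch below assumes Lucene-style semantics. Under that assumption, +term marks a term as required and term^10 multiplies the term's weight by 10. The `Clause` type, `parse_query` and the toy `score` function are hypothetical. They do not reflect Terrier's actual query parser or any of the five IR models evaluated in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    term: str
    required: bool = False  # set by a leading "+"
    weight: float = 1.0     # set by a trailing "^N"


def parse_query(query: str) -> list[Clause]:
    """Split on whitespace and interpret the "+term" and "term^N" operators."""
    clauses = []
    for token in query.split():
        required = token.startswith("+")
        if required:
            token = token[1:]
        weight = 1.0
        if "^" in token:
            token, _, boost = token.partition("^")
            weight = float(boost)  # raises ValueError for a malformed boost
        if token:  # skip stray operators such as a lone "+"
            clauses.append(Clause(token.lower(), required, weight))
    return clauses


def score(clauses: list[Clause], doc_terms: set[str]) -> Optional[float]:
    """Toy scoring: reject documents missing a required term, else sum matched weights."""
    if any(c.required and c.term not in doc_terms for c in clauses):
        return None
    return sum(c.weight for c in clauses if c.term in doc_terms)


query = parse_query("+digital library^10 preservation")
print(score(query, {"digital", "library"}))  # 11.0
```

A real retrieval model would combine these weights with term statistics such as tf-idf or BM25 rather than simply summing them. The sketch only shows how the operators change which documents qualify and how much each term counts.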

Gilles Hubert, Guillaume Cabanac, Christian Sallaberry, Damien Palacio

Evaluation Platform for Content-Based Image Retrieval Systems

In all subfields of information retrieval, test datasets and ground truth data are important tools for testing and comparing new search methods. This is also reflected in the image retrieval community, where several benchmarking activities have been created in past years. However, the number of available test collections is still rather small, and the existing ones are often limited in size or accessible only to the participants of benchmarking competitions. In this work, we present a new freely available large-scale dataset for the evaluation of content-based image retrieval systems. The dataset consists of 20 million high-quality images with five visual descriptors and rich, systematic textual annotations, a set of 100 test query objects, and semi-automatically collected ground truth data verified by users. Furthermore, we provide services that enable exploitation and collaborative expansion of the ground truth.

Petra Budikova, Michal Batko, Pavel Zezula

Music Video Redundancy and Half-Life in YouTube

YouTube is the largest, most popular video digital library in existence, and is quite possibly the most popular digital library regardless of format type. Furthermore, music videos are one of the primary applications of YouTube. Based on our experiences of linking to music videos in YouTube, we observed that while any single URI had a short half-life, music videos were always available at another URI. For this study we collected 1291 music videos and found that very few had zero or one copies in YouTube at any given time, and some had several thousand copies at any given time. Furthermore, individual URIs had a half-life of anywhere from 9 to 18 months, depending on the publication date and remaining commercial potential.
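The half-life figures above can be related to observed survival rates with a simple model. The sketch below assumes that URI loss follows exponential decay and estimates a half-life from the fraction of URIs still resolvable after a given period. The function name and the sample numbers are hypothetical, and this is not the estimation procedure reported in the paper.

```python
import math


def half_life(surviving_fraction: float, elapsed_months: float) -> float:
    """Half-life h under exponential decay, N(t) = N0 * 0.5 ** (t / h).

    Solving N(t) / N0 = surviving_fraction for h gives
    h = t * ln(0.5) / ln(surviving_fraction).
    """
    if not 0 < surviving_fraction < 1:
        raise ValueError("surviving_fraction must be strictly between 0 and 1")
    return elapsed_months * math.log(0.5) / math.log(surviving_fraction)


# Hypothetical numbers: 75% of sampled URIs still resolve after 6 months.
print(round(half_life(0.75, 6), 1))  # 14.5
```

Real link-rot data rarely follows a single exponential curve, which is consistent with the abstract's observation that half-life varies with publication date and remaining commercial potential.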

Matthias Prellwitz, Michael L. Nelson

Linguistic and Semantic Representation of the Thompson’s Motif-Index of Folk-Literature

We present ongoing work on the linguistic and semantic processing of the labels of Thompson’s Motif-Index of Folk-Literature, which was proposed by Stith Thompson for the classification of narrative elements in folk-literature. We automatically extracted the labels of an online version of the Index and wrote specialised grammars that provide a multi-layer linguistic annotation of them. We are currently working on enriching the linguistically annotated labels with semantic classes and relations, allowing better access to the content of the Index. With this resource, we expect to be able to semi-automatically annotate digitised literary works at the sub-document level by automatically comparing the annotated Index with the results of text processing tools applied to those works. This should contribute to better inter-textual interlinking and understanding of related works in folk-literature, offering a new way of semantically accessing digital libraries.

Thierry Declerck, Piroska Lendvai

Collaborative Information Spaces

WPv4: A Re-imagined Walden’s Paths to Support Diverse User Communities

The Walden’s Paths Project, as part of our philosophy of continual evaluation, actively seeks out user communities who may find our system to be of interest. In the past few years we noticed a recurring trend of user issues, needs, and sought-after features. In order to better support our users, we initiated a redesign of Walden’s Paths that not only solves these problems, but enables us to perform more rapid prototyping and experimentation of new features and interfaces. In order to accomplish these goals, we have created a web service that handles the storage, modification, and representation of our path data structures. This service is completely isolated from user interface layers, allowing many different interface designs to be implemented on top of the basic Walden’s Paths data structures. We also present several prototype interfaces - Marginalia, CoWPaths, Walden’s Drupal, PathCompiler v2, mWalden - that represent new areas in which we believe our ideas can be applied such as collaborative work, location-aware services, large educational databases, offline presentation, and mobile computing.

Paul Logasa Bogen II, Daniel Pogue, Faryaneh Poursardar, Yuangling Li, Richard Furuta, Frank Shipman

Understanding the Dynamic Scholarly Research Needs and Behavior as Applied to Social Reference Management

We conducted a study to learn more about the dynamic information needs, information-seeking behavior, information use and other scholarly activities of researchers. Our focus was on collaborative and social usage and on social reference managers. We compared the current practices and strategies of scholars and researchers from multidisciplinary research areas. Our findings provide valuable insights and augment the understanding of how the social web is significantly affecting researchers’ current activities and digital libraries.

Hamed Alhoori, Richard Furuta

Experiment and Analysis Services in a Fingerprint Digital Library for Collaborative Research

Fingerprint management systems support millions of images and complicated but imperfect image identification algorithms. The forensic community requires a set of digital library services to support large image collections, execute identification algorithms, and analyze experiments that test identification algorithms in development. We present a model and prototype system capable of testing and analyzing fingerprinting algorithms in terms of identification performance based on matches of a known image to partial images, distortions of the images, and sub-regions of the images. These services are provided based on our framework for composing a set of services and a fingerprint image collection. The prototype will be useful in collaborations connecting several algorithm development efforts, and in composing an experimentation workflow. We also describe extensions of these services into other domains.

Sung Hee Park, Jonathan P. Leidig, Lin Tzy Li, Edward A. Fox, Nathan J. Short, Kevin E. Hoyle, A. Lynn Abbott, Michael S. Hsiao

DL Applications and Legal Aspects

A Novel Combined Term Suggestion Service for Domain-Specific Digital Libraries

Interactive query expansion can assist users during their query formulation process. We conducted a user study with over 4,000 unique visitors and four different design approaches for a search term suggestion service. As a basis for our evaluation we implemented services that use three different vocabularies: (1) user search terms, (2) terms from a terminology service and (3) thesaurus terms. Additionally, we created a new combined service that utilizes thesaurus terms and terms from a domain-specific search term recommender. Our results show that the thesaurus-based method is clearly used more often than the other single-method implementations. We interpret this as a strong indicator that term suggestion mechanisms should be domain-specific to be close to the user terminology. Our novel combined approach, which interconnects a thesaurus service with additional statistical relations, outperformed all other implementations. All our observations show that domain-specific vocabulary can support the user in finding alternative concepts and formulating queries.

Daniel Hienert, Philipp Schaer, Johann Schaible, Philipp Mayr

Did They Notice? – A Case-Study on the Community Contribution to Data Quality in DBLP

Defective metadata is a significant problem of digital libraries. So far, automatic error detectors have been the focus of research interest. However, recent public projects have shown that patrons are willing to invest time to report errors if they are called to contribute. In this case study, we analyze the community contribution to error detection for DBLP, a public bibliographic collection. Our study is based on e-mails sent to the project between January 2007 and November 2010. We manually and automatically identify error reports and analyze their contribution to corrections of the DBLP collection. We show that users frequently report certain types of defects while others are ignored. The detection of homonym-name inconsistencies in particular strongly depends on user input. We also discuss who sends the reports and which communities are particularly active in this matter.

Florian Reitz, Oliver Hoffmann

A Comparative Study of Academic Digital Copyright in the United States and Europe

The advent of the Internet and digital media has added further complications to already complex copyright laws. This paper first summarizes the history of copyright laws in the United States and Europe. It then analyzes and compares the digital copyright laws as they are applied in higher education in the United States and major countries in Europe.

Robert J. Congleton, Sharon Q. Yang

User Interaction and Information Visualization

INVISQUE: Technology and Methodologies for Interactive Information Visualization and Analytics in Large Library Collections

When users know exactly what they are looking for, most library systems are adequate for their needs. However, when the user’s information needs are ill-defined, traditional library systems prove inadequate. This is because traditional library systems are designed for information retrieval rather than to support sense making. Visual analytics is the science of analytical reasoning facilitated by interactive visualizations, and visual analytics systems can support both sense making and information retrieval. In this paper, we present INVISQUE, an approach and experimental software for interactive visual search and query. INVISQUE uses an index card metaphor to display library content, organized in a way that visually integrates attributes such as citations and publication date, making it easy to pick out the most recent and most cited papers. It uses design techniques such as focus+context to reveal relationships between documents, while avoiding the “what-was-I-looking-for?” problem.

B. L. William Wong, Sharmin (Tinni) Choudhury, Chris Rooney, Raymond Chen, Kai Xu

An Evaluation of Thesaurus-Enhanced Visual Interfaces for Multilingual Digital Libraries

In this paper, we describe a comparative user evaluation of two multilingual thesaurus-enhanced visual user interfaces, namely T-Saurus and Searchling, developed for digital libraries. In the study, 25 academic users carried out three search tasks on both user interfaces to the UNESCO digital portal, which holds 400,000 documents. It applied usability and affordance strength questionnaires, interviews, think-aloud protocols, and direct observation to investigate users’ evaluation of the key components of both user interfaces, namely the multilingual features and the thesaurus and search functions. The empirical data gathered will be useful for designers of search interfaces that use thesaurus and multilingual features. Results of the study show that users were able to successfully carry out the search tasks using thesaurus-enhanced search interfaces. However, they preferred Searchling for its flexible language option, thesaurus browsing and visualization.

Ali Shiri, Stan Ruecker, Lindsay Doll, Matthew Bouchard, Carlos Fiorentino

Multilingual Adaptive Search for Digital Libraries

We describe a framework for Adaptive Multilingual Information Retrieval (AMIR) which allows multilingual resource discovery and delivery using on-the-fly machine translation of documents and queries. Result documents are presented to the user in a contextualised manner. Challenges and affordances of both adaptive and multilingual IR, with a particular focus on digital libraries, are detailed. The framework components are motivated by a series of results from experiments on query logs and documents from The European Library. We conclude that factoring adaptivity and multilinguality aspects into the search process can enhance the user’s experience with online digital libraries.

M. Rami Ghorab, Johannes Leveling, Séamus Lawless, Alexander O’Connor, Dong Zhou, Gareth J. F. Jones, Vincent Wade

Making Sense in the Margins: A Field Study of Annotation

We report on three years of data collected in the field from students in graduate and undergraduate seminars at two universities. The students annotated texts for discussion in classes where hypertext and computer interfaces were core topics. The results of our analysis show how annotation style changes with a combination of experience and study of material related to annotation. Our major conclusions are that scholarly user-readers annotate for essentially six purposes, and that support for textual glosses is a necessary part of any successful annotation technology for such use. Our study suggests tools that will be appreciated by e-text users.

James Blustein, David Rowe, Ann-Barbara Graff

One of These Things Is Not Like the Others: How Users Search Different Information Resources

Transaction log analysis is common practice for understanding user behavior in both online databases and library catalogues. While there has been significant work in each of these domains, there is little work comparing user queries between library catalogues and online resources. In this paper we report on an exploratory comparison between searches performed via the same interface in three different search systems: a library catalogue, an online research database, and Google Scholar.

Dana McKay, George Buchanan

Semantics and Interoperability II

Understanding Documentary Practice: Lessons Learnt from the Text Encoding Initiative

How are definitions of content and the design of digital documents being determined in practice? In this paper the authors present the relationship between document encoder and document as the central unit of analysis in a framework for making sense of documentary practice at community, organisational and implementation levels. The paper presents the integrated findings from a global survey of document encoders participating in the Text Encoding Initiative, providing important insights into the characteristics of an emergent documentary practice. By focusing on documentation as a field of practice, the paper reveals a rich and generative practice at play and provides valuable lessons for other complex metadata and markup initiatives.

Paul Scifleet, Susan P. Williams

Linking FRBR Entities to LOD through Semantic Matching

In this paper, we present an approach to automatically link FRBR works identified in metadata to the corresponding entity in Linked Open Data resources. The main contribution is a basis for semantic enrichment and verification of works identified in existing metadata. Through experiments, we demonstrate that FRBR works can be identified in the LOD cloud, which provides a solid ground for further work.

Naimdjon Takhirov, Fabien Duchateau, Trond Aalberg

Interactive Vocabulary Alignment

In many heritage institutes, objects are routinely described using terms from predefined vocabularies. When object collections need to be merged or linked, the question arises how those vocabularies relate. In practice, it is often unclear to data providers how well alignment tools will perform on their specific vocabularies. This creates a bottleneck in aligning vocabularies, as data providers want tight control over the quality of their data. We will discuss the key limitations of current tools in more detail and propose an alternative approach. We will show how this approach has been used in two alignment use cases, and demonstrate how it is currently supported by our Amalgame alignment platform.

Jacco van Ossenbruggen, Michiel Hildebrand, Victor de Boer

User Studies

The Impact of Distraction in Natural Environments on User Experience Research

Laboratories have long been seen as reasonable proxies for user experience research. Yet, this assumption may have become unreliable. The trend toward multiple activities in the users’ natural environment, where people simultaneously use a digital library, join a chat or read an incoming Facebook post, changes users’ behavior. The effects of these disruptions generate a gap that is generally not taken into account in user-experience research. This paper presents a psychological experiment that measured how differently people behave in a laboratory and in a natural environment setting. The existence and impact of distraction is measured in a standard laboratory setting and in a remote setting that explicitly allows users to work in their own natural environment. The data indicates that there are significant differences between results from the laboratory and natural environment setting. Distractions like email or chat influence the users’ performance and their ratings.

Elke Greifeneder

Search Behavior-Driven Training for Result Re-Ranking

In this paper we present a framework for improving the ranking learning process, taking into account the implicit search behaviors of users. Our approach is query-centric. That is, it examines the search behaviors induced by queries and groups together queries that induce similar behaviors, forming search behavior clusters. Then, it trains multiple ranking functions, each corresponding to one of these clusters. The trained models are finally combined to re-rank the results of each new query, taking into account the similarity of the query to each cluster. The main idea is that similar search behaviors can be detected and exploited for result re-ranking by representing results as feature vectors and clustering them. The experimental evaluation shows that our method improves the ranking quality of a state-of-the-art ranking model.

Giorgos Giannopoulos, Theodore Dalamagas, Timos Sellis
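The combination step described in this abstract can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each search behavior cluster is represented by a centroid of query features, that each per-cluster ranking function is a linear scorer over document features, and that query-to-cluster similarity is cosine similarity normalised to sum to one. The abstract does not specify any of these choices, so the helper names (`cluster_weights`, `rerank`) and the weighting scheme are illustrative only, not the authors' method.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def cluster_weights(query: Vector, centroids: List[Vector]) -> List[float]:
    """Non-negative query-to-cluster similarities, normalised to sum to 1."""
    sims = [max(cosine(query, c), 0.0) for c in centroids]
    total = sum(sims)
    if total == 0.0:
        # The query resembles no cluster: fall back to a uniform mix of rankers.
        return [1.0 / len(centroids)] * len(centroids)
    return [s / total for s in sims]


def rerank(query: Vector, centroids: List[Vector], rankers: List[Vector],
           docs: Dict[str, Vector]) -> List[str]:
    """Order doc ids by a similarity-weighted sum of per-cluster linear scores."""
    weights = cluster_weights(query, centroids)
    scores = {
        doc_id: sum(w * dot(ranker, feats) for w, ranker in zip(weights, rankers))
        for doc_id, feats in docs.items()
    }
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

For a query whose features match only the first cluster, only the first ranker contributes: with centroids and rankers `[[1, 0], [0, 1]]` and documents `{"a": [0.9, 0.1], "b": [0.2, 0.8]}`, the query `[1, 0]` ranks `"a"` first, while `[0, 1]` ranks `"b"` first.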

An Organizational Model for Digital Library Evaluation

Evaluation is a central digital library practice. It provides important data for managing digital libraries and informing strategic decision-making. Digital library evaluation and management are organizational as well as technical practices. What evaluation models can account for these organizational factors, in practice as well as in theory? To address this question, this paper integrates two models, one from the organizational literature (Porter’s value chain) and one from the evaluation literature (evaluation logic models), into a generic, flexible and extensible evaluation model that supports the goal-oriented evaluation and management of digital libraries in specific sociotechnical contexts. A case study is provided.

Michael Khoo, Craig MacDonald

Developing National Digital Library of Albania for Pre-university Schools: A Case Study

While the concept of the digital library (DL) is well understood and widely applied in developed countries, it is still a big challenge for developing nations. There are great disparities, known as the digital divide, between developed and developing countries in terms of electronic resource funding, availability and accessibility. DLs, together with information retrieval (IR) systems, are believed to be an effective way to narrow the digital divide. This paper uses a real case to discuss the significance of developing a national-level digital library for pre-university schools in Albania, the economic and technological challenges of designing such an information system, and considerations for designing the digital library.

Xiaohua Li, Ardiana Sula

Archives and Repositories

DAR: Institutional Repository Integration in Action

The Digital Assets Repository (DAR) is a system developed at the Bibliotheca Alexandrina to manage the full lifecycle of a digital asset: its creation and ingestion, its metadata management, storage and archival, in addition to the necessary mechanisms for publishing and dissemination. In its third release, the system architecture has been revamped into a modular design built from best-of-breed components, in addition to defining a flexible content model for digital objects based on current standards and a focus on integrating DAR with different sources and applications. The goal of this paper is to demonstrate the building blocks of DAR as an example of a modern repository, and to discuss the challenges an institution faces in consolidating its assets and DAR’s answer to these challenges.

Youssef Mikhail, Noha Adly, Magdy Nagi

Linking Archives Using Document Enrichment and Term Selection

News, multimedia and cultural heritage archives are increasingly offering opportunities to create connections between their collections. We consider the task of linking archives: connecting an item in one archive to one or more items in other, often complementary archives. We focus on a specific instance of the task: linking items with a rich textual representation in a news archive to items with sparse annotations in a multimedia archive, where items should be linked if they describe the same or a related event. We find that the difference in textual richness of annotations presents a challenge and investigate two approaches: (i) to enrich sparsely annotated items with textually rich content; and (ii) to reduce rich news archive items using term selection. We demonstrate the positive impact of both approaches on linking to same events and linking to related events.

Marc Bron, Bouke Huurnink, Maarten de Rijke
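The abstract does not say how terms are selected when reducing rich news items. As one possible illustration, the sketch below reduces a tokenised article to its k highest-weighted terms under a smoothed TF-IDF score computed against a reference corpus. TF-IDF is a stand-in chosen here, not necessarily the authors' selection criterion, and `select_terms` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import List


def select_terms(doc: List[str], corpus: List[List[str]], k: int) -> List[str]:
    """Return the k terms of `doc` with the highest smoothed TF-IDF weight."""
    n_docs = len(corpus)
    # Document frequency: in how many corpus documents each term occurs.
    df = Counter()
    for other in corpus:
        df.update(set(other))
    tf = Counter(doc)

    def weight(term: str) -> float:
        # Smoothed IDF keeps the weight finite for terms absent from the corpus.
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
        return tf[term] * idf

    # Highest weight first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(tf, key=lambda t: (-weight(t), t))
    return ranked[:k]
```

In practice stop words would typically be removed before weighting; without that step a very frequent word such as "the" can still rank highly.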

Transformation of a Keyword Indexed Collection into a Semantic Repository: Applicability to the Urban Domain

In the information retrieval context, resource collections are frequently classified using thesauri. However, the limited semantics provided by thesauri restricts the collection search and browsing capabilities. This work focuses on improving these capabilities by transforming a set of resources indexed according to a thesaurus into a semantically tagged collection. The core mechanism for building this collection is based on the conversion of the domain specific thesaurus (indexing the collection of resources) into a domain ontology connected to an upper level ontology. The feasibility of this work has been tested in the urban domain by transforming the resources accessible through the European Urban Knowledge Network into a Linked Data repository.

Javier Lacasta, Javier Nogueras-Iso, Jacques Teller, Gilles Falquet

Europeana

Improving Europeana Search Experience Using Query Logs

Europeana is a long-term project funded by the European Commission with the goal of making Europe’s cultural and scientific heritage accessible to the public. Since 2008, about 1500 institutions have contributed to Europeana, enabling people to explore the digital resources of Europe’s museums, libraries and archives. The huge amount of collected multilingual, multimedia data is made available today through the Europeana portal, a search engine allowing users to explore such content through textual queries. One of the most important techniques for enhancing users’ search experience in large information spaces is the exploitation of the knowledge contained in query logs. In this paper we present a characterization of the Europeana query log, showing statistics on common behavioral patterns of Europeana users. Our analysis highlights some significant differences between the Europeana query log and the historical data collected in general-purpose web search engine logs. In particular, we find that both query and search session distributions show different behaviors. Finally, we use this information to design a query recommendation technique aimed at enhancing the functionality of the Europeana portal.

Diego Ceccarelli, Sergiu Gordea, Claudio Lucchese, Franco Maria Nardini, Gabriele Tolomei

Implementing Enhanced OAI-PMH Requirements for Europeana

Europeana has stretched many established procedures in digital libraries, imposing requirements that are difficult to implement for many small institutions, which often lack dedicated systems support personnel. Although there are freely available open source software platforms that provide most of the commonly needed functionality, such as OAI-PMH support, migration from legacy software may not be easy, possible or desired. Furthermore, advanced requirements like selective harvesting according to complex criteria are not widely supported. To accommodate these needs and help institutions contribute their content to Europeana, we developed a series of tools. For the majority of small content providers that are running DSpace, we developed a DSpace plug-in to convert and augment the Dublin Core metadata according to Europeana ESE requirements. For sites with different software, incompatible with OAI-PMH, we developed wrappers enabling repeatable generation and harvesting of ESE-compatible metadata via OAI-PMH. In both cases, the system is able to select and harvest only the desired metadata records, according to a variety of configuration criteria of arbitrary complexity. We applied our tools to providers with sophisticated needs, and present the benefits they achieved.

Nikos Houssos, Kostas Stamatis, Vangelis Banos, Sarantos Kapidakis, Emmanouel Garoufallou, Alexandros Koulouris

Preservation

A Survey on Web Archiving Initiatives

Web archiving has been gaining interest and recognized importance for modern societies around the world. However, web archivists frequently find it difficult to demonstrate this fact, for instance, to funders. This study provides an updated and global overview of web archiving. The results showed that the number of web archiving initiatives grew significantly after 2003 and that they are concentrated in developed countries. We statistically analyzed metrics such as the volume of archived data, archive file formats and the number of people engaged. Taken together, web archives must process more data than any web search engine. Considering the complexity and large amounts of data involved in web archiving, the results showed that the assigned resources are scarce. A Wikipedia page was created to complement the presented work and to be collaboratively kept up to date by the community.

Daniel Gomes, João Miranda, Miguel Costa

Coherence-Oriented Crawling and Navigation Using Patterns for Web Archives

We point out, in this paper, the issue of improving the coherence of web archives under limited resources (e.g. bandwidth, storage space, etc.). Coherence measures how much a collection of archived page versions reflects the real state (or the snapshot) of a set of related web pages at different points in time. An ideal approach to preserving the coherence of archives is to prevent page content from changing during the crawl of a complete collection. However, this is practically infeasible because web sites are autonomous and dynamic. We propose two solutions: a priori and a posteriori. As an a priori solution, our idea is to crawl sites during off-peak hours (i.e. the periods of time in which very few changes are expected on the pages) based on patterns. A pattern models how the importance of page changes behaves over a period of time. As an a posteriori solution, based on the same patterns, we introduce a novel navigation approach that enables users to browse the most coherent page versions at a given query time.

Myriam Ben Saad, Zeynep Pehlivan, Stéphane Gançarski
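To make the a priori idea concrete, the following sketch assumes that a pattern can be reduced to a list of importance-weighted expected change rates, one per time slot of a daily cycle, and picks the contiguous crawl window (possibly wrapping past midnight) with the lowest total expected change. This slot-based representation and the `best_crawl_window` helper are assumptions for illustration; the paper's actual pattern model may differ.

```python
from typing import List, Tuple


def best_crawl_window(pattern: List[float], width: int) -> Tuple[int, float]:
    """Return (start_slot, expected_change) of the least-changing window.

    pattern[h] is a hypothetical importance-weighted change rate for slot h of
    a daily cycle; windows of `width` consecutive slots may wrap around.
    """
    n = len(pattern)
    if not 0 < width <= n:
        raise ValueError("width must be between 1 and len(pattern)")
    best_start, best_cost = 0, float("inf")
    # Sliding-window sum over a circular array: start with slots 0..width-1.
    cost = sum(pattern[:width])
    for start in range(n):
        if cost < best_cost:
            best_start, best_cost = start, cost
        # Slide by one slot: drop slot `start`, add slot `start + width` (mod n).
        cost += pattern[(start + width) % n] - pattern[start]
    return best_start, best_cost
```

With eight three-hour slots and a three-slot window, `best_crawl_window([5, 4, 1, 0, 0, 1, 3, 6], 3)` returns `(2, 1)`, i.e. the off-peak crawl would start in the third slot.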

Demo Sessions

The YUMA Media Annotation Framework

Annotations are a fundamental scholarly practice common across disciplines. They enable scholars to organize, share and exchange knowledge, and collaborate in the interpretation of source material. In this paper, we introduce the YUMA Media Annotation Framework, an ongoing open source effort to provide integrated collaborative annotation functionality for digital library portals and online multimedia collections. YUMA supports image, map, audio and video annotation and follows the OAC annotation model in order to provide data interoperability. A unique feature of YUMA is semantic enrichment, a mechanism that allows users to effortlessly augment annotations with links to contextually relevant resources on the Linked Data Web.

Rainer Simon, Joachim Jung, Bernhard Haslhofer

The Reading Desk: Supporting Lightweight Note-Taking in Digital Documents

When reading on paper, readers often write notes, fold corners or insert bookmarks without apparent conscious effort. Research into digital reading has discovered that electronic tools are far less intuitive, require significantly more attention, and are much less used. This paper introduces “The Digital Reading Desk” – a document reading interface that enhances existing digital reading interactions by adopting effective elements of paper interaction, and combining those with digital enhancements.

Jennifer Pearson, George Buchanan, Harold Thimbleby

Metadata Visualization in Digital Libraries

Readers in digital libraries (DLs) usually do not lack information; on the contrary, while browsing a DL they often struggle with too many documents. Searching and displaying search results appropriately therefore becomes important.

This demonstration shows an experimental interface that displays search results in two forms: textual (which readers are used to) and visual. Displaying search results as networks of similar documents, articles by the same author or articles with the same keywords often reveals new information.

The presented application is a web page with a Java applet that communicates with the rest of the page; it is integrated into the Czech Mathematics DL website.

Zuzana Nevěřilová

Archiv-Editor – Software for Personal Data: Demo-Presentation at the TPDL 2011

The Archiv-Editor is a multilingual desktop program for working with a Person Data Repository. It is developed as part of the DFG project Person Data Repository at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW). Researchers in the humanities can enter any data related to a person, from archives, books and other sources, into the Archiv-Editor offline, and store and exchange the data with colleagues via one or more Person Data Repositories. Information about a person is not entered into a form or table, but into an open text field, and is then marked up with a customizable markup based on the Text Encoding Initiative. As they do not require a specific structure of statements and information, the Person Data Repository and the Archiv-Editor are open to a wide variety of research projects in the humanities and offer the infrastructure to combine and integrate data from divergent fields and research perspectives.

Christoph Plutte

The MEKETREpository - Middle Kingdom Tomb and Artwork Descriptions on the Web

The MEKETREpository (MR) allows scholars to collect and publish artwork descriptions from Egypt’s Middle Kingdom (MK) period on the Web. Collaboratively developed vocabularies can be used for the semantic classification and annotation of uploaded media. This allows all users with system access to contribute their knowledge about the published artworks. All data, including annotations and vocabularies, are published as Linked Data and can be accessed and reused by others. This paper gives an overview of MR’s functionalities and the current state of our work.

Christian Mader, Bernhard Haslhofer, Niko Popitsch

NotreDAM, a Multi-user, Web Based Digital Asset Management Platform

In this work we present an overview of NotreDAM, an open source Digital Asset Management platform targeted at the mid-market segment. NotreDAM provides a web-based multi-user application environment for uploading, annotating, cataloguing, sharing, searching and retrieving digital resources such as video, audio, images and documents. NotreDAM’s main advantages are: XMP metadata support, user-defined workspaces and catalogs, scalable processing of resources, a scripting engine extensible through plugins, and a REST API for integration with third-party applications. The demo will showcase the capabilities of the platform through a typical user session.

Maurizio Agelli, Maria Laura Clemente, Mauro Del Rio, Daniela Ghironi, Orlando Murru, Fabrizio Solinas

A Text Technology Infrastructure for Annotating Corpora in the eHumanities

In this demonstration paper we present the current text technology infrastructure we have been establishing for annotating a corpus of baroque texts (in German) belonging to the genre "Danse Macabre" with linguistic and domain-specific information related to the personalized death. While the developed and assembled tools already cover the automatic treatment of various lexical aspects of such texts and also support the manual annotation of the corpus with concepts related to the personalized death, we are currently extending our work by integrating methods and tools for automating the annotation procedure. The goal of our project is to offer philologists, historians and the interested public improved access to this kind of corpus, allowing, for example, topic-based queries and navigation.

Thierry Declerck, Ulrike Czeitschner, Karlheinz Moerth, Claudia Resch, Gerhard Budin

An Application to Support Reclassification of Large Libraries

In this paper, we describe a software application that was developed and is successfully used at the Mannheim University Library to manually reclassify about 1 million books very efficiently, by supporting various working strategies and by using information from several sources.

Kai Eckert, Magnus Pfeffer

The Papyrus Digital Library: Discovering History in the News

Digital archives constitute a valuable asset for effective information retrieval. In many cases, however, the special vocabulary of an archive restricts its access to experts in the domain of the material it contains; as a result, researchers in other disciplines and the general public cannot take full advantage of the wealth of information it offers. To address this, the Papyrus research project has worked towards a solution that makes cross-discipline search possible in digital libraries. The developed prototype showcases this approach by demonstrating how history can be discovered in news archives. In this demo we focus on two of the end-user tools available in the prototype: the cross-discipline search and the Papyrus browser.

A. Katifori, C. Nikolaou, M. Platakis, Y. Ioannidis, A. Tympas, M. Koubarakis, N. Sarris, V. Tountopoulos, E. Tzoannos, S. Bykau, N. Kiyavitskaya, C. Tsinaraki, Y. Velegrakis

Poster Session

Digitization Practice in Latvia: Achievements and Trends of Development

The 1980s were characterized by rapid development of digitization and of digital library research. Latvia became involved in this process in 1994, with a first attempt to digitize high-demand materials in poor physical condition at the Latvian Academic Library. In 1998 the digitization process was launched at the National Library of Latvia. The study, whose first results are presented in this publication, was undertaken to analyze the history of digitization in Latvia and to evaluate the achievements of these activities. As the development of digitization has so far been poorly documented, the empirical sources are unpublished documents (project reports, working papers, etc.) as well as interviews with the staff of the first projects.

Liga Krumina, Baiba Holma

Digitizing All Dutch Books, Newspapers and Magazines - 730 Million Pages in 20 Years - Storing It, and Getting It Out There

In the next 20 years, the Dutch national library will digitize all printed publications since 1470, some 730M pages. To realize the first milestone of this ambition, the KB made deals with Google and Proquest to digitize 42M pages. To allow improved storage of this mass digitization output, the KB is now replacing its operational e-Depot, a system for permanent digital object storage, with a new solution. To meet user demand for centralized access, the KB is at the same time replacing its scattered full-text online portfolio with a National Platform for Digital Publications, both a content delivery platform for its mass digitization output and a national aggregator for publications. From 2011 onwards, this collaborative, open and scalable platform will be expanded with more partners, content and functionalities. The KB is also involved in setting up a Dutch cross-domain aggregator, enabling content exposure in Europeana.

Olaf D. Janssen

Design, Implementation and Evaluation of a User Generated Content Service for Europeana

The paper presents an overview of the user generated content service that the ASSETS Best Practice Network is designing, implementing and evaluating with users for Europeana, the European digital library. The service will allow Europeana users to contribute to the contents of the digital library in several ways, such as uploading simple media objects along with their descriptions, annotating existing objects, or enriching existing descriptions. The user and system requirements are outlined first and used to derive the basic principles underlying the service. A conceptual model of the entities required to realize the service and a general sketch of the system architecture are also given and used to illustrate the basic workflow of some important operations. Finally, the planned user evaluation is presented, aimed at validating the service before making it available to end users.

Nicola Aloia, Cesare Concordia, Anne Marie van Gerwen, Preben Hansen, Micke Kuwahara, Anh Tuan Ly, Carlo Meghini, Nicolas Spyratos, Tsuyoshi Sugibuchi, Yuzuru Tanaka, Jitao Yang, Nicola Zeni

Connecting Repositories in the Open Access Domain Using Text Mining and Semantic Data

This paper presents CORE (COnnecting REpositories), a system that aims to facilitate access to and navigation across scientific papers stored in Open Access repositories. This is being achieved by harvesting metadata and full-text content from Open Access repositories, by applying text mining techniques to discover semantically related articles, and by representing and exposing these relations as Linked Data. The information about associations between articles, expressed in an interoperable format, will enable the emergence of a wide range of applications. The potential of CORE can be demonstrated on two use cases: (1) improving the navigation capabilities of digital libraries by means of a CORE plugin, and (2) providing access to digital content from smartphones and tablet devices by means of the CORE Mobile application.

Petr Knoth, Vojtech Robotka, Zdenek Zdrahal

CloudBooks: An Infrastructure for Reading on Multiple Devices

Reading on light, portable devices such as iPads, whose reading angle is readily changed, is radically different from reading on a desktop or laptop. However, it would be naive to view this as mere evolution. Rather, such devices permit reading activity to more closely mirror paper. A light, keyboardless device can be used in many different locations and orientations. This paper reports on an infrastructure for supporting reading on multiple slate devices, using a single cloud-based system to provide for numerous configurations.

Jennifer Pearson, George Buchanan

Interconnecting DSpace and LOCKSS

Repository managers increasingly use toolkits such as DSpace to manage submission of and access to resources. However, DSpace does not support the highly desirable distributed replication functionality provided by LOCKSS. This paper describes an experiment to seamlessly interconnect DSpace and LOCKSS in a generalisable manner. An experimental prototype confirms that this is indeed possible, and that the interoperation can be efficient within the constraints of the systems.

Mushashu Lumpa, Ngoni Munyaradzi, Hussein Suleman

Encoding Diachrony: Digital Editions of Serbian 18th-Century Texts

Texts in the “Digital Library of Serbian Cultural Heritage of the 18th Century” are encoded as a word-aligned corpus of TEI XML documents in two versions: one using traditional 18th-century orthography, including the graphemes which have since disappeared from Serbian, and one using modernized and standardized Serbian spelling rules that increase the legibility and searchability of these texts for modern users. The corpus also contains linguistic and semantic annotations that add modern phonetic, morphological, lexical and conceptual equivalents to the largely archaic vocabulary. By applying basic techniques of cross-lingual information retrieval to a historical dimension of one language, and making provisions for multiple indexing and annotations, our project exposes a notoriously difficult chapter in the development of the Serbian language to a wider audience, without sacrificing the edition’s scholarly potential.

Toma Tasovac, Natalia Ermolaev

Panel Session

Cross-Border Extended Collective Licensing: A Solution to Online Dissemination of Europe’s Cultural Heritage?

An issue that has recently gained increased attention from legislators is how to stimulate the digitization and online availability of the collections held by libraries, museums and other cultural institutions (sometimes referred to as our “common heritage”) while at the same time giving full respect to established copyright norms. At European level, this attention is evident in the Digital Libraries Initiative, the Communication from the European Commission on Copyright in the Knowledge Economy, the Commission’s Digital Agenda for Europe and its recent Communication on a Single Market for Intellectual Property Rights. Inherent in these policy documents is the recognition that the new information technologies have created vast opportunities to make the common heritage of Europe more accessible for users online. It is also a shared belief that such access, if coherent with basic copyright principles, will be for the mutual benefit of users, right holders and society at large. In line with this, the Commission has supported the creation and development of a common access point for Europe’s cultural heritage, Europeana.

Johan Axhamn

Doctoral Consortium

An Investigation of ebook Lending in UK Public Libraries

This research aims to investigate ebook lending, management and procurement in UK public libraries. A mixed-method approach will be used to gain an understanding of how ebook lending is currently being achieved and to determine its effect on traditional library services. This research also proposes to run ebook reader lending trials in selected public libraries.

Christopher Gibson

Leveraging EAD in a Semantic Web Environment to Enhance the Discovery Experience for the User in Digital Archives

The proposed study investigates the information needs and information-seeking behavior of archival users. For this purpose, the ARGUS information system of the German Bundesarchiv and related reference questions are analyzed in a case study in order to model patterns of questions and search behavior in an ontology. This knowledge graph represents the knowledge that archival users expect from archival finding aids. It is being compared with the knowledge graph of archival finding aids encoded with the Encoded Archival Description (EAD) standard in order to identify semantic gaps. The aim is to find out whether information modeled in EAD matches archival users’ expectations, and to formulate a model and methodology that can be applied and validated in similar cases of digital archives in order to improve and facilitate access to archival information systems.

Steffen Hennicke

Content-Based Image Retrieval in Digital Libraries of Art Images Utilizing Colour Semantics

The paper presents the architecture of the experimental Content-Based Image Retrieval (CBIR) system APICAS ("Art Painting Image Colour Aesthetics and Semantics"). This system has been developed within a doctoral thesis that aims to provide a suite of specialized tools for CBIR within a digital library of art images. The high-level architecture suggested in this work takes OAIS as a basis and adds a designated layer to it, allowing CBIR functions to be used both during ingest and in access to the digital library.

Krassimira Ivanova

New Paradigm of Library Collaboration

The thesis entitled “New Paradigm of Library Collaboration” presents the case for a holistic approach to the issue of collaboration in a contemporary library. Patron needs and expectations with regard to collaboration, interactivity and ultimately participation are investigated in the specific area of changes in the reading process. Collaboration between librarians and patrons and among librarians is discussed with regard to the Library 2.0 and Enterprise 2.0 concepts. Based on research results gathered in European libraries, a new paradigm of library collaboration is presented as a must for an efficient library providing up-to-date services.

Adam Sofronijevic

Visual Aesthetics of Websites: The Visceral Level of Perception and Its Influence on User Behaviour

Website aesthetics has become an important research object in the domain of human-computer interaction during the last decade. Influences on acceptance and preference have been shown [1, 2]. Considering this quality aspect is also relevant for digital libraries, as it offers a way to appeal to users on an emotional level. An empirical study aims to test how the affective impact of website aesthetics influences approach and avoidance behaviour, thereby verifying the significance of this visceral level of perception. Building on this fundamental research, the applicability of affective reactions for evaluating website aesthetics could be investigated further.

Rita Strebe

Revealing Digital Documents

The research project aims at revealing common patterns that are used in data, independent from the particular technology in which the data is available. A better understanding of data patterns will not only help to better capture singular characteristics of data by metadata, but will also recover intended structures of digital objects.

Jakob Voß

Designing Highly Engaging eBook Experiences for Kids

The HEBE (Highly Engaging eBook Experiences) project aims to explore how children can be involved in the design and evaluation of novel eBook interfaces in order to make the reading experience more engaging for younger audiences.

Luca Colombo

Backmatter

Title: Research and Advanced Technology for Digital Libraries
Edited by: Stefan Gradmann, Francesca Borri, Carlo Meghini, Heiko Schuldt
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-24469-8
Print ISBN: 978-3-642-24468-1
DOI: https://doi.org/10.1007/978-3-642-24469-8