2006 | Book

Research and Advanced Technology for Digital Libraries

10th European Conference, ECDL 2006, Alicante, Spain, September 17-22, 2006. Proceedings

Edited by: Julio Gonzalo, Costantino Thanos, M. Felisa Verdejo, Rafael C. Carrasco

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Table of Contents

Frontmatter

Architectures I

OpenDLibG: Extending OpenDLib by Exploiting a gLite Grid Infrastructure

This paper illustrates how an existing digital library system, OpenDLib, has been extended to exploit the storage and processing capability offered by a gLite Grid infrastructure. Thanks to this extension, OpenDLib can now handle a much wider class of documents than in its original version and, consequently, can serve a larger class of application domains. In particular, OpenDLib can manage documents that require huge storage capabilities, such as particular types of images, videos, and 3D objects, and can also create them on demand as the result of a computationally intensive elaboration on a dynamic set of data, yet with only a modest investment in computing resources.

Leonardo Candela, Donatella Castelli, Pasquale Pagano, Manuele Simi
A Peer-to-Peer Architecture for Information Retrieval Across Digital Library Collections

Peer-to-peer networks have been identified as a promising architectural concept for developing search scenarios across digital library collections. Digital libraries typically offer sophisticated search over their local content; however, search methods involving a network of such stand-alone components are currently quite limited. We present an architecture for highly efficient search over digital library collections based on structured P2P networks. As the standard single-term indexing strategy faces significant scalability limitations in distributed environments, we propose a novel indexing strategy: key-based indexing. The keys are term sets that appear in a restricted number of collection documents. Thus, they are discriminative with respect to the global document collection, and ensure scalable search costs. Moreover, key-based indexing computes posting list joins during indexing time, which significantly improves query performance. As search-efficient solutions usually imply costly indexing procedures, we present experimental results that show acceptable indexing costs while the retrieval performance is comparable to the standard centralized solutions with TF-IDF ranking.
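The idea of indexing only discriminative term sets can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `build_key_index`, the parameters `df_max` and `max_key_size`, and the toy documents are assumptions made for illustration, and the sketch runs on a single machine rather than a structured P2P network. It keeps every term set of bounded size that occurs in at most `df_max` documents, so that multi-term posting lists are already joined at indexing time.

```python
from collections import defaultdict
from itertools import combinations


def build_key_index(docs, df_max=1, max_key_size=2):
    """Map each discriminative key (a frozenset of terms) to its posting list.

    A key is kept only if it occurs in at most df_max documents
    (df_max and max_key_size are illustrative parameters).
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        terms = sorted(set(text.lower().split()))
        # Enumerate every term set of size 1..max_key_size in this document.
        for size in range(1, max_key_size + 1):
            for combo in combinations(terms, size):
                postings[frozenset(combo)].add(doc_id)
    # Drop keys that are too frequent to be discriminative.
    return {key: ids for key, ids in postings.items() if len(ids) <= df_max}


docs = {
    "d1": "digital library search",
    "d2": "digital library preservation",
    "d3": "peer search network",
}
index = build_key_index(docs, df_max=1)
```

In this toy collection the single term "digital" is too frequent to be a key, while the pair {"digital", "search"} identifies one document and is retained with its precomputed posting list.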

Ivana Podnar, Toan Luu, Martin Rajman, Fabius Klemm, Karl Aberer
Scalable Semantic Overlay Generation for P2P-Based Digital Libraries

The advent of digital libraries, along with the tremendous growth of digital content, calls for distributed and scalable approaches for managing vast data collections. Peer-to-peer (P2P) networks emerge as a promising solution to deal with these challenges. However, the lack of global content/topology knowledge in an unstructured P2P system demands unsupervised methods for content organization and necessitates efficient and high-quality search mechanisms. Towards this end, Semantic Overlay Networks (SONs) have been proposed in the literature, and in this paper, an unsupervised method for decentralized and distributed generation of SONs, called DESENT, is proposed. We prove the feasibility of our approach through analytical cost models and we show through simulations that, when compared to flooding, our approach improves recall by more than 3 to 10 times, depending on the network topology.

Christos Doulkeridis, Kjetil Nørvåg, Michalis Vazirgiannis

Preservation

Reevaluating Access and Preservation Through Secondary Repositories: Needs, Promises, and Challenges

Digital access and preservation questions for cultural heritage institutions have focused primarily on primary repositories — that is, around collections of discrete digital objects and associated metadata. Much of the promise of the information age, however, lies in the ability to reuse, repurpose, combine and build complex digital objects[1-3]. Repositories need both to preserve and make accessible primary digital objects, and facilitate their use in a myriad of ways. Following the lead of other annotation projects, we argue for the development of secondary repositories where users can compose structured collections of complex digital objects. These complex digital objects point back to the primary digital objects from which they are produced (usually with URIs) and augment these pointers with user-generated annotations and metadata. This paper examines how this layered approach to user generated metadata can enable research communities to move forward into more complex questions surrounding digital archiving and preservation, addressing not only the fundamental challenges of preserving individual digital objects long term, but also the access and usability challenges faced by key stakeholders in primary digital repository collections—scholars, educators, and students. Specifically, this project will examine the role that secondary repositories can play in the preservation and access of digital historical and cultural heritage materials with particular emphasis on streaming media.

Dean Rehberger, Michael Fegan, Mark Kornbluh
Repository Replication Using NNTP and SMTP

We present the results of a feasibility study using shared, existing, network-accessible infrastructure for repository replication. We utilize the SMTP and NNTP protocols to replicate both the metadata and the content of a digital library, using OAI-PMH to facilitate management of the archival process. We investigate how dissemination of repository contents can be piggybacked on top of existing email and Usenet traffic. Long-term persistence of the replicated repository may be achieved thanks to current policies and procedures which ensure that email messages and news posts are retrievable for evidentiary and other legal purposes for many years after the creation date. While the preservation issues of migration and emulation are not addressed with this approach, it does provide a simple method of refreshing content with unknown partners for smaller digital repositories that do not have the administrative resources for more sophisticated solutions.
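As a rough illustration of how a repository record might be packaged for transport over mail, the sketch below builds an OAI-PMH `GetRecord` request URL and wraps a record's metadata and content in an email message. It is an assumption-laden sketch, not the procedure used in the study: the helper names, the `X-OAI-Identifier` header, and the example addresses and identifiers are all hypothetical, and nothing is actually sent.

```python
from email.message import EmailMessage
from urllib.parse import urlencode


def get_record_url(base_url, identifier, prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL for one repository item."""
    query = urlencode(
        {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": prefix}
    )
    return f"{base_url}?{query}"


def package_record(identifier, metadata_xml, content, filename, sender, recipient):
    """Wrap metadata (message body) and content (attachment) in one email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Repository record {identifier}"
    msg["X-OAI-Identifier"] = identifier  # hypothetical header, for illustration
    msg.set_content(metadata_xml)
    msg.add_attachment(
        content, maintype="application", subtype="octet-stream", filename=filename
    )
    return msg


url = get_record_url("http://repo.example.org/oai", "oai:example.org:42")
msg = package_record(
    "oai:example.org:42",
    "<record><title>Sample item</title></record>",
    b"%PDF-1.4 sample bytes",
    "item42.pdf",
    "archive@example.org",
    "mirror@example.net",
)
```

The resulting message could be handed to any standard mail transport; a replica site would then recover the metadata from the body and the content from the attachment.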

Joan A. Smith, Martin Klein, Michael L. Nelson
Genre Classification in Automated Ingest and Appraisal Metadata

Metadata creation is a crucial aspect of the ingest of digital materials into digital libraries. The metadata needed to document and manage digital materials are extensive, and creating them manually is expensive. The Digital Curation Centre (DCC) has undertaken research to automate this process for some classes of digital material. We have segmented the problem and this paper discusses results in genre classification as a first step toward automating metadata extraction from documents. Here we propose a classification method built on looking at the documents from five directions: as an object exhibiting a specific visual format, as a linear layout of strings with characteristic grammar, as an object with stylo-metric signatures, as an object with intended meaning and purpose, and as an object linked to previously classified objects and other external sources. The results of some experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-facetted approach.

Yunhyong Kim, Seamus Ross

Retrieval

The Use of Summaries in XML Retrieval

The availability of the logical structure of documents in content-oriented XML retrieval can be beneficial for users of XML retrieval systems. However, research into structured document retrieval has so far not systematically examined how structure can be used to facilitate the search process of users. We investigate how users of an XML retrieval system can be supported in their search process, if at all, through summarisation. To answer this question, an interactive information retrieval system was developed and a study using human searchers was conducted. The results show that searchers actively utilise the provided summaries, and that summary usage varied at different levels of the XML document structure. The results have implications for the design of interactive XML retrieval systems.

Zoltán Szlávik, Anastasios Tombros, Mounia Lalmas
An Enhanced Search Interface for Information Discovery from Digital Libraries

Libraries, museums, and other organizations make their electronic contents available to a growing number of users on the Web. A large fraction of the information published is stored in structured or semi-structured form. However, most users have no specific knowledge of schemas or structured query languages for accessing information stored in (relational or XML) databases. Under these circumstances, the need for facilitating access to information stored in databases becomes increasingly important. Précis queries are free-form queries that, instead of simply locating and connecting values in tables, also consider information around these values that may be related to them. Therefore, the answer to a précis query might also contain information found in other parts of the database. In this paper, we describe a précis query answering prototype system that generates personalized presentation of short factual information précis in response to keyword queries.

Georgia Koutrika, Alkis Simitsis
The TIP/Greenstone Bridge: A Service for Mobile Location-Based Access to Digital Libraries

This paper introduces the first combination of a mobile tourist guide with a digital library. Location-based search allows for access to a rich set of materials with cross references between different digital library collections and the tourist information system. The paper introduces the system’s design and implementation; it also gives details about the user interface and interactions, and derives a general set of requirements through a discussion of related work.

Annika Hinze, Xin Gao, David Bainbridge

Architectures II

Towards Next Generation CiteSeer: A Flexible Architecture for Digital Library Deployment

CiteSeer began as the first search engine for scientific literature to incorporate Autonomous Citation Indexing, and has since grown to be a well-used, open archive for computer and information science publications, currently indexing over 730,000 academic documents. However, CiteSeer currently faces significant challenges that must be overcome in order to improve the quality of the service and guarantee that CiteSeer will continue to be a valuable, up-to-date resource well into the foreseeable future. This paper describes a new architectural framework for CiteSeer system deployment, named CiteSeer Plus. The new framework supports distributed indexing and storage for load balancing and fault-tolerance as well as modular service deployment to increase system flexibility and reduce maintenance costs. In order to facilitate novel approaches to information extraction, a blackboard framework is built into the architecture.

I. G. Councill, C. L. Giles, E. Di Iorio, M. Gori, M. Maggini, A. Pucci
Digital Object Prototypes: An Effective Realization of Digital Object Types

Digital Object Prototypes (DOPs) provide the DL designer with the ability to model diverse types of digital objects in a uniform manner while offering digital object type conformance; objects conform to the designer’s type definitions automatically. In this paper, we outline how DOPs effectively capture and express digital object typing information and finally assist in the development of unified web-based DL services such as adaptive cataloguing, batch digital object ingestion and automatic digital content conversions. In contrast, conventional DL services require custom implementations for each different type of material.

Kostas Saidis, George Pyrounakis, Mara Nikolaidou, Alex Delis
Design, Implementation, and Evaluation of a Wizard Tool for Setting Up Component-Based Digital Libraries

Although component-based architectures favor the building and extension of digital libraries, the configuration of such systems is not a trivial task. Our approach to simplify the tasks of constructing and customizing component-based digital libraries is based on an assistant tool: a setup wizard that segments those tasks into well-defined steps and drives the user along these steps. For generality purposes, the architecture of the wizard is based on the 5S framework and different wizard versions can be specialized according to the pool of components being configured. This paper describes the design and implementation of this wizard, as well as usability experiments designed to evaluate it.

Rodrygo L. T. Santos, Pablo A. Roberto, Marcos André Gonçalves, Alberto H. F. Laender

Applications

Design of a Digital Library for Early 20th Century Medico-legal Documents

The research value of important government documents to historians of medicine and law is enhanced by a digital library of such a collection being designed at the U.S. National Library of Medicine. This paper presents work toward the design of a system for preservation and access of this material, focusing mainly on the automated extraction of descriptive metadata needed for future access. Since manual entry of these metadata for thousands of documents is unaffordable, automation is required. Successful metadata extraction relies on accurate classification of key textlines in the document. Methods are described for the optimal scanning alternatives leading to high OCR conversion performance, and a combination of a Support Vector Machine (SVM) and Hidden Markov Model (HMM) for the classification of textlines and metadata extraction. Experimental results from our initial research toward an optimal textline classifier and metadata extractor are given.

George R. Thoma, Song Mao, Dharitri Misra, John Rees
Expanding a Humanities Digital Library: Musical References in Cervantes’ Works

Digital libraries focused on developing humanities resources for both scholarly and popular audiences face the challenge of bringing together digital resources built by scholars from different disciplines and subsequently integrating and presenting them. This challenge becomes more acute as libraries grow, both in terms of size and organizational complexity, making the traditional humanities practice of intensive, manual annotation and markup infeasible. In this paper we describe an approach we have taken in adding a music collection to the Cervantes Project. We use metadata and the organization of the various documents in the collection to facilitate automatic integration of new documents, establishing connection from existing resources to new documents as well as from the new documents to existing material.

Manas Singh, Richard Furuta, Eduardo Urbina, Neal Audenaert, Jie Deng, Carlos Monroy
Building Digital Libraries for Scientific Data: An Exploratory Study of Data Practices in Habitat Ecology

As data become scientific capital, digital libraries of data become more valuable. To build good tools and services, it is necessary to understand scientists’ data practices. We report on an exploratory study of habitat ecologists and other participants in the Center for Embedded Networked Sensing. These scientists are more willing to share data already published than data that they plan to publish, and are more willing to share data from instruments than hand-collected data. Policy issues include responsibility to provide clean and reliable data, concerns for liability and misappropriation of data, ways to handle sensitive data about human subjects arising from technical studies, control of data, and rights of authorship. We address the implications of these findings for tools and architecture in support of digital data libraries.

Christine Borgman, Jillian C. Wallis, Noel Enyedy

Methodology

Designing Digital Library Resources for Users in Sparse, Unbounded Social Networks

Most digital library projects reported in the literature build resources for dense, bounded user groups, such as students or research groups in tertiary education. Having such highly interrelated and well-defined user groups allows digital library developers to use existing design methods to gather and implement requirements from those groups. This paper, however, looks at situations where digital library resources are aimed at much more sparse, ill-defined networks of users. We report on a project which explicitly set out to ‘broaden access’ to tertiary education library resources to users not in higher education. In particular, we discuss the problem of gathering a priori user requirements when, by definition, we did not know who the users would be; we look at how disintermediation plays an even stronger negative role for sparse groups, and how we designed a system to replicate an intermediation role.

Richard Butterworth
Design and Selection Criteria for a National Web Archive

Web archives and Digital Libraries are conceptually similar, as they both store and provide access to digital contents. The process of loading documents into a Digital Library usually requires a strong intervention from human experts. However, large collections of documents gathered from the web must be loaded without human intervention. This paper analyzes strategies to select contents for a national web archive and proposes a system architecture to support it.

Daniel Gomes, Sérgio Freitas, Mário J. Silva
What Is a Successful Digital Library?

We synthesize diverse research in the area of digital library (DL) quality models, information systems (IS) success and adoption models, and information-seeking behavior models, to present a more integrated view of the concept of DL success. Such a multi-theoretical perspective, considering user community participation throughout the DL development cycle, supports understanding of the social aspects of DLs and the changing needs of users interacting with DLs. It also helps in determining when and how quality issues can be measured and how potential problems with quality can be prevented.

Rao Shen, Naga Srinivas Vemuri, Weiguo Fan, Edward A. Fox

Metadata

Evaluation of Metadata Standards in the Context of Digital Audio-Visual Libraries

Digital file-based libraries for the audio-visual material of television broadcasters and production houses are becoming desirable. These libraries not only address the problem of loss of content due to tape deterioration, but also improve disclosure of the content. However, switching to a digital file-based library involves many new concerns and problems for content providers. This paper will discuss one of them, namely the metadata. Metadata is additional information that is required in order to be able to search, retrieve, and play out the stored content. Different standards for metadata are currently available, each having its own field of application and characteristics. In this paper, we introduce an objective framework that one can use in order to select the appropriate metadata standard for its particular type of application. This framework is applied to four well-known metadata standards, namely Dublin Core, MPEG-7, P/Meta, and SMEF.

Robbie De Sutter, Stijn Notebaert, Rik Van de Walle
On the Problem of Identifying the Quality of Geographic Metadata

Geographic metadata quality is one of the most important factors in the performance of Geographic Digital Libraries. After reviewing previous attempts outside the geographic domain, this paper presents early results from a series of experiments for the development of a quantitative method for quality assessment. The methodology is developed in two phases. Firstly, a list of geographic quality criteria is compiled from several experts in the area. Secondly, a statistical analysis (a Principal Component Analysis) of a selection of geographic metadata record sets is performed in order to discover the features which correlate with good geographic metadata.

Rafael Tolosana-Calasanz, José A. Álvarez-Robles, Javier Lacasta, Javier Nogueras-Iso, Pedro R. Muro-Medrano, F. Javier Zarazaga-Soria
Quality Control of Metadata: A Case with UNIMARC

UNIMARC is a family of bibliographic metadata schemas with formats for descriptive information, classification, authorities and holdings. This paper describes the automation of quality control processes required in order to monitor and enforce quality of UNIMARC records. The results are accomplished by format schemas expressed in XML. This paper also describes the tools that take advantage of this technology to support the quality control processes, as well as its actual applications in services at the National Library of Portugal.

Hugo Manguinhas, José Borbinha

Evaluation

Large-Scale Impact of Digital Library Services: Findings from a Major Evaluation of SCRAN

This paper reports on an evaluation carried out on behalf of the Scottish Library and Information Council (SLIC) of a Scottish Executive initiative to fund a year’s use of a major commercial digital library service called SCRAN throughout public libraries in Scotland. The methodology used for investigating value for money aspects, content and nature of the service, users and usage patterns, the effects of intermediaries (staff in public libraries), the training of those intermediaries and project rollout is given. Conclusions are presented about SCRAN usage and user and public library staff reactions.

Gobinda Chowdhury, David McMenemy, Alan Poulter
A Logging Scheme for Comparative Digital Library Evaluation

Evaluation of digital libraries assesses their effectiveness, quality and overall impact. To facilitate the comparison of different evaluations and to support the re-use of evaluation data, we are proposing a new logging schema. This schema will allow for logging and sharing of a wide array of data about users, systems and their interactions. We discuss the multi-level logging framework presented in [19] and describe how the community can add to and gain from using the framework. The main focus of this paper is the logging of events within digital libraries on a generalised, conceptual level, as well as the services based on it. These services will allow diverse digital libraries to store their log data in a common repository using a common format. In addition they provide means for analysis and comparison of search history data.

Claus-Peter Klas, Hanne Albrechtsen, Norbert Fuhr, Preben Hansen, Sarantos Kapidakis, Laszlo Kovacs, Sascha Kriewel, Andras Micsik, Christos Papatheodorou, Giannis Tsakonas, Elin Jacob
Evaluation of Relevance and Knowledge Augmentation in Discussion Search

Annotation-based discussions are an important concept for today’s digital libraries and those of the future, containing additional information to and about the content managed in the digital library. To gain access to this valuable information, discussion search is concerned with retrieving relevant annotations and comments w.r.t. a given query, making it an important means to satisfy users’ information needs. Discussion search methods can make use of a variety of context information given by the structure of discussion threads. In this paper, we present and evaluate discussion search approaches which exploit quotations in different roles as highlight and context quotations, applying two different strategies, knowledge and relevance augmentation. Evaluation shows the suitability of these augmentation strategies for the task at hand; especially knowledge augmentation using both highlight and context quotations boosts retrieval effectiveness w.r.t. the given baseline.

Ingo Frommholz, Norbert Fuhr

User Studies

Designing a User Interface for Interactive Retrieval of Structured Documents — Lessons Learned from the INEX Interactive Track

The interactive track of the Initiative for the Evaluation of XML retrieval (INEX) aims at collecting empirical data about user interaction behaviour and building methods and algorithms for supporting interactive retrieval in digital library systems containing structured documents. In this paper we discuss and compare the usability aspects of the web-based user interface used in 2004 with the application-based user interface implemented with the Daffodil framework in 2005. The results include a validation of the element retrieval approach, successful implementation of the berrypicking model, and the finding that additional clues for facilitating interactive retrieval (e.g. table of contents, indication of entry points, related terms, etc.) are appreciated by users.

Saadia Malik, Claus-Peter Klas, Norbert Fuhr, Birger Larsen, Anastasios Tombros
“I Keep Collecting”: College Students Build and Utilize Collections in Spite of Breakdowns

As people become more and more involved with digital information, they grow proportionally involved in situated practices of collecting. They put together large sets of information elements. However, their attention to those information elements is limited. They use whatever means are at hand in order to form representations of their collections. They need to keep track of the elements in these collections, so they can use them later. We conducted a study with 20 college students. A major concern for the students during collection building was collection management and utilization, particularly as the size and number of their collections grows. They experienced breakdowns in these processes, yet continued to engage in collecting. They developed strategies such as informal metadata schemas and hierarchical organization to try to cope with their problems. We consider the practices observed, and their implications for the development of tools to support digital collection building and utilization. Collection representations that support cognition, collaboration, and semantic schemas are prescribed.

Eunyee Koh, Andruid Kerne
An Exploratory Factor Analytic Approach to Understand Design Features for Academic Learning Environments

Subjective relevance (SR) is defined as usefulness of documents for tasks. This paper enhances objective relevance and tackles its limitations by conducting a quantitative study to understand students’ perceptions of features for supporting evaluations of subjective relevance of documents. Data was analyzed by factor analysis to identify groups of features that supported students’ document evaluations during IR interaction stages and provide design guidelines for an IR interface supporting students’ document evaluations. Findings suggested an implied order of importance amongst groups of features for each interaction stage. The paper concludes by discussing groups of features, their implied order of importance, and support for information seeking activities to provide design implications for IR interfaces supporting SR.

Shu-Shing Lee, Yin-Leng Theng, Dion Hoe-Lian Goh, Schubert Shou-Boon Foo

Modeling

Representing Contextualized Information in the NSDL

The NSDL (National Science Digital Library) is funded by the National Science Foundation to advance science and math education. The initial product was a metadata-based digital library providing search and access to distributed resources. Our recent work recognizes the importance of context – relations, metadata, annotations – for the pedagogical value of a digital library. This new architecture uses Fedora, a tool for representing complex content, data, metadata, web-based services, and semantic relationships, as the basis of an information network overlay (INO). The INO provides an extensible knowledge base for an expanding suite of digital library services.

Carl Lagoze, Dean Krafft, Tim Cornwell, Dean Eckstrom, Susan Jesuroga, Chris Wilper
Towards a Digital Library for Language Learning

Digital libraries have untapped potential for supporting language teaching and learning. Although the Internet at large is widely used for language education, it has critical disadvantages that can be overcome in a more controlled environment. This article describes a language learning digital library, and a new metadata set that characterizes linguistic features commonly taught in class as well as textual attributes used for selection of suitable exercise material. Built on the system is a set of eight learning activities that together offer a classroom and self-study environment with a rich variety of interactive exercises, which are automatically generated from digital library content. The system has been evaluated by usability experts, language teachers, and students.

Shaoqun Wu, Ian H. Witten
Beyond Digital Incunabula: Modeling the Next Generation of Digital Libraries

This paper describes several incunabular assumptions that impose upon early digital libraries the limitations drawn from print, and argues for a design strategy aimed at providing customization and personalization services that go beyond the limiting models of print distribution, based on services and experiments developed for the Greco-Roman collections in the Perseus Digital Library. Three features fundamentally characterize a successful digital library design: finer granularity of collection objects, automated processes, and decentralized community contributions.

Gregory Crane, David Bamman, Lisa Cerrato, Alison Jones, David Mimno, Adrian Packel, David Sculley, Gabriel Weaver

Audiovisual Content

Managing and Querying Video by Semantics in Digital Library

Management of video data is an indispensable part of a digital library. However, most digital library systems currently only provide the functionality of retrieving video data by metadata, which cannot fulfill users’ requirements. This is due to the lack of an appropriate video semantic model and a powerful query interface. In this paper, we propose such a model, named SemTTE, together with its query language VSQL. The model incorporates features of temporal structure and typed events of video contents and organizes the whole video into a tree of events. It is implemented based on XML technology, with schema and instance mapped to DTD and XML documents, and queries transformed to XQuery for evaluation.

Yu Wang, Chunxiao Xing, Lizhu Zhou
Using MILOS to Build a Multimedia Digital Library Application: The PhotoBook Experience

The digital library field has recently been broadening its scope of applicability and is continuously adapting to the frequent changes occurring in the internet society. Accordingly, digital libraries are gradually moving from a controlled environment accessible only to professionals and domain experts, to environments accessible to casual users who want to exploit the potential offered by digital library technology. These new trends require, for instance, new search paradigms to be offered, new media content to be managed, and new description extraction techniques to be used.

Building digital library applications, and effectively adapting them to newly emerging trends, requires developing a platform that offers standard and powerful building blocks to support application developers. In this paper we discuss our experience of using MILOS, a multimedia content management system oriented to the construction of digital libraries, to build a demanding application dedicated to non-professional users. Specifically, we discuss the design and implementation of an on-line photo album (PhotoBook), which is a digital library application that allows people to manage their own photos, to share them with friends, and to make them publicly available and searchable.

PhotoBook uses a complex internal metadata schema (MPEG-7) and allows users to simply express complex queries (combining similarity search and fielded search), enabling them to retrieve material of interest even if metadata are imprecise or missing.

Giuseppe Amato, Paolo Bolettieri, Franca Debole, Fabrizio Falchi, Fausto Rabitti, Pasquale Savino
An Exploration of Space-Time Constraints on Contextual Information in Image-Based Testing Interfaces

Digital image collection interface layouts vary in the nature and degree of contextual information they provide to their users, thus enabling or impeding specific tasks. We are exploring image presentation techniques to support image-centric cognitive tasks in the context of cardiovascular systems research and education. To investigate the effect of image layout on user performance, we conducted an experimental evaluation of three image layouts for three representative tasks in this domain. The layouts varied the spatial and temporal presentation of images, thus providing different contextual information to the test subjects. Our results indicate that the degree of contextual information provided by the image layouts affected user performance, as did their research expertise. These results will inform the design of user interfaces for performing image-focused cognitive tasks as well as the development of interfaces for training novice researchers.

Unmil Karadkar, Marlo Nordt, Richard Furuta, Cody Lee, Christopher Quick

Language Technologies

Incorporating Cross-Document Relationships Between Sentences for Single Document Summarizations

Graph-based ranking algorithms have recently been proposed for single document summarizations and such algorithms evaluate the importance of a sentence by making use of the relationships between sentences in the document in a recursive way. In this paper, we investigate using other related or relevant documents to improve summarization of one single document based on the graph-based ranking algorithm. In addition to the within-document relationships between sentences in the specified document, the cross-document relationships between sentences in different documents are also taken into account in the proposed approach. We evaluate the performance of the proposed approach on DUC 2002 data with the ROUGE metric and results demonstrate that the cross-document relationships between sentences in different but related documents can significantly improve the performance of single document summarization.
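
The idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words cosine similarity, the PageRank-style update, and the names `rank_sentences` and `cross_weight` are all assumptions made for the example. Sentences from related documents join the graph, but edges that cross document boundaries are down-weighted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_sentences(target, related, cross_weight=0.5, damping=0.85, iterations=50):
    """Score the sentences of `target` (a list of strings). `related` is a list
    of documents (each a list of sentences) whose sentences vote through
    cross-document edges scaled by the assumed factor `cross_weight`."""
    docs = [list(target)] + [list(doc) for doc in related]
    sentences = [(doc_id, s) for doc_id, doc in enumerate(docs) for s in doc]
    bags = [Counter(s.lower().split()) for _, s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Edge weights: plain similarity inside a document, scaled across documents.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = cosine(bags[i], bags[j])
                same_doc = sentences[i][0] == sentences[j][0]
                weights[i][j] = sim if same_doc else cross_weight * sim

    # Recursive (PageRank-style) scoring by power iteration.
    out = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * weights[j][i] / out[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores[: len(docs[0])]
```

Calling `rank_sentences(doc, [other_doc])` returns one score per sentence of `doc`; a summary would then take the highest-scoring sentences. Setting `cross_weight=0` reduces the sketch to purely within-document ranking.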

Xiaojun Wan, Jianwu Yang, Jianguo Xiao
Effective Content Tracking for Digital Rights Management in Digital Libraries

A common way to protect content in digital libraries is to use digital watermarks and a DRM-based access-control environment. These methods, however, have limitations. Digital watermarks embedded in digital content could be removed by malicious users via post-processing, whereas DRM-based access-control solutions could be hacked. In this paper, we introduce a content tracking mechanism that we have built for multimedia-content near-replica detection as the second line of defense. The integrated framework aims to detect copyright infringements on the Internet, and combines the strengths of static rights enforcement and dynamic illegal content tracking. The issues of accuracy and huge computation cost in copy detection have been addressed by the introduced content-based techniques. Our experiments demonstrate the efficacy of the proposed copy detector.

Jen-Hao Hsiao, Cheng-Hung Li, Chih-Yi Chiu, Jenq-Haur Wang, Chu-Song Chen, Lee-Feng Chien
Semantic Web Techniques for Multiple Views on Heterogeneous Collections: A Case Study

Integrated digital access to multiple collections is a prominent issue for many Cultural Heritage institutions. The metadata describing diverse collections must be interoperable, which requires aligning the controlled vocabularies that are used to annotate objects from these collections. In this paper, we present an experiment where we match the vocabularies of two collections by applying the Knowledge Representation techniques established in recent Semantic Web research. We discuss the steps that are required for such matching, namely formalising the initial resources using Semantic Web languages, and running ontology mapping tools on the resulting representations. In addition, we present a prototype that enables the user to browse the two collections using the obtained alignment while still providing her with the original vocabulary structures.

Marjolein van Gendt, Antoine Isaac, Lourens van der Meij, Stefan Schlobach

Posters

A Content-Based Image Retrieval Service for Archaeology Collections

Archeological sites have heterogeneous information ranging from different artifacts, image data, geo-spatial information, chronological data, and other relevant metadata. ETANA-DL, an archaeology digital library, provides various services by integrating the heterogeneous data available in different collections. This demonstration presents an initial prototype for searching DL objects based on the image content, using the Content-Based Image Search Component (CBISC) from Virginia Tech/State University of Campinas.

Naga Srinivas Vemuri, Ricardo da S. Torres, Rao Shen, Marcos André Gonçalves, Weiguo Fan, Edward A. Fox
A Hierarchical Query Clustering Algorithm for Collaborative Querying

In this work, a hierarchical query clustering algorithm is designed and evaluated for the collaborative querying environment. The evaluation focuses on domain specific queries to better understand whether the algorithm meets the needs of users. Experiment results show that the proposed algorithm works well to cluster queries with good precision.

Lin Fu, Dion Hoe-Lian Goh, Schubert Shou-Boon Foo
A Semantics-Based Graph for the Bib-1 Access Points of the Z39.50 Protocol

A graph of Access Points can be used as a tool in a number of applications, such as clarification and better understanding of their semantics and inter-relations, query transformations, and more precise query formulation. We apply a procedure to create a graph of the Access Points, according to their subset relationship, based on the official semantics of the Bib-1 Access Points of the Z39.50 protocol. In our three-step method, we first construct the relationship graph of the Access Points by testing for a subset relationship between any two Access Points and assigning each Access Point a weight value designating the number of Access Points that are subsets of it. In the second step, we apply a topological sorting algorithm to the graph, and in the last step, we reject the redundant subset relationships of the Access Points.
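
The three steps described above can be sketched as follows. This is only an illustration under stated assumptions: each Access Point is modelled as a plain Python set, the Access Point names and their contents are invented toy data rather than the official Bib-1 semantics, and the function names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_subset_graph(access_points):
    """Step 1: record, for every Access Point, its proper supersets, and
    weight each Access Point by the number of Access Points that are subsets of it."""
    supersets = {name: set() for name in access_points}
    for a, items_a in access_points.items():
        for b, items_b in access_points.items():
            if a != b and items_a < items_b:  # proper-subset test on sets
                supersets[a].add(b)
    weights = {
        name: sum(name in sups for sups in supersets.values())
        for name in access_points
    }
    return supersets, weights


def topological_order(supersets):
    """Step 2: order the graph so that every superset precedes its subsets."""
    return list(TopologicalSorter(supersets).static_order())


def drop_redundant(supersets):
    """Step 3: remove a -> c whenever a -> b and b -> c already imply it."""
    reduced = {}
    for a, sups in supersets.items():
        implied = set()
        for b in sups:
            implied |= supersets[b]
        reduced[a] = sups - implied
    return reduced


# Toy data (invented for the example, not taken from the Bib-1 definition).
toy = {"any": {1, 2, 3, 4}, "name": {1, 2, 3}, "personal_name": {1, 2}, "title": {4}}
supersets, weights = build_subset_graph(toy)
print(weights)
print(topological_order(supersets))
print(drop_redundant(supersets))
```

In the toy data, `personal_name` is a subset of both `name` and `any`; step 3 keeps only the edge to `name`, because the edge to `any` follows from it.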

Michalis Sfakakis, Sarantos Kapidakis
A Sociotechnical Framework for Evaluating a Large-Scale Distributed Educational Digital Library

The National Science Digital Library (NSDL: http://www.nsdl.org) supports all levels of science, technology, engineering, and mathematics education. NSDL is conducting a program-wide evaluation of all its activities since 2000. The scale and complexity of the NSDL program pose significant challenges for this evaluation work. This poster outlines a sociotechnical theoretical framework, the ’resource lifecycle,’ that is being used to guide the evaluation of the NSDL program.

Michael Khoo
A Tool for Converting from MARC to FRBR

The FRBR model is considered by many to be an important contribution to the next generation of bibliographic catalogues, but a major challenge for the library community is how to apply this model to existing MARC-based bibliographic catalogues. This problem requires a solution for the interpretation and conversion of MARC records, and a tool for this kind of conversion has been developed as part of the Norwegian BIBSYS FRBR project. The tool is based on a systematic approach to the interpretation and conversion process and is designed to be adaptable to the rules applied in different catalogues.

Trond Aalberg, Frank Berg Haugen, Ole Husby
Adding User-Editing to a Catalogue of Cartoon Drawings

This paper describes an ongoing project to enable user-editing on an existing online database of about 120,000 British newspaper cartoons at the University of Kent. It describes the cartoon catalogue itself and then describes how the online search website has been extended to allow users to edit catalogue records in a way that should be both safe and economical. Finally, it discusses the next stage of the project, which is to experiment with ways to encourage users to become contributors.

John Bovey
ALVIS – Superpeer Semantic Search Engine – ECDL 2006 Demo Submission

ALVIS is a European project (IST-1-002068-STP) building a semantic-based peer-to-peer search engine. A consortium of eleven partners from six European Community countries, Finland, France, Sweden, Denmark, Spain, and Slovenia, plus Switzerland and China, contribute expertise in a broad range of specialities including network topologies, routing algorithms, probabilistic approaches to information retrieval, linguistic analysis and bioinformatics. The project runs from 1 January 2004 to 31 December 2006. Pointers to scientific papers and download sites for components can be found at http://www.alvis.info/.

Gert Schmeltz Pedersen, Anders Ardö, Marc Cromme, Mike Taylor, Wray Buntine
Beyond Error Tolerance: Finding Thematic Similarities in Music Digital Libraries

Current Music Information Retrieval (MIR) systems focus on melody-based retrieval with some tolerance for user errors in the melody specification. The system described here presents a novel method for theme retrieval: a theme is described as a list of musical events, containing melody and harmony features, which must be presented in a given order and within a given time frame. The system retrieves musical phrases that fit the description. A system of this type could serve musicians and listeners who wish to discover thematically similar phrases in music digital libraries. The prototype and underlying model have been tested on MIDI sequences of music by W.A. Mozart and have shown good performance results.
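
The matching rule in this abstract (events in a given order, within a given time frame) can be illustrated with a small sketch. It is not the authors' system: events are reduced to hypothetical `(time, label)` pairs, the labels are invented, and `find_theme` is an assumed name.

```python
def find_theme(events, theme, max_span):
    """Return (start, end) times of phrases in which the labels of `theme`
    occur in order, with the last one at most `max_span` after the first.

    `events` is a list of (time, label) pairs sorted by time. Other events
    may occur between the themed ones.
    """
    if not theme:
        return []
    matches = []
    for i, (start, label) in enumerate(events):
        if label != theme[0]:
            continue
        pos, end = 1, start
        # Greedily take the earliest occurrence of each remaining label;
        # this finishes as early as possible for the chosen start.
        for time, lab in events[i + 1:]:
            if pos == len(theme) or time - start > max_span:
                break
            if lab == theme[pos]:
                pos += 1
                end = time
        if pos == len(theme):
            matches.append((start, end))
    return matches


# Invented example: a tonic event followed by a dominant event within 2 time units.
events = [(0.0, "tonic"), (1.0, "leap_up"), (1.5, "dominant"), (4.0, "tonic"), (9.0, "dominant")]
print(find_theme(events, ["tonic", "dominant"], max_span=2.0))
```

The second "tonic" at time 4.0 does not produce a match, because the next "dominant" arrives 5 time units later, outside the allowed frame.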

Tamar Berman, J. Stephen Downie, Bart Berman
Comparing and Combining Two Approaches to Automated Subject Classification of Text

A machine-learning and a string-matching approach to automated subject classification of text were compared, as to their performance, advantages and downsides. The former approach was based on an SVM algorithm, while the latter comprised string-matching between a controlled vocabulary and words in the text to be classified. Data collection consisted of a subset from Compendex, classified into six different classes. It was shown that SVM on average outperforms the string-matching approach: our hypothesis that SVM yields better recall and string-matching better precision was confirmed only on one of the classes. The two approaches being complementary, we investigated different combinations of the two based on combining their vocabularies. The results have shown that the original approaches, i.e. machine-learning approach without using background knowledge from the controlled vocabulary, and string-matching approach based on controlled vocabulary, outperform approaches in which combinations of automatically and manually obtained terms were used. Reasons for these results need further investigation, including a larger data collection and combining the two using predictions.

Koraljka Golub, Anders Ardö, Dunja Mladenić, Marko Grobelnik
Concept Space Interchange Protocol: A Protocol for Concept Map Based Resource Discovery in Educational Digital Libraries

The Strand Map Service provides resource discovery in digital libraries using strand maps developed by the American Association for the Advancement of Science (AAAS), Project 2061. Strand maps are a special kind of concept map that contains interconnected learning goals organized along grade groups and topical strands. The Strand Map Service provides programmatic access to AAAS strand maps that can be used by educational digital libraries to dynamically build resource discovery interfaces. The programmatic access to strand maps is enabled by the Concept Space Interchange Protocol, which provides the following services: (1) service capability determination, (2) resource alignment, and (3) search and retrieval of dynamically generated strand maps. The protocol is implemented as a web service, and integration experiments have been performed for two educational digital libraries. In this poster we describe the Concept Space Interchange Protocol and its integration with educational digital libraries.

Faisal Ahmad, Qianyi Gu, Tamara Sumner
Design of a Cross-Media Indexing System

There is a lack of integrated technology that increases effective usage of the vast and heterogeneous multilingual and multimedia digital content. This need is being expressed insistently by end users and by professionals in the content business. The EU-IST Framework 6 Reveal-This (R-T) project aims at developing a complete and integrated content programming technology able to capture, semantically index, and categorise multimedia and multilingual digital content, whilst providing search, summarisation and translation functionalities. To this end, the project proposes an architectural unit called the Cross-Media Indexing Component (CMIC). CMIC leverages the individual potential of the indexing information generated by the analyzers of diverse modalities such as speech, text and image. It hypothesises that a system which combines and cross-analyses different high-level modal descriptions of the same audio-visual content will perform better at retrieval and filtering time. The initial prototype utilises the Multiple Evidence approach by establishing links among the modality-specific descriptions in order to depict topical similarity in the semantic textual space. This paper gives an overview of the project, CMIC’s enrichment approach and its support for retrieval.

Murat Yakıcı, Fabio Crestani
Desired Features of a News Aggregator Service: An End-User Perspective

Reports on what users experience when interacting with currently available news aggregator services. Five news aggregator services were chosen as the most representative of emerging trends in this area of research, and a combination of quantitative and qualitative methods was used for data collection involving users from the academic and research community. Forty-five responses were received for the online questionnaire survey, and 10 users were interviewed to elicit feedback. Criteria and measures for comparing usability of the chosen services were defined by the researchers based on the review of literature and a detailed study of the chosen news aggregator services. A number of desirable features of news aggregators were identified. Concluded that an ideal model could be designed by combining the usability features of TvEyes and the retrieval performance of GoogleNews.

Sudatta Chowdhury, Monica Landoni
DIAS: The Digital Image Archiving System of NDAP Taiwan

The Digital Image Archiving System (DIAS) was developed by the National Digital Archives Program, Taiwan. Its major purpose is to manage and preserve digital images of cultural artifacts and provide the images to external

DIAS uses the DjVu image technique to solve the speed and distortion problems that arise when browsing very large images on the Internet. It also provides an online, real-time visible watermark appending function for digital image copyright protection, and uses image copy detection techniques to track illegal duplication.

Currently, DIAS manages a vast number of digital images and can be integrated with metadata archiving systems to manage digital images and metadata as a complete digital archiving system. We are developing digital image data exchange, heterogeneous system integration, automatic image classification, and multimedia processing technologies to improve DIAS.

Hsin-Yu Chen, Hsiang-An Wang, Ku-Lun Huang
Distributed Digital Libraries Platform in the PIONIER Network

One of the main focus areas of the PIONIER: Polish Optical Internet program was the development and verification of pilot services and applications for the information society. It was necessary to create a base for new developments in science, education, health care, natural environment, government and local administration, industry and services. Examples of such services are digital libraries, which allow the creation of multiple content and metadata repositories that can be used as a basis for sophisticated content-based services. In this paper we present the current state of digital library services in the PIONIER network and briefly describe dLibra, a digital library framework that is the software platform for the majority of PIONIER digital libraries. We also introduce two content-based services enabled on PIONIER digital libraries: distributed metadata harvesting and searching, and virtual dynamic collections.

Cezary Mazurek, Tomasz Parkoła, Marcin Werla
EtanaCMV: A Visual Browsing Interface for ETANA-DL Based on Coordinated Multiple Views

Visual interfaces for digital libraries (DLs) provide support for insightful browsing, presentation of search results in a browsing platform, and analysis of records in the DL. We propose the demonstration of a visual interface to ETANA-DL, a growing union archaeological DL. Our interface EtanaCMV is based on a uniform multiple view design and facilitates browsing of DL records that are multidimensional, hierarchical, and categorical in nature. We use distinct panels to allow flexible browsing across multiple dimensions. Bars in each panel denote the various categories in each dimension. EtanaCMV will give the users a quick overview of the collections in the DL during browsing in addition to showing relationships in the harvested collections. Coordination between multiple views is used to present more insights into the data.

Johnny L. Sam-Rajkumar, Rao Shen, Naga Srinivas Vemuri, Weiguo Fan, Edward A. Fox
Intelligent Bibliography Creation and Markup for Authors: A Step Towards Interoperable Digital Libraries

The move towards integrated international Digital Libraries offers the opportunity of creating comprehensive data on citation networks. These data are not only invaluable pointers to related research, but also the basis for evaluations such as impact factors, and the foundation of smart search engines. However, creating correct citation-network data remains a hard problem, and data are often incomplete and noisy. The only viable solution appears to be systems that help authors create correct, complete, and annotated bibliographies, thus enabling autonomous citation indexing to create correct and complete citation networks. In this paper, we describe a general system architecture and two concrete components for supporting authors in this task. The system takes the author from literature search through domain-model creation and bibliography construction, to the semantic markup of bibliographic metadata. The system rests on a modular and extensible architecture: VBA Macros that integrate seamlessly into the user’s familiar working environment, the use of existing databases and information-retrieval tools, and a Web Service layer that connects them.

Bettina Berendt, Kai Dingel, Christoph Hanser
Introducing Pergamos: A Fedora-Based DL System Utilizing Digital Object Prototypes

This demonstration provides a “hands on” experience to the “internals” of Pergamos, the University of Athens DL System. Pergamos provides uniform high level DL services, such as collection management, web based cataloguing, browsing, batch ingestion and automatic content conversions that adapt to the underlying digital object type-specific specialities through the use of Digital Object Prototypes (DOPs). The demonstration points out the ability of DOPs to effectively model the heterogeneous and complex material of Pergamos. Special focus is given on the inexpensiveness of adding new collections and digital object types, highlighting how DOPs eliminate the need for custom implementation.

George Pyrounakis, Kostas Saidis, Mara Nikolaidou, Vassilios Karakoidas
Knowledge Generation from Digital Libraries and Persistent Archives

This poster describes the ongoing research of the Cheshire project with a particular focus on knowledge generation and digital preservation. The infrastructure described makes use of tools from computational linguistics, distributed parallel processing and storage, information retrieval and digital preservation environments to produce new knowledge from very large scale datasets present in the data grid.

Paul Watry, Ray R. Larson, Robert Sanderson
Managing the Quality of Person Names in DBLP

Quality management is, not only for digital libraries, an important task in which many dimensions and different aspects have to be considered. The following paper gives a short overview on DBLP in which the data acquisition and maintenance process underlying DBLP is discussed from a quality point of view. The paper finishes with a new approach to identify erroneous person names.

Patrick Reuther, Bernd Walter, Michael Ley, Alexander Weber, Stefan Klink
MedSearch: A Retrieval System for Medical Information Based on Semantic Similarity

MedSearch is a complete retrieval system for Medline, the premier bibliographic database of the U.S. National Library of Medicine (NLM). MedSearch implements SSRM, a novel information retrieval method for discovering similarities between documents containing semantically similar but not necessarily lexically similar terms.

Angelos Hliaoutakis, Giannis Varelas, Euripides G. M. Petrakis, Evangelos Milios
Metadata Spaces: The Concept and a Case with REPOX

This paper describes REPOX, an XML infrastructure to store and manage metadata, in the sense it is commonly defined in digital libraries. The purpose is to make it possible, in alignment with an Enterprise Architecture model, to develop a component of a Service Oriented Architecture that can manage, transparently, large amounts of descriptive metadata, independently of their schemas or formats, and for the good of other services. The main functions of this infrastructure are submission (including synchronisation with external data sources), storage (including long-term preservation) and retrieval (with persistent linking). The case is demonstrated with a deployment at the National Library of Portugal, using metadata from two information systems and three schemas: bibliographic and authority data from a union catalogue and descriptive data from an archival management system.

Nuno Freire, José Borbinha
Multi-Layered Browsing and Visualisation for Digital Libraries

For a scientific researcher it is more and more vital to find relevant publications with their correct bibliographical data, not only for accurate citations but particularly for getting further information about their current research topic.

This paper describes a new approach to develop user-friendly interfaces: Multi-Layered-Browsing. Two example applications are introduced that play a central role in searching, browsing and visualising bibliographical data.

Alexander Weber, Patrick Reuther, Bernd Walter, Michael Ley, Stefan Klink
OAI-PMH Architecture for the NASA Langley Research Center Atmospheric Science Data Center

We present the architectural decisions involved in adding an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) interface to the NASA Langley Research Center Atmospheric Science Data Center (ASDC). We review four possible implementation strategies and discuss the implications of our choice. The ASDC differs from most OAI-PMH implementations because of its complex data model, large size (1.3 petabytes) of its Earth Science data holdings and its rate of data acquisition (>20 terabytes / month).
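
For readers unfamiliar with the protocol, a harvester talks to an OAI-PMH interface through plain HTTP requests carrying a `verb` parameter. The sketch below builds a `ListRecords` request URL; the base URL is a placeholder and the helper name `list_records_url` is hypothetical, and nothing here reflects the ASDC's actual endpoint or implementation choices.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `verb`, `metadataPrefix`, `from`, `until` and `set` are arguments
    defined by the OAI-PMH specification; `base_url` is supplied by
    the repository being harvested.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date      # selective harvesting: lower date bound
    if until_date:
        params["until"] = until_date    # selective harvesting: upper date bound
    if set_spec:
        params["set"] = set_spec        # restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


# Placeholder base URL, used only for illustration.
print(list_records_url("https://example.org/oai", from_date="2006-01-01"))
```

Date-bounded requests of this kind are what let a harvester fetch only records changed since its previous visit, which matters for a repository that grows as quickly as the one described here.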

Churngwei Chu, Walter E. Baskin, Juliet Z. Pao, Michael L. Nelson
Personalized Digital E-library Service Using Users’ Profile Information

We propose a personalized digital E-library system using a collaborative filtering technique, which provides a personalized search list according to users’ preferences. The proposed system analyzes registered users’ actions, such as “clicking” and “borrowing” items. Each kind of action is given a weight that is used to calculate a user’s preference for each item. However, the same list is provided to all individual users when they search with the same keywords. To avoid this problem, we customize the order of items in the list according to whether the profiles of registered users and target users mismatch.
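
A rough sketch of action-weighted preferences combined with user-based collaborative filtering is given below. It is a generic illustration rather than the authors' method: the weight values, the cosine similarity between users, and the names `preferences` and `personalise` are assumptions, and the profile-mismatch step from the abstract is not modelled.

```python
import math

# Assumed weights: borrowing is taken to signal stronger interest than clicking.
ACTION_WEIGHTS = {"click": 1.0, "borrow": 3.0}


def preferences(log):
    """Turn (user, action, item) records into per-user item preference scores."""
    prefs = {}
    for user, action, item in log:
        items = prefs.setdefault(user, {})
        items[item] = items.get(item, 0.0) + ACTION_WEIGHTS[action]
    return prefs


def similarity(p, q):
    """Cosine similarity between two users' preference dictionaries."""
    dot = sum(p[i] * q[i] for i in p if i in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def personalise(results, target, prefs):
    """Reorder a keyword search result list for `target`, ranking items higher
    when users with similar preferences acted on them."""
    mine = prefs.get(target, {})
    scores = {
        item: sum(
            similarity(mine, theirs) * theirs.get(item, 0.0)
            for user, theirs in prefs.items()
            if user != target
        )
        for item in results
    }
    # sorted() is stable, so items with equal scores keep their original order.
    return sorted(results, key=lambda item: scores[item], reverse=True)


# Invented log, for illustration only.
log = [
    ("ann", "borrow", "xml"), ("ann", "click", "grid"),
    ("bob", "borrow", "xml"), ("bob", "borrow", "p2p"),
    ("cat", "click", "music"), ("cat", "borrow", "midi"),
]
print(personalise(["midi", "p2p", "grid"], "ann", preferences(log)))
```

Here "ann" and "bob" both borrowed the same item, so the item that only "bob" borrowed moves to the top of ann's list, while the remaining items keep their keyword-search order.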

Wonik Park, Wonil Kim, Sanggil Kang, Hyunjin Lee, Young-Kuk Kim
Representing Aggregate Works in the Digital Library

This paper studies the challenge of representing aggregate works such as encyclopaedia, collected poems and journals in digital libraries. Reflecting on materials used by humanities academics, it demonstrates the complex range of aggregate types and the problems of representing this heterogeneity in the digital library interface. We demonstrate that aggregates are complex and pervasive, challenge many common assumptions and confuse the boundaries between organisational levels within the library. The challenge is amplified by concrete examples.

George Buchanan, Jeremy Gow, Ann Blandford, Jon Rimmer, Claire Warwick
Scientific Evaluation of a DLMS: A Service for Evaluating Information Access Components

In this paper, we propose an architecture for a service able to manage, enrich, and support the interpretation of the scientific data produced during the evaluation of information access and extraction components of a Digital Library Management System (DLMS). Moreover, we describe a first prototype, which implements the proposed service.

Giorgio Maria Di Nunzio, Nicola Ferro
SIERRA – A Superimposed Application for Enhanced Image Description and Retrieval

In this demo proposal, we describe our prototype application, SIERRA, which combines text-based and content-based image retrieval and allows users to link together image content of varying document granularity with related data like annotations. To achieve this, we use the concept of superimposed information (SI), which enables users to (a) deal with information of varying granularity (sub-document to complete document), and (b) select or work with information elements at sub-document level while retaining the original context.

Uma Murthy, Ricardo da S. Torres, Edward A. Fox
The Nautical Archaeology Digital Library

In Nautical Archaeology, the study of components and objects creates a complex environment for scholars and researchers. Nautical archaeologists access, manipulate, study, and consult a variety of sources from different media, geographical origins, ages, and languages. Representing underwater excavations is a challenging endeavor due to the large amount of information and data in heterogeneous media and sources that must be structured, segmented, categorized, indexed, and integrated. We are creating a Nautical Archaeology Digital Library that will a) efficiently catalog, store, and manage artifacts and ship remains along with associated information from underwater archeological excavations, b) integrate heterogeneous data sources in different media to facilitate research work, c) incorporate historic sources to help in the study of current artifacts, d) provide visualization tools to help researchers manipulate, observe, study, and analyze artifacts and their relationships; and e) incorporate algorithm and visualization based mechanisms for ship reconstruction.

Carlos Monroy, Nicholas Parks, Richard Furuta, Filipe Castro
The SINAMED and ISIS Projects: Applying Text Mining Techniques to Improve Access to a Medical Digital Library

Intelligent information access systems increasingly integrate text mining and content analysis capabilities as a relevant element. In this paper we present our work on the integration of text categorization and summarization to improve information access in a specific medical domain, patient clinical records and related scientific documentation, in the framework of two different research projects: SINAMED and ISIS, developed by a consortium of two research groups from two universities, one hospital and one software development firm. SINAMED has a basic research orientation, and its goal is to design new text categorization and summarization algorithms based on the utilization of lexical resources in the biomedical domain. ISIS is an R&D project with a more applied and technology-transfer orientation, focused on more direct practical aspects of the utilization in a concrete public health institution.

Manuel de Buenaga, Manuel Maña, Diego Gachet, Jacinto Mata
The Universal Object Format – An Archiving and Exchange Format for Digital Objects

Long-term preservation is a complicated and difficult task for a digital library. The key to handling this task is the inclusion of technical metadata. These metadata should be packed together with the files for exchange between digital archives. Archival systems should handle the data in the Data Management and use it for preservation planning. For this purpose, the German project kopal has defined the Universal Object Format (UOF) and enhanced the archival system DIAS with generic functions to support flexible handling of preservation metadata.

Tobias Steinke
Tsunami Digital Library

In this paper, we present our Tsunami Digital Library (TDL) which can store and manage documents about the Tsunami, Tsunami run up simulation, newspaper articles, fieldwork data, etc. We offer a multilingual interface. Currently some documents and explanations of the Tsunami videos have been translated into English and French. We are convinced that TDL will support many people who want to mitigate the Tsunami disaster and to plan countermeasures against the Tsunami.

Sayaka Imai, Yoshinari Kanamori, Nobuo Shuto
Weblogs for Higher Education: Implications for Educational Digital Libraries

Based on a modified Technology Acceptance Model (TAM), the paper describes a study to understand the relationships between perceived usefulness, perceived ease of use and intention to use weblogs for learning in higher education. Data was collected from sixty-eight students of a local university. The findings suggested that students were likely to accept weblog use as a course requirement if they perceived the activity to be useful for learning. The paper concludes with a discussion on design implications for educational digital libraries.

Yin-Leng Theng, Elaine Lew Yee Wan
XWebMapper: A Web-Based Tool for Transforming XML Documents

Interoperability has been one of the most challenging issues of the last decade. Different solutions with various levels of sophistication have been proposed, such as wrappers, mediators, and other types of middleware. In most solutions, the Extensible Markup Language (XML) has been accepted as the de facto standard for the interchange of information due to its simplicity and flexibility.

Manel Llavador, José H. Canós
Backmatter
Metadata
Title
Research and Advanced Technology for Digital Libraries
Edited by
Julio Gonzalo
Costantino Thanos
M. Felisa Verdejo
Rafael C. Carrasco
Copyright Year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-44638-5
Print ISBN
978-3-540-44636-1
DOI
https://doi.org/10.1007/11863878
