2004 | Book

Research and Advanced Technology for Digital Libraries

8th European Conference, ECDL 2004, Bath, UK, September 12-17, 2004. Proceedings

Edited by: Rachel Heery, Liz Lyon

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Table of Contents

Frontmatter

Digital Library Architectures

Dynamic Digital Library Construction and Configuration

This paper describes a digital library architecture and implementation that is configurable, extensible and dynamic in the way it presents content and in the services it provides. The design manifests itself as a network of modules that communicate in terms of XML messages. All modules characterize the functionality they implement in response to a “describe yourself” message, and can transform messages using XSLT to support different levels of configurability. Traditional library values such as backwards compatibility and multiplatform operation are combined with the ability to add new collections and services adaptively. The paper describes the new design and shows how it can be used to build four different digital library systems. We conclude by showing how the design fits existing interoperability frameworks.

David Bainbridge, Katherine J. Don, George R. Buchanan, Ian H. Witten, Steve Jones, Matt Jones, Malcolm I. Barr
Milos: A Multimedia Content Management System for Digital Library Applications

This paper describes the MILOS Multimedia Content Management System: a general-purpose software component tailored to support design and effective implementation of digital library applications. MILOS supports the storage and content-based retrieval of any multimedia documents whose descriptions are provided by using arbitrary metadata models represented in XML. MILOS is flexible in the management of documents containing different types of data and content descriptions; it is efficient and scalable in the storage and content-based retrieval of these documents. The paper illustrates the solutions adopted to support the management of different metadata descriptions of multimedia documents in the same repository, and it illustrates the experiments performed by using the MILOS system to archive documents belonging to four different and heterogeneous collections which contain news agencies, scientific papers, and audio/video documentaries.

Giuseppe Amato, Claudio Gennaro, Fausto Rabitti, Pasquale Savino
Designing an Integrated Digital Library Framework to Support Multiple Heterogeneous Collections

Athens University recently initiated a digital collection development project to provide enhanced educational capabilities. Collections vary in terms of the material included and the requirements imposed by potential users. In order to simplify collection management and promote collection interoperability, a common Digital Library platform should be employed to support all collections. In order to deal with the extended requirements imposed, Athens University has decided to support an integrated Digital Library framework for multiple heterogeneous digital collections development. The most important requirements imposed by the University’s collections are discussed in this paper, along with the characteristics of the Folklore Collection, one of the most complex and diverse ones. In order to evaluate available Digital Library systems, the Folklore Collection has been chosen as a guide for the development of two prototype implementations using Fedora and DSpace. Conclusions drawn from their comparison and the proposed integrated Digital Library architecture based on Fedora are also presented.

George Pyrounakis, Kostas Saidis, Mara Nikolaidou, Irene Lourdi
DSpace: A Year in the Life of an Open Source Digital Repository System

The DSpace™ digital repository system was released as open source software in November of 2002. In the year since then it has been adopted by a large number of research universities and other organizations world-wide that need a digital repository solution for a number of content types: research articles, gray literature, e-theses, cultural materials, scientific datasets, institutional records, educational materials, and more. The DSpace platform and its various applications are becoming better understood with experience and time. As one result of a recent meeting of the DSpace user community, we are now venturing into the territory of broad, community-based open source development and management, and gaining insights from the experience of the Apache Foundation, Global Grid Forum, and other successful open source projects about how to build open source software for the digital library domain.

MacKenzie Smith, Richard Rodgers, Julie Walker, Robert Tansley

Evaluation and Usability

Spatial Ranking Methods for Geographic Information Retrieval (GIR) in Digital Libraries

This paper presents results from an evaluation of algorithms for ranking results by probability of relevance for Geographic Information Retrieval (GIR) applications. We review the work done on GIR and especially on ranking algorithms for GIR. We evaluate these algorithms using a test collection of 2500 metadata records from a geographic digital library. We present an algorithm for GIR ranking based on logistic regression from samples of the test collection. We also examine the effects of different representations of the geographic regions being searched, including minimum bounding rectangles, and convex hulls.

Ray R. Larson, Patricia Frontiera
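
The minimum bounding rectangle (MBR) representation mentioned in the abstract above can be illustrated with a short sketch. The overlap ratios computed here are only an assumption about the kind of spatial feature a logistic regression ranker might take as input; the abstract does not state which features the authors used, and the function names are hypothetical.

```python
def mbr(points):
    """Minimum bounding rectangle of (x, y) points as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def area(rect):
    """Area of an axis-aligned rectangle; degenerate rectangles have area 0."""
    min_x, min_y, max_x, max_y = rect
    return max(0.0, max_x - min_x) * max(0.0, max_y - min_y)


def overlap_area(a, b):
    """Area shared by two axis-aligned rectangles (0.0 if they do not overlap)."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    if inter[0] >= inter[2] or inter[1] >= inter[3]:
        return 0.0
    return area(inter)


def overlap_features(query, doc):
    """Hypothetical ranking features: share of the query region and of the
    document region that is covered by their intersection."""
    inter = overlap_area(query, doc)
    query_area, doc_area = area(query), area(doc)
    return (
        inter / query_area if query_area else 0.0,
        inter / doc_area if doc_area else 0.0,
    )


if __name__ == "__main__":
    query_region = mbr([(0, 0), (2, 0), (2, 2), (0, 2)])
    document_region = (1, 1, 3, 3)
    print(overlap_features(query_region, document_region))
```

A convex hull would hug an irregular region more tightly than an MBR, which is presumably why the paper compares the two representations.
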
Evaluation of an Information System in an Information Seeking Process

This paper presents a holistic evaluation of an operational information system that employs the Boolean search technique. An equal focus is laid on both the system (system perspective) and its users (user perspective) in the actual environment where the system and its users are functioning (contextuality). In addition to these research objectives, the study has a methodological objective to test an evaluation approach developed by Borlund [1] in a real life setting. Our evaluation methodology involves triangulation (pre-search questionnaires; search log; post-interviewing) as well as novel interactive performance measures, such as the Ranked Half-Life measure and the Satisfaction and Novelty perception by users supplementing the traditional Precision. The study confirms the finding of earlier research and reveals the discrepancy between the evaluation results according to the system and the user perspectives. More specifically, the system performed better when evaluated from the user perspective than from the system perspective.

Lena Blomgren, Helena Vallo, Katriina Byström
Fiction Electronic Books: A Usability Study

This paper focuses on fiction electronic books and their usability. Two complementary studies were drawn together in order to investigate whether fiction e-books can successfully become part of people’s reading habits: the Visual Book project, which found that electronic texts which closely resemble their paper counterparts in terms of visual components such as size, quality and design were received positively by users, and the EBONI Project, which aimed to define a set of best practice guidelines for designing electronic textbooks. It was found that the general guidelines for the design of textbooks on the Internet that have been proposed by the EBONI project can also be applied to the design of fiction e-books. Finally, in terms of the electronic production of fiction e-books, this study suggests that concentrating on the appearance of text, rather than the technology itself, can lead to better quality publications to rival the print version of fiction books.

Chrysanthi Malama, Monica Landoni, Ruth Wilson
Interoperable Digital Library Programmes? We Must Have QA!

Digital library programmes often seek to provide interoperability through use of open standards. In practice, however, deployment of open standards in a compliant manner is not necessarily easy. The author argues that a strict checking regime would be inappropriate in many circumstances. The author proposes deployment of quality assurance (QA) principles which provide documented policies on the standards and best practices to be implemented and systematic procedures for measuring compliance with these policies. The paper describes the work of the QA Focus project which has developed a QA methodology to support JISC’s digital library programmes. A summary of the application of the methodology to support selection of standards and the deployment of deliverables into service is given. The author argues that similar approaches are needed if we are to provide interoperability across digital library programmes.

Brian Kelly

User Interfaces and Presentation

Next Generation Search Interfaces – Interactive Data Exploration and Hypothesis Formulation

To date, the majority of Web search engines have provided simple keyword search interfaces that present the results as a ranked list of hyperlinks. More recently researchers have been investigating interactive, graphical and multimedia approaches which use ontologies to model the knowledge space. Such systems use the semantic relationships to structure the assimilated search results into interactive semantic graphs or hypermedia presentations which enable the user to quickly and easily explore the results and detect previously unrecognized associations. More recently, the proliferation of eResearch communities has led to a demand for search interfaces which automate the discovery, analysis and assimilation of multiple information sources in order to prove or disprove a particular scientific theory or hypothesis. We believe that such semi-automated analysis, assimilation and hypothesis-driven approaches represent the next generation of search engines. In this paper we describe and evaluate such a search interface which we have developed for a particular eScience application.

Jane Hunter, Katya Falkovych, Suzanne Little
Ontology Based Interfaces to Access a Library of Virtual Hyperbooks

A virtual hyperbook is a virtual document made of a set of information fragments linked to a domain ontology and equipped with selection and assembly methods or rules. In this paper, we study the problem of accessing and reading in a digital library of virtual hyperbooks. In this case it is necessary to generate hyperdocuments that present information and knowledge originating from several hyperbooks. Moreover, these hyperdocuments must fit the reading objectives or specific points of view of readers. Our approach is based on the integration of domain ontologies and the re-use of interface specifications.

Gilles Falquet, Claire-Lise Mottaz-Jiang, Jean-Claude Ziswiler
Document Icons and Page Thumbnails: Issues in Construction of Document Thumbnails for Page-Image Digital Libraries

Digital libraries are increasingly based on digital page images, but techniques for constructing usable versions of these page images are largely folklore. This paper documents some issues encountered in creating various kinds of renderings of page images for the UpLib digital library system, and suggests approaches for each, based on both problem analysis and user feedback. Several factors important in determining useful sizes for small visual representations of the documents, called document icons, are discussed; one algorithm, called log-area, seems most effective.

William C. Janssen
Citiviz: A Visual User Interface to the CITIDEL System

The Digital Library (DL) field is one of the most promising areas of application for information visualization technology. In this paper, we propose a visual user interface tool kit for digital libraries, to deliver an overview of document sets, with support for interactive direct manipulation. Our system, Citiviz, employs a dynamic hyperbolic tree to display hierarchical relationships among documents, based on where their topics fit into the ACM classification system. Also, Citiviz provides an interactive, animated 2-dimensional scatter plot. With it, users may gain insight by changing various parameters, or may directly jump to a particular document based on its label or location. According to a preliminary evaluation, our system shows advantages in performance and user preference relative to traditional text based DL web interfaces.

Nithiwat Kampanya, Rao Shen, Seonho Kim, Chris North, Edward A. Fox

New Approaches to Information Retrieval

System Support for Name Authority Control Problem in Digital Libraries: OpenDBLP Approach

In maintaining Digital Libraries, having bibliographic data up-to-date is critical, yet often minor irregularities may cause information isolation. Unlike documents for which various kinds of unique ID systems exist (e.g., DOI, ISBN), other bibliographic entities such as author and publication venue do not have unique IDs. Therefore, in current Digital Libraries, tracking such bibliographic entities is not trivial. For instance, suppose a scholar changes her last name from A to B. Then, a user, searching for her publications under the new name B, cannot get old publications that appeared under A although they are by the same person. For such a scenario, since both A and B are the same person, it would be desirable for Digital Libraries to track their identities accordingly. In this paper, we investigate this problem, known as name authority control, and present our system-oriented solution. We first identify three core building blocks that underlie the phenomenon, and show a taxonomy in which different combinations of the building blocks can occur. Then, we consider how systems can support the problem in two common functions of Digital Libraries – Update and Search. Finally, our test-bed called OpenDBLP is presented, in which the suggested solution is fully implemented as a proof of concept.

Yoojin Hong, Byung-Won On, Dongwon Lee
NLP Versus IR Approaches to Fuzzy Name Searching in Digital Libraries

Name Search is an important search function in Digital Library systems and various types of information retrieval systems, such as directory search systems, electronic phonebooks and yellow pages. The paper discusses two main approaches to fuzzy name matching, the natural language processing (NLP) approach and the information retrieval (IR) approach, and proposes a hybrid approach. Person names can be considered a (sub-)language, in which case a name search system will be developed using Natural Language Processing apparatus including dictionary, thesaurus and grammatical schema. On the other hand, if names are perceived as (free) text, then an entirely different system may be built incorporating indexing, retrieving, relevance ranking and other Information Retrieval techniques. These two schools of thought, NLP and IR, have somewhat different sets of techniques originating from different theoretical concerns and research traditions. A selective combination of their complementary features is likely to be more effective for fuzzy name matching. Two principles, position attribute identity (PAI) and position transition likelihood (PTL), are proposed to incorporate aspects of both approaches. The two principles have been implemented in an NLP and IR hybrid model system called Friendly Name Search (FNS) for real world applications in multilingual directory searches on the Singapore Yellowpages website.

Paul Horng-Jyh Wu, Jin-Cheon Na, Christopher S. G. Khoo
From Abstract to Virtual Entities: Implementation of Work-Based Searching in a Multimedia Digital Library

Libraries of digitized multimedia content provide access to virtual entities. In the case of music, where there are frequently many different performances, editions, and arrangements of a given work, the Variations2 metadata model links all instances of a work to an abstract work record, thus yielding superior search capabilities to digital library users. This paper summarizes the motivation for addressing the music metadata problem and describes the Variations2 search user interface, which is based on our work-centric, FRBR-like metadata model.

Mark Notess, Jenn Riley, Harriette Hemmasi
Approaching the Problem of Multi-lingual Information Retrieval and Visualization in Greek and Latin and Old Norse Texts

In this paper, we explore approaches to multi-lingual information retrieval for Greek, Latin, and Old Norse texts. We also describe an information retrieval tool that allows users to formulate Greek, Latin, or Old Norse queries in English and display the results in an innovative clustering and visualization facility.

Jeffrey A. Rydberg-Cox, Lara Vetter, Stefan Rüger, Daniel Heesch

Interoperability

Building Interoperability for United Kingdom Historic Environment Information Resources

The paper will present the work of the Forum on Information Standards in Heritage (FISH) – www.fish-forum.info – in the development of standards and protocols to support interoperability between historic environment sector information systems. The paper describes barriers to interoperability within the sector. These originate in the unique character of the historic environment as an information source. Progress in the development of relevant standards is reviewed and emphasis placed upon community building to support standardisation. Current work to develop an XML-based interoperability ‘toolkit’ of schema and protocols to support knowledge-sharing networks is described. This will be based on current FISH standards along with the CIDOC Conceptual Reference Model, an emerging ISO standard ontology for cultural heritage information.

Edmund Lee
Prototyping Digital Libraries Handling Heterogeneous Data Sources – The ETANA-DL Case Study

Information systems used in archaeology have several needs: interoperability among heterogeneous systems, making information available without significant delay, long-term preservation of data, and providing a suite of services to users. In this paper, we show how digital library techniques can be employed to provide solutions to three of these problems. We show this by describing a prototype for an archaeological Digital Library (ETANA-DL). First, ETANA-DL applies and extends the metadata harvesting approach to address some of these needs: interoperability, rapid access to data, and data preservation. Second, we show that availability of a pool of components that implement common DL services has helped in rapidly creating the prototype, which was subsequently used for requirements elicitation. However, understanding complex archaeological information systems is a difficult task. Third, therefore, we describe our efforts to model these systems using the 5S framework, and show how the partially developed model has been used to implement complex services helping users carry out key tasks with the integrated data.

Unni Ravindranathan, Rao Shen, Marcos André Gonçalves, Weiguo Fan, Edward A. Fox, James W. Flanagan
zetoc SOAP: A Web Services Interface for a Digital Library Resource

This paper describes the provision of a Web Services interface that will extend the possibility of machine-to-machine access to the zetoc current awareness service, within the JISC Information Environment and eScience applications. This bespoke interface includes open standard XML metadata for searches and responses where possible. Elements from the OpenURL XML metadata formats for journals and books are used to transmit the bibliographic citation information that is an integral part of a zetoc record for a journal article or conference paper.

Ann Apps

Enhanced Indexing and Searching Methods

A Comparison of Text and Shape Matching for Retrieval of Online 3D Models

Because of recent advances in graphics hard- and software, both the production and use of 3D models are increasing at a rapid pace. As a result, a large number of 3D models have become available on the web, and new research is being done on 3D model retrieval methods. Query and retrieval can be done solely based on associated text, as in image retrieval, for example (e.g. Google Image Search [1] and [2,3]). Other research focuses on shape-based retrieval, based on methods that measure shape similarity between 3D models (e.g., [4]). The goal of our work is to take current text- and shape-based matching methods, see which ones perform best, and compare those. We compared four text matching methods and four shape matching methods, by running classification tests using a large database of 3D models downloaded from the web [5]. In addition, we investigated several methods to combine the results of text and shape matching. We found that shape matching outperforms text matching in all our experiments. The main reason is that publishers of online 3D models simply do not provide enough descriptive text of sufficient quality: 3D models generally appear in lists on web pages, annotated only with cryptic filenames or thumbnail images. Combining the results of text and shape matching further improved performance. The results of this paper provide added incentive to continue research in shape-based retrieval methods for 3D models, as well as retrieval based on other attributes.

Patrick Min, Michael Kazhdan, Thomas Funkhouser
Corpus-Based Query Expansion in Online Public Access Catalogs

We propose a probabilistic method for query expansion in online public access catalogs that utilizes both historical query logs and the subject headings in the library catalog. Our method creates correlations between query and document terms, allowing relevant subject headings from the corpus to be retrieved and added to a query. Experiments demonstrate an average of 31.1% performance increase over currently fielded baselines.

Jeffry Komarjaya, Danny C. C. Poo, Min-Yen Kan
Automated Indexing with Restricted Random Walks on Large Document Sets

We propose a method based on restricted random walk clustering as a (semi-)automated complement for the tedious, error-prone and expensive task of manual indexing in a scientific library. The first stage of our method is to cluster a set of (partially) indexed documents using restricted random walks on usage histories in order to find groups of similar documents. In the second stage, we derive possible keywords for documents without indexing information from the frequencies of keywords assigned to other documents in their respective cluster. Due to the specific clustering algorithm, the proposed algorithm is still efficient with millions of documents and can be deployed on standard PC hardware.

Markus Franke, Andreas Geyer-Schulz
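
The second stage described in the abstract above, deriving keyword candidates from the keyword frequencies within a cluster, might be sketched as follows. This is a minimal illustration that assumes the clusters are already given; it does not implement the restricted random walk clustering itself, and the function name, the `top_n` cut-off and the alphabetical tie-breaking are hypothetical choices, not taken from the paper.

```python
from collections import Counter


def suggest_keywords(cluster, keywords_by_doc, top_n=3):
    """Rank keyword candidates for unindexed documents in a cluster.

    cluster: iterable of document ids grouped together by some clustering step.
    keywords_by_doc: dict mapping document id -> manually assigned keywords;
        unindexed documents are simply absent from the dict.
    """
    counts = Counter()
    for doc in cluster:
        # Count each keyword at most once per document.
        counts.update(set(keywords_by_doc.get(doc, ())))
    # Most frequent first; ties broken alphabetically for deterministic output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in ranked[:top_n]]


if __name__ == "__main__":
    cluster = ["d1", "d2", "d3", "d4"]  # d4 has no indexing information
    keywords_by_doc = {
        "d1": ["clustering", "markov chains"],
        "d2": ["clustering", "indexing"],
        "d3": ["indexing", "clustering"],
    }
    print(suggest_keywords(cluster, keywords_by_doc, top_n=2))
```

In a semi-automated workflow, the ranked candidates would be offered to a human indexer rather than assigned directly.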

Personalisation and Annotation

Annotations in Digital Libraries and Collaboratories – Facets, Models and Usage

This paper presents the results of our study regarding the different facets and ways of using annotations in both digital libraries and collaboratories. This study represents an innovative attempt at gathering methodological tools and synergies from both fields in order to effectively define a comprehensive model for annotations. Thus we propose a conceptual model for annotations in order to develop an annotation service that can be plugged into digital libraries and collaboratories. Finally, starting from our model, we introduce a search strategy for exploiting annotations in order to search and retrieve relevant documents for a user query.

Maristella Agosti, Nicola Ferro, Ingo Frommholz, Ulrich Thiel
P-News: Deeply Personalized News Dissemination for MPEG-7 Based Digital Libraries

Advanced personalization techniques are required to cope with novel challenges posed by attribute-rich MPEG-7 based digital libraries. At the heart of our deeply personalized news dissemination system P-News is one extensible preference model that serves all purposes, preventing impedance mismatches between the various stages: User modeling by structured preference patterns, automatic query expansion including ontologies, preference query evaluation by Preference XPath including nested preferences on categorical data, quality assessment of query results, personalized notification and news syndication.

Qiuyue Wang, Wolf-Tilo Balke, Werner Kießling, Alfons Huhn
Laws of Attraction: In Search of Document Value-ness for Recommendation

In this paper we explore the uniqueness of paper recommendation for e-learning systems through a human-subject study. Experiment results showed that the majority of learners have struggled to reach a ‘harmony’ between their interest and educational goal: they admit that in order to acquire new knowledge, they are willing to read not-interesting-yet-pedagogically-useful papers. In other words, learners seem to be more tolerant than users in commercial recommender systems. Nevertheless, as educators, we should still maintain a balance of recommending interesting papers and pedagogically helpful ones in order to retain learners and continuously engage them throughout the learning process.

Tiffany Y. Tang, Gordon McCalla

Music Digital Libraries

Sound Footings: Building a National Digital Library of Australian Music

MusicAustralia is a Web portal for anyone interested in Australian music. A joint development of the National Library of Australia and ScreenSound Australia: National Screen and Sound Archive, it provides users with access to a federated resource discovery service for Australian music in notated and audio representations and in digital and non-digital formats, as well as a directory service providing information on people, organisations and services associated with Australian music. This paper outlines the architecture of the MusicAustralia service, focusing particularly on its federated service model and the infrastructure elements and business processes developed to support this architecture. It also looks at the way in which another major component of the federated digital library of Australian music, the Peter Burgis Performing Arts Archive, is using the MusicAustralia service model and architecture to shape its own strategies and structures.

Marie-Louise Ayres, Toby Burrows, Robyn Holmes
Content-Based Retrieval in Digital Music Libraries

MiDiLiB is a six year research project on digital music libraries funded by the German Research Foundation (DFG) as a part of the Distributed Processing and Delivery of Digital Documents (V3D2) research initiative. MiDiLiB’s main focus is the development of content-based retrieval algorithms for both score- and waveform-based music. In this paper we give an overview of our research results, describe several prototypical systems for content-based music retrieval which have been developed during the project, and discuss applications of the presented techniques in the context of today’s and future digital music libraries.

Michael Clausen, Frank Kurth, Meinard Müller, Andreas Ribbrock
Knowledge-Based Scribe Recognition in Historical Music Archives

For the content-based management and access to domain-specific data in digital libraries, special domain-knowledge and knowledge processing functionality are required. However, the integration of knowledge components has not yet become an integral part of existing digital library systems. The current paper presents the realization of a digital archive of historical music scores, integrating special domain-specific data and functionality for writer identification in historical music scores. We introduce the basic formalisms and heuristics for the representation of handwriting characteristics. To compare two handwritings we propose the usage of a normalized, weighted Hamming distance function to calculate the degree of similarity between their handwriting characteristics. For the identification of writers we employ the k-nearest neighbor method to build clusters of similar writers, based on the calculated distance. Finally, we present and evaluate the test results from the prototype implementation of the system.

Ilvio Bruder, Temenushka Ignatova, Lars Milewski
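
The distance and neighbor search named in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the categorical features, their weights and the majority-vote use of the k nearest neighbors are all hypothetical, since the abstract specifies only a normalized, weighted Hamming distance and the k-nearest neighbor method.

```python
from collections import Counter


def weighted_hamming(a, b, weights):
    """Normalized, weighted Hamming distance between two feature tuples.

    Sums the weights of the positions where a and b differ and divides by the
    total weight, so the result lies in [0, 1].
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("feature vectors and weights must have equal length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    mismatch = sum(w for x, y, w in zip(a, b, weights) if x != y)
    return mismatch / total


def knn_writer(query, labelled, weights, k=3):
    """Majority writer label among the k labelled samples nearest to query."""
    ranked = sorted(
        labelled, key=lambda item: weighted_hamming(query, item[0], weights)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical handwriting features: (clef shape, stem direction, note head).
    samples = [
        (("round", "up", "oval"), "scribe_A"),
        (("round", "up", "slanted"), "scribe_A"),
        (("angular", "down", "oval"), "scribe_B"),
        (("angular", "down", "slanted"), "scribe_B"),
    ]
    weights = (2.0, 1.0, 1.0)  # assumed: clef shape counts double
    print(knn_writer(("round", "down", "oval"), samples, weights, k=3))
```

Normalizing by the total weight keeps distances comparable when the feature set or the weighting changes.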

Personal Digital Libraries

Enhancing Kepler Usability and Performance

Kepler is an attempt to bridge the gap between established, organization-backed digital libraries and groups of researchers that wish to publish their findings under their control, anytime, anywhere, yet have the advantages of an OAI-compliant digital library. We describe an architecture and implementation of the Kepler system that allows an archivelet to be installed in the order of minutes by an author on a personal machine and a group server in less than an hour. The group server will harvest from all archivelets and make the union of all published papers available for search to a community. We describe how a group administrator can provide an XML schema for the metadata and how the Kepler engine will validate against it when an author publishes a paper and completes the metadata. We have demonstrated that we can surmount the technical difficulties so that authors can publish as easily as to a website yet produce OAI-compliant digital libraries.

Kurt Maly, Michael Nelson, Mohammad Zubair, Ashraf Amrou, Sathish Kothamasa, Lan Wang, Rick Luce
Media Matrix: Creating Secondary Repositories

This paper argues for the necessity of digital libraries to increase access to their holdings and have greater impact on e-learning and education by facilitating the creation of secondary repositories. These repositories will provide discipline/community specific metadata and applications and will allow users to find, use, manipulate and analyze digital objects more easily. To this end, MATRIX has developed Media Matrix 1.0, an online, easy-to-use server-side suite of tools that allows users to locate specific media and streaming media files found in digital repositories and segment, annotate and organize this media online. This application provides users with an environment both to work with and personalize digital media, and also to share and discuss their findings with a community of users. Through creating a secondary repository of usage statistics and user-generated materials/metadata to supplement both traditional cataloging records and discipline-specific online indexes, tools like Media Matrix can help extend the usefulness of digital libraries without increasing costs to the libraries.

Mark Kornbluh, Michael Fegan, Dean Rehberger
Incorporating Physical and Digital Artifacts into Growing Personal Collections

We have produced a system that automatically incorporates syndicated materials from sources including library acquisition records and online news sites to form growing hypertextual structures. This system enables users to create personal and shared collections built atop a growing substrate. It also seeks to empower users through the use of information filters to create dynamic personal collections that can themselves grow over time to include materials as they appear within the underlying collection. In addition, we are investigating particular benefits of intersecting hypertextual paths as a useful structure for representing such sub-collections and the resources extracted from the feeds themselves. We present our prototype system, the emerging standards for syndicating online content, and a discussion of the importance of supporting growth within digital libraries generally.

Pratik Dave, Luis Francisco-Revilla, Unmil P. Karadkar, Richard Furuta, Frank M. Shipman, Paul Logasa Bogen II

Innovative Technologies for Digital Libraries

Enhancing the OpenDLib Search Service

This paper presents a new technique for supporting query formulation and processing experimentally integrated in the OpenDLib search service. This technique provides a better support for unified search by enhancing the capability of the digital library to satisfy the user needs. The paper presents the theory underlying the proposed technique and describes how it has been exploited in the OpenDLib system.

Leonardo Candela, Donatella Castelli, Pasquale Pagano
Multi-level Exploration of Citation Graphs

In previous work, we proposed a focus-based multi-level clustering technique. It consists of computing a particular clustered graph from a given graph and a focus. The resulting clustered graph is called a multi-level outline tree. It is a tree whose meta-nodes are subsets of nodes. A meta-node is itself hierarchically clustered depending on its connectivity. In this paper we introduce a cluster cohesiveness measure to enhance the results of the previously proposed algorithm. We further propose an optimization of this algorithm to support fluid interaction when the focus changes. Finally, we report the results of a case study that applies the enhanced algorithm to citation graphs where documents are considered as vertices and citation links as edges.

François Boutin, Mountaz Hascoët
Collaborative Querying for Enhanced Information Retrieval

Communication and collaboration with other people is a major theme in the information seeking process. Collaborative querying addresses this issue by sharing other users’ search experiences to help users formulate appropriate queries to a search engine. This paper describes a collaborative querying system that helps users with query formulation by finding previously submitted similar queries through mining web logs. The system operates by clustering and recommending related queries to users using a hybrid query similarity identification approach. The system employs a graph-based approach to visualize the query recommendations.

Lin Fu, Dion Hoe-Lian Goh, Schubert Shou-Boon Foo, Yohan Supangat

Open Archives Initiative

Servicing the Federation: The Case for Metadata Harvesting

The paper presents a comparative analysis of data harvesting and distributed computing as complementary models of service delivery within large-scale federated digital libraries. Informed by requirements of flexibility and scalability of federated services, the analysis focuses on the identification and assessment of model invariants. In particular, it abstracts over application domains, services, and protocol implementations. The analytical evidence produced shows that the harvesting model offers stronger guarantees of satisfying the identified requirements. In addition, it suggests a first characterisation of services based on their suitability to either model and thus indicates how they could be integrated in the context of a single federated digital library.

Fabio Simeoni
Developing a Technical Registry of OAI Data Providers

With the continued growth of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [1] it has become increasingly difficult for OAI service providers to discover new and keep up-to-date with existing data providers. There are currently several registries of OAI data providers. Most of these registries are incomplete. Most contain minimal information about registered providers – typically a base URL and little if anything else – providing service providers no clue as to repository scope, content, or size. These deficiencies mean significant extra overhead for service providers. This paper describes a more comprehensive registry of OAI data providers (available at http://oai.grainger.uiuc.edu/registry), developed to address some of these issues. While our registry as it presently exists facilitates discovery of data providers, utility is limited by lack of consistent practice for collection-level metadata. To realize the full potential of a better registry, the OAI community needs to develop better practices for collection-level description.

Thomas G. Habing, Timothy W. Cole, William H. Mischo
Applying SOAP to OAI-PMH

The Web Services paradigm for distributed computing promises to provide a breakthrough in interoperability by defining standardised mechanisms for inter-process communication. The SOAP standard, in particular, is widely discussed but not as widely adopted by standards bodies. The OAI is one such organisation that has been criticised for not adopting SOAP. Since the OAI-PMH is driven by semantics and SOAP describes syntax, a merger of the two technologies seems natural and inevitable. This paper discusses an attempt to remodel and repackage the OAI-PMH as a layer over SOAP and implement an end-to-end solution based on this experimental protocol. The project highlighted important concerns, such as the relative efficiency of layering in structured textual data and the problem of moving standards. The results show that few compromises are needed for a move to SOAP provided that protocol design is appropriately abstracted, and this has far reaching implications for the adoption of SOAP and Web Services within the DL community and OAI in particular.

Sergio Congia, Michael Gaylord, Bhavik Merchant, Hussein Suleman

New Models and Tools

Connexions: An Alternative Approach to Publishing

Web technologies offer new methods for quickly sharing and disseminating knowledge. Digital libraries of scholarly assets are proliferating online, with materials being openly licensed and shared. Sharing knowledge provides learners, instructors and researchers with access to the most recent findings, encouraging more rapid breakthroughs that lead to positive impacts on society. These new publication processes pose challenges to traditional publishers by redefining methods for providing quality information in a timely manner. The Connexions project at Rice University is a collaborative, community-driven approach to authoring, teaching, and learning. By collaborating both within and across disciplines, communities of authors work together to pool their expertise in the form of knowledge modules. These modules form the basis for building courses that are authored by many, with each author receiving attribution for his or her contributions. Information can be modified under an open license to tailor the material to different audiences of learners.

Geneva Henry
<teiPublisher>: Bridging the Gap Between a Simple Set of Structured Documents and a Functional Digital Library

Digital Libraries are complex systems that take a long time to create and tailor to specific requirements [1]. Their implementation requires specialized computer skills, which are not usually found within humanities text encoding projects. Many encoders working on text encoding projects find they cannot take their work to the next level by transforming their collections of structured XML [2] texts into a publishable, web-searchable and browsable service. Most often these teams find ways to encode their texts with a high degree of sophistication, but unless they have funds to hire computer programmers their collections remain on local disk storage, away from public access. <teiPublisher> is a novel tool designed with the aim of bridging the gap between simply having a collection of structured documents and having a functional digital library for public access via the web. The goal of this project is to build the tools to manage an extensible, modular and configurable XML-based repository which will house, search/browse, and display documents encoded in TEI-Lite [3] on the World Wide Web. <teiPublisher> provides an administrative interface that allows DL administrators to upload and delete documents from a web-accessible repository, analyze XML documents to determine elements for searching/browsing, refine ontology development, decide on inter- and intra-document links, partition the repository into collections, create backups of the entire repository, generate search/browse and display pages for users of the website, change the look of the interface, and associate XSL transformation scripts and CSS stylesheets to obtain different target outputs (HTML [4], PDF, etc.).

Amit Kumar, Alejandro Bia, Martin Holmes, Susan Schreibman, Ray Siemens, John Walsh
Managing a Paradigm Shift – Aligning Management, Privacy Policy, Technology and Standards

It is argued that we are experiencing a paradigm shift from a user perspective to a client perspective in library and information science. The paradigm shift is brought about by recent changes in scholarly publishing, which have enabled end-users to search for and retrieve information by themselves. Libraries are increasingly providing services that are more and more personalized. The implications of the paradigm shift for management, privacy policy, integration of services, and standards are discussed. It is suggested that libraries are increasingly considering customer relationship management and that privacy policy should be split into personal and professional privacy. Current systems should be developed to support successive searching behaviour. Finally, the need for an Open Services Initiative to solve the appropriate service problem is discussed.

Jonas Holmström

User-Centred Design

Towards an Integrated Digital Library: Exploration of User Responses to a ‘Joined-Up’ Service

Digital library users have to deal with many separate services. This paper describes efforts in the United Kingdom to use OpenURL technology to provide ‘joined-up’ services. The focus is on zetoc, a national electronic service, which enables users to find references in a British Library bibliographic database. zetoc now uses OpenURL technology to provide routes to services, which might give users access to electronic full text versions of references they have found. Data is provided from two questionnaire surveys and an interview programme conducted to explore user responses to these services. These evaluation studies show that users want these integrated services and are extremely positive about them when they work. However, ‘joined-up’ services depend for their success on the access rights that each user has to full text sources in their institution. As a result, the success level in obtaining full text varies considerably between institutions. Users in disadvantageous positions have expressed disappointment and frustration; the service may be regarded as a promise not fulfilled. The paper describes the development of ‘joined-up services’ as a partnership at national and local levels.

Ken Eason, Susan Harker, Ann Apps, Ross MacIntyre
Supporting Information Structuring in a Digital Library

In this paper we present Garnet, a spatial hypertext interface to a digital library. Spatial hypertext systems support information structuring: the organisation of documents performed by a user to complement their information seeking. In the past, spatial hypertext systems have suffered from poor connectivity with information sources such as digital libraries. Conversely, digital libraries have provided strong support for document retrieval whilst offering little support for information structuring over the retrieved documents. Garnet provides an integrated environment for both seeking and organising information. We report on the results of a user study that elicits the response of users to a combined seeking and structuring environment. The feasibility of exploiting the information structuring of users to identify the interests of users is also investigated.

George Buchanan, Ann Blandford, Harold Thimbleby, Matt Jones
Evaluating Strategic Support for Information Access in the DAFFODIL System

The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects, it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper, evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil and assessed in a qualitative evaluation. The evaluation was conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.

Claus-Peter Klas, Norbert Fuhr, André Schaefer

Innovative Technologies for Digital Libraries

Using Digital Library Techniques – Registration of Scientific Primary Data

Registration of scientific primary data, to make these data citable as a unique piece of work and not only a part of a publication, has always been an important issue. With the new digital library techniques, it is finally possible. In the context of the project “Publication and Citation of Scientific Primary Data”, funded by the German Research Foundation (DFG), the German National Library of Science and Technology (TIB) has become the first registration agency worldwide for scientific primary data. The datasets receive unique DOIs and URNs as citable identifiers, and all relevant metadata information is stored in the online library catalogue. Registration has started for the field of earth science, but will be widened to other subjects in 2005. In this paper we give a quick overview of the project and the registration of primary data.

Jan Brase
Towards Topic Driven Access to Full Text Documents

We address the issue of providing topic driven access to full text documents. The methodology we propose is a combination of topic segmentation and information retrieval techniques. By segmenting the text into topic driven segments, we obtain small and coherent documents that can be used in two ways: as a basis for automatically generating hypertext links, and as a visualization aid for the reader who is presented with a small set of focused and restricted text snippets. In the presence of a concept hierarchy, or ontology, information retrieval techniques can be used to connect the segments obtained to concepts in the ontology. In this paper we concentrate on the text segmentation phase: we describe our approach to segmentation, discuss issues related to evaluation, and report on preliminary results.

Caterina Caracciolo, Willem van Hage, Maarten de Rijke
Bibliographic Component Extraction Using Support Vector Machines and Hidden Markov Models

Article citations are composed of subfields such as author, title, journal, and year. It is useful to automatically identify attributes of these subfields, since they are used for linking a citation with the actual cited article. In this article, we employ a Support Vector Machine (SVM), a method of machine learning, to automatically identify subfields. We then employ a Hidden Markov Model (HMM) to improve the identification accuracy. Information from the subfields identified by the SVM, and syntactic information analyzed by the HMM, are integrated to make an accurate identification.

Takashi Okada, Atsuhiro Takasu, Jun Adachi
Towards a Policy Language for Humans and Computers

A policy is a statement that an action is permitted or forbidden if certain conditions hold. We introduce a language for reasoning about policies called Rosetta. What makes Rosetta different from existing approaches is that its syntax is essentially a fragment of English. The language also has formal semantics, and we can prove whether a permission follows from a set of Rosetta policies in polynomial time. These features make it fairly easy for policy language developers to provide translations between their languages and ours. As a result, policy writers and (human) readers can create and access policies via the interface of their choice; these policies can be translated to Rosetta; and once in Rosetta can be translated to an appropriate language for enforcement.

Vicky Weissman, Carl Lagoze
Backmatter
Metadata
Title
Research and Advanced Technology for Digital Libraries
Edited by
Rachel Heery
Liz Lyon
Copyright year
2004
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-30230-8
Print ISBN
978-3-540-23013-7
DOI
https://doi.org/10.1007/b100389