2005 | Book

Digital Libraries: Implementing Strategies and Sharing Experiences

8th International Conference on Asian Digital Libraries, ICADL 2005, Bangkok, Thailand, December 12-15, 2005. Proceedings

Edited by: Edward A. Fox, Erich J. Neuhold, Pimrumpai Premsmit, Vilas Wuwongse

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Table of Contents

Frontmatter

Concepts and Models for Digital Library Systems

A Model of ITS Using Cold Standby Cluster

Current intrusion detection mechanisms have low detection rates and high false alarm rates. We therefore propose a model of an intrusion tolerant system (ITS) to increase survivability in the face of successful attacks. In this paper, we present a cluster recovery model that uses a cold standby cluster with a software rejuvenation methodology, which is applicable in the security field and is also less expensive. We first perform a steady-state analysis of a cluster system and then consider an ITS with a cold standby cluster. The basic idea is to investigate the consequences of the exact responses to attacks and to rejuvenate the running service and/or reconfigure it. The results show that the system operates through intrusions, continues to provide critical functions, and gracefully degrades non-critical system functionality in the face of intrusions.

Khin Mi Mi Aung, Kiejin Park, Jong Sou Park
From Heterogeneous Information Spaces to Virtual Documents

This paper introduces DoMDL, a powerful and flexible document model capable of representing multi-edition, structured, multimedia documents that can be disseminated in multiple manifestation formats. This model also allows any document to be associated with multiple metadata descriptions in different formats and to include semantic relationships with other documents and parts of them. The paper also discusses how the OpenDLib Digital Library Management System exploits this model to abstract from the specific organization and structure of the documents that are imported from different heterogeneous information sources in order to provide virtual documents that fulfill the needs of the different DL user communities.

Leonardo Candela, Donatella Castelli, Pasquale Pagano, Manuele Simi
Traveling in Digital Archive World: Sightseeing Metaphor Framework for Enhancing User Experiences in Digital Libraries

Digital libraries are currently growing rapidly in number and size, covering various fields. However, despite the great deal of intellectual effort and large budgets necessary for their construction, digital libraries are often limited to use by specialists. We describe a framework that will attract more users to digital libraries through enhancing the presentation layers. We propose the guiding principle to such enhancement, the Sightseeing Metaphor Framework (SF), which is a structure based on user activities during sightseeing. We exemplify it in a Web-based regional information presentation system. The evaluation we did for an elementary school classroom revealed that the system created a better impression on the students compared to an existing presentation scheme.

Taro Tezuka, Katsumi Tanaka
Flexing Digital Library Systems

Digital library systems with monolithic architectures are rapidly facing extinction as the discipline adopts new practices in software engineering, such as component-based architectures and Web Services. Past projects have attempted to demonstrate and justify the use of components through the construction of systems such as NCSTRL and ScholNet. This paper describes current work to push the boundaries of digital library research and investigate a range of projects made feasible by the availability of suitable components. These projects include: the ability to assemble component-based digital libraries using a visual interface; the design of customisable user interfaces and workflows; the packaging and installation of systems based on formal descriptions; and the shift to a component farm for cluster-like scalability. Each of these sub-projects makes a potential individual contribution to research in architectures, while sharing a common underlying framework. Together, all of these projects support the hypothesis that a consistent component architecture and suite of components can provide the basis for advanced research into flexible digital library architectures.

Hussein Suleman, Kevin Feng, Siyabonga Mhlongo, Muammar Omar
An Ontology-Based Model of Digital Libraries

In this paper a new unifying model is suggested for digital libraries which contains four conceptual layers, and defines the concepts of each layer as an OWL ontology. Instances of the ontology can be used to define an overall view of a digital library in terms of the four layers and the relationships between them. Such a model has the advantage that the methodology is formalized and extensible, thus models are comparable and manageable.

László Kovács, András Micsik

Case Studies in Digital Libraries

The Impact of ICT on Library Services for the Visually Impaired

ICT gives visually impaired people two fundamental freedoms – Independence and Choice in library services. Before electronic information and on-line catalogues became available visually impaired people required assistance with reading and had limited choice of reading material. But now visually impaired people are no longer disabled in searching and surfing information on digital libraries. This study examines the ICT impact on library services for the visually impaired in mainstream libraries. New opportunities for mainstream libraries to integrate visually impaired people are discussed as well as the problems facing the mainstream libraries.

Young Sook Lee
An Asian Study of Healthcare Web Portals: Implications for Healthcare Digital Libraries

In contrast to most studies conducted in the West, this study investigated online trust of healthcare Web portals from Asian countries. A Web-based survey was conducted through the Internet for about two weeks and received 127 responses. The respondents assessed two healthcare Web portals based on task completion before answering questions in a Web-based questionnaire. Consistent with related studies carried out in the West, this study also suggested a significant relationship between usability and perceived credibility of healthcare Web portals. Findings from this pilot study seemed to indicate that the “error prevention” usability heuristic was most severely violated in the two healthcare Web portals. The paper concludes with implications for the design of user-centred healthcare digital libraries.

Yin-Leng Theng, Eng-Soon Soh
Annotations in an Academic Digital Library: The Case of Conference Note-Taking and Annotation

This paper explores the potential usefulness and acceptability of annotation facilities by prospective users of an IT research digital library. We studied current annotation and note-taking behavior of IT researchers (academic and commercial), as exhibited at IT conferences. Here, we examine the implications of this information behavior for the design of annotation tools in a research-oriented digital library.

Sally Jo Cunningham, Chris Knowles
Relevance Judgments for Image Retrieval in the Field of Journalism: A Pilot Study

The objective of this pilot study is to investigate relevance judgments made by end-users when searching for image information. The pilot study involved 10 undergraduate students from the Department of Journalism and Media Studies at Rutgers University using the AccuNet/AP Photo Archive to retrieve specific, general, and subjective photos. The study identified core relevance criteria used across the three different image searches, and found that the participants in the general and subjective image searches relied more on personal feelings and textual information of photos to make relevance judgments, while the participants in the specific image search depended more on the features of objects in photos. Four textual representations (caption, object name, location, and creation date) were chosen to see how useful they were for the participants making relevance judgments. The results show that location was the most useful information among the four textual representations.

Tsai-Youn Hung, Chuck Zoeller, Santiago Lyon

Digital Archives and Museums

Image Classification for Digital Archive Management

As tools and systems for producing and disseminating image data have improved significantly in recent years, the volume of digital images has grown rapidly. An efficient mechanism for managing such images in a digital archive system is therefore needed. In this study, we propose an image classification technique that meets this need. The technique can be employed to annotate and verify image categories when gathering images. The proposed method segments each image into non-overlapping blocks from which color and texture features can be extracted. Support Vector Machine (SVM) classifiers are then applied to train and classify the images. Our experimental results show that the proposed classification mechanism is feasible for digital archive management systems.

Cheng-Hung Li, Chih-Yi Chiu, Hsiang-An Wang
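
The block segmentation step mentioned in the abstract above can be illustrated with a minimal sketch. It only covers cutting an image into non-overlapping blocks and computing a mean color per block; the texture features and the SVM classifiers from the paper are not reproduced. The function name `block_color_features`, the nested-list image representation, and the choice of mean RGB as the color feature are assumptions made for illustration, not the authors' method.

```python
def block_color_features(image, block_size):
    """Split an image into non-overlapping square blocks and return the
    mean (r, g, b) of each block, in row-major block order.

    `image` is assumed to be a non-empty list of rows, each row a list of
    (r, g, b) tuples. Partial blocks at the right/bottom edges are skipped.
    """
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            # Collect every pixel that falls inside the current block.
            pixels = [image[y][x]
                      for y in range(top, top + block_size)
                      for x in range(left, left + block_size)]
            n = len(pixels)
            # Average each color channel over the block.
            features.append(tuple(sum(p[c] for p in pixels) / n
                                  for c in range(3)))
    return features
```

In a fuller pipeline, each per-block vector would be extended with texture descriptors and passed to a trained classifier; that part is left out here because the abstract does not give enough detail to reproduce it.
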
Digital Content Development of Taiwanese Folklore Artifacts

Folklore artifacts hold strong cultural meaning for a people. Taiwan Folklore Museum (TFM) is Taiwan’s first official folklore museum, which aims to provide the people of Taiwan with a place where they can reflect on the past and experience how the pioneers lived. The museum holds a great variety of artifacts, classified into ten categories according to their life styles and functions, and attracts a great number of overseas tourists each year. The museum is also the most popular place for students from kindergartens and primary schools and for general citizens to learn what their tradition is and how their ancestors lived. In this paper, we report our current progress in digitization and content development of the artifacts. A total of 1,412 collected artifacts have been digitized so far. The origin and function of each collected artifact were described in three languages (Chinese, Japanese, and English), and the detailed information about the artifacts was examined and studied by several Taiwanese folklore specialists. To facilitate inter-museum communication, metadata based on Dublin Core and its extensions were provided as well. A website (http://www.folkpark.org.tw) dedicated to demonstrating the digital contents of the artifacts and to providing digital surrogates for folklore researchers was also constructed. It allows people from all over the world to access information about the collected artifacts, so that studies of Taiwanese folklore artifacts can be done without territorial constraint. Future work will focus on the construction of 3D models of the artifacts to demonstrate their global views. E-learning contents for Taiwanese folklore courses will also be authored to give the general public and children an interactive way of learning on the Internet.

Po-Chou Chan, Yung-Fu Chen, Kuo-Hsien Huang, Hsuan-Hung Lin
Electronic Restoration: Eliminating the Ravages of Time on Historical Maps

Geographic and mathematical analyses of historical maps require highly accurate adjustments to manuscripts in order to eliminate distortions caused by time and use. Earlier proposals for electronic restoration only offered effective solutions when compensating for tightly bound or straightly creased books. Applying a different solution, we have found a way of electronically restoring the map to its original shape, producing not only a more beautiful map, but also one suitable for further geographical analyses.

German Diaz, Patricia Seed
Automatic 3D Blogging to Support the Collaborative Experience of 3D Digital Archives

Digital archives that include 3D CG models of, for example, art works and archaeological sites are now commonly created for a wide range of purposes. Unlike traditional museums where people can visit in a group, access to digital archives and browsing of their contents are solitary activities. Thus, users cannot as easily gain a deep understanding of the contents and are less likely to truly enjoy the contents. We therefore propose a method to support collaborative experiencing of 3D digital archives related to cultural heritages. To achieve this goal, we developed a system that enables automatic blogging of annotations made to 3D content. The annotations a user makes to the 3D content of archives while “walking” through them are automatically converted into a blog on a web page. Users’ experiences, through annotations expressing their impressions, opinions, suggestions, and so on, are converted into content, which can then be used by subsequent users as a reference or guide when browsing the archives.

Rieko Kadobayashi

Multimedia Digital Libraries

Towards a Unified Framework for Context-Preserving Video Retrieval and Summarization

Entirely watching separate video segments of interest or their summary might not be smooth enough nor comprehensible for viewers since contextual information between those segments may be lost. A unified framework for context-preserving video retrieval and summarization is proposed in order to solve this problem. Given a video database and ontologies specifying relationships among concepts used in MPEG-7 annotations, the objective is to identify according to a user query relevant segments together with summaries of contextual segments. Two types of contextual segments are defined: intra-contextual segments intended to form semantically coherent segments, and inter-contextual segments intended to semantically link together two separate segments. Relationships among verbs [3] are exploited to identify contextual segments as the relationships can provide the knowledge about events, causes and effects of actions over time. A query model and context-preserving video summarization are also presented.

Nimit Pattanasri, Somchai Chatvichienchai, Katsumi Tanaka
Content Augmentation and Webification for Enhancing TV Viewing

A system is described for enhancing the viewing of programs on storage televisions. The content of a program is webified and augmented by analyzing the closed captions to structuralize the content online and by searching for Web pages that provide information complementary to the program. The structuralized content and related information are viewed using an intuitive, zooming user interface that enables the user to switch gradually from watching a program to browsing the program like a Web page and to change the level of detail. Prototype testing validated the concept of this “WA-TV” (Webifying and Augmenting TV-content) system.

Qiang Ma, Hisashi Miyamori, Katsumi Tanaka
CLOVER: A Mobile Content-Based Leaf Image Retrieval System

In this paper, we present an effective and robust leaf image retrieval system called CLOVER that works especially in the mobile environment. For the inquiry, users sketch or photograph a leaf using a PDA equipped with a digital camera, and then send it to a server. Most leaves tend to have similar color and texture, which makes shape-based image retrieval more effective than color-based image retrieval. In order to improve retrieval performance, we proposed a new shape representation scheme based on the well-known MPP algorithm. The new scheme can reduce the number of points to consider for matching. In addition, we proposed a new dynamic matching algorithm based on the Nearest Neighbor search to reduce the matching time. We implemented a prototype system that supports adaptive transmission of images over 802.11b wireless networks to mobile devices and demonstrate its effectiveness and scalability through various experimental results.

Yunyoung Nam, Eenjun Hwang, Dongyoon Kim

Information Processing in Asian Digital Libraries

Global Memory Net Offers New Innovative Access to Tsurumi’s Old Japanese Waka Poems and Tales, and Maps

This paper describes how Global Memory Net (GMNet) has been able to provide a new kind of innovative access to the invaluable content of the classical Japanese ancient poems and maps at Tsurumi University that was not available for public access before. The collaboration began with the development of a prototype collection, based on images included in two publications of the Tsurumi University Library: the Eighty Selections of Waka Poems and Tales from the Classical Japanese Literature and the Japanese Maps in the Old Age. As the project developed, coinciding with the technology development of GMNet in bilingual retrieval as well as with sound presentations, the inclusion of sound files for each of the Waka selections was considered a very desirable feature, since Waka poems are generally only readable by a very small number of specialists.

The paper will present a bird’s eye view of how Tsurumi’s rare collection is organized and presented with much enhanced access in the GMNet system. Through this project, the Tsurumi team has gained considerable experience. The overall process for them was very time consuming even though the technology of GMNet was already in place. These valuable experiences will be discussed and shared.

Takashi Nagatsuka, Ching-chih Chen
Keyword Spotting on Korean Document Images by Matching the Keyword Image

In this paper, we propose a keyword spotting system for Korean document images and compare the proposed system with an OCR-based document retrieval system. The system is composed of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method to resolve the connection between adjacent characters. In the query creation step, a feature vector for the query is constructed by combining the features of the constituent characters. In the matching step, word-to-word matching is applied based on character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one for searching for a keyword in Korean document images, especially when the quality of the documents is quite poor.

Soo Hyung Kim, Sang Cheol Park, Chang Bu Jeong, Ji Soo Kim, Hyuk Ro Park, Guee Sang Lee
Robust Feature Extraction for Automatic Classification of Korean Traditional Music in Digital Library

In this paper, we propose an automatic classification system that classifies the Korean traditional music in digital library. In contrast to previous works, this paper focuses on the following issues of music classification. Firstly, the proposed system accepts query sound and automatically classifies input query into one of the six Korean traditional music categories such as “Court music”, “Classical chamber music”, “Folk song”, “Folk music”, “Buddhist music”, and “Shamanist music”. Secondly, in order to overcome system uncertainty due to the different query patterns, a robust feature extraction method called multi-feature clustering (MFC) combined with SFS feature selection is proposed. Finally, several pattern classification algorithms such as k-NN, Gaussian, GMM and SVM are tested and compared in terms of the classification accuracy. The experimental results indicate that the proposed MFC-SFS method shows more stable and higher classification performance than the one without the MFC-SFS.

Kang-Kue Lee, Kyu-Sik Park
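
Of the classifiers compared in the abstract above, k-NN is the simplest to illustrate. The following is a minimal sketch of a generic k-NN majority vote over feature vectors. It does not implement the MFC or SFS steps, and the two-dimensional feature vectors in the usage note are invented purely for illustration. The function name `knn_classify` is hypothetical.

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    `training` is a list of (feature_vector, label) pairs. Distances are
    Euclidean; this is an assumption, the paper does not state its metric.
    """
    # Sort the training examples by distance to the query and keep k.
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    # Count the labels among the neighbours and return the most frequent.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, with made-up training points `[((0, 0), "Folk song"), ((0, 1), "Folk song"), ((5, 5), "Court music"), ((5, 6), "Court music")]`, a query near the origin is assigned to "Folk song" because two of its three nearest neighbours carry that label.
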
A New Re-ranking Method for Generic Chinese Text Summarization and Its Evaluation

In this paper a new EMD-MMR (EMD: earth mover’s distance; MMR: maximal marginal relevance) re-ranking method is proposed for generic Chinese text summarization. Our extraction-based summarization approach first ranks the sentences in a document by their weight calculated based on word frequency and position, and then re-ranks a few highly weighted sentences by the EMD-MMR method for sentence extraction. The proposed re-ranking method adopts a novel EMD-based similarity metric instead of the Cosine metric into the MMR approach. The EMD-based similarity metric can naturally take into account the semantic relatedness between words and compute the semantic similarity between texts with a many-to-many matching among words. We evaluate the performance of the proposed approach with a novel nk-blind method and the results demonstrate its effectiveness.

Xiaojun Wan, Yuxin Peng
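
The MMR re-ranking idea in the abstract above can be sketched as a greedy selection that trades a sentence's weight against its similarity to sentences already chosen. This sketch takes the similarity function as a parameter and defaults to plain word-overlap (Jaccard) similarity. That default is a stand-in chosen for brevity, not the EMD-based metric the paper proposes. The names `mmr_rerank`, `jaccard`, and the trade-off parameter `lam` are assumptions for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in, NOT the paper's EMD metric."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def mmr_rerank(candidates, weights, k, lam=0.7, sim=jaccard):
    """Greedy maximal marginal relevance selection of k sentences.

    candidates: list of sentences (strings)
    weights:    initial relevance weight of each sentence
    lam:        trade-off between relevance (1.0) and novelty (0.0)
    """
    selected = []                       # indices chosen so far, in order
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any chosen sentence.
            redundancy = max(
                (sim(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * weights[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]
```

Passing a different `sim` callable is where an EMD-based metric would plug in; the greedy loop itself stays the same.
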

Digital Libraries for Community Building

A User Classification for Internet Content Provider Based Modified Fuzzy Neural Network

With the explosive growth of the Internet, we have entered an age led by Internet content providers (ICPs). Helping users to locate relevant information in an efficient manner is very important both to the users and to the ICPs. As such, it is highly desirable to have a systematic way of extracting user features effectively and, subsequently, analyzing user orientations quantitatively. This paper presents a new approach that employs a modified fuzzy neural network based on adaptive resonance theory to group users dynamically based on their Web access patterns. ICPs should perform such user clustering beforehand as the basis for providing personalized service. The experimental results of this clustering technique show the promise of our system. The scheme could be used in local data management applications, digital libraries, and so on.

Yunfeng Li, Yukun Cao
Eyes of a Wiki: Automated Navigation Map

There are many potential uses of a Wiki within a community-based digital library. Users share individual ideas to build up community knowledge through the efficient and effective collaborative authoring and communication that a Wiki provides. In our study, we investigated how the community knowledge is organized into a knowledge structure that users can access and modify efficiently. Since a Wiki gives users the freedom to edit any page, a Wiki site grows and changes dynamically. We also developed a tool that helps users to navigate easily in the dynamically changing link structure. Our experiment shows that the navigation tool helps Wiki users figure out the complex site structure more easily and thus build up a better-structured community knowledge base. We also show that a Wiki with the navigation tool improves collaborative learning in a web-based e-learning environment.

Hee-Seop Han, Hyeoncheol Kim
A Collaborative Filtering Based Re-ranking Strategy for Search in Digital Libraries

Users of a digital book library system typically interact with the system to search for books by querying the metadata describing the books, or to search for information in the pages of a book by querying with one or more keywords. In either case, a large volume of results is returned, and the results relevant to the user are often not among the top few. Re-ranking of the search results according to the user’s interests, based on his or her relevance feedback, has received wide attention in information retrieval. Also, recent work in collaborative filtering and information retrieval has shown that sharing of search experiences among users having similar interests, typically called a community, reduces the effort put in by any given user in retrieving the exact information of interest. In this paper, we propose a collaborative filtering based re-ranking strategy for the search processes in a digital library system. Our approach is to learn a user profile representing the user’s interests using Machine Learning techniques and to re-rank the search results based on collaborative filtering techniques. In particular, we investigate the use of Support Vector Machines (SVMs) and k-Nearest Neighbour methods (kNN) for the task of classification. We also apply this approach to a large scale online Digital Library System and present the results of our evaluation.

U. Rohini, Vamshi Ambati

Information Retrieval Techniques

Harvesting for Full-Text Retrieval

We propose an approach to Distributed Information Retrieval based on the periodic and incremental centralisation of full-text indices of widely dispersed and autonomously managed content sources.

Inspired by the success of the Open Archive Initiative’s protocol for metadata harvesting, the approach occupies middle ground between: (i) the crawling of content, and (ii) the distribution of retrieval. As in crawling, some data moves towards the retrieval process, but it is statistics about the content rather than content itself. As in distributed retrieval, some processing is distributed along with the data, but it is indexing rather than retrieval itself. We show that the approach retains the good properties of centralised retrieval without renouncing cost-effective resource pooling. We discuss the requirements associated with the approach and identify two strategies to deploy it on top of the OAI infrastructure.

Fabio Simeoni, Murat Yakici, Steve Neely, Fabio Crestani
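
The idea in the abstract above, that content sources ship index statistics rather than content and a central service merges them incrementally, can be sketched as follows. This is a toy illustration under stated assumptions: each source is assumed to export per-document term frequencies, the class name `CentralIndex` and its methods are hypothetical, and the tf-idf style scoring is a common textbook choice rather than anything the paper specifies. No OAI protocol messages are modelled.

```python
import math
from collections import defaultdict


class CentralIndex:
    """Central full-text index built from harvested term statistics."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_ids = set()

    def harvest(self, source_id, records):
        """Merge one incremental batch from a source.

        `records` maps a source-local document id to {term: frequency}.
        A re-harvested document replaces its earlier statistics.
        """
        for local_id, term_freqs in records.items():
            doc_id = (source_id, local_id)
            if doc_id in self.doc_ids:
                # Drop stale postings before adding the new statistics.
                for docs in self.postings.values():
                    docs.pop(doc_id, None)
            self.doc_ids.add(doc_id)
            for term, tf in term_freqs.items():
                self.postings[term][doc_id] = tf

    def search(self, terms):
        """Rank documents by a simple tf-idf sum over the query terms."""
        scores = defaultdict(float)
        n = len(self.doc_ids)
        for term in terms:
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + n / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by document id for determinism.
        return sorted(scores, key=lambda d: (-scores[d], d))
```

Only term statistics cross the network in this sketch, while retrieval runs centrally, which mirrors the middle ground the abstract describes between crawling and distributed retrieval.
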
Word Extraction from Table Regions in Document Images

This paper describes a method to extract words from table regions in document images. The proposed approach consists of two stages: cell detection and word extraction. In the cell detection module, a table frame is extracted first by analyzing connected components and then intersection points are detected by a method using masks in the table frame. We correct false intersections, and detect the location of the cells within the table. In the word extraction module, a text region in each cell is located by using the connected components information that was obtained during the cell extraction module, and segmented into text lines by using projection profiles. Finally we divide the segmented lines into words using gap clustering and special symbol detection. The method correctly included character components touching the table frame with words, so experimental results show that more than 99% of words were successfully extracted from table regions.

Chang-Bu Jeong, Sang-Cheol Park, Hwa-Jeong Son, Soo-Hyung Kim
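
The projection-profile step that the abstract above uses to split a cell's text region into text lines can be sketched in a few lines. The sketch assumes a binarized image given as nested lists of 0 (background) and 1 (ink) and treats any row with no ink as a line gap. The function name `segment_text_lines` and the zero-ink threshold are assumptions; the paper's cell detection, gap clustering, and special symbol detection are not covered.

```python
def segment_text_lines(image):
    """Return (start_row, end_row) pairs for runs of rows that contain ink.

    `image` is a list of rows; each row is a list of 0 (background)
    or 1 (ink) values.
    """
    # Horizontal projection profile: amount of ink in each row.
    profile = [sum(row) for row in image]
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a text line begins
        elif count == 0 and start is not None:
            lines.append((start, y - 1))   # the line ended on the row above
            start = None
    if start is not None:
        # The last line runs to the bottom edge of the region.
        lines.append((start, len(profile) - 1))
    return lines
```

The same routine applied to column sums of a single text line gives candidate gaps between words, which is where a gap clustering step would take over.
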
Where the Speed Matters... Zero-Response-Time Search Engine for Small Collections

Users with slow internet connections experience slow retrieval of results in web catalogues. JavaScript search engines can be used to enable client-side search, reducing the load on the server and improving the response time. However, this has not been a popular method so far, for reasons including limits on the number of data objects and a longer response time for the first search. Here the author suggests negotiating the issue of response delay with the user. This would enable high-speed basic search in small catalogues, usually with fewer than 300 data objects. Larger catalogues can be divided into smaller ones. Special or rare collections, multimedia artifacts and subject (web) directories are prospective candidates for this type of search system. A prototype catalogue of ‘Sri Lankan Web Sites’ was tested at www.srilankasupersearch.com. Users’ behavior and response to the system are yet to be studied.

Ruwan Gamage
A Hybrid Information Retrieval Model Using Metadata and Text

Information retrieval (IR) with metadata tends to have high precision as long as the user expresses the information need accurately but may suffer from low recall because queries are too exact with the specification of the metadata fields. On the other hand, full-text retrieval tends to suffer more from low precision especially when queries are simple and the number of documents is large. While structured queries targeted at metadata can be quite precise and the retrieval results can be accurate, it is not easy to construct an effective structured query without understanding the characteristics of the metadata. Casual users, however, are usually not interested in spending time to understand the meaning of various metadata. In this paper, we propose a hybrid IR model that searches both metadata and text fields of documents. User queries are analyzed and converted into a hybrid query automatically. Experiments show that the hybrid approach outperforms either of the cases, i.e. searching text only or metadata only.

Sung Soo Kim, Sung Hyon Myaeng, Jeong-Mok Yoo
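
One simple way to picture a hybrid of metadata and full-text search, as described in the abstract above, is a score that blends how many query terms match the metadata fields with how many match the body text. The sketch below is only that: a linear blend with a hypothetical weight `alpha`, a hypothetical function name `hybrid_score`, and a made-up document layout. The paper's automatic query analysis and conversion are not reproduced.

```python
def hybrid_score(query_terms, doc, alpha=0.5):
    """Blend metadata and full-text matching into a single score in [0, 1].

    `doc` is assumed to look like:
        {"metadata": {"title": "...", "author": "..."}, "text": "..."}
    `alpha` weights the metadata part; (1 - alpha) weights the text part.
    """
    terms = [t.lower() for t in query_terms]
    if not terms:
        return 0.0
    # Pool the tokens of all metadata field values.
    meta_tokens = set()
    for value in doc["metadata"].values():
        meta_tokens.update(value.lower().split())
    text_tokens = set(doc["text"].lower().split())
    # Fraction of query terms found in each part of the document.
    meta_hits = sum(1 for t in terms if t in meta_tokens) / len(terms)
    text_hits = sum(1 for t in terms if t in text_tokens) / len(terms)
    return alpha * meta_hits + (1 - alpha) * text_hits
```

With `alpha=1.0` this degenerates to metadata-only search and with `alpha=0.0` to text-only search, which are the two baselines the abstract compares against.
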

Ontologies and Content Management in Digital Libraries

A Standards-Based Approach for Supporting Dynamic Access Policies for a Federated Digital Library

With the increasing acceptability of interoperability standards like Open Archives Initiative protocol for metadata harvesting, it is becoming feasible to build federated discovery services which aggregate metadata from different digital libraries (data providers) and provide a unified search interface to users. Content-based access control is one of the primary requirements of data providers. While this concept has been predominant in the research realm, practical systems incorporating this concept are rare. In this paper, we propose a framework that supports and enforces content-based access policies using existing COTS components. We have prototyped the framework by building a system using XACML, and a XACML policy engine. The system can also be generalized to environments other than digital libraries.

K. Bhoopalam, K. Maly, F. McCown, R. Mukkamala, M. Zubair
Exploiting Lexical Knowledge in Learning User Profiles for Intelligent Information Access to Digital Collections

Algorithms designed to support users in retrieving relevant information base their relevance computations on user profiles, in which representations of the users’ interests are maintained. This paper focuses on the use of supervised machine learning techniques to induce user profiles for Intelligent Information Access. The access must be personalized by profiles allowing users to retrieve information on the basis of conceptual content. To address this issue, we propose a method to learn sense-based user profiles based on WordNet, a lexical database.

G. Semeraro, P. Lops, M. Degemmis, C. Niederée, A. Stewart
Government Ontology and Thesaurus Construction: A Taiwanese Experience

Due to the quantity and the diversity involved in e-government presentations and operations, traditional approaches to web site information management have been found to be rather inefficient in time and cost. Consequently, the necessity of establishing a government knowledge management system, so as to speed up information lookups, sharing, and linkups, naturally arises. Moreover, this knowledge management system would in turn enhance e-government effectiveness as it helps to store and transmit information, be it explicit or implicit in nature. The first step in creating this knowledge management system is to build up the government ontology and thesaurus. Upon the completion of the ontology and thesaurus needed, semantic searching can be conducted, which in turn kickstarts other mechanisms required for effective information management.

Our research team has been commissioned by the Executive Yuan of Taiwan to establish the draft of government ontology and thesaurus and to design a framework for multiple-layered information management systems upon which the ontology and thesaurus can be constructed. The goal of this paper is to present the government ontology and thesaurus which our research team has come up with as well as the related infrastructure and function of the multiple-layered information management system.

Chao-chen Chen, Jian-hua Yeh, Shun-hong Sie
Concept Expansion Using Semantic Fisheye Views

Exploratory search over a collection often requires users to iteratively apply a variety of strategies, such as searching for more general or more specific concepts in reaction to the information they encounter. Rich semantic models, such as WordNet, are potentially valuable aids for making sense of this information. However, these large complex models often contain specialized vocabularies and a detailed level of granularity that makes them difficult to use for opportunistic search. In this paper, we describe how Semantic Fisheye Views (SFEV) can be designed to transparently integrate rich semantic models into the search process, allowing users to effectively explore a diverse range of related concepts without explicitly navigating over the underlying model. The SFEV combines semantic guided search with interactive visualization techniques, creating a search tool that we have found to be significantly more effective for exploratory tasks than those based on keyword-similarity alone.

Paul Janecek, Vincent Schickel, Pearl Pu

Information Integration and Retrieval Technologies in Digital Libraries

Development and Evaluation of a Multi-document Summarization Method Focusing on Research Concepts and Their Research Relationships

This paper reports the design and evaluation of a method for summarizing a set of related research abstracts. This summarization method extracts research concepts and their research relationships from different abstracts, integrates the extracted information across abstracts, and presents the integrated information in a Web-based interface to generate a multi-document summary. This study focused on sociology dissertation abstracts, but can be extended to other research abstracts. The summarization method was evaluated in a user study to assess the quality and usefulness of the generated summaries in comparison to a sentence extraction method used in MEAD and a method that extracts only research objective sentences. The evaluation results indicated that the majority of sociology researchers preferred our variable-based summary generated with the use of a taxonomy.

Shiyan Ou, Christopher S. G. Khoo, Dion H. Goh
Automatic Classification of Western Music in Digital Library

In this paper, we propose a new robust content-based western music genre classification algorithm using a multi-feature clustering (MFC) method combined with a feature selection procedure. This paper focuses on the dependency of the classification result on different query patterns and query lengths, which causes serious uncertainty in system performance. To solve these problems, a new approach called MFC-SFSS, based on k-means clustering, is proposed. To verify the performance of the proposed method, several excerpts of variable duration were extracted from every other position in the same queried music file. The effectiveness of the system with and without MFC-SFSS is compared in terms of the classification results with the k-NN decision rule. It is demonstrated that the use of MFC-SFSS significantly improves the stability of musical genre classification, with better accuracy.

Won-Jung Yoon, Kang-Kue Lee, Kyu-Sik Park, Hae-Young Yoo
Finding Pertinent Page-Pairs from Web Search Results

Conventional Web search engines evaluate each single page as a ranking unit. When the information a user wishes to have is distributed on multiple Web pages, it is difficult to find pertinent search results with these conventional engines. Furthermore, search result lists are hard to check and they do not tell us anything about the relationships between the searched Web pages. We often have to collect Web pages that reflect different viewpoints. Here, a collection of pages may be more pertinent as a search result item than a single Web page. In this paper, we propose the idea to realize the notion of “multiple viewpoint retrieval” in Web searches. Multiple viewpoint retrieval means searching Web pages that have been described from different viewpoints for a specific topic, gathering multiple collections of Web pages, ranking each collection as a search result and returning them as results. In this paper, we consider the case of page-pairs. We describe a feature-vector based approach to finding pertinent page-pairs. We also analyze the characteristics of page-pairs.

Takayuki Yumoto, Katsumi Tanaka
A Relevant Score Normalization Method Using Shannon’s Information Measure

Given the ranked lists of images with relevance scores returned by multiple image retrieval subsystems in response to a given query, the problem for a combined retrieval system is how to combine these lists on an equal footing. In this paper, we propose a novel relevance score normalization method based on Shannon’s information measure. Generally, the number of relevant images is far smaller than the total number of retrieval targets. Therefore, we suppose that if a subsystem can clearly identify which retrieval targets are relevant, it should assign high relevance scores to only a few retrieval targets. In short, the sureness of an IR subsystem can be estimated from the distribution of its relevance scores. We therefore calculate the sureness of each IR subsystem using Shannon’s information measure, and calculate the normalized relevance scores from the sureness of the subsystem and the raw relevance scores. In our experiment, our normalization method outperformed the others.

Yu Suzuki, Kenji Hatano, Masatoshi Yoshikawa, Shunsuke Uemura, Kyoji Kawagoe
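
The abstract above does not give the actual formula, so the following is only a minimal sketch of the general idea, under stated assumptions: "sureness" is taken here to be one minus the normalized Shannon entropy of a subsystem's score distribution, and the normalized score is the min-max scaled raw score weighted by that sureness. The function names `sureness` and `normalize` and the weighting scheme are hypothetical and are not taken from the paper.

```python
import math


def sureness(scores):
    """Return 1 - normalized Shannon entropy of the score distribution.

    Assumption for illustration: scores are non-negative. A flat
    distribution gives 0.0; a single dominant score gives a value near 1.0.
    """
    n = len(scores)
    total = sum(scores)
    if n < 2 or total <= 0:
        return 0.0
    probs = [s / total for s in scores if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(n)


def normalize(scores):
    """Min-max scale the raw scores, then weight them by the subsystem's sureness."""
    weight = sureness(scores)
    low, high = min(scores), max(scores)
    span = high - low
    return [weight * ((s - low) / span if span > 0 else 0.0) for s in scores]


# A subsystem with a peaked score distribution keeps more weight
# than one whose scores are nearly uniform.
peaked = normalize([0.9, 0.05, 0.03, 0.02])
flat = normalize([0.9, 0.8, 0.85, 0.88])
```

With this weighting, scores from different subsystems land on a common scale in which confident subsystems contribute more when the lists are merged.
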
Configurable Meta-search for Integrating Web Public Access Catalogs

A Web Public Access Catalog (WebPAC) is an important feature of modern libraries. In this paper we propose a meta-search method to provide users with simultaneous access to WebPACs of different libraries. Our method gives a librarian full freedom to select WebPACs to be incorporated in the service but requires no programming effort from the librarian’s side. At the core of our method is a meta-search engine which sends a query to incorporated WebPACs, receives results, and post-processes the query results into a uniform presentation format. To incorporate an existing WebPAC into our system, one needs to analyze the query interaction behavior between the WebPAC and the browser. This can be done by extracting the query parameters from a query and the subsequent query result web pages. We modeled and abstracted these interactions and defined the corresponding XML formats to capture the needed parameters from these web pages. The resulting XML pages will then be fed to the search engine which will automatically incorporate the designated WebPAC as part of its search.

The advantage of our method is that the search engine does not need to be modified when new WebPACs are added. When adding a new WebPAC, the librarian only needs to analyze a few web pages to decide the parameters. Even this step can mostly be done automatically. To illustrate the effectiveness of our method, we have built a system, called MetaCat, that has incorporated the WebPACs of 26 major libraries in Taiwan. MetaCat can be accessed at http://MetaCat.ntu.edu.tw.

This research is supported in part by the National Science Council of the Republic of China under grant numbers NSC-94-2422-H-002-008 and NSC-93-2213-E-002-039.

Hou Ieong Ho, Jieh Hsiang

Information Mining Technologies in Digital Libraries

Annotating Text Segments Using a Web-Based Categorization Approach

Conventional automatic text annotation tools mostly extract named entities from texts and annotate them with information about persons, locations, and dates, etc. Such kind of entity type information, however, is insufficient for machines to understand the context or facts contained in the texts. This paper presents a general text categorization approach to categorize text segments into broader subject categories, such as categorizing a text string into a category of paper title in Mathematics or a category of conference name in Computer Science. Experimental results confirm its wide applicability to various digital library applications.

Hsin-Chen Chiao, Hsiao-Tieh Pu, Lee-Feng Chien
iQA: An Intelligent Question Answering System

Question answering (QA) is the study of methods that return exact answers to natural language questions. This paper attempts to increase the coverage and accuracy of QA systems by narrowing the semantic gap between questions containing abbreviated terms and their potential answers. To achieve this objective, the processing includes (1) identifying terms that might be abbreviations in the user’s natural language question; (2) retrieving documents relevant to each abbreviation; and (3) filtering noun phrases in the returned results that are potential long forms of that abbreviation.

Zhiguo Gong, Mei Pou Chan
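
Step (3) of the abstract above, filtering noun phrases that could be long forms of an abbreviation, can be illustrated with a very simple heuristic. This is only a sketch under an assumption the abstract does not state: a phrase counts as a candidate when the initials of its words spell the abbreviation. The function names are hypothetical, and the paper's actual filter may be quite different.

```python
def is_candidate_long_form(abbreviation, phrase):
    """Return True if the word initials of `phrase` spell `abbreviation`.

    Illustrative heuristic only: case-insensitive, ignores punctuation
    inside the abbreviation (e.g. "Q.A." is treated like "QA").
    """
    letters = [c.lower() for c in abbreviation if c.isalnum()]
    initials = [w[0].lower() for w in phrase.split() if w[0].isalnum()]
    return letters == initials


def filter_long_forms(abbreviation, noun_phrases):
    """Keep only the noun phrases that pass the initials check."""
    return [p for p in noun_phrases if is_candidate_long_form(abbreviation, p)]


candidates = filter_long_forms(
    "OAI",
    ["Open Archives Initiative", "open access", "Online Archive Index"],
)
```

A real system would need a more tolerant match, since an abbreviation such as HTML does not follow word initials exactly.
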
Evaluating the Effectiveness of a Collaborative Querying Environment

Collaborative querying seeks to help users formulate queries by sharing expert knowledge or other users’ search experiences. In previous work, a collaborative query environment (CQE) was developed for a digital library. The system operates by clustering and recommending related queries to users using a hybrid query similarity identification approach. Users can explore the query clusters using a graph-based visualization system known as the Query Graph Visualizer (QGV). The purpose of this paper is to evaluate the CQE with the goal of assessing the usefulness and usability of such a system. Our results show that, compared with traditional information retrieval systems, collaborative querying can lead to faster information seeking when users perform unspecified tasks.

Lin Fu, Dion Hoe-Lian Goh, Schubert Shou-Boon Foo
Opinion Leader Based Filtering

Recommendation systems help users find the information, products, and other people they most want to find; many online stores, such as Amazon and CDNOW, therefore provide recommendation services. Most recommendation systems use collaborative filtering, content-based filtering, and hybrid techniques to predict user preferences. We discuss the strengths and weaknesses of these techniques and present a unique recommendation system that automatically selects opinion leaders by category or genre to improve recommendation performance. Finally, our approach will help to solve the cold-start problem in collaborative filtering.

Hyeonjae Cheon, Hongchul Lee

Digital Library System Architecture and Implementations

Comparison and Analysis of the Citedness Scores in Web of Science and Google Scholar

An increasing number of online information services calculate and report the citedness score of the source documents and provide a link to the group of records of the citing documents. The citedness score depends on the breadth of source coverage, and the ability of the software to identify the cited documents correctly. The citedness score may be a good indicator of the influence of the documents retrieved. Google Scholar gives the most prominence to the citedness score by using it in ranking the search results. Tests have been conducted to compare the individual and aggregate citedness scores of items in the results list of various known-item and subject searches in Web of Science (WoS) and Google Scholar (GS). This paper presents the findings of the comparison and analysis of the individual and aggregate citation scores calculated by WoS and GS for the papers published in 22 volumes of the Asian Pacific Journal of Allergy and Immunology (APJAI). The aggregate citedness score was 1,355 for the 675 papers retrieved by WoS, and 595 for 680 papers found in GS. The findings of the analysis and comparison of tests, and the reasons for the significant limitations of Google Scholar in calculating and reporting the citedness scores are presented.

Peter Jacso
Enhancing Services in a Digital Age – 10 Years of Experience from the Systems Librarians’ Perspective

This paper is an attempt by the authors to share their experiences in equipping a young academic library with the information technologies needed to enhance services in a digital environment. After discussing the advent of digital libraries, the paper explores a progression of projects which make use of advancing technologies, from Web interfaces to XML metadata, and their effectiveness in a non-English (CJK) environment. These digital initiatives have become a core component of the Hong Kong University of Science and Technology (HKUST) Library’s service infrastructure, in addition to enhancing its traditional roles. The past ten years’ accelerating pace of technological change has had a tremendous impact on the provision of library services. Through this paper, the authors have provided one institution’s experiences both in benefiting from and contributing to these changes.

Edward F. Spodick, Ki-Tat Lam
Constructing a Wrapper-Based DRM System for Digital Content Protection in Digital Libraries

Conventional digital libraries utilize access control and digital watermarking techniques to protect their digital content. These methods, however, have drawbacks. First, after passing the identity authentication process, authorized users can easily redistribute the digital assets. Second, it is impractical to expect a digital watermarking scheme to prevent all kinds of attack. Thus, how to enforce property rights after digital content has been released to authorized users is a crucial and challenging issue. In this paper, we propose a wrapper-based approach to digital content protection that integrates digital watermarking, cryptography, information protection technology, and a rights model. In this rights enforcement environment, the behavior of all content players is monitored and digital content can only be accessed after certain usage rules have been satisfied. Furthermore, the proposed architecture can be easily integrated into any digital content player, or even existing DRM systems in digital libraries. With the protection of the proposed DRM system, the abuse of digital content can be drastically reduced.

Jen-Hao Hsiao, Jenq-Haur Wang, Ming-Syan Chen, Chu-Song Chen, Lee-Feng Chien
Digital Preservation Lifecycle Management for Multi-media Collections

Increasingly, intellectual content is “born digital.” In order to make it as easy as possible for content creators to preserve their content for the long-term, preservation processes should be integrated into the content production lifecycle. Our project takes an existing video production workflow and integrates it with a digital preservation life-cycle management process that will enable the digital content to be archived for long-term preservation. The collection, “Conversations with History,” is produced at the University of California, Berkeley, edited by University of California, San Diego–TV (UCSD-TV), and broadcast and Web-cast through UCTV. The proposed system will demonstrate an effective preservation methodology by demonstrating a standard reference model for digital preservation lifecycle management that can be integrated into active production workflows.

Arcot Rajasekar, Reagan Moore, Fran Berman, Brian Schottlaender
DRMS: Massive Digital Resource Management System Based on OSS

We discuss challenging issues and technologies in managing massive digital resources, and review related work. We design a massive digital resource management system (DRMS) based on an open source system (OSS) and Web services, and implement the key components and core services. The virtual collection of DRMS is built for configuring and managing related service components. The DRMS has shown three important characteristics: universality, extensibility, and interoperability.

Chunxiao Xing, Fengrong Gao, Lizhu Zhou

Information Processing in Digital Libraries

Developing Communities and Collections with New Media and Information Literacy

As part of its many functions, the reference library is charged with developing both its collection and its user community. These two functions are sometimes pursued as separate initiatives (with separate funding) by library managers. In Australia, the State Library of Queensland (SLQ) is committed to an exciting policy of simultaneous collection development and community engagement by integrating new media technologies with public programs. SLQ’s Mobile Multimedia Laboratory is a purpose-designed portable digital creativity workshop which is made available to communities as a powerful platform to capture and disseminate local digital culture, and also to promote and train community members in information literacy. The Mobile Multimedia Laboratory facility operates in conjunction with SLQ’s Queensland Stories project, an innovative portal for the display and promotion of community co-created multimedia. Together, the Mobile Multimedia Laboratory and the Queensland Stories initiatives allow SLQ to directly engage with existing and new communities, and also to increase its digital collection with community created content. Not only are both initiatives relatively cost-effective, they have a positive impact upon information literacy within the state.

Jerry Watkins, Angelina Russo
Discovering Patterns from Ontology-Derived Texts

We propose a framework for constructing semantic features for textual documents, in order to tackle the problem of abstracting information in document representation. Semantic patterns are discovered from ontology-derived texts, which provide rich contextual information about the concepts. The patterns represent the syntactic and semantic relationships implied in the textual documents, which can help in extracting and representing the underlying concepts in texts. We also investigate the significance of using the patterns in automatic summarization of biomedical articles.

Ki Chan, Wai Lam
Multimedia Retrieval Using Time Series Representation and Relevance Feedback

Multimedia data is ubiquitous and is involved in almost every aspect of our lives. Likewise, much of the world’s data is in the form of time series, and as will be shown, many other types of data, such as video, image, and handwriting, can be transformed into time series. This fact has fueled enormous interest in time series retrieval in the database and data mining community. However, much of this work’s narrow focus on efficiency and scalability has come at the cost of usability and effectiveness. In this work, we explore the utility of the multimedia data transformation into a much simpler one-dimensional time series representation. With this time series data, we can exploit the capability of Dynamic Time Warping, which results in a more accurate retrieval. We can also use a general framework that learns a distance measure with arbitrary constraints on the warping path of the Dynamic Time Warping calculation for both classification and query retrieval tasks. In addition, incorporating a relevance feedback system and query refinement into the retrieval task can further improve the precision/recall to a great extent.

Chotirat Ann Ratanamahatana, Eamonn Keogh
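
For readers unfamiliar with Dynamic Time Warping, the following minimal sketch shows the classic dynamic-programming formulation with a fixed-width constraint band (a Sakoe-Chiba band) on the warping path. It is a stand-in for illustration only: the abstract above describes learning arbitrary constraints on the warping path, which this sketch does not do, and the function name `dtw_distance` and its `window` parameter are hypothetical.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between two numeric sequences.

    `window` limits how far the warping path may stray from the diagonal
    (a fixed band, assumed here for illustration). None means unconstrained.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least as wide as the length difference,
    # otherwise no warping path can reach the final cell.
    window = max(window, abs(n - m))

    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])
```

With `window=0` on equal-length sequences the path is forced onto the diagonal and the result reduces to the Euclidean distance; wider bands allow more stretching along the time axis at higher computational cost.
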
WebArc: Website Archival Using a Structured Approach

Website archival refers to the task of monitoring and storing snapshots of websites for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WebArc as a set of software tools that allow users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to browse one or more versions of archived web pages.

Ee-Peng Lim, Maria Marissa

Human-Computer Interfaces

Interactive Causal Schematics for Qualitative Scientific Explanations

We present a simple model for describing causal processes. We apply it to generate schematics of complex scientific processes. Our interface allows users to select among causal threads and then to follow the state transitions of those explanations. Moreover, these schematics can provide a framework for interacting with texts.

Robert B. Allen, Yejun Wu, Jun Luo
Automatic Conversion System for Mobile Cartoon Contents

As the production of mobile contents increases and many people use them, existing mobile contents providers manually split cartoons into frame images fitted to the screens of mobile devices, which takes much time and is very expensive. This paper proposes an Automatic Conversion System (ACS) for mobile cartoon contents. It automatically converts existing cartoon contents into mobile cartoon contents using image processing technology as follows: 1) A scanned cartoon image is segmented into frames by structure layout analysis. 2) The frames are split at regions that do not include the semantic structure of the original image. 3) Texts are extracted from the split frames and placed at the bottom of the screen. Our experiment shows that the proposed ACS is more efficient than the existing methods in providing mobile cartoon contents.

Eunjung Han, Sungkuk Chun, Anjin Park, Keechul Jung
Subjective Relevance: Implications on Interface Design for Information Retrieval Systems

Information retrieval (IR) systems are traditionally developed using the objective relevance approach based on the “best match” principle assuming that users can specify their needs in queries and that the documents retrieved are relevant to them. This paper advocates a subjective relevance (SR) approach to value-add objective relevance and address its limitations by considering relevance in terms of users’ needs and contexts. A pilot study was conducted to elicit features on SR from experts and novices. Elicited features were then analyzed using characteristics of SR types and stages in information seeking to inform the design of an IR interface supporting SR. The paper presents initial work towards the design and development of user-centered IR systems that prompt features supporting the four main types of SR.

Shu-Shing Lee, Yin-Leng Theng, Dion Hoe-Lian Goh, Schubert Shou-Boon Foo

Metadata Issues in Digital Libraries

Scalability of Databases for Digital Libraries

Search engines of mainstream literature digital libraries such as ACM Digital Library, Google Scholar, and PubMed employ file-based systems and provide users with basic Boolean keyword search functionality. As a result, new and powerful querying capabilities are not easy to implement on top of such systems, and are not provided. In comparison, query languages of database systems traditionally have high expressive power. This paper evaluates the scalability of deploying relational databases as backend systems for digital libraries, thus making use of the query languages and query processing capabilities of database query engines for literature digital libraries.

To evaluate our approach, we built a scalable prototype digital library on top of a relational database management system, together with an advanced query interface that allows users to specify dynamic text and path queries in an intuitive, hierarchical manner. This paper evaluates the scalability of two search query processing approaches, namely ad hoc queries and precompiled queries (stored procedures). We demonstrate that, with reasonably priced hardware, we are able to build an RDBMS-based digital library search engine that can scale to handle millions of queries per day.

John Chmura, Nattakarn Ratprasartporn, Gultekin Ozsoyoglu
GMA-PSMH: A Semantic Metadata Publish-Harvest Protocol for Dynamic Metadata Management Under Grid Environment

The imperative demand for the description of semantic metadata and the processing of real-time data presents a unique challenge to the Grid Metadata Service. Grid Monitoring Architecture (GMA), which is a framework for dynamic data management, is limited by its conventional relational database interface and therefore fails to address the problem of interoperability. Faced with the problem of metadata publishing in GMA, we present a new publish-harvest protocol for semantic metadata called GMA-PSMH (Grid Monitoring Architecture-Protocol for Semantic Metadata Harvesting) by modifying the OAI-PMH metadata harvest framework. As part of the Semantic Metadata Service Project at Peking University, the associated dynamic metadata management framework is then implemented according to this protocol. Finally, we draw conclusions and outline future work.

Yaping Zhu, Ming Zhang, Kewei Wei, Dongqing Yang
Choosing Appropriate Peer-to-Peer Infrastructure for Your Digital Libraries

A Peer-to-Peer (P2P) overlay network aims to be a feasible platform for building federated but autonomous digital libraries. However, due to the plethora of P2P infrastructures and corresponding functionalities, it is often not easy to choose appropriate candidates for specific applications. This paper addresses this issue by comparing some typical P2P systems widely used in the digital library and database communities and opening a discussion on how to determine proper infrastructures according to specific system requirements.

Hao Ding, Ingeborg Sølvberg

Posters

XML Document Retrieval for Digital Museum

In this paper, we design an XML document retrieval system used for a digital museum. The system can retrieve XML documents based on both document structure and content. In order to support retrieval based on document structure, we perform the indexing of XML documents based on their basic unit of elements. For supporting retrieval based on content, we design a high-dimensional index structure using the CBF [1] method. Finally, we provide a similarity measure for retrieval on a composite query, based on both document structure and content.

Jae-Woo Chang, Yeon-Jung Kim
Design and Implementation of the IMS-IPMP System in Convergence Home-Network Environment

The traditional contents distribution architecture followed a fixed pattern in which the distribution subjects, namely the Contents Provider (CP), Contents Distributor (CD), and Contents Consumer (CC), create, distribute, and consume the digital contents, respectively. In a Convergence Home-network Environment (CHE), however, each subject needs a distribution system that can be changed flexibly.

In this paper, we design and implement the Interactive Multimedia Service-Intellectual Property Management and Protection (IMS-IPMP) system by supplementing and extending MPEG-21 DID, IPMP, REL, and the mechanism of OMA-DRM. In the proposed system, a CP can complete the appropriate authentication in the CHE, generate contents, and then safely register the packaged contents for distribution. In addition, the IMS-IPMP system allows users to simultaneously consume contents that users in different home network environments have generated and registered.

Jong-Hyuk Park, Sang-Jin Lee, Yeog Kim, Byoung-Soo Koh
Geotechnical Use of WebGIS in Digital Library Projects

This paper briefly introduces GIS, how the technology can be applied, and discusses the benefits of its use in design. To illustrate GIS applications, an actual project utilizing GIS is presented.

Bo Yu, Huijuan Zheng, Meng Zhan
Establishment of “The Multiple Thesaurus of Chinese Classification” and Using Its Application to Improve the Retrieval Efficiency in the Chinese Bibliographic Databases

Due to the limitations of the Chinese character set in the Windows operating system, homophones, pictographic characters, or pinyin are often used to replace complex traditional Chinese characters or any characters that are not included in the operating system’s character set. As an investigation shows [1], the subject field in most bibliographic databases is incomplete, and various classification schemes coexist in the same bibliographic database, which is very common in merged universities and regional union catalogues. These problems have greatly affected the efficiency and sharing of digital services. To improve retrieval efficiency in bibliographic databases under these conditions, we can design a “multiple thesaurus”, i.e. a “Database of Classification Reference” that includes two classifications, and design a new retrieval method for the computerized management of libraries.

Meng Zhan
METS Cataloging Tool for Heterogeneous Collections

This paper describes the implementation of DRMSCata, an XML Schema-driven, web-based cataloging subsystem that helps to produce various METS-encoded documents.

Li Dong, Chunxiao Xing, Kehong Wang, Shixin Peng, Bei Zhang, Airong Jiang
A Method for Creating a High Quality Collection of Researchers’ Homepages from the Web

In the web space, information of an entity is often presented by a set of pages that constitutes a logical page group and its proper handling is required. This paper proposes a method for collecting researchers’ homepages (or entry pages) by applying new simple and effective page group models for combining page group structure and page content, aiming at narrowing down the candidates for further precise and heavy processes. We mainly focus on high recall but less on precision.

Yuxin Wang, Keizo Oyama
Personalized Information Service Based on Social Bookmarking

Social bookmarking is emerging as a new information infrastructure on the web, and has the ability of organizing multicultural metadata for large scale of digital entities. To achieve its potential, we propose a technique to extract the substantial correlation among tags. Based on it, we maintain profiles for users’ interests, and recommend items according to them. Experiments against data from del.icio.us reveal the superiority of our method.

Yanfei Xu, Liang Zhang
Multi-indexing System for News Stories Based on XML Documents

Indexing is one of the most important key factors in efficient XML information retrieval. Inappropriate indexing may result in improper search results. This paper presents a multi-indexing system that considers not only structure information of documents but also characteristics of pertinent elements in XML documents. The system extracts semantic elements from XML document corpus and identifies characteristics of the elements. By using the characteristics, document structures are classified and a particular indexing method is assigned to each classified structure. Efficiency of our system is confirmed through XML dataset from news stories with relatively accurate formats. The results indicate that the system has significantly high precision in search by element.

Youngrok Song, Kyonam Choo, Yoseop Woo, Hongki Min, Wonung Lee
Two-Stage Access Control Model for XML Security

As large corporations and organizations increasingly exploit the Internet as a means of improving business-transaction efficiency and productivity, it is increasingly common to find operational data and other business information in XML format. Access control for XML databases is a non-trivial subject. A number of recent research efforts have considered access control models for XML data [1–5]. Our first contribution is a novel model for specifying XML access control. Given an XML document accompanied by a document DTD, we define two-stage access control policies that secure access to the XML document at the file level and the element level, respectively. At the element level, our approach is based on the novel notion of hide-node views. While the hide-node view DTD is exposed to authorized users, neither the internal XPath annotations nor the full document DTD is visible. Authorized users can only operate on data through the hide-node view, making use of the exposed view DTD to access data. Our hide-node view mechanism guarantees that unauthorized users cannot access sensitive data and protects the schema information from access by unauthorized users. We consider the schema information to be sensitive data as well, which should be protected from being inferred through data access.

Wei Sun, Da-xin Liu, Tong Wang
Optimal Face Classification by Using Nonsingular Discriminant Waveletfaces for a Face Recognition

This paper proposes a face classification algorithm for face recognition that uses a 2D wavelet subband transform and nonsingular Fisher discriminant analysis. For feature extraction, we apply the multiresolution wavelet transform to extract waveletfaces. We also perform linear discriminant analysis on the waveletfaces to reinforce their discriminant power. During classification, the nonsingular Fisher discriminant waveletfaces are used. In this study, we found that NDW (Nonsingular Discriminant Waveletface) solves the small sample size problem. Thus, NDW is superior to LDA for efficient face classification.

Jin Ok Kim, Kwang Hoon Chung, Chin Hyun Chung
Evaluating Score and Publication Similarity Functions in Digital Libraries

Digital libraries do not assign importance/relevance scores to their publications, authors, or publication venues, even though scores are potentially useful for (a) providing comparative assessment, or “importances”, of publications, authors, publication venues, (b) ranking publications returned in search outputs, and (c) using scores in locating similar publications. Using social networks and bibliometrics, one can define several score functions.

S. Bani-Ahmad, A. Cakmak, A. Al-Hamdani, Gultekin Ozsoyoglu
Facilitating Resource Utilization in Union Catalog Systems

Numerous papers have addressed building Union Catalog (UC) technologies, such as metadata schema, data exchange (e.g., OAI-PMH, Z39.50), and resource classification. In contrast to these approaches, in this article we focus on UC technologies that leverage resource utilization across digital resources. We use the repositories of Taiwan’s National Digital Archives Program (NDAP) as the project content to identify major resource utilization issues. Our solution methodologies include resource unification, information query, information navigation, and unencoded character handling.

Yung-Teng Tsai, I-Chia Chang
Research on Grid-Aware Mechanisms and Issues for CADAL Project

CADAL (China America Digital Academic Library) is a cooperative project of universities and institutes in China and America. To address the problem that multi-discipline digital libraries are dispersed and isolated without sufficient interconnection, grid technology is introduced for its advantages in large-area resource cooperation and sharing. This paper analyzes how design principles and mechanisms, applied to digital libraries, can be nested within the OGSA paradigm, and further improves multimedia retrieval.

Hong Zhang, Yueting Zhuang, Jiangqin Wu, Fei Wu
Data Cleansing and Preparation for Moving Toward Electronic Library Repository

Manually annotated metadata usually contains typing errors; however, correcting the metadata manually can be costly and time consuming. This paper proposes a framework to ease the metadata correction process: a system that uses OCR and NLP techniques to automatically extract metadata from document images. The system first converts images into text using OCR and then extracts metadata from the OCR results. The extracted metadata are then compared with the data in the existing repository to locate erroneous entries. These entries are displayed to users, who correct them using supporting information. Although a human decision is still required to correct each error, this step is needed only for the erroneous entries. Experimental results on 3,712 thesis abstracts show that the proposed solution can automatically extract the relevant information with 91.41% accuracy.

Asanee Kawtrakul
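The comparison step described in the abstract above (extracted metadata checked against the repository to locate erroneous entries) might look roughly like the following sketch. It assumes both sides are available as dictionaries keyed by a record identifier and that a field is suspect whenever the two values differ after simple normalization; the names `normalize` and `find_error_entries` and the sample records are hypothetical, and the OCR and NLP extraction stages are not modeled.

```python
def normalize(value):
    """Case-fold and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.casefold().split())


def find_error_entries(extracted, repository):
    """List (record id, field, repository value, extracted value) mismatches.

    Assumption: any difference after normalization marks the entry for review.
    """
    suspects = []
    for record_id, repo_fields in repository.items():
        ocr_fields = extracted.get(record_id, {})
        for field, repo_value in repo_fields.items():
            ocr_value = ocr_fields.get(field)
            if ocr_value is not None and normalize(ocr_value) != normalize(repo_value):
                suspects.append((record_id, field, repo_value, ocr_value))
    return suspects


# Hypothetical data: the typed title contains a typo, the year matches.
repository = {"t1": {"title": "Digtal Library Survey", "year": "2004"}}
extracted = {"t1": {"title": "Digital  Library survey", "year": "2004"}}
suspects = find_error_entries(extracted, repository)
```

Only the entries in `suspects` would be shown to a human reviewer, which is the point of the framework: manual effort is spent on flagged entries rather than on the whole repository.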
A Peer-to-Peer Architecture for Web Annotation Sharing

In this paper, we present a peer-to-peer (P2P) architecture design called PWAS to share personal Web annotations with a hybrid hierarchical P2P approach. PWAS maintains a user-centric annotation environment in which personal annotations can be flexibly shared with a reduced number of query messages.

Cheng-Zen Yang, Shen-Chi Chen, Ing-Xiang Chen

Keynote and Invited Papers

A Common Grammar for Diverse Vocabularies: The Abstract Model for Dublin Core

In its tenth year, the Dublin Core Metadata Initiative has published the “DCMI Abstract Model” (DCAM) as a syntax-independent basis for interoperability of metadata across a diversity of technologies and implementation platforms. Developed since 1997 in parallel with related W3C standards, the DCAM provides a praxis-oriented model for describing resources and for carrying descriptions of multiple resources — i.e., a Dissertation, its Author, and the author’s Institution — in exchangeable records. The model associates properties with resources in a way designed to facilitate the creation of mappings and the merging of metadata from a diversity of sources into cross-domain portals and repositories. By design, the model is also compatible with more complex and expressive ontology languages.

Underlying the DCMI approach are several practical insights: The first is that in our complex, multi-lingual world, it is realistic to limit expectations for shared understanding (“semantic interoperability”) to a pidgin-like core of generic concepts. The second is that metadata based on complex, hierarchical schemas is difficult to re-use outside a specific application context unless it was pre-designed to be mapped to simpler models.

This approach to interoperability — a focus on core semantics on the basis of a modular, generic model — is of more general use than for describing resources with the well-known, fifteen-element “Dublin Core.” The approach is also reflected in standards such as “SKOS Core,” an RDF vocabulary for translating existing thesauri (and other Simple Knowledge Organization Systems) into a form usable for intelligent processing. Sharing a model allows implementers to draw on a diversity of vocabularies — DC, SKOS, and vocabularies more specialized or application-specific — as needed, in creating “application profiles” that reflect requirements and content-level agreements (“cataloging rules”) within particular implementation communities. Sharing a common model also allows different communities to maintain vocabularies which themselves remain small and manageable, yet when combined in application profiles may be highly expressive.

Having achieved a stable model, DCMI is shifting the emphasis of its activities to reviewing real-world profiles which can be used as good-practice examples by designers of new applications.

Thomas Baker
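As a loose illustration of the abstract above, the sketch below shows one record carrying descriptions of several related resources, and the merging of records from different sources. It is not the normative DCAM structure: the `Description` class, the `merge` function, all `example.org` identifiers, and the affiliation property are assumptions made for illustration. Only the Dublin Core terms and FOAF namespace URIs are real.

```python
from dataclasses import dataclass, field

DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/terms/"  # hypothetical application-specific vocabulary


@dataclass
class Description:
    resource: str  # identifier of the one resource being described
    statements: list = field(default_factory=list)  # (property URI, value) pairs

    def add(self, prop, value):
        self.statements.append((prop, value))
        return self  # allow chained calls


def merge(*records):
    """Combine records by collecting all statements made about each resource."""
    merged = {}
    for record in records:
        for description in record:
            merged.setdefault(description.resource, []).extend(description.statements)
    return merged


# One record describing a dissertation, its author, and the author's institution.
record = [
    Description("http://example.org/thesis/1")
    .add(DCTERMS + "title", "A Hypothetical Dissertation")
    .add(DCTERMS + "creator", "http://example.org/person/7"),
    Description("http://example.org/person/7")
    .add(FOAF + "name", "Example Author")
    .add(EX + "affiliation", "http://example.org/org/3"),
    Description("http://example.org/org/3").add(FOAF + "name", "Example University"),
]

# A second source contributes one more statement about the same dissertation.
other = [Description("http://example.org/thesis/1").add(DCTERMS + "date", "2005")]
merged = merge(record, other)
```

Because every statement is a property paired with a value about an identified resource, statements from different vocabularies and sources can be pooled per resource without a shared hierarchical schema.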
Thai’s Invaluable Memory Celebrated Via Global Memory Net

In this digital era, the mode of universal access for information seeking and knowledge acquisition differs greatly from the traditional ways. We have witnessed the exciting convergence of content, technology, and global collaboration in the development of digital libraries, which has offered us unbounded opportunities for dynamic information access and delivery. This author has experienced many of the transformations from analog to digital in the last two decades through her own R&D activities, from the creation of interactive videodiscs and multimedia CDs on the First Emperor of China’s terracotta warriors and horses in the 1980s and 1990s to leading a current international digital library project, Global Memory Net (GMNet), supported by the US National Science Foundation. In presenting her vision for linking world digital resources together for universal access, she will share with the audience the latest development of Global Memory Net in honor of H.R.H. Princess Maha Chakri Sirindhorn on the special occasion of her 50th Birthday. The invaluable images of H.R.H. Princess and the Royal Family, as well as some of the most attractive sites of Thailand, such as the Grand Palace, are included in GMNet as a part of the rare Thai Memory.

Ching-chih Chen
Digital Library Development in the Asia Pacific

Over the past decade the development of digital library activities within Asia Pacific has been steadily increasing. Through a meta-analysis of the publications and content within International Conference on Asian Digital Libraries (ICADL) and other major regional digital library conferences over the past few years, we see an increase in the level of activity in Asian digital library research. This reflects high continuous interest among digital library researchers and practitioners internationally. Digital library research in the Asia Pacific is uniquely positioned to help develop digital libraries of significant cultural heritage and indigenous knowledge and advance cross-cultural and cross-lingual digital library research.

Hsinchun Chen
From the WWW and Minimal Digital Libraries, to Powerful Digital Libraries: Why and How

Digital libraries have emerged since the early 1990s, distinguished in part by their emphasis on useful content, helpful organization, and a range of services that include at least indexing, searching, and browsing. In the 5S (Streams, Structures, Spaces, Scenarios, and Societies) formal model for digital libraries we precisely define key concepts and terms, so the field can move beyond the stage of continually explaining basic ideas and debating definitions. Thus, we define a minimal digital library in terms of clear definitions for repository, metadata catalog, services, and societies, which in turn build upon characterizations of digital object, collection, hypertext, etc.

Edward A. Fox
Metadata Models Toward Community-Oriented Metadata Schemas

Metadata has been widely recognized as an important issue in digital libraries in many aspects. This report briefly describes models and frameworks of metadata schemas developed through metadata-centric research projects at University of Tsukuba, which are a few subject gateways and a few metadata schema projects primarily based on Dublin Core and Web technologies.

Shigeo Sugimoto
Backmatter
Metadata
Title
Digital Libraries: Implementing Strategies and Sharing Experiences
Edited by
Edward A. Fox
Erich J. Neuhold
Pimrumpai Premsmit
Vilas Wuwongse
Copyright Year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-32291-7
Print ISBN
978-3-540-30850-8
DOI
https://doi.org/10.1007/11599517
