
2011 | Book

Digital Libraries: For Cultural Heritage, Knowledge Dissemination, and Future Creation

13th International Conference on Asia-Pacific Digital Libraries, ICADL 2011, Beijing, China, October 24-27, 2011. Proceedings

Edited by: Chunxiao Xing, Fabio Crestani, Andreas Rauber

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 13th International Conference on Asia-Pacific Digital Libraries, ICADL 2011, held in Beijing, China, in October 2011. The 33 revised full papers, 8 short papers and 9 poster papers presented were carefully reviewed and selected from 136 submissions. The topics covered are digital archives and preservation; information mining and extraction; metadata and catalogue; distributed repositories and cloud computing; social network and personalized service; mobile services and electronic publishing; multimedia digital libraries; information retrieval; and tools and systems for digital libraries.

Table of contents

Frontmatter

Keynotes

Drowning in the Data Deluge: Digital Library Challenges for Asia

Scholarly communication no longer consists merely of papers and publications. Research data have become valuable objects to be captured, documented, and shared. Funding agencies are requiring “data management plans” for all new proposals. Libraries, universities, and research institutes are assessing how to manage those data in ways that can be leveraged for future value. But what are “data”? We are drowning in them without being able to define what they are. This talk will explore the shifting landscape of scholarly information, with special attention to how these shifts may influence digital libraries in Asia. Research is disseminated by many formal and informal means, not only by libraries and publishers but also by new media such as preprint repositories and tweets. Access may be faster – if one can separate signal from noise amidst the plethora of communication channels. These changes are the result of the transition from a closed scholarly world to the open Web, the shift in content and context of networked information, the shift in focus from information services for readers to those for authors, and differences between publications and data. If future scholars are to use the scholarly content of yesterday, today, and tomorrow, the digital library community must reclaim information retrieval, rethink partnerships throughout the information life cycle, share responsibility for the information infrastructure, and address policy and incentive issues.

Christine L. Borgman
Building a Social Media Digital Library: Collection, Management, and Analytics

In this talk I will present the University of Arizona Artificial Intelligence Lab’s recent research in Dark Web, Geopolitical Web, and Business Analytics. Based on funding from the NSF and several other US agencies, the AI Lab has developed techniques for collecting, managing and analyzing large-scale multilingual and multimedia social media contents of relevance to social, geopolitical, and business applications. Our projects aim to study and understand critical social and business phenomena in the cyber world and real world via a computational, data-centric approach. We aim to collect critical social media content generated by various political and business groups, including web sites, forums, chat rooms, blogs, social networking sites, videos, virtual worlds, etc. A social media digital library and portal system has been developed to manage and access these critical multilingual and multimedia contents. We have also developed advanced multilingual data mining, text mining, and web mining techniques to perform link analysis, content analysis, web metrics (technical sophistication) analysis, sentiment analysis, authorship analysis, and video analysis in our research. Selected case studies in geopolitical domains and business intelligence applications will be discussed.

Hsinchun Chen
Developing MetaKnowledge Services: The Next Paradigm for Digital Libraries

Science is marching toward a new paradigm of data-intensive knowledge discovery enabled by massive availability of digital data at a time of grand challenges of global scale, interdisciplinary nature, and translational complexity. This combination of events gives rise to great opportunities for meta-knowledge services where the relations, patterns, emerging trends, hidden possibilities, ignored abnormalities, etc., can be revealed and tested.

Several approaches to meta-knowledge services are available today or will be in the near future. Intelligent monitoring and visualization of research fields and emerging topics help researchers keep track of developments; literature and patent analysis reveals complicated patterns of research and its competition or cooperation; output, impact, and portfolio analysis supports official evaluation of research organizations, groups, and individuals; path exploration and road-mapping are used interactively to build and test research plans; and meta-reading of large amounts of data provides students with effective ways to structure knowledge and identify key points.

National Science Library, CAS, as its innovation and future-enabling strategy, has been developing a meta-knowledge-service-centric service structure. On one hand, it arms its analyst teams with sophisticated computational tools of R&D tracking, trends detecting, technology analysis, competition/cooperation analysis, R&D mapping, etc. On the other hand, it re-structures its digital information services into a linked open data based and ontological systems driven discovery platform. These meta-knowledge services require a much different approach from current digital libraries, with the emphasis on the discovery and decision-making utilization of content. A meta-knowledge-driven service cannot be achieved as a simple extension of current digital libraries. Paradigmatic shifts are needed to go beyond the traditional search and retrieval model.

Xiaolin Zhang
Mobile Information Management and Retrieval

The number of “smart” mobile devices such as wireless phones and tablet computers has been growing rapidly. These mobile devices are equipped with a variety of sensors such as camera, gyroscope, accelerometer, compass, NFC, WiFi, GPS, etc. These sensors can be used to capture images and voice, detect motion patterns, and predict locations, to name just a few. This keynote depicts techniques in configuration, calibration, computation, and fusion for improving sensor performance and reducing power consumption. Novel information management and retrieval applications that can benefit a great deal from enhanced sensor technologies are also presented.

Furthermore, the Mobile 2014 research program coordinated by Google Research in China has been funding research projects related to mobile location-based services since 2010. This program has granted several research awards to universities in the US and Asia to conduct work in sensor signal fusion, location-based data service, peer-to-peer protocols, privacy-preserved data mining, and applications assisted by inertial navigation systems. Highlights of this program are enumerated to motivate research into advancing mobile information management and retrieval.

Edward Y. Chang

Digital Archives and Preservation

High Speed Capture, Retrieval and Rendering of Segment-Based Annotations on 3D Museum Objects

The aim of the 3D Semantic Annotation (3DSA) system is to deliver a Web-based semantic tagging and annotation service for 3D cultural heritage objects - that enables users to attach semantic tags/annotations to points, surface regions and volumetric segments on 3D digital objects. Specific objectives of the 3DSA system are: support for interactively defined, complex 3D segments; interoperability of the resulting tags/annotations; and fast, efficient capture, retrieval and rendering of annotations on complex 3D fragments. With these objectives in mind, the 3DSA system is based on the Open Annotations Collaboration (OAC) model, which has been extended using X3D fragment identifiers. This paper describes our implementation of the X3D extensions to the OAC data model and demonstrates how this approach significantly improves the speed of capturing, retrieving, downloading and rendering annotations on volumetric segments. The context for this work is the capture of community-generated tags and annotations for cultural heritage artifacts from the University of Queensland Antiquities Museum.

Chih-Hao Yu, Tudor Groza, Jane Hunter
Digitization and Value-Add Application of Bamboo Weaving Artifacts

Chinese people once used woven bamboo utensils for hunting, farming, fishing, and even transportation. Such utensils are no longer needed in daily life, and the craft is gradually losing people’s attention, so that few craftsmen still practise it. In this study, a folklore hobbyist, a craftsman, a horticulturist, and an interior decorator were invited to digitize bamboo weaving artifacts and crafts, as well as to develop value-add applications of the artifacts. Among the 1200 collected bamboo weaving artifacts, 150 artifacts and 20 weaving patterns have been digitized and stored in image and video formats, respectively. The value-add refers to the adoption of the bamboo weaving artifacts as flower vases for orchid planting and flower arranging, with artworks designed by the horticulturist, which were then used by the interior decorator to decorate restaurants and improve their environmental quality. The digitized contents were also used as part of the e-learning materials in a community college. The questionnaire surveys show that students find the digitized material useful for learning bamboo weaving craft and flower arrangement skills. The combination of bamboo weaving artifacts and flower arrangement was also found to considerably improve the quality of service of restaurants.

Kuo-An Wang, Ya-Chin Liao, Wei-Wei Chu, John Yi-Wu Chiang, Yung-Fu Chen, Po-Chou Chan
Digital Archive “Dao Fa Hui Yuan” for Daoism Research

Dao Fa Hui Yuan is an important resource for the study of Daoism. It is a compilation of a number of Fus used in Daoism. A Fu is expressed as a complex graphical symbol composed of one or more constituent parts. This paper presents the Dao Fa Hui Yuan digital archive named Digital DFHY which is designed to help researchers explore relationships among sacred symbols of yet unknown meaning and associated natural language text. Some of the descriptions of Fus in DFHY include only the name and the graphic expression of a Fu, and others include descriptions of the constituent parts of a Fu in addition to its name and graphic expression. This paper describes the Digital DFHY, and how it can be used with a network analysis tool to identify relationships among the Fus and their constituent parts.

XiaoXiao Feng, Koichi Matsumoto, Shigeo Sugimoto
Libraries in a Digital Frontier: Preserving Chinese Canadian Cultural Heritage

Chinese Canadian Stories: Uncommon Histories from a Common Past is a three-year community-based research project at the University of British Columbia, funded by a government grant from the Community Historical Recognition Program (CHRP). It brings together the expertise and resources of a wide range of UBC Library units and off-campus partners: from the digitization of archival material of UBC Library’s Rare Books & Special Collections; to the digital storage infrastructure of UBC’s Digital Initiatives; to the community outreach and digital technology of the Irving K. Barber Learning Centre; to the Chinese language online resources and community historical preservation expertise of the Asian Library. Through partnerships with community and civic institutions nationwide, this UBC Library-led project focuses on three initiatives: a one-stop web portal, a series of community workshops, and a digital interactive cultural game using cutting-edge technologies. This paper is a progress report of the project.

Allan Cho, Yu Li
Automated Preservation: The Case of Digital Raw Photographs

In digital preservation, a common approach for preservation actions is the migration to standardized formats. Full validation of the results of such conversion processes is required to ensure authenticity and trust. This process of quality assurance is a key obstacle to achieving scalability for large volumes of content. In this article, we address the quality assurance process for the preservation of born-digital photographs and validate conversions of raw image formats into standard formats such as Adobe Digital Negative. To achieve this, we rely on a systematic planning framework. We classify requirements that have to be evaluated according to their measurement needs. We extend an existing measurement framework using a combination of tools, image similarity algorithms, and purpose-built plugins. By combining metadata extraction, image rendering and comparison, and perceptual-level quality assurance, we evaluate the feasibility of automating the core part of quality assurance that is often the most costly part of preservation processes.

Stephan Bauer, Christoph Becker

Information Mining/Extraction

Image Tagging by Exploiting Feature Correlation

Image tagging is a task that automatically assigns semantic keywords, called tags, to a query image. Since tags and image visual content are represented in different feature spaces, how to merge the multiple features by their correlation to tag the query image is an important problem. However, most existing approaches merge the features by using a relatively simple mechanism rather than fully exploiting the correlations between different features. In this paper, we propose a new approach to fusing different features and their correlation simultaneously for image tagging. Specifically, we employ a Feature Correlation Graph to capture the correlations between different features in an integrated manner, which takes features as nodes and their correlations as edges. Then, a revised probabilistic model based on Markov Random Field is used to describe the graph for evaluating a tag’s relevance to the query image. Experiments on large real-life corpora collected from Flickr indicate the superiority of our proposed approach.

Xiaoming Zhang, Zhoujun Li
Semi-supervised Bibliographic Element Segmentation with Latent Permutations

This paper proposes a semi-supervised method for bibliographic element segmentation. Our input data is a large-scale set of bibliographic references, each given as an unsegmented sequence of word tokens. Our problem is to segment each reference into bibliographic elements, e.g. authors, title, journal, pages, etc. We solve this problem with an LDA-like topic model by assigning each word token to a topic so that the word tokens assigned to the same topic refer to the same bibliographic element. Topic assignments should satisfy the contiguity constraint, i.e., the constraint that the word tokens assigned to the same topic should be contiguous. Therefore, we proposed a topic model in our preceding work [8] based on the topic model devised by Chen et al. [3]. Our model extends LDA and realizes unsupervised topic assignments satisfying the contiguity constraint. The main contribution of this paper is the proposal of a semi-supervised learning method for our proposed model. We assume that at most one third of word tokens are already labeled. In addition, we assume that a few percent of the labels may be incorrect. The experiment showed that our semi-supervised learning improved on the unsupervised learning by a large margin and achieved a segmentation accuracy of over 90%.

Tomonari Masada, Atsuhiro Takasu, Yuichiro Shibata, Kiyoshi Oguri
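
The contiguity constraint described in the abstract above can be illustrated with a small check. This is only a sketch of the constraint itself, not the authors' topic model; the function name `is_contiguous` and the example labels are hypothetical.

```python
def is_contiguous(labels):
    """Return True if every label occupies one unbroken run of tokens.

    `labels` holds one element label per word token, in reference order.
    """
    finished = set()  # labels whose run has already ended
    previous = None
    for label in labels:
        if label != previous:
            if label in finished:
                # The label reappears after a different label: not contiguous.
                return False
            if previous is not None:
                finished.add(previous)
            previous = label
    return True


# Hypothetical labelling of a five-token reference.
print(is_contiguous(["author", "author", "title", "title", "journal"]))
print(is_contiguous(["author", "title", "author"]))
```

A labelling that passes this check splits the reference into one block per bibliographic element, which is the property the abstract requires of topic assignments.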
A Discretization Algorithm of Numerical Attributes for Digital Library Evaluation Based on Data Mining Technology

We present here a discretization algorithm for numerical attributes of digital collections. In our research, data mining technology is introduced into digital library evaluation to provide better decision-making support. However, data prediction algorithms do not work well with traditional discretization methods during the data mining process. The reason is that numerical attributes of digital collections are complicated and not on the same scale of distribution distance. We study the characteristics of numerical attributes and put forward a discretization method based on the Z-score idea of mathematical statistics. This algorithm can reflect the dynamic semantic distance for different numerical attributes and significantly enhance the precision rate and recall rate of data prediction algorithms. Furthermore, a ‘nonlinear conditional relationship’ among attributes of digital collections is discovered during the study of the discretization algorithm, and it affects the actual application results of traditional data mining algorithms.

Yumin Zhao, Zhendong Niu, Xueping Peng, Lin Dai
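
The abstract above does not give the algorithm's details, so the following is only a generic sketch of Z-score-based discretization. Each value is standardized against its own attribute's mean and standard deviation and then binned by fixed cut points. The function name and the cut points (-1, 1) are assumptions chosen for illustration, not taken from the paper.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def zscore_discretize(values, cut_points=(-1.0, 1.0)):
    """Map each numeric value to a bin index based on its Z-score.

    cut_points must be sorted; bin i holds Z-scores between
    cut_points[i - 1] (inclusive) and cut_points[i] (exclusive).
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    bins = []
    for v in values:
        # A constant attribute has sigma == 0; treat every value as average.
        z = 0.0 if sigma == 0 else (v - mu) / sigma
        bins.append(bisect_right(cut_points, z))
    return bins


# Hypothetical attribute with one outlier, e.g. download counts.
print(zscore_discretize([1, 2, 3, 4, 100]))
```

Because the cut points are expressed in standard deviations, the same cut points can be reused for attributes measured on very different scales.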
Sentence-Level Sentiment Polarity Classification Using a Linguistic Approach

Recent sentiment analysis research has focused on the functional relations of words using typed dependency parsing, as this provides a refined analysis of the grammar and semantics of the textual data, which could improve performance. However, typed dependencies only provide the grammatical relationships between individual words, while there exist more complex relationships between words that could influence a sentence’s sentiment polarity. In this paper, we propose a linguistic approach, called Polarity Prediction Model (PPM), that combines typed dependencies and subjective phrase analysis to detect sentence-level sentiment polarity. Our approach also considers the intensity of words and domain terms that could influence the sentiment polarity output. PPM is shown to provide a fine-grained analysis for handling and explaining the complex relationships between words in detecting a sentence’s sentiment polarity. PPM was found to consistently outperform a baseline model by 5% in terms of overall F1-score, and by more than 10% in terms of positive F1-score when compared to a typed-dependency-only approach.

Luke Kien-Weng Tan, Jin-Cheon Na, Yin-Leng Theng, Kuiyu Chang
A System for Using National Bibliographies in Rights Information Infrastructures

In the process of digitising a book, a library needs to clear the rights associated with it. Rights clearance is largely a manual, time-consuming process which possibly costs more than the actual digitisation. To analyse the rights situation, a range of information is required, which is distributed across several national databases hosted in national libraries, publishers and collective rights organisations. National bibliographies are a key data source in these processes, as they are the only source to identify all the publications of a specific intellectual work in a country. However, national bibliographies are not designed and built to support rights clearance purposes. The information in bibliographic records results from cataloguing with users and library management in mind, and links between different publications of a single intellectual work are not available. This paper presents a system implemented in The European Library to integrate national bibliographies into the ARROW (Accessible Registries of Rights Information and Orphan Works) rights information infrastructure. The system makes it possible to identify all different publications with a common underlying intellectual work. This system forms the main source of bibliographic metadata due to the fact that The European Library is an aggregator of all Europe’s national library catalogues.

Nuno Freire, Andreas Juffinger
Exploiting Attribute Redundancy for Web Entity Data Extraction

Web entities are often associated with many attributes that describe them. It is essential to extract these attributes for Web entity data extraction. This paper proposes a novel approach using duplicated attribute value pairs. First, we construct an initial seed set of attributes, including names and enumerable values, and a training set of Web pages from the target website. Second, we locate the position of each attribute by matching attribute values within the pages of the site with values contained in the seed set. Third, we choose the position with the highest supportiveness as the path for extraction, which we use to extract other attribute value pairs with the same template. Finally, we conduct an extensive experimental study with a large real data set to demonstrate the effectiveness of our extraction approach.

Yanxu Zhu, Gang Yin, Xiang Li, Huaimin Wang, Dianxi Shi, Lin Yuan
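
The seed-matching step in the abstract above can be sketched roughly as follows. The abstract does not define "supportiveness", so this sketch assumes it is simply the number of pages on which a position holds a seed value. The function name, the page representation (a dict from path to text) and the example paths are all hypothetical.

```python
from collections import Counter


def best_extraction_path(pages, seed_values):
    """Pick the page position whose text most often matches a seed value.

    pages:       list of dicts mapping a position (e.g. an XPath string)
                 to the text found there on one page
    seed_values: set of known attribute values, lower-cased
    """
    support = Counter()
    for page in pages:
        for path, text in page.items():
            if text.strip().lower() in seed_values:
                support[path] += 1  # assumed notion of supportiveness
    if not support:
        return None
    return support.most_common(1)[0][0]


# Hypothetical training pages from one site template.
pages = [
    {"/div[1]/span": "Red", "/div[2]/span": "Acme"},
    {"/div[1]/span": "Blue", "/div[2]/span": "Red Hat"},
    {"/div[1]/span": "Teal", "/div[2]/span": "green"},
]
print(best_extraction_path(pages, {"red", "blue", "green"}))
```

The chosen path can then be applied to the remaining pages that share the template, including pages whose values (such as "Teal" here) are not in the seed set.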
Understanding Playability and Motivational Needs in Human Computation Games

Human computation games (HCGs) refer to applications that use games to harness human intelligence to perform various computational tasks. We examine how different types of HCGs affect players’ perceptions and their motivational appeal, as these influence good HCG design. We focus on image tagging HCGs, where users play games to generate keywords for images. Three versions were created: collaborative HCG which required players to cooperate, competitive HCG where players worked against each other, and a control non-game manual tagging application. The applications were evaluated to uncover participants’ playability perceptions, and the influence of motivational needs on usage intention. Results suggest that participants reported liking the collaborative and competitive HCGs over the control application. Further, using the trichotomy of needs theory, we found that an individual’s need for achievement and power influenced intention to use the various applications.

Dion Hoe-Lian Goh, Chei Sian Lee

Metadata/Catalogue

A Metadata Framework for Cloud-Based Digital Archives Using METS with PREMIS

An increasing number of organizations are using cloud computing to create and store digital records. To ensure safe storage and long-term preservation, standards for metadata are needed. This paper proposes a metadata application profile for cloud archives.

We use guidelines from the Singapore Framework for Dublin Core Application Profiles to define the functional requirements, domain model and description set profile that form the basis of the proposed application profile. In our profile, we use METS as a transmission and package format, extending it with metadata from the PREMIS data dictionary and Dublin Core Metadata Element Set.

Using the proposed application profile, we create an example METS information package using predefined criteria. We find that the profile meets the functional requirements, and that it simplifies metadata provision for business systems, compared to systems that do not allow pre-registration.

Jan Askhoj, Shigeo Sugimoto, Mitsuharu Nagamori
Coding FRBR-Structured Bibliographic Information in MARC

The lack of support for the FRBR model in current bibliographic standards has been a major bottleneck for the implementation and use of this model in library databases. In this paper we present solutions for coding FRBR structures using MARC and show that it is possible to code even more complex FRBR structures within the current format. This solution promises a migration path for library systems without losing the compatibility with existing standards.

Trond Aalberg, Tanja Merčun, Maja Žumer
Metrics for Metadata Quality Assurance and Their Implications for Digital Libraries

This study aims at developing a set of common metrics for metadata quality at the data element level. The proposed metrics are used to assure the quality of metadata converted from heterogeneous metadata formats and sources into a Dublin Core based centralized digital repository. This study adopted the metrics provided by Bruce and Hillmann [2] as a basis to develop the proposed metrics as follows: conformance to expectations, provenance, accuracy, completeness and accessibility. A hybrid approach combining automatic and manual assessment was developed for assessing metadata quality in practice. Two individual projects selected from the Taiwan E-Learning and Digital Archives Program served as use cases to illustrate the details and examine the feasibility of the proposed metrics. Finally, this study discusses the implications of the proposed metadata quality metrics for digital libraries from the following perspectives: project management, metadata management, hidden quality problems and accessibility.

Ya-Ning Chen, Chun-Ya Wen, Hui-Pin Chen, Yen-Hung Lin, Hon-Chung Sum
Research and Practice of Electronic Resources Preservation in Tsinghua University Library

Our university library decided to build a local electronic resources preservation and management platform that both preserves subscribed electronic resources and provides access to them. In this paper, we introduce some of our research and practice for the task, including the framework design, and the AIP and metadata framework design and practice. In addition to deciding on the more traditional descriptive and administrative metadata, particular care needs to be given to the choice of structural and preservation metadata, as well as to integrating the various metadata components. We have selected METS structural, PREMIS preservation and QDC descriptive metadata for the Tsinghua University Library’s digital preservation system, used METS as the container format for integrating the various metadata components, and developed a metadata framework to aid in the preservation of digital resources.

Ting Zeng, Li Dong, Chao Li, Gang Chen
User Tagging for Digital Archives: The Case of Commercial Keywords from the Grand Secretariat

To make digital archives more accessible to industries, this study used tags given by cultural creative industries to items in digital archives to analyze the differences between the terms used by commercial users and scholars from the archive agency. We analyzed the self-created commercial tags (60%) and the tags adopted from academic terms (40%). The results showed that terms provided by the archive agency were still more likely to be based on domain knowledge. In comparison, the superordinate terms are more likely to be needed by the commercial users. This study suggests that the research findings of six types of semantic relationship and eight types of linked properties could be further transformed into metadata best practices for the digital archive agency, thus bridging the divide between the commercial tags and academic terms.

Shu-Jiun Chen

Distributed Repositories and Cloud Computing

From Box to Bin – Semi-automatic Digitization of a Huge Collection of Ethnological Documents

Especially in the field of cultural heritage preservation, one has to deal with information formerly collected on non-digital media such as paper, which is highly subject to deterioration and decay. The phrase “From Box to Bin” provokes, because it is connoted differently when not used in the context of digitization. Most digitization projects follow a very straightforward workflow to convert documents into digital data. “From Box to Bin” literally depicts the workflow of our digitization project, based on a huge archive of ethnological documents, in which documents are filed in boxes. On the one hand, our project is aimed at long-term preservation of documents on film reels in metallic bins stored under the earth’s surface in a closed mine. On the other hand, documents are converted into binary data, which will be provided for research within a digital archive system. In this article, we describe our very specific digitization workflow and the experiences we have had with it so far.

Alf-Christian Schering, Ilvio Bruder, Susanne Jürgensmann, Holger Meyer, Christoph Schmitt
An Approach for Processing Large and Non-uniform Media Objects on MapReduce-Based Clusters

Cloud computing enables us to create applications that take advantage of large computer infrastructures on demand. Data-intensive computing frameworks leverage these technologies in order to generate and process large data sets on clusters of virtualized computers. MapReduce provides a highly scalable programming model in this context that has proven to be widely applicable for processing structured data. In this paper, we present an approach and implementation that utilizes this model for the processing of audiovisual content. The application is capable of analyzing and modifying large audiovisual files using multiple computer nodes in parallel and is thereby able to dramatically reduce processing times. The paper discusses the programming model and its application to binary data. Moreover, we summarize key concepts of the implementation and provide a brief evaluation.

Rainer Schmidt, Matthias Rella
Risks, Benefits and Revelations: An Exploratory Study of Doctoral Students’ Perceptions of Open Access Theses in Institutional Repositories

This exploratory study examined doctoral students’ awareness of and attitudes towards open access (OA), particularly in relation to institutional repositories (IR). Levels of students’ awareness of OA and the concept of IRs, publishing behaviour, and perceived benefits and risks of OA publishing were explored. The study also examined students’ willingness to comply with mandatory submission policies. The study sample was drawn from Massey University, one of the two universities in New Zealand which has had a mandatory submission policy in place since 2007. Qualitative and quantitative data was collected from doctoral students in two stages: first through a series of qualitative interviews with students from different disciplines, followed by self-completion questionnaires collected from a larger sample. In this paper, we discuss and highlight a number of potential strategies for raising awareness and for improving understanding of the benefits of OA IR to encourage acceptance and adoption.

Kate V. Stanton, Chern Li Liew

Social Network/Personalized Service

A Social Tagging Based Collaborative Filtering Recommendation Algorithm for Digital Library

Recommendation is one of the new personalized services in the digital library. This paper proposes a new collaborative filtering recommendation algorithm based on social tagging, which tries to address the semantic gap and the cold start problems of traditional collaborative filtering. First, communities with similar habits are detected in the social network of the digital library. Then the candidate tags are derived from the user-book-tag correlation model. Finally, the books with the highest posterior probability given the tags are recommended by the naïve Bayes classifier. Experimental results show that the proposed algorithm improves the performance of collaborative filtering algorithms. It has become a core recommendation algorithm in the China Academic Digital Associative Library (CADAL).

Zhenming Yuan, Tianhao Yu, Jia Zhang
Co-Ranking Multiple Entities in a Heterogeneous Network: Integrating Temporal Factor and Users’ Bookmarks

In this paper, we present a novel approach that models the mutually reinforcing relationship among papers, authors and publication venues with due cognizance of publication time. We further integrate bookmark information, which models the relationship between users’ expertise and papers’ quality, into the composite citation network using a random walk with restart framework. The experimental results with an ACM dataset show that 1) the proposed method outperforms the traditional methods; 2) by incorporating the temporal factor, the ranking result of the latest publications can be greatly improved; 3) the integration of user-generated content further enhances the ranking result.

Ming Zhang, Sheng Feng, Jian Tang, Bolanle Ojokoh, Guojun Liu
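
The random walk with restart mentioned in the abstract above can be sketched in its generic form by power iteration. This does not reproduce the paper's composite network, temporal factor or bookmark integration. The function name, the restart probability `alpha=0.15` and the toy graph are assumptions for illustration.

```python
def random_walk_with_restart(adj, restart, alpha=0.15, iters=200):
    """Approximate the stationary scores of a random walk with restart.

    adj:     dict mapping node -> list of out-neighbours
             (every neighbour must itself be a key of adj)
    restart: dict mapping node -> restart probability, summing to 1
    alpha:   probability of jumping back to the restart distribution
    """
    nodes = list(adj)
    score = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: alpha * restart.get(n, 0.0) for n in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                # Spread the walking mass evenly over the out-links.
                share = (1 - alpha) * score[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node: send its mass to the restart distribution.
                for n in nodes:
                    nxt[n] += (1 - alpha) * score[u] * restart.get(n, 0.0)
        score = nxt
    return score


# Toy heterogeneous graph: one paper linked to its author and venue.
adj = {"paper": ["author", "venue"], "author": ["paper"], "venue": ["paper"]}
scores = random_walk_with_restart(adj, {"paper": 1.0})
print(sorted(scores, key=scores.get, reverse=True))
```

Nodes close to the restart set accumulate more score, which is what makes this kind of walk usable for ranking entities relative to a chosen set of starting nodes.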
On Modeling Virality of Twitter Content

Twitter is a popular microblogging site where users can easily use mobile phones or desktop machines to generate short messages to be shared with others in real time. Twitter has seen heavy usage in many recent international events, including the Japan earthquake and the Iran election. In such events, many tweets may become viral for different reasons. In this paper, we study the virality of socio-political tweet content in Singapore’s 2011 general election (GE2011). We collected tweet data generated by about 20K Singapore users from 1 April 2011 to 12 May 2011, and the follow relationships among them. We introduce several quantitative indices for measuring the virality of tweets that are retweeted. Using these indices, we identify the most viral messages in GE2011 as well as the users behind them.

Tuan-Anh Hoang, Ee-Peng Lim, Palakorn Achananuparp, Jing Jiang, Feida Zhu
Creating a Handwriting Recognition Corpus for Bushman Languages

Handwriting recognition systems rely on the existence of a corpus for training recognition models and evaluating accuracy. Creating a handwriting recognition corpus for the Bushman languages of southern Africa is difficult due to the complexities of the script used to represent them and the fact that this script cannot be represented using Unicode. To solve this problem, a semi-automatic Web-based tool was developed to segment, capture and encode the Bushman text. A case study demonstrated how the tool could be used to create a Bushman handwriting corpus with few errors.

Kyle Williams, Hussein Suleman
User Value Oriented Functional Architecture and Implementation of Regional Digital Library: The Case of ZADL Project

The core idea of the Zhejiang Academic Digital Library (ZADL) project, started in 2008, is to construct the functional framework of a digital library based on its utilization value to target users. This paper highlights the objectives of the project, the functional design of the system architecture driven by user-oriented services, and the implementation strategy, and discusses the experience gained from the project.

Jiaping Qian, Hong Li, Huazhang Tong, Jindi Ma

Mobile Services/ Electronic Publishing

Comparative Evaluation of Interfaces for Presenting Location-Based Information on Mobile Devices

Location-based information can now be easily accessed on mobile devices and is commonly presented in the form of lists of items, maps and, more recently, augmented reality (AR). Each approach (list, map, AR) has its strengths and weaknesses. In this paper, we investigate the three approaches for searching and browsing location-based information, taking into account performance and users’ perceptions of usability. Participants were issued with Android smartphones pre-loaded with a specific interface and performed a set of browsing and searching tasks. Results suggest that all three interfaces performed similarly for search tasks, but for browsing, the map performed worst. In terms of usability, participants appeared to rate the list better than the other interfaces for presenting location-based information.

Dion Hoe-Lian Goh, Chei Sian Lee, Khasfariyati Razikin
Who, What, Why: Examining Annotations in Mobile Content Sharing Games

Recently, mobile content sharing applications incorporating gaming features have attracted much interest. This paper investigates motivations for using these games. We used the TAG framework to analyze motivations based on the recipients of the content, the type of content created, and the goals behind content creation. Participants maintained a one-week diary, and selected participants were called for an in-depth interview. Results suggest that motivations for creating content include knowledge creation, self-expression, creation/maintenance of social relationships, self-presentation, competition and achievement. Additionally, we found that games and mobile content sharing are mutually reinforcing. Implications of our work are also discussed.

Dion Hoe-Lian Goh, Chei Sian Lee, Guanghao Low
Flexible Publication Workflows Using Dynamic Dispatch

Publication processes within Digital Libraries are seldom supported by a workflow management system (Wfms). Publication workflows are often described within the application logic because of their data dependency: publication processes are data-driven. However, documents and processes should not be treated independently of each other. Rather, processes should dynamically react to changes in document structure and content.

We present an approach for flexible, data-centric publication workflows. The approach extends the control-flow perspective of a Wfms with concepts for handling process adaptation at run-time, depending on a document’s structure and its content.

Sebastian Schick, Holger Meyer, Andreas Heuer
An RDF-Based Platform for E-Book Publishing

E-books have become an important player in the industry. The market is now flooded with e-readers and tablets designed to optimize the e-book reading experience. However, the majority of e-books being published today fail to take full advantage of the electronic form and often appear mostly identical to their printed counterparts. A platform for e-book publishing is proposed to assist authors in creating rich e-books and to enhance the reading experience. With an underlying data model that makes use of the Resource Description Framework, the platform has been designed to reap the benefits of the large pool of information available on the internet and on the Linked Open Data (LOD) cloud whenever possible, giving authors and readers access to relevant information resources. The developed platform has been compared with other similar tools, and the results showed that it is able to provide useful services while maintaining the basic functions of e-book authoring and reading.

Kornschnok Dittawit, Vilas Wuwongse

Multimedia Digital Libraries

Visual Sentiment Summarization of Movie Reviews

A prototype digital library of social media content was developed to present a summarized view of public opinion in a visual interface. The domain of the study was movie reviews of multiple genres harvested from weblogs, discussion boards, user and critic review Web sites, and Twitter. The system performs fine-grained analysis to determine both the sentiment orientation and sentiment strength of the reviewer towards various aspects of a movie, such as overall opinion, director, cast, story, scene, and music. Various visual interface components were developed to present an overview of public opinion on multiple aspects of each movie, and a usability evaluation was conducted to observe their effectiveness. The aspect-based sentiment summarization interface has the highest score for usefulness, while a sentiment link analysis graph, which visualizes how positive and negative sentiment terms are associated with review aspects, has the highest score for overall rating.

Jin-Cheon Na, Tun Thura Thet, Christopher S. G. Khoo, Wai Yan Min Kyaing
Towards Ontology-Based Knowledge Visualization

This paper introduces the successful implementation of two ontology visualization tools in an ontology-based knowledge retrieval model, which provides solutions for knowledge visualization.

Yigang Zhou
A Feedback Enabled Multimedia WebQuest Model for College Public English Learning

With the rapid development of information technologies, college English reform in China has reached a point where a mature technology platform can support a wide range of novel attempts. Among many experimental teaching practices, one representative Web-based instructional model is WebQuest. In this article, we outline a WebQuest model designed for college public English learning. The novel aspect of this model is a student feedback module with the WebQuest’s Resource component. Students are able to give score ratings to existing resource items or recommend new resource items. This information is then taken into account when refining the design of the same WebQuest project. The contribution of the proposed model is two-fold: (1) it promotes resource reuse by applying advanced data mining methods, so as to improve the efficiency and effectiveness of WebQuest project maintenance; (2) it opens a channel to collect thoughts and ideas from the students, so as to integrate intelligence from multiple sources.

Zheng Zhang, Yan Zhang, Yiyu Jia

Information Retrieval

Retrieval Effectiveness of Cross Language Information Retrieval Search Engines

This study evaluates the retrieval effectiveness of English-Chinese (EC) cross-language information retrieval (CLIR) on four common search engines along the dimensions of recall and precision. We formulated a set of simple and complex queries on different topics, including queries with translation ambiguity. Three independent, bilingually proficient evaluators each reviewed a total of 960 returned web pages to assess document relevance. Findings showed that CLIR effectiveness is poor, with average recall and precision values of 0.165 and 0.539 for monolingual EE/CC searches, and 0.078 and 0.282 for cross-lingual CE/EC searches. Google outperformed Yahoo! in the experiments, and EC and EE searches returned better results than CE and CC searches respectively. As this is the first set of CLIR retrieval effectiveness measurements reported in the literature, these findings can serve as a benchmark and provide a better understanding of the current CLIR capabilities of Web search engines.

Schubert Foo
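
The recall and precision measures named in the abstract above follow the standard set-based definitions, which can be sketched as below. The function names and the sample document identifiers are hypothetical; the study's actual pooling and judging procedure is not described here.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query.

    retrieved: iterable of document ids returned by the engine
    relevant:  iterable of document ids judged relevant
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_over_queries(results):
    """Average precision and recall over a list of (retrieved, relevant) pairs."""
    pairs = [precision_recall(ret, rel) for ret, rel in results]
    if not pairs:
        return 0.0, 0.0
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_r = sum(r for _, r in pairs) / len(pairs)
    return mean_p, mean_r


# Hypothetical query: four pages returned, three pages judged relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d9"])
```

Averaging per-query values, as `average_over_queries` does, is one common way to arrive at summary figures like those quoted in the abstract; whether the study averaged this way is an assumption.
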
Term Familiarity to Indicate Perceived and Actual Difficulty of Text in Medical Digital Libraries

With increasing text digitization, digital libraries can personalize materials for individuals with different education levels and language skills. To this end, documents need meta-information describing their difficulty level. Previous attempts at such labeling used readability formulas but the formulas have not been validated with modern texts and their outcome is seldom associated with actual difficulty. We focus on medical texts and are developing new, evidence-based meta-tags that are associated with perceived and actual text difficulty. This work describes a first tag, ‘term familiarity’, which is based on term frequency in the Google corpus. We evaluated its feasibility to serve as a tag by looking at a document corpus (N=1,073) and found that terms in blogs or journal articles displayed unexpected but significantly different scores. Term familiarity was then applied to texts and results from a previous user study (N=86) and could better explain differences for perceived and actual difficulty.

Gondy Leroy, James E. Endicott
An Entailment-Based Question Answering System over Semantic Web Data

This paper reports a novel knowledge-based Question Answering (QA) method that uses Semantic Web technologies and textual entailment recognition. Unlike most ontology-driven QA methods, this method does not perform deep question analysis to transform a natural language question into an ontology-compliant query for answer retrieval. Instead, it performs textual entailment recognition to discover the question template entailed by a user question from the whole machine-generated set and then takes the associated SPARQL query template to produce the complete query for retrieving the answers from the Semantic Web data that subscribe to the same ontology. An evaluation was carried out to assess the accuracy of the QA method, and the results revealed that the generated question templates can cover almost all the user questions and that 65.6% of the user questions can be correctly answered with the support of a semantic entailment engine.

Shiyan Ou, Zhenyuan Zhu

Tools and Systems for Digital Library

An Integrated Interactive and Persistent Map-Based Digital Library Interface

In this paper we explore the enrichment of the interface to a “standard” text-based digital library with interactive digital maps to enhance its capabilities. Central to the design is to maintain two interlocked views, displayed side-by-side: one the traditional text-based digital library view, the other an interactive map view. At any stage of the user’s interaction the two views represent the same information, only displayed in the form most fitting to that view. The views are interlocked, in that the user can interact with either view, and as a result both views are simultaneously updated. To achieve this, the underlying digital library infrastructure was extended to include automatic place name detection and disambiguation capabilities, which are used to augment the text view. To assist user navigation within this view, we explore a fisheye and multicolumn representation of the text documents. The work is evaluated through a user study, focusing on the new and extended services in the digital library.

Samuel J. McIntosh, David Bainbridge
Towards Very Large Scale Digital Library Building in Greenstone Using Parallel Processing

As very large digital library collections become more commonplace, software tools must adapt appropriately. This paper reports on an evolution of the Greenstone Digital Library software to support parallel processing during the collection building phase. A series of experiments were conducted to first establish a basic speed-up factor, and then deconstruct the parallelisation process to understand the execution profile of the application. Several bottlenecks were identified and resolved to further improve the performance. The adaptation of Greenstone confirms that the build phase is indeed a suitable candidate for parallelisation; and suggests that parallelisation of processing is a new avenue for exploration in emerging digital library architectures.

John Thompson, David Bainbridge, Hussein Suleman
CJK Indexing Prototype for Asian Digital Collections: Developing a Software Tool Where Generations Meet

FamilySearch is the largest genealogy organization in the world. FamilySearch consists of a collection of records, resources, and services designed to help people learn more about their family history. FamilySearch gathers, preserves, and shares genealogical records worldwide. FamilySearch is a non-profit organization that offers free access to its resources and service online at FamilySearch.org, one of the most heavily used genealogy sites on the Internet.

Alan M. Heath
Effective Approaches to the Evaluation and Selection of a Discovery Tool

Evaluating and selecting a discovery tool can be a tough process, given that the discovery tools available in the market have many features in common, and from an end user’s point of view, they do pretty much the same things. The more or less aggressive advertisements make the evaluation and selection process even more challenging. How to choose the most appropriate product to best serve your users’ needs and protect your library’s investment? Which key factors should be taken into consideration when making this decision? This article is intended to suggest some effective approaches to the evaluation and selection of a discovery tool. The authors hope that by taking these approaches, libraries that are looking for a discovery tool or switching to another one will be able to make informed and reasonable decisions.

Huibin (Heather) Cai, Tianfang Dou, Airong Jiang
Design of Automatic Mapping System between DDC and CLC

Dewey Decimal Classification (DDC) and Chinese Library Classification (CLC) are two library classification schemes widely used internationally and in China, respectively. Building interoperability between DDC and CLC is urgent and significant for the organization and retrieval of information resources, especially the classification indexing of electronic resources in foreign languages. This paper aims to design an automatic mapping system based on a statistical mapping table and a manual one. Both tables were established in one direction, from DDC to CLC. The statistical mapping table was created based on the frequency of the mapping relations in a sample of USMARC records that contained both DDC and CLC numbers. The manual mapping table provided a CLC number for each DDC section. A mapping strategy was generated by comparing the mapping results on a second sample. A third sample was used to analyze the results of the automatic mapping system with the two established mapping tables and the specific mapping strategy. Finally, possible future improvements of the system are discussed.

Yihua Zhang, Jia Peng, Di Huang, Fang Li
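
The two-table idea in the abstract above (a frequency-based statistical table plus a manual table keyed by DDC section) can be sketched as follows. The abstract does not spell out the mapping strategy, so the "statistical table first, manual table as fallback" rule, the function names, the `min_count` threshold, and the sample class numbers are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_statistical_table(records, min_count=1):
    """Map each DDC number to the CLC number it co-occurs with most often.

    records:   iterable of (ddc_number, clc_number) pairs taken from
               catalogue records that carry both numbers
    min_count: minimum co-occurrence count required to accept a mapping
    """
    counts = defaultdict(Counter)
    for ddc, clc in records:
        counts[ddc][clc] += 1
    table = {}
    for ddc, clc_counts in counts.items():
        clc, n = clc_counts.most_common(1)[0]
        if n >= min_count:
            table[ddc] = clc
    return table


def map_ddc_to_clc(ddc, statistical, manual):
    """Assumed strategy: try the statistical table, then fall back to the
    manual table, which is keyed by the three-digit DDC section."""
    if ddc in statistical:
        return statistical[ddc]
    section = ddc.split(".")[0][:3]
    return manual.get(section)


# Hypothetical sample pairs, not taken from the paper's data.
sample = [("004.6", "TP393"), ("004.6", "TP393"), ("004.6", "TN91")]
stat_table = build_statistical_table(sample)
manual_table = {"004": "TP3"}
```

A call such as `map_ddc_to_clc("004.9", stat_table, manual_table)` falls through to the manual table because "004.9" never appears in the sample pairs.
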
Digital Library Research (1990-2010): A Knowledge Map of Core Topics and Subtopics

In the paper, we briefly present a methodology and a digital library (DL) research knowledge map of core topics and subtopics (1990-2010) which can be used to create a visual knowledge map of DL research, and also to develop a new DL curriculum. The knowledge map can be an essential knowledge platform for DL researchers, educators, students and practitioners.

Son Hoang Nguyen, Gobinda Chowdhury

Posters

A Case Study for Multilingual Support: Applying the AAT-Thesaurus to TELDAP’s Multilingual Project

The translation of culture-laden words has long been a formidable task. This paper discusses how the AAT-Thesaurus, with its classification of semantic equivalence and carefully structured terms, helps the Taiwan e-Learning and Digital Archives Program (TELDAP) overcome the difficulty of untranslatability.

Hsueh-Hua Chen, Shu-Jiun Chen, Shin-Yen Lee, Jessamine Cheng
The World Digital Library

The World Digital Library (WDL) is a website, www.wdl.org, which provides free access to primary cultural heritage materials that tell the stories and cultural achievements of all countries. The WDL is a joint effort of the Library of Congress, the United Nations Educational, Scientific and Cultural Organization (UNESCO), and over 120 participating institutions around the world. This poster session will introduce the WDL by presenting a brief history of the WDL, its objectives, contributors, content, capacity building, and features. The poster will also show how the WDL website is used by people around the world.

Allison B. Zhang
Embryo App for iPhone/iPad/iPod Touch

Scientists and educators worldwide have used the Carnegie Collection of Embryology to define normal human embryo development for decades. The Embryo App for iPhone/iPad and iPod Touch, developed by the National Library of Medicine (NLM), utilizes mobile telecommunication and multimedia technologies to add interactive capabilities to the embryo data, enhancing our understanding of embryo development. Embryo is a collaborative project between NLM, the Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD), Louisiana State University Health Sciences Center (LSUHSC) and the National Museum of Health & Medicine’s Human Developmental Anatomy Center (NMHM & HDAC). This App is part of the National Library of Medicine’s program to fulfill the NLM’s role as a provider of medical, science and health care information with mobile technologies. The Embryo App provides a new venue for us to access and interact with the Carnegie Collection of Embryology for the first time.

Ying Sun, Florence Haseltine, John Cork, Elizabeth Lockett, Florence Chang, Lucie Chen
A Survey on E-Book Utilization in University Libraries

The development of network and electronic technology has brought about a boom in digital resources, and the library is now in an age in which paper and digital resources coexist. In this paper, we present the detailed results of a survey on e-book utilization among students and faculty at the University of Electronic Science and Technology of China (UESTC). We hope that this survey can provide a valuable reference for e-book promotion in university libraries.

Ying Yang, Jiayan Yang, Xuemei Luo
Evaluation of Link System between Repository and Researcher Database

This paper evaluates the effect of a Web system that activates institutional repositories. The institutional repository is an important library service in academic institutions. The authors developed a link system between the institutional repository and the researcher database of their university. The system reduces researchers’ effort by reusing the metadata in the researcher database when they register their papers in the repository. The authors observed the access log of the repository before and after the launch of the link system. The result shows that the system increased the number of accesses; however, there was no significant change in the number of paper registrations.

Kensuke Baba, Toshie Tanaka, Emi Ishita, Masao Mori, Eisuke Ito, Sachio Hirokawa
Characteristic Practice in the Construction of the Chinese Medical Digital Library – Wanfang MED ONLINE as the Example of the Characteristic Resources Organization and Presentation as Well as Data Mining of the Medical Literature

This paper uses Wanfang Data MED Online (WFMO) [1] as an example to look in depth at characteristic practice in resource organization and presentation, as well as data mining of the medical literature, in the Chinese Medical Digital Library (CMDL).

Xiumei Zhang, Gongliang Yang, Xiaolei Li, Jing Li
Use of Information Technology in Library Service: A Study on Some Selected Libraries in Rajshahi District of Bangladesh

Bangladesh is one of the least developed countries and has faced many obstacles in introducing information technology (IT) in all information-related sectors, especially in its libraries and information centers. This work centers on the Rajshahi District of Bangladesh, which is perceived to be worse affected by inadequate application of IT in libraries and information centers than the capital city of Bangladesh. This research, carried out with a full study plan and procedures, therefore seeks to investigate the environment, identify barriers, and explore possibilities for improving IT application in libraries. The study took into account the various systems, services, problems, and prospects of the selected libraries. The study also showed how to improve the overall condition and services of these libraries by adopting information technology (computer, fax, internet), including the provision of phone, photocopier, television, microfiche, microfilm, etc. The investigation reveals the overall condition of these libraries, including the problems they encounter in applying IT, and puts forward recommendations that may help improve their services in the information dissemination process.

Md. Jamal Uddin
Effect of the Number of Comments Inserted by Students during Each Lecture on Their Grades in the Course

When students read textbooks in the classroom, they usually apply active reading. Our previous study showed the relation between the total number of comments inserted by students into their digital textbooks and their grades at the end of the course. In this study, the number of comments inserted by students into their digital textbooks during each lecture is highlighted as one of the main factors influencing their grades at the end of the course. Students who wrote many comments during each lecture tended to receive higher grades. Their grades at the end of the course are related to the number of comments inserted during each lecture in the early stages of the course. The finding suggests that if teachers can access information about the comments that students wrote into digital textbooks just after each lecture, they may use it to improve students’ performance at the next lecture.

Akihiro Motoki, Tomoko Harada, Takashi Nagatsuka
Coordinating Concepts and Discourse in Model-Oriented Research Reports

Model-oriented research reports have been proposed as a highly structured approach that weaves together models for research methods and analyses, conceptual process models for the phenomena under investigation, and discourse structures for presenting the models.

Robert B. Allen
Backmatter
Metadata
Title
Digital Libraries: For Cultural Heritage, Knowledge Dissemination, and Future Creation
Edited by
Chunxiao Xing
Fabio Crestani
Andreas Rauber
Copyright Year
2011
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-24826-9
Print ISBN
978-3-642-24825-2
DOI
https://doi.org/10.1007/978-3-642-24826-9
