2016 | Book

Digital Libraries: Knowledge, Information, and Data in an Open Access Society

18th International Conference on Asia-Pacific Digital Libraries, ICADL 2016, Tsukuba, Japan, December 7–9, 2016, Proceedings

About this book

This book constitutes the refereed proceedings of the 18th International Conference on Asia-Pacific Digital Libraries, ICADL 2016, held in Tsukuba, Japan, in December 2016.
The 18 full papers, 17 work-in-progress papers and 7 practitioner papers presented were carefully reviewed and selected from 71 submissions. The papers cover topics such as community informatics, digital heritage preservation, digital curation, models and guidelines, information retrieval/integration/extraction/recommendation, privacy, education and digital literacy, open access and data, and information access design.

Table of Contents

Frontmatter

Community and Digital Libraries

Frontmatter
When Personal Data Becomes Open Data: An Exploration of Lifelogging, User Privacy, and Implications for Privacy Literacy

This paper argues that there is a need for awareness of and education about privacy literacy in an age where lifelogging technologies are ubiquitous and open up private data to commercial and other uses, with commercial entities building huge digital libraries of private data that they then mine with big data analytics. Data is often represented as the raw material of information, and algorithms as neutral agents for processing it, but in our digital society this so-called neutral data can become open data that is easily processed to reveal informational metadata on individuals’ behaviors. Whilst much of this may be beyond individual control, and simply an unavoidable part of our information society, there are certain types of personal and private data that can be, and need to be, under individual control, and not open to integration or ‘hashing’ with public data. This requires a new type of data literacy on the part of users that we term privacy literacy.

Zablon Pingo, Bhuva Narayan
The Value of Public Libraries During a Major Flooding: How Digital Resources Can Enhance Health and Disaster Preparedness in Local Communities

In October 2015, several counties in South Carolina experienced catastrophic flooding that caused severe damage, including loss of residential homes and other calamities. Using a framework for risk communication preparedness and implementation about pandemic influenza for vulnerable populations recommended by public health experts, this case study investigates public libraries’ value to their communities and their legitimacy as partners of public health agencies during and after a disaster. Public libraries’ situation-specific information services in the target areas affected by flooding during and after the disaster were explored. The methodology was qualitative-based. Focus-group meetings with public library administrators and librarians, one-on-one interviews with community members, and an in-depth interview with a FEMA agent were conducted. Preliminary results reveal essential needs regarding health information and technology access during and after the disaster. Recommendations on the use of digital library resources and social media for disaster and health information dissemination are discussed.

Feili Tu-Keefner
Current Situation and Countermeasures of the Legal Protection of Digital Archives User’s Privacy in China

In order to protect the privacy interests of digital archives users and win a more solid social foundation and wider living space for the development of digital archives, the article begins with privacy and network privacy and lists the new connotations of digital archives users’ privacy. It also analyzes the privacy rights involved when digital archives collect, transport, store and use users’ personal information. Then it describes the current situation of the legal protection of digital archives users’ privacy in China. To protect this privacy in China, the author proposes three steps. First, privacy should be protected directly as an independent right of human dignity, and unified privacy protection laws should be made. Second, protective regulations for digital archives users’ privacy should be made. Third, provisions protecting users’ privacy should be added to the Archives Laws.

Jing Zhang, Jiaping Lin
Involving Source Communities in the Digitization and Preservation of Indigenous Knowledge

The digital era has transformed the ways people share information and preserve knowledge for the future. Increasingly, Web 2.0 technologies have been used for participatory practices aimed at constructing cultural heritage knowledge. Memory institutions, including libraries and museums, have become keen on opportunities to engage with potential partners and collaborators. For such participatory construction of cultural knowledge to be successful, however, some underlying contradictions need to be resolved between traditional documentary practices that privilege ‘expert knowledge’ and distributed social Web practices that allow for multiple (at times contradictory) perspectives. This interpretive qualitative study examines the values and challenges of collaborating with communities who are the originators, owners and/or guardians of the traditional beliefs, expressions and other cultural artifacts that bear the indigenous knowledge of a cultural group, as well as people who are recognized by indigenous communities to hold the knowledge. Data was collected through 27 semi-structured interviews in Ghana.

Eric Boamah, Chern Li Liew
Students and Their Videos: Implications for a Video Digital Library

Personal information collections have expanded to include video files but users often organize their content with the same tools they use for other simpler media types. We analyze the ‘native’ video management behavior expressed in 35 self-interviews and diary studies produced by New Zealand students, to create a ‘rich picture’ of personal video collection size, formats, organization and intended usage. We consider how conventional digital libraries can better support usage of personal video material.

Sally Jo Cunningham, David M. Nichols, Judy Bowen

Digital Library Design

Frontmatter
Supporting Gender-Neutral Digital Library Creation: A Case Study Using the GenderMag Toolkit

Software is assumed by its creators and maintainers to be gender-neutral: that is, that it is equally well suited for use by any user, regardless of gender. We investigate this assumption in the digital libraries context through analysis of a significant digital library construction and maintenance tool—the Greenstone Digital Librarian Interface (GLI)—using the GenderMag toolkit. GenderMag provides personas whose approaches to software use fall across the spectrum of gender-stereotypic actions and motivations. The personas are used as the basis for cognitive walkthroughs of the system under investigation, to uncover potential gender biases in system functionality and interface design. We uncover significant such biases in GLI.

Sally Jo Cunningham, Annika Hinze, David M. Nichols
Developing Institutional Research Data Repository: A Case Study

We introduce VTechData, a Sufia/Fedora based institutional repository specifically implemented to meet the needs of research data management at Virginia Tech. Despite the rapid maturity of Hydra and Fedora code bases, the gaps between the released packages and a launched production-level service are still many and far from trivial. In this practitioner paper we describe the strategy and efforts through which these gaps were filled and lessons learned in the process of creating our first Hydra/Sufia-based repository.

Zhiwu Xie, Julie Speer, Yinlin Chen, Tingting Jiang, Collin Brittle, Paul Mather
Cultural Digital Map Prototype of Tourist Attractions in NirasSuphan Written by SunthonPhu, Poet of Thailand

NirasSuphan is a poem written by SunthonPhu describing the trip from ThepThida Temple, Bangkok to PahLuk, Danchang District in Suphanburi province. It was written in 2374 B.E. and recounts the trip to find a leklai, a metal charm believed to melt when exposed to fire, which was believed to be a kind of elixir. The literature is significant to Suphanburi province since it describes how people lived in the past. The objective of this research was to design a digital map prototype of tourist attractions based on a sample of communities mentioned in the NirasSuphan and develop an accompanying web site to describe the communities along this route. One research study found that Thai tourists did not acknowledge having received guidance or having any existing awareness of cultural heritage. Developing a digital map for cultural tourist attractions could be a means of raising awareness of Thai cultural heritage.

Watcharee Phetwong, Bhornchanit Leenaraj, Nanthiya Charin, Chadaphon Janchian, Krisorn Sawangsire
The Rise and Fall of the Wonder Okinawa Digital Archive: Comparing Japanese and American Conceptualizations of Digital Archives

This paper examines the development of what once was Japan’s largest local digital archive, Wonder Okinawa, created in 2003. It collected a diverse view of Okinawa’s cultural properties as a treasure house for future generations. It was created under the banner of establishing an Okinawan “brand” to promote tourism, and to nurture human resources, so that Okinawa could foster a hub of IT industries. In the early 2000s, the national government envisioned digital archives as part of its scheme to become a highly networked society, as the means to address social problems, such as the low birthrate, graying population, and shrinking workforce. The digital archive project spearheaded the government’s effort. However, the $13.5 million project was dismantled less than a decade after its spectacular debut. The paper analyzes the causes of the failure and explores some key differences between the conceptual model of digital archives in Japan and North America.

Andrew Wertheimer, Noriko Asato
Toward Access to Multi-Perspective Archival Spoken Word Content

During the mid-twentieth century Apollo missions to the Moon, dozens of intercommunication and telecommunication voice channels were recorded for historical purposes in the Mission Control Center. These recordings are now being digitized. This paper describes initial experiments with integration of multi-channel audio into a mission reconstruction system, and it describes work in progress on the development of more advanced user experience designs.

Douglas W. Oard, John H. L. Hansen, Abhijeet Sangawan, Bryan Toth, Lakshmish Kaushik, Chengzhu Yu

Information Access Design and User Experience

Frontmatter
Rarity-Oriented Information Retrieval: Social Bookmarking vs. Word Co-occurrence

We propose rarity-oriented retrieval methods for serendipity, defining rare information as information that is both relevant and atypical. We take two approaches. The first uses social bookmark data and introduces tag estimation to our previous work. The second is based on word co-occurrence in a dataset. In both approaches, we use conditional probabilities to express relevancy and atypicality. In experiments, we compared our methods with the relevance-oriented method, the diversity-oriented method, and another rarity-oriented method. Our methods using word co-occurrence obtained better nDCG scores than the other methods.
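
For readers unfamiliar with the nDCG measure mentioned in this abstract, the following is a minimal sketch of one common formulation (gain divided by log2 of rank + 1). The abstract does not state which nDCG variant or cutoff the authors used, so the formula choice and the sample relevance grades below are assumptions for illustration only.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance scores."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades for two rankings of the same four documents.
ranking_a = [3, 2, 0, 1]   # highly relevant items near the top
ranking_b = [0, 1, 2, 3]   # highly relevant items near the bottom

print(ndcg(ranking_a), ndcg(ranking_b))
```

A ranking that places highly graded items earlier scores closer to 1.0, which is why `ranking_a` scores higher than `ranking_b` here.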

Takayuki Yumoto, Takahiro Yamanaka, Manabu Nii, Naotake Kamiura
Proposing a Scientific Paper Retrieval and Recommender Framework

In this paper, we propose a framework that combines aspects of user role modeling and user-interface features with retrieval and recommender systems components. The framework is based on emergent themes identified from participants’ feedback in a user evaluation study conducted with a prototype assistive system. A total of 119 researchers participated in the study, evaluating a prototype system that provides recommendations for two literature review tasks and one manuscript writing task.

Aravind Sesagiri Raamkumar, Schubert Foo, Natalie Pang
Investigating the Use of a Mobile Crowdsourcing Application for Public Engagement in a Smart City

It has been reported that crowdsourcing applications are valuable to support smart city initiatives. However, there still remains a gap in using such applications to empower and engage city residents. This study introduces a mobile crowdsourcing platform prototype known as the My Smart Mobile City app (MSMC) that aims to help cities manage public engagement with their residents. The aim of apps like MSMC is to help cities to collect useful local information by empowering and motivating residents to contribute content related to the city’s public spaces. Hence, motivations driving the use of MSMC will be explored. Preliminary results and implications of our work are discussed.

Chei Sian Lee, Vishwaraj Anand, Feng Han, Xiaoyu Kong, Dion Hoe-Lian Goh
User Testing of Prototype Systems in Two Different Environments: Preliminary Results

The paper presents a preliminary report on two studies testing the same prototype system user interfaces in Slovenia and the USA. A comparison of results highlights some of the differences in performance and preferences between the two studies and leads to a discussion of, on the one hand, possible implications for testing in different cultural environments and, on the other hand, the question of universally accepted user interfaces.

Tanja Merčun, Athena Salaba, Maja Žumer
Finding “Similar but Different” Documents Based on Coordinate Relationship

Traditional search technologies are based on similarity relationships: given a document, they return documents with similar content. However, such similarity-based search does not always produce good results, e.g., similar documents bring little additional information, so it is difficult to increase information gain. In this paper, we propose a method to find similar but different documents of a user-given one by distinguishing coordinate relationship from similarity relationship between documents. Simply put, a similar but different document is one with the same topic as the given document, but describing different events or concepts. For example, given as the input a news article stating the occurrence of the Oregon school shooting, articles stating the occurrence of other school shooting events, such as the Virginia Tech shooting, are detected and returned to users. Experiments conducted on the New York Times Annotated Corpus verify the effectiveness of our method and illustrate the importance of incorporating coordinate relationship to find similar but different documents.

Meng Zhao, Hiroaki Ohshima, Katsumi Tanaka

Information Extraction and Analysis

Frontmatter
Rule-Based Page Segmentation for Palm Leaf Manuscript on Color Image

Palm leaf manuscripts are an important source of history and ancient wisdom. A large number of manuscripts have already been digitized in the form of folio images. To extract useful information, optical character recognition (OCR) is often considered to be the first step towards text mining. Unfortunately, folio images contain multiple unsegmented palm leaf images, making them difficult to manage in the OCR process. This motivates us to propose a new page segmentation method for palm leaf manuscripts. This method consists of two main steps, the first of which is the detection of objects in folio images using the Connected Component Labeling method in a transformed L*a*b* color space. The second step is rule-based selection of objects as either palm leaf or not palm leaf. The experiments performed on 20 publicly available palm leaf manuscripts composed of 384 folio images demonstrated that the proposed method effectively segmented folio images into separate palm leaf images, with 99.86 % precision and 96.67 % recall scores.
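
The two steps named in this abstract (connected component labeling, then rule-based selection) can be illustrated with a minimal sketch. It works on a small binary grid rather than a transformed L*a*b* image, uses 4-connectivity, and applies a single size threshold as its rule. The function names, the connectivity choice, and the threshold are all assumptions for illustration; the abstract does not describe the authors' actual rules.

```python
from collections import deque

def label_components(mask):
    """Label 4-connected foreground regions of a binary grid.

    Returns (labels, count): labels has the same shape as mask, with 0 for
    background and 1..count for each connected region.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

def select_large(labels, count, min_pixels):
    """Hypothetical selection rule: keep regions with at least min_pixels pixels."""
    sizes = [0] * (count + 1)
    for row in labels:
        for label in row:
            sizes[label] += 1
    return [label for label in range(1, count + 1) if sizes[label] >= min_pixels]

# Toy mask: two strip-like regions and one isolated noise pixel.
mask = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1],
]
labels, count = label_components(mask)
kept = select_large(labels, count, min_pixels=4)
print(count, kept)
```

In this toy case the isolated pixel is labeled as its own region and then discarded by the size rule, leaving the two strip-like regions.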

Papangkorn Inkeaw, Jakramate Bootkrajang, Phasit Charoenkwan, Sanparith Marukatat, Shinn-Ying Ho, Jeerayut Chaijaruwanich
Exploiting Synonymy and Hypernymy to Learn Efficient Meaning Representations

Word representation learning methods such as word2vec usually associate one vector per word; however, in order to face polysemy problems, it’s important to produce distributed representations for each meaning, not for each surface form of a word. In this paper, we propose an extension for the existing AutoExtend model, an auto-encoder architecture that utilises synonymy relations to learn sense representations. We introduce a new layer in the architecture to exploit hypernymy relations predominantly present in existing ontologies. We evaluate the quality of the obtained vectors on word-sense disambiguation tasks and show that the use of the hypernymy relation leads to improvements of 1.2 % accuracy on Senseval-3 and 0.8 % on Semeval-2007 English lexical sample tasks, compared to the original model.

Thomas Perianin, Hajime Senuma, Akiko Aizawa
Entity Linking for Mathematical Expressions in Scientific Documents

This paper addresses the challenge of determining the identity of math expressions in scientific documents by linking these expressions to their corresponding Wikipedia articles. Math expressions are frequently used to denote important concepts in scientific documents, yet several of them, for example, famous equations, often have minimal explanation in the documents. This task will allow us to obtain an additional explanation from Wikipedia regarding these math expressions. This paper proposes an approach to this challenge, where the structures and surrounding text of math expressions are used for math entity linking. Our initial evaluation shows that a balanced combination of math structures and textual descriptions is required to obtain reliable linking performance.

Giovanni Yoko Kristianto, Goran Topić, Akiko Aizawa
Improved Identification of Tweets that Mention Books: Selection of Effective Features

In this paper, we assessed the effectiveness of different types of features for the identification of tweets on Twitter that mention books among tweets that contain the same strings as full book titles. In the previous work, the bag-of-words based features were taken from the context of individual tweets. While performance was reasonable, we identified room for improvement in terms of the extraction of features. We proposed additional types of features such as words appearing in the profiles of tweet authors, POS tags of mentioned book titles, and bibliographic elements within tweets, e.g. authors and publishers. We conducted a grid search for all combinations of the above feature sets, and observed performance improvements suitable for practical applications.

Shuntaro Yada, Kyo Kageura
A Visualization of Relationships Among Papers Using Citation and Co-citation Information

When we conduct scholarly surveys, we occasionally encounter difficulties in grasping the vast amount of related papers. Because academic papers have relationships, such as citing and cited relationships, we considered utilizing them for supporting scholarly surveys. In this paper, we propose a method for visualizing relationships among papers, and we construct paper graphs using two types of relationships, namely, citation and co-citation. Moreover, we quantify the strengths of citations and co-citations based on their frequency and the positions of co-citations, and show both types of relationships together in a graph. We constructed paper graphs using papers in the database field and discussed their usefulness.
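
The two relationship types in this abstract can be made concrete with a small sketch that counts, from a citation list, how often each paper is cited and how often each pair of papers is cited together by the same paper. The sample data and function names are hypothetical. The sketch covers frequency only; the abstract also mentions weighting by the positions of co-citations, which is not specified there and is omitted here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: citing paper -> list of papers it cites.
citations = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
    "A": ["C"],
}

def citation_counts(citations):
    """Number of distinct papers citing each paper."""
    return Counter(cited for refs in citations.values() for cited in set(refs))

def cocitation_counts(citations):
    """Number of papers that cite both members of each unordered pair."""
    pairs = Counter()
    for refs in citations.values():
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs

cited = citation_counts(citations)
cocited = cocitation_counts(citations)
print(cited["B"], cocited[("A", "B")], cocited[("A", "C")])
```

These counts could serve as edge weights in a paper graph: citation counts for directed citing edges and co-citation counts for undirected edges between papers that appear together in reference lists.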

Yu Nakano, Toshiyuki Shimizu, Masatoshi Yoshikawa

Education and Digital Literacy

Frontmatter
A Lecture Slide Reconstruction System Based on Expertise Extraction for e-Learning

MOOCs have brought many benefits to e-learning systems, as students are able to obtain various educational presentation slides through digital libraries. These presentation slides provide varying levels of knowledge, while students themselves usually have different levels of knowledge. Thus, it is important to detect the expertise levels of lecture slides for specific students, and to supplement the lecture slides automatically with related information suited to students’ different knowledge levels. We therefore developed a novel automatic slide reconstruction system for digital libraries in e-learning. It generates new lecture content from a single original, tailored to users’ interests and knowledge levels, by adding and removing slides, so that users can learn from the reconstructed slides without further searching. Our system first extracts topics and groups slides by topic to detect the expertise level of the original content, considering the context in the presentation. The system then searches for other necessary content and determines unnecessary original slide groups based on users’ interests and knowledge levels. Through this, the system can automatically reconstruct lecture slides by classifying them into four groups based on the expertise of the lecture slides. Those groups are: basic contents for beginners, basic or specialized contents for intermediate students, and specialized contents for advanced students. As a result, users can enjoyably learn from the newly reconstructed slides, which suit their interests and knowledge levels. In this paper, we discuss our automatic slide reconstruction system, which addresses students’ different knowledge levels for content understanding, knowledge deepening, and interest-expanding, and verify its effectiveness.

Yuanyuan Wang, Yukiko Kawai
Developing a Mobile Learning Application with LIS Discipline Ontology

University students entering a professional field, and even graduate students who have studied such a field for many years, are often unable to grasp the big picture of “the scope of professionalism”, nor do they know how to engage in self-learning. Therefore, constructing a discipline-oriented ontology from the learner’s standpoint, so that the learner can grasp the whole picture of the discipline, and developing a learning app for young people to use are important issues that facilitate self-learning. In this study, library and information science was used as the example for developing a discipline-oriented LIS Ontology. In addition, the ORCID system API was employed to integrate teachers’ English publications included in the Scopus database and Chinese publications included in the CLISA database in order to develop a learning app for the discipline, allowing students to search departments, teachers, curriculums, research projects, knowledge scope, and other information of the discipline and link to the full-text database via their mobile phone, so that they can plan their own learning map and path.

Chao-Chen Chen, Wei-Chung Cheng, Yi-Ting Yang
Heuristic Evaluation of an Information Literacy Game

Libraries have tapped into the popularity of digital game-based learning to promote information literacy (IL) education to students. However, among the many IL games that have been developed, evaluations have mostly relied on anecdotal quotations, or procedures which were neither systematic nor rigorous. This study fills this gap by adopting the heuristic evaluation method with end-users to evaluate Library Escape, an IL game for tertiary students. Participants identified problems with the game according to the Heuristic Evaluation of Playability (HEP) framework. Useful feedback was gathered, as well as suggestions on how to improve it. We proposed to extend the HEP framework by including two more categories on characters/graphics and pedagogical effectiveness. Implications and limitations of this study are discussed, and directions for future work are pointed out.

Yan Ru Guo, Dion Hoe-Lian Goh

Models and Guidelines

Frontmatter
Guideline for Digital Curation for the Princess Maha Chakri Sirindhorn Anthropology Centre’s Digital Repository: Preliminary Outcome

This research examines the development of a guideline for digital curation through a case study of the digital repository of the SAC. This institution is an interesting case study because its historical background and the digital resources produced by the centre are unique. However, during the work process, the SAC’s databases have faced several data management problems because the centre has not provided any guideline for its staff. In order to solve these problems, the centre has planned to develop a guideline for the digital work process. This research is therefore part of an attempt to develop the digital curation guideline for the SAC.

Sittisak Rungcharoensuksri
Describing Scholarly Information Resources with a Unified Temporal Map

We consider the use of procedures for providing structured descriptions of information resources such as scholarly works and of their contents. This goes beyond the usual view of metadata as discrete elements. For instance, we consider mapping the structured and interdependent activities in the publication of Ulysses. We discuss some specific representations and discuss the development of structured scholarly guides. Finally, we consider how the activities associated with publication, along with other historical activities, can be positioned on a unified temporal map. Ultimately, there should be a unified framework for the description of individual information resources and collections of information resources across periods and technologies.

Robert B. Allen, Hanna Song, Bo Eun Lee, Jiyoung Lee
Issues for the Direct Representation of History

We propose that representations for structured models of human and social history need to go beyond traditional ontologies to the combination of rich semantic ontologies with programming languages. We base our approach on the Basic Formal Ontology (BFO) and then consider how to extend it beyond traditional approaches to ontology with higher-level structures. For instance, we propose the need for composite entities that allow transitions in the configuration of component entities. We then explore the relationship of these composite entities to the notion of systems and consider how they may provide a definition of “causal unity” and be related to models of social systems. We identify some challenges in defining the nature of social entities. Finally, we introduce structured applied epistemology as a framework for managing historical evidence, analysis, and argumentation.

Robert B. Allen
Preserving Containers – Requirements and a Todo-List

Container technology has been quickly adopted as a tool to encapsulate and share complex software setups, e.g. in the domain of computational science. With growing significance of this class of complex digital objects their longevity is also of growing importance. In this paper we analyze requirements for long-term maintenance and preservation of containers in memory institutions.

Klaus Rechert, Thomas Liebetraut, Dennis Wehrle, Euan Cochrane
Development of Imaginary Beings Ontology

A knowledge organization system is the key element of knowledge engineering. Ontology provides a fundamental framework for the development of the Semantic Web. This paper presents a method for building a knowledge base of imaginary beings. Following this approach, we established an ontological structure that includes information on primitive and contemporary imaginary beings. Combining existing creature knowledge, we applied idealized cognitive models (ICM) to build the knowledge system. After an introduction to ontology theory, we use Hozo, from Osaka University, as the tool for constructing, editing, and maintaining the ontology, to design and complete the ontology-based knowledge of imaginary beings. The resulting ontology, the Imaginary Beings Ontology (IBO), covers concepts derived from old as well as contemporary information. The system is applied to semantic web technology. The validity of IBO was evaluated by eight professional experts (three ontology engineers and five comics experts); the system makes significant improvements in key techniques including scope determination, class definition, property definition, instance definition, and future development and application. Finally, we describe our results, which indicate that the system could resolve many problems in the field of imaginary beings’ knowledge engineering.

Wirapong Chansanam, Kulthida Tuamsuk

Open Access and Data

Frontmatter
MathDL: A Digital Library of Mathematics Questions

The open-access movement is a global effort to make available scientific and scholarly research articles online for free. Today digital content is readily available beyond the full texts of articles, from raw and semi-raw data to images, audio, video, multimedia, and software. However, to date, no open access database of mathematics questions exists. This paper describes the current status of the development of MathDL, a digital library that provides access to mathematics questions useful to high school and pre-university students. MathDL provides the highest level of openness, allowing the author not just to reuse, but also to remix, revise and redistribute the questions. The benefits of MathDL are discussed, along with the possibility of using it to transform the way mathematics textbooks are published. Finally, the future plans for MathDL are presented.

Chu Keong Lee, Joan Jee Foon Wee, Don Tze Wai Chai
Interleaving Clustering of Classes and Properties for Disambiguating Linked Data

As Linked Data (or LD) increasingly expands its capacity, ambiguity in vocabularies on LD has become more problematic. This paper deals with a part of the ambiguity, namely, class ambiguity and property ambiguity. In this paper, we propose a novel clustering method, CPClustering, which clusters synonymous classes and properties in an interleaving manner. CPClustering groups classes by their related properties, and, inversely, groups properties by their related classes. CPClustering iteratively clusters classes and properties, and updates their representations in terms of immediate clustering results.

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa
A Framework for Linking RDF Datasets for Thailand Open Government Data Based on Semantic Type Detection

Most datasets in open government data portals are in tabular spreadsheet formats, e.g. CSV and XLS. To increase the value and reusability of these datasets, they should be made available in RDF format, which can support better data querying and data integration. Our previous work proposed a semi-automatic framework for generating RDF datasets from existing datasets in tabular format. In this paper, we extend our framework to support automatic linking of the RDF datasets. One of the important steps is mapping some literal values that appear in a dataset to standard URIs. Several previous studies use a semantic search API such as DBpedia or Sindice for URI mapping. However, this approach is not appropriate for the datasets of the Thailand open data portal (Data.go.th) because there is insufficient data for Thai named entities. In addition, a name may match more than one URI, i.e. word ambiguity. For example, the name “Bangkok” may match URIs of a province, a hospital or a university. To resolve these issues, our framework proposes that finding semantic types is essential to resolving word ambiguity when retrieving a proper URI for a named entity. This paper presents a framework for finding semantic types and mapping named entities to URIs, i.e. URI lookup. A Named Entity Recognition (NER) technique is applied to find the semantic type of a column in a CSV dataset. The results are used for creating an ontology and RDF data that include the URI mappings for named entities. We evaluate two approaches by comparing the performance of a semantic search API, i.e. Wikipedia, and the NER technique using some datasets from the Data.go.th website.
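
The idea of using a column's semantic type to disambiguate a name such as “Bangkok” can be sketched as follows. Here the semantic types are supplied by hand rather than detected by NER, the lookup table is a hard-coded dictionary, and every URI is an `example.org` placeholder. None of these names come from the paper or from Data.go.th; they are assumptions for illustration.

```python
import csv
import io

# Hypothetical lookup table: (semantic type, name) -> URI. All URIs are placeholders.
URI_TABLE = {
    ("Province", "Bangkok"): "http://example.org/province/Bangkok",
    ("Hospital", "Bangkok"): "http://example.org/hospital/Bangkok",
}

def lookup_uri(name, semantic_type):
    """Return the URI for a name under a given semantic type, or None if unknown."""
    return URI_TABLE.get((semantic_type, name))

def rows_to_triples(csv_text, column_types, base="http://example.org/"):
    """Convert CSV rows to N-Triples-style lines, linking typed columns to URIs."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader, start=1):
        subject = f"<{base}row/{i}>"
        for column, value in row.items():
            predicate = f"<{base}prop/{column}>"
            uri = lookup_uri(value, column_types.get(column))
            if uri:
                obj = f"<{uri}>"  # value resolved to an entity URI
            else:
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{escaped}"'  # fall back to a plain literal
            triples.append(f"{subject} {predicate} {obj} .")
    return triples

csv_text = "province,hospital,beds\nBangkok,Bangkok,120\n"
column_types = {"province": "Province", "hospital": "Hospital"}  # assumed, not detected
triples = rows_to_triples(csv_text, column_types)
for line in triples:
    print(line)
```

The same string “Bangkok” resolves to two different URIs because the lookup key includes the column's semantic type, while the untyped `beds` column stays a literal.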

Pattama Krataithong, Marut Buranarach, Nattanont Hongwarittorrn, Thepchai Supnithi
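
The role of semantic types in the URI lookup described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate table, the `example.org` URIs, the type labels and the function name `lookup_uri` are all hypothetical, and the semantic type of the column is assumed to have been detected already (for instance by an NER step).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical candidate table: one name, several URIs with different types.
CANDIDATES: Dict[str, List[Tuple[str, str]]] = {
    "Bangkok": [
        ("http://example.org/resource/Bangkok_Province", "Province"),
        ("http://example.org/resource/Bangkok_Hospital", "Hospital"),
        ("http://example.org/resource/Bangkok_University", "University"),
    ],
}


def lookup_uri(
    name: str,
    semantic_type: str,
    candidates: Dict[str, List[Tuple[str, str]]] = CANDIDATES,
) -> Optional[str]:
    """Return the single candidate URI whose type matches the column's
    semantic type, or None if the name stays unknown or ambiguous."""
    matches = [uri for uri, kind in candidates.get(name, []) if kind == semantic_type]
    return matches[0] if len(matches) == 1 else None


# A column detected as holding hospitals resolves "Bangkok" to the hospital URI.
print(lookup_uri("Bangkok", "Hospital"))
```

Returning `None` for unknown or still-ambiguous names keeps the sketch from guessing a link; a real pipeline would need its own policy for those cases.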
An Attempt to Promote Open Data for Digital Humanities in Japanese University Libraries

Many universities have declared open access policies in response to increasing interest in open access in the academic world. The next developments will be focused on open data. Huge data repositories are already used in specific fields. However, the discussion regarding open data in universities has just begun. We attempted to promote open data for digital humanities in a university library. University libraries hold rare collections, which are generally highly valued research resources. We selected a rare collection in a library, and then digitized and published it. We investigated additional data that aids a reader’s understanding of the material. To promote the open data, we produced images of the resources and multiple types of interpretation texts. We displayed the digital images in an exhibition and obtained an evaluation using a survey of visitors.

Emi Ishita, Tetsuya Nakatoh, Kohei Hatano, Michiaki Takayama
Redesigning the Open-Access Institutional Repository: A User Experience Approach

This paper details how a university library evaluated its institutional repository using a user experience design (UXD) methodology and redesigned it based on the findings. The online repository, running on DSpace, was not being utilized as expected by academics and researchers, so a detailed user evaluation and usability study was undertaken to find out the reasons why. Findings showed lack of usability and a mismatch between user expectations and system architecture. Hence, significant improvements were made to the user interface, and in communicating the status of items held in the repository (open or closed access). The authors assess the impact of these changes and argue that better usability results in greater visibility of the open-access repository, and hence, greater visibility for the university’s researchers. Other challenges regarding the adoption of open access by academics and researchers at the university are also discussed.

Edward Luca, Bhuva Narayan

Opinion, Sentiment and Location

Frontmatter
Expanding Sentiment Lexicon with Multi-word Terms for Domain-Specific Sentiment Analysis

The increasing interest in extracting valuable information from networked data has heightened the need for effective and reliable sentiment analysis techniques. To this end, lexicon-based sentiment classification has been extensively studied by the research community. However, little is known about the usefulness of different multi-word constructs in creating domain-specific sentiment lexicons. Thus, our primary objective in this paper is to evaluate the performance of bigrams, typed dependencies, and concepts as multi-word lexical entries for domain-specific sentiment classification. Pointwise Mutual Information (PMI) was adopted to select the lexical entries and to calculate the sentiment scores of the multi-word terms. With the features generated from the domain lexicons, a series of experiments was carried out using support vector machine (SVM) classifiers. While all the domain-specific classifiers outperformed the baseline classifier, our results showed that lexicons consisting of bigram entries and typed dependency entries improved the performance to a greater extent.

Sang-Sang Tan, Jin-Cheon Na
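
The PMI-based scoring mentioned in the abstract above can be sketched as follows. The abstract does not give the exact formula, so this sketch assumes one common formulation: the score of a term is PMI(term, positive) minus PMI(term, negative), computed from document counts with add-one smoothing. The function name and the input layout are hypothetical.

```python
import math
from collections import Counter


def pmi_sentiment_scores(docs, smoothing=1.0):
    """Score each term as PMI(term, pos) - PMI(term, neg).

    docs: iterable of (terms, label) pairs, where label is "pos" or "neg".
    Assumes at least one document per label.
    """
    term_label = Counter()   # number of documents with label c containing term t
    label_count = Counter()  # number of documents per label
    for terms, label in docs:
        label_count[label] += 1
        for term in set(terms):
            term_label[(term, label)] += 1

    n_pos, n_neg = label_count["pos"], label_count["neg"]
    vocab = {term for term, _ in term_label}
    scores = {}
    for term in vocab:
        pos = term_label[(term, "pos")] + smoothing
        neg = term_label[(term, "neg")] + smoothing
        # P(term) cancels when the two PMI values are subtracted.
        scores[term] = math.log2((pos * n_neg) / (neg * n_pos))
    return scores
```

Multi-word entries such as bigrams can be passed as single tokens (for example `"very_good"`); terms with a positive score lean positive and terms with a negative score lean negative.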
Twitter User Classification with Posting Locations

Twitter contains a large number of postings related to the reputation of products and services. Analyzing these data can provide useful marketing information. Inferring the user class would make it possible to extract opinions related to each class. In this paper, we propose a method that treats each user’s posting location for a tweet as a feature in the analysis of user classes. The proposed method creates clusters of geotags (obtained from Twitter tags) to identify the locations most often visited by the target user, which are then used as features. As an example, we conducted experiments to classify targets based on three classes: “student,” “working member of society,” and “housewife.” We obtained an average F-measure of 0.779, which represents an improvement on baseline results.

Naoto Takeda, Yohei Seki
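
The idea of turning a user's posting locations into features, as described in the abstract above, can be sketched roughly as follows. The abstract does not name the clustering algorithm, so this sketch swaps in simple grid binning (rounding coordinates) as a stand-in for clustering; the function name, the `precision` parameter and the sample coordinates are assumptions.

```python
from collections import Counter


def top_location_features(points, k=3, precision=2):
    """Bin (lat, lon) pairs into grid cells and return the k most visited
    cells together with the share of the user's posts that fall in each."""
    cells = Counter(
        (round(lat, precision), round(lon, precision)) for lat, lon in points
    )
    total = sum(cells.values())
    return [(cell, count / total) for cell, count in cells.most_common(k)]


# Made-up coordinates for one user: three posts near one spot, one elsewhere.
posts = [
    (35.6812, 139.7671),
    (35.6809, 139.7673),
    (35.6815, 139.7669),
    (34.7025, 135.4959),
]
print(top_location_features(posts, k=2))
```

The resulting cell shares could then feed a classifier alongside text features; a density-based clustering method would be a natural replacement for the grid.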
Temporal Analysis of Comparative Opinion Mining

Social media have become a popular platform for people to share their opinions and emotions. Analyzing opinions that are posted on the web is very important since they influence future decisions of organizations and people. Comparative opinion mining is a subfield of opinion mining that deals with identifying and extracting information that is expressed in a comparative form. Because a huge number of opinions are posted online every day, analyzing comparative opinions from a temporal perspective is an important application that needs to be explored. This study introduces the idea of integrating temporal elements in comparative opinion mining. Different types of results can be obtained from the temporal analysis, including trend analysis, competitive analysis, and burst detection. In our study we show that temporal analysis of comparative opinion mining provides more current and relevant information to users compared to standard opinion mining.

Kasturi Dewi Varathan, Anastasia Giachanou, Fabio Crestani

Social Media

Frontmatter
Social Q&A Question-and-Comments Interactions and Outcomes: A Social Sequence Analysis

Scholars and developers have long recognized that the collections of user-generated content at social questions and answers (SQA) sites can benefit open knowledge sharing and resolve individual information needs. This has prompted strong interest in improving the quality of SQA postings, and the creation, curation, and use of these collections. While interactivity is a key feature of SQA, few studies have investigated the interaction sequence between the OP (original poster) and commenters. Drawing from Robert Taylor’s question-negotiation perspective, we posit that interaction patterns may affect SQA outcomes. Social sequence analysis (SSA) and the R package TraMineR were used to analyze the commenting sequences of Stack Overflow postings (8,132 questions and 16,598 comments). The relationships between commenting sequence structure and outcome metrics (e.g., question score, view count) were then tested with logistic regressions. Implications of the results for SQA research, SQA site design, and digital literacy training are discussed.

Sei-Ching Joanna Sin, Chei Sian Lee, Yin-Leng Theng
Why Do People View Photographs on Instagram?

Drawing from the uses and gratifications framework, the aim of the present study is to examine the needs driving users to view photographs on Instagram, a popular photo-sharing social networking service. Data (N = 115) were collected from an online survey. A principal component factor analysis indicated that there were four clusters of needs. Specifically, we found that users were driven by diversion needs, surveillance needs, personal relationship needs, and voyeuristic needs. Further data analysis revealed that age, number of followers and number of followings on Instagram are related to the gratifications users seek on Instagram. Specifically, we found that older respondents were more likely to seek gratifications to meet personal relationship and surveillance needs. We also found that users with more followers viewed photos to satisfy diversion and voyeurism needs. Implications of our work are also discussed.

Chei Sian Lee, Sei-Ching Joanna Sin
Sharing Brings Happiness?: Effects of Sharing in Social Media Among Adult Users

Given that sharing is a fundamental activity among social media users, this study explores the associations between sharing activities in social media and users’ social and psychological well-being in two age groups: young and mature adults. We focus on two dimensions of social and psychological well-being, namely life satisfaction and loneliness. We examine four types of social media platforms: social networking sites, microblogging services, video-sharing sites and photo-sharing sites. The study comprised 171 adult social media users in Singapore. Data analyses revealed that young adults who participated in more sharing activities on social networking sites reported higher life satisfaction and lower loneliness. Mature adults who participated more in sharing activities on social networking sites reported lower life satisfaction and higher loneliness. Implications and future research directions are discussed.

Winston Jin Song Teo, Chei Sian Lee

Analyzing and Using Wikipedia

Frontmatter
DOI Links on Wikipedia
Analyses of English, Japanese, and Chinese Wikipedias

In this paper, we analyzed Digital Object Identifier (DOI) links among English, Japanese, and Chinese Wikipedias (hereafter, enwiki, jawiki, and zhwiki, respectively), which possibly work as a bridge between Web users and scholarly information. Most of the DOI links in these Wikipedias were revealed to be CrossRef DOIs. The second most-referenced in jawiki were JaLC DOIs, whereas those in zhwiki were ISTIC DOIs. JaLC DOIs were uniquely referenced in jawiki, and ISTIC DOIs tend to be referenced in zhwiki. In terms of DOI prefixes, Elsevier BV was the largest registrant in all languages. Nature Publishing Group and Wiley-Blackwell were also commonly referenced. The content hosted by these registrants was shared among the Wikipedia communities. Moreover, overlapping analysis showed that jawiki and zhwiki share DOI links with enwiki at a similarly high rate. The analysis of revision histories showed that the DOI links had been added to enwiki before they were included in jawiki and zhwiki, indicating that the majority of DOI links in jawiki and zhwiki were added by translating from enwiki. These findings imply that the DOI links in Wikipedia may result in multiple counts of altmetrics.

Jiro Kikkawa, Masao Takaku, Fuyuki Yoshikane
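
The prefix-level analysis mentioned in the abstract above relies on the fact that a DOI has the form `prefix/suffix`, where the prefix (for example `10.1016`) identifies the registrant. A minimal sketch of counting prefixes in a list of links follows; the regular expression, the function name and the sample links (whose suffixes are made up) are assumptions, and the authors' actual extraction procedure is not described here.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Matches "doi.org/10.xxxx/suffix" (including dx.doi.org) and "doi:10.xxxx/suffix".
DOI_PATTERN = re.compile(r"(?:doi\.org/|doi:)\s*(10\.\d{4,9})/(\S+)", re.IGNORECASE)


def doi_prefix_counts(links):
    """Count how often each DOI prefix occurs in a list of link strings."""
    counts = Counter()
    for link in links:
        match = DOI_PATTERN.search(unquote(link))
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample links for illustration only.
sample_links = [
    "https://doi.org/10.1016/j.example.2016.01.001",
    "http://dx.doi.org/10.1016/j.example.2015.02.002",
    "doi:10.1038/example123",
    "https://example.org/not-a-doi",
]
print(doi_prefix_counts(sample_links).most_common())
```

Mapping each prefix to a registrant name would require a lookup against registration agency data, which this sketch leaves out.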
Cross-Modal Search on Social Networking Systems by Exploring Wikipedia Concepts

The increasing popularity of social networking systems (SNSs) has created large quantities of data from multiple modalities such as text and images. Retrieval of data, however, is constrained to a specific modality. Moreover, text on SNSs is usually short and noisy, and remains active for only a short period. Such characteristics conflict with the settings assumed by traditional text search techniques and render them ineffective on SNSs. To alleviate these problems and bridge the gap between searches over different modalities, we propose a new algorithm that supports cross-modal search over social documents such as text and images on SNSs. By exploiting Wikipedia concepts, text and images are transformed into a set of common concepts, based on which searches are conducted. A new ranking algorithm is designed to rank social documents based on their informativeness and semantic relevance to a query. We evaluate our ranking algorithm on both Twitter and Facebook datasets. The results confirm the effectiveness of our approach.

Wei Wang, Xiaoyan Yang, Shouxu Jiang
Suggesting Specific Segments as Link Targets in Wikipedia

Wikipedia is the largest online encyclopedia, in which articles form rich knowledge and semantic resources. Links within Wikipedia indicate that the texts at a link’s origin and destination are semantically related. Existing link detection methods focus on article titles because most links in Wikipedia point to article titles. However, a number of links in Wikipedia point to specific segments, because the whole article is too general and it is hard for readers to grasp the intention of the link. We propose a method to automatically predict whether a link target is a specific segment and to identify which segment is most relevant. We combine Latent Dirichlet Allocation (LDA) and Maximum Likelihood Estimation (MLE) to represent every segment as a vector, then obtain the similarity of each segment pair, and finally use variance, standard deviation and other statistical features to make the prediction. In evaluations on Wikipedia articles, our method achieves better results than existing methods.

Renzhi Wang, Mizuho Iwaihara
Backmatter
Metadata
Title
Digital Libraries: Knowledge, Information, and Data in an Open Access Society
Editors
Atsuyuki Morishima
Andreas Rauber
Chern Li Liew
Copyright Year
2016
Electronic ISBN
978-3-319-49304-6
Print ISBN
978-3-319-49303-9
DOI
https://doi.org/10.1007/978-3-319-49304-6