Intelligence and Equity: Shaping the Future of Knowledge
27th International Conference on Asian Digital Libraries, ICADL 2025, Metro Manila, Philippines, December 3-5, 2025, Proceedings
- 2026
- Book
- Editors
- Sanghee Oh
- Antoine Doucet
- Marut Buranarach
- Iyra Buenrostro-Cabbab
- Yuenan Liu
- Benedict Salazar Olgado
- Book Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Singapore
About this book
This book constitutes the refereed proceedings of the 27th International Conference on Asian Digital Libraries, ICADL 2025, held in Metro Manila, Philippines, during December 3-5, 2025.
The 12 full papers, 26 short papers, 5 demo/poster papers, and 3 practice papers presented in this volume were carefully reviewed and selected from 102 submissions. They were categorized under the following topical sections: Large Language Models and Generative AI; Digital Libraries, Archives, and Metadata; Scholarly Communication, Open Science, and Research Data; Information Behavior, Literacy, and HCI; Information Rights, Privacy, and Data Management; Emerging Technologies in Knowledge Organization and Description, and the Future of Cultural Heritage; Ethics, Social Divides, and Lived Experiences; Archival Education, Models, and Practices.
Table of Contents
Digital Libraries, Archives, and Metadata
Frontmatter
A Linked Open Data Infrastructure for Promoting the Educational Use of Digital Archives
Masao Takaku, Yuka Egusa, Satoshi Enomoto, Masao Oi, Yumiko Ariyama, Takayuki Ako
Abstract: Digital cultural heritage is increasingly available from memory institutions. However, its integration into formal education remains limited due to a lack of education-specific metadata and weak alignment with curricula. This paper addresses this gap in the Japanese context, where national curricula are centrally defined but rarely linked to digital resources. We present a Linked Open Data (LOD) infrastructure that connects curriculum standards (the Course of Study), textbook units, and digital archive content through a reusable, semantically modeled framework. Our contribution is a scalable, curriculum-aligned metadata foundation that supports practical educational use. We describe the structure, vocabulary design, and implementation of the infrastructure, and demonstrate its utility through linked applications such as search tools and visualizations. This work lays the groundwork for connecting curricula with cultural resources, enabling more effective discovery, reuse, and integration of archival content in educational settings.
Enhancing Information Retrieval in Digital Libraries Through Unit Harmonisation in Scholarly Knowledge Graphs
Golsa Heidari, Markus Stocker, Sören Auer
Abstract: Scientists have always built on the studies of other researchers to reach new objectives and perspectives; in particular, reusing the measured data reported in previous studies is highly practical. Yet searching the content of other scientists' articles remains a challenge researchers struggle with. Knowledge graphs used as semantic databases now greatly aid the storage and retrieval of scholarly knowledge. Such technologies are crucial for upgrading traditional search systems to smart knowledge retrieval, which is essential for returning the most relevant answers to a user query, especially in information and knowledge management. In most cases, however, only a paper's metadata is searchable, and accessing the content of papers remains cumbersome for scientists. In this paper, we present a novel faceted search method over structured content for comparing and filtering measured data in scholarly knowledge graphs when different studies use different units of measurement. The search system proposes applicable units as facets and dynamically integrates content from further remote knowledge graphs to materialize the scholarly knowledge graph, achieving greater exploration usability over scholarly content, which can be filtered to better satisfy the user's information needs. With our faceted search system, users can not only search the contents of scientific articles but also compare and filter heterogeneous data.
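The unit-harmonisation idea at the heart of this paper can be illustrated with a minimal, self-contained sketch. The conversion table and function below are hypothetical stand-ins for illustration only; the paper itself resolves units within a scholarly knowledge graph rather than via a hard-coded table.

```python
# Illustrative sketch of unit harmonisation before faceted comparison.
# Conversion factors to a base unit (metres); a real system would
# resolve these through an ontology, not a hard-coded dictionary.
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001, "km": 1000.0}

def harmonise(value, unit, target_unit):
    """Convert a measured value to the target unit via the base unit."""
    base = value * TO_METRES[unit]        # source unit -> metres
    return base / TO_METRES[target_unit]  # metres -> target unit

# Measurements reported by different studies in different units:
measurements = [(250.0, "cm"), (1.2, "m"), (3000.0, "mm")]
in_metres = [harmonise(v, u, "m") for v, u in measurements]
print(in_metres)  # all values now comparable on one scale
```

Once values share a unit, they can be compared and filtered like any homogeneous column, which is what the proposed facets enable for the user.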
InfraKG: Extracting and Structuring Infrastructure Entities from Scientific Articles
Aftab Anjum, Ralf Krestel, Khansa Maqbool, Muhammad Mudasser Afzal
Abstract: In recent years, advances in natural language processing (NLP) have increasingly relied on computational infrastructure, including hardware accelerators, scalable memory systems, software libraries and frameworks, and widely adopted cloud platforms. However, existing entity recognition methods and scientific knowledge graphs largely overlook these components, instead focusing on research tasks, methods, datasets, and evaluation metrics. To address this gap, we present InfraKG, a large-scale Infrastructure Knowledge Graph that captures and links infrastructure-related entities mentioned in scientific publications. InfraKG is built using a hybrid information extraction framework applied to 85,000 arXiv papers in the computational linguistics domain, combining transformer-based NER models, semantic sentence filtering, and large language models (LLMs). The resulting graph contains 166,728 nodes and 1.5 million relations across seven types, connecting infrastructure entities to scientific publications along with their metadata. InfraKG is the first large-scale resource to systematically represent computational infrastructure in NLP research, enabling advanced queries, trend analysis, and infrastructure-aware literature reviews. We evaluated the proposed framework on 470 manually annotated PDF papers for infrastructure entities, covering a test set of 20,774 sentences. All code and data are publicly available at: code repository.
Modeling Styles of Vernacular Architecture Using CIDOC CRM
Michail Agathos, Katerina Tsiouprou, Eleftherios Kalogeros, Manolis Gergatsoulis
Abstract: Documenting cultural heritage has become increasingly critical in the evolving domains of digital libraries, computing, and global information studies. Architecture and its expressions constitute a central component of this endeavor, given their intrinsic connection to human societies and their role as enduring manifestations of cultural heritage. The CIDOC CRM is a well-established and continuously evolving reference model that aims to represent cultural heritage information. This paper establishes a data model for the documentation of vernacular architecture using CIDOC CRM, thereby bridging a critical gap in the representation of culturally embedded architectural practices. We discuss the notion of vernacular architecture and previous work on documenting architecture. Through the classes and properties provided by the CIDOC CRM, we model the formative contexts of vernacular architectural practices, represent the distinctive constructional, morphological, and typological consistencies inherent in vernacular architecture, and document the interrelations among vernacular stylistic traditions, including their temporal relationships and mutual influences.
Developing a Metadata Framework for the Digitisation and Access of Malaysian Historical Newspaper Archives
Jennifer Lee Lynn Phun, Noorhidawati Abdullah
Abstract: This study presents a metadata framework for the digitisation and accessibility of Malaysian historical newspaper collections. It employs a design-based methodology guided by two research questions: (i) What are the metadata requirements and challenges specific to Malaysian historical newspaper archives? and (ii) How can a metadata framework be designed to support effective digitisation, preservation, and user access? The research comprises three essential components: (i) selection of a policy framework; (ii) assessment and evaluation of metadata standards; and (iii) construction of a metadata framework tailored to the requirements of Malaysian historical newspapers. Through a comparative analysis of metadata standards, including Dublin Core and its extended DCMI Metadata Terms (ISO/NISO standard), MODS, METS, ALTO, and PREMIS, the proposed framework combines standard features with custom fields for cultural and linguistic significance. The prototype, implemented in Omeka, demonstrates a scalable and locally relevant method for cataloguing and conserving Malaysian newspaper information. The framework strengthens efforts to refine metadata practice in heritage digitisation by integrating global standards with national and institutional requirements.
Digital Preservation of Traditional Malay Midwifery Practices through Semantic Metadata and Thematic Thesaurus Modelling
Jashira Jamin, Yanti Idaya Aspura Mohd Khalid, Masitah Ahmad
Abstract: This study addresses the urgent need to preserve the oral heritage of Traditional Malay Midwifery (TMM) through a dual approach: a semantic metadata schema and a domain-specific thematic thesaurus. Using oral history methodology, the framework captures both the content and the cultural depth of TMM practices. The metadata schema, grounded in standards but extended with culturally relevant elements, ensures that midwives' lived knowledge is accurately represented. The thesaurus, developed from practitioner vocabulary, maps the interconnections of concepts within the traditional knowledge system. Expert evaluations affirmed the model's relevance, usability, and ethical soundness, emphasizing the importance of culturally responsive archival systems. This research demonstrates that digital preservation of intangible cultural heritage (ICH) must move beyond technical storage to embrace community voice, relational meaning, and ethical design. While developed for TMM, the framework is scalable to other community-based heritage, bridging informal knowledge and structured digital systems.
A Digital Archive System of Intermediate Products for Understanding the Expression Structure of Hand-Drawn Animation
Hinata Tomita, Tetsuya Mihara, Mitsuharu Nagamori
Abstract: Hand-drawn anime production generates various intermediate products, such as key drawings, modified key drawings, in-between drawings, and timesheets. These products are paper-based and describe the visual and temporal structure of each cut in a video. However, existing archives generally digitize and store them only as static images, omitting the relationships of timing, layering, and revisions in the animation process. We propose a digital archive system that reconstructs and visualizes these relationships by extracting structural information from timesheets and using it as metadata to link images. We extended the International Image Interoperability Framework (IIIF) manifest to describe frame-by-frame timing, multi-layer composition, and revision histories. Our interactive viewer provides synchronized playback, layer controls, and timeline navigation. In a user experiment involving 20 participants, our system significantly enhanced understanding of motion flow, layer composition, and revision intent compared to a conventional archive system. While participants still found symbolic notations in timesheets difficult to interpret, the proposed system helped them grasp the production process more easily.
Rethinking OCR Evaluation for Information Extraction in Business Documents
Ngoc Nhi Nguyen, Ahmed Hamdi, Antoine Doucet, Adam Jatowt, Mickaël Coustaty
Abstract: The increasing reliance on OCR technologies to digitize documents has enabled large-scale automation but also introduced new challenges for information extraction systems. While state-of-the-art OCR engines perform well under ideal conditions, they remain prone to errors. Traditional OCR evaluation metrics like character and word error rates fail to capture the impact of such errors on downstream tasks, particularly when only semantically critical words are affected. In this paper, we systematically investigate the relationship between OCR quality and extraction accuracy in business documents, with a focus on key field extraction and line item recognition. We introduce a controlled evaluation framework that simulates realistic OCR noise scenarios by selectively injecting errors into clean datasets. Our experiments show that standard OCR metrics poorly reflect the impact of noise on information extraction performance and highlight the need for task-specific OCR evaluation protocols and more resilient pipelines tailored to real-world settings.
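The controlled noise-injection idea described in this abstract can be sketched as follows. The confusion table, the notion of "critical" tokens, and the function names are hypothetical illustrations for this listing, not the paper's actual evaluation protocol.

```python
import random

# Illustrative sketch of controlled OCR-noise injection: corrupt only
# semantically critical tokens (e.g. an invoice amount) while leaving
# surrounding text clean, so the effect on extraction can be isolated.
CONFUSIONS = {"0": "O", "1": "l", "5": "S", "8": "B"}  # typical OCR mix-ups

def inject_noise(tokens, critical, rate=1.0, seed=42):
    """Replace confusable characters, but only inside critical tokens."""
    rng = random.Random(seed)  # fixed seed for reproducible experiments
    noisy = []
    for tok in tokens:
        if tok in critical:
            chars = [CONFUSIONS.get(c, c) if rng.random() < rate else c
                     for c in tok]
            noisy.append("".join(chars))
        else:
            noisy.append(tok)
    return noisy

clean = ["Invoice", "total:", "1580", "due", "2024-05-01"]
print(inject_noise(clean, critical={"1580"}))
# Only the amount token is corrupted; the other words stay intact.
```

Because only the critical field is perturbed, aggregate metrics like word error rate barely move while key-field extraction can fail completely, which is exactly the mismatch the paper highlights.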
A TEI-Based OCR Correction Tool Using GitHub for Collaborative Digital Humanities: Practical Implementation and Applications
Satoru Nakamura, Yoshimitsu Aitani
Abstract: This paper presents a practical OCR correction tool developed for digital humanities projects, designed to handle TEI/XML texts generated by various OCR systems, including the National Diet Library's Kotenseki OCR (NDLOCR). By adopting TEI (Text Encoding Initiative) as its data format and leveraging GitHub as the backend infrastructure, the tool enables collaborative editing with version control capabilities across diverse projects. We demonstrate the tool's effectiveness through real-world application in the Yurenja Catalog Database project. Our approach addresses critical challenges in OCR post-processing while maintaining interoperability with existing TEI-compatible tools and providing a sustainable framework for collaborative text correction in digital humanities.
AI-Powered Knowledge Discovery in the Digital Library of Old Ephemeral Prints: A Case Study
Maciej Ogrodniczuk, Dariusz Czerski
Abstract: We investigate how state-of-the-art Large Language Models (LLMs) can unlock new knowledge from the Cyfrowa Biblioteka Druków Ulotnych (CBDU), a digital collection of early-modern Polish ephemeral prints. Our end-to-end pipeline compares three transcription approaches: pure OCR, LLM-based post-correction, and multimodal models. The resulting transcriptions then serve as input for our two main contributions: the automatic extraction of bibliographic metadata and the generation of expert-style historical commentaries. Experiments show that a leading multimodal model excels, reducing transcription CER from 33% to 9%, while achieving high F1-scores for publication place (0.85) and date (0.71), and a 2.31/3 mean score for commentaries. We conclude that large multimodal models can serve as effective “digital archivists”, enriching historical collections with structured metadata and contextual analysis.
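The character error rate (CER) cited in this abstract is a standard metric: the edit (Levenshtein) distance between the OCR output and the ground-truth text, divided by the length of the ground truth. A minimal sketch of that definition, not the authors' own code:

```python
# Character error rate: edit distance / reference length.
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(hypothesis: str, reference: str) -> float:
    return levenshtein(hypothesis, reference) / len(reference)

# One substituted character out of twelve:
print(cer("Drukl Ulotne", "Druki Ulotne"))
```

On this definition, the reported drop from 33% to 9% means the multimodal model gets roughly one character in eleven wrong where plain OCR got one in three wrong.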
Planning the Development of Malaysia’s National Knowledge Infrastructure: A Practical Case Study
Ranita Hisham Shunmugam, Zanaria Saupi Udin, Noorhidawati Abdullah, Ali Fauzi, Syaiful Hisyam Saleh
Abstract: National knowledge infrastructures are becoming increasingly vital in higher education as digital transformation accelerates and open-access demands intensify. Malaysia's National Knowledge Infrastructure (NKI), initiated by the Ministry of Higher Education (MoHE), is a strategic effort to integrate the scholarly resources of 20 public universities into a federated, policy-driven ecosystem. This paper presents a practical case study of the NKI's early development, documenting the initial planning, design frameworks, and stakeholder strategies that informed its creation. The methodological approach combined policy analysis, environmental scanning, repository audits, international benchmarking, stakeholder consultations, and risk assessment. Planning was guided by the Zachman Framework, which structured activities across four phases (planning, design, implementation, and operation), ensuring coherence between policy objectives and technical design. The FAIR (Findable, Accessible, Interoperable, Reusable) and CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) principles further shaped the initiative by promoting both interoperability and ethical knowledge governance. The feasibility study identified several significant challenges, including diverse repository platforms and metadata standards, licensing restrictions, uneven adoption of open access, and concerns about long-term governance and sustainability. To address these issues, lessons learned were synthesized into six strategic pillars: interoperability, licensing and access, open access advocacy, sustainability and governance, equity and inclusion, and monitoring and adaptability. These pillars form the basis of a phased roadmap that connects Malaysia's immediate next steps with transferable lessons for practitioners designing similar infrastructures elsewhere. By embedding stakeholder participation, policy integration, and inclusive design at every stage, the NKI demonstrates a viable pathway for building national-level infrastructures that are both technically robust and socially responsible. Beyond its national scope, this case study contributes to the broader global discourse on open, federated, and sustainable scholarly infrastructures.
Scholarly Communication, Open Science, and Research Data
Frontmatter
Data Papers in Scholarly Communication: Patterns of Publication and Citation Performance
Hiroyuki Tsunoda, Yuan Sun, Masaki Nishizawa, Xiaomin Liu, Kou Amano
Abstract: This study examines the publication and citation patterns of data papers, scholarly articles that describe research datasets, across 32 journals indexed in Web of Science. As a cornerstone of open science, data papers enhance transparency, reproducibility, and data reuse. By analyzing articles published in 2023, the study reveals that data papers often receive more citations than traditional research articles in several prominent journals, such as Nucleic Acids Research, Earth System Science Data, and Scientific Data. A new metric, citation efficiency (CE), is introduced to assess the relative impact of data papers. Journals that employ editorial strategies like special issues tend to achieve higher CE scores, suggesting that visibility and thematic focus contribute to citation performance. While data papers are most common in the life and earth sciences, their presence is gradually expanding into the humanities and social sciences. Interestingly, some journals with fewer data papers show disproportionately high citation efficiency, indicating that quality and relevance may outweigh quantity. These findings offer valuable insights for researchers, editors, and policymakers aiming to promote responsible data sharing and improve scholarly communication. By highlighting the impact of data papers, this study supports the advancement of open science and encourages a culture of data-driven research.
Monitoring the Implementation of the ACM-THU Open Transformative Agreement: Institutional Practices and Insights from Tsinghua University
Yuanming Guo, Tianfang Dou, Shuhua Zhang, Chen Zhang, Qian Li, Jianbin Jin
Abstract: This study examines the implementation and impact of the ACM-THU Open transformative agreement at Tsinghua University. Quantitative content analysis of 1,317 ACM publications from 2021 to 2024 by Tsinghua authors reveals that the ACM-THU Open agreement has effectively increased open access publishing, with all articles in 2022 and 2023 being openly accessible. However, challenges remain in author engagement and copyright licensing, as 35% of 2024 publications still used non-Creative Commons (CC) licenses. The study also investigates the relationship between open science practices and research impact. OA articles had significantly higher download counts than non-OA articles, regardless of the presence of open data, code, or materials. Among articles without these open practices, OA status was associated with higher citation counts, but this difference was not significant when open data, code, or materials were present. The findings underscore the impact of various open practices on research visibility and citation advantage. The results suggest that universities should adopt a multifaceted approach to support the implementation of transformative agreements and foster the advancement of open science. This includes establishing stronger mandates for CC-BY licensing, conducting targeted outreach to researchers, and developing integrated information systems to monitor open access compliance and align research practices with institutional open science strategies.
Exploring the Open Data Policy Based on the COM-B and Personality Model
Wei Yu, Junpeng Chen
Abstract: This study aims to analyze the differences in researchers' data-sharing behaviors, to comprehensively measure the acceptance of open scientific data, and to explore the interactions among influencing factors. To elucidate the mechanisms of interaction among these factors, a model combining the Big Five personality traits with the COM-B model was proposed, and differential testing methods and structural equation modeling were used to analyze and validate the questionnaire data. The analysis of the questionnaires showed that different personality traits have varying impacts on data-sharing behaviors. According to the COM-B model, motivation, capability, and opportunity form a positive cycle in promoting data sharing. Among the personality traits, conscientiousness and agreeableness significantly and indirectly influence data-sharing intentions.
Contribution of Dataset Reuse to the Diversity of Research Areas
Emi Ishita, Yosuke Miyata
Abstract: This study investigated the contribution of dataset reuse to the diversity of research areas using articles that reused datasets. The Framingham Heart Study (FHS) and the Atherosclerosis Risk in Communities Study (ARIC), which are widely used in the life sciences, were selected as samples. Disease names in the articles that reused these datasets were extracted based on Medical Subject Headings (MeSH) descriptors. The frequencies of disease names were examined from the 1950s for FHS and the 1980s for ARIC. The amount of "cardiovascular diseases" research decreased over time, while "pathological conditions, signs, and symptoms" research increased from the beginning. Herfindahl-Hirschman Index (HHI) scores, an index of diversity, were calculated based on disease names; these scores indicate that the research areas have diversified over time. This study found that the HHI and MeSH descriptors could be used to measure the contribution of dataset reuse to the diversity of research areas.
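The Herfindahl-Hirschman Index used in this abstract is simply the sum of squared category shares: values near 1 mean research concentrates on a few disease areas, lower values mean greater diversity. A minimal sketch with invented descriptor counts (not the study's data):

```python
# HHI as a diversity measure over MeSH-descriptor frequencies:
# sum of squared shares; a falling HHI over time = diversification.
def hhi(counts):
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

# Hypothetical descriptor counts for two periods of dataset reuse:
early = {"Cardiovascular Diseases": 90, "Neoplasms": 10}
later = {"Cardiovascular Diseases": 40, "Neoplasms": 30,
         "Pathological Conditions, Signs and Symptoms": 30}
print(hhi(early), hhi(later))  # lower score later = more diverse areas
```

With these invented counts the early period scores 0.82 against 0.34 later, which is the kind of decline the study reads as diversification of the research areas reusing a dataset.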
Where Did the Research Data Originate? Acquiring Provenance of Research Data from Scholarly Papers
Koshi Motegi, Koichiro Ito, Shigeki Matsubara
Abstract: Prior knowledge of existing methodologies and resources is essential for a comprehensive understanding of research. In scholarly papers, citations help readers identify prior knowledge. However, research data, such as datasets, lack such mechanisms, making it difficult to trace their origins. In this study, we discuss approaches to acquiring information about the origins of research data, i.e., research data provenance. We focused on scholarly papers as a source for acquiring information about research data provenance, as they often include descriptions of how research data were created. Accordingly, we conducted a preliminary experiment on extracting information about research data provenance from scholarly papers using large language models (LLMs). The results demonstrated the feasibility of extracting information about research data provenance from scholarly papers.
- Title
- Intelligence and Equity: Shaping the Future of Knowledge
- Editors
- Sanghee Oh
- Antoine Doucet
- Marut Buranarach
- Iyra Buenrostro-Cabbab
- Yuenan Liu
- Benedict Salazar Olgado
- Copyright Year
- 2026
- Publisher
- Springer Nature Singapore
- Electronic ISBN
- 978-981-95-4861-3
- Print ISBN
- 978-981-95-4860-6
- DOI
- https://doi.org/10.1007/978-981-95-4861-3
PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms, and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.