Intelligence and Equity: Shaping the Future of Knowledge
27th International Conference on Asia-Pacific Digital Libraries, ICADL 2025, Metro Manila, Philippines, December 3-5, 2025, Proceedings
- 2026
- Book
- Editors
- Sanghee Oh
- Antoine Doucet
- Marut Buranarach
- Iyra Buenrostro-Cabbab
- Yuenan Liu
- Benedict Salazar Olgado
- Book Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Singapore
About this book
This book constitutes the refereed proceedings of the 27th International Conference on Asia-Pacific Digital Libraries, ICADL 2025, held in Metro Manila, Philippines, during December 3-5, 2025.
The 12 full papers, 26 short papers, 5 demo/poster papers, and 3 practice papers presented in this volume were carefully reviewed and selected from 102 submissions. They were categorized under the following topical sections: Large Language Models and Generative AI; Digital Libraries, Archives, and Metadata; Scholarly Communication, Open Science, and Research Data; Information Behavior, Literacy, and HCI; Information Rights, Privacy, and Data Management; Emerging Technologies in Knowledge Organization and Description, and the Future of Cultural Heritage; Ethics, Social Divides, and Lived Experiences; Archival Education, Models, and Practices.
Table of Contents
Frontmatter
Large Language Models and Generative AI
Frontmatter
Automatic Subject Indexing: How has it Evolved and Where is AI Taking it?
Yi-Shuai Xu, Yanti Idaya Aspura Mohd Khalid, Muhammad Shahreeza Safiruz Kassim
Abstract
Automatic Subject Indexing (ASI) plays a critical role in knowledge organization and information retrieval. However, existing studies remain fragmented, often emphasizing algorithmic details while overlooking the broader intellectual structure and thematic evolution of the field. To address this gap, this study constructs a comprehensive knowledge map of ASI research spanning 2000–2024 by integrating scientometric assessment with content analysis. Based on 1,161 publications retrieved from Web of Science and Scopus, we applied co-authorship, co-citation, and thematic evolution analyses to reveal influential contributors, collaboration patterns, and topic dynamics. To complement the scientometric insights, 26 representative studies on technical innovations and domain applications were examined in depth. Results reveal three major developmental phases: an initial reliance on rule-based approaches, a subsequent shift toward statistical and machine learning techniques, and the current dominance of deep learning architectures. Within this latest phase, large language models (LLMs) have emerged as a transformative development, while recent research also underscores multilingual indexing, ontology alignment, and linked data integration as critical directions for improving semantic interoperability, knowledge organization, and retrieval effectiveness. Despite progress, persistent challenges include data scarcity for low-resource languages, limited alignment between algorithmic outputs and professional cataloging standards, and the opacity of deep learning models. Key opportunities include leveraging LLMs for context-aware indexing, advancing human-in-the-loop workflows, and promoting open datasets and benchmarks to foster transparency and interoperability. This study offers an integrated analysis of ASI research, providing structural insights, critical interpretation of methodological trends, and forward-looking perspectives for scholars and practitioners.
A Multi-stage Rumor Detection Framework Based on Retrieval-Augmented Generation Optimization
Zimeng He, Yunhao Yang, Wei Yu, Junpeng Chen
Abstract
This study aims to construct a multi-stage rumor detection framework based on Retrieval-Augmented Generation (MS-RAG-RD) that enhances the accuracy and credibility of rumor detection. The MS-RAG-RD model achieved an F1 score of 91.3% and a robustness score of 94.97 for rumor detection performance. The model exhibited strong resistance to interference and low vulnerability to structured semantic perturbations in adversarial attacks. Ablation experiments validated the individual module contributions, emphasizing the critical role of the hybrid retrieval module in achieving performance gains. The study results indicated that the MS-RAG-RD model could enhance the performance of the rumor detection task, providing a technical framework for real-time monitoring and accurate tracing of misinformation in cyberspace governance.
How LLMs Handle Cultural Bias: Reactions to Asian Minority Historical Narratives
Shirin Shujaa, Ginel Dorleon, Arthur Tang
Abstract
Large Language Models (LLMs) are increasingly utilized in digital libraries and knowledge systems to facilitate access to cultural and historical information. However, their outputs can reproduce subtle biases, particularly when addressing minority and low-resource communities. This study evaluates seven state-of-the-art LLMs on ten English prompts that embed culturally sensitive and potentially biased assumptions related to Vietnam, Myanmar, and Nepal. We systematically analyze these prompts’ responses for the presence of subtle bias, including gender stereotyping, linguistic ethnocentrism, epistemic bias, victim-blaming, and cultural essentialism. Our findings reveal significant variation in bias prevalence and type across models, with some exhibiting pervasive stereotyping and cultural marginalization, while others demonstrate more balanced and nuanced responses. These results emphasize the necessity for robust bias mitigation, culturally diverse training data, and human-in-the-loop oversight when deploying LLMs in digital heritage contexts. We discuss implications for ethical AI development in knowledge access and outline directions for future research to ensure fairness, transparency, and inclusivity in culturally sensitive AI applications.
Does Trustworthy AI Motivate Generative AI Usage?
Sei-Ching Joanna Sin
Abstract
Due to growing concerns about the risks associated with artificial intelligence (AI), interest in trustworthy AI (TAI) has amplified. Currently, few empirical studies investigate (a) whether TAI factors indeed encourage individuals to increase their usage of Generative AI (GenAI), compared to motivators such as those identified in the Unified Theory of Acceptance and Use of Technology (UTAUT) and (b) whether users’ levels of motivation from specific TAI and UTAUT factors vary with their GenAI usage types (e.g., obtaining answers, assisting with writing). Such research focusing specifically on Asia-Pacific nations, such as Singapore, is even rarer. This study thus conducted an online survey of 300 adults in Singapore to explore the two research gaps above. Data were analyzed using descriptive and inferential statistics (multiple regressions). The study found “Effort Expectancy” and “Performance Expectancy” (both from UTAUT) to be the top motivators for GenAI usage, followed by the “Technical Robustness and Safety” TAI factor. Significant associations were observed between types of GenAI usage and motivation levels from different TAI and UTAUT factors. For instance, using GenAI “to get answers” was correlated with being motivated to increase GenAI usage by “Technical Robustness and Safety”, and using GenAI “to solve problems” was associated with being encouraged by the “Human Agency and Oversight” TAI factor. Among the demographic variables, education yielded the most statistically significant associations (five out of 11). The implications for AI governance, system design, and stakeholder engagement and training are discussed.
Uncovering Cultural Biases and Stereotypes in Large Language Models
Ginel Dorleon, Shirin Shujaa
Abstract
Generative Artificial Intelligence (AI) models, especially large language models (LLMs), are increasingly used to retrieve and generate information in digital libraries. However, these models often reflect cultural biases and stereotypes that distort or marginalize knowledge representations. This paper tackles bias in LLM-generated English text on Asian history and culture. We formally define bias categories, including stereotyping, omission, ethnocentrism, and simplification, in the context of generative AI outputs. We propose a novel framework combining multi-perspective generation with bias detection to mitigate such biases. Supported by a theoretical analysis, we introduce formal bias measures and prove that under ideal conditions, our method can eliminate stereotypical content and perspective omissions. Furthermore, we present a bias annotation scheme and algorithm that generates answers incorporating diverse cultural viewpoints while filtering out identified stereotypes. Our approach provides formal guarantees for bias reduction, advancing the state-of-the-art by bridging bias mitigation, information retrieval, and digital library research to promote fairness and cultural inclusivity in AI-generated content.
SinoSarcasmClassifier: A Multi-View Model for Sarcasm Detection in Chinese Social Media with Emoji Mapping
Zipei Liu, Akira Maeda
Abstract
Detecting sarcasm in Chinese texts presents unique challenges due to its indirect expression, which complicates accurate identification. Inaccurate assessments of sarcastic content on social media platforms often lead to negative user interactions, highlighting the importance of precise sarcasm detection. Emojis, which are widely used on Chinese social networking platforms such as Sina Weibo (hereafter Weibo), a service similar to X (formerly Twitter) and Facebook, add additional layers to textual communication, conveying emotions, intentions, and potentially sarcasm. However, the lack of well-annotated, high-quality Chinese datasets poses a significant obstacle to effective sarcasm detection, while the contextual complexity of Chinese sarcasm remains a major challenge for current language models. To address these issues, we propose a method that integrates four distinct modules to achieve comprehensive sarcasm detection in Chinese social media comments, particularly short user-generated texts with emoji interactions. Our model leverages emojis as a critical feature and capitalizes on the structural and lexical characteristics of Chinese sarcastic sentences. By incorporating features from both emoji-enhanced and plain text representations, the model demonstrates significantly improved accuracy in detecting sarcasm. Additionally, to support the training, testing, and validation of the system, we constructed a carefully designed dataset comprising multiple subsets. These subsets not only support the model’s training and evaluation but also serve as valuable resources for future research on Chinese sarcasm detection.
Question-Based Viewing with LLM-Powered Personified Characters: A Role-Playing Dialogue System for Perspective-Taking in Museums
Akito Nakano, Shio Takidaira, Tsukasa Sawaura, Yoshiyuki Shoji, Takehiro Yamamoto, Yusuke Yamamoto, Hiroaki Ohshima, Kenro Aihara, Noriko Kando
Abstract
This paper proposes an interactive museum-viewing support system designed to foster diverse perspective-taking through role-play with fictional characters. While museums are widely regarded as valuable informal learning environments, passive viewing often results in low engagement and limited knowledge retention. Inspired by Visual Thinking Strategies (VTS), our tablet-based application leverages large language models (LLMs) to generate context-sensitive, character-driven questions posed by fantasy personas such as elves, dwarves, and werewolves, each with distinctive values and interpretive tendencies. Visitors engage with these questions and are occasionally prompted to assume a character’s role, generating their own questions in that persona’s voice. To explore the feasibility and user response, we conducted a small-scale case study at the National Museum of Ethnology, Japan. Although the number of participants was limited, the results provided valuable qualitative insights: the role-playing interaction increased engagement, encouraged perspective-shifting, and facilitated the generation of more varied and reflective questions.
Comparative Analysis of Ideological Moderation and Bias in LLM Translation of Controversial Texts
Yulia Levit, Maayan Zhitomirsky-Geffet, Kfir Pshititsky
Abstract
This study investigates the presence and nature of political bias in the translation outputs of three state-of-the-art large language models (LLMs) ‒ ChatGPT-4, Claude 3.5 Sonnet, and Gemini 2.0 ‒ when translating between Hebrew and English in both directions. Focusing on politically sensitive terminology within the context of news reporting, the research examines how each model renders key lexical items such as “terrorist,” “Judea and Samaria,” and “West Bank,” using translation accuracy as a proxy for bias detection. The article presents promising preliminary results of the data-driven analysis of 100 politically diverse news excerpts and a comparative evaluation of translation tendencies across language directions. The findings reveal distinct patterns of ideological framing among the LLMs. ChatGPT-4 exhibited the highest rate of left-oriented bias in Hebrew-to-English translations and a notable degree of right-oriented bias in English-to-Hebrew translations. In contrast, Claude 3.5 Sonnet and Gemini 2.0 demonstrated greater consistency and neutrality, with minimal evidence of politically biased behavior. These results support prior research suggesting that LLMs’ outputs are sensitive not only to the prompt language but also to the underlying training data in specific languages and moderation systems. This study contributes to emerging discussions on AI ethics, multilingual fairness, and the role of post-processing in bias mitigation. The comparative framework presented here offers a foundation for evaluating political bias in multilingual LLM applications, particularly in low-resource language settings.
Semi-automatic Assessment of Multiple Viewpoint Representation in Wikidata
Sara Minster, Maayan Zhitomirsky-Geffet
Abstract
Large knowledge graphs, such as Wikidata, have immense potential to present all shades of thought and diverse opinions in global public discourse. Understanding and identifying different viewpoints form the basis for research and information systems in various knowledge domains. Hence, this study aims to assess the level of inclusion and representation of multiple viewpoints in Wikidata. This paper proposes a new semi-automatic approach for assessing multiple viewpoint representation within Wikidata, focusing on six inherent mechanisms. The preliminary results reveal that the percentage of items with explicitly presented multiple viewpoints is relatively small compared to the overall number of items in the knowledge base. Wikidata and other large knowledge graphs are widely used as training data and ground truth knowledge bases for AI algorithms and smart decision-making systems. Therefore, building knowledge graphs in accordance with ethical principles of inclusion and diversity of viewpoints is a crucial issue.
Nudgr: A Context-Aware Digital Nudge Intervention to Promote Fact-Checking of GenAI Content
Hamzah Osop, Delia Ching Yee Chia, Chei Sian Lee, Dion Hoe-Lian Goh
Abstract
Generative artificial intelligence (GenAI) tools are gaining popularity in academia, despite concerns about hallucinations and misinformation, which raises questions about their reliability in educational contexts. A key gap lies in the lack of interventions that encourage critical evaluation of GenAI content. Our study addresses this gap by introducing Nudgr, a digital nudging tool designed to promote fact-checking behaviours when using GenAI tools. Nudgr is implemented as a popover alert within GenAI interfaces. It delivers tailored nudge messages, context-relevant keyword suggestions, and one-click verification buttons to direct learners to verify the accuracy of GenAI responses. We evaluated Nudgr with 13 university students, comprising undergraduates and postgraduates across different disciplines. Overall, Nudgr shows promise as a tool that promotes learners’ engagement to verify AI-generated content critically. Its design fosters trust, reduces cognitive effort, and enables faster and more user-friendly fact-checking. It offers a practical response to the challenges posed by misinformation in GenAI use.
Dataset Similarity Estimation Using LLM-Based Metadata Embeddings
Koichiro Ito, Shigeki Matsubara
Abstract
To increase the use of research artifacts, such as datasets, it is essential not only to publish them but also to ensure their accessibility. One effective way to enhance accessibility is by revealing relationships between research artifacts, with similarity being a key type of relationship. Identifying such similarities can support the recommendation of research artifacts. A previous study estimated dataset similarity using their metadata. In recent years, methods that estimate sentence similarity through embeddings generated by large language models (LLMs) have achieved strong performance. These findings suggest that LLMs can also be effective for estimating dataset similarity. This paper experimentally verifies the effectiveness of LLMs in estimating similarity between datasets. We implement a similarity estimation method that generates embeddings of dataset metadata using LLMs and computes the cosine similarity between these embeddings. To generate embeddings, we use PromptEOL, which produces high-quality text embeddings by prompting LLMs to capture the meaning of input text in one word. An experiment was conducted to evaluate the estimation performance of the implemented method, and the results demonstrated the effectiveness of the LLM-generated metadata embeddings for estimating dataset similarity.
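The ranking step this abstract describes — embedding each dataset's metadata and comparing candidates by cosine similarity — reduces to a few lines once embeddings are in hand. A minimal stdlib sketch; the three-dimensional vectors here are invented stand-ins (real PromptEOL embeddings would have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical metadata embeddings for three datasets (stand-ins for
# vectors an LLM would produce from each dataset's metadata record).
emb_weather = [0.8, 0.1, 0.3]
emb_climate = [0.7, 0.2, 0.4]
emb_finance = [0.1, 0.9, 0.0]

print(round(cosine_similarity(emb_weather, emb_climate), 3))  # → 0.98
print(round(cosine_similarity(emb_weather, emb_finance), 3))  # → 0.218
```

The thematically related pair scores far higher than the unrelated pair, which is exactly the signal a recommender would sort on.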
Automated Book Genre Categorization Using Lightweight Machine Learning: Moving Toward Practical Solutions for Libraries
Yi-Shuai Xu, Yanti Idaya Aspura Mohd Khalid, Muhammad Shahreeza Safiruz Kassim
Abstract
The rapid growth of digital collections has intensified the need for accurate and efficient book classification in digital libraries, yet manual cataloging remains labor-intensive and resource-demanding. Although deep learning approaches achieve strong performance in text classification, their high computational cost and limited interpretability hinder adoption in real-world library environments, particularly in small and medium-sized libraries with constrained resources. This study explores the feasibility of lightweight machine learning (ML) models as practical and resource-efficient methods for automated book genre classification. A curated subset of the Kaggle Books dataset was preprocessed through data cleaning, normalization, and text vectorization, yielding 56,260 records across multiple categories. A set of ML models was evaluated for their effectiveness in automated genre classification. Experimental results show that Logistic Regression outperformed other models, followed by Ridge, LinearSVC, Multinomial Naïve Bayes, and K-Nearest Neighbors, whereas tree-based models demonstrated relatively lower effectiveness and higher computational costs. These findings validate the applicability of linear and probabilistic models for bibliographic categorization, offering a practical entry point for libraries that have not yet explored automation. This research bridges the gap between traditional cataloging and AI-driven knowledge organization by demonstrating that lightweight ML models can serve as effective decision-support tools, particularly for resource-constrained libraries. While full automation remains challenging due to the stringent demands of accuracy and interpretability, incremental adoption of interpretable, resource-efficient models offers a realistic pathway toward Human-in-the-Loop paradigms, mitigating misclassification risks while advancing digital libraries toward more adaptive and intelligent services.
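To give a flavor of the "lightweight" end of the model family this study evaluates, here is a self-contained Multinomial Naïve Bayes sketch over bag-of-words counts with Laplace smoothing. The toy catalog records and genre labels below are invented for illustration; the paper's actual experiments use a curated Kaggle Books subset and standard library-grade implementations of these models:

```python
from collections import Counter, defaultdict
import math

class TinyNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (Laplace smoothing)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # genre -> word frequencies
        self.class_counts = Counter(labels)      # genre -> number of records
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for genre, doc_count in self.class_counts.items():
            # Log prior + log likelihood of each word under this genre.
            score = math.log(doc_count / total_docs)
            genre_total = sum(self.word_counts[genre].values())
            for w in words:
                # Add-one smoothing so unseen words don't zero out the score.
                p = (self.word_counts[genre][w] + 1) / (genre_total + len(self.vocab))
                score += math.log(p)
            if score > best_score:
                best, best_score = genre, score
        return best

# Invented catalog blurbs standing in for preprocessed book records.
train_texts = [
    "dragons and wizards in a magic kingdom",
    "a spell of ancient magic and elves",
    "the detective solved the murder case",
    "a crime scene and a missing suspect",
]
train_labels = ["fantasy", "fantasy", "mystery", "mystery"]

clf = TinyNaiveBayes().fit(train_texts, train_labels)
print(clf.predict("a wizard cast a magic spell"))  # → fantasy
print(clf.predict("the suspect fled the crime"))   # → mystery
```

The whole model is a table of word counts, which is why such classifiers train in seconds on commodity hardware and why every prediction can be explained by inspecting the per-word probabilities — the interpretability property the abstract highlights for resource-constrained libraries.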
When Do We Talk to AI? A User-Centric Analysis of ChatGPT Interaction Patterns
Avshalom Elmalech, Ronyt Gomez, Israel Klein
Abstract
As generative AI systems like ChatGPT become integrated into daily routines, understanding how different users engage with these tools over time is essential for designing future information services. This paper presents an empirical user study examining ChatGPT usage patterns across a diverse set of 38 participants. Using exported chat histories, we analyze over 82,000 prompts spanning up to 21 months per participant. Our findings reveal distinct interaction behaviors tied to age, gender, and time of day. Notably, older users exhibit stronger work-week patterns in usage, while women show higher activity at night. These results highlight the roles generative AI plays in users’ lives, extending beyond productivity into moments of reflection, support, and multitasking.
- Title
- Intelligence and Equity: Shaping the Future of Knowledge
- Editors
- Sanghee Oh
- Antoine Doucet
- Marut Buranarach
- Iyra Buenrostro-Cabbab
- Yuenan Liu
- Benedict Salazar Olgado
- Copyright Year
- 2026
- Publisher
- Springer Nature Singapore
- Electronic ISBN
- 978-981-95-4861-3
- Print ISBN
- 978-981-95-4860-6
- DOI
- https://doi.org/10.1007/978-981-95-4861-3
PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms, and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.