Information Integration and Web Intelligence
27th International Conference, iiWAS 2025, Matsue, Japan, December 8–10, 2025, Proceedings
- 2026
- Book
- Editors
- Eric Pardede
- Qiang Ma
- Gabriele Kotsis
- Toshiyuki Amagasa
- Akiyo Nadamoto
- Ismail Khalil
- Book Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Switzerland
About this book
This book constitutes the refereed proceedings of the 27th International Conference on Information Integration and Web Intelligence, iiWAS 2025, held in Matsue, Japan, during December 8–10, 2025. The 23 full papers, 12 short papers and 1 keynote paper included in this book were carefully reviewed and selected from 79 submissions. They were organized in topical sections as follows: Keynote; Foundations of AI and Data Intelligence; Knowledge, Reasoning, and Human Interaction; Emerging Technologies and Applied Innovation; Creative and Generative AI.
Table of Contents
Frontmatter
Keynote
Frontmatter
Survival Informatics: Reliable Social Media Analysis for Societal Well-Being
Takako Hashimoto
Abstract: In the era of AI, pandemics, and frequent disasters, informatics is no longer a matter of efficiency alone but of survival. We introduce Survival Informatics, a novel academic approach that emphasizes human well-being and societal resilience. Survival Informatics goes beyond traditional informatics by addressing fundamental issues of trust, inclusiveness, and sustainability, and by integrating ethical, legal, social, and economic perspectives into technical development. As one representative field where Survival Informatics can be practically implemented, the analysis of people’s reactions on social media provides a powerful means to capture public perceptions in real time and to link informatics research directly with societal well-being. Motivated by this perspective, we develop a stepwise technical framework that operationalizes the principles of Survival Informatics, highlighting a two-stage clustering approach for large-scale discourse analysis. Through a comprehensive case study of COVID-19 vaccine discourse in Japan, analyzing 32 million tweets, we demonstrate methodological innovations in scalability, reproducibility, and consistency. Finally, we discuss the broader implications of Survival Informatics for social implementation, including real-time public opinion monitoring, misinformation detection, and policy integration.
Foundations of AI and Data Intelligence
Frontmatter
Supplementing Product Reviews: Retrieving Opinions from Products with Similar Attributes
Marino Fujii, Takehiro Yamamoto, Takayuki Yumoto
Abstract: In this study, we propose a method to help understand products that don’t have many reviews. Specifically, we use an LLM to retrieve reviews of similar products. This helps users get more useful information when they are thinking of buying a product. First, when the user inputs a product and a question, the system uses the LLM to find the attributes that are related to the user’s question. Next, the system calculates the similarity of the attributes and finds similar products. After that, by using an LLM, the system retrieves opinions from reviews of the similar products that are related to these attributes. Finally, the system ranks the opinions that were judged to be related to the attribute queries and shows them as the search results. This adds more useful information to support the target product. In the experiment, we compared the similarity between the opinions retrieved by the proposed method and the actual opinions that the target product has. As a result, opinions retrieved from similar products were slightly more similar to the actual opinions than those from randomly selected products.
Generating Comparative Table by LLM-Based Product Review Summarization
Kanako Nakai, Takehiro Yamamoto, Hiroaki Ohshima
Abstract: In this study, we propose a method for generating a comparative table by summarizing product reviews using a Large Language Model (LLM). A comparative table summarizes, for each aspect of two products, the evaluations present in reviews and the number of reviews for each evaluation. By using an LLM to create the table, it becomes possible to generate the table on demand and to change the aspects to be compared depending on the products. The proposed method employs an LLM to extract aspects and their associated evaluations from reviews. These evaluations are summarized for each aspect to create a table mapping evaluations to their respective aspects. We used a review dataset from Rakuten Ichiba and automatically summarized reviews using an LLM to generate comparative tables. We conducted a user study to verify whether the tables are useful for comparing products. Based on the results of the user study, the proposed method was found to be more helpful for comparison than the baseline.
Expanding Aspect Queries into Review Sentence Fragments for Product Comparison via LLM-Generated Synthetic Reviews
Naito Yoshihara, Takehiro Yamamoto, Yoshiyuki Shoji
Abstract: This paper proposes a method for retrieving diverse real-world user reviews that refer to a specific Aspect Query representing a user’s information need. Given a short Aspect Query, such as “practicality,” the system generates a variety of Sentence Fragment queries, e.g., “*able for da*” to retrieve phrases such as “suitable for daily use” or “comfortable for daytime work.” These Sentence Fragments act as wildcard-like queries and are particularly effective in languages like Japanese, where inflection and agglutinative structures make exact keyword matching challenging. To construct such fragments, we first use a large-scale language model (LLM) to generate a large number of synthetic Aspect Query–review sentence pairs. These pairs are filtered to retain only high-quality examples, which are subsequently used to fine-tune a lightweight local LLM. The fine-tuned model generates synthetic reviews for arbitrary Aspect Queries, from which Sentence Fragments that are frequent in the synthetic reviews but rare in general reviews are extracted and used as expanded queries. A user study on a real-world review dataset demonstrates that our method enables the retrieval of diverse reviews without compromising accuracy, effectively bridging the lexical gap between abstract Aspect Queries and concrete review expressions.
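The wildcard-like behaviour of a Sentence Fragment can be illustrated with a small sketch. It only shows how a fragment such as “*able for da*” could be matched against review sentences; it does not cover the fragment generation step, and the function names `fragment_to_regex` and `retrieve` are hypothetical, not taken from the paper.

```python
import re


def fragment_to_regex(fragment: str) -> re.Pattern:
    """Compile a fragment into a regex: '*' matches any run of characters,
    every other character is matched literally."""
    literal_parts = [re.escape(part) for part in fragment.split("*")]
    return re.compile(".*".join(literal_parts), re.IGNORECASE | re.DOTALL)


def retrieve(fragment: str, sentences: list[str]) -> list[str]:
    """Return the sentences that match the whole fragment pattern."""
    pattern = fragment_to_regex(fragment)
    return [s for s in sentences if pattern.fullmatch(s)]


reviews = [
    "This bag is suitable for daily use.",
    "The chair is comfortable for daytime work.",
    "Too heavy to carry around.",
]
hits = retrieve("*able for da*", reviews)
```

Here `hits` contains the first two sentences, which share the surface string "able for da" even though they use different words for the aspect.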
Learning Disentangled Document Representations Based on a Classical Shallow Neural Encoder
Yuro Kanada, Sumio Fujita, Yoshiyuki Shoji
Abstract: This paper proposes a document embedding method designed to obtain disentangled distributed representations. The resulting representations are expected to satisfy two key criteria: independence across dimensions and semantic interpretability of each dimension. We enhanced a classic shallow neural network-based embedding model with two modifications: 1) guidance task integration, where the network is trained to perform both a simple auxiliary metadata prediction task and a surrounding term prediction task simultaneously, and 2) loss regularization for independence, where the loss function includes both prediction accuracy and the independence across dimensions (i.e., the Kullback-Leibler divergence from a multivariate normal distribution). We evaluated the proposed method through both automatic and human-subject experiments using synthetic datasets and movie review texts. Experimental results show that even shallow neural networks can generate disentangled representations when dimensional independence is explicitly promoted.
Generating Interactive Japanese Puns Based on Phoneme Similarity
Yilin Wang, Takehiro Yamamoto, Hiroaki Ohshima
Abstract: Dajare, a form of Japanese pun, utilizes phonetically identical or similar words and phrases with different meanings to create a funny effect. This paper presents a method for generating interactive Dajare, a Japanese pun formed by a response that includes a segment phonetically similar to a part of the original utterance. Our approach involves retrieving candidate words and phrases based on phoneme similarity. We then leverage large language models (LLMs) to generate response sentences for these candidates, which are subsequently ranked according to conversational naturalness to select the most suitable interactive Dajare. Our experiment revealed that the proposed approach, by explicitly providing a phonetically similar segment, makes it easier for annotators to identify interactive Dajare compared to the baseline, which generates them without providing such a segment. This suggests that our method more effectively produces recognizable interactive Dajare. Additionally, the results showed that word selection has a more positive influence on the perceived cleverness (the ingenuity of transforming words with similar pronunciations) of interactive Dajare than on its funniness. Furthermore, current LLMs still fall significantly short of human capabilities in generating genuinely funny text.
Japanese Rhyme Generation Based on Mora Similarity and Generation Probability
Ryota Mibayashi, Takehiro Yamamoto, Hiroaki Ohshima
Abstract: This paper proposes a method for Japanese rhyme generation. A rhyme is defined as a pair of words with similar phonetic patterns and is widely used to enhance creative writing. To support creative writing, several rhyme search services are available on the web. However, these services typically rely on predefined word lists and search only for strict vowel-level matches. This approach limits their usefulness in creative applications. Therefore, this study proposes a Japanese rhyme generation method that supports not only strict vowel-level matches but also the generation of words outside predefined word lists. We use a GPT-2 model and control the token generation process to follow a given phoneme sequence, resulting in rhyme generation. The proposed method outperformed all other methods on 100 test inputs, achieving the best performance in both CER and mora similarity, and successfully generated rhymes for all test cases.
Addressing Label Scarcity: Hybrid Anomaly Detection in Mental Healthcare Billing
Samirah Bakker, Yao Ma, Seyed Sahand Mohammadi Ziabari
Abstract: The complexity of mental healthcare billing enables anomalies, including fraud. While machine learning methods have been applied to anomaly detection, they often struggle with class imbalance, label scarcity, and complex sequential patterns. This study explores a hybrid deep learning approach combining Long Short-Term Memory (LSTM) networks and Transformers, with pseudo-labeling via Isolation Forests (iForest) and Autoencoders (AE). Prior work has not evaluated such hybrid models trained on pseudo-labeled data in the context of healthcare billing. The approach is evaluated on two real-world billing datasets related to mental healthcare. The iForest LSTM baseline achieves the highest recall (0.963) on declaration-level data. On the operation-level data, the hybrid iForest-based model achieves the highest recall (0.744), though at the cost of lower precision. These findings highlight the potential of combining pseudo-labeling with hybrid deep learning in complex, imbalanced anomaly detection settings.
BATT2GRAPH: A Hybrid CNN-LSTM and Temporal Graph-Based Approach for Lithium-Ion Battery SOH Prediction and Anomaly Detection
Hajer Akid, Mohamed Wadhah Mabrouk, Slimane Arbaoui, Ahmed Samet, Boudour Ammar
Abstract: The rapid adoption of electric vehicles (EVs) underscores the growing need for reliable battery health monitoring systems to ensure safety, optimize performance, and extend operational lifespan. In this paper, we introduce BATT2GRAPH, a novel approach that combines a temporal graph-based representation with a CNN-LSTM predictive model for accurate State-of-Health (SOH) estimation and anomaly detection in lithium-ion batteries (LIBs). On one hand, BATT2GRAPH constructs a temporal property graph using Neo4j to store enriched charge-discharge cycles with both raw time-series data and aggregated statistical indicators, enabling interpretable SOH monitoring and anomaly detection through expressive Cypher queries. On the other hand, a hybrid CNN-LSTM model is trained on this data to capture fine-grained variations and long-term degradation trends. Extensive experiments on the Stanford-MIT battery aging dataset demonstrate that our approach consistently outperforms existing baselines across multiple evaluation metrics.
CoRA: Continual Learning for Multimodal Sensing with a Case Study in Mental Health
Tarannum Ara, Bivas Mitra
Abstract: Physiological sensing is essential for mental health monitoring, but models often degrade over time due to user behavior changes, sensor noise, and contextual variation. We propose CoRA (Continual and Regularized Adaptation), a lightweight continual learning framework that monitors latent feature drift using class-wise KL divergence and selectively retrains a downstream classifier with Elastic Weight Consolidation (EWC) to prevent forgetting. CoRA operates on top of a pretrained encoder, enabling efficient adaptation without storing raw past samples. In stress detection experiments on LifeSnaps, DAPPER, and WESAD, CoRA improves F1-score by up to 10.4% while reducing retraining overhead by over 40%, demonstrating a robust, personalized solution for real-world physiological monitoring.
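One way to picture drift monitoring with class-wise KL divergence is the sketch below. It is an illustration under stated assumptions, not CoRA itself: each class's latent features are summarized by a diagonal Gaussian, the divergence direction is KL(new || reference), and the names `gaussian_kl`, `classwise_drift`, and the threshold value are hypothetical. The abstract does not say how the divergence is estimated, and the EWC retraining step is not shown.

```python
import math
from statistics import fmean, pvariance


def gaussian_kl(ref_rows, new_rows, eps=1e-6):
    """KL(new || ref) between diagonal Gaussians fitted per feature dimension.

    eps is added to each variance so constant features do not divide by zero.
    """
    total = 0.0
    for ref_col, new_col in zip(zip(*ref_rows), zip(*new_rows)):
        mean_ref, mean_new = fmean(ref_col), fmean(new_col)
        var_ref = pvariance(ref_col) + eps
        var_new = pvariance(new_col) + eps
        total += 0.5 * (
            math.log(var_ref / var_new)
            + (var_new + (mean_new - mean_ref) ** 2) / var_ref
            - 1.0
        )
    return total


def classwise_drift(reference, current):
    """Map each class label present in both windows to its KL divergence."""
    return {
        label: gaussian_kl(reference[label], current[label])
        for label in reference
        if label in current
    }


def needs_retraining(reference, current, threshold=1.0):
    """Flag retraining when any class drifts past the (assumed) threshold."""
    return any(kl > threshold for kl in classwise_drift(reference, current).values())


reference = {
    "stress": [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]],
    "calm": [[5.0, 5.0], [6.0, 7.0], [7.0, 6.0]],
}
shifted = {
    "stress": [[3.0, 4.0], [4.0, 5.0], [5.0, 6.0]],  # same spread, mean moved by 3
    "calm": reference["calm"],
}
```

With these toy features, `needs_retraining(reference, reference)` is false and `needs_retraining(reference, shifted)` is true, because only the "stress" class has moved.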
Peak Pattern Based Similarity Search for High-Dimensional Spectral Data
Kohei Asano, Yuki Toyosaka, Kai Cheng
Abstract: Similarity search in high-dimensional spectral datasets is critical for applications in analytical chemistry, bioinformatics, and material science. Conventional methods often struggle with variability in peak positions, intensities, and noise, limiting their effectiveness for large-scale spectral comparison. In this paper, we propose a peak pattern based similarity search framework that abstracts spectra into robust peak representations and performs flexible, metric-based comparisons. The approach integrates preprocessing techniques such as noise filtering, normalization, and peak detection, followed by peak alignment with tunable tolerance N and matching threshold \(\delta\). Similarity is quantified using intensity-weighted metrics designed to accommodate spectral distortions and scaling variations. Experimental validation on real-world high-dimensional spectral datasets demonstrates that the framework achieves efficient and accurate retrieval of similar spectra. Parameter analysis highlights the impact of alignment tolerance and intensity weighting on retrieval performance, showing improved robustness against noise and peak shifts.
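As a rough illustration of tolerance-based peak alignment followed by an intensity-weighted score, consider the sketch below. The abstract does not define the metric or the exact roles of N and \(\delta\), so everything here is an assumption: `tolerance` stands in for the alignment tolerance, `delta` for the matching threshold, the score is a weighted-Jaccard ratio over max-normalized intensities, and the function names are hypothetical.

```python
def normalize(peaks):
    """Scale intensities so the strongest peak is 1.0 (removes overall scaling)."""
    top = max(intensity for _, intensity in peaks)
    return [(pos, intensity / top) for pos, intensity in peaks]


def match_peaks(query, candidate, tolerance):
    """Greedy one-to-one alignment of position-sorted (position, intensity) peaks."""
    pairs, i, j = [], 0, 0
    while i < len(query) and j < len(candidate):
        q_pos, c_pos = query[i][0], candidate[j][0]
        if abs(q_pos - c_pos) <= tolerance:
            pairs.append((query[i][1], candidate[j][1]))
            i += 1
            j += 1
        elif q_pos < c_pos:
            i += 1
        else:
            j += 1
    return pairs


def peak_similarity(query, candidate, tolerance):
    """Shared intensity of aligned peaks divided by the total intensity of both
    spectra minus that shared part; 1.0 for identical patterns, 0.0 if none align."""
    if not query or not candidate:
        return 0.0
    q, c = normalize(sorted(query)), normalize(sorted(candidate))
    shared = sum(min(a, b) for a, b in match_peaks(q, c, tolerance))
    total = sum(i for _, i in q) + sum(i for _, i in c)
    return shared / (total - shared)


def is_similar(query, candidate, tolerance, delta):
    """Accept the candidate when the score reaches the (assumed) threshold delta."""
    return peak_similarity(query, candidate, tolerance) >= delta


reference = [(100.0, 10.0), (200.0, 5.0), (300.0, 1.0)]
shifted = [(100.4, 20.0), (199.7, 10.0), (300.2, 2.0)]  # small shifts, doubled scale
```

With `tolerance=0.5` the shifted, rescaled spectrum aligns peak for peak and scores close to 1.0; with `tolerance=0.1` no peaks align and the score drops to 0.0, which shows why the tolerance setting matters for robustness to peak shifts.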
Fast Approximate Aggregation with Error Guarantee Using Encoded Bit-Slice Indexing
Kakeru Ito, Ryogo Maeda, Qiong Chang, Jun Miyazaki
Abstract: We propose error-range-guarantee approximate-aggregation methods called patch based encoding plus deterministic approximate querying (PBE+DAQ) and its extension, PBE+DAQ/WN (Wide-Narrow), which perform better than conventional DAQ by reducing the amount of data to be computed. With the increase in data volume, fast data analysis is required, and aggregation operations play an important role in data analysis. However, the larger the data volume, the more time is required for aggregation operations. In many cases, while fast approximate-aggregation operations are required rather than accurate operations, the approximation error must be guaranteed. We use PBE for compressing the majority of data for faster aggregation operations and DAQ for error-guarantee approximation. We also developed cost models for the two methods as well as for conventional DAQ. We implemented the proposed methods and conducted experiments using real-world datasets. The experimental results indicate that the execution times of PBE+DAQ and PBE+DAQ/WN are 1.1x to 1.2x faster than that of DAQ while guaranteeing the error range of the aggregation results.
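The general idea of a deterministic error range on a bit-sliced index can be sketched as follows. This is a plain bit-sliced SUM over non-negative integers that reads only the most significant slices; it does not implement patch based encoding or the Wide-Narrow extension, and the function names are hypothetical. Each Python integer is used as a bitmap in which bit i belongs to row i.

```python
def build_bit_slices(values, bits):
    """Build one bitmap per bit position: bit i of slice j is set when
    values[i] has bit j set."""
    slices = [0] * bits
    for row, value in enumerate(values):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
        for j in range(bits):
            if (value >> j) & 1:
                slices[j] |= 1 << row
    return slices


def approx_sum(slices, n_rows, top_k):
    """Sum using only the top_k most significant slices.

    Returns (lower, upper). The true sum is guaranteed to lie in this range,
    because each row's ignored low bits add at most 2**ignored - 1.
    """
    bits = len(slices)
    if not 0 <= top_k <= bits:
        raise ValueError("top_k must be between 0 and the number of slices")
    ignored = bits - top_k
    lower = sum(
        (1 << j) * bin(slices[j]).count("1")  # popcount of slice j, weighted by 2**j
        for j in range(ignored, bits)
    )
    max_error = n_rows * ((1 << ignored) - 1)
    return lower, lower + max_error


values = [13, 7, 200, 255, 0, 96]
slices = build_bit_slices(values, bits=8)
coarse = approx_sum(slices, len(values), top_k=4)
exact = approx_sum(slices, len(values), top_k=8)
```

Reading fewer slices touches less data and widens the guaranteed range; reading all slices collapses the range to the exact sum.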
Retrieving More Concrete Product Reviews by Query Rewriting with Retrieved Review Concretization
Tomoya Fukui, Takehiro Yamamoto, Takayuki Yumoto
Abstract: In this study, we propose a method for retrieving concrete reviews for a product, using an abstract review as a query. However, it is difficult to retrieve concrete reviews from keywords or abstract reviews, as simple sparse or dense retrieval methods cannot account for concreteness. Therefore, we propose a method called Query Rewriting with Retrieved Review Concretization (QR-ReReC), which utilizes retrieved reviews to rewrite the original query. For the experiment, we constructed a dataset to evaluate the effectiveness of QR-ReReC. The results showed that QR-ReReC is more effective for retrieving more concrete reviews than the retrieval methods without query rewriting and pseudo relevance feedback.
- Title
- Information Integration and Web Intelligence
- Editors
- Eric Pardede
- Qiang Ma
- Gabriele Kotsis
- Toshiyuki Amagasa
- Akiyo Nadamoto
- Ismail Khalil
- Copyright Year
- 2026
- Publisher
- Springer Nature Switzerland
- Electronic ISBN
- 978-3-032-11976-6
- Print ISBN
- 978-3-032-11975-9
- DOI
- https://doi.org/10.1007/978-3-032-11976-6