2024 | Book

Advances in Knowledge Discovery and Data Mining

28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024, Taipei, Taiwan, May 7–10, 2024, Proceedings, Part IV

Editors: De-Nian Yang, Xing Xie, Vincent S. Tseng, Jian Pei, Jen-Wei Huang, Jerry Chun-Wei Lin

Publisher: Springer Nature Singapore

Book Series: Lecture Notes in Computer Science

About this book

The 6-volume set LNAI 14645-14650 constitutes the proceedings of the 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024, which took place in Taipei, Taiwan, during May 7–10, 2024.

The 177 papers presented in these proceedings were carefully reviewed and selected from 720 submissions. They deal with new ideas, original research results, and practical development experiences from all KDD-related areas, including data mining, data warehousing, machine learning, artificial intelligence, databases, statistics, knowledge engineering, big data technologies, and foundations.

Table of Contents

Frontmatter

Financial Data

Frontmatter
Look Around! A Neighbor Relation Graph Learning Framework for Real Estate Appraisal
Abstract
Real estate appraisal is a crucial task for urban applications, aiming to value properties on the market. Recently, several methods have been developed to automate the valuation process by taking property trading transactions into account when estimating property value, mitigating the effort of hand-crafted design. However, existing methods 1) only consider the real estate itself, ignoring the relations between properties, and 2) naively aggregate the information of neighbors, failing to model the relationships between transactions. To tackle these limitations, we propose a novel Neighbor Relation Graph Learning Framework (ReGram) that incorporates the relation between a target transaction and its surrounding neighbors with an attention mechanism. To model the influence between communities, we integrate the environmental information and the past price of each transaction from other communities. Since target transactions in different regions share some characteristics while differing in others, we introduce a dynamic adapter that models the different distributions of target transactions based on input-related kernel weights. Extensive experiments on real-world datasets with various scenarios demonstrate that ReGram robustly outperforms state-of-the-art methods.
Chih-Chia Li, Wei-Yao Wang, Wei-Wei Du, Wen-Chih Peng
Multi-time Window Ensemble and Maximization of Expected Return for Stock Movement Prediction
Abstract
This paper proposes MERTE (Maximization of Expected Returns in a multi-Time window Ensemble), a novel end-to-end model for stock movement prediction. MERTE is based on three main ideas: 1) an ensemble framework that captures multiple time-based momentums; 2) incorporating the expected return of trading into the loss function; and 3) learning correlations between stocks without pre-defined knowledge. MERTE consists of several base learners with the same neural network structure, each receiving an input of a different time-sequential length. Each base learner specializes in learning the time momentum inherent in its given time window and also learns trading performance through our proposed loss function. The base learner consists of two attention mechanisms that learn the correlations and dynamics of stock movements without any domain knowledge. Experimental results show that MERTE outperforms baseline models, yielding superior trading gains on almost all six real-world datasets.
Kanghyeon Seo, Seungjae Lee, Woo Jin Cho, Yoojeong Song, Jihoon Yang
MOT: A Mixture of Actors Reinforcement Learning Method by Optimal Transport for Algorithmic Trading
Abstract
Algorithmic trading refers to executing buy and sell orders for specific assets based on automatically identified trading opportunities. Strategies based on reinforcement learning (RL) have demonstrated remarkable capabilities in addressing algorithmic trading problems. However, trading patterns differ across market conditions due to distribution shifts in the data, and ignoring these multiple patterns undermines the performance of RL. In this paper, we propose MOT, which designs multiple actors with disentangled representation learning to model the different patterns of the market. Furthermore, we incorporate the Optimal Transport (OT) algorithm to allocate samples to the appropriate actor by introducing a regularization loss term. Additionally, we propose a pretraining module that facilitates imitation learning by aligning the outputs of the actors with an expert strategy, better balancing the exploration and exploitation of RL. Experimental results on real futures market data demonstrate that MOT exhibits excellent profit capabilities while balancing risk. Ablation studies validate the effectiveness of MOT's components.
Xi Cheng, Jinghao Zhang, Yunan Zeng, Wenfang Xue
Agent-Based Simulation of Decision-Making Under Uncertainty to Study Financial Precarity
Abstract
Financial insecurity in the U.S. is on the rise, accelerated by the growth of the gig economy and the associated income instability, increasing inequality, and the effects of algorithmic decision-making. Such insecurity has been studied within the framework of precarity – a concept that captures people’s latent uncertainty and precariousness. To alleviate precarity, we must study it. Precarity manifests over time as a sequence of events for an individual. Therefore, we must study individual trajectories, rather than the trajectory of aggregate properties of populations or snapshot analysis of an automated decision process. Doing so requires an agent behavior model that can simulate a number of related phenomena simultaneously: how individual consumption reacts to uncertainty in one’s financial status, how predictive tools impact income, and how utility-maximizing individuals behave in the long term. In this paper, we develop an agent-based simulation framework with realistic elements to examine the dynamics of precarity. Our model combines different threads of inquiry in economics and incorporates models of consumption, ruin, and investment. Our results illustrate how precarity, if ignored by policy-makers, can exacerbate the ill-effects of automated decision-making. Our framework also allows us to experiment with different strategies to mitigate precarity and evaluate their effectiveness.
Pegah Nokhiz, Aravinda Kanchana Ruwanpathirana, Neal Patwari, Suresh Venkatasubramanian

Information Retrieval and Search

Frontmatter
Semantic Completion: Enhancing Image-Text Retrieval with Information Extraction and Compression
Abstract
Image-text retrieval is an essential branch of information retrieval that faces the serious challenge of the cross-modal semantic gap. Although significant progress has been made in recent years, most research has ignored an essential problem: text as an image description is incomplete, so semantic loss between image and text persists. In this paper, we propose a novel image-text retrieval method based on information extraction and compression to alleviate this problem. The method bridges the semantic gap between the two modalities by generating rich, high-quality semantic descriptions from a set of related sentences via an information extraction and compression module. To validate the effectiveness of the method, we conducted extensive experiments on the Flickr30K and MSCOCO datasets. The experimental results show that our method achieves significant performance improvements in image-text retrieval with an appropriate fusion ratio. When the number of pre-training images is 4M, the evaluation metrics of our method improve by at least 4.22% and 3.69% compared to the baseline method. This further confirms the advantages and potential of our method in solving the semantic loss problem in image-text retrieval.
Xue Chen, Yi Guo
Fast Edit Distance Prediction for All Pairs of Sequences in Very Large NGS Datasets
Abstract
All known edit distance algorithms run in near-quadratic time with respect to sequence length. For very large numbers of sequences, such as next-generation sequencing (NGS) datasets, all-pairs edit distance calculation at near-quadratic run time may take days or weeks. To address this performance bottleneck, several sub-quadratic algorithms have been proposed. Recently, Pramanik et al. [1] proposed a fast reference-sequence-based edit distance prediction method that addresses this bottleneck. It is very effective at correctly predicting smaller edit distances (useful, for example, for clustering NGS datasets with a low threshold) but less effective for larger edit distances. In this paper, we propose a faster edit distance prediction method based on a very small number of special reference sequences, which predicts edit distances with close to 100% accuracy. The method relies on several novel techniques based on non-matching subsequences, and we provide propositions and theorems that justify the basis for developing these techniques. Using these strategies, we develop an edit distance prediction method that is linear in sequence length.
A. K. M. Tauhidul Islam, Sakti Pramanik
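
As a rough illustration of why distances to reference sequences can predict pairwise edit distances at all, the sketch below uses the metric properties of edit distance to bound d(x, y) from precomputed reference profiles. This is not the authors' method: their special reference sequences and non-matching-subsequence techniques are the paper's contribution, and the references and the `python-Levenshtein` dependency here are placeholders.

```python
# Reference-profile bounds on edit distance. Because edit distance is a
# metric, for any reference r: |d(x,r) - d(y,r)| <= d(x,y) <= d(x,r) + d(y,r).
import Levenshtein  # pip install python-Levenshtein

references = ["ACGTACGTACGT", "TTTTAAAACCCC"]  # placeholder references

def profile(seq: str) -> list[int]:
    """Precompute distances to every reference: a handful of quadratic
    computations per sequence instead of one per pair."""
    return [Levenshtein.distance(seq, r) for r in references]

def predicted_bounds(px: list[int], py: list[int]) -> tuple[int, int]:
    """Lower/upper bounds on d(x, y) from the reference profiles alone."""
    lower = max(abs(a - b) for a, b in zip(px, py))
    upper = min(a + b for a, b in zip(px, py))
    return lower, upper

x, y = "ACGTACGTACGA", "ACGTACGTTCGT"
lo, hi = predicted_bounds(profile(x), profile(y))
print(lo, Levenshtein.distance(x, y), hi)  # the true distance lies in [lo, hi]
```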
MixCL: Mixed Contrastive Learning for Relation Extraction
Abstract
Entity representation plays a fundamental role in modern relation extraction models. Previous efforts usually explicitly distinguish entities from contextual words, e.g., by introducing position embeddings w.r.t. entities or by surrounding entities with special tokens. Inspired by this observation, we propose improving relation extraction via a novel entity-level contrastive learning scheme, which contrasts an entity both with other entities and with its contextual words in a mini-batch. To generate high-quality negatives for contrast, we equip our entity-level contrastive learning with an innovative Mixup strategy, which interpolates feature representations of negative entities and contextual words to create new, diversified negative examples. Extensive experiments on TACRED, TACRED-revisited, and SemEval2010 show that our method delivers robust performance improvements over a strong relation extraction baseline. Furthermore, we propose a new metric that measures the overall hardness of the negative examples by considering their dissimilarities to the anchor instance as well as their diversity, explaining the superiority of our method in depth.
Jinglei Zhang, Bo Li, Xixin Cao, Minghui Zhang, Wen Zhao
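
A toy sketch of the Mixup-negatives idea the abstract above describes, assuming a standard InfoNCE objective; the paper's exact loss and interpolation scheme may differ, and all tensor shapes and the Beta parameter here are illustrative.

```python
# Interpolate negative-entity and context embeddings to create harder,
# more diverse negatives, then contrast with InfoNCE.
import torch
import torch.nn.functional as F

def mixup_negatives(neg_a, neg_b, alpha=0.4):
    """Interpolate two banks of negative embeddings with Beta-sampled lambdas."""
    lam = torch.distributions.Beta(alpha, alpha).sample((neg_a.size(0), 1))
    return lam * neg_a + (1 - lam) * neg_b

def info_nce(anchor, positive, negatives, tau=0.07):
    """Standard InfoNCE over one positive and a shared set of negatives."""
    anchor, positive = F.normalize(anchor, dim=-1), F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (anchor * positive).sum(-1, keepdim=True) / tau  # (B, 1)
    neg = anchor @ negatives.T / tau                       # (B, N)
    logits = torch.cat([pos, neg], dim=1)                  # positive at index 0
    return F.cross_entropy(logits, torch.zeros(len(logits), dtype=torch.long))

B, N, d = 8, 32, 128
anchor, positive = torch.randn(B, d), torch.randn(B, d)
neg_entities, neg_context = torch.randn(N, d), torch.randn(N, d)
mixed = mixup_negatives(neg_entities, neg_context)  # (N, d) synthetic negatives
loss = info_nce(anchor, positive, torch.cat([neg_entities, mixed]))
```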
Decomposing Relational Triple Extraction with Large Language Models for Better Generalization on Unseen Data
Abstract
Despite the significant achievements of existing methods for relational triple extraction (RTE), most have weak generalization to data unseen or only partially seen in the training set. Recently, large language models (LLMs) have attracted increasing interest for their power in various natural language processing tasks, but they still suffer from some issues, including low sensitivity to the order of subjects and objects and unsatisfactory output formats. To harness LLMs for more accurate RTE, especially on unseen data, we propose a novel framework, LRTE, which decomposes the RTE task into a pipeline of three sub-tasks: relation extraction, entity extraction, and triple filtering. To evaluate all models' RTE performance more faithfully, we also refined two RTE benchmarks by removing noisy triples and adding missing ones. Our extensive experiments on the refined datasets demonstrate our framework's superior performance over previous competitors.
Boyu Meng, Tianhe Lin, Deqing Yang
Multi-Query Person Search with Transformers
Abstract
We propose a transformer-based multi-query person search (MQPS) method that jointly performs person detection and person re-identification (re-id) in an end-to-end framework. Most existing person search methods employ hand-crafted components and involve multiple steps and stages to detect and identify the target person, which is computationally inefficient and hard to generalise to different datasets. Recent advances in end-to-end object detection with transformers, mainly the DETR family, employ object queries to learn objects and directly predict a set of bounding boxes and object classes. However, this approach uses one object query per object, so the detected object is centred around the object's spatial location, which is not ideal for small and occluded objects during feature representation learning. Therefore, we propose a multi-query method for person detection and person feature representation learning. Specifically, MQPS utilises multiple adjacent object queries to learn a target person with multi-scale features. Moreover, to improve the feature representation learning of intra-identity objects, we employ a margin ranking loss to bring intra-identity person instances closer in the feature space. Experiments on the CUHK-SYSU and PRW datasets demonstrate the effectiveness of the proposed method.
Ying Chen, Zhihui Li, Andy Song
BioReX: Biomarker Information Extraction Inspired by Aspect-Based Sentiment Analysis
Abstract
Biomarkers are critical in cancer diagnosis, prognosis, and treatment planning. However, this information is often buried in unstructured text. In this paper, we draw an analogy between biomarker information extraction and aspect-based sentiment analysis. We propose a system, the Biomarker and Result Extraction Model (BioReX). BioReX employs BERT post-training methods to augment the BioBERT model with domain-specific and task-specific knowledge for biomarker extraction, and it uses syntactic and semantic attention to associate results with their corresponding biomarkers. Evaluation demonstrates the effectiveness of the proposed approach.
Weiting Gao, Xiangyu Gao, Wenjin Chen, David J. Foran, Yi Chen
IR Embedding Fairness Inspection via Contrastive Learning and Human-AI Collaborative Intelligence
Abstract
Xiaohongshu's search serves tens of millions of active users daily in its social network, which presents a challenge to the existing log-based embedding-based retrieval (EBR) system: how to ensure individual document exposure fairness in order to diversify search results. Conventional EBR models optimize the relevance between query and document by leveraging massive user behavior data, e.g., clicks and purchases. However, retrieval outcomes derived from search logs can deviate from the true relevance distribution, which may leave low-popularity or long-tailed documents with less opportunity to be retrieved. To address this problem, we propose a novel semi-supervised model, Gaussian process based contrastive learning (GPCL), which minimizes the discrepancy between the model's prediction distribution and the true relevance distribution by taking advantage of contrastive samples adaptively generated from a small amount of human-labeled data. We validated the effectiveness of the proposed methodology against a set of baselines and observed significant metric gains in online A/B testing. We discuss the entire system, including model deployment and parameter tuning. The new dataset, which combines manually labeled relevance samples with massive click logs, is publicly available.
Heng Huang, Yunhan Bai, Hongwei Liang, Xiaozhong Liu
SemPool: Simple, Robust, and Interpretable KG Pooling for Enhancing Language Models
Abstract
Knowledge Graph (KG) powered question answering (QA) performs complex reasoning over language semantics as well as knowledge facts. Graph Neural Networks (GNNs) learn to aggregate information from the underlying KG, which is combined with Language Models (LMs) for effective reasoning over the given question. However, GNN-based methods for QA rely on the graph information of the candidate answer nodes, which limits their effectiveness in more challenging settings where critical answer information is not included in the KG. We propose a simple graph pooling approach that learns useful semantics of the KG to aid the LM's reasoning, and whose effectiveness is robust under graph perturbations. Our method, termed SemPool, represents KG facts with pre-trained LMs, learns to aggregate their semantic information, and fuses it at different layers of the LM. Our experimental results show that SemPool outperforms state-of-the-art GNN-based methods by 2.27 accuracy points on average when answer information is missing from the KG. In addition, SemPool offers interpretability regarding what type of graph information is fused at different LM layers.
Costas Mavromatis, Petros Karypis, George Karypis
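
A loose sketch of semantic graph pooling fused into an LM layer, in the spirit of the abstract above. The attention-style pooling and additive fusion below are assumptions for illustration; SemPool's actual pooling and fusion design is described in the paper.

```python
# Pool pre-trained embeddings of KG facts into one summary vector and add
# its projection to the hidden states of a chosen LM layer.
import torch
import torch.nn as nn

class SemanticPoolFusion(nn.Module):
    def __init__(self, fact_dim: int, lm_dim: int):
        super().__init__()
        self.score = nn.Linear(fact_dim, 1)  # learned pooling weights
        self.proj = nn.Linear(fact_dim, lm_dim)

    def forward(self, hidden: torch.Tensor, facts: torch.Tensor) -> torch.Tensor:
        """hidden: (batch, seq, lm_dim); facts: (batch, n_facts, fact_dim)."""
        attn = torch.softmax(self.score(facts), dim=1)  # (B, n_facts, 1)
        pooled = (attn * facts).sum(dim=1)              # (B, fact_dim)
        return hidden + self.proj(pooled).unsqueeze(1)  # broadcast over seq
```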

Medical and Biological Data

Frontmatter
Spatial Gene Expression Prediction Using Multi-Neighborhood Network with Reconstructing Attention
Abstract
Spatial transcriptomics (ST) has made it possible to link local spatial gene expression with tissue properties, which greatly benefits research in histopathology and pathology. To obtain more ST data, we utilize deep learning to predict gene expression from tissue slide images. Considering the importance of the dependence of local tissue images on their neighborhoods, we propose the novel Multi-Neighborhood Network (MNN), composed of a down-sampling module and vanilla Transformer blocks. Moreover, to meet the needs of this architecture and address the computational and parameter challenges it raises, we introduce a dual-scale attention block and a reconstructing attention block. To demonstrate the effectiveness of this network structure and the superiority of the attention mechanisms, we conducted comparative experiments, in which MNN achieved optimal PCC@M \((1\times 10^1)\) of 9.23 and 8.54 on the lung cancer and mouse brain datasets from the 10x Genomics website, respectively, outperforming several state-of-the-art (SOTA) methods. This reveals the superiority of our method for spatial gene expression prediction.
Panrui Tang, Zuping Zhang, Cui Chen, Yubin Sheng
APFL: Active-Passive Forgery Localization for Medical Images
Abstract
Medical image forgery has become an urgent issue in academia and medicine. Unlike natural images, images in the medical field are so sensitive that even minor manipulation can have severe consequences. Existing forgery localization methods often rely on a single image attribute and suffer from poor generalizability and low accuracy. To this end, we propose a novel active-passive forgery localization (APFL) algorithm that locates the forged regions of medical images attacked by three common forgeries: splicing, copy-move, and removal. It involves two modules: a) active forgery localization, in which we use reversible watermarking to coarsely locate the forged region, and b) passive forgery localization, in which we train a lightweight model named KDU-Net through knowledge distillation to precisely locate the forged region within the coarse localization result. The lightweight KDU-Net student model achieves performance similar to the RRU-Net teacher model, while its model capacity is only \( 24.6\%\) of RRU-Net's, which facilitates fast inference on medical diagnostic devices with limited computational power. Since no publicly available medical tampering datasets exist, we manually produce tampered medical images from the real-world Ophthalmic Image Analysis (OIA) fundus image dataset. The experimental results show that APFL achieves satisfactory forgery localization accuracy under the three common forgeries and is robust to rotation and scaling post-processing attacks.
Nan Wang, Jiaqi Shi, Liping Yi, Gang Wang, Ming Su, Xiaoguang Liu
A Universal Non-parametric Approach for Improved Molecular Sequence Analysis
Abstract
In biological research, it is essential to understand the characteristics and functions of molecular sequences. Neural network-based techniques are widely used for molecular sequence classification, but despite their impressive accuracy, these models often require a substantial number of parameters and large amounts of data. In this work, we present a novel compression-based approach, motivated by [1], which combines simple compression algorithms such as Gzip and Bz2 with the Normalized Compression Distance (NCD) to achieve strong classification performance without relying on handcrafted features or pre-trained models. First, we compress each molecular sequence with a well-known compression algorithm such as Gzip or Bz2. By leveraging the latent structure encoded in the compressed files, we compute the NCD, which derives from Kolmogorov complexity, between each pair of sequences. This yields a distance matrix, from which we generate a kernel matrix using a Gaussian kernel. Next, we employ kernel Principal Component Analysis (PCA) to obtain vector representations of the sequences, capturing important structural and functional information. The resulting representations provide an efficient yet effective basis for molecular sequence analysis and can be used in downstream ML tasks. The proposed approach eliminates the need for computationally intensive Deep Neural Networks (DNNs), with their large parameter counts and data requirements, in favor of a lightweight and universally accessible compression-based model. It also performs exceptionally well in low-resource scenarios, where limited labeled data hinders the effectiveness of DNNs. On a benchmark DNA dataset, our method demonstrates superior predictive accuracy compared to SOTA methods.
Sarwan Ali, Tamkanat E Ali, Prakash Chourasia, Murray Patterson
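
The pipeline in this abstract is concrete enough to sketch end to end. The following is a minimal illustration, not the authors' code; the toy sequences and the gamma value are placeholders.

```python
# Compression-based sequence embedding: gzip -> NCD -> Gaussian kernel -> kPCA.
import gzip
import numpy as np
from sklearn.decomposition import KernelPCA

def c(s: bytes) -> int:
    """Compressed length of a byte string under gzip."""
    return len(gzip.compress(s))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, a computable approximation of the
    (uncomputable) Kolmogorov-complexity-based information distance."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

sequences = [b"ACGTACGTAC", b"ACGTACGTAA", b"TTGGCCAATT"]  # toy DNA examples

# Pairwise NCD distance matrix.
D = np.array([[ncd(a, b) for b in sequences] for a in sequences])

# Gaussian kernel from the distance matrix, then kernel PCA embeddings.
gamma = 1.0  # bandwidth; a tunable hyperparameter
K = np.exp(-gamma * D ** 2)
embeddings = KernelPCA(n_components=2, kernel="precomputed").fit_transform(K)
print(embeddings.shape)  # (3, 2) vectors usable by downstream classifiers
```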
Dynamic GNNs for Precise Seizure Detection and Classification from EEG Data
Abstract
Diagnosing epilepsy requires accurate seizure detection and classification, but traditional manual EEG signal analysis is resource-intensive. Meanwhile, automated algorithms often overlook EEG’s geometric and semantic properties critical for interpreting brain activity. This paper introduces NeuroGNN, a dynamic Graph Neural Network (GNN) framework that captures the dynamic interplay between the EEG electrode locations and the semantics of their corresponding brain regions. The specific brain region where an electrode is placed critically shapes the nature of captured EEG signals. Each brain region governs distinct cognitive functions, emotions, and sensory processing, influencing both the semantic and spatial relationships within the EEG data. Understanding and modeling these intricate brain relationships are essential for accurate and meaningful insights into brain activity. This is precisely where the proposed NeuroGNN framework excels by dynamically constructing a graph that encapsulates these evolving spatial, temporal, semantic, and taxonomic correlations to improve precision in seizure detection and classification. Our extensive experiments with real-world data demonstrate that NeuroGNN significantly outperforms existing state-of-the-art models.
Arash Hajisafi, Haowen Lin, Yao-Yi Chiang, Cyrus Shahabi
A Novel Population Graph Neural Network Based on Functional Connectivity for Mental Disorders Detection
Abstract
Accurate and rapid clinical confirmation of psychiatric disorders based on imaging, symptom, and scale data has long been difficult. Graph neural networks have received increasing attention in recent years due to their advantages in processing unstructured relational data, especially functional magnetic resonance imaging data. However, existing methods have certain drawbacks. Individual-graph methods can provide important biomarkers based on functional connectivity modelling, but their accuracy is low; population-graph methods improve prediction performance by considering the similarity between patients, but lack clinical interpretability. In this study, we propose a functional connectivity-based population graph neural network (FCP-GNN) that possesses excellent classification capability while also providing significant biomarkers for clinical reference. The proposed method has two phases. In the first phase, brain region features are learned per hemisphere and used to identify biomarkers through a local-global dual-channel pooling layer. In the second phase, a heterogeneous population graph is constructed based on gender, and the feature information of same-sex and opposite-sex neighbours is learned separately via a hierarchical feature aggregation module to obtain the final embedding representation. Experimental results show that FCP-GNN achieves state-of-the-art classification performance on two public datasets.
Yuheng Gu, Shoubo Peng, Yaqin Li, Linlin Gao, Yihong Dong
Weighted Chaos Game Representation for Molecular Sequence Classification
Abstract
Molecular sequence analysis is a crucial task in bioinformatics, with applications in drug discovery and disease diagnosis. However, traditional methods for molecular sequence classification are based on sequence alignment, which can be computationally expensive and lack accuracy. Although alignment-free methods exist, they usually do not take full advantage of deep learning (DL) models, since DL models traditionally underperform on tabular data compared to their effectiveness on image data. To address this, we propose a novel approach that classifies molecular sequences using a Chaos Game Representation (CGR). We utilize a k-mers-based frequency chaos game representation (FCGR) to generate 2D images for molecular sequences. Additionally, we incorporate scaling features for the sliding windows, including the Kyte-Doolittle (KD) hydropathy scale, the Eisenberg hydrophobicity scale, a hydrophilicity scale, character flexibility, and a hydropathy scale, to assign weights to the k-mers. By selecting multiple features, we aim to improve the accuracy of molecular sequence classification models. The motivation for weighting the k-mers is that different k-mers may have different levels of importance or relevance to the classification task at hand, and that incorporating additional information, such as hydropathy scales, could improve classification accuracy. The proposed method shows promising results, outperforming baseline methods in molecular sequence classification and providing a new direction for analyzing sequences with image classification techniques.
Taslim Murad, Sarwan Ali, Murray Patterson
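
For readers unfamiliar with chaos game representations, the sketch below shows how k-mer counts populate a 2D image. The per-k-mer weighting with hydropathy-style scales, which is this paper's contribution, is omitted; the corner assignment and the choice k = 3 are conventional, not taken from the paper.

```python
# Frequency chaos game representation (FCGR) of a DNA sequence.
import numpy as np

CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def kmer_to_cell(kmer: str) -> tuple[int, int]:
    """Each base picks a quadrant; k bases address one cell of a 2^k grid."""
    x = y = 0
    for base in kmer:
        cx, cy = CORNERS[base]
        x = (x << 1) | cx
        y = (y << 1) | cy
    return x, y

def fcgr(seq: str, k: int = 3) -> np.ndarray:
    """2^k x 2^k image of k-mer frequencies, usable by image classifiers."""
    img = np.zeros((2 ** k, 2 ** k))
    for i in range(len(seq) - k + 1):
        x, y = kmer_to_cell(seq[i : i + k])
        img[y, x] += 1.0
    return img / img.sum() if img.sum() else img

image = fcgr("ACGTTGCAACGTACGT")  # an (8, 8) input for a CNN
```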
Robust Influence-Based Training Methods for Noisy Brain MRI
Abstract
Correctly classifying brain tumors is imperative to the prompt and accurate treatment of a patient. While several classification algorithms based on classical image processing or deep learning methods have been proposed to rapidly classify tumors in MR images, most assume the unrealistic setting of noise-free training data. In this work, we study a difficult but realistic setting of training a deep learning model on noisy MR images to classify brain tumors. We propose two training methods that are robust to noisy MRI training data, Influence-based Sample Reweighing (ISR) and Influence-based Sample Perturbation (ISP), which are based on influence functions from robust statistics. Using the influence functions, in ISR, we adaptively reweigh training examples according to how helpful/harmful they are to the training process, while in ISP, we craft and inject helpful perturbation proportional to the influence score. Both ISR and ISP harden the classification model against noisy training data without significantly affecting the generalization ability of the model on test data. We conduct empirical evaluations over a common brain tumor dataset and compare ISR and ISP to three baselines. Our empirical results show that ISR and ISP can efficiently train deep learning models robust against noisy training data.
Minh-Hao Van, Alycia N. Carey, Xintao Wu
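
A deliberately simplified sketch of influence-style sample reweighing. True influence functions include an inverse-Hessian term, which is dropped here (identity Hessian) for brevity; the function names and the clamping rule are illustrative stand-ins, not the paper's ISR.

```python
# Score each training example by the agreement of its gradient with the
# gradient of a clean validation batch, then reweigh the training loss.
import torch

def influence_scores(model, loss_fn, train_batch, val_batch):
    """Assumes the model is in eval mode and every parameter receives a grad."""
    xs, ys = train_batch
    xv, yv = val_batch
    model.zero_grad()
    loss_fn(model(xv), yv).backward()
    g_val = torch.cat([p.grad.flatten() for p in model.parameters()])
    scores = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        g_i = torch.cat([p.grad.flatten() for p in model.parameters()])
        scores.append(torch.dot(g_i, g_val))  # >0: helpful, <0: harmful
    model.zero_grad()
    return torch.stack(scores)

def reweigh(scores):
    """Clamp harmful examples to zero weight and normalize, so noisy
    examples contribute less to the weighted training loss."""
    w = torch.clamp(scores, min=0.0)
    return w / (w.sum() + 1e-12)
```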
Co-ReaSON: EEG-based Onset Detection of Focal Epileptic Seizures with Multimodal Feature Representations
Abstract
Early detection of an epileptic seizure's onset is crucial to reduce the impact of seizures on the patient's health. The Electroencephalogram (EEG) has been widely used in clinical epileptology for continuous, long-term measurement of electrical activity in the brain. Despite numerous EEG-based approaches employing diverse models and feature extraction methods for seizure detection, these methods rarely tackle the more challenging task of early detection of the seizure onset, especially as only a few EEG channels are impacted at the onset and the seizure evidence is minimal. Furthermore, EEG-based seizure onset detection remains challenging due to the sparse, imbalanced, and noisy data, as well as the complexity posed by the diverse nature of epileptic seizures in patients. In this paper, we propose Co-ReaSON – a novel approach towards early detection of focal seizure onsets that considers the onset-specific increase in spatio-temporal correlations across the EEG channels, observed over a range of multimodal EEG feature representations and combined in a ResNet18-based model architecture. Evaluation on a real-world dataset demonstrates that Co-ReaSON outperforms the state-of-the-art baselines in focal seizure onset detection by at least 5 percentage points in macro-average F1-score.
Uttam Kumar, Ran Yu, Michael Wenzel, Elena Demidova
A Data-Driven Approach for Building a Cardiovascular Disease Risk Prediction System
Abstract
Cardiovascular disease is a leading cause of mortality worldwide. The disease can develop without apparent symptoms at an early stage, making it difficult for domain experts to intervene. Machine learning techniques are becoming popular in chronic disease prediction because of their ability to process large amounts of data and analyse the patterns buried in them. To give healthcare professionals access to ready-to-use machine learning prediction pipelines, we introduce an automated machine learning system called Auto-Imblearn that processes and analyses imbalanced clinical data, automatically compares different classification algorithms, and applies the best one for prediction. On a real patient dataset, our proposed system achieves the best prediction performance against the state-of-the-art baselines while saving significant computation compared to an exhaustive search.
Hongkuan Wang, Raymond K. Wong, Kwok Leung Ong
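
The kind of search such a system automates can be sketched with off-the-shelf imbalanced-learn components. The candidate resamplers, classifiers, and the AUC criterion below are assumptions for illustration, not Auto-Imblearn's actual search space.

```python
# Try several resampler + classifier pipelines on imbalanced data and keep
# the combination with the best cross-validated ROC AUC.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def best_pipeline(X, y):
    candidates = {
        "smote+lr": Pipeline([("res", SMOTE()),
                              ("clf", LogisticRegression(max_iter=1000))]),
        "smote+rf": Pipeline([("res", SMOTE()),
                              ("clf", RandomForestClassifier())]),
        "under+rf": Pipeline([("res", RandomUnderSampler()),
                              ("clf", RandomForestClassifier())]),
    }
    scored = {
        name: cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
        for name, pipe in candidates.items()
    }
    best = max(scored, key=scored.get)
    return best, candidates[best].fit(X, y)
```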
TFAugment: A Key Frequency-Driven Data Augmentation Method for Human Activity Recognition
Abstract
Data augmentation enhances Human Activity Recognition (HAR) models by diversifying training data through transformations, improving their robustness. However, traditional techniques with random masking pose challenges by introducing randomness that can obscure critical information. This randomness may lead the model to learn incorrect patterns, yielding variable results across datasets and models and diminishing reliability and generalizability in real-world scenarios. To address this issue, this paper introduces Time-Frequency Augmentation (TFAugment), an adaptive method that improves generalizability by selectively enhancing key frequencies across diverse HAR datasets. The proposed method incorporates a FreqMasking module into the network to extract an importance distribution from the incoming frequency channels. This distribution serves as the parameter of a Bernoulli distribution for independent sampling of each frequency channel, thereby generating enriched training data. Experiments on the DSADS, MHEALTH, PAMAP2, and RealWorld-HAR datasets demonstrate TFAugment's superior adaptability and significant performance enhancement compared to state-of-the-art techniques.
Hao Zhang, Bixiao Zeng, Mei Kuang, Xiaodong Yang, Hongfang Gong
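
A conceptual sketch of frequency masking driven by learned channel importance, the mechanism the abstract describes. The `importance_net`, the sigmoid link, and the toy linear layer are stand-ins, not the paper's FreqMasking module.

```python
# Derive a keep-probability per frequency channel and sample an independent
# Bernoulli mask over the spectrum of each signal.
import torch

def freq_masking(x: torch.Tensor, importance_net: torch.nn.Module) -> torch.Tensor:
    """x: (batch, time) sensor signals; importance_net maps a frequency-
    magnitude profile to per-channel logits, squashed to keep probabilities."""
    spec = torch.fft.rfft(x, dim=-1)                   # (batch, freq), complex
    probs = torch.sigmoid(importance_net(spec.abs()))  # (batch, freq) in (0, 1)
    mask = torch.bernoulli(probs)                      # independent per channel
    return torch.fft.irfft(spec * mask, n=x.size(-1), dim=-1)

# Toy usage: a linear layer stands in for the learned importance extractor.
T = 128
net = torch.nn.Linear(T // 2 + 1, T // 2 + 1)
augmented = freq_masking(torch.randn(4, T), net)  # (4, 128) augmented signals
```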
Dfp-Unet: A Biomedical Image Segmentation Method Based on Deformable Convolution and Feature Pyramid
Abstract
U-net is a classic deep network framework in biomedical image segmentation that uses a U-shaped encoder-decoder structure to recognise and segment semantic features, but it uses only the last layer of the decoder for the final prediction, ignoring feature maps of different semantic strengths. In addition, the convolution kernel size used by U-net is fixed, which adapts poorly to unknown variations. Therefore, we propose Dfp-Unet, which builds on deformable convolution and a feature pyramid for biomedical image segmentation. Following the feature pyramid idea, Dfp-Unet adds an additional independent path of convolution and up-sampling operations to each level of the decoder; the output feature maps of all levels are then concatenated to obtain a final feature map containing multiple levels of semantic information for the final prediction. Besides, Dfp-Unet replaces the convolutions in the down-sampling modules of U-net with deformable convolutions. To verify the performance of Dfp-Unet, four image segmentation datasets, Sunnybrook, ISIC2017, Covid19-ct-scans, and ISBI2012, are used to compare Dfp-Unet with existing convolutional neural networks (U-net and U-net++), and the experimental results show that Dfp-Unet has high segmentation accuracy and generalization performance.
Zengzhi Yang, Yubin Wei, Xiao Yu, Jinting Guan
ACHIM: Adaptive Clinical Latent Hierarchy Construction and Information Fusion Model for Healthcare Knowledge Representation
Abstract
We utilize electronic health records (EHR) to forecast the likelihood of a patient succumbing to their current clinical condition, assisting healthcare professionals in promptly identifying clinical emergencies and enabling timely intervention to alter the patient's critical state. Existing healthcare prediction models typically learn a patient's clinical representation from the clinical features of EHR data but frequently disregard the structural information in those features. To address this issue, we propose the Adaptive Clinical latent Hierarchy construction and Information fusion Model (ACHIM), which adaptively constructs a clinical latent hierarchy without prior knowledge and aggregates the learned structural information into the original data to obtain a compact and informative representation of the patient's state. Our experimental results on real-world datasets demonstrate that our model can extract fine-grained representations of patient characteristics from sparse data and significantly improve the performance of mortality prediction tasks on EHR datasets.
Gaohong Liu, Jian Ye, Borong Wang
Using Multimodal Data to Improve Precision of Inpatient Event Timelines
Abstract
Textual data often describe events in time but frequently contain little information about their specific timing, whereas complementary structured data streams may have precise timestamps but omit important contextual information. We investigate this problem in healthcare, where we produce clinician annotations of discharge summaries, with access to either unimodal (text) or multimodal (text and tabular) data, (i) to determine event interval timings and (ii) to train multimodal language models to locate those events in time. We find that our annotation procedures, dashboard tools, and annotations result in high-quality timestamps. Specifically, the multimodal approach produces more precise timestamping, with uncertainties of the lower bound, upper bound, and duration reduced by 42% (95% CI 34–51%), 36% (95% CI 28–44%), and 13% (95% CI 10–17%), respectively. In the classification version of our task, we find that, trained on our annotations, our multimodal BERT model outperforms the unimodal BERT model and Llama-2 encoder-decoder models, with improvements in F1 scores for the upper (10% and 61%, respectively) and lower bounds (8% and 56%, respectively). The code for the annotation tool and the BERT model is available (link).
Gabriel Frattallone-Llado, Juyong Kim, Cheng Cheng, Diego Salazar, Smitha Edakalavan, Jeremy C. Weiss
Adversarial-Robust Transfer Learning for Medical Imaging via Domain Assimilation
Abstract
Extensive research in Medical Imaging aims to uncover critical diagnostic features in patients, with AI-driven medical diagnosis relying on sophisticated machine learning and deep learning models to analyze, detect, and identify diseases from medical images. Despite the remarkable accuracy of these models under normal conditions, they grapple with trustworthiness issues, where their output could be manipulated by adversaries who introduce strategic perturbations to the input images. Furthermore, the scarcity of publicly available medical images, constituting a bottleneck for reliable training, has led contemporary algorithms to depend on pretrained models grounded on a large set of natural images—a practice referred to as transfer learning. However, a significant domain discrepancy exists between natural and medical images, which causes AI models resulting from transfer learning to exhibit heightened vulnerability to adversarial attacks. This paper proposes a domain assimilation approach that introduces texture and color adaptation into transfer learning, followed by a texture preservation component to suppress undesired distortion. We systematically analyze the performance of transfer learning in the face of various adversarial attacks under different data modalities, with the overarching goal of fortifying the model’s robustness and security in medical imaging tasks. The results demonstrate high effectiveness in reducing attack efficacy, contributing toward more trustworthy transfer learning in biomedical applications.
Xiaohui Chen, Tie Luo
Backmatter
Metadata
Title
Advances in Knowledge Discovery and Data Mining
Editors
De-Nian Yang
Xing Xie
Vincent S. Tseng
Jian Pei
Jen-Wei Huang
Jerry Chun-Wei Lin
Copyright Year
2024
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-97-2238-9
Print ISBN
978-981-97-2240-2
DOI
https://doi.org/10.1007/978-981-97-2238-9
