Intelligent Information Processing XII
13th IFIP TC 12 International Conference, IIP 2024, Shenzhen, China, May 3–6, 2024, Proceedings, Part I
- 2024
- Book
- Editors
- Zhongzhi Shi
- Jim Torresen
- Shengxiang Yang
- Publisher
- Springer Nature Switzerland
About this book
The two-volume set IFIP AICT 703 and 704 constitutes the refereed conference proceedings of the 13th IFIP TC 12 International Conference on Intelligent Information Processing XII, IIP 2024, held in Shenzhen, China, during May 3–6, 2024.
The 49 full papers and 5 short papers presented in these proceedings were carefully reviewed and selected from 58 submissions.
The papers are organized in the following topical sections:
Volume I: Machine Learning; Natural Language Processing; Neural and Evolutionary Computing; Recommendation and Social Computing; Business Intelligence and Risk Control; and Pattern Recognition.
Volume II: Image Understanding.
Table of Contents
- Frontmatter
Machine Learning
- Frontmatter
- Dual Contrastive Learning for Anomaly Detection in Attributed Networks
Shijie Xue, He Kong, Qi Wang
The chapter focuses on the critical issue of anomaly detection in attributed networks, which model complex real-world scenarios by including both node interactions and rich attributes. Traditional methods often fall short in effectively identifying anomalies at different levels. The proposed Cobra method addresses this by employing a dual contrastive learning framework that considers both contextual and behavioral anomalies. By sampling subgraphs and utilizing self-supervised learning strategies, Cobra captures the intricate relationships and behaviors within the network, providing a more accurate and comprehensive evaluation of node abnormality. Extensive experiments on various datasets demonstrate the superior performance of Cobra compared to state-of-the-art methods, highlighting its potential in enhancing anomaly detection across diverse applications.
This summary of the content was generated with the help of AI.
Abstract
Anomaly detection in attributed networks has been crucial in many critical domains and has gained significant attention in recent years. However, most existing methods fail to capture the complexity of anomalous patterns at different levels with suitable supervision signals. To address this issue, we propose a novel dual contrastive self-supervised learning method for attributed network anomaly detection. Specifically, our approach relies on two major components to determine the anomaly of nodes. The first component assesses self-consistency by determining whether a target node’s attributes are consistent with its contextual environment. The second component evaluates behavioral consistency by analyzing the relationships and interaction patterns between the target node and its one-hop neighbors, which determines if the behavior of these neighbors aligns with the expected pattern of the target node. Accordingly, our method designs two types of contrastive instance pairs to fully exploit the structural and attribute information for detecting anomalous nodes at different levels with respect to the two focused consistencies. This approach is more effective in detecting anomalies and mitigates the limitations of previous methods. We evaluated our method on six benchmark datasets, and the experimental results demonstrate the superiority of our method against state-of-the-art methods.
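The two consistency checks described in the abstract can be sketched as a toy scoring function. Everything here (the function names, the cosine measure, the equal weighting `alpha`) is an illustrative assumption, not the authors' Cobra implementation:

```python
import math

def cosine(u, v):
    # cosine similarity between two attribute vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def anomaly_score(node_attr, context_attrs, neighbor_attrs, alpha=0.5):
    """Blend two consistency views: the node against the mean of its sampled
    context subgraph, and the node against its one-hop neighbors."""
    context_mean = [sum(col) / len(context_attrs) for col in zip(*context_attrs)]
    self_consistency = cosine(node_attr, context_mean)
    behavior_consistency = sum(cosine(node_attr, n) for n in neighbor_attrs) / len(neighbor_attrs)
    # low consistency on either view drives the anomaly score up
    return alpha * (1 - self_consistency) + (1 - alpha) * (1 - behavior_consistency)
```

A node whose attributes disagree with both its context and its neighbors scores close to 1, while a well-embedded node scores close to 0.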
- Online Learning in Varying Feature Spaces with Informative Variation
Peijia Qin, Liyan Song
The chapter addresses the limitations of classical online learning, which assumes a constant feature space. It introduces the concept of the Varying Feature Space (VFS), where features can appear and disappear over time. The research focuses on informative variations that can indicate class labels, enhancing predictive performance. The proposed approach, OVFIV, combines a sparse learner for the variation space with an ensemble method to integrate predictions from both the feature and variation streams. Experimental results demonstrate the effectiveness of this method on various datasets, highlighting its potential to significantly improve predictive models in dynamic feature spaces.
Abstract
Most conventional literature on online learning implicitly assumes a static feature space. However, in real-world applications, the feature space may vary over time due to the emergence of new features and the vanishing of outdated features. This phenomenon is referred to as online learning with a Varying Feature Space (VFS). Recently, there has been increasing attention towards exploring this online learning paradigm. However, none of the existing approaches have taken into account the potentially informative signal conveyed by the presence or absence (i.e., the variation in this paper) of each feature. This means that the existence of certain features in the VFS can be correlated with the class labels. If properly utilized in the learning process, such information can potentially enhance predictive performance. To this end, we formally define and present a learning framework to address this specific learning scenario, which we refer to as Online learning in Varying Feature space with Informative Variation (abbreviated as OVFIV). The framework aims to answer two key questions: how to learn a model that captures the association between the existence of features and the class labels, and how to incorporate this information into the prediction process to improve performance. The validity of our proposed method is verified through theoretical analyses and empirical studies conducted on 17 datasets from diverse fields.
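The variation-stream idea can be sketched with deliberately simple stand-ins. The counting learner and fixed ensemble weight below are illustrative assumptions; the actual OVFIV framework uses a sparse learner for the variation space and a principled ensemble:

```python
class VariationLearner:
    """Toy learner for the variation (presence/absence) stream: it tracks,
    per feature, how often presence co-occurs with the positive class."""
    def __init__(self, universe):
        self.universe = universe
        self.pos = {f: 1 for f in universe}    # Laplace-smoothed counts
        self.total = {f: 2 for f in universe}

    def update(self, instance, label):
        for f in self.universe:
            if f in instance:  # the feature exists in this instance
                self.total[f] += 1
                self.pos[f] += label

    def predict(self, instance):
        # estimated P(y=1) from the presence pattern alone
        probs = [self.pos[f] / self.total[f] for f in self.universe if f in instance]
        return sum(probs) / len(probs) if probs else 0.5

def combine(p_feature_stream, p_variation_stream, w=0.7):
    # ensemble of the feature-stream and variation-stream predictions
    return w * p_feature_stream + (1 - w) * p_variation_stream
```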
- Towards a Flexible Accuracy-Oriented Deep Learning Module Inference Latency Prediction Framework for Adaptive Optimization Algorithms
Jingran Shen, Nikos Tziritas, Georgios Theodoropoulos
The chapter discusses the challenges of deploying large deep neural networks in resource-constrained edge environments and the need for adaptive optimization algorithms. It introduces a new framework that allows flexible configurations of input parameters and automatic selection of regression models for predicting module inference latency. The proposed Multi-task Encoder-Decoder Network (MEDN) is highlighted as a more accurate and efficient alternative to existing regression models. The framework's ability to measure device dynamics and handle various input parameters, including Inferable Parameters, is emphasized. Experimental results demonstrate MEDN's superior performance and the effectiveness of the Time/Space-efficient Auto-selection algorithm. Future research directions are also outlined, focusing on further enhancing the framework's capabilities.
Abstract
With the rapid development of Deep Learning, more and more applications on the cloud and edge tend to utilize large DNN (Deep Neural Network) models for improved task execution efficiency as well as decision-making quality. Due to memory constraints, models are commonly optimized using compression, pruning, and partitioning algorithms to become deployable onto resource-constrained devices. As the conditions in the computational platform change dynamically, the deployed optimization algorithms should accordingly adapt their solutions. To perform frequent evaluations of these solutions in a timely fashion, RMs (Regression Models) are commonly trained to predict the relevant solution quality metrics, such as the resulting DNN module inference latency, which is the focus of this paper. Existing prediction frameworks specify different RM training workflows, but none of them allow flexible configurations of the input parameters (e.g., batch size, device utilization rate) or of the selected RMs for different modules. In this paper, a deep learning module inference latency prediction framework is proposed, which i) hosts a set of customizable input parameters to train multiple different RMs per DNN module (e.g., convolutional layer) with self-generated datasets, and ii) automatically selects a set of trained RMs leading to the highest possible overall prediction accuracy, while keeping the prediction time/space consumption as low as possible. Furthermore, a new RM, namely MEDN (Multi-task Encoder-Decoder Network), is proposed as an alternative solution. Comprehensive experiment results show that MEDN is fast and lightweight, and capable of achieving the highest overall prediction accuracy and R-squared value. The Time/Space-efficient Auto-selection algorithm also manages to improve the overall accuracy by 2.5% and R-squared by 0.39%, compared to the MEDN single-selection scheme.
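The framework's two ingredients, per-module regression models (RMs) and accuracy-driven auto-selection, can be illustrated with a deliberately simple setup. A single input parameter (batch size) and mean absolute error as the selection criterion are assumptions for the sketch; MEDN itself is a neural model and is not reproduced here:

```python
def fit_linear(xs, ys):
    # least-squares line: latency = a * batch_size + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def auto_select(candidates, xs_val, ys_val):
    # pick the trained RM with the lowest mean absolute error on held-out data
    def mae(model):
        return sum(abs(model(x) - y) for x, y in zip(xs_val, ys_val)) / len(ys_val)
    return min(candidates, key=mae)
```

In the paper's setting this selection runs per DNN module, trading a little prediction time/space for overall accuracy.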
- Table Orientation Classification Model Based on BERT and TCSMN
Dawei Jin, Rongxin Mi, Tianhang Song
The chapter focuses on the classification of table orientations using a deep learning model that combines BERT for contextual understanding and TCSMN for capturing sequential features. The model is designed to handle the diverse structures of tables found in scientific literature, offering a more accurate and efficient method for table analysis. The authors introduce row and column-based attention mechanisms to enhance the extraction of structural semantic features, contributing to the model's high performance. Experimental results demonstrate that the proposed TableTC model outperforms traditional and deep learning baselines, showcasing its effectiveness in table classification tasks. The chapter also discusses related work and future research directions, making it a valuable resource for professionals and researchers in the field of natural language processing and data analysis.
Abstract
Tables are commonly used for structuring and consolidating knowledge, significantly enhancing the efficiency for human readers to acquire relevant information. However, due to their diverse structures and open domains, employing computational methods for their automatic analysis remains a substantial challenge. Among these challenges, accurately classifying the forms of tables is fundamental for achieving deep comprehension and analysis, forming the basis for understanding, retrieving, and extracting knowledge within tables. Common table formats include row tables, column tables, and matrix tables, where data is arranged in rows, columns, and combinations of rows and columns, respectively. This paper introduces a novel approach for table classification based on the neural network model TableTC. TableTC initially utilizes fine-tuning of the BERT pre-trained model to comprehend table content. Additionally, it proposes an improved Temporal Convolutional Network (TCN) named Temporal Convolutional Sparse Multilayer Perceptron Network (TCSMN). This network captures sequential structural features of cells and their surrounding neighbors, enhancing the ability to extract semantic features and positions. Finally, it employs an attention mechanism to further augment the capability of extracting row-column positions and semantic features. The evaluation of our proposed method is conducted using table data from scientific literature found on the PubMed Central website. Experimental results demonstrate that TableTC achieves a 2.7% improvement in table classification accuracy, as measured by the F1 score, compared to previous state-of-the-art methods on this dataset.
- Divide-and-Conquer Strategy for Large-Scale Dynamic Bayesian Network Structure Learning
Hui Ouyang, Cheng Chen, Ke Tang
This chapter introduces a divide-and-conquer strategy for large-scale dynamic Bayesian network structure learning, focusing on 2 Time-sliced Bayesian Networks (2-TBNs). The method, adapted from the Partition-Estimation-Fusion (PEF) strategy used in static Bayesian networks, demonstrates significant improvements in accuracy and efficiency. By leveraging prior knowledge of 2-TBNs, the approach enhances the learning process, particularly for the transition model. Experimental validation using large-scale datasets shows that the proposed strategy outperforms baseline methods in terms of edge classification accuracy and runtime. The chapter also highlights the potential for further improvements in the partition and fusion phases, and discusses future research directions.
Abstract
Dynamic Bayesian Networks (DBNs), renowned for their interpretability, have become increasingly vital in representing complex stochastic processes in various domains such as gene expression analysis, healthcare, and traffic prediction. Structure learning of DBNs from data is a challenging endeavor, particularly for datasets with thousands of variables. Most current algorithms for DBN structure learning are adaptations from those used in static Bayesian Networks (BNs), and are typically focused on smaller-scale problems. In order to solve large-scale problems while taking full advantage of existing algorithms, this paper introduces a novel divide-and-conquer strategy, originally developed for static BNs, and adapts it for large-scale DBN structure learning. Additionally, we leverage the prior knowledge of 2 Time-sliced BNs (2-TBNs), a special class of DBNs, to enhance the performance of this strategy. Our approach significantly improves the scalability and accuracy of 2-TBN structure learning. Designed experiments demonstrate the effectiveness of our method, showing substantial improvements over existing algorithms in both computational efficiency and structure learning accuracy. In problem instances with more than 1,000 variables, our proposed approach on average improves two accuracy metrics by 74.45% and 110.94%, respectively, while reducing runtime by an average of 93.65%. Moreover, in problem instances with more than 10,000 variables, our proposed approach successfully completed the task in a matter of hours, whereas the baseline algorithm failed to produce a reasonable result within a one-day runtime limit.
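The adapted Partition-Estimation-Fusion (PEF) strategy can be outlined as a skeleton. Here `learn_subnet` and `fuse` are placeholder interfaces standing in for an off-the-shelf structure learner run per block and for the paper's fusion phase; the round-robin partition is likewise only illustrative:

```python
def partition(variables, k):
    # Partition: split the variable set into k roughly equal blocks
    return [variables[i::k] for i in range(k)]

def pef(variables, data, learn_subnet, fuse, k=4):
    """Partition-Estimation-Fusion skeleton. Each block is handled by an
    existing structure learner, and the partial edge sets are merged into
    one network, so the expensive learner only ever sees small problems."""
    blocks = partition(variables, k)
    partial_edges = [learn_subnet(block, data) for block in blocks]
    return fuse(partial_edges)
```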
- Entropy-Based Logic Explanations of Differentiable Decision Tree
Yuanyuan Liu, Jiajia Zhang, Yifan Li
This chapter delves into the challenge of interpreting complex decision-making processes in deep reinforcement learning. By leveraging entropy-based logic explanations, the authors introduce a method to actively intervene in the training of differentiable decision trees, reducing parameter explosion and enhancing interpretability. Experimental results demonstrate that this approach not only maintains high performance but also achieves superior interpretability compared to baseline methods. The novelty lies in the use of entropy penalty terms and state preprocessing techniques, which steer the training process towards more explainable models. The chapter concludes with compelling experimental evidence, showcasing the effectiveness of the proposed method in multiple reinforcement learning environments.
Abstract
Explainable reinforcement learning has evolved rapidly over the years because transparency of the model’s decision-making process is crucial in some important domains. Differentiable decision trees have been applied to this field due to their performance and interpretability. However, the number of parameters per branch node of a differentiable decision tree is related to the state dimension. When the feature dimension of states increases, the number of states considered by the model in each branch node decision also increases linearly, which increases the difficulty of human understanding. This paper proposes an entropy-based differentiable decision tree, which restricts each branch node to use as few features as possible for prediction during the training process. After training is completed, the parameters that have little impact on the output of the branch node are blocked, significantly reducing the decision complexity of each branch node. Experiments in multiple environments demonstrate the significant interpretability advantage of our proposed approach.
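The entropy penalty on a branch node can be sketched directly. The exact regularizer in the chapter may differ; this is only an illustration of the idea that a low-entropy weight vector concentrates on few features:

```python
import math

def entropy_penalty(branch_weights, eps=1e-12):
    # Shannon entropy of the normalized absolute weights of one branch node;
    # adding this term to the training loss pushes each node to rely on as
    # few input features as possible, making the learned rule easier to read
    total = sum(abs(w) for w in branch_weights) + eps
    p = [abs(w) / total for w in branch_weights]
    return -sum(q * math.log(q) for q in p if q > 0)
```

A node that uses a single feature incurs near-zero penalty, while a node that spreads weight evenly over all features incurs the maximum.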
- Deep Friendly Embedding Space for Clustering
Haiwei Hou, Shifei Ding, Xiao Xu, Lili Guo
The chapter 'Deep Friendly Embedding Space for Clustering' delves into the advancements of deep clustering methods, highlighting the limitations of traditional clustering algorithms in handling large-scale, high-dimensional datasets. It introduces a unified deep clustering algorithm that leverages autoencoders for feature extraction and dimensionality reduction, incorporating deep metric learning to enhance feature discrimination. The proposed algorithm preserves the data manifold structure, leading to improved clustering results. Experimental validation on benchmark datasets and a case study on rolling bearing fault diagnosis showcase the algorithm's superior performance and potential for industrial applications. The chapter concludes by discussing future research directions, emphasizing the importance of semi-supervised learning and pseudo-label technology in guiding neural networks to learn more suitable representations for clustering.
Abstract
Deep clustering has powerful capabilities of dimensionality reduction and non-linear feature extraction, superior to conventional shallow clustering algorithms. Deep learning and clustering can be unified through one objective function, significantly improving clustering performance. However, the features of the embedding space may be redundant and ignore the preserved manifold. Besides, the features lack discriminative power, which hinders clustering performance. To solve these problems, this paper proposes a novel algorithm that improves the discriminative power of features, filters redundant features, and preserves manifold structures for clustering. Firstly, it reduces the dimensionality of the embedding again to filter redundancy and preserve the manifold of the features. Then it improves the discriminative power of the representation by reducing the intra-class distance. Performance evaluation is carried out on four benchmark datasets and a case study of engineering applications. Comparison with state-of-the-art algorithms indicates that our algorithm performs favorably and demonstrates good potential for real-world applications.
- Bayesian Personalized Sorting Based on Time Factors and Hot Recommendations
Wenhua Zeng, Junjie Liu, Bo Zhang
The chapter introduces a Bayesian Personalized Ranking model, BPR-TH, designed to address information overload in digital libraries. By incorporating time factors and hot recommendations, BPR-TH effectively handles massive distributed data and cold start problems, outperforming traditional BPR and File-path algorithms. The model is realized through user behavior feature extraction, model construction, and optimization, resulting in improved personalized recommendations both online and offline. Experimental results demonstrate BPR-TH's superior performance in accuracy, coverage, and recall, making it a promising solution for personalized digital library recommendations.
Abstract
Aiming at the problems of strict preference judgment and cold start in Bayesian personalized ranking (BPR), an improved ranking model is proposed, which considers the influence of time and incorporates hot recommendations. By extracting user behavior features, constructing an optimized BPR model, and processing recommendation results, we establish BPR-TH for realizing personalized online (or offline) recommendation of digital library information. Compared with two similar algorithms, the experimental results show that this model performs better.
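A pairwise BPR loss with a time factor might look like the following sketch. The exponential half-life decay is an assumption for illustration and not necessarily the BPR-TH formulation; the standard BPR term is the negative log-sigmoid of the score margin:

```python
import math

def bpr_pair_loss(score_pos, score_neg, days_since, half_life=30.0):
    """BPR pairwise loss with a hypothetical exponential time factor:
    recent interactions contribute more to the update than stale ones."""
    time_weight = 0.5 ** (days_since / half_life)
    margin_prob = 1.0 / (1.0 + math.exp(-(score_pos - score_neg)))
    return -time_weight * math.log(margin_prob)
```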
- Design and Implementation of Risk Control Model Based on Deep Ensemble Learning Algorithm
Maoguang Wang, Ying Cui
The chapter delves into the critical issue of credit risk in internet credit loans and proposes a groundbreaking credit risk control model based on deep ensemble learning. By building a two-layer ensemble learner, the model effectively identifies potential defaulting users, achieving an impressive F1-Score of 0.98 on the Lending Club credit dataset. This innovative approach outperforms conventional methods like logistic regression and decision trees, showcasing excellent generalization capabilities. The model's architecture, including the selection of base learners and ensemble methods, is meticulously designed to capture both general and nuanced patterns in the data. The chapter also provides a comprehensive analysis of related work, experimental results, and future research directions, making it a valuable resource for professionals seeking to enhance credit risk management strategies.
Abstract
This paper aims to explore the concept of “depth” through the selection of various ensemble methods and proposes a practical deep ensemble learning method. In this study, we propose a nested ensemble learning method. First, we employ the stacking framework for selective ensemble learning. Next, we integrate the stacked ensemble with bagging and boosting techniques to create a comprehensive stacked ensemble. We utilized both domestic and foreign online loan data to build the model and test its ability to generalize. The experimental results demonstrate that the nested ensemble proposed in this paper outperforms models such as logistic regression and support vector machines, showing exceptional generalization ability.
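The two-layer structure of a stacked ensemble can be sketched in a few lines, with majority vote standing in for the trained level-2 meta learner (the paper's base learners would themselves be bagging/boosting ensembles):

```python
def majority(votes):
    # trivial meta-learner: majority vote over the level-1 predictions
    return max(set(votes), key=votes.count)

def stacked_predict(base_models, meta_learner, x):
    """Two-layer ensemble: base learners produce level-1 predictions,
    and the meta learner combines them into the final decision."""
    level1 = [model(x) for model in base_models]
    return meta_learner(level1)
```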
- More Teachers Make Greater Students: Compression of CycleGAN
Xiaoxi Liu, Lin Lv, Ju Liu, Yanyang Han, Mengnan Liang, Xiao Jiang
The chapter introduces the MGFD framework, designed to compress CycleGAN models effectively. By integrating an Inception-enhanced network and a multi-granularity distillation scheme, MGFD simplifies the compression process and reduces computational costs. The framework eliminates the need for a discriminator, optimizing the student generator directly through knowledge distillation. Experimental results demonstrate that MGFD outperforms existing methods in terms of computational efficiency and image quality, making it a promising solution for practical applications on mobile devices and IoT systems.
Abstract
Generative Adversarial Networks (GANs) have obtained outstanding performance in image-to-image translation. Nevertheless, their applications are greatly limited due to high computational costs. Although past work on compressed GANs has yielded rich results, most still come at the expense of image quality. Therefore, in order to generate high-quality images and simplify the process of distillation, we propose a framework with a more-generators, fewer-discriminators (MGFD) strategy to enhance online knowledge distillation with high-quality images. First, we introduce the Inception-enhanced residual block into our enhanced teacher generator, which significantly improves image quality at a low cost. Then, the multi-granularity online knowledge distillation method is adopted and simplified by selecting a wider Inception-enhanced teacher generator. In addition, we also combine the intermediate-layer distillation losses to help the student generator obtain diverse features and more supervised signals from the intermediate layers for better transformations. Experiments demonstrate that our framework can significantly reduce computational costs and generate more natural images.
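Discriminator-free distillation reduces to making the student imitate the teacher at the output and at intermediate layers. The composite loss below is an illustrative sketch of that idea, with `beta` an assumed weighting rather than a value from the paper:

```python
def mse(u, v):
    # mean squared error between two flattened activation vectors
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def distill_loss(student_out, teacher_out, student_feats, teacher_feats, beta=0.1):
    """Output-level distillation plus intermediate-layer terms; no
    discriminator is involved, the student only imitates the teacher."""
    output_term = mse(student_out, teacher_out)
    feature_term = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats))
    return output_term + beta * feature_term
```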
- Hybrid Integrated Dimensionality Reduction Method Based on Conformal Homeomorphism Mapping
Bianping Su, Chaoyin Liang, Chunkai Wang, Yufan Guo, Shicong Wu, Yan Chen, Longqing Zhang, Jiao Peng
This chapter introduces a hybrid integrated dimensionality reduction method that leverages conformal homeomorphism mapping to address the challenges of maintaining geometric structure and reversibility in data reduction. The method combines linear, nonlinear, and hybrid dimensionality reduction techniques to ensure that the intrinsic rigidity and geometric topology structure of the data are preserved. The chapter discusses various dimensionality reduction methods, including Principal Component Analysis (PCA), Locally Linear Embedding (LLE), and Laplacian Eigenmap (LE), highlighting their advantages and limitations. The proposed method integrates these techniques to create a robust dimensionality reduction framework that is both efficient and interpretable. The chapter also includes a detailed explanation of the conformal homeomorphism mapping process and its application to text data, demonstrating the method's effectiveness through experimental results. By reading this chapter, professionals in the field of data science and machine learning will gain valuable insights into advanced dimensionality reduction techniques and their practical applications.
Abstract
Based on the theories of Riemannian surfaces, topology, and analytic functions, a novel method for dimensionality reduction is proposed in this paper. This approach utilizes FCA to merge highly correlated features to obtain approximately independent new features locally, and establishes a conformal homeomorphic function to realize global dimensionality reduction for text data with the manifold embedded in Hausdorff space. During the process of dimensionality reduction, the geometric topological structure of the original data is preserved through the conformal homeomorphic function. This method is characterized by its simplicity, effectiveness, and low complexity; it avoids the neighbor problem of nonlinear dimensionality reduction and handles outlier data well. Moreover, it is extensible to new text vectors and to new features from sub-vectors of new text vectors, and supports incremental operation without involving existing documents. The mapping function exhibits desirable properties, resulting in stable, reliable, and interpretable dimensionality reduction outcomes. Experimental results on both a construction laws and regulations dataset and the toutiao text dataset demonstrate that this dimensionality reduction technique is effective when combined with the typical classification methods of Random Forest, Support Vector Machine, and Feedforward Neural Network.
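The local merging step can be illustrated with a simple stand-in: greedily averaging feature columns whose correlation is high. The chapter uses FCA for this step; Pearson correlation here is only a sketch of what "highly correlated" means:

```python
def pearson(xs, ys):
    # Pearson correlation coefficient of two feature columns
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def merge_correlated(columns, threshold=0.95):
    # greedily average together feature columns whose |correlation| exceeds
    # the threshold, leaving approximately independent local features
    merged = []
    for col in columns:
        for i, rep in enumerate(merged):
            if abs(pearson(col, rep)) >= threshold:
                merged[i] = [(a + b) / 2 for a, b in zip(rep, col)]
                break
        else:
            merged.append(col)
    return merged
```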
Natural Language Processing
- Frontmatter
- Are Mixture-of-Modality-Experts Transformers Robust to Missing Modality During Training and Inferring?
Yan Gao, Tong Xu, Enhong Chen
The chapter delves into the robustness of Mixture-of-Modality-Experts Transformers (MoME) when faced with missing modality during training and inference. It begins by discussing the limitations of current multi-modal Transformers and the need for models that can handle incomplete data. The authors propose a series of sub-questions to guide their research and conduct experiments to compare the robustness of MoME Transformers with vanilla Transformers. They also explore the use of multi-task learning and data imputation techniques, such as Mixup, to improve the model's performance. The chapter concludes with a novel method based on MoME Transformers and multi-task learning, which demonstrates high robustness to missing modalities with no extra computational requirements. The method is validated through experiments on three popular multi-modal datasets, showing significant improvements over existing approaches.
Abstract
It is commonly seen that imperfect multi-modal data with missing modality appears in realistic application scenarios, which usually breaks the data completeness assumption of multi-modal analysis. Therefore, large efforts in the multi-modal learning community have been made on robust solutions for modality-missing data. Recently, pre-trained models based on Mixture-of-Modality-Experts (MoME) Transformers have been proposed, which achieved competitive performance in various downstream tasks by utilizing different experts of feed-forward networks for single/multi-modal inputs. One natural question arises: are Mixture-of-Modality-Experts Transformers robust to missing modality? To that end, in this paper, we conduct a deep investigation of the MoME Transformer under the missing modality problem. Specifically, we propose a novel multi-task learning strategy, which leverages a uniform model to handle missing modalities during training and inference. In this way, the MoME Transformer is empowered with robustness to missing modality. To validate the effectiveness of our proposed method, we conduct extensive experiments on three popular datasets, which indicate our method outperforms the state-of-the-art (SOTA) methods by a large margin.
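Expert routing by available modalities, the mechanism the question turns on, can be sketched as a dispatch table. This is a toy; in a real MoME Transformer the "experts" are feed-forward sub-networks inside each Transformer block, not standalone functions:

```python
def present_modalities(inputs):
    # which modalities are actually available for this sample
    return frozenset(name for name, value in inputs.items() if value is not None)

def mome_route(inputs, experts):
    """Dispatch to the expert matching the modalities that are present, so a
    sample with a missing modality still has a dedicated forward path."""
    return experts[present_modalities(inputs)](inputs)
```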
- Question Answering Systems Based on Pre-trained Language Models: Recent Progress
Xudong Luo, Ying Luo, Binxia Yang
The chapter delves into the pivotal role of Pre-trained Language Models (PLMs) in enhancing Question Answering Systems (QASs), emphasizing their superiority in understanding complex language patterns and providing accurate, relevant answers. It explores various PLM-based methods for information retrieval, QA performance improvement, and addressing other QA challenges. Additionally, the chapter highlights the applications of PLM-based QASs in domains such as legal, medical, and multimodal QA, showcasing their versatility and potential. The discussion includes performance evaluations and future research directions, making it a valuable resource for professionals seeking to understand the state-of-the-art in QASs.
Abstract
Although the Pre-trained Language Model (PLM) ChatGPT is highly successful as a Question-Answering System (QAS), it is still necessary to study QASs based on PLMs further. In this paper, we survey state-of-the-art systems of this kind, identify the issues that current researchers are concerned about, explore various PLM-based methods for addressing them, and compare their pros and cons. We also discuss the datasets used for fine-tuning the corresponding PLMs and evaluating these PLM-based methods. Moreover, we summarise the criteria for evaluating these methods and compare their performance against these criteria. Finally, based on our analysis of the state-of-the-art PLM-based methods for QA, we identify some challenges for future research.
- A BERT-Based Model for Legal Document Proofreading
Jinlong Liu, Xudong Luo
The chapter introduces a BERT-based model for legal document proofreading, addressing the critical need for accuracy and precision in legal texts. The model integrates two structurally identical MLMs with varying training methods to enhance performance. It includes a grammar check module and a spelling check module, complementing each other to correct errors effectively. The spelling check module employs a Limiter algorithm to balance precision and recall by considering pinyin similarity. The model is trained on a million artificially generated legal document sentences, demonstrating superior performance compared to baseline models and large language models. The chapter also discusses the model's architecture, training process, and experimental evaluation, highlighting its potential to revolutionize legal document proofreading.
Abstract
Legal documents require high precision and accuracy in language use, leaving no room for grammatical and spelling errors. To address the issue, this paper proposes a novel application of the BERT pre-trained language model for legal document proofreading. The BERT-based model is trained to detect and correct legal texts’ grammatical and spelling errors. On a dataset of annotated legal documents, we experimentally show that our BERT-based model significantly outperforms state-of-the-art proofreading models in precision, recall, and F1 score, showing its potential as a valuable tool in legal document preparation and revision processes. The application of such advanced deep learning techniques could revolutionise the field of legal document proofreading, enhancing accuracy and efficiency.
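The Limiter idea (accept a spelling correction only when it stays phonetically close to the original) can be sketched as follows. Positional character overlap stands in for the pinyin similarity described in the summary, so both the similarity measure and the threshold are assumptions:

```python
def char_overlap(a, b):
    # crude stand-in for pinyin similarity: fraction of matching positions
    shared = sum(1 for x, y in zip(a, b) if x == y)
    return shared / max(len(a), len(b))

def limiter(original, candidate, similarity=char_overlap, threshold=0.5):
    """Accept a candidate correction only when it stays close to the
    original token, trading some recall for precision."""
    return candidate if similarity(original, candidate) >= threshold else original
```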
- Entity Relation Joint Extraction with Data Augmentation Based on Large Language Model
Manman Zhang, Shuocan Zhu, Jingmin Zhang, Yu Han, Xiaoxuan Zhu, Leilei Zhang
The chapter delves into the application of large language models for entity relation extraction, focusing on data augmentation techniques to address data scarcity. It introduces two methods—Entity pairs-Dominant Data Generation (EDDG) and Relation-Dominant Data Generation (RDDG)—using ChatGPT for generating annotated data. The study also emphasizes the importance of prompt engineering strategies, such as expression diversity, length diversity, and domain diversity, to enhance model performance. Experimental results on the DuIE dataset showcase the effectiveness of these strategies, highlighting a notable increase in F1 scores across various models. The chapter concludes by validating the proposed methods and strategies, demonstrating their potential for improving entity relation extraction tasks.
Abstract
Entity relation extraction aims to identify entities and their semantic relationships from unstructured text. To address issues like cascading errors and redundant information found in current joint extraction methods, a One-Module One-Step model is adopted. Additionally, in overcoming challenges related to limited annotated data and the tendency of neural networks to overfit, this paper introduces a method leveraging data augmentation based on a large language model. The approach utilizes five data augmentation strategies to improve the accuracy of triple extraction. Conducting experiments on the augmented dataset reveals significant enhancements in evaluation metrics compared to unaugmented data. In entity relation extraction tasks, the proposed method demonstrates a notable boost, increasing accuracy and F1 scores by 7.3 and 8.5 percentage points, respectively. Moreover, it shows a positive impact on the non-prompting strategy, elevating accuracy and F1 scores by 9.4 and 9.1 percentage points, respectively. These experiments affirm the effectiveness of data augmentation based on a large language model in improving entity relation extraction tasks.
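The two generation strategies named in the summary reduce to differently anchored prompts sent to the LLM. The wording below is illustrative, not the paper's actual prompt text:

```python
def eddg_prompt(head, tail, n=3):
    # Entity pairs-Dominant Data Generation: fix the entity pair and ask the
    # model to propose sentences expressing the relation between them
    return (f"Write {n} different sentences that mention both '{head}' and "
            f"'{tail}' and clearly express the relation between them.")

def rddg_prompt(relation, n=3):
    # Relation-Dominant Data Generation: fix the relation type and ask the
    # model to instantiate it with new entity pairs
    return (f"Write {n} sentences, each expressing the relation '{relation}' "
            f"between a different pair of entities.")
```

Varying the phrasing, sentence length, and domain of such templates corresponds to the expression-, length-, and domain-diversity strategies the chapter discusses.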
- Title
- Intelligent Information Processing XII
- Editors
- Zhongzhi Shi
- Jim Torresen
- Shengxiang Yang
- Copyright Year
- 2024
- Publisher
- Springer Nature Switzerland
- Electronic ISBN
- 978-3-031-57808-3
- Print ISBN
- 978-3-031-57807-6
- DOI
- https://doi.org/10.1007/978-3-031-57808-3