
This two-volume set of LNAI 12340 and LNAI 12341 constitutes the refereed proceedings of the 9th CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2020, held in Zhengzhou, China, in October 2020.

The 70 full papers, 30 poster papers and 14 workshop papers presented were carefully reviewed and selected from 320 submissions. They are organized in the following areas: Conversational Bot/QA; Fundamentals of NLP; Knowledge Base, Graphs and Semantic Web; Machine Learning for NLP; Machine Translation and Multilinguality; NLP Applications; Social Media and Network; Text Mining; and Trending Topics.

### DCA: Diversified Co-attention Towards Informative Live Video Commenting

We focus on the task of Automatic Live Video Commenting (ALVC), which aims to generate real-time video comments with both video frames and other viewers’ comments as inputs. A major challenge in this task is how to properly leverage the rich and diverse information carried by video and text. In this paper, we aim to collect diversified information from video and text for informative comment generation. To achieve this, we propose a Diversified Co-Attention (DCA) model for this task. Our model builds bidirectional interactions between video frames and surrounding comments from multiple perspectives via metric learning, to collect a diversified and informative context for comment generation. We also propose an effective parameter orthogonalization technique to avoid excessive overlap of information learned from different perspectives. Results show that our approach outperforms existing methods in the ALVC task, achieving new state-of-the-art results.

Zhihan Zhang, Zhiyi Yin, Shuhuai Ren, Xinhang Li, Shicheng Li
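
The DCA abstract mentions a parameter orthogonalization technique that keeps different perspectives from learning overlapping information, but it does not give the formula. The sketch below assumes one common soft form: flatten each perspective's projection matrix, L2-normalize it, and penalize the squared cosine similarity of every pair. The function name `orthogonality_penalty` and this pairwise-cosine form are illustrative assumptions, not the authors' code.

```python
import math
from typing import List

Matrix = List[List[float]]


def orthogonality_penalty(perspectives: List[Matrix]) -> float:
    """Soft orthogonality regularizer over per-perspective parameters.

    Hypothetical sketch: each matrix is flattened and L2-normalized, and the
    squared cosine similarity of every pair of perspectives is summed. The
    result is 0 when all pairs are orthogonal and grows as they overlap.
    """
    unit_vectors = []
    for matrix in perspectives:
        flat = [x for row in matrix for x in row]
        norm = math.sqrt(sum(x * x for x in flat)) or 1.0  # avoid division by zero
        unit_vectors.append([x / norm for x in flat])

    penalty = 0.0
    for i in range(len(unit_vectors)):
        for j in range(i + 1, len(unit_vectors)):
            cosine = sum(a * b for a, b in zip(unit_vectors[i], unit_vectors[j]))
            penalty += cosine * cosine
    return penalty


if __name__ == "__main__":
    w1 = [[1.0, 0.0], [0.0, 0.0]]
    w2 = [[0.0, 1.0], [0.0, 0.0]]
    print(orthogonality_penalty([w1, w2]))  # orthogonal perspectives: 0.0
    print(orthogonality_penalty([w1, w1]))  # identical perspectives: 1.0
```

In training, such a term would typically be added to the generation loss with a weighting coefficient; that coefficient is likewise an assumption here.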

### The Sentencing-Element-Aware Model for Explainable Term-of-Penalty Prediction

Automatic term-of-penalty prediction is a key subtask of intelligent legal judgment (ILJ). Recent ILJ systems are based on deep learning methods, for which explainability is a pressing concern. In this paper, our goal is to build a term-of-penalty prediction system that follows legal principles and offers both good judicial explainability and high accuracy. To realize this, we propose a sentencing-element-aware neural model. We introduce sentencing elements to link the case facts with the relevant laws, which makes the prediction conform to the legal principle of objectivity and ensures accuracy. Meanwhile, to explain why a given term of penalty is assigned, we output sentencing-element-level explanations and use sentencing elements to select the most similar cases as case-level explanations, which reflects the principle of equity. Experiments on the CAIL2018 dataset show that our model not only achieves accuracy equal to or better than the baselines, but also provides useful explanations that help users understand how the system works.

Hongye Tan, Bowen Zhang, Hu Zhang, Ru Li
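
The case-level explanations above come from selecting the past cases whose sentencing elements are most similar to the current case. The abstract does not name the similarity measure, so the sketch below swaps in plain Jaccard overlap between sets of element labels; the measure, the function names, and the element labels in the example are all hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sentencing-element sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar_cases(query: Set[str], case_base: Dict[str, Set[str]], k: int = 1) -> List[str]:
    """Ids of the k cases whose sentencing elements best match the query.

    Ties keep the insertion order of case_base because sorted() is stable.
    """
    ranked = sorted(case_base, key=lambda cid: jaccard(query, case_base[cid]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical element labels for three past cases.
    cases = {
        "case_1": {"theft", "recidivism"},
        "case_2": {"theft", "confession", "first_offense"},
        "case_3": {"fraud"},
    }
    print(most_similar_cases({"theft", "confession", "first_offense"}, cases, k=2))
    # ['case_2', 'case_1']
```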

### Referring Expression Generation via Visual Dialogue

Referring Expression Generation (REG) aims to generate unambiguous descriptions of the referred object in contexts such as images. People often use installment dialogues, extending an initial basic noun phrase step by step until it forms the final reference to an object, whereas most existing REG models generate Referring Expressions (REs) in a “one-shot” way and therefore cannot benefit from the interaction process. In this paper, we propose to model REG based on dialogues. To achieve this, we first introduce a RE-oriented visual dialogue (VD) task, ReferWhat?!, and then build two large-scale datasets for this task, RefCOCOVD and RefCOCO+VD, from the existing RE datasets RefCOCO and RefCOCO+, respectively. Finally, we propose a VD-based REG model. Experimental results show that our model outperforms all existing “one-shot” REG models. Our ablation studies also show that modeling REG as a dialogue agent lets the model exploit information from dialogue responses, which is not available to “one-shot” models, to achieve better performance. The source code and datasets will soon be made available at https://github.com/llxuan/ReferWhat.

Lingxuan Li, Yihong Zhao, Zhaorui Zhang, Tianrui Niu, Fangxiang Feng, Xiaojie Wang

### Hierarchical Multimodal Transformer with Localness and Speaker Aware Attention for Emotion Recognition in Conversations

Emotion Recognition in Conversations (ERC) aims to predict the emotion of each utterance in a given conversation. Existing approaches for the ERC task mainly suffer from two drawbacks: (1) they fail to pay enough attention to the emotional impact of the local context; (2) they ignore the effect of speakers' emotional inertia. To tackle these limitations, we first propose a Hierarchical Multimodal Transformer as our base model, and then carefully design a localness-aware attention mechanism and a speaker-aware attention mechanism to capture the impact of the local context and the emotional inertia, respectively. Extensive evaluations on a benchmark dataset demonstrate the superiority of our proposed model over existing multimodal methods for ERC.

Xiao Jin, Jianfei Yu, Zixiang Ding, Rui Xia, Xiangsheng Zhou, Yaofeng Tu
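
The two attention mechanisms in this abstract bias attention toward nearby utterances and toward the same speaker. The exact formulas are not given, so the sketch below assumes a simple additive form: a Gaussian distance penalty `-(i - j)^2 / (2 * sigma^2)` for localness and a constant bonus for same-speaker utterances, both added to the raw scores before the softmax. The names `sigma` and `same_speaker_bonus` are hypothetical hyperparameters.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_speaker_attention(
    scores: Sequence[float],
    query: int,
    speakers: Sequence[str],
    sigma: float = 1.0,
    same_speaker_bonus: float = 1.0,
) -> List[float]:
    """Attention weights of utterance `query` over all utterances.

    Assumed biases (illustrative, not the paper's exact formulas):
    - localness: a Gaussian penalty that down-weights distant utterances;
    - speaker awareness: a constant bonus for utterances by the same
      speaker, as a proxy for emotional inertia.
    """
    biased = []
    for j, score in enumerate(scores):
        locality = -((query - j) ** 2) / (2.0 * sigma ** 2)
        speaker = same_speaker_bonus if speakers[j] == speakers[query] else 0.0
        biased.append(score + locality + speaker)
    return softmax(biased)


if __name__ == "__main__":
    weights = local_speaker_attention([0.0] * 5, 2, ["A", "B", "A", "B", "A"])
    print([round(w, 3) for w in weights])  # peaks at the query utterance itself
```

With equal raw scores, utterance 2 (same speaker, distance 0) gets the largest weight, and utterances 1 and 3 (other speaker, distance 1) get equal weights larger than utterances 0 and 4 (same speaker but distance 2).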

### Generating Emotional Social Chatbot Responses with a Consistent Speaking Style

Emotional conversation plays a vital role in creating more human-like conversations. Although previous work on emotional conversation generation has achieved promising results, speaking-style inconsistency remains an issue. In this paper, we propose a Style-Aware Emotional Dialogue System (SEDS) that enhances speaking-style consistency by detecting the user’s emotions and modeling speaking styles in emotional response generation. Specifically, SEDS uses an emotion encoder to perceive the user’s emotion from multimodal inputs, and tracks speaking styles by jointly optimizing a generator augmented with a personalized lexicon that captures explicit word-level speaking-style features. Additionally, we propose an auxiliary speaking-style classification task to guide SEDS to learn the implicit form of speaking style during training. To verify the effectiveness of the model, we construct a multimodal dialogue dataset and align and annotate it. Experimental results show that SEDS achieves a significant improvement over other strong baseline models in terms of perplexity, emotion accuracy, and style consistency.

Jun Zhang, Yan Yang, Chengcai Chen, Liang He, Zhou Yu

### An Interactive Two-Pass Decoding Network for Joint Intent Detection and Slot Filling

Intent detection and slot filling are two closely related tasks for building a spoken language understanding (SLU) system. Joint methods for the two tasks model the semantic correlations between the intent and slots and use the information from one task to guide the other, so that the two tasks promote each other. However, most existing joint approaches use intent information only unidirectionally to guide slot filling, ignoring the fact that slot information is also beneficial to intent detection. To address this issue, we propose an Interactive Two-pass Decoding Network (ITD-Net) for joint intent detection and slot filling, which explicitly establishes token-level interactions between the intent and slots through an interactive two-pass decoding process. In ITD-Net, the task-specific information obtained by the first-pass decoder for one task is fed directly into the second-pass decoder for the other task, making full use of explicit intent and slot information to achieve bidirectional guidance between the two tasks. Experiments on the ATIS and SNIPS datasets demonstrate the effectiveness and superiority of ITD-Net.

Huailiang Peng, Mengjun Shen, Lei Jiang, Qiong Dai, Jianlong Tan

### RuKBC-QA: A Framework for Question Answering over Incomplete KBs Enhanced with Rules Injection

The incompleteness of the knowledge base (KB) is one of the key issues when answering natural language questions over a knowledge base (KB-QA). To alleviate this problem, we propose a framework, RuKBC-QA, that integrates rule-based knowledge base completion (KBC) methods into general QA systems. The framework has three main components: a rule miner that mines logic rules from the KB, a rule selector that selects meaningful rules for QA, and a QA model that aggregates information from the original knowledge base and the selected rules. Experiments on the WEBQUESTIONS dataset indicate that the proposed framework effectively alleviates the issues caused by incompleteness, improving the micro-averaged F1 score by 2.4% to 4.5% under different incompleteness settings.

Qilin Sun, Weizhuo Li
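
To make the rule-based completion idea concrete, the sketch below applies already-mined length-2 Horn rules of the form `r1(x, y) AND r2(y, z) => r(x, z)` to a set of triples and returns the facts missing from the KB. The rule miner and rule selector of RuKBC-QA are not shown; the rule format, function name, and relation names are illustrative assumptions.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]     # (subject, relation, object)
ChainRule = Tuple[str, str, str]  # (r1, r2, r): r1(x, y) AND r2(y, z) => r(x, z)


def apply_chain_rules(kb: Set[Triple], rules: Iterable[ChainRule]) -> Set[Triple]:
    """Infer triples missing from `kb` with length-2 Horn rules (one pass)."""
    # Index objects by (subject, relation) so the second hop is a dict lookup.
    by_subject: Dict[Tuple[str, str], Set[str]] = {}
    for s, r, o in kb:
        by_subject.setdefault((s, r), set()).add(o)

    inferred: Set[Triple] = set()
    for r1, r2, head in rules:
        for x, rel, y in kb:
            if rel != r1:
                continue
            for z in by_subject.get((y, r2), ()):
                candidate = (x, head, z)
                if candidate not in kb:  # keep only genuinely new facts
                    inferred.add(candidate)
    return inferred


if __name__ == "__main__":
    kb = {("alice", "born_in", "paris"), ("paris", "located_in", "france")}
    rules = [("born_in", "located_in", "nationality")]
    print(apply_chain_rules(kb, rules))  # {('alice', 'nationality', 'france')}
```

A QA model could then treat the inferred triples as extra (possibly weighted) evidence alongside the original KB, which is the role the abstract assigns to the selected rules.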

### Syntax-Guided Sequence to Sequence Modeling for Discourse Segmentation

Previous studies on RST-style discourse segmentation have achieved impressive results. However, recent neural approaches either require a complex joint training process or rely heavily on powerful pre-trained word vectors. Given this, a simpler but more robust segmentation method is needed. In this work, we take a deeper look at intra-sentence dependencies to investigate whether syntactic information is entirely useless or, if not, to what extent it can improve discourse segmentation. To this end, we propose a sequence-to-sequence model with a GCN-based encoder that makes good use of intra-sentence dependencies and a multi-head biaffine attention-based decoder that predicts EDU boundaries. Experimental results on two benchmark corpora show that the syntactic information we use is significantly useful and that the resulting model is competitive with the state of the art.

Longyin Zhang, Fang Kong, Guodong Zhou
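
The decoder in this abstract predicts EDU boundaries with multi-head biaffine attention. A single biaffine head scores a decoder state `q` against an encoder state `k` as `q^T U k + w^T [q; k] + b`; the sketch below averages several such heads and points at the highest-scoring position as the next boundary. Averaging the heads, and all parameter names, are assumptions for illustration; a real pointer decoder would also mask positions before the previous boundary.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Head = Tuple[Matrix, Vector, float]  # (U, w, b) of one biaffine head


def biaffine(q: Sequence[float], k: Sequence[float], head: Head) -> float:
    """s(q, k) = q^T U k + w^T [q; k] + b."""
    U, w, b = head
    bilinear = sum(q[i] * U[i][j] * k[j] for i in range(len(q)) for j in range(len(k)))
    linear = sum(wi * xi for wi, xi in zip(w, list(q) + list(k)))
    return bilinear + linear + b


def next_boundary(
    decoder_state: Sequence[float],
    encoder_states: Sequence[Sequence[float]],
    heads: Sequence[Head],
) -> int:
    """Index of the encoder position with the highest head-averaged score."""
    scores = [
        sum(biaffine(decoder_state, k, h) for h in heads) / len(heads)
        for k in encoder_states
    ]
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    head = (identity, [0.0] * 4, 0.0)  # toy parameters: pure dot product
    states = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
    print(next_boundary([1.0, 0.0], states, [head]))  # 1
```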

### Macro Discourse Relation Recognition via Discourse Argument Pair Graph

Most previous studies used various sequence learning models to represent discourse arguments, which not only limits the model's ability to perceive global information but also makes it difficult to handle long-distance dependencies when the discourse arguments are paragraph-level or document-level. To address these issues, we propose a GCN-based neural network model over a discourse argument pair graph, which transforms discourse relation recognition into a node classification task. Specifically, we first convert the discourse arguments of all samples into a heterogeneous text graph that integrates word-related global information and argument-related keyword information. Then, we use a graph learning method to encode argument semantics and recognize the relationship between arguments. Experimental results on the Chinese MCDTB corpus show that our proposed model can effectively recognize discourse relations and outperforms the SOTA model.

Zhenhua Sun, Feng Jiang, Peifeng Li, Qiaoming Zhu
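
The model above encodes a heterogeneous text graph with a GCN and classifies argument-pair nodes. How that graph's nodes and edge weights are built is not specified here, so the sketch below shows only the standard graph-convolution propagation rule `H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)` on a dense adjacency matrix. That the paper uses exactly this variant is an assumption.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution layer with self-loops and symmetric normalization.

    Assumes a non-negative adjacency matrix, so every degree is at least 1
    once self-loops are added.
    """
    n = len(adj)
    # A + I: every node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # D^-1/2 (A + I) D^-1/2
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, x) for x in row] for row in out]  # ReLU


if __name__ == "__main__":
    adj = [[0.0, 1.0], [1.0, 0.0]]  # two connected nodes
    print(gcn_layer(adj, [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```

Stacking such layers and applying a softmax classifier to the rows that correspond to argument-pair nodes would give the node-classification setup the abstract describes.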

### Dependency Parsing with Noisy Multi-annotation Data

In the past few years, dependency parsing performance has improved by a large margin on closed-domain benchmark datasets. However, when processing real-life texts, parsing performance degrades dramatically. Besides domain adaptation techniques, which have made slow progress due to their intrinsic difficulty, one straightforward way is to annotate a certain amount of syntactic data for each new source of texts. However, annotating data is well known to be time-consuming and labor-intensive, especially for complex syntactic annotation. Inspired by progress in crowdsourcing, this paper proposes to annotate noisy multi-annotation syntactic data with non-expert annotators. Each sentence is independently annotated by multiple annotators, and the inconsistencies are retained. In this way, data can be annotated very rapidly, since many ordinary annotators can be recruited. We then construct and release three multi-annotation datasets from different sources. Finally, we propose and compare several benchmark approaches to training dependency parsers on such multi-annotation data. We will release our code and data at http://hlt.suda.edu.cn/~zhli/.

Yu Zhao, Mingyue Zhou, Zhenghua Li, Min Zhang

### Joint Bilinear End-to-End Dependency Parsing with Prior Knowledge

Dependency parsing aims to identify the relationships between words in a sentence. In this paper, we propose a novel graph-based end-to-end dependency parsing model consisting of a POS tagger and a Joint Bilinear Model (JBM). Based on prior POS knowledge from the dataset, we use POS tagging results to guide the training of JBM. To narrow the gap between edge and label prediction, we pass on the knowledge hidden in JBM's label prediction procedure. Motivated by the success of deep contextualized word embeddings, this work also fine-tunes BERT for dependency parsing. Our model achieves 96.85% UAS and 95.01% LAS on the English PTB dataset. Moreover, experiments on the Universal Dependencies dataset indicate that our model also reaches state-of-the-art performance on dependency parsing and POS tagging.

Yunchu Gao, Ke Zhang, Zhoujun Li

### Multi-layer Joint Learning of Chinese Nested Named Entity Recognition Based on Self-attention Mechanism

Nested named entity recognition is attracting increasing attention due to the pervasiveness of nested entities in the general domain as well as in specific domains. This paper proposes a multi-layer joint learning model for Chinese nested named entity recognition based on a self-attention aggregation mechanism, in which a series of multi-layered sequence labeling sub-models are joined to recognize named entities in a bottom-up fashion. To capture the semantic information of entities recognized in a lower layer, the hidden units within each entity are aggregated using a self-attention mechanism and then fed into the higher layer. We conduct extensive experiments using various entity aggregation methods. The results on a Chinese nested entity corpus transformed from the People’s Daily show that our model outperforms other competitive methods, implying that the self-attention mechanism can effectively aggregate important semantic information in an entity.

Haoru Li, Haoliang Xu, Longhua Qian, Guodong Zhou
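
The bottom-up scheme above collapses each entity found in a lower layer into one vector via self-attention before passing the sequence upward. The sketch below assumes the simplest additive form: each token in the span is scored with a learned vector `w` (hypothetical here), the scores are softmax-normalized, and the weighted sum replaces the span in the next layer's input. Spans within one layer are assumed not to overlap.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def aggregate_span(hidden: Sequence[Vector], start: int, end: int, w: Sequence[float]) -> Vector:
    """Self-attention pooling of hidden[start:end] into one entity vector."""
    span = hidden[start:end]
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in span]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(a * h[d] for a, h in zip(weights, span)) for d in range(len(span[0]))]


def next_layer_input(
    hidden: Sequence[Vector], entities: Sequence[Tuple[int, int]], w: Sequence[float]
) -> List[Vector]:
    """Replace each recognized entity span [start, end) with its aggregate vector."""
    span_end = dict(entities)
    out: List[Vector] = []
    i = 0
    while i < len(hidden):
        if i in span_end:
            if span_end[i] <= i:
                raise ValueError("entity spans must be non-empty")
            out.append(aggregate_span(hidden, i, span_end[i], w))
            i = span_end[i]
        else:
            out.append(list(hidden[i]))
            i += 1
    return out


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Tokens 0 and 1 form an entity; with w = 0 the attention is uniform.
    print(next_layer_input(h, [(0, 2)], [0.0, 0.0]))  # [[0.5, 0.5], [1.0, 1.0]]
```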

### Adversarial BiLSTM-CRF Architectures for Extra-Propositional Scope Resolution

Owing to their ability to expressively represent narrative structures, proposition-aware learning models have been drawing more and more attention in information extraction. Following this trend, recent studies go deeper into learning fine-grained extra-propositional structures, such as negation and speculation. However, most carefully designed experiments reveal that existing extra-propositional models either fail to learn from the context or neglect cross-domain adaptation. In this paper, we attempt to systematically address these challenges with an adversarial BiLSTM-CRF model that jointly models the potential extra-propositions and their contexts. This is motivated by the superiority of sequential architectures in effectively encoding order information and long-range context dependencies. On this basis, we propose an adversarial neural architecture to learn invariant and discriminative latent features across domains. Experimental results on the standard BioScope corpus show the superiority of the proposed neural architecture, which significantly outperforms the state of the art on scope resolution in both in-domain and cross-domain scenarios.

Rongtao Huang, Jing Ye, Bowei Zou, Yu Hong, Guodong Zhou

### Analyzing Relational Semantics of Clauses in Chinese Discourse Based on Feature Structure

Discourse clause relational semantics is the semantic relation between discourse clause relevance structures. This paper proposes a method that represents discourse clause relational semantics as a multi-dimensional feature structure. Compared with a simple classification of discourse relations, this representation can reveal discourse semantic relations more deeply. Furthermore, we build a Chinese discourse clause relational semantic feature corpus and study the recognition of clause relational semantic features. We transform clause relational semantic feature recognition into multiple binary classification problems and extract relevant classification features for the experiments. Experiments show that with the best classifier (SVM), the overall F1 score for semantic feature recognition reaches 70.14%; each classification feature contributes differently to the recognition of different clause relational semantic features, and connectives contribute more than the other features to the recognition of all semantic features. By adding related semantic features as classification features, we study the interaction between different semantic features. Experiments show that different semantic features have different influences, and adding multiple semantic features has a more significant effect than adding a single one.

Wenhe Feng, Xi Huang, Han Ren

### Efficient Lifelong Relation Extraction with Dynamic Regularization

Relation extraction has received increasing attention due to its important role in natural language processing applications. However, most existing methods are designed for a fixed set of relations and cannot handle the lifelong learning scenario, i.e., adapting a well-trained model to newly added relations without catastrophically forgetting previously learned knowledge. In this work, we present a memory-efficient dynamic regularization method to address this issue. Specifically, two types of powerful consolidation regularizers are applied to preserve the learned knowledge and ensure the robustness of the model, and the regularization strength is adaptively adjusted according to the dynamics of the training losses. Experimental results on multiple benchmarks show that our proposed method significantly outperforms prior state-of-the-art approaches.

Hangjie Shen, Shenggen Ju, Jieping Sun, Run Chen, Yuezhong Liu

### Collective Entity Disambiguation Based on Deep Semantic Neighbors and Heterogeneous Entity Correlation

Entity Disambiguation (ED) aims to link entity mentions recognized in a text corpus to the corresponding unambiguous entries in a knowledge base (KB). Many models have been proposed based on the topical coherence assumption. Recently, several works have proposed a new assumption, that topical coherence only needs to hold among neighboring mentions, which has proved effective. However, due to the complexity of text, accurately obtaining the local coherence of the mention set remains challenging. We therefore introduce the self-attention mechanism to capture long-distance dependencies between mentions and quantify the degree of topical coherence. Based on the internal semantic correlation, we find the semantic neighbors of every mention. In addition, we introduce the idea of “simple to complex” into the construction of the entity correlation graph, which achieves a self-reinforcing effect of low-ambiguity mentions on high-ambiguity mentions during collective disambiguation. Finally, we apply a graph attention network to integrate the local and global features extracted from the key information and the entity correlation graph. We validate our graph neural collective entity disambiguation (GNCED) method on six public datasets, and the results show improved performance compared with state-of-the-art baselines.

Zihan He, Jiang Zhong, Chen Wang, Cong Hu

### Boosting Cross-lingual Entity Alignment with Textual Embedding

Multilingual knowledge graph (KG) embeddings have attracted many researchers and benefit many cross-lingual tasks. The cross-lingual entity alignment task is to match equivalent entities in different languages, which can greatly enrich multilingual KGs. Many previous methods use only graph structure to encode entities. However, many multilingual KGs also provide rich entity descriptions. In this paper, we focus on how to use these descriptions to boost cross-lingual entity alignment. Specifically, we propose two textual embedding models, Cross-TextGCN and Cross-TextMatch, to embed the description of each entity. Our experiments on DBP15K show that these two textual embedding models can indeed boost a structure-based cross-lingual entity alignment model.

Wei Xu, Chen Chen, Chenghao Jia, Yongliang Shen, Xinyin Ma, Weiming Lu

### Label Embedding Enhanced Multi-label Sequence Generation Model

Existing sequence generation models ignore the exposure bias problem when applied to the multi-label classification task. To solve this issue, we propose a novel model that represents the label prediction probability distribution as a label embedding and incorporates the label embedding from the previous step into the current step's LSTM decoding process. This allows the current step to make a better prediction based on the overall output of the previous prediction rather than on a locally optimal output alone. In addition, we propose a scheduled-sampling-based learning algorithm for this model, which effectively incorporates the label embedding into the label generation procedure. Compared with three classical methods and four SOTA methods for the multi-label classification task, our proposed method obtains the highest F1-score (0.794 on a chemical exposure assessment task and 0.615 on a clinical syndrome differentiation task of traditional Chinese medicine).

Yaqiang Wang, Feifei Yan, Xiaofeng Wang, Wang Tang, Hongping Shu
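
The model above feeds the previous step's whole label distribution, rather than only its argmax, into the next decoding step, and trains with scheduled sampling. A minimal sketch, assuming the label embedding is the probability-weighted mixture `sum_k p_k * E_k` of rows of a label embedding matrix `E`, and that scheduled sampling chooses between the gold label's embedding and this soft embedding. The function names and `teacher_forcing_prob` are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def soft_label_embedding(probs: Sequence[float], label_emb: Sequence[Vector]) -> Vector:
    """Probability-weighted mixture of label embeddings: sum_k p_k * E_k."""
    dim = len(label_emb[0])
    return [sum(p * row[d] for p, row in zip(probs, label_emb)) for d in range(dim)]


def next_decoder_input(
    probs: Sequence[float],
    gold_label: int,
    label_emb: Sequence[Vector],
    teacher_forcing_prob: float,
    rng: random.Random,
) -> Vector:
    """Scheduled sampling: with probability `teacher_forcing_prob` feed the
    gold label's embedding, otherwise the soft embedding of the model's own
    prediction. The probability is usually decayed over training."""
    if rng.random() < teacher_forcing_prob:
        return list(label_emb[gold_label])
    return soft_label_embedding(probs, label_emb)


if __name__ == "__main__":
    E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings for 3 labels
    print(soft_label_embedding([0.5, 0.5, 0.0], E))  # [0.5, 0.5]
```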

### Ensemble Distilling Pretrained Language Models for Machine Translation Quality Estimation

Machine translation quality estimation (Quality Estimation, QE) aims to evaluate the quality of machine translation automatically without a gold reference. QE can be performed at different granularities, giving estimates for different aspects of machine translation output. In this paper, we propose an effective method that uses pretrained language models to improve QE performance. Our model combines two popular pretrained models, BERT and XLM, to create a very strong baseline for both sentence-level and word-level QE. We also propose a simple yet effective strategy, ensemble distillation, to further improve the accuracy of the QE system. Ensemble distillation integrates knowledge from multiple models into one model and strengthens each single model by a large margin. We evaluate our system on the CCMT2019 Chinese-English and English-Chinese QE datasets, which contain word-level and sentence-level subtasks. Experimental results show that our model surpasses previous models to a large extent, demonstrating the effectiveness of our proposed method.

Hui Huang, Hui Di, Jin’an Xu, Kazushige Ouchi, Yufeng Chen
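
Ensemble distillation, as described above, transfers what several models know into one student. For sentence-level QE, where each sentence gets a real-valued quality score, one simple form is to average the teachers' predictions and train the student against a mix of the gold scores and that average. The weighted-MSE loss and the mixing weight `alpha` below are assumptions, not the paper's stated objective; for word-level OK/BAD tags one would average teacher probabilities instead.

```python
from typing import List, Sequence


def ensemble_target(teacher_preds: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-sentence scores predicted by several teacher models."""
    n = len(teacher_preds)
    return [sum(p[i] for p in teacher_preds) / n for i in range(len(teacher_preds[0]))]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student: Sequence[float],
    gold: Sequence[float],
    teacher_preds: Sequence[Sequence[float]],
    alpha: float = 0.5,
) -> float:
    """alpha * MSE(student, gold) + (1 - alpha) * MSE(student, ensemble average)."""
    soft = ensemble_target(teacher_preds)
    return alpha * mse(student, gold) + (1.0 - alpha) * mse(student, soft)


if __name__ == "__main__":
    teachers = [[0.2, 0.4], [0.4, 0.8]]  # two teachers, two sentences
    print(ensemble_target(teachers))      # approximately [0.3, 0.6]
```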

### Weaken Grammatical Error Influence in Chinese Grammatical Error Correction

Chinese grammatical error correction (CGEC), the task of correcting grammatical errors in text, is treated as a translation task in which erroneous sentences are “translated” into correct sentences. However, some grammatical errors in the training data can confuse CGEC models and negatively influence the “translation” process. In this paper, we propose a Grammatical Error Weakening Module (GEWM) to reduce the negative influence of grammatical errors in the CGEC task. The module first extracts contextual features for each word in an erroneous sentence via a context attention mechanism. It then uses learnable error weakening factors to control the proportion of contextual features and word features in the final representation of each word. As a result, features from grammatically erroneous words can be suppressed. Experiments show that our approach outperforms the baseline models on the CGEC task.

Jinggui Liang, Si Li
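
The weakening module above mixes each word's own features with contextual features using a learnable factor. A minimal sketch, assuming dot-product attention over the other words for the context and a sigmoid gate `a_i = sigmoid(v . [x_i; c_i])` that yields `a_i * c_i + (1 - a_i) * x_i`; the gate parameter `v` and the function names are hypothetical. A gate near 1 means the word's representation comes mostly from its context, so a likely erroneous word contributes little of its own features.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def context_features(words: Sequence[Vector]) -> List[Vector]:
    """Dot-product attention of each word over the other words of the sentence."""
    contexts = []
    for i, q in enumerate(words):
        others = [h for j, h in enumerate(words) if j != i]
        if not others:  # single-word sentence: no context available
            contexts.append([0.0] * len(q))
            continue
        weights = softmax([dot(q, h) for h in others])
        contexts.append([sum(a * h[d] for a, h in zip(weights, others)) for d in range(len(q))])
    return contexts


def weaken_errors(words: Sequence[Vector], contexts: Sequence[Vector], v: Sequence[float]) -> List[Vector]:
    """final_i = a_i * context_i + (1 - a_i) * word_i, with a_i = sigmoid(v . [word_i; context_i])."""
    out = []
    for x, c in zip(words, contexts):
        a = 1.0 / (1.0 + math.exp(-dot(v, list(x) + list(c))))
        out.append([a * ci + (1.0 - a) * xi for xi, ci in zip(x, c)])
    return out


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0]]
    ctx = context_features(words)
    print(weaken_errors(words, ctx, [0.0] * 4))  # gate 0.5: [[0.5, 0.5], [0.5, 0.5]]
```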

### Encoding Sentences with a Syntax-Aware Self-attention Neural Network for Emotion Distribution Prediction

Emotion distribution prediction aims to simultaneously identify multiple emotions and their intensities in a sentence. Recently, neural network models have been successfully applied to this task. However, most of them do not fully consider the syntactic information of sentences. In this paper, we propose a syntax-aware self-attention neural network (SynSAN) that exploits syntactic features for emotion distribution prediction. In particular, we first explore a syntax-level self-attention layer over the syntactic tree that learns a syntax-aware vector for each word by incorporating dependency syntactic information from its parent and child nodes. We then construct a sentence-level self-attention layer that compresses the syntax-aware word vectors into the sentence representation used for emotion prediction. Experimental results on two public datasets show that our model outperforms the state-of-the-art models by large margins while requiring fewer training parameters.

Chang Wang, Bang Wang
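
The syntax-level layer above lets each word attend to information from its parent and child nodes in the dependency tree. One straightforward way to realize this, assumed here rather than taken from the paper, is an attention mask built from the head array: word `i` may attend only to itself, its head, and its dependents, and masked positions get zero weight in the softmax.

```python
import math
from typing import List, Sequence


def syntax_mask(heads: Sequence[int]) -> List[List[bool]]:
    """mask[i][j] is True if token j is token i itself, its parent, or one of its children.

    heads[i] is the 0-based index of token i's parent, or -1 for the root.
    """
    n = len(heads)
    return [[j == i or j == heads[i] or heads[j] == i for j in range(n)] for i in range(n)]


def masked_attention_weights(scores: Sequence[float], mask_row: Sequence[bool]) -> List[float]:
    """Softmax over the allowed positions only; masked positions get weight 0."""
    top = max(s for s, m in zip(scores, mask_row) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Token 1 is the root; tokens 0 and 2 both depend on it.
    mask = syntax_mask([1, -1, 1])
    print(mask[0])                                       # [True, True, False]
    print(masked_attention_weights([0.0] * 3, mask[0]))  # [0.5, 0.5, 0.0]
```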

### Hierarchical Multi-view Attention for Neural Review-Based Recommendation

Many e-commerce platforms allow users to write reviews of products, and these reviews contain rich semantic information about users and items. Hence, review analysis has been widely used in recommender systems. However, most existing review-based recommendation methods focus on a single view of reviews and ignore the diversity of users and items, even though users typically have multiple preferences and items typically have various characteristics. In this paper, we propose a neural recommendation method with hierarchical multi-view attention that can effectively learn diverse user preferences and multiple item features from reviews. We design a review encoder with multi-view attention that learns review representations from words and can capture multiple aspects of a review. In addition, to learn user and item representations from their reviews, we design a user/item encoder based on another multi-view attention. In this way, the diversity of user preferences and item features can be fully exploited. Compared with existing single-attention approaches, the hierarchical multi-view attention in our method has the potential to model users and products better from reviews. We conduct extensive experiments on four recommendation datasets, and the results validate the advantage of our method for review-based recommendation.

Hongtao Liu, Wenjun Wang, Huitong Chen, Wang Zhang, Qiyao Peng, Lin Pan, Pengfei Jiao

### Negative Feedback Aware Hybrid Sequential Neural Recommendation Model

Content-based (CB) and collaborative filtering (CF) methods are two classical types of recommendation methods that are widely applied in various online services. Recently, sequential recommender systems have achieved good performance. However, how to integrate the advantages of these recommendation systems has not yet been well studied. Besides, most previous algorithms perform negative sampling for each user from items the user has not interacted with, which is unreasonable when users’ negative feedback on items is known. We believe that a user’s negative feedback is valuable and should be used to better model the user’s preferences. In this study, we propose a novel negative feedback aware hybrid sequential recommendation model (NFHS) that takes advantage of these three types of recommendation systems and directly uses negative feedback. Our algorithm has two modules: 1) a static module that models the interaction history and the content features of the user and the current item; and 2) a sequence module that distills the features of a user’s interaction sequence, into which negative feedback is also directly introduced. Experimental results on two real-world datasets from distinct scenarios demonstrate that our model significantly outperforms various state-of-the-art approaches.

Bin Hao, Min Zhang, Weizhi Ma, Shaoyun Shi, Xinxing Yu, Houzhi Shan, Yiqun Liu, Shaoping Ma

### MSReNet: Multi-step Reformulation for Open-Domain Question Answering

Recent work on open-domain question answering (QA) relies on retrieving related passages to answer questions. However, most such methods cannot escape sub-optimal initial retrieval results because they lack interaction with the retrieval system. This paper introduces MSReNet, a new framework for open-domain QA in which a question reformulator interacts with a term-based retrieval system, improving retrieval precision and QA performance. Specifically, we enhance the open-domain QA model with an additional multi-step reformulator that generates a new human-readable question from the current passages and question. This interaction is repeated several times before answer extraction in order to find the best possible retrieval results. Experiments show that MSReNet improves performance on several datasets, including TriviaQA-unfiltered, Quasar-T, SearchQA, and SQuAD-open. We also find that the intermediate reformulation results provide interpretability for the model's reasoning process.

Weiguang Han, Min Peng, Qianqian Xie, Xiuzhen Zhang, Hua Wang

### ProphetNet-Ads: A Looking Ahead Strategy for Generative Retrieval Models in Sponsored Search Engine

In a sponsored search engine, generative retrieval models have recently been proposed to mine relevant advertisement keywords for users’ input queries. Generative retrieval models generate outputs token by token along a path of the target library's prefix tree (Trie), which guarantees that all generated outputs are legal and covered by the target library. In actual use, we found several typical problems caused by Trie-constrained searching length. In this paper, we analyze these problems and propose a looking-ahead strategy for generative retrieval models, named ProphetNet-Ads. ProphetNet-Ads improves retrieval ability by directly optimizing the Trie-constrained search space. We build a dataset from a real-world sponsored search engine and carry out experiments to analyze different generative retrieval models. Compared with the recently proposed Trie-based LSTM generative retrieval model, our single-model and integrated results improve recall by 15.58% and 18.8%, respectively, with a beam size of 5. Case studies further clearly demonstrate how ProphetNet-Ads alleviates these problems.

Weizhen Qi, Yeyun Gong, Yu Yan, Jian Jiao, Bo Shao, Ruofei Zhang, Houqiang Li, Nan Duan, Ming Zhou
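
The ProphetNet-Ads abstract builds on Trie-constrained generation, where each decoding step may only choose a token that continues some keyword in the target library. The sketch below shows only that baseline constraint, not the paper's looking-ahead strategy, and uses greedy search instead of beam search for brevity. The scoring function stands in for a trained model and is hypothetical; an end-of-keyword marker `</s>` stored in the trie ensures the output is always a complete library entry.

```python
from typing import Callable, Dict, List, Sequence

END = "</s>"  # end-of-keyword marker stored in the trie
TrieNode = Dict[str, "TrieNode"]


def build_trie(keywords: Sequence[str]) -> TrieNode:
    """Prefix tree over whitespace-tokenized keywords."""
    root: TrieNode = {}
    for keyword in keywords:
        node = root
        for token in keyword.split() + [END]:
            node = node.setdefault(token, {})
    return root


def constrained_greedy_decode(
    root: TrieNode,
    score: Callable[[List[str], str], float],
    max_len: int = 20,
) -> str:
    """Greedy generation where each step may only pick a child of the current trie node."""
    node, output = root, []
    for _ in range(max_len):
        candidates = list(node)  # the only legal next tokens
        if not candidates:
            break
        token = max(candidates, key=lambda t: score(output, t))
        if token == END:
            break
        output.append(token)
        node = node[token]
    return " ".join(output)


if __name__ == "__main__":
    library = ["cheap flights", "cheap hotels", "car rental"]
    # Toy scorer: "cheapest" scores highest but is never a legal choice.
    prefs = {"cheap": 1.0, "car": 0.5, "hotels": 2.0, "flights": 1.0, "cheapest": 100.0}
    print(constrained_greedy_decode(build_trie(library), lambda prefix, tok: prefs.get(tok, 0.0)))
    # cheap hotels
```

Because the search can only follow trie edges, a token the model strongly prefers but the library lacks is never emitted; the look-ahead problems the abstract mentions arise precisely because these local choices ignore what lies deeper in the trie.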

### LARQ: Learning to Ask and Rewrite Questions for Community Question Answering

Taking advantage of the rapid growth of community platforms such as Yahoo Answers and Quora, Community Question Answering (CQA) systems are developed to retrieve semantically equivalent questions when users raise a new query. A typical CQA system mainly consists of two key components, a retrieval model and a ranking model, which search for similar questions and select the most related one, respectively. In this paper, we propose LARQ (Learning to Ask and Rewrite Questions), a novel sentence-level data augmentation method. Unlike common lexical-level data augmentation methods, we use a Question Generation (QG) model to obtain more accurate, diverse, and semantically rich query examples. Since queries differ greatly in a low-resource cold-start scenario, using the QG model to augment the indexed collection significantly improves the response rate of CQA systems. We apply LARQ to an online CQA system and to the Bank Question (BQ) Corpus to evaluate the improvements for both the retrieval process and the ranking model. Extensive experimental results show that the LARQ-enhanced model significantly outperforms single BERT and XGBoost models, as well as a widely used QG model (NQG).

Huiyang Zhou, Haoyan Liu, Zhao Yan, Yunbo Cao, Zhoujun Li

### Abstractive Summarization via Discourse Relation and Graph Convolutional Networks

Currently, mainstream abstractive summarization methods use machine learning models with an encoder-decoder architecture, generally with an encoder based on a recurrent neural network. Such a model mainly learns the sequential information of the text but rarely learns its structural information. From a linguistic perspective, text structure information is effective for judging the importance of text content. To enable the model to obtain text structure information, this paper proposes to use discourse relations in text summarization, which makes the model focus on the important parts of the text. On top of a traditional LSTM encoder, this paper adds graph convolutional networks to obtain the structural information of the text. In addition, this paper proposes a fusion layer that enables the model to attend to the sequential information of the text while acquiring its structural information. Experimental results show that system performance in terms of ROUGE improves significantly after discourse relation information is added.

Wenjie Wei, Hongling Wang, Zhongqing Wang
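
The abstract adds graph convolutional networks on top of an LSTM encoder to capture discourse structure. For reference, this is a sketch of one layer of the widely used GCN propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W). How the paper builds its discourse graph (which units are nodes and which relations become edges) is not stated here, so the three-node graph below is a hypothetical example.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops and apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalize_adjacency(adj), h), w)
    return [[max(0.0, x) for x in row] for row in out]

# Hypothetical discourse graph over three text units: unit 1 is related to units 0 and 2.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
h = [[1.0], [0.0], [0.0]]  # 1-d node features; real encoder states would be much wider
w = [[1.0]]                # identity weights so the propagation is easy to read
print(gcn_layer(adj, h, w))  # unit 2 stays at 0: it is two hops away from unit 0
```

One layer only mixes information between directly related units; stacking layers extends the reach along discourse relations.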

### Chinese Question Classification Based on ERNIE and Feature Fusion

Question classification (QC) is a basic task in question answering (QA) systems. By imposing semantic constraints on the subsequent information retrieval and answer extraction steps, QC narrows the range of candidate answers and improves the efficiency of the system. Because questions contain few words, existing QC methods struggle to extract deep semantic information. In this work, we propose a QC method based on ERNIE and feature fusion. We first use ERNIE to generate word vectors, which are then fed into a feature extraction model. This model combines a hybrid neural network (CNN-BiLSTM, which extracts features independently), a highway network and a DCU (Dilated Composition Units) module. Experimental results on the Fudan University question classification dataset and the NLPCC(QA)-2018 dataset show that our method improves the accuracy, recall and F1 of the QC task.

Gaojun Liu, Qiuxia Yuan, Jianyong Duan, Jie Kou, Hao Wang
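
The feature extraction model above includes a highway network. As a rough illustration, this sketch implements the standard highway layer y = T(x) * H(x) + (1 - T(x)) * x in plain Python; the ReLU transform, the layer size and the toy weights are assumptions, not the configuration used in the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def affine(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def highway(x: Vector, w_h: Matrix, b_h: Vector, w_t: Matrix, b_t: Vector) -> Vector:
    """Highway layer: y = T(x) * H(x) + (1 - T(x)) * x."""
    h = [max(0.0, v) for v in affine(w_h, b_h, x)]  # candidate transform H(x), ReLU
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]   # transform gate T(x), in (0, 1)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, -2.0]
# Strongly negative gate bias: the layer carries x through almost unchanged.
print(highway(x, zeros, [3.0, 3.0], zeros, [-50.0, -50.0]))
# Strongly positive gate bias: the layer outputs (almost exactly) the transform H(x) = [3.0, 3.0].
print(highway(x, zeros, [3.0, 3.0], zeros, [50.0, 50.0]))
```

The gate lets the network learn, per dimension, how much of the extracted features to transform and how much to pass through, which is why highway layers are a common glue between stacked feature extractors.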

### An Abstractive Summarization Method Based on Global Gated Dual Encoder

RNN-based sequence-to-sequence models with attention have been widely applied to abstractive summarization, but because of the inherent limitations of RNNs, existing models generally cannot capture long-term information. This paper therefore proposes an abstractive text summarization method based on a global gated dual encoder (GDE). Using a Transformer to extract global semantics, we design a global gating unit based on the dual encoder that filters key information to suppress redundant information and dynamically compensates for insufficient semantics. Experiments on the Chinese LCSTS and English CNN/Daily Mail datasets show that our model outperforms current state-of-the-art generative methods.

Lu Peng, Qun Liu, Lebin Lv, Weibin Deng, Chongyu Wang

### Rumor Detection on Hierarchical Attention Network with User and Sentiment Information

Social media has developed rapidly owing to its openness and freedom: people can post information on the Internet anytime and anywhere. However, social media has also become the main channel through which rumors spread widely and quickly, so automatically detecting rumors in such a huge volume of information has become a major challenge. Most current neural network methods focus on text features and pay insufficient attention to user and sentiment information, which are also useful clues for rumor detection. This paper therefore proposes a hierarchical attention network with user and sentiment information (HiAN-US) for rumor detection. It first uses a Transformer encoder to learn semantic information at both the word level and the tweet level, and then integrates user and sentiment information via an attention mechanism. Experiments on the Twitter15, Twitter16 and PHEME datasets show that our model is more effective than several state-of-the-art baselines.

Sujun Dong, Zhong Qian, Peifeng Li, Xiaoxu Zhu, Qiaoming Zhu

### Measuring the Semantic Stability of Word Embedding

Word embedding techniques have a wide range of applications in natural language processing (NLP). However, recent studies have revealed that word embeddings are highly unstable, which affects performance in downstream tasks and applications in safety-critical fields such as medical diagnosis and financial analysis. Further research has found that the popular Nearest Neighbors Stability (NNS) metric is unreliable for drawing qualitative conclusions about diachronic semantics, meaning that NNS cannot fully capture the semantic fluctuations of word vectors. To measure semantic stability more accurately, we propose a novel metric, CS, that combines Nearest Senses Stability (NSS) and Aligned Sense Stability (ASS). Moreover, previous studies on word embedding stability focus on static embedding models such as Word2vec and ignore contextual embedding models such as BERT. In this work, we propose the SPIP metric, based on the Pairwise Inner Product (PIP) loss, to extend stability studies to contextual embedding models. Finally, experimental results demonstrate that CS and SPIP are effective for choosing parameter configurations that minimize embedding instability without training downstream models, outperforming the state-of-the-art metric NNS.

Zhenhao Huang, Chenxu Wang
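
SPIP builds on the Pairwise Inner Product (PIP) loss. The sketch below computes the plain PIP loss as it is commonly defined, the Frobenius norm of E1 E1^T - E2 E2^T for two embeddings of the same vocabulary; it does not reproduce SPIP itself, whose details are not given here. The toy matrices are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def pip_matrix(e: Matrix) -> Matrix:
    """Pairwise inner products E @ E^T: entry (i, j) is <v_i, v_j>."""
    return [[sum(a * b for a, b in zip(u, v)) for v in e] for u in e]

def pip_loss(e1: Matrix, e2: Matrix) -> float:
    """Frobenius norm of PIP(E1) - PIP(E2); rows must index the same words in the same order."""
    p1, p2 = pip_matrix(e1), pip_matrix(e2)
    return math.sqrt(sum((x - y) ** 2 for r1, r2 in zip(p1, p2) for x, y in zip(r1, r2)))

e1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # toy 3-word, 2-d embedding
rotated = [[0.0, -1.0], [1.0, 0.0], [1.0, -1.0]]  # e1 rotated by 90 degrees
scaled = [[2 * x for x in row] for row in e1]     # e1 with every vector doubled

print(pip_loss(e1, rotated))  # 0.0: a rotation preserves all inner products
print(pip_loss(e1, scaled))   # 3 * sqrt(10), about 9.49
```

Because only inner products are compared, the PIP loss is unaffected by rotations of the embedding space (which differ arbitrarily between training runs) and can even compare embeddings of different dimensionality, which is what makes it attractive as a basis for stability metrics.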

Haiou Zhang, Hanjun Zhao, Chunhua Liu, Dong Yu

### Key-Elements Graph Constructed with Evidence Sentence Extraction for Gaokao Chinese

Multiple-choice questions from university admission exams (Gaokao in Chinese) pose a challenging AI task, since they require representations that capture complicated semantic relations between sentences in the article as well as a strong ability to handle long text. To address these challenges, we propose a key-elements graph to enhance contextual semantic representation, together with a comprehensive evidence extraction method inspired by existing methods. Our model first extracts evidence sentences from a passage according to the corresponding question and options to reduce the impact of noise. It then combines syntactic analysis techniques with a graph neural network to construct the key-elements graph based on the extracted sentences. Finally, it fuses the learned graph node representations into the context representation to enhance syntactic information. Experiments on a Gaokao Chinese multiple-choice dataset demonstrate that the proposed model obtains substantial accuracy gains over various neural baselines.

Xiaoyue Wang, Yu Ji, Ru Li

### Knowledge Inference Model of OCR Conversion Error Rules Based on Chinese Character Construction Attributes Knowledge Graph

OCR is a character conversion method based on image recognition. The complexity of the characters and the image quality play key roles in conversion accuracy. OCR conversion errors are irregular, and in certain text scenarios an incorrectly converted word can still be semantically consistent with the context at its original location. In this paper, we propose an OCR conversion error rule inference model based on a Chinese character construction attribute knowledge graph, which analyzes and infers the structure and complexity of Chinese characters. The model integrates a variety of encoding methods, using different encoders to extract features of entities and relationships of different data types in the knowledge graph, and uses convolutional neural networks to learn and infer unknown error rules in OCR conversion. In addition, so that the triple feature matrix fully captures the construction attribute information of Chinese characters, we introduce a feature crossover algorithm that diffuses features across the triple feature matrix. In this algorithm, the relation matrix and the entity matrix are crossed to generate a new feature matrix that better represents knowledge graph triples. Experimental results show that, compared with current mainstream knowledge inference models, the OCR conversion error rule inference model with the feature crossover algorithm achieves notable improvements in MRR, Hits@1, Hits@2 and other evaluation metrics on both public and task-related datasets.

Xiaowen Zhang, Hairong Wang, Wenjie Gu
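
The model above is evaluated with MRR and Hits@k, the standard link-prediction metrics. A small sketch of how they are computed from the rank of the correct candidate; the candidate scores are made up, and the raw (unfiltered) ranking with optimistic tie-breaking is an assumption, since evaluation protocols for knowledge graph inference differ.

```python
from typing import Dict, List

def rank_of(scores: Dict[str, float], gold: str) -> int:
    """1-based rank of the gold candidate; candidates tied with it do not push it down."""
    return 1 + sum(1 for cand, s in scores.items() if cand != gold and s > scores[gold])

def mrr(ranks: List[int]) -> float:
    """Mean reciprocal rank of the correct answers."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of queries whose correct answer is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical model scores for three test queries, each with its gold candidate.
queries = [
    ({"a": 0.9, "b": 0.5, "c": 0.7}, "a"),             # gold ranked 1st
    ({"a": 0.9, "b": 0.5, "c": 0.7}, "c"),             # gold ranked 2nd
    ({"a": 0.2, "b": 0.3, "c": 0.4, "d": 0.1}, "d"),   # gold ranked 4th
]
ranks = [rank_of(scores, gold) for scores, gold in queries]
print(ranks)                                      # [1, 2, 4]
print(mrr(ranks))                                 # (1 + 1/2 + 1/4) / 3, about 0.583
print(hits_at_k(ranks, 1), hits_at_k(ranks, 2))   # 1/3 and 2/3
```

MRR rewards every improvement in rank, while Hits@k only counts whether the answer enters the top k, so reporting both (as the paper does) gives a fuller picture.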

### Interpretable Machine Learning Based on Integration of NLP and Psychology in Peer-to-Peer Lending Risk Evaluation

With the rapid development of Peer-to-Peer (P2P) lending in the financial field, abundant data about lending agencies has become available. P2P agencies also face problems such as absconding with ill-gotten gains and going out of business. It is therefore urgent to use interpretable AI in Fintech to evaluate lending risk effectively. In this paper, we use machine learning and deep learning methods to model and analyze the unstructured natural language text of P2P agencies, and we propose an interpretable machine learning method to evaluate the fraud risk of P2P agencies, which enhances the credibility of the AI model. First, we explain model behavior based on the psychological theory of interpersonal fraud from the social sciences. At the same time, NLP techniques and influence functions from the natural sciences are used to verify that the machine learning model really learns the part-of-speech details described in the fraud theory, which provides psychological interpretability support for the P2P risk evaluation model. In addition, we propose “style vectors” to describe the overall differences between the text styles of P2P agencies and to understand model behavior. Experiments show that the text style differences described by style vectors and influence functions are consistent with human intuition. This shows that the machine learning model indeed learns text style differences and uses them for risk evaluation, which further shows that the model is interpretable to a certain degree.

Lei Li, Tianyuan Zhao, Yang Xie, Yanjie Feng

### Algorithm Bias Detection and Mitigation in Lenovo Face Recognition Engine

With the advancement of Artificial Intelligence (AI), algorithms bring more fairness challenges at the ethical, legal, psychological and social levels. People should start to face these challenges seriously when dealing with AI products and AI solutions. More and more companies are starting to recognize the importance of Diversity and Inclusion (D&I) in the context of AI algorithms and are taking corresponding actions. This paper introduces Lenovo AI’s vision on D&I, specifically its efforts to mitigate algorithm bias in human face processing technology. The latest evaluation shows that the Lenovo face recognition engine achieves better racial fairness than competitors in terms of multiple metrics. The paper also presents post-processing strategies for improving fairness according to different considerations and criteria.

Sheng Shi, Shanshan Wei, Zhongchao Shi, Yangzhou Du, Wei Fan, Jianping Fan, Yolanda Conyers, Feiyu Xu

### Path-Based Visual Explanation

The ability to explain the behavior of a black-box Machine Learning (ML) model to people is becoming essential as ML applications are widely used in critical areas ranging from medicine to commerce. Among methods for explaining model decisions, Case-Based Reasoning (CBR) has received particular interest because it can easily be paired with a black-box model to form a post-hoc explanation framework. In this paper, we propose a CBR-based method that not only explains a model decision but also provides recommendations to the user through an easily understandable visual interface. Our evaluation of the method in a user study shows interesting results.

Mohsen Pourvali, Yucheng Jin, Chen Sheng, Yao Meng, Lei Wang, Masha Gorkovenko, Changjian Hu

### Feature Store for Enhanced Explainability in Support Ticket Classification

To maximize trust between humans and ML agents in an ML application scenario, humans need to be able to easily understand the reasoning behind the predictions made by the black-box models commonly used today. The field of explainable AI aims to maximize this trust. To achieve this, model interpretations need to be informative yet understandable. Often, however, the explanations provided by a model are hard to understand because of complex feature transformations. Our work proposes the use of a feature store to address this issue, extending the general idea of a feature store: in addition to reading pre-processed features from it, we also use it to translate model explanations into a more user-friendly and business-relevant format. This enables both end users and data scientists to glean more information from the interpretations in less time. We demonstrate our idea on a service ticket classification scenario, but the general concept can be extended to other data types and applications to gain more insightful explanations.

Vishal Mour, Sreya Dey, Shipra Jain, Rahul Lodhe

### Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books

Little research has been done on Named Entity Recognition (NER) for Traditional Chinese Medicine (TCM) books, and most existing work uses statistical models such as Conditional Random Fields (CRFs). However, these methods do not fully exploit lexicon information or large-scale unlabeled corpora. To improve NER performance on TCM books, we propose a method based on the BiLSTM-CRF model that incorporates lexicon information into the representation layer to enrich its semantic information. We compared our approach with several previous character-based and word-based methods. Experiments on the “Shanghan Lun” dataset show that our method outperforms previous models. In addition, we collected 376 TCM books to construct a large-scale corpus for obtaining pre-trained vectors, since no large corpus was previously available in this field. We have released the corpus and pre-trained vectors to the public.

Bingyan Song, Zhenshan Bao, YueZhang Wang, Wenbo Zhang, Chao Sun
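
The abstract does not say exactly how lexicon information enters the representation layer. One common approach, shown here only as an illustration and not as the authors' method, is to collect for every character the lexicon words in which it appears at the Begin, Middle or End position or as a Single-character word (BMES); these sets can then be embedded and concatenated with the character embedding. The mini-lexicon, the example phrase and the `max_len` limit below are hypothetical.

```python
from typing import Dict, List, Set

def lexicon_sets(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Dict[str, Set[str]]]:
    """For each character, collect lexicon words where it is the Begin/Middle/End or a Single-char word."""
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        # Try every substring starting at i, up to max_len characters long.
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].add(word)
            else:
                sets[i]["B"].add(word)
                sets[j - 1]["E"].add(word)
                for k in range(i + 1, j - 1):
                    sets[k]["M"].add(word)
    return sets

# Hypothetical mini-lexicon and a formulaic prescription phrase of the kind found in TCM books.
lexicon = {"桂枝", "桂枝汤"}
sentence = "桂枝汤主之"
for ch, s in zip(sentence, lexicon_sets(sentence, lexicon)):
    print(ch, {tag: sorted(words) for tag, words in s.items() if words})
```

The second character, for instance, ends the herb name 桂枝 while sitting in the middle of the formula name 桂枝汤; a purely character-based tagger sees neither fact, which is the gap lexicon features are meant to fill.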

### Anaphora Resolution in Chinese for Analysis of Medical Q&A Platforms

In medical Q&A platforms, patients share information about their diagnoses, give advice and consult with doctors; this creates a large amount of data containing valuable knowledge about drug side effects, patients’ actions and symptoms. This information is widely considered the most important in the field of computer-aided medical analysis. However, messages on the Internet are difficult to analyze because they are unstructured. The purpose of this study is therefore to develop a program for anaphora resolution in Chinese and to apply it to the analysis of user-generated content on a medical Q&A platform. Experiments are conducted with three models: BERT, NeuralCoref and BERT-Chinese+SpanBERT. BERT-Chinese+SpanBERT achieves the highest accuracy, 68.5%, on the OntoNotes 5.0 corpus. The best-performing model was then tested on messages from the medical Q&A platform haodf.com. The results of the study might contribute to improving the diagnosis of hereditary diseases.

Alena Tsvetkova

### Weighted Pre-trained Language Models for Multi-Aspect-Based Multi-Sentiment Analysis

In recent years, aspect-based sentiment analysis has attracted the attention of many researchers because of its wide range of application scenarios. Existing methods for fine-grained sentiment analysis usually model the relations between aspects and contexts explicitly. In this paper, we tackle the task as sentence pair classification. We build our model on pre-trained language models (LMs) because of their strong ability to model semantic information. To further improve performance, we apply a weighted voting strategy that heuristically combines the results of different models. We participated in the NLPCC-2020 shared task on Multi-Aspect-based Multi-Sentiment Analysis (MAMS) and won first place in both sub-tasks, indicating the effectiveness of the adopted approaches.

Fengqing Zhou, Jinhui Zhang, Tao Peng, Liang Yang, Hongfei Lin
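
The abstract combines several fine-tuned models by weighted voting. A minimal sketch of that combination step, assuming each model outputs one sentiment label per (sentence, aspect) pair; the model names and weights are hypothetical, since the paper only says the results were combined heuristically.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def weighted_vote(predictions: Dict[str, Sequence[str]], weights: Dict[str, float]) -> List[str]:
    """Combine per-model label predictions by summing each model's weight behind its label."""
    models = list(predictions)
    n = len(predictions[models[0]])
    combined = []
    for i in range(n):
        scores: Dict[str, float] = defaultdict(float)
        for m in models:
            scores[predictions[m][i]] += weights[m]
        # Iterating labels in sorted order makes ties resolve to the alphabetically first label.
        combined.append(max(sorted(scores), key=lambda label: scores[label]))
    return combined

# Hypothetical outputs of three fine-tuned LMs on two (sentence, aspect) pairs.
preds = {
    "model_a": ["positive", "negative"],
    "model_b": ["neutral", "negative"],
    "model_c": ["neutral", "positive"],
}
print(weighted_vote(preds, {"model_a": 1.0, "model_b": 1.0, "model_c": 1.0}))  # ['neutral', 'negative']
print(weighted_vote(preds, {"model_a": 0.7, "model_b": 0.3, "model_c": 0.3}))  # ['positive', 'negative']
```

With equal weights this reduces to majority voting; giving a stronger model more weight lets it overrule a weaker majority, as in the first pair of the second call.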

### Iterative Strategy for Named Entity Recognition with Imperfect Annotations

Named entity recognition (NER) systems have been widely researched and applied for decades. Most NER systems rely on high-quality annotations, but in some specific domains annotated data is usually imperfect, typically including incomplete annotations and non-annotations. Although related studies have achieved good results on specific types of annotations, building a more robust NER system requires considering complex scenarios that simultaneously contain complete annotations, incomplete annotations, non-annotations, etc. In this paper, we propose a novel NER system that uses different strategies to process different types of annotations, rather than simply adopting the same strategy for all of them. Specifically, we perform multiple iterations. In each iteration, we first train the model on incomplete annotations, and then use the model to re-annotate imperfect annotations and update their weights, which helps generate and select high-quality annotations. In addition, we fine-tune models on the high-quality annotations and their augmentations, and finally ensemble multiple models to generate reliable predictions. Comprehensive experiments demonstrate the effectiveness of our system. Moreover, the system ranked first and second, respectively, on the two leaderboards of the NLPCC 2020 Shared Task: Auto Information Extraction (https://github.com/ZhuiyiTechnology/AutoIE).

Huimin Xu, Yunian Chen, Jian Sun, Xuezhi Cao, Rui Xie

### The Solution of Huawei Cloud & Noah’s Ark Lab to the NLPCC-2020 Challenge: Light Pre-Training Chinese Language Model for NLP Task

Pre-trained language models have achieved great success in natural language processing. However, they are difficult to deploy on resource-restricted devices because of their expensive computation. This paper introduces our solution to the Natural Language Processing and Chinese Computing (NLPCC) challenge of Light Pre-Training Chinese Language Model for Natural Language Processing (http://tcci.ccf.org.cn/conference/2020/, https://www.cluebenchmarks.com/NLPCC.html). The proposed solution applies a state-of-the-art BERT knowledge distillation method (TinyBERT) with an advanced Chinese pre-trained language model (NEZHA) as the teacher model, and we call the result TinyNEZHA. In addition, we introduce several effective techniques in the fine-tuning stage to boost the performance of TinyNEZHA. In the official evaluation of the NLPCC-2020 challenge, TinyNEZHA achieves a score of 77.71, ranking 1st among all participating teams. Compared with BERT-base, TinyNEZHA obtains almost the same results while being 9× smaller and 8× faster at inference.

Yuyang Zhang, Jintao Yu, Kai Wang, Yichun Yin, Cheng Chen, Qun Liu
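
TinyBERT-style distillation trains the student to match the teacher at several levels (embeddings, Transformer layers and the prediction layer). The sketch below covers only the prediction-layer term, a soft cross-entropy between teacher and student logits at a shared temperature; the temperature value and how TinyNEZHA weights this term are not given in the abstract, so treat it as a generic illustration with hypothetical logits.

```python
import math
from typing import List

def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]

def soft_cross_entropy(student_logits: List[float], teacher_logits: List[float],
                       temperature: float = 1.0) -> float:
    """-sum_k p_teacher(k) * log p_student(k), with both distributions at the same temperature."""
    p_teacher = [math.exp(v) for v in log_softmax(teacher_logits, temperature)]
    log_p_student = log_softmax(student_logits, temperature)
    return -sum(p * lq for p, lq in zip(p_teacher, log_p_student))

teacher = [2.0, 0.0, -1.0]
print(soft_cross_entropy([2.0, 0.0, -1.0], teacher))  # student matches teacher: loss equals teacher entropy
print(soft_cross_entropy([-1.0, 0.0, 2.0], teacher))  # mismatched student: larger loss
```

The loss is minimized exactly when the student's distribution equals the teacher's, so the student learns from the teacher's full "soft" distribution rather than only from hard labels.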

### DuEE: A Large-Scale Dataset for Chinese Event Extraction in Real-World Scenarios

This paper introduces DuEE, a new dataset for Chinese event extraction (EE) in real-world scenarios. DuEE has several advantages over previous EE datasets. (1) Scale: DuEE consists of 19,640 events categorized into 65 event types, along with 41,520 event arguments mapped to 121 argument roles, which, to our knowledge, is the largest Chinese EE dataset so far. (2) Quality: All the data is human annotated with crowdsourced review, ensuring that the annotation accuracy is higher than 95%. (3) Reality: The schema covers trending topics from Baidu Search and the data is collected from news on Baijiahao. The task is also close to real-world scenarios, e.g., a single instance is allowed to contain multiple events, different event arguments are allowed to share the same argument role, and an argument is allowed to play different roles. To advance the research on Chinese EE, we release DuEE as well as a baseline system to the community. We also organize a shared competition on the basis of DuEE, which has attracted 1,206 participants. We analyze the results of top performing systems and hope to shed light on further improvements.

Xinyu Li, Fayuan Li, Lu Pan, Yuguang Chen, Weihua Peng, Quan Wang, Yajuan Lyu, Yong Zhu

### Transformer-Based Multi-aspect Modeling for Multi-aspect Multi-sentiment Analysis

Aspect-based sentiment analysis (ABSA) aims to analyze the sentiment toward a given aspect in a sentence. Recently, neural network-based methods have achieved promising results on existing ABSA datasets. However, these datasets tend to degenerate into sentence-level sentiment analysis, because most sentences contain only one aspect or multiple aspects with the same sentiment polarity. To facilitate ABSA research, NLPCC 2020 Shared Task 2 releases a new large-scale Multi-Aspect Multi-Sentiment (MAMS) dataset. In the MAMS dataset, each sentence contains at least two different aspects with different sentiment polarities, which makes ABSA more complex and challenging. To address this challenge, we reformulate ABSA as a multi-aspect sentiment analysis problem and propose a novel Transformer-based Multi-aspect Modeling scheme (TMM), which can capture potential relations between multiple aspects and simultaneously detect the sentiment of all aspects in a sentence. Experimental results on the MAMS dataset show that our method achieves noticeable improvements over strong baselines such as BERT and RoBERTa, and it ranked 2nd in the NLPCC 2020 Shared Task 2 evaluation.

Zhen Wu, Chengcan Ying, Xinyu Dai, Shujian Huang, Jiajun Chen

### Overview of the NLPCC 2020 Shared Task: AutoIE

This is an overview paper of the NLPCC 2020 shared task on AutoIE, which aims to evaluate information extraction solutions under low data resources. Given an unlabeled corpus, entity lists covering 30% of the entities in the corpus and some labeled validation samples, participants are required to build a named entity recognition system. There were 44 registered teams, 16 of which submitted results. The top system achieves F1 improvements of 0.041 and 0.133 over the baseline system with and without labeled validation data, respectively. The evaluation results indicate that information extraction systems can be built with less human annotation. All information about this task can be found at https://github.com/ZhuiyiTechnology/AutoIE.

Xuefeng Yang, Benhong Wu, Zhanming Jie, Yunfeng Liu

### Light Pre-Trained Chinese Language Model for NLP Tasks

We present the results of shared task 1 held at the 2020 Conference on Natural Language Processing and Chinese Computing (NLPCC): Light Pre-Trained Chinese Language Model for NLP tasks. This shared task examines the performance of light language models on four common NLP tasks: Text Classification, Named Entity Recognition, Anaphora Resolution and Machine Reading Comprehension. To ensure that the models are lightweight, we placed restrictions and requirements on the number of parameters and the inference speed of the participating models. In total, 30 teams registered for our tasks. Each submission was evaluated through our online benchmark system (https://www.cluebenchmarks.com/nlpcc2020.html), with the average score over the four tasks as the final score. Participants explored various ideas and frameworks, including data augmentation, knowledge distillation and quantization. The best model achieved an average score of 75.949, very close to BERT-base (76.460). We believe this shared task highlights the potential of lightweight models and calls for further research on developing and exploring them.

Junyi Li, Hai Hu, Xuanwei Zhang, Minglei Li, Lu Li, Liang Xu

### Overview of the NLPCC 2020 Shared Task: Multi-Aspect-Based Multi-Sentiment Analysis (MAMS)

In this paper, we present an overview of the NLPCC 2020 shared task on Multi-Aspect-based Multi-Sentiment Analysis (MAMS). The evaluation consists of two sub-tasks: (1) aspect term sentiment analysis (ATSA) and (2) aspect category sentiment analysis (ACSA). We manually annotated a large-scale restaurant review corpus for MAMS, in which each sentence contains at least two different aspects with different sentiment polarities. As a result, the MAMS dataset is more challenging than existing aspect-based sentiment analysis (ABSA) datasets. MAMS attracted a total of 50 teams to participate in the evaluation task. We believe that MAMS will push forward research in aspect-based sentiment analysis.

Lei Chen, Ruifeng Xu, Min Yang