
This two-volume set of LNAI 12340 and LNAI 12341 constitutes the refereed proceedings of the 9th CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2020, held in Zhengzhou, China, in October 2020.

The 70 full papers, 30 poster papers and 14 workshop papers presented were carefully reviewed and selected from 320 submissions. They are organized in the following areas: Conversational Bot/QA; Fundamentals of NLP; Knowledge Base, Graphs and Semantic Web; Machine Learning for NLP; Machine Translation and Multilinguality; NLP Applications; Social Media and Network; Text Mining; and Trending Topics.

FAQ-Based Question Answering via Knowledge Anchors

Question answering (QA) aims to understand questions and find appropriate answers. In real-world QA systems, Frequently Asked Question (FAQ) based QA is usually a practical and effective solution, especially for complicated questions (e.g., How and Why). Recent years have witnessed the great success of knowledge graphs (KGs) in KBQA systems, yet few works focus on making full use of KGs in FAQ-based QA. In this paper, we propose a novel Knowledge Anchor based Question Answering (KAQA) framework for FAQ-based QA to better understand questions and retrieve more appropriate answers. More specifically, KAQA consists of three main modules: knowledge graph construction, query anchoring and query-document matching. We consider entities and triples of KGs in texts as knowledge anchors to precisely capture the core semantics, which brings higher precision and better interpretability. The multi-channel matching strategy also enables most sentence matching models to be flexibly plugged into our KAQA framework to fit different real-world computation limitations. In experiments, we evaluate our models on both offline and online query-document matching tasks on a real-world FAQ-based QA system in WeChat Search, with detailed analysis, ablation tests and case studies. The significant improvements confirm the effectiveness and robustness of the KAQA framework in real-world FAQ-based QA.

Ruobing Xie, Yanan Lu, Fen Lin, Leyu Lin

Deep Hierarchical Attention Flow for Visual Commonsense Reasoning

Visual Commonsense Reasoning (VCR) requires a thorough understanding of general information connecting language and vision, as well as background world knowledge. In this paper, we introduce a novel yet powerful deep hierarchical attention flow framework, which takes full advantage of text information in the query and candidate responses to perform reasoning over the image. Moreover, inspired by the success of machine reading comprehension, we also model the correlation among candidate responses to obtain better response representations. Extensive quantitative and qualitative experiments are conducted to evaluate the proposed model. Empirical results on the benchmark VCR1.0 show that the proposed model outperforms existing strong baselines, which demonstrates the effectiveness of our method.

Yuansheng Song, Ping Jian

Dynamic Reasoning Network for Multi-hop Question Answering

Multi-hop reasoning question answering is a sub-task of machine reading comprehension (MRC) which aims to find the answer to a given question across multiple passages. Most existing models obtain the answer by visiting the question only once, so they may not obtain adequate text information. In this paper, we propose a Dynamic Reasoning Network (DRN), a novel approach to obtain correct answers by multi-hop reasoning among multiple passages. We establish a query reshaping mechanism which visits a query repeatedly to mimic people’s reading habits. The model dynamically reasons over an entity graph with graph attention (GAT) and the query reshaping mechanism to improve its comprehension and reasoning ability. The experimental results on the HotpotQA and TriviaQA datasets show that our DRN model achieves significant improvements compared to prior state-of-the-art models.

Xiaohui Li, Yuezhong Liu, Shenggen Ju, Zhengwen Xie

Memory Attention Neural Network for Multi-domain Dialogue State Tracking

In a task-oriented dialogue system, the dialogue state tracker aims to generate a structured summary (domain-slot-value triples) over the whole dialogue utterance. However, existing approaches generally fail to make good use of pre-defined ontologies. In this paper, we propose a novel Memory Attention State Tracker that considers ontologies as prior knowledge and uses Memory Networks to store such information. Our model is composed of an utterance encoder, an attention-based query generator, a slot gate classifier, and ontology Memory Networks for every domain-slot pair. To make a fair comparison with previous approaches, we also conduct experiments with an RNN instead of pre-trained BERT as the encoder. Empirical results show that our model achieves comparable joint accuracy on the MultiWoz 2.0 and MultiWoz 2.1 datasets.

Zihan Xu, Zhi Chen, Lu Chen, Su Zhu, Kai Yu

Word sense understanding (WSU) is fundamental to human reading, and the word-meaning-explanation question is an important kind of question in Chinese reading comprehension (RC) in the college entrance exams of China (called ‘Gaokao’ for short), which requires students to explain the meaning of a target word. This paper proposes a method to answer word-meaning-explanation questions, which combines the VAE framework with BERT and the Transformer to learn rich, nonlinear representations for producing a high-quality explanation of a target word within a certain context. In order to generate multi-style explanations, we construct not only Chinese dictionary-style datasets, but also an essay-style dataset as a supplement for the Chinese Gaokao application. We also build a Gaokao-style test set to evaluate our model. The experimental results show that our model performs better than the baseline models. The code and the relevant dataset will be released on GitHub.

Hongye Tan, Pengpeng Qiang, Ru Li

Enhancing Multi-turn Dialogue Modeling with Intent Information for E-Commerce Customer Service

Building intelligent conversational bots for customer service is now a hot topic in many industries. A critical requirement for these dialogue systems is to understand the diverse and changing intents of customers accurately. However, few studies have focused on intent information due to the lack of a large-scale dialogue corpus with labelled intents. In this paper, we propose to leverage intent information to enhance multi-turn dialogue modeling. First, we construct a large-scale Chinese multi-turn E-commerce conversation corpus with labelled intents, namely E-IntentConv, which covers 289 fine-grained intents in the after-sales domain. Specifically, we utilize the attention mechanism to extract Intent Description Words (IDW) for representing each intent explicitly. Then, based on E-IntentConv, we propose to integrate intent information into both a retrieval-based model and a generation-based model to verify its effectiveness for multi-turn dialogue modeling. Experimental results show that extra intent information is useful for improving both response selection and generation tasks.

Ruixue Liu, Meng Chen, Hang Liu, Lei Shen, Yang Song, Xiaodong He

Robust Spoken Language Understanding with RL-Based Value Error Recovery

Spoken Language Understanding (SLU) aims to extract structured semantic representations (e.g., slot-value pairs) from speech-recognized texts, which suffer from Automatic Speech Recognition (ASR) errors. To alleviate the problem caused by ASR errors, previous works either apply input adaptations to the speech-recognized texts, or correct ASR errors in predicted values by searching for the candidates most similar in pronunciation. However, these two methods are applied separately and independently. In this work, we propose a new robust SLU framework that guides the SLU input adaptation with a rule-based value error recovery module. The framework consists of a slot tagging model and a rule-based value error recovery module. We pursue an adapted slot tagging model which can extract potential slot-value pairs mentioned in ASR hypotheses and is suitable for the existing value error recovery module. After the value error recovery, we can obtain a supervision signal (reward) by comparing the refined slot-value pairs with annotations. Since the operations of the value error recovery are non-differentiable, we exploit policy gradient based Reinforcement Learning (RL) to optimize the SLU model. Extensive experiments on the public CATSLU dataset show the effectiveness of our proposed approach, which improves the robustness of SLU and outperforms the baselines by significant margins.

Chen Liu, Su Zhu, Lu Chen, Kai Yu

A Large-Scale Chinese Short-Text Conversation Dataset

Advances in neural dialogue generation models show promising results on modeling short-text conversations. However, training such models usually requires a large-scale, high-quality dialogue corpus, which is hard to access. In this paper, we present a large-scale cleaned Chinese conversation dataset, LCCC, which contains a base version (6.8 million dialogues) and a large version (12.0 million dialogues). The quality of our dataset is ensured by a rigorous data cleaning pipeline, which is built on a set of rules and a classifier trained on 110K manually annotated dialogue pairs. We also release pre-training dialogue models which are trained on LCCC-base and LCCC-large respectively. The cleaned dataset and the pre-training models will facilitate research on short-text conversation modeling. All the models and datasets are available at https://github.com/thu-coai/CDial-GPT.

Yida Wang, Pei Ke, Yinhe Zheng, Kaili Huang, Yong Jiang, Xiaoyan Zhu, Minlie Huang

DVDGCN: Modeling Both Context-Static and Speaker-Dynamic Graph for Emotion Recognition in Multi-speaker Conversations

Emotion recognition in conversation has been a hot topic in natural language processing (NLP). Speaker information plays an important role in dialogue systems, especially the speaker state, which is closely related to emotion. As the number of speakers grows, it becomes more challenging to model speakers’ states in multi-speaker conversation than in two-speaker conversation. In this paper, we focus on emotion detection in multi-speaker conversation, a more generalized conversational emotion task. We mainly try to solve two problems. First, the more speakers there are, the harder it is to model speakers’ interactions and obtain the speaker state. Second, because conversations vary over time, it is necessary to model the dynamic speaker state in the conversation. For the first problem, we adopt a graph structure, which is expressive enough to model speaker interactions and speaker state. For the second problem, we use a dynamic graph neural network to model the dynamic speaker state. Accordingly, we propose the Dual View Dialogue Graph Neural Network (DVDGCN), a graph neural network that models both context-static and speaker-dynamic graphs. The experimental results on a multi-speaker conversation emotion recognition corpus demonstrate the effectiveness of the proposed approach.

Shuofeng Zhao, Pengyuan Liu

Nominal Compound Chain Extraction: A New Task for Semantic-Enriched Lexical Chain

A lexical chain consists of cohesive words in a document; it reveals the underlying structure of a text and thus facilitates downstream NLP tasks. Nevertheless, existing work focuses on detecting simple surface lexicons with shallow syntactic associations, ignoring semantic-aware lexical compounds as well as latent semantic frames (e.g., topic), which can be much more crucial for real-world NLP applications. In this paper, we introduce a novel task, Nominal Compound Chain Extraction (NCCE), which extracts and clusters all the nominal compounds that share identical semantic topics. In addition, we model the task as a two-stage prediction (i.e., compound extraction and chain detection), which is handled by a proposed joint framework. The model employs the BERT encoder to yield a contextualized document representation. HowNet is also exploited as an external resource offering rich sememe information. The experiments are based on our manually annotated corpus, and the results demonstrate the necessity of the NCCE task as well as the effectiveness of our joint approach.

Bobo Li, Hao Fei, Yafeng Ren, Donghong Ji

A Hybrid Model for Community-Oriented Lexical Simplification

Generally, lexical simplification replaces complex words in a sentence with simpler, synonymous words. Most current methods improve lexical simplification by optimizing the ranking algorithm, and their performance is limited. This paper uses a hybrid model that merges candidate words generated by a Context2vec neural model and a Context-aware model using a weighted average method. The model consists of four steps: candidate word generation, candidate word selection, candidate word ranking, and candidate word merging. In an evaluation on standard datasets, our hybrid model outperforms a list of baseline methods including the Context2vec method, the Context-aware method, and the state-of-the-art semantic-context ranking method, indicating its effectiveness in the community-oriented lexical simplification task.

Jiayin Song, Yingshan Shen, John Lee, Tianyong Hao

Multimodal Aspect Extraction with Region-Aware Alignment Network

Fueled by the rise of social media, documents on these platforms (e.g., Twitter, Weibo) are increasingly multimodal in nature, with images in addition to text. To automatically analyze the opinion information inside multimodal data, it is crucial to perform aspect term extraction (ATE) on them. However, studies focusing on multimodal ATE have so far been rare. In this study, we take a step further than previous studies by proposing a Region-aware Alignment Network (RAN) that aligns text with the object regions shown in an image for the multimodal ATE task. Experiments on the Twitter dataset showcase the effectiveness of our proposed model. Further analysis shows that our model performs better when extracting emotion-polarized aspect terms.

Hanqian Wu, Siliang Cheng, Jingjing Wang, Shoushan Li, Lian Chi

NER in Threat Intelligence Domain with TSFL

In order to deal with more sophisticated Advanced Persistent Threat (APT) attacks, it is indispensable to convert cybersecurity threat intelligence into structured or semi-structured data specifications. In this paper, we convert the task of extracting indicators of compromise (IOC) information into a sequence labeling task of named entity recognition. We construct a dataset for named entity recognition in the threat intelligence domain and train word vectors for that domain. Meanwhile, we propose a new loss function, TSFL, which combines a triplet loss function based on metric learning with a sorted focal loss function, to solve the problem of the unbalanced distribution of data labels. Named entity recognition experiments show that the F1 value improves on both public domain datasets and threat intelligence data.

Xuren Wang, Zihan Xiong, Xiangyu Du, Jun Jiang, Zhengwei Jiang, Mengbo Xiong

Enhancing the Numeracy of Word Embeddings: A Linear Algebraic Perspective

To support reasoning over numbers, their embeddings should capture numeracy information. In this work, we consider the magnitude aspect of numeracy information. We can find a vector in a high-dimensional space and a subspace of the original space such that, after projecting the original embeddings of numbers onto that vector or subspace, the magnitude information is significantly enhanced. This paper thus proposes a new angle from which to study the numeracy of word embeddings, one which is interpretable and has nice mathematical formulations.

Yuanhang Ren, Ye Du
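
The projection step described in the abstract above can be illustrated with a minimal sketch. It projects number embeddings onto a single direction and reads off a scalar coordinate. The toy embeddings and the direction `w` are hypothetical values chosen for illustration; how the paper actually finds the vector or subspace is not stated here and is not reproduced.

```python
from math import sqrt


def project_onto(vec, direction):
    """Return the scalar coordinate of `vec` along the unit-normalized `direction`."""
    norm = sqrt(sum(d * d for d in direction))
    return sum(v * d for v, d in zip(vec, direction)) / norm


# Hypothetical 3-d embeddings for the numbers 1, 10 and 100 (illustrative only).
embeddings = {1: [0.1, 0.9, 0.2], 10: [0.5, 0.8, 0.1], 100: [0.9, 0.7, 0.3]}
w = [1.0, 0.0, 0.0]  # assumed direction along which magnitude varies

# Project every embedding and order the numbers by their coordinate along w.
coords = {n: project_onto(e, w) for n, e in embeddings.items()}
ordered = sorted(coords, key=coords.get)
```

If the chosen direction captures magnitude, sorting by the projected coordinate should recover the numeric order, which is the sense in which a projection can make magnitude information easier to read off.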

Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing?

In the pre-deep-learning era, part-of-speech tags were considered indispensable ingredients for feature engineering in dependency parsing, and quite a few works focused on joint tagging and parsing models to avoid error propagation. In contrast, recent studies suggest that POS tagging becomes much less important or even useless for neural parsing, especially when using character-based word representations. Yet there are not enough investigations of this issue, either empirically or linguistically. To answer this, we design and compare three typical multi-task learning frameworks, i.e., Share-Loose, Share-Tight, and Stack, for joint tagging and parsing based on the state-of-the-art biaffine parser. Considering that it is much cheaper to annotate POS tags than parse trees, we also investigate the utilization of large-scale heterogeneous POS tag data. We conduct experiments on both English and Chinese datasets, and the results clearly show that POS tagging (both homogeneous and heterogeneous) can still significantly improve parsing performance when using the Stack joint framework. We conduct detailed analysis and gain more insights from the linguistic aspect.

Houquan Zhou, Yu Zhang, Zhenghua Li, Min Zhang

A Span-Based Distantly Supervised NER with Self-learning

The lack of labeled data is one of the major obstacles for named entity recognition (NER). Distant supervision, which automatically generates annotated training datasets from dictionaries, is often used to alleviate this problem. However, as far as we know, existing distant supervision based methods do not consider latent entities which are not in the dictionaries. Intuitively, entities of the same type have similar contextualized features, so we can use these features to extract latent entities from corpora into the corresponding dictionaries and thereby improve the performance of distant supervision based methods. Thus, in this paper, we propose a novel span-based self-learning method, which employs span-level features to update the corresponding dictionaries. Specifically, the proposed method directly takes all possible spans into account and scores them for each label, then picks latent entities from the candidate spans into the corresponding dictionaries based on both local and global features. Extensive experiments on two public datasets show that our proposed method performs better than the state-of-the-art baselines.

Hongli Mao, Hanlin Tang, Wen Zhang, Heyan Huang, Xian-Ling Mao
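
The phrase "takes all possible spans into account" in the abstract above refers to enumerating candidate spans before scoring them. A minimal sketch of that enumeration follows. The `max_len` cap is an assumption added to keep the candidate set small, and the scoring model and dictionary update are not shown because the abstract does not specify them.

```python
def enumerate_spans(tokens, max_len):
    """Yield (start, end, text) for every span of at most `max_len` tokens.

    `end` is exclusive, so tokens[start:end] is the span.
    """
    n = len(tokens)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            yield start, end, " ".join(tokens[start:end])


# Candidate spans of one or two tokens for a three-token sentence.
spans = list(enumerate_spans(["New", "York", "City"], max_len=2))
```

Each candidate would then be scored per label by a span classifier; the number of candidates grows roughly as the sentence length times `max_len`, which is why a length cap is a common practical choice.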

A Passage-Level Text Similarity Calculation

With the explosion of web information, information flow services have attracted the attention of users. In this kind of service, measuring the similarity between texts and filtering the redundant information collected from multiple sources is the key to meeting users’ needs. One text often mentions several events. The core event largely determines the main content carried by the text, so it should take the pivotal position. For this reason, this paper constructs a passage-level event connection graph to model the relations among the events mentioned in one text. The core event can then be identified and used to measure the similarity between two texts. Experimental results show that, by measuring text similarity from a passage-level event representation perspective, our unsupervised method outperforms other unsupervised methods by a large margin and is even comparable with some popular supervised neural-based methods.

Ming Liu, Zihao Zheng, Bing Qin, Yitong Liu

Using Active Learning to Improve Distantly Supervised Entity Typing in Multi-source Knowledge Bases

Entity typing is an essential task for constructing a knowledge base. Previous models mainly rely on manually annotated data or distant supervision. However, human annotation is expensive and distantly supervised data suffers from the label noise problem. In addition, it suffers from the semantic heterogeneity problem in a multi-source knowledge base. To address these issues, we propose to use an active learning method to improve distantly supervised entity typing in the multi-source knowledge base, which aims to combine the benefits of human annotation for difficult instances with the coverage of large distantly supervised data. However, existing active learning criteria do not consider the label noise and semantic heterogeneity problems, so much of the annotation effort is wasted on useless instances. In this paper, we develop a novel active learning pipeline framework to tackle the most difficult instances. Specifically, we first propose a noise reduction method to re-annotate the most difficult instances in distantly supervised data. Then we propose a data augmentation method to annotate the most difficult instances in unlabeled data. We propose two novel selection criteria to find the most difficult instances in the respective phases. Moreover, we propose a hybrid annotation strategy to reduce human labeling effort. Experimental results show the effectiveness of our method.

Bo Xu, Xiangsan Zhao, Qingxuan Kong

TransBidiFilter: Knowledge Embedding Based on a Bidirectional Filter

A large-scale knowledge base can support a large number of practical applications, such as intelligent search and intelligent question answering. As the completeness of the information in a knowledge base may have a direct impact on the quality of downstream applications, its automatic completion has become a crucial task for many researchers and practitioners. To address this challenge, knowledge representation learning, which represents entities and relations as low-dimensional dense real-valued vectors, has developed rapidly in recent years. Although researchers continue to improve knowledge representation learning models using increasingly complex feature engineering, we find that the most advanced models can be outdone by simply considering interactions from entities to relations and those from relations to entities, without requiring a huge number of parameters. In this work, we present a knowledge embedding model based on a bidirectional filter, called TransBidiFilter. By learning a globally shared parameter set based on the traditional gate structure, TransBidiFilter captures the restriction rules from entities to relations and those from relations to entities respectively. It achieves better automatic completion ability by modifying the standard translation-based loss function. In doing so, although it has far fewer discriminate parameters, TransBidiFilter performs better than state-of-the-art baselines of semantic discriminate models on most indicators on many datasets.

Xiaobo Guo, Neng Gao, Jun Yuan, Lin Zhao, Lei Wang, Sibo Cai

Applying Model Fusion to Augment Data for Entity Recognition in Legal Documents

Named entity recognition for legal documents is a basic and crucial task, which can provide important knowledge for related tasks in the field of wisdom justice. However, it is still difficult to augment the labeled named entity data for legal documents automatically. To address this issue, we propose a novel data augmentation method for named entity recognition that fuses multiple models. Firstly, we train a total of ten models by conducting 5-fold cross-training on the small-scale labeled datasets with the BiLSTM-CRF and BERT-BiLSTM-CRF models separately. Next, we apply single-model fusion and multi-model fusion modes: single-model fusion votes on the prediction results of the five models of the same baseline, while multi-model fusion votes on the prediction results of the ten models from the two different baselines. Further, we take the entities identified with high correctness across the multiple experimental results as effective entities, and add them to the training set for the next round of training. Finally, we conduct experiments on two public datasets and our own judicial dataset separately, which show that the results using data augmentation are close to those obtained with five times as much labeled data, and clearly better than those on the initial small-scale labeled datasets.

Hu Zhang, Haihui Gao, Jingjing Zhou, Ru Li
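
The voting step described in the abstract above can be sketched as follows. Each model contributes a set of predicted entities, and an entity is kept when enough models agree on it. The `(start, end, label)` representation and the `min_votes` threshold are assumptions made for illustration; the abstract says only that entities "with high correctness" are kept.

```python
from collections import Counter


def vote_entities(predictions, min_votes):
    """Keep entities predicted by at least `min_votes` of the models.

    `predictions` holds one set of (start, end, label) tuples per model.
    """
    counts = Counter(entity for model_pred in predictions for entity in set(model_pred))
    return {entity for entity, votes in counts.items() if votes >= min_votes}


# Hypothetical outputs of five models trained on different folds.
model_outputs = [
    {(0, 2, "PER"), (5, 7, "ORG")},
    {(0, 2, "PER")},
    {(0, 2, "PER"), (5, 7, "LOC")},
    {(0, 2, "PER"), (5, 7, "ORG")},
    {(5, 7, "ORG")},
]
agreed = vote_entities(model_outputs, min_votes=3)
```

The same function covers both fusion modes: pass five prediction sets from one baseline for single-model fusion, or all ten from both baselines for multi-model fusion. A higher threshold trades coverage for precision in the entities added back to the training set.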

Combining Knowledge Graph Embedding and Network Embedding for Detecting Similar Mobile Applications

With the popularity of mobile devices, large numbers of mobile applications (a.k.a. “apps”) have been developed and published. Detecting similar apps in a large pool of apps is a fundamental and important task because it has many benefits for various purposes. Several works try to combine different metadata of apps to measure the similarity between apps. However, few of them pay attention to the roles of this service. Besides, existing methods do not distinguish the characteristics of the contents in the metadata. Therefore, it is hard to obtain accurate semantic representations of apps and capture their fine-grained correlations. In this paper, we propose a novel framework using knowledge graph (KG) techniques and a hybrid embedding strategy to fill the above gaps. For the construction of the KG, we design a lightweight ontology tailored to the service of cybersecurity analysts. With a defined schema, more linkages can be shared among apps. To detect similar apps, we divide the relations in the KG into structured and unstructured ones according to their related content. Then, the TextRank algorithm is employed to extract important tokens from unstructured texts and transform them into structured triples. In this way, the representations of apps in our framework can be iteratively learned by combining KG embedding methods and network embedding models to improve the performance of similar app detection. Preliminary results indicate the effectiveness of our method compared to existing models in terms of reciprocal ranking and minimum ranking.

Weizhuo Li, Buye Zhang, Liang Xu, Meng Wang, Anyuan Luo, Yan Niu

CMeIE: Construction and Evaluation of Chinese Medical Information Extraction Dataset

In this paper, we present the Chinese Medical Information Extraction (CMeIE) dataset, consisting of 28,008 sentences, 85,282 triplets, 11 entities, and 44 relations derived from medical textbooks and clinical practices, constructed by several rounds of manual annotation. Additionally, we evaluate the performance of the most recent state-of-the-art frameworks and pre-trained language models for the joint extraction of entities and relations task on the CMeIE dataset. Experiment results show that even these most advanced models still have considerable room to improve on our dataset; currently, the best F1 score on the dataset is 58.44%. Our analysis points out several challenges and multiple potential future research directions for the task specialized in the medical domain.

Tongfeng Guan, Hongying Zan, Xiabing Zhou, Hongfei Xu, Kunli Zhang

Document-Level Event Subject Pair Recognition

In recent years, financial events in the stock market have increased dramatically. Extracting valuable information automatically from massive financial documents can provide effective support for the analysis of financial events. This paper proposes an end-to-end document-level subject pair recognition method. It aims to recognize the subject pair, i.e., the subject and the object of an event. Given one document and the predefined event type set, this method outputs all the corresponding subject pairs related to each event type. Subject pair recognition is inherently a document-level extraction task since it needs to scan the entire document to output the desired subject pairs. This paper constructs a global document-level vector based on sentence-level vectors which are encoded by BERT. The global document-level vector aims to cover the information carried by the entire document. It is used to guide the extraction process, which is conducted sentence by sentence. After considering global information, our method obtains superior experimental results.

Zhenyu Hu, Ming Liu, Yin Wu, Jiexin Xu, Bing Qin, JinLong Li

Knowledge Enhanced Opinion Generation from an Attitude

Mining opinions is essential for the consistency and persona of a chatbot. However, mining existing opinions suffers from data sparsity. For a given entity, we cannot always find a proper sentence that expresses the desired sentiment. In this paper, we propose to generate opinion sentences for a given attitude, i.e., an entity and sentiment polarity pair. We extract attributes of a target entity from a knowledge base and specific keywords from its description. The attributes and keywords are integrated with knowledge graph embeddings, and fed into an encoder-decoder generation framework. We also propose an auxiliary task that predicts attributes using the generated sentences, aiming to avoid common opinions. Experimental results indicate that our approach significantly outperforms baselines in automatic and human evaluation.

Zhe Ye, Ruihua Song, Hao Fu, Pingping Lin, Jian-Yun Nie, Fang Li

MTNE: A Multitext Aware Network Embedding for Predicting Drug-Drug Interaction

Identifying drug-drug interactions (DDIs) is an important research topic in drug discovery. Accurate predictions of DDIs reduce unexpected interactions during the drug development process and play a significant role in drug safety surveillance. Many existing methods use drug properties to predict the unobserved interactions between drugs. However, semantic relations between drug features have seldom been considered, which has resulted in low prediction accuracy. In addition, incomplete annotated data and sparse drug characteristics have greatly hindered the performance of DDI predictions. In this paper, we propose a network embedding method named MTNE (MultiText Aware Network Embedding) that considers multiple external information sources. MTNE learns the dynamic representation of the drug description and the pharmacodynamics through a mutual attention mechanism. It effectively maps a high-dimensional drug-drug interaction network to low-dimensional vector spaces by taking advantage of both the textual information of drugs and the topological information of the drug-drug interaction network. We conduct experiments based on the DrugBank dataset. The results show that MTNE improves the performance of DDI predictions with an AUC value of 76.1% and outperforms other state-of-the-art methods. Moreover, MTNE can also achieve high-quality prediction results on sparse datasets.

Fuyu Hu, Chunping Ouyang, Yongbin Liu, Yi Bu

Learning to Generate Representations for Novel Words: Mimic the OOV Situation in Training

In this work, we address the out-of-vocabulary (OOV) problem in sequence labeling using only the training data of the task. A typical solution in this field is to represent an OOV word using the mean-pooled representations of its surrounding words at test time. However, such a pipeline approach often suffers from the error propagation problem, since training of the supervised model is independent of the mean-pooling operation. In this work, we propose a novel training strategy to address the error propagation problem suffered by this solution. It is designed to mimic the OOV situation during model training and trains the supervised model to fit the OOV word representations generated by the mean-pooling operation. Extensive experiments on different sequence labeling tasks, including part-of-speech tagging (POS), named entity recognition (NER), and chunking, verify the effectiveness of our proposed method.

Xiaoyu Xing, Minlong Peng, Qi Zhang, Qin Liu, Xuanjing Huang
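
The "typical solution" mentioned in the abstract above, representing an OOV word by the mean of its surrounding words' representations, can be sketched as follows. The fixed `window` size and the use of a static embedding table are assumptions made for illustration; the paper's training strategy that mimics the OOV situation is not shown.

```python
def mean_pool_context(tokens, index, embeddings, window=2):
    """Average the embeddings of in-vocabulary words within `window` positions of `index`."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    context = [
        embeddings[tok]
        for i, tok in enumerate(tokens[lo:hi], start=lo)
        if i != index and tok in embeddings
    ]
    if not context:
        raise ValueError("no in-vocabulary context words to pool")
    dim = len(context[0])
    return [sum(vec[d] for vec in context) / len(context) for d in range(dim)]


# Hypothetical 2-d embedding table; "zorple" is the OOV word.
vocab = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [1.0, 1.0]}
sentence = ["the", "zorple", "cat", "sat"]
oov_vector = mean_pool_context(sentence, 1, vocab, window=1)
```

Because a downstream tagger trained only on real word embeddings never sees such pooled vectors, it can mishandle them at test time, which is the error propagation problem the abstract describes.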

Reinforcement Learning for Named Entity Recognition from Noisy Data

Named entity recognition (NER) is an important task in natural language processing, and is often formalized as a sequence labeling problem. Deep learning has become the state-of-the-art approach for NER, but the lack of high-quality labeled data remains the bottleneck for model performance. To solve the problem, we employ the distant supervision technique to obtain noisy labeled data, and propose a novel model based on reinforcement learning to revise the wrong labels and distill high-quality data for learning. Specifically, our model consists of two modules, a Tag Modifier and a Tag Predictor. The Tag Modifier corrects the wrong tags with reinforcement learning and feeds the corrected tags into the Tag Predictor. The Tag Predictor makes the sentence-level prediction and returns rewards to the Tag Modifier. The two modules are trained jointly to optimize the tag correction and prediction processes. Experiment results show that our model can effectively deal with noise using a small amount of correctly labeled data and thus outperforms state-of-the-art baselines.

Jing Wan, Haoming Li, Lei Hou, Juanzi Li

Flexible Parameter Sharing Networks

Deep learning models have flourished in recent years, but training them remains a complex optimization problem because the parameters of each layer are independent. Although this problem can be alleviated by coefficient vector based parameter sharing methods, these methods raise a new problem: parameters of different sizes cannot be generated from a fixed-size global parameter template, which may truncate latent connections among parameters. In order to generate parameters of different sizes from the same parameter template, a Flexible Parameter Sharing Scheme (FPSS) is proposed. We exploit the asymmetric characteristic of convolution operations to resize and transform the template into specific parameters. As a generalization of the coefficient vector based methods, FPSS uses 2-dimensional convolution operations rather than linear combinations to transform the global template. Since all parameters are generated from the same template, FPSS can be viewed as building latent connections among the parameters through the global template. Meanwhile, each layer needs far fewer parameters, which reduces the search space and makes the model easier to train. Furthermore, we present two deep models as applications of FPSS, Hybrid CNN and Adaptive DenseNet, which share the global template across different modules and blocks. One can easily find the similar parts of a deep network through our method. Experimental results on several text datasets show that the proposed models are comparable to or better than state-of-the-art models.

Chengkai Piao, Jinmao Wei, Yapeng Zhu, Hengpeng Xu
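
The abstract above states that FPSS generates parameters of different sizes from one fixed-size template through 2-dimensional convolution. The sketch below only illustrates the underlying size arithmetic: a "valid" 2-D convolution of an H x W template with a kh x kw kernel yields an (H - kh + 1) x (W - kw + 1) result, so kernels of different shapes produce parameter matrices of different shapes from the same template. The function `conv2d_valid`, the 4 x 4 template and the averaging kernels are hypothetical and are not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(template: Matrix, kernel: Matrix) -> Matrix:
    """Valid 2-D cross-correlation (the 'convolution' used in deep learning)."""
    height, width = len(template), len(template[0])
    kh, kw = len(kernel), len(kernel[0])
    out: Matrix = []
    for r in range(height - kh + 1):
        row = []
        for c in range(width - kw + 1):
            row.append(
                sum(
                    template[r + i][c + j] * kernel[i][j]
                    for i in range(kh)
                    for j in range(kw)
                )
            )
        out.append(row)
    return out


# One shared 4 x 4 template with made-up values 0.0 .. 15.0.
template = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

# Two hypothetical per-layer kernels of different shapes.
kernel_a = [[0.25, 0.25], [0.25, 0.25]]            # 2 x 2
kernel_b = [[0.5, 0.5], [0.0, 0.0], [0.0, 0.0]]    # 3 x 2

params_a = conv2d_valid(template, kernel_a)  # 3 x 3 parameter matrix
params_b = conv2d_valid(template, kernel_b)  # 2 x 3 parameter matrix
print(len(params_a), len(params_a[0]))
print(len(params_b), len(params_b[0]))
```

Under this reading, each layer stores only its small kernel while the template is shared, which is consistent with the abstract's remark that each layer needs far fewer parameters.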

An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models

Recently, pre-trained language models like BERT have shown promising performance on multiple natural language processing tasks. However, the application of these models has been limited by their huge size. A popular and efficient way to reduce their size is quantization. Nevertheless, most works on BERT quantization adopt basic linear clustering as the quantization scheme, and few try to upgrade it, which significantly limits the performance of quantization. In this paper, we implement k-means quantization and compare its performance with linear quantization on the fixed-precision quantization of BERT. Through this comparison, we verify that the effect of upgrading the underlying quantization scheme is underestimated and that k-means quantization has considerable potential for further development. In addition, we compare the two quantization schemes on ALBERT models to explore the robustness differences between different pre-trained models.

Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu
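
To make the contrast between the two schemes in the abstract above concrete, here is a minimal sketch on a toy list of weights. It assumes that linear quantization places 2^bits evenly spaced levels between the minimum and maximum weight, and that k-means quantization refines those levels with Lloyd's algorithm; the function names, the toy weights and the initialization choice are assumptions for illustration, not the paper's setup, and no BERT model is involved.

```python
from typing import List


def linear_levels(weights: List[float], bits: int) -> List[float]:
    """Evenly spaced quantization levels between min and max (bits >= 1)."""
    lo, hi = min(weights), max(weights)
    k = 2 ** bits
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


def assign(weights: List[float], levels: List[float]) -> List[int]:
    """Index of the nearest level for every weight."""
    return [
        min(range(len(levels)), key=lambda j: abs(w - levels[j]))
        for w in weights
    ]


def kmeans_levels(weights: List[float], bits: int, iters: int = 20) -> List[float]:
    """1-D k-means (Lloyd's algorithm) initialised at the linear levels."""
    levels = linear_levels(weights, bits)
    for _ in range(iters):
        idx = assign(weights, levels)
        for j in range(len(levels)):
            members = [w for w, a in zip(weights, idx) if a == j]
            if members:  # keep the old level if a cluster is empty
                levels[j] = sum(members) / len(members)
    return levels


def quantize(weights: List[float], levels: List[float]) -> List[float]:
    """Replace every weight by its nearest level."""
    return [levels[j] for j in assign(weights, levels)]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Made-up weights with one outlier, quantized to 2 bits (4 levels).
toy_weights = [-1.0, -0.9, -0.8, 0.0, 0.1, 0.9, 1.0, 4.0]
linear_error = mse(toy_weights, quantize(toy_weights, linear_levels(toy_weights, 2)))
kmeans_error = mse(toy_weights, quantize(toy_weights, kmeans_levels(toy_weights, 2)))
print(linear_error, kmeans_error)
```

Because the k-means levels start from the linear ones and each Lloyd iteration cannot increase the squared error, the k-means reconstruction error in this sketch is never worse than the linear one on the same weights.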

A Survey of Sentiment Analysis Based on Machine Learning

Every day, Facebook, Twitter, Weibo and other social network sites and major e-commerce sites generate a large number of online reviews with emotions. Analysing people’s opinions from these reviews can assist a variety of decision-making processes in organisations, products, and administrations. Therefore, it is practically and theoretically important to study how to analyse online reviews with emotions. To help researchers study sentiment analysis, in this paper, we survey machine learning based methods for sentiment analysis of online reviews. These methods are mainly based on Support Vector Machines, Neural Networks, Naïve Bayes, Bayesian Networks, Maximum Entropy, and some hybrid methods. In particular, we point out the main problems in machine learning based methods for sentiment analysis and the problems to be solved in the future.

Pingping Lin, Xudong Luo

Incorporating Named Entity Information into Neural Machine Translation

Most neural machine translation (NMT) models take a subword-level sequence as input to address the problem of representing out-of-vocabulary words (OOVs). However, using subword units as input may omit the information carried by larger text granularity, such as named entities, which leads to a loss of important semantic information. In this paper, we propose a simple but effective method to incorporate named entity (NE) tag information into the Transformer translation system. The encoder of our proposed model takes both the subwords and the NE tags of source sentences as inputs. Furthermore, we introduce a novel entity-aligned attention mechanism to make full use of the chunk information of NE tags. The proposed approach can be easily integrated into the existing Transformer framework. Experimental results on two public translation tasks demonstrate that our proposed method achieves significant translation improvements over the basic Transformer model and also outperforms existing competitive systems.

Leiying Zhou, Wenjie Lu, Jie Zhou, Kui Meng, Gongshen Liu

Non-autoregressive Neural Machine Translation with Distortion Model

Non-autoregressive translation (NAT) has attracted attention recently due to its high efficiency during inference. Unfortunately, it performs significantly worse than the autoregressive translation (AT) model. We observe that the gap between NAT and AT can be remarkably narrowed if we provide the inputs of the decoder in the same order as the target sentence. However, existing NAT models still initialize the decoding process by copying source inputs from left to right, and lack an explicit reordering mechanism for decoder inputs. To address this problem, we propose a novel distortion model to enhance the decoder inputs so as to further improve NAT models. The distortion model, incorporated into the NAT model, reorders the decoder inputs to bring them close to the word order of the decoder outputs, which can reduce the search space of the non-autoregressive decoder. We verify our approach empirically through a series of experiments on three similar language pairs (En $\Rightarrow$ De, En $\Rightarrow$ Ro, and De $\Rightarrow$ En) and two dissimilar language pairs (Zh $\Rightarrow$ En and En $\Rightarrow$ Ja). Quantitative and qualitative analyses demonstrate the effectiveness and universality of our proposed approach.

Long Zhou, Jiajun Zhang, Yang Zhao, Chengqing Zong

Incorporating Phrase-Level Agreement into Neural Machine Translation

Phrase information has been successfully integrated into current state-of-the-art neural machine translation (NMT) models. However, the natural property of the source and target phrase alignment has not been explored. In this paper, we propose a novel phrase-level agreement method to deal with this problem. First, we explore n-gram models over minimal translation units (MTUs) to explicitly capture aligned bilingual phrases from the parallel corpora. Then, we propose a phrase-level agreement loss that directly reduces the difference between the representations of the source-side and target-side phrases. Finally, we integrate the phrase-level agreement loss into the NMT models to improve translation performance. Empirical results on the NIST Chinese-to-English and the WMT English-to-German translation tasks show that the proposed phrase-level agreement method achieves significant improvements over state-of-the-art baselines, demonstrating the effectiveness and necessity of exploiting phrase-level agreement for NMT.

Mingming Yang, Xing Wang, Min Zhang, Tiejun Zhao

Improving Unsupervised Neural Machine Translation with Dependency Relationships

Neural networks have been widely used in machine translation (MT) and have achieved good results. Neural machine translation (NMT) models need large bilingual parallel corpora for training. However, in many languages or domains, such corpora are scarce. Therefore, unsupervised neural machine translation (UNMT), which does not need bilingual parallel corpora, has attracted wide interest. State-of-the-art UNMT models use the Transformer for training and cannot learn syntactic knowledge from the corpora. In this paper, we propose a method to improve UNMT by using dependency relationships extracted by dependency parsing. The extracted dependency relationships are concatenated with the original training data after Byte Pair Encoding (BPE) to obtain new sentence representations for UNMT training. Models that incorporate dependency relationships allow for a better understanding of the underlying syntactic structure of sentences and thus affect the quality of UNMT. We leverage linearized parsing trees of the training sentences in order to incorporate syntax into the Transformer architecture without modifying it. Compared with a state-of-the-art UNMT method, our method increases the BLEU scores by 5.11 and 9.41 respectively on WMT 2019 English-French and German-English monolingual news corpora with 5 million sentence pairs.

Jia Xu, Na Ye, GuiPing Zhang

Incorporating Knowledge and Content Information to Boost News Recommendation

News recommendation, which aims to help users find the news they are interested in, is essential for online news platforms to alleviate the information overload problem. News is full of textual information with some knowledge entities, so recent studies try to leverage knowledge graphs (KGs) as side information to better model user preferences over news. However, most knowledge-enhanced methods assume that users are interested in the knowledge entities that occur in the news. In real scenarios, users may like the news because of the news content rather than the knowledge entities. To take both knowledge and content factors into consideration, we propose a news recommendation method, namely the knowledge and content aware network for news recommendation (KCNR). KCNR represents users and news in terms of knowledge and content, and then predicts the weight of user preferences on knowledge and content via a user preference prediction mechanism. In addition, based on the weight of user preferences on knowledge, it extends user preferences along the entities in knowledge graphs. Experiments on two real-world datasets show that our approach achieves significant improvements over several state-of-the-art baselines in news recommendation.

Zhen Wang, Weizhi Ma, Min Zhang, Weipeng Chen, Jingfang Xu, Yiqun Liu, Shaoping Ma

Multi-domain Transfer Learning for Text Classification

Leveraging data from multiple related domains to enhance model generalization performance is critical for transfer learning in text classification. However, most existing approaches try to separate the features into shared and private spaces regardless of the correlations between domains, resulting in inadequate feature sharing among the most closely related domains. In this paper, we propose a generic dual-channel multi-task learning framework for multi-domain text classification, which can capture global-shared, local-shared, and private features simultaneously. Our framework incorporates an adversarial network and a mixture of experts into a neural network for multi-domain text classification, which helps share more features among domains. Extensive experiments on real-world text classification datasets across 16 domains demonstrate that our proposed approach achieves better results than five state-of-the-art techniques.

Xuefeng Su, Ru Li, Xiaoli Li

A Cross-Layer Connection Based Approach for Cross-Lingual Open Question Answering

Cross-lingual open domain question answering (Open-QA) has become an increasingly important topic. Training a monolingual model often requires a large amount of labeled data for supervised training, which makes it difficult to apply in real applications, especially for low-resource languages. Recently, thanks to the multilingual BERT model, a new task, so-called zero-shot cross-lingual QA, has emerged in this field, i.e., training a model on a resource-rich language and directly testing it on other languages. Current research faces two main problems. First, in the document retrieval stage, directly using a multilingual pre-trained model for similarity calculation results in insufficient retrieval accuracy. Second, in the answer extraction stage, the answers involve different levels of abstraction related to the retrieved documents, which requires deep exploration. This paper puts forward a cross-layer connection based approach for cross-lingual Open-QA. It consists of a Match-Retrieval module and a Connection-Extraction module. The matching network in the retrieval module makes heuristic adjustments and expansions on the learned features to improve retrieval quality. In the answer extraction module, deep semantic features are reused at the network structure level through cross-layer connections. Experimental results on a public cross-lingual Open-QA dataset show the superiority of our proposed approach over state-of-the-art methods.

Lin Li, Miao Kong, Dong Li, Dong Zhou

Learning to Consider Relevance and Redundancy Dynamically for Abstractive Multi-document Summarization

As one of the most essential tasks for information aggregation, multi-document summarization is faced with information redundancy of source document clusters. Recent works have attempted to avoid redundancy while generating summaries. Most state-of-the-art multi-document summarization systems are either extractive or abstractive with an external extractive model. In this paper, we propose an end-to-end abstractive model based on Transformer to generate summaries, considering relevance and redundancy dynamically and jointly. Specifically, we employ sentence masks and design a sentence-level transformer layer for learning sentence representations in a hierarchical manner. Then we use a dynamic Max Marginal Relevance (MMR) model to discern summary-worthy sentences and modify the encoder-decoder attention. We also utilize the pointer mechanism, taking the mean attention of all transformer heads as the probability to copy words from the source text. Experimental results demonstrate that our proposed model outperforms several strong baselines. We also conduct ablation studies to verify the effectiveness of our key mechanisms.

Yiding Liu, Xiaoning Fan, Jie Zhou, Chenglong He, Gongshen Liu
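
The abstract above builds on Max Marginal Relevance (MMR) to balance relevance and redundancy. The paper's dynamic, attention-integrated variant is not reproduced here; the sketch below shows only the classical greedy MMR selection, in which each candidate is scored as lam * relevance minus (1 - lam) * its maximum similarity to anything already selected. The Jaccard word-overlap similarity, the relevance scores and the example sentences are assumptions chosen for illustration.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mmr_select(
    sentences: List[str], relevance: List[float], k: int, lam: float = 0.5
) -> List[int]:
    """Greedily pick k sentence indices by classical MMR."""
    selected: List[int] = []
    candidates = list(range(len(sentences)))

    def score(i: int) -> float:
        # Redundancy is the highest similarity to any already chosen sentence.
        redundancy = max(
            (jaccard(sentences[i], sentences[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance[i] - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


docs = [
    "the storm hit the coast",
    "the storm hit the coast hard",
    "rescue teams arrived monday",
]
rel = [0.9, 0.85, 0.5]  # made-up relevance scores
chosen = mmr_select(docs, rel, k=2)
print(chosen)  # [0, 2]
```

The second sentence is highly relevant but nearly duplicates the first, so the redundancy penalty lets the less relevant third sentence win the second slot.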

A Submodular Optimization-Based VAE-Transformer Framework for Paraphrase Generation

Paraphrasing plays an important role in various Natural Language Processing (NLP) problems, such as question answering, information retrieval, and conversation systems. Previous approaches mainly concentrate on producing paraphrases with similar semantics, namely fidelity, while recent ones have begun to focus on the diversity of generated paraphrases. However, most existing models fail to explicitly emphasize both metrics. To fill this gap, we propose a submodular optimization-based VAE-transformer model to generate more consistent and diverse paraphrases. Through extensive experiments on datasets such as Quora and Twitter, we demonstrate that our proposed model outperforms state-of-the-art baselines on BLEU, METEOR, TERp and n-distinct grams. Furthermore, our ablation study suggests that incorporating VAE and submodularity functions effectively promotes fidelity and diversity, respectively.

Xiaoning Fan, Danyang Liu, Xuejian Wang, Yiding Liu, Gongshen Liu, Bo Su
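
Among the metrics named in the abstract above, the distinct n-gram measure is the one that targets diversity. A minimal sketch follows, assuming the common definition (number of unique n-grams divided by the total number of n-grams across all generated outputs) and whitespace tokenization; the paper's exact evaluation script may differ, and the example paraphrases are made up.

```python
from typing import List, Set, Tuple


def distinct_n(outputs: List[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams over all outputs."""
    unique: Set[Tuple[str, ...]] = set()
    total = 0
    for text in outputs:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


generated = ["how do i learn python", "how can i learn python"]
d1 = distinct_n(generated, 1)  # 6 unique unigrams out of 10
d2 = distinct_n(generated, 2)  # 6 unique bigrams out of 8
print(d1, d2)  # 0.6 0.75
```

Higher values indicate less repetition across the generated paraphrases, which is why this measure is typically reported alongside fidelity metrics such as BLEU.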

MixLab: An Informative Semi-supervised Method for Multi-label Classification

Multi-label classification is an intensively studied topic in data analysis. Despite considerable improvements, recent deep learning-based methods overlook the existence of unlabeled data and therefore spend too much time on instance annotation. To circumvent this difficulty, semi-supervised multi-label classification aims to exploit readily available unlabeled data to help build multi-label classification models. To make full use of labeled and unlabeled data, this paper proposes a novel approach named MixLab, which encourages the model classifications to be accurate through label-correlated information and consistency regularization. It utilizes label correlations to enhance the predicted labels of augmented unlabeled instances, uses them as targets, and regularizes predictions to be consistent with these targets. We empirically validate the effectiveness of our framework through extensive experiments on four real datasets of textual content.

Ye Qiu, Xiaolong Gong, Zhiyi Ma, Xi Chen

A Noise Adaptive Model for Distantly Supervised Relation Extraction

Relation extraction is an important task in natural language processing. To obtain a large amount of annotated data, distant supervision is introduced, using large-scale knowledge graphs as external resources. The disadvantage is that distant supervision brings a new issue, noisy labels: the labels obtained by distant supervision may be unreliable, and the performance of models decreases significantly on such datasets. To address this problem, we propose a new framework in which noisy labels are modeled directly by a context-dependent rectification strategy. Intuitively, we adjust the labels that might otherwise be wrong in the right direction. In addition, considering the lack of effective guidance in training with noise, we propose a new curriculum learning-based adaptive mechanism. It learns the simple relation extraction task first, and then takes the reliability of labels into consideration, so that the model can learn more from the data. Experimental results on a widely used dataset show that our approach achieves a significant improvement and outperforms the current state of the art.

Xu Huang, Bowen Zhang, Yunming Ye, Xiaojun Chen, Xutao Li

CLTS: A New Chinese Long Text Summarization Dataset

We present CLTS, a Chinese long text summarization dataset, to address the scarcity of large-scale, high-quality datasets in automatic summarization, which limits further research. To the best of our knowledge, it is the first long text summarization dataset in Chinese. Extracted from the Chinese news website ThePaper.cn (https://www.thepaper.cn/), the corpus contains more than 180,000 long Chinese articles and corresponding summaries written by professional editors and authors, and is available for download at https://github.com/lxj5957/CLTS-Dataset. We train and evaluate several existing methods on CLTS to verify the utility and challenges of the dataset, and the results show that the proposed corpus is useful for setting baselines that contribute to further research on automatic text summarization.

Xiaojun Liu, Chuang Zhang, Xiaojun Chen, Yanan Cao, Jinpeng Li

Lightweight Multiple Perspective Fusion with Information Enriching for BERT-Based Answer Selection

Answer selection (AS), one of the hottest topics in the field of natural language processing, has developed rapidly with outstanding performance reported, especially with the emergence of pre-trained models (e.g., BERT). However, current BERT-based AS methods apply BERT only by fine-tuning or by stacking other modules such as CNNs and RNNs, and neglect to exploit the discriminative information embedded inside BERT. In this paper, we propose a novel method, LMPF-IE, i.e., Lightweight Multiple Perspective Fusion with Information Enriching. The method mines and fuses the multi-layer discriminative information inside different layers of BERT, and uses Question Category and Named Entity Recognition to enrich the information, which helps BERT better understand the relationship between questions and answers. We test the proposed BERT layer-wise attention model on five benchmark datasets for the answer selection task. The experimental results clearly verify that our method achieves better performance than the baseline models.

Yu Gu, Meng Yang, Peiqin Lin

Stance Detection with Stance-Wise Convolution Network

Stance detection aims at identifying the stance (favor, against or neutral) of a text towards a specific target of opinion. Recently, there has been growing interest in using neural models for stance detection, but some challenges remain. Firstly, it is difficult to associate text with a target because targets are not always discussed explicitly in texts. However, existing methods model the representations of text and target only roughly, on task-specific and limited corpora, without considering indispensable external information. Secondly, unlike categories in a normal classification task, stances in the stance detection task are not independent of each other. We study this observation and find that it would be more effective to learn each stance individually, but all previous approaches ignore this correlation. To address these two challenges effectively, we introduce a Stance-wise Convolution Network (SCN) including two novel modules. Specifically, we first use a Text-Target Encoder module to incorporate the pre-trained BERT into our model to learn more reasonable text-target representations. Then we propose a Stance-wise Convolution module to better learn stances by absorbing the correlation between stances. We evaluate our method on a real-world dataset, and the experimental results show that our proposed method achieves state-of-the-art performance.

Dechuan Yang, Qiyu Wu, Wei Chen, Tengjiao Wang, Zhen Qiu, Di Liu, Yingbao Cui

Emotion-Cause Joint Detection: A Unified Network with Dual Interaction for Emotion Cause Analysis

Emotion cause analysis has attracted much attention in the field of natural language processing. Existing works include emotion cause extraction (ECE) and emotion-cause pair extraction (ECPE), but the former requires emotion annotations, thereby restricting its application scenarios, and the latter consists of two steps in sequence, thereby making the second step depend on the results of the first step. To tackle these limitations, we implement emotion detection and cause detection as two sub-tasks in a unified framework. Based on this framework, we propose an emotion-cause joint detection (ECJD) method, which enhances the interaction of sub-tasks in a synchronous and joint way to improve performance. Specifically, we formalize ECE as a four-class classification problem, in which clause representation is evaluated from the dual perspective of both emotion and cause. We implement cause detection with consideration of relative position from emotion detection as prior knowledge so as to improve detection performance. The experimental evaluation on an emotion cause corpus benchmark shows that our method achieves the best performance in cause detection without using emotion annotations and overcomes the limitations of ECE and ECPE, which further demonstrates the effectiveness of our model.

Guimin Hu, Guangming Lu, Yi Zhao

Incorporating Temporal Cues and AC-GCN to Improve Temporal Relation Classification

Temporal relation classification, an important branch of relation extraction, aims to identify the time sequence among events. Currently, the Shortest Dependency Path (SDP) is widely used in various kinds of neural network models to capture the crucial information from sentences. However, while eliminating irrelevant words in event sentences, SDP will miss some useful information, e.g., time expressions. To address this issue, we propose a neural network method that incorporates temporal cues into AC-GCN (Augmented Contextualized Graph Convolutional Network) to classify temporal relations. Firstly, we introduce semantic role labeling and heuristic rules to extract the time expressions corresponding to event triggers and other words in SDPs, respectively. Then, the SDP with time expressions (i.e., T-SDP) is encoded by a Bi-LSTM with a parameter sharing mechanism and fed into the GCN to classify temporal relations. The experimental results on TimeBank-Dense show that our proposed model outperforms all baselines significantly.

Xinyu Zhou, Peifeng Li, Qiaoming Zhu, Fang Kong

Event Detection with Document Structure and Graph Modelling

Event detection is the basic task of event extraction. Previous studies usually used independent sentences as the basic objects of event detection, so they cannot effectively identify event triggers that depend on document information. Moreover, there are correlations between the sentences and words in a document. Therefore, it is necessary to use document information for event detection. In this study, we propose a graph model for event detection based on document structure, which is used to connect the sentences and words in a document. Specifically, we fine-tune the BERT model and use a Bi-LSTM to learn the sentences and their context features, and then use a GCN to model the document relation graph. The document relation graph is based on the parts of speech of all words in different sentences, which helps establish the trigger-trigger and trigger-argument relations. The experimental results on LitBank show that our proposed model outperforms all baselines significantly and verify the value of document information.

Peipei Zhu, Zhongqing Wang, Hongling Wang, Shoushan Li, Guodong Zhou

AFPun-GAN: Ambiguity-Fluency Generative Adversarial Network for Pun Generation

Automatic pun generation is an interesting and challenging text generation task. In this study, we focus on the task of homographic pun generation given a pair of word senses. Current efforts depend on templates or laboriously annotated pun sources to guide supervised learning, which limits the quality and diversity of the generated puns. To address this, we present a new text generation model, called the Ambiguity-Fluency Pun Generative Adversarial Network (AFPun-GAN), for pun generation. This model is composed of a pun generator, which produces pun sentences with a hierarchical ON-LSTM attention model, and a pun discriminator, which distinguishes the generated pun sentences from real sentences with the word senses of the target pun word. The proposed model assigns a hierarchical low reward to train the pun generator via reinforcement learning, encouraging the pun generator to produce ambiguous and fluent pun sentences that better support the two word senses. The experimental results on the pun generation task demonstrate that our proposed AFPun-GAN model is able to generate pun sentences that are more ambiguous and fluent in both automatic and human evaluation.

Yufeng Diao, Liang Yang, Xiaochao Fan, Yonghe Chu, Di Wu, Shaowu Zhang, Hongfei Lin

Author Name Disambiguation Based on Rule and Graph Model

Author name disambiguation has long been viewed as a challenging problem in scientific literature management, and with the substantial growth of the scientific literature, solving this problem has become increasingly difficult and urgent. In this paper, we study the author name disambiguation problem in large-scale collections of academic papers. Our method combines the feature information of papers with the relation information between papers for disambiguation. Based on the Aminer disambiguation framework, we present a novel method for constructing the paper relation graph based on atomic clusters and propose an efficient post-processing algorithm that aims to improve disambiguation performance through rule-based clustering. This algorithm utilizes similarity features based on metadata information and implements two types of disambiguation rules. We carefully evaluate the proposed disambiguation method on large real-world data, and the experimental results show that our method achieves clearly better performance than state-of-the-art methods.

Lizhi Zhang, Zhijie Ban

Opinion Transmission Network for Jointly Improving Aspect-Oriented Opinion Words Extraction and Sentiment Classification

Aspect-level sentiment classification (ALSC) and aspect-oriented opinion words extraction (AOWE) are two highly relevant aspect-based sentiment analysis (ABSA) subtasks. They respectively aim to detect the sentiment polarity and extract the corresponding opinion words toward a given aspect in a sentence. Previous works separate them and focus on one of them by training neural models on small-scale labeled data, while neglecting the connections between them. In this paper, we propose a novel joint model, Opinion Transmission Network (OTN), to exploit the potential bridge between ALSC and AOWE and improve both tasks simultaneously. Specifically, we design two tailor-made opinion transmission mechanisms to control the flow of opinion clues in both directions, from ALSC to AOWE and from AOWE to ALSC. Experimental results on two benchmark datasets show that our joint model outperforms strong baselines on the two tasks. Further analysis also validates the effectiveness of the opinion transmission mechanisms.

Chengcan Ying, Zhen Wu, Xinyu Dai, Shujian Huang, Jiajun Chen

Label-Wise Document Pre-training for Multi-label Text Classification

A major challenge of multi-label text classification (MLTC) is to simultaneously exploit possible label differences and label correlations. In this paper, we tackle this challenge by developing a Label-Wise Pre-Training (LW-PT) method to obtain a document representation with label-aware information. The basic idea is that a multi-label document can be represented as a combination of multiple label-wise representations, and that correlated labels always co-occur in the same or similar documents. LW-PT implements this idea by constructing label-wise document classification tasks and training label-wise document encoders. Finally, the pre-trained label-wise encoder is fine-tuned on the downstream MLTC task. Extensive experimental results validate that the proposed method has significant advantages over previous state-of-the-art models and is able to discover reasonable label relationships. The code is released to facilitate other researchers (https://github.com/laddie132/LW-PT).

Han Liu, Caixia Yuan, Xiaojie Wang

Hierarchical Sequence Labeling Model for Aspect Sentiment Triplet Extraction

Aspect sentiment triplet extraction is an emerging task in aspect-based sentiment analysis, which aims at simultaneously identifying the aspect, the opinion expression, and the sentiment from a given review sentence. Existing studies divide this task into many sub-tasks and process them in a pipeline manner, which ignores the relevance between different sub-tasks and leads to error accumulation. In this paper, we propose a hierarchical sequence labeling model (HSLM) to recognize the sentiment triplets in an end-to-end manner. Concretely, HSLM consists of an aspect-level sequence labeling module, an opinion-level sequence labeling module, and a sentiment-level sequence labeling module. To learn the interactions between the above three modules, we further design three information fusion mechanisms, including aspect feature fusion mechanism, opinion feature fusion mechanism, and global feature fusion mechanism to refine high-level semantic information. To verify the effectiveness of our model, we conduct comprehensive experiments on four benchmark datasets. The experimental results demonstrate that our model achieves state-of-the-art performances.

Peng Chen, Shaowei Chen, Jie Liu

Knowledge-Aware Method for Confusing Charge Prediction

The automatic charge prediction task aims to determine the final charges based on fact descriptions of criminal cases, which is a vital application of legal assistant systems. Conventional works usually depend on fact descriptions to predict charges while ignoring legal schematic knowledge, which makes it difficult to distinguish confusing charges. In this paper, we propose a knowledge-attentive neural network model, which introduces legal schematic knowledge about charges and exploits the hierarchical knowledge representation as discriminative features to differentiate confusing charges. Our model takes the textual fact description as input and learns the fact representation through a graph convolutional network. A legal schematic knowledge transformer is utilized to generate crucial knowledge representations oriented to the legal schematic knowledge at both the schema and charge levels. We apply a knowledge matching network to effectively incorporate charge information into the fact and learn a knowledge-aware fact representation. Finally, we use the knowledge-aware fact representation for charge prediction. We create two real-world datasets, and experimental results show that our proposed model outperforms other state-of-the-art baselines on accuracy and F1 score, especially when dealing with confusing charges.

Xiya Cheng, Sheng Bi, Guilin Qi, Yongzhen Wang

Social Media and Network

Aggressive language detection (ALD), which detects abusive and offensive language in texts, is one of the crucial applications in the NLP community. Most existing works treat ALD as regular classification with neural models, while ignoring an inherent difficulty of social media text: it is often unnormalized and irregular. In this work, we aim to improve ALD by jointly performing text normalization (TN) via an adversarial multi-task learning framework. The private encoders for ALD and TN each focus on retrieving task-specific features, and the shared encoder learns the underlying features common to the two tasks. During adversarial training, a task discriminator distinguishes the separate learning of ALD or TN. Experimental results on four ALD datasets show that our model outperforms all baselines under differing settings by large margins, demonstrating the necessity of jointly learning TN with ALD. Further analysis is conducted for a better understanding of our method.

Shengqiong Wu, Hao Fei, Donghong Ji

A Cross-Modal Classification Dataset on Social Network

Classifying tweets into general categories, such as food, music and games, is an essential task for social network platforms and is the basis for information recommendation, user portraits and content construction. As far as we know, nearly all existing general tweet classification datasets only have textual content. However, textual content in tweets may be short, meaningless, or even absent, which would harm classification performance. In fact, images and videos are widespread in tweets, and they can intuitively provide extra useful information. To fill this gap, we construct a novel Cross-Modal Classification Dataset from Weibo, called CMCD. Specifically, we collect tweets with three modalities (text, image and video) from 18 general categories, and then filter out tweets that can easily be classified by textual content alone. Finally, the whole dataset consists of 85,860 tweets, all of which have been manually labelled. Among them, 64.4% of tweets contain images, and 16.2% of tweets contain videos. We implement classical baselines for tweet classification and report human performance. Empirical results show that classification over CMCD is challenging and requires further effort.

Yong Hu, Heyan Huang, Anfan Chen, Xian-Ling Mao

Sentiment Analysis on Chinese Weibo Regarding COVID-19

The outbreak of COVID-19 has had a great impact on people’s general lifestyle all over the world. People express their views about COVID-19 on social media more frequently when cities are under lockdown. In this work, we are motivated to analyze people’s sentiments and their evolution in the face of this public health crisis based on Chinese Weibo, one of the largest social media platforms in China. First, we obtained the top 50 hot searched hashtags from January 10, 2020 to May 31, 2020, and collected 1,681,265 Weibo posts associated with the hashtags regarding COVID-19. We then constructed a COVID-19 sentiment analysis dataset by annotating the related Weibo posts with 7 categories, i.e., fear, anger, disgust, sadness, gratitude, surprise, and optimism, in combination with two other datasets. The well-annotated data consists of 21,173 texts. Second, we employed three methods, i.e., LSTM, BERT, and ERNIE, to predict the sentiments of users on Weibo. Comprehensive experimental results show that the ERNIE classifier has the highest accuracy, reaching 0.8837. We then analyzed the sentiment of Weibo users and its evolution to see how people responded to COVID-19 throughout the outbreak. Based on the in-depth analysis, we found that people generally felt negative (mainly fear) at the early stage of the outbreak. As the pandemic situation gradually improved, people’s positive sentiment began to increase. The number of COVID-19 cases, news and public events have a great influence on people’s sentiments. Finally, we developed a real-time visualization system to display the trend of users’ sentiment and hot searched hashtags on Weibo during the pandemic.

Xiaoting Lyu, Zhe Chen, Di Wu, Wei Wang

Pairwise Causality Structure: Towards Nested Causality Mining on Financial Statements

Causality mining, which aims to find cause-effect relations in text, is an important yet challenging problem in natural language understanding. The extraction of causal relations is beneficial to practitioners in document-intensive industries. For instance, it enables investors and regulators in financial industries to quickly understand the correlation between events in financial statements. However, this problem is difficult since the expression of causality is diverse and, more importantly, nested. Specifically, causality often has a nested structure, where a cause-effect pair can be the cause of another, higher-level causality. Recent works deal with this problem with a bottom-up relation extraction solution, but it performs worse for relations on higher levels. In this study, we find that the nested causality structure can be transformed into a graph of pairwise causality between sentence segments. We then propose a two-step solution: first, a segmenter disassembles a sentence into segments by detecting causality connectives; second, a relation classifier predicts whether a pair of segments has a cause-effect relation or not. The two modules above are trained jointly in our proposed Causality Detection Network (CDNet). On a large dataset we collected, the precision of our model reaches 92.11% and the recall reaches 93.07% for this task. Compared with the existing state-of-the-art solution, our model improves precision by 3.28% and recall by 3.03%. We also observe that the percentage of exactly correct sentences from prediction is 74.26% without post-processing, indicating the difficulty of the problem and room for improvement.

Dian Chen, Yixuan Cao, Ping Luo
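
The two-step idea in the abstract above (segment a sentence at causality connectives, then classify segment pairs) can be illustrated with a minimal sketch. It is not CDNet: the abstract describes a learned segmenter and classifier trained jointly, whereas this sketch uses a hypothetical fixed connective list (`CONNECTIVES`) and only enumerates the ordered segment pairs that a pairwise classifier would score. All names here are assumptions introduced for illustration.

```python
import re
from itertools import permutations

# Hypothetical connective list, for illustration only.
# The segmenter described in the abstract is learned, not rule-based.
CONNECTIVES = ("because", "due to", "therefore", "so that")


def split_on_connectives(sentence):
    """Step 1 (rule-based stand-in): cut the sentence at each connective."""
    pattern = r"\b(?:" + "|".join(re.escape(c) for c in CONNECTIVES) + r")\b"
    parts = re.split(pattern, sentence, flags=re.IGNORECASE)
    # Drop empty pieces and trim spaces, commas and periods at the edges.
    return [p.strip(" ,.") for p in parts if p.strip(" ,.")]


def candidate_pairs(segments):
    """Step 2 input: every ordered (cause_idx, effect_idx) pair of segments."""
    return list(permutations(range(len(segments)), 2))


segments = split_on_connectives(
    "Revenue fell because demand dropped, therefore margins shrank."
)
pairs = candidate_pairs(segments)
```

Each pair in `pairs` is one candidate edge of the pairwise causality graph; a classifier would then decide which of these edges actually express a cause-effect relation.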

Word Graph Network: Understanding Obscure Sentences on Social Media for Violation Comment Detection

Violation comment detection aims to recognize texts that may violate the governing laws or regulations and cause adverse effects on social media. To avoid being intercepted, violation comments are often informal and incomplete, and their obscure expression poses a challenge to violation detection algorithms. To tackle the problem, we introduce a new language representation model, namely Word Graph Network (WGN). By introducing a word graph, WGN integrates more syntactic structure information and thus has a stronger association and completion capability for detecting informal and incomplete violation comments in social networking scenarios. Our experimental results show that WGN outperforms the existing state-of-the-art models and even performs best in a simulation of a real online environment.

Dan Ma, Haidong Liu, Dawei Song

Data Augmentation with Reinforcement Learning for Document-Level Event Coreference Resolution

Most previous models for event coreference resolution largely depend on hand-crafted features and annotated corpora. To address these issues, this paper introduces a neural model to resolve document-level event coreference in raw texts, both by employing various neural components to better represent event semantics and by integrating data augmentation with reinforcement learning to largely expand the dataset and effectively improve its quality. Experiments on three KBP datasets show that our proposed neural model significantly outperforms several strong state-of-the-art baselines.

Jie Fang, Peifeng Li

An End-to-End Multi-task Learning Network with Scope Controller for Emotion-Cause Pair Extraction

Emotion-cause pair extraction (ECPE) aims to extract all potential pairs of emotions and corresponding causes in a document. Its advantage over traditional emotion cause extraction (ECE) is that it does not require annotating emotions manually. Existing methods for the ECPE task are based on a two-step framework. However, they ignore the fact that an emotion-cause pair should be treated as a whole unit, and the two-step framework suffers from cascading errors. In this paper, we propose an end-to-end hierarchical neural network model, which directly extracts emotion-cause pairs and enhances mutual interaction between emotions and causes via multi-task learning. In addition, we introduce a scope controller to constrain the emotion-cause pair predictions to a high-probability area, according to the position correlation between emotions and causes. The experimental results demonstrate that our method achieves state-of-the-art performance and improves F-measure by 2.24%.

Rui Fan, Yufan Wang, Tingting He
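
As a rough illustration of constraining pair predictions by position, the sketch below keeps only (emotion clause, cause clause) index pairs whose distance falls within a fixed window. This is an assumption made for illustration, not the paper's scope controller: the abstract does not specify how the high-probability area is defined, and the function name `pairs_in_scope` and the parameter `max_distance` are hypothetical.

```python
def pairs_in_scope(num_clauses, max_distance=2):
    """Return (emotion_idx, cause_idx) clause pairs at most max_distance apart.

    Hypothetical stand-in for a scope constraint: pairs far apart in the
    document are dropped before any pair scoring takes place.
    """
    return [
        (e, c)
        for e in range(num_clauses)
        for c in range(num_clauses)
        if abs(e - c) <= max_distance
    ]


# For a 5-clause document, a window of 2 keeps 19 of the 25 possible pairs.
kept = pairs_in_scope(5, max_distance=2)
```

A fixed window is the simplest possible choice here; a learned or soft constraint would serve the same purpose of shrinking the candidate space.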

Clue Extraction for Fine-Grained Emotion Analysis

Emotion analysis in texts is a popular task in natural language processing. Existing research mainly recognizes the types of emotions by encoding sentences. However, in many cases, pure explicit encoding can easily lose the fine-grained emotional clues hidden in the sentences due to the complexity and subtlety of emotions. We argue that narratives are inextricably emotionally structured and that narrative analysis can serve as a method to study and examine emotional clues. In this paper, we propose a new unified task: clue extraction for fine-grained emotion analysis (CLUE), which attempts to extract fine-grained emotional clue triples (Why, How, What): Why emotions occur, How people express emotions and What emotions they trigger. We propose a span-based method to address this CLUE task, which directly takes all possible spans as input. The advantage of spans is that each clue remains unsegmented and semantically complete. The experimental results on a benchmark emotional cause corpus prove the feasibility of the CLUE task as well as the effectiveness of our method.

Hongliang Bi, Pengyuan Liu

Multi-domain Sentiment Classification on Self-constructed Indonesian Dataset

Domain dependence limits the application of a well-trained sentiment classifier trained on data from one domain to other domains. To solve this problem, multi-domain sentiment classification has received great attention recently. It aims to construct domain-specific sentiment classifiers from datasets of multiple domains at once. However, research on multi-domain sentiment classification mainly focuses on high-resource languages, and there is no research on Indonesian multi-domain sentiment classification. To fill the gap, we constructed an Indonesian multi-domain dataset, including 489,000 reviews from four domains with three sentiment polarities (positive, neutral, and negative), and proposed an integrated model for Indonesian multi-domain sentiment classification. This model consists of a lemmatization layer, a domain-general module, a domain-specific module, and a domain classifier module. Based on the Indonesian multi-domain dataset, the model was evaluated and compared with baseline methods commonly used in the sentiment analysis of high-resource languages. The effectiveness of some essential components in the model was also verified. The model achieved an average weighted F1 of 87.24% over four domains, outperforming the baseline methods and demonstrating its effectiveness.

Nankai Lin, Boyu Chen, Sihui Fu, Xiaotian Lin, Shengyi Jiang

Extracting the Collaboration of Entity and Attribute: Gated Interactive Networks for Aspect Sentiment Analysis

Aspect-based sentiment analysis (ABSA) is composed of aspect term sentiment analysis (ATSA) and aspect category sentiment analysis (ACSA). In the task of ACSA, some existing methods simply bind the aspect category (entity and attribute) into an integrated whole or adopt a randomly initialized embedding to represent the aspect category, which yields a defective aspect representation and ignores the independent contextual sentiment of the entity and attribute. Some other methods only consider the entity and disregard the attribute in predicting the sentiment polarity of the aspect category, which ignores the collaboration between the entity and attribute. To this end, we propose a Gated Interactive Network (GIN) for aspect category sentiment analysis in this paper. To be specific, for each context and the corresponding aspect, we adopt two attention-based networks to learn the contextual sentiment for the entity and attribute independently and interactively. Further, based on the interactive attentions learned from entities and attributes, coordinative gate units are exploited to reconcile and purify the sentiment features for the aspect sentiment prediction. Experimental results on two benchmark datasets demonstrate that our proposed model achieves state-of-the-art performance in the task of ACSA.

Rongdi Yin, Hang Su, Bin Liang, Jiachen Du, Ruifeng Xu

Sentence Constituent-Aware Aspect-Category Sentiment Analysis with Graph Attention Networks

Aspect category sentiment analysis (ACSA) aims to predict the sentiment polarities of the aspect categories discussed in sentences. Since a sentence usually discusses one or more aspect categories and expresses different sentiments toward them, various attention-based methods have been developed to allocate the appropriate sentiment words for the given aspect category and obtain promising results. However, most of these methods directly use the given aspect category to find the aspect category-related sentiment words, which may cause mismatching between the sentiment words and the aspect categories when an unrelated sentiment word is semantically meaningful for the given aspect category. To mitigate this problem, we propose a Sentence Constituent-Aware Network (SCAN) for aspect-category sentiment analysis. SCAN contains two graph attention modules and an interactive loss function. The graph attention modules generate representations of the nodes in sentence constituency parse trees for the aspect category detection (ACD) task and the ACSA task, respectively. ACD aims to detect aspect categories discussed in sentences and is an auxiliary task. For a given aspect category, the interactive loss function helps the ACD task to find the nodes which can predict the aspect category but can’t predict other aspect categories. The sentiment words in the nodes are then used to predict the sentiment polarity of the aspect category by the ACSA task. The experimental results on five public datasets demonstrate the effectiveness of SCAN (Data and code can be found at https://github.com/l294265421/SCAN ).

Yuncong Li, Cunxiang Yin, Sheng-hua Zhong

SciNER: A Novel Scientific Named Entity Recognizing Framework

There is an increasing number of scientific publications produced by the booming science community. It is very important for automatic scientific analysis to extract entities such as tasks and methods from unstructured scientific publications. At present, span-based methods are the best approach to scientific NER tasks; they usually generate a few entities by searching hundreds of candidate spans in a sentence. However, these existing methods have a few drawbacks. Firstly, the span extractor obtains more negative samples than positive samples, which makes the input extremely imbalanced. Secondly, the pruner has no predictive ability at the beginning of the joint training process in an end-to-end model. To tackle the above problems, in this paper, we propose a novel scientific named entity recognizing pipeline framework, called SciNER. Specifically, in the first stage, there is a pruner to filter out most illegal entities. The span extractor in the pruner performs under-sampling to balance the positive and negative samples. In the second stage, the entity recognizer is trained on the pruned spans. Extensive experiments demonstrate that SciNER outperforms state-of-the-art baselines on several datasets in both computer science and biomedical domains (Code is available at: https://github.com/ethan-yt/sciner ).

Tan Yan, Heyan Huang, Xian-Ling Mao
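
The imbalance that span enumeration creates, and the under-sampling used to counter it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the SciNER implementation: the function names, the `max_width` limit, the `neg_ratio` parameter and the uniform random sampling strategy are all hypothetical choices made here.

```python
import random


def enumerate_spans(tokens, max_width=3):
    """All (start, end) spans, end exclusive, covering at most max_width tokens."""
    n = len(tokens)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, min(i + max_width, n) + 1)
    ]


def undersample(spans, gold, neg_ratio=1.0, seed=0):
    """Keep every gold span plus a random subset of the negative spans.

    At most neg_ratio negatives are kept per positive (assumed strategy).
    """
    positives = [s for s in spans if s in gold]
    negatives = [s for s in spans if s not in gold]
    k = min(len(negatives), int(neg_ratio * len(positives)))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return positives + rng.sample(negatives, k)


tokens = ["graph", "attention", "improves", "entity", "recognition"]
spans = enumerate_spans(tokens, max_width=3)
gold = {(0, 2), (3, 5)}  # hypothetical gold entity spans
balanced = undersample(spans, gold, neg_ratio=1.0)
```

Even in this five-token example, 12 candidate spans are generated for 2 gold entities, which shows why negatives dominate without some form of sampling.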

Learning Multilingual Topics with Neural Variational Inference

Multilingual topic models are one of the most popular methods for revealing common latent semantics of cross-lingual documents. However, traditional approximation methods adopted by existing probabilistic models sometimes do not effectively lead to high-quality multilingual topics. Besides, as the generative processes of these models become more expressive, the difficulty of performing fast and accurate inference over parameters grows. In this paper, to address these issues, we propose a new multilingual topic model that permits training by backpropagation in the framework of neural variational inference. We propose to infer topic distributions via a shared inference network to capture common word semantics and an incorporating module to incorporate the topic-word distribution from another language through a novel transformation method. Thus, the networks of cross-lingual corpora are coupled together. By jointly training the coupled networks, our model can infer more interpretable multilingual topics and discriminative topic distributions. Experimental results on real-world datasets show the superiority of our model in terms of both topic quality and text classification performance.

Xiaobao Wu, Chunping Li, Yan Zhu, Yishu Miao