2020 | Book

Neural Information Processing

27th International Conference, ICONIP 2020, Bangkok, Thailand, November 18–22, 2020, Proceedings, Part IV

Editors: Haiqin Yang, Dr. Kitsuchart Pasupa, Andrew Chi-Sing Leung, Prof. James T. Kwok, Dr. Jonathan H. Chan, Prof. Irwin King

Publisher: Springer International Publishing

Book Series: Communications in Computer and Information Science

About this book

The two-volume set CCIS 1332 and 1333 constitutes the thoroughly refereed contributions presented at the 27th International Conference on Neural Information Processing, ICONIP 2020, held in Bangkok, Thailand, in November 2020.*

For ICONIP 2020, a total of 378 papers were carefully reviewed and selected for publication from 618 submissions. The 191 papers included in this volume set are organized in the following topical sections: data mining; healthcare analytics-improving healthcare outcomes using big data analytics; human activity recognition; image processing and computer vision; natural language processing; recommender systems; the 13th international workshop on artificial intelligence and cybersecurity; computational intelligence; machine learning; neural network models; robotics and control; and time series analysis.

* The conference was held virtually due to the COVID-19 pandemic.

Table of Contents

Frontmatter

Data Mining

Frontmatter
A Hybrid Representation of Word Images for Keyword Spotting

In query-by-example keyword spotting, how word images are represented is a key issue. Meanwhile, out-of-vocabulary (OOV) queries occur frequently in keyword spotting, which makes OOV keyword spotting a challenging task. In this paper, a hybrid representation of word images is presented for OOV keyword spotting. Specifically, a sequence-to-sequence model is used to generate one type of representation vector for word images, and a CNN with the VGG16 architecture is used to obtain a second type. A score fusion scheme then combines these two kinds of representation vectors. Experimental results demonstrate that the proposed hybrid representation is especially well suited to OOV keyword spotting.

Hongxi Wei, Jing Zhang, Kexin Liu
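
The abstract does not state the fusion rule. As an illustration only, the sketch below assumes cosine similarity for each representation, min-max normalisation of each score list, and a weighted sum controlled by a hypothetical weight `alpha`. The vectors stand in for the seq2seq and VGG16 outputs, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(scores):
    """Rescale scores to [0, 1] so the two representations are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fused_ranking(query_a, query_b, cands_a, cands_b, alpha=0.5):
    """Rank candidate word images by a weighted sum of two similarity scores.

    query_a / cands_a: vectors from the first model (seq2seq stand-in).
    query_b / cands_b: vectors from the second model (CNN stand-in).
    alpha: fusion weight, a hypothetical parameter of this sketch.
    """
    s_a = min_max([cosine(query_a, c) for c in cands_a])
    s_b = min_max([cosine(query_b, c) for c in cands_b])
    fused = [alpha * a + (1 - alpha) * b for a, b in zip(s_a, s_b)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

Normalising before fusing matters because the two models' similarity scales need not agree; `alpha` shifts trust between them.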
A Simple and Novel Method to Predict the Hospital Energy Use Based on Machine Learning: A Case Study in Norway

Hospitals are among the most energy-consuming commercial buildings in many countries: they are highly complex organizations with continuous energy use and highly variable usage patterns. Advances in machine learning offer opportunities for predicting hospital energy consumption. Using a case hospital building in Norway, and after analyzing the characteristics of this building, this paper focuses on predicting energy consumption with machine learning (ML) methods based on historical weather data and monitored energy use data from the last four consecutive years. A six-step machine learning framework was proposed: data collection, preprocessing, splitting, fitting, optimization, and estimation. The results show that electricity was the largest energy demand in the main building of the Norwegian hospital, accounting for 55% of total energy use, more than district heating and cooling. By optimizing the hyper-parameters, this paper selected specific model parameters that predict electricity use with high accuracy. It concludes that random forest and AdaBoost performed much better than decision tree and bagging, especially in predicting lower energy consumption.

Kai Xue, Yiyu Ding, Zhirong Yang, Natasa Nord, Mael Roger Albert Barillec, Hans Martin Mathisen, Meng Liu, Tor Emil Giske, Liv Inger Stenstad, Guangyu Cao
An Empirical Study to Investigate Different SMOTE Data Sampling Techniques for Improving Software Refactoring Prediction

The exponential growth of software systems and related applications has pushed industry and practitioners to ensure high quality with optimal reliability, maintainability, and so on. At the same time, software companies focus on developing software solutions at reduced cost in response to customer demands. Maintaining optimal software quality at reduced cost has therefore always been a challenge for developers. Moreover, inappropriate code design often leads to aging, code smells, or bugs that can undermine the intended purpose of a software system. However, identifying a smell indicator or structural attribute that characterizes refactoring probability has remained a challenge. To alleviate these problems, this research develops code-metric structural feature identification and a neural-network-based refactoring prediction model. The proposed refactoring prediction system first extracts a set of software code metrics from object-oriented software systems; a feature selection method based on the Wilcoxon rank test then chooses an appropriate subset of these features. With the optimal set of code metrics obtained, a novel ANN classifier using 5 different hidden layers is applied to 5 open-source Java projects with 3 data sampling techniques (SMOTE, BLSMOTE, and SVSMOTE) to handle the class imbalance problem. The proposed model achieves optimal classification accuracy and F-measure, and its performance is further shown through AUC graphs and box-plot diagrams.

Rasmita Panigrahi, Lov Kumar, Sanjay Kumar Kuanar
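
For readers unfamiliar with the sampling techniques named above, the sketch below shows only the basic SMOTE step: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. Borderline-SMOTE (BLSMOTE) and SVM-SMOTE (SVSMOTE) differ mainly in which minority samples they choose to oversample. This is not the authors' code; `smote` and its parameters are hypothetical.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Basic SMOTE: create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, oversampling stays inside the region the minority class already occupies.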
Prediction Model of Breast Cancer Based on mRMR Feature Selection

Real-world data are often imbalanced, with large differences in the number of samples per category. The problem is especially prominent in medical data because of disease prevalence rates. In this paper, the P-mRMR algorithm is proposed, based on the mRMR algorithm, to improve feature selection on imbalanced data. It handles attributes with many missing values by integrating the missing values into feature selection, selecting features that account for the high proportion of missing values in the data set, and thereby reducing the complexity of data pre-processing. In the experiments, the AUC, the confusion matrix, and the probability of missing values are used to compare the algorithms. The experiments show that the features selected by the improved algorithm give better results in the classifiers.

Junwen Di, Zhiguo Shi
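
As background for the mRMR baseline, the sketch below implements the standard greedy mRMR "difference" criterion on discrete features: at each step it picks the feature maximising relevance I(f; target) minus its mean mutual information with the features already selected. The P-mRMR handling of missing values is not modelled, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_info(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR: maximise relevance minus mean redundancy.

    features: dict name -> list of discrete values; target: list of labels.
    """
    remaining = list(features)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_info(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_info(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A duplicated feature is as relevant as its original but fully redundant with it, so after one copy is selected the other scores near zero and a complementary feature wins instead.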
Clustering Ensemble Selection with Analytic Hierarchy Process

Existing clustering ensemble selection methods adopt internal and external evaluation indexes to measure the quality and diversity of base clusterings. The significance of a base clustering is quantified by the average or weighted average of multiple evaluation indexes. However, these methods have two limitations. First, evaluating base clusterings as a linear combination of multiple indexes lacks structural analysis and relative comparison between clusterings and measures. Second, consistency between the final evaluation and the multiple evaluations from different measures cannot be guaranteed. To tackle these problems, we propose a clustering ensemble selection method with the Analytic Hierarchy Process (AHPCES). Experimental results validate the effectiveness of the proposed method.

Wei Liu, Xiaodong Yue, Caiming Zhong, Jie Zhou
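
The core AHP computation that such a method builds on is sketched below: the priority vector is the principal eigenvector of a reciprocal pairwise comparison matrix (found here by power iteration), and Saaty's consistency ratio checks whether the judgements are coherent (CR below 0.1 is the usual rule of thumb). How AHPCES arranges clusterings and measures in its hierarchy is not reproduced; `ahp_weights` is a hypothetical name.

```python
# Saaty's random consistency index for matrix orders 1..9
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Priority vector of a pairwise comparison matrix and its consistency ratio.

    The weights are the principal eigenvector, estimated by power iteration.
    Orders above 9 are not covered by the RANDOM_INDEX table in this sketch.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # lambda_max estimated from A w = lambda_max w
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr
```

For a perfectly consistent matrix (every entry a_ij equals w_i / w_j), lambda_max equals n and the consistency ratio is zero.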
Deep Learning for In-Vehicle Intrusion Detection System

Modern and future vehicles are complex cyber-physical systems. Their connection to the outside environment raises many security problems that directly affect our safety. In this work, we propose a Deep CAN intrusion detection system framework. We introduce a multivariate time series representation for asynchronous CAN data that enhances the temporal modelling of deep learning architectures for anomaly detection. We study different deep learning tasks (supervised and unsupervised) and compare several architectures in order to design an in-vehicle intrusion detection system that fits in-vehicle computational constraints. We conduct experiments with many types of attacks on an in-vehicle CAN using the SynCAN dataset.

Elies Gherbi, Blaise Hanczar, Jean-Christophe Janodet, Witold Klaudel
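
The abstract does not describe how asynchronous CAN signals are turned into a multivariate series. One common option, assumed here rather than taken from the paper, is a zero-order hold: sample every signal on a fixed time grid and carry its last observed value forward. The signal names and `to_multivariate` are hypothetical.

```python
def to_multivariate(messages, signals, step, n_steps):
    """Resample asynchronous (timestamp, signal, value) messages onto a fixed grid.

    Row k holds, for every signal, the last value observed at or before time
    k * step (None if the signal has not been seen yet): a zero-order hold.
    """
    msgs = sorted(messages, key=lambda m: m[0])
    last = dict.fromkeys(signals)
    rows, i = [], 0
    for k in range(n_steps):
        t = k * step
        # consume every message that arrived up to the current grid time
        while i < len(msgs) and msgs[i][0] <= t:
            _, signal, value = msgs[i]
            last[signal] = value
            i += 1
        rows.append([last[s] for s in signals])
    return rows
```

The resulting rows form a regular multivariate series that recurrent or convolutional detectors can consume directly, at the cost of repeating stale values between updates.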
Efficient Binary Multi-view Subspace Learning for Instance-Level Image Retrieval

Existing hashing methods mainly handle either feature-based nearest-neighbour search or category-level image retrieval, whereas few efforts have been devoted to the instance retrieval problem. Moreover, although multi-view hashing methods can exploit the complementarity among multiple heterogeneous visual features, they rely heavily on massive labeled training data, which limits their real-world applications. In this paper, we propose a binary multi-view fusion framework that directly recovers a latent Hamming subspace from the multi-view features. More specifically, multi-view subspace reconstruction and binary quantization are integrated into a unified framework so as to minimize the discrepancy between the original multi-view high-dimensional Euclidean space and the resulting compact Hamming subspace. In addition, our method is amenable to efficient iterative optimization for learning compact similarity-preserving binary codes. The resulting binary codes show significant advantages in retrieval precision and computational efficiency with only a limited memory footprint. More importantly, our method is essentially an unsupervised learning scheme that involves no labeled data, and it can therefore be used when supervised information is unavailable or insufficient. Experiments on public benchmarks and large-scale datasets show that our method achieves retrieval performance competitive with the state of the art and has excellent scalability in large-scale scenarios.

Zhijian Wu, Jun Li, Jianhua Xu
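
The retrieval side of binary codes is sketched below: real-valued latent vectors are sign-quantised into integer bit codes and the database is ranked by Hamming distance, computed with XOR and a bit count. The learned multi-view projection into the latent space, which is the paper's contribution, is not reproduced; the latent vectors are assumed to be given, and the function names are hypothetical.

```python
def to_code(latent):
    """Sign-quantise a real-valued latent vector into an integer bit code."""
    code = 0
    for value in latent:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def search(query, database, top_k=3):
    """Indices of the top_k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(database)), key=lambda i: hamming(query, database[i]))
    return order[:top_k]
```

XOR plus a population count is what makes compact binary codes cheap to compare compared with Euclidean distances on the original high-dimensional features.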
Hyper-Sphere Support Vector Classifier with Hybrid Decision Strategy

If the bounding hyper-spheres of the training data of every class do not overlap, any test sample is easy to classify with high accuracy. However, real application data are very complicated, and so are the relationships between the classes' bounding spheres. Based on a detailed analysis of these relationships, a hybrid decision strategy is put forward to classify samples in the intersections for multi-class classification with hyper-sphere support vector machines. First, the characteristics of the data distribution in the intersections are analyzed, and the decision class is then determined by different strategies. If the training samples of two classes in an intersection can be separated by the intersection hyper-plane of the two hyper-spheres, new test samples are classified by this plane. If they are approximately linearly separable, new test samples are classified by the standard optimal binary-SVM hyper-plane. If they are not linearly separable, a kernel function is introduced to obtain the optimal classification hyper-plane. If the training examples in the intersection belong to only one class, new test samples are classified by an exclusion method. Experimental results show that our algorithm outperforms hyper-sphere support vector machines that use only one decision strategy, at relatively low computational cost.

Shuang Liu, Peng Chen
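
The "intersection hyper-plane" of two spheres (c1, r1) and (c2, r2) can be taken to be their radical hyper-plane, the set of points with ‖x − c1‖² − r1² = ‖x − c2‖² − r2², which simplifies to the linear equation 2(c2 − c1)·x = ‖c2‖² − ‖c1‖² + r1² − r2². The sketch below covers only that rule plus single-sphere membership. The SVM and kernel fallbacks are omitted, and the outside-all-spheres rule (smallest distance relative to radius) is an assumption of this sketch, as are the function names.

```python
import math


def power(x, center, radius):
    """Power of point x w.r.t. a sphere: negative inside, zero on the surface."""
    return sum((a - b) ** 2 for a, b in zip(x, center)) - radius * radius


def classify(x, spheres):
    """spheres: dict label -> (center, radius)."""
    inside = [lab for lab, (c, r) in spheres.items() if power(x, c, r) <= 0]
    if len(inside) == 1:
        return inside[0]
    if len(inside) > 1:
        # In an overlap, the smallest power decides; for two spheres this is
        # exactly which side of the radical hyper-plane x lies on.
        return min(inside, key=lambda lab: power(x, *spheres[lab]))
    # Outside every sphere (assumption): nearest centre relative to radius.
    return min(spheres, key=lambda lab: math.dist(x, spheres[lab][0]) / spheres[lab][1])
```

With two equal radii centred at 0 and 3 on the x-axis, the radical hyper-plane is x = 1.5, so overlapping points are split at that line.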
Knowledge Graph Embedding Based on Relevance and Inner Sequence of Relations

Knowledge graph embedding produces low-dimensional dense vectors, which helps reduce the high dimensionality and heterogeneity of a knowledge graph (KG) and enhances KG applications. Many existing methods focus on building complex models, elaborate feature engineering, or increasing the number of learned parameters to improve embedding performance. However, these methods rarely capture the influence of both the intrinsic relevance and the inner sequence of the relations in a KG simultaneously while balancing the number of parameters against algorithmic complexity. In this paper, we propose a concatenation-based knowledge graph embedding method based on the relevance and inner sequence of relations (KGERSR). In this model, for each ⟨head, relation, tail⟩ triple, we use two partially shared gates for the head and tail entities. We then concatenate these two gates to capture the inner sequence information of the triples. We demonstrate the effectiveness of KGERSR on the standard FB15k-237 and WN18RR datasets, where it gives about 2% relative improvement over the state-of-the-art method in terms of Hits@1 and Hits@10. Furthermore, KGERSR has fewer parameters than ComplEx and TransGate. These results indicate that our method can find a better trade-off between complexity and performance.

Jia Peng, Neng Gao, Min Li, Jun Yuan
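
Hits@k, the metric reported above, is the fraction of test triples whose correct entity is ranked within the top k candidates. The sketch below computes raw ranks with optimistic tie handling (an assumption). Benchmarks such as FB15k-237 and WN18RR usually report the "filtered" setting, which additionally removes other known true triples from the candidate list; that step is omitted here, and the function names are hypothetical.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate; higher score is better.

    Ties are resolved optimistically (only strictly higher scores push the
    true candidate down), which is an assumption of this sketch.
    """
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For example, ranks of 1, 2 and 11 over three test triples give Hits@1 = 1/3 and Hits@10 = 2/3.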
MrPC: Causal Structure Learning in Distributed Systems

The PC algorithm (PC), named after its authors Peter and Clark, is an advanced constraint-based method for learning causal structures. However, it is time-consuming, since the number of independence tests is exponential in the number of variables considered. Attempts to parallelise PC have been studied intensively, for example by distributing the tests across all computing cores of a single computer. However, no effort has been made to speed up PC by parallelising the conditional independence tests across a cluster of computers. In this work, we propose MrPC, a robust and efficient PC algorithm that accelerates PC to serve causal discovery in distributed systems. Along with MrPC, we also propose a novel way to model non-linear causal relationships in gene regulatory data using kernel functions. We evaluate our method and its variants on the task of building gene regulatory networks. Experimental results on benchmark datasets show that the proposed MrPC is up to seven times faster than a sequential PC implementation. In addition, kernel functions outperform the conventional linear causal modelling approach across different datasets.

Thin Nguyen, Duc Thanh Nguyen, Thuc Duy Le, Svetha Venkatesh
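
The sketch below illustrates why PC parallelises well: within one level of the skeleton phase, every conditional independence test is independent of the others, so the tests can be mapped over workers. It uses a Fisher-z partial-correlation test (a linear-Gaussian assumption, not the paper's kernel functions), stops at conditioning sets of size 1, and freezes adjacencies within each level (the "PC-stable" convention). A thread pool in one process stands in for the cluster; MrPC's actual distribution scheme is not reproduced, and all names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def independent(data, i, j, k=None, z_crit=1.96):
    """Fisher-z test of X_i independent of X_j (given X_k when k is not None)."""
    r = corr(data[i], data[j])
    if k is not None:
        r_ik, r_jk = corr(data[i], data[k]), corr(data[j], data[k])
        r = (r - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    n_cond = 0 if k is None else 1
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(len(data[i]) - n_cond - 3)
    return abs(z) < z_crit


def pc_skeleton(data, workers=4):
    """PC skeleton phase up to conditioning sets of size 1."""
    edges = list(combinations(list(data), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Level 0: unconditional tests, mapped over the pool
        keep = list(pool.map(lambda e: not independent(data, *e), edges))
        edges = [e for e, kept in zip(edges, keep) if kept]
        # Level 1: condition on one neighbour (adjacency frozen for the level)
        adj = {v: set() for v in data}
        for i, j in edges:
            adj[i].add(j)
            adj[j].add(i)

        def survives(edge):
            i, j = edge
            return not any(independent(data, i, j, k)
                           for k in (adj[i] | adj[j]) - {i, j})

        keep = list(pool.map(survives, edges))
    return {e for e, kept in zip(edges, keep) if kept}
```

Because of Python's global interpreter lock, threads only show the map structure here; real speed-ups for CPU-bound tests need processes or separate machines.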
Online Multi-objective Subspace Clustering for Streaming Data

This paper develops an online subspace clustering technique that can handle data arriving continuously in a streaming manner. Subspace clustering is a technique in which the subset of features used to represent a cluster differs from cluster to cluster. Most streaming data clustering methods optimize only a single objective function, which limits the model to capturing only a particular cluster shape or property. The simultaneous optimization of multiple objectives helps overcome this limitation and enables the generation of good-quality clusters. Inspired by this, the developed streaming subspace clustering method optimizes multiple objectives capturing cluster compactness and feature relevancy. In this paper, we adopt an evolutionary technique and optimize multiple objective functions simultaneously to determine the optimal subspace clusters. The clusters generated by the proposed method are allowed to contain overlapping objects. To establish the benefit of using multiple objectives, the proposed method is evaluated on three real-life and three synthetic data sets. Its results are compared with several state-of-the-art methods, and the comparative study shows the superiority of using multiple objectives in the proposed method.

Dipanjyoti Paul, Sriparna Saha, Jimson Mathew
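
Multi-objective evolutionary methods keep the non-dominated (Pareto-optimal) candidates rather than a single best one. The sketch below shows that selection criterion, assuming every objective is to be minimised (for example, a hypothetical compactness spread and a negated feature-relevance score); the paper's encoding and evolutionary operators are not reproduced, and the names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions):
    """Indices of the non-dominated objective vectors."""
    return [
        i for i, s in enumerate(solutions)
        if not any(dominates(t, s) for j, t in enumerate(solutions) if j != i)
    ]
```

For the objective vectors (1, 5), (2, 2), (3, 1) and (4, 4), only (4, 4) is dominated (by (2, 2)), so the front contains the first three.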
Predicting Information Diffusion Cascades Using Graph Attention Networks

Effective information cascade prediction plays an important role in suppressing the spread of rumors in social networks and in providing accurate recommendations on social platforms. This paper improves on existing models and proposes an end-to-end deep learning method called CasGAT. A graph attention network is used to optimize the processing of large networks, so that only the characteristics of neighbor nodes need to be attended to. Our approach greatly reduces the processing complexity of the model. We use real-world datasets to demonstrate the effectiveness of the model and compare it with three baselines. Extensive results demonstrate that our model outperforms all three baselines in prediction accuracy.

Meng Wang, Kan Li
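
The neighbour-only computation described above is the standard graph attention update: each node scores its neighbours with e_ij = LeakyReLU(a · [W h_i ‖ W h_j]), normalises the scores with a softmax over its neighbourhood, and aggregates the projected neighbour features. The sketch below is a single-head version with the output non-linearity omitted; CasGAT's cascade-specific architecture is not reproduced, and the names are hypothetical.

```python
import math


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbours, W, a):
    """Single-head graph attention update (output non-linearity omitted).

    features:   node -> input feature vector
    neighbours: node -> list of neighbour nodes (include the node itself)
    W:          shared linear projection; a: attention vector of length 2 * out_dim
    """
    Wh = {v: matvec(W, h) for v, h in features.items()}
    out = {}
    for i, nbrs in neighbours.items():
        # e_ij = LeakyReLU(a . [W h_i || W h_j]); list "+" concatenates
        scores = [leaky_relu(sum(p * q for p, q in zip(a, Wh[i] + Wh[j]))) for j in nbrs]
        # softmax over the neighbourhood, shifted by the max for stability
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        out[i] = [sum(al * Wh[j][d] for al, j in zip(alpha, nbrs))
                  for d in range(len(Wh[i]))]
    return out
```

Each node touches only its own neighbour list, so the cost grows with the number of edges rather than with the square of the graph size.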
PrivRec: User-Centric Differentially Private Collaborative Filtering Using LSH and KD

Collaborative filtering (CF)-based recommender systems provide recommendations by collecting users’ historical ratings and predicting their preferences for new items. However, this inevitably raises privacy concerns, since the collected data might reveal sensitive user information both when training a recommendation model and when applying the trained model (i.e., testing). Existing differential privacy (DP)-based approaches generally incur non-negligible losses in recommendation utility, and they often serve as centralized server-side approaches that overlook privacy during testing, when the trained models are applied in practice. In this paper, we propose PrivRec, a user-centric differentially private collaborative filtering approach that provides privacy guarantees both intuitively and theoretically while preserving recommendation utility. PrivRec is based on locality sensitive hashing (LSH) and teacher-student knowledge distillation (KD). A teacher model is trained on the original user data without privacy constraints, and a student model learns from the hidden layers of the teacher model. The published student model is trained without access to the original user data and, for privacy, takes locally processed data as input. Experimental results on real-world datasets show that our approach provides promising utility with privacy guarantees compared to commonly used approaches.

Yifei Zhang, Neng Gao, Junsha Chen, Chenyang Tu, Jiong Wang
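
As background on LSH, the sketch below shows random-hyperplane hashing (SimHash) for cosine similarity: each bit records on which side of a random hyperplane a rating vector falls, and the fraction of matching bits between two signatures estimates 1 − θ/π, where θ is the angle between the vectors. The paper's exact LSH configuration is not given in the abstract, so this family and the names are assumptions. LSH on its own is not a differential-privacy mechanism.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """n_bits random Gaussian hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )
```

The signature depends only on the direction of a vector, so users with proportional rating patterns receive identical signatures, while opposite patterns get complementary ones.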
Simultaneous Customer Segmentation and Behavior Discovery

Customer purchase behavior segmentation plays an important role in the modern economy. We proposed a Bayesian non-parametric (BNP)-based framework, named Simultaneous Customer Segmentation and Utility Discovery (UtSeg), to discover customer segments without knowing the specific forms of the utility functions or their parameters. In BNP-based segmentation, the unknown type of function is usually modeled as a non-homogeneous point process (NHPP) for each mixture component. However, inference in these models is complex and time-consuming. To reduce this complexity, economists traditionally use one specific utility function heuristically to simplify inference. We proposed instead to select automatically among multiple utility functions rather than searching in a continuous space. We further unified the parameters of the different types of utility functions under the same prior distribution to improve efficiency. We tested our model on synthetic data, applied the framework to real supermarket data with different products, and showed that our results can be interpreted with common knowledge.

Siqi Zhang, Ling Luo, Zhidong Li, Yang Wang, Fang Chen, Richard Xu
Structural Text Steganography Using Unseen Tag Attribute Values

Besides an effective steganography scheme, a practical stego-system needs an abundance of cover media. Aside from images, document files are among the most frequently exchanged e-mail attachments. In this paper, we present a structural steganographic scheme based on unseen tag attribute values, using office documents as the medium. Specifically, we use the XML file that forms the core of these documents to carry the message. The secret is not visible in the text content, and the stego file size stays close to the cover size. We are among the first to investigate unseen tag identifiers within the cover document for hiding a secret message. We assess the performance of the proposed scheme in terms of invisibility, embedding capacity, robustness, and security. The results show the advantages of higher embedding capacity and better flexibility while maintaining high practicability in terms of accessibility and implementation.

Feno H. Rabevohitra, Yantao Li
Trajectory Anomaly Detection Based on the Mean Distance Deviation

With the development of science and technology and the explosive growth of data, a large number of trajectories are generated every day, and detecting abnormal trajectories among them has become a hot issue. To study trajectory anomaly detection better, we analyze the Sequential conformal anomaly detection in trajectories based on Hausdorff distance (SNN-CAD) method and propose a new trajectory distance measure, Improved Moved Euclidean Distance (IMED), to replace the Hausdorff distance and reduce computational complexity. In addition, we propose a removing-updating strategy to enhance conformal prediction (CP). We also put forward our non-conformity measure (NCM), Mean Distance Deviation, which enlarges the differences between trajectories more effectively and detects abnormal trajectories more accurately. Finally, based on these technical measures and within the enhanced conformal prediction framework, we build our own detector, the Mean Distance Deviation Detector (MDD-ECAD). Experiments running both detectors on a large amount of synthetic and real-world trajectory data show that MDD-ECAD is much better than SNN-CAD in both accuracy and running time.

Xiaoyuan Hu, Qing Xu, Yuejun Guo
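
The conformal anomaly detection framework that both detectors share is sketched below. Each training trajectory gets a non-conformity score computed against the others; a new trajectory's p-value is the fraction of scores at least as large as its own (counting itself), and a small p-value flags an anomaly. As stand-ins, the trajectory distance is the mean point-wise Euclidean distance between equal-length trajectories (not IMED), and the NCM is the mean distance to the k nearest training trajectories (not Mean Distance Deviation). The removing-updating strategy is omitted, and the names are hypothetical.

```python
import math


def traj_distance(t1, t2):
    """Mean Euclidean distance between corresponding points.

    Stand-in measure; trajectories are assumed resampled to equal length.
    """
    return sum(math.dist(p, q) for p, q in zip(t1, t2)) / len(t1)


def ncm(traj, others, k=2):
    """Non-conformity: mean distance to the k nearest other trajectories."""
    d = sorted(traj_distance(traj, o) for o in others)
    return sum(d[:k]) / k


def conformal_p_value(new, training, k=2):
    """p = (#{i : alpha_i >= alpha_new} + 1) / (n + 1)."""
    alphas = [ncm(t, training[:i] + training[i + 1:], k) for i, t in enumerate(training)]
    a_new = ncm(new, training, k)
    return (sum(1 for a in alphas if a >= a_new) + 1) / (len(alphas) + 1)
```

A trajectory is typically flagged as anomalous when its p-value falls below a chosen significance level epsilon.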
Tweet Relevance Based on the Theory of Possibility

The popularity and great success of social networks are due to their ability to offer Internet users a free space of expression where they can produce a large amount of information. The new challenges for information retrieval and data mining are therefore to extract and analyze this mass of information, which can then be used in different applications. This information is characterized mainly by incompleteness, imprecision, and heterogeneity. Analyzing it with models based on statistics and word frequencies is therefore a crucial task. To handle this uncertainty, possibility theory turns out to be the most adequate framework. In this article, we propose a new approach to finding relevant short texts such as tweets using the dual measures of possibility and necessity. Our goal is to capture the fact that a tweet can be relevant only if there is both a semantic relationship between the tweet and the query and a synergy between the terms of the tweet. We model the problem through a possibility network that measures the possibility of the relevance of terms with respect to a concept of a given query, and a necessity network that measures the representativeness of terms in a tweet. The evaluation shows that using possibility theory with a set of concepts relevant to an initial query gives the best precision compared to other approaches.

Amina Ben Meriem, Lobna Hlaoua, Lotfi Ben Romdhane
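
The two dual measures mentioned above are defined from a possibility distribution π over outcomes: the possibility of an event A is Π(A) = max of π(x) over x in A, and its necessity is N(A) = 1 − Π(not A). The sketch below assumes a normalised distribution (some outcome has π = 1) and uses hypothetical term degrees; the paper's possibility and necessity networks are not reproduced.

```python
def possibility(event, pi):
    """Pi(A): the largest possibility degree among the outcomes in A."""
    return max((pi[x] for x in event), default=0.0)


def necessity(event, pi):
    """N(A) = 1 - Pi(not A): A is certain to the extent its complement is impossible."""
    complement = [x for x in pi if x not in event]
    return 1.0 - possibility(complement, pi)
```

With hypothetical degrees {"vaccine": 1.0, "dose": 0.75, "football": 0.25}, the event {"vaccine"} is fully possible (Π = 1.0) but only weakly necessary (N = 0.25), because the competing term "dose" remains quite possible.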

Healthcare Analytics-Improving Healthcare Outcomes Using Big Data Analytics

Frontmatter
A Semantically Flexible Feature Fusion Network for Retinal Vessel Segmentation

The automatic detection of retinal blood vessels by computer-aided techniques plays an important role in the diagnosis of diabetic retinopathy, glaucoma, and macular degeneration. In this paper, we present a semantically flexible feature fusion network that employs residual skip connections between adjacent neurons to improve retinal vessel detection. This yields a method that can be trained using residual learning. To illustrate the utility of our method for retinal blood vessel detection, we show results on two publicly available data sets, DRIVE and STARE. Our experimental evaluation uses widely adopted evaluation metrics and compares our results with those of alternatives in the literature. In our experiments, our method is highly competitive, delivering improvements in sensitivity and accuracy over the alternatives considered.

Tariq M. Khan, Antonio Robles-Kelly, Syed S. Naqvi
Automatic Segmentation and Diagnosis of Intervertebral Discs Based on Deep Neural Networks

Lumbar disc diagnosis is a Magnetic Resonance Imaging (MRI) segmentation and detection problem. Manually checking and interpreting MRI is challenging even for the most experienced radiologists. In addition, high class imbalance is a typical problem in many medical image classification tasks and results in poor classification performance. Recently, computer vision and deep learning have been widely used for the automatic localization and diagnosis of intervertebral discs to improve diagnostic efficiency. In this work, a two-stage automatic disc diagnosis network is proposed, which improves the accuracy of classifiers trained on imbalanced datasets. Experimental results show that the proposed method achieves 93.08%, 95.41%, 96.22%, and 89.34% for accuracy, precision, sensitivity, and specificity, respectively. It can address the imbalanced dataset problem and reduce the misdiagnosis rate.

Xiuhao Liang, Junxiu Liu, Yuling Luo, Guopei Wu, Shunsheng Zhang, Senhui Qiu
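
The four figures reported above follow from the binary confusion matrix: accuracy = (TP + TN) / all, precision = TP / (TP + FP), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The sketch below computes them from label lists; it illustrates only the metrics, not the paper's network, and the function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall) and specificity from binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # guard against empty classes
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

On imbalanced data, accuracy alone can look high while sensitivity on the rare class is poor, which is why all four are usually reported together.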
Detecting Alzheimer’s Disease by Exploiting Linguistic Information from Nepali Transcript

Alzheimer’s disease (AD) is the most common neurodegenerative disorder, accounting for 60–80% of all dementia cases. The lack of effective clinical treatments that cure or even slow the progression of the disease makes it even more serious. Treatment options are available for the milder stage of the disease to provide short-term symptomatic relief and improve quality of life. Early diagnosis is key to the treatment and management of AD, as advanced stages of the disease cause severe cognitive decline and permanent brain damage. This has prompted researchers to explore innovative ways to detect AD early. Changes in speech are one of the main signs of AD: as the brain deteriorates, the patient’s language processing ability deteriorates too. Previous research has used Natural Language Processing (NLP) techniques in English for early detection of AD. However, research in local and low-resource languages such as Nepali still lags behind. NLP is an important tool in Artificial Intelligence for deciphering human language and performing various tasks. In this paper, various classifiers are discussed for the early detection of Alzheimer’s in the Nepali language. The study convincingly concludes that the difficulty AD patients have in processing information is reflected in their speech while describing a picture. It uses this speech decline to classify subjects as control subjects or AD patients with various classifiers and NLP techniques. Furthermore, the experiment introduces a new dataset consisting of transcripts of AD patients and Control normal (CN) subjects in the Nepali language. In addition, this paper sets a baseline for the early detection of AD using NLP in the Nepali language.

Surendrabikram Thapa, Surabhi Adhikari, Usman Naseem, Priyanka Singh, Gnana Bharathy, Mukesh Prasad
Explaining AI-Based Decision Support Systems Using Concept Localization Maps

Human-centric explainability of AI-based Decision Support Systems (DSS) that use visual input modalities is directly related to the reliability and practicality of such algorithms. An otherwise accurate and robust DSS might not enjoy the trust of domain experts in critical application areas if it cannot provide reasonable justifications for its predictions. This paper introduces Concept Localization Maps (CLMs), a novel approach towards explainable image classifiers employed as DSS. CLMs extend Concept Activation Vectors (CAVs) by locating significant regions corresponding to a learned concept in the latent space of a trained image classifier. They provide qualitative and quantitative assurance of a classifier’s ability to learn and focus on similar concepts that are important to human experts during image recognition. To better understand the effectiveness of the proposed method, we generated a new synthetic dataset called Simple Concept DataBase (SCDB) that includes annotations for 10 distinguishable concepts, and made it publicly available. We evaluated our proposed method on SCDB as well as on a real-world dataset, CelebA. Using SE-ResNeXt-50 on SCDB, we achieved localization recall above 80% for most relevant concepts and average recall above 60% for all concepts. Our results on both datasets show the great promise of CLMs for easing acceptance of DSS in clinical practice.

Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, Sheraz Ahmed
Learning from the Guidance: Knowledge Embedded Meta-learning for Medical Visual Question Answering

Traditional medical visual question answering approaches require a large amount of labeled data for training, yet still cannot jointly consider both image and text information. To address this issue, we propose a novel framework called Knowledge Embedded Meta-Learning. In particular, we present a deep relation network to capture and memorize the relations among different samples. First, we introduce an embedding approach to perform feature fusion representation learning. Then, we construct a knowledge graph that relates images with text and serves as the guidance for our meta-learner. We design a knowledge embedding mechanism to incorporate the knowledge representation into our network. The final result is derived from our relation network by learning to compare the features of samples. Experimental results demonstrate that the proposed approach achieves significantly higher performance than other state-of-the-art methods.

Wenbo Zheng, Lan Yan, Fei-Yue Wang, Chao Gou
Response Time Determinism in Healthcare Data Analytics Using Machine Learning

IT is revolutionizing the healthcare industry, delivering benefits that could not have been imagined a few decades ago. Healthcare Data Analytics (HDA) has enabled medical practitioners to perform prescriptive, descriptive, and predictive analytics. This capability has made practitioners far more effective and efficient than previous generations. At the same time, humankind is served by more meaningful diagnosis of diseases, better healthcare, more effective treatments, and earlier detection of health issues. However, healthcare practitioners still rely on their expert judgement in emergency situations because current HDA systems provide no assurance of response time determinism (RTD). This paper addresses this problem by proposing the inclusion of RTD in HDA systems using a recent technique developed in the field of real-time systems. An experiment simulating a life-saving scenario was conducted to demonstrate this concept. Time gains of up to 17 times were achieved, a promising result.

Syed Abdul Baqi Shah, Syed Mahfuzul Aziz

Human Activity Recognition

Frontmatter
A Landmark Estimation and Correction Network for Automated Measurement of Sagittal Spinal Parameters

Recently, deep learning for spinal measurement in scoliosis has achieved great success. However, we observe that existing methods perform poorly on lateral X-rays because of severe occlusion. In this paper, we propose the automated Landmark Estimation and Correction Network (LEC-Net), based on a convolutional neural network (CNN), to estimate landmarks on lateral X-rays. The framework consists of two parts: (1) a landmark estimation network (LEN) and (2) a landmark correction network (LCN). The LEN first estimates 68 landmarks of 17 vertebrae (12 thoracic and 5 lumbar vertebrae) per image. Some of these landmarks may be failed points in occluded areas. The LCN then estimates the clinical parameters, using the spinal curvature described by the 68 landmarks as a constraint. Extensive experiments on 240 lateral X-rays demonstrate that our method improves landmark estimation accuracy and achieves high performance on clinical parameters for X-rays with severe occlusion. Implementation code is available at https://github.com/xiaoyanermiemie/LEN-LCN.

Guosheng Yang, Xiangling Fu, Nanfang Xu, Kailai Zhang, Ji Wu
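The LEC-Net abstract above derives sagittal parameters from the 68 estimated landmarks. As an illustration only, the sketch below computes one such angle from landmark coordinates. It makes two hypothetical assumptions about layout. First, the landmarks are stored vertebra by vertebra (12 thoracic, then 5 lumbar). Second, each vertebra has four corners in the order superior-anterior, superior-posterior, inferior-anterior, inferior-posterior. The ordering, the function name and the choice of endplates are assumptions, not the paper's code.

```python
import numpy as np

def endplate_angle(landmarks, upper, lower):
    """Angle in degrees between the superior endplate of vertebra `upper`
    and the inferior endplate of vertebra `lower`.

    Assumes (hypothetically) that `landmarks` has shape (68, 2): 17 vertebrae x 4
    corners ordered [sup-anterior, sup-posterior, inf-anterior, inf-posterior].
    """
    pts = np.asarray(landmarks, dtype=float).reshape(17, 4, 2)
    sup = pts[upper, 1] - pts[upper, 0]   # direction of the superior endplate
    inf = pts[lower, 3] - pts[lower, 2]   # direction of the inferior endplate
    # abs() gives the acute angle between the two endplate lines
    cos = abs(np.dot(sup, inf)) / (np.linalg.norm(sup) * np.linalg.norm(inf))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))
```

Under this ordering, `endplate_angle(landmarks, 3, 11)` would give a T4-to-T12 kyphosis-style angle.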
Facial Expression Recognition with an Attention Network Using a Single Depth Image

In the facial expression recognition field, models involving RGB images have consistently achieved the best performance. However, because RGB images are easily influenced by illumination, skin color, and cross-database variation, the performance of these methods degrades accordingly. To avoid these issues, we propose a novel facial expression recognition framework whose input is only a single depth image, since depth images are very stable across situations. In our framework, we pretrain an RGB face image synthesis model with a generative adversarial network (GAN) on a public database. This pretrained model synthesizes an RGB face image under unified imaging conditions from an input depth face image. Then, an attention mechanism based on facial landmarks is introduced into a convolutional neural network (CNN) for recognition; this attention mechanism strengthens the weights of the key facial parts. Thus, our framework has a stable input (a depth face image) while retaining the natural merits of RGB face images for recognition. Experiments conducted on public databases demonstrate that the recognition rate of our framework is better than that of state-of-the-art methods that are also based on depth images.

Jianmin Cai, Hongliang Xie, Jianfeng Li, Shigang Li
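The depth-based framework above weights key facial parts using facial landmarks. One simple way to express landmark-based spatial attention is to build an attention map peaked at each landmark and multiply it into the CNN feature map. This sketch rests on our own assumptions: Gaussian bumps with a hypothetical `sigma`, plus a residual `base` weight. The paper's learned attention may differ.

```python
import numpy as np

def landmark_attention(feature_map, landmarks, sigma=2.0, base=1.0):
    """Reweight a (C, H, W) feature map with Gaussian bumps centred on landmarks.

    `landmarks` holds (row, col) positions in feature-map coordinates.
    `sigma` and `base` are illustrative hyperparameters, not values from the paper.
    """
    _, h, w = feature_map.shape
    rows, cols = np.mgrid[0:h, 0:w]
    attn = np.zeros((h, w))
    for r, c in landmarks:
        bump = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
        attn = np.maximum(attn, bump)          # keep the strongest bump per pixel
    # `base` preserves non-landmark regions; landmark pixels get up to (base + 1)x
    return feature_map * (base + attn)[None, :, :], attn
```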
Fast and Accurate Hand-Raising Gesture Detection in Classroom

This paper proposes a fast and accurate method for hand-raising gesture detection in classrooms. Our method is based on a one-stage detector, CenterNet, which significantly reduces inference time. Meanwhile, we design three mechanisms to improve performance. First, we propose a novel suppression loss to prevent easy and hard examples from overwhelming the training process. Second, we adopt a deep layer aggregation network to fuse semantic and spatial representations, which is effective for detecting tiny gestures. Third, because hand-raising gestures show little variation in aspect ratio, we regress only a single width property to predict the whole bounding box, thus achieving more accurate results. Experiments show that our method achieves 91.4% mAP on our hand-raising dataset and runs at 26 FPS, 6.7× faster than two-stage detectors.

Tao Liu, Fei Jiang, Ruimin Shen
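The hand-raising detector regresses only a width per gesture. We assume the height is recovered from a fixed aspect ratio estimated on the training data; the abstract does not say how the height is derived. Under that assumption, decoding a CenterNet-style prediction reduces to:

```python
def decode_box(cx, cy, width, aspect_ratio):
    """Recover (x1, y1, x2, y2) from a predicted centre and width only.

    `aspect_ratio` (height / width) is assumed to be a fixed constant
    estimated from the training set.
    """
    height = width * aspect_ratio
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
```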
Identifying Anger Veracity Using Neural Network and Long-Short Term Memory with Bimodal Distribution Removal

Anger is an important emotion in social interactions. People may be genuinely angry, or they may act angry with the aim of turning situations to their advantage. With advances in affective computing, machine learning approaches make it possible to identify the veracity of anger through the physiological signals of observers. In this paper, we examine time-series pupillary responses of observers viewing genuine and acted anger stimuli. A Fully-Connected Neural Network (FCNN) and a Long Short-Term Memory (LSTM) network are trained on pre-processed pupillary responses to classify the anger expressed in the stimuli as genuine or acted. We also adopt the Bimodal Distribution Removal (BDR) technique to remove noise from the dataset. We find that FCNN and LSTM can recognise the veracity of anger with accuracies of 79.7% and 89.7%, respectively. BDR is beneficial in providing early stopping for the LSTM, which helps avoid overfitting and improves efficiency.

Rouyi Jin, Xuanying Zhu, Yeu-Shin Fu
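Bimodal Distribution Removal, as it is usually described, works in two parts. Once the error distribution becomes bimodal, it drops training patterns whose error stays far above the rest. It then halts training when the error variance falls below a threshold. The sketch below implements one removal pass and that halting test. `alpha` and `threshold` are illustrative values. The original formulation also starts removal only after the error variance has first dropped below a start threshold; that scheduling is left to the caller here.

```python
import numpy as np

def bdr_step(errors, alpha=0.5):
    """One Bimodal Distribution Removal pass.

    Returns a boolean mask of training patterns to KEEP. Patterns whose error
    exceeds the overall mean form a high-error subset; within it, patterns with
    error >= mean_ss + alpha * std_ss are treated as outliers and dropped.
    """
    errors = np.asarray(errors, dtype=float)
    high = errors[errors > errors.mean()]
    if high.size == 0:
        return np.ones_like(errors, dtype=bool)
    cutoff = high.mean() + alpha * high.std()
    return errors < cutoff

def should_halt(errors, threshold=0.01):
    """Early-stopping criterion: stop once the variance of the errors is small."""
    return float(np.var(errors)) < threshold
```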
Learning and Distillating the Internal Relationship of Motion Features in Action Recognition

In the field of video-based action recognition, a majority of advanced approaches train a two-stream architecture, in which an appearance stream processes RGB images and a motion stream processes optical flow frames. Because of the considerable computational cost of optical flow and the high inference latency of the two-stream method, knowledge distillation has been introduced to efficiently capture the two-stream representation while taking only RGB images as input. Following this technique, this paper proposes a novel distillation learning strategy to fully learn and mimic the representation of the motion stream. In addition, we propose a lightweight attention-based fusion module to exploit both appearance and motion information in a unified manner. Experiments show that the proposed distillation strategy and fusion module achieve better performance than the baseline technique, and that our proposal outperforms known state-of-the-art single-stream and traditional two-stream approaches.

Lu Lu, Siyuan Li, Niannian Chen, Lin Gao, Yong Fan, Yong Jiang, Ling Wu
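The distillation abstract has an RGB-only student mimic the motion stream. The paper's exact strategy is not spelled out, so what follows is a generic sketch of the kind of loss commonly used for this. It combines a feature-matching term with a temperature-softened KL divergence between teacher and student predictions. `temperature` and `beta` are hypothetical hyperparameters.

```python
import numpy as np

def softmax(z, t=1.0):
    """Numerically stable softmax over the last axis at temperature t."""
    z = np.asarray(z, dtype=float) / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                      temperature=4.0, beta=1.0):
    """Feature-mimicking MSE plus temperature-softened KL(teacher || student).

    The RGB-only student is pushed toward the (frozen) motion-stream teacher.
    """
    feat_term = np.mean((np.asarray(student_feat) - np.asarray(teacher_feat)) ** 2)
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    # the T^2 factor keeps soft-target gradients on a comparable scale
    return feat_term + beta * (temperature ** 2) * kl
```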
SME User Classification from Click Feedback on a Mobile Banking Apps

Customer segmentation is an essential process that helps a bank gain more insight into, and better understand, its customers. In the past, this process required analysis of data on both customer demographics and offline financial transactions. However, with the advancement of mobile technology, mobile banking has become more accessible than before. With over 10 million digital users, the SCB easy app by Siam Commercial Bank receives an enormous volume of transactions each day. In this work, we propose a method to classify mobile users into two groups, i.e. ‘SME-like’ and ‘Non-SME-like’ users, based on their click behaviour. Thus, the bank can easily identify these customers and offer them the right products. We convert a user’s click log into an image that aims to capture temporal information. This image representation reduces the need for feature engineering. Employing ResNet-18 on our image data achieves 71.69% average accuracy, outperforming a conventional machine learning technique with hand-crafted features, which achieves 61.70% average accuracy. We also discover hidden insights into the click behaviour of ‘SME-like’ and ‘Non-SME-like’ users from these images. Our proposed method can lead to a better understanding of mobile banking user behaviour and offers a novel way of developing a customer segmentation classifier.

Suchat Tungjitnob, Kitsuchart Pasupa, Ek Thamwiwatthana, Boontawee Suntisrivaraporn
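The SME classifier converts each user's click log into an image that preserves timing. The abstract does not give the encoding, so here is one hypothetical layout: one row per event type, one column per time bin, with click counts as pixel intensities. The event names, bin count and normalisation are all assumptions.

```python
import numpy as np

def clicks_to_image(events, vocab, n_bins=24, horizon=86400):
    """Turn a click log into a (len(vocab), n_bins) intensity image.

    `events` is a list of (timestamp_seconds, event_name). Rows index event
    types and columns index equal-width time bins over `horizon` seconds, so
    the image records *when* each kind of click happened.
    """
    index = {name: i for i, name in enumerate(vocab)}
    img = np.zeros((len(vocab), n_bins), dtype=np.float32)
    for t, name in events:
        if name in index:                       # unknown event types are ignored
            col = min(int(t / horizon * n_bins), n_bins - 1)
            img[index[name], col] += 1
    return img / max(img.max(), 1.0)            # scale counts to [0, 1] for a CNN
```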

Image Processing and Computer Vision

Frontmatter
3D Human Pose Estimation with 2D Human Pose and Depthmap

Three-dimensional human pose estimation models conventionally rely on RGB images or assume that accurately estimated (near-ground-truth) 2D human pose landmarks are available. Naturally, such data only contain information about two dimensions, whereas 3D poses require the three dimensions of height, width, and depth. In this paper, we propose a new 3D human pose estimation model that takes an estimated 2D pose and the depthmap of the 2D pose as input to estimate the 3D human pose. In our system, the estimated 2D pose is obtained by processing an RGB image with a 2D landmark detection network that produces noisy heatmap data. We compare our results with a Simple Linear Model (SLM) from other authors that takes accurately estimated 2D pose landmarks as input and has reached state-of-the-art results for 3D human pose estimation on the Human3.6m dataset. Our results show that our model achieves better performance than the SLM and can align the 2D landmark data with the depthmap automatically. We have also tested our network using estimated 2D poses and depthmaps separately. In our model, all three conditions (depthmap+2D pose, depthmap-only and 2D pose-only) are more accurate than the SLM, with, surprisingly, the depthmap-only condition being comparable in accuracy to the depthmap+2D pose condition.

Zhiheng Zhou, Yue Cao, Xuanying Zhu, Henry Gardner, Hongdong Li
A Malware Classification Method Based on Basic Block and CNN

Current malware classification methods that acquire features manually suffer from three problems: considerable consumption of manpower, excessively high feature dimensionality, and unsatisfying accuracy. To address them, this paper proposes a malware classification method based on basic blocks and a Convolutional Neural Network (CNN). The method first obtains the assembly code file of the executable malware sample. It then extracts the opcodes (such as “mov” and “add”) from the disassembled file according to basic-block labels. Next, it generates SimHash value vectors of the basic blocks from these opcodes using a hash algorithm. Finally, a CNN classification model is trained on the training sample set. A series of experiments shows that our method achieves satisfying results in malware classification, with classification accuracy reaching up to 99.24% and a false positive rate as low as 1.265%.

Jinrong Chen
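The malware method turns the opcodes of each basic block into a SimHash vector. A minimal sketch of classic SimHash follows. Using MD5 as the per-token hash and 64 output bits are our assumptions, since the abstract only says "a hash algorithm".

```python
import hashlib
import numpy as np

def simhash(tokens, n_bits=64):
    """Classic SimHash of a token sequence, returned as a 0/1 vector (n_bits <= 128)."""
    acc = np.zeros(n_bits)
    for tok in tokens:
        # take the first n_bits of the token's MD5 digest as its hash
        h = int.from_bytes(hashlib.md5(tok.encode()).digest()[:n_bits // 8], "big")
        for i in range(n_bits):
            acc[i] += 1 if (h >> i) & 1 else -1   # vote +1 / -1 per bit
    return (acc > 0).astype(np.uint8)             # sign of each vote total

def block_vectors(basic_blocks, n_bits=64):
    """One SimHash vector per basic block; the rows can be stacked as CNN input."""
    return np.stack([simhash(block, n_bits) for block in basic_blocks])
```

Blocks with similar opcode content tend to produce vectors that differ in only a few bits, which is the property that makes SimHash useful here.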
A Shape-Aware Feature Extraction Module for Semantic Segmentation of 3D Point Clouds

3D shape pattern description of raw point clouds plays an essential role in 3D understanding. Previous works often learn feature representations via a solid cubic or spherical neighborhood, ignoring the differences between the point distributions of objects with various shapes. Additionally, most works encode the spatial information in each neighborhood implicitly by learning edge weights between points, which is insufficient to recover spatial information. In this paper, a Shape-Aware Feature Extraction (SAFE) module is proposed. It explicitly describes the spatial distribution of points in the neighborhood with well-designed distribution descriptors and replaces the conventional solid neighborhood with a hollow spherical neighborhood. Then, we encode the inner pattern and the outer pattern separately in the hollow spherical neighborhood to achieve shape awareness. Building an encoder-decoder network based on the SAFE module, we conduct extensive experiments, and the results show that our SAFE-based network achieves state-of-the-art performance on the benchmark datasets ScanNet and ShapeNet.

Jiachen Xu, Jie Zhou, Xin Tan, Lizhuang Ma
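The SAFE module replaces a solid neighborhood with a hollow spherical one and encodes inner and outer patterns separately. The sketch below gives one plausible reading of that grouping step, using assumed radii `r_inner` and `r_outer`. It splits a query point's neighbors into an inner ball and an outer shell. It is not the paper's descriptor.

```python
import numpy as np

def hollow_sphere_groups(points, center, r_inner, r_outer):
    """Split neighbors of `center` into an inner ball and an outer shell.

    Points with distance <= r_inner form the inner group; points with
    r_inner < distance <= r_outer form the outer (hollow) shell.
    Both radii are illustrative hyperparameters.
    """
    d = np.linalg.norm(np.asarray(points, dtype=float) - center, axis=1)
    inner = np.where(d <= r_inner)[0]
    outer = np.where((d > r_inner) & (d <= r_outer))[0]
    return inner, outer
```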
A Strong Baseline for Fashion Retrieval with Person Re-identification Models

Fashion retrieval is the challenging task of finding an exact match for fashion items contained within an image. Difficulties arise from the fine-grained nature of clothing items and from very large intra-class and inter-class variance. Additionally, query and source images for the task usually come from different domains: street and catalogue photos, respectively. Due to these differences, a significant gap exists in quality, lighting, contrast, background clutter and item presentation. As a result, fashion retrieval is an active field of research in both academia and industry. Inspired by recent advancements in person re-identification research, we adapt leading ReID models to fashion retrieval tasks. We introduce a simple baseline model for fashion retrieval that significantly outperforms previous state-of-the-art results despite a much simpler architecture. We conduct in-depth experiments on the Street2Shop and DeepFashion datasets. Finally, we propose a cross-domain (cross-dataset) evaluation method to test the robustness of fashion retrieval models.

Mikolaj Wieczorek, Andrzej Michalowski, Anna Wroblewska, Jacek Dabrowski
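The ReID-style baseline above ultimately retrieves catalogue items by comparing embeddings. We assume L2-normalised embeddings compared with cosine similarity; this is a common choice in person re-identification but is not confirmed for this paper. Under that assumption, ranking a gallery for one query looks like:

```python
import numpy as np

def rank_gallery(query_emb, gallery_embs):
    """Return gallery indices sorted by cosine similarity to the query (best first)."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    return np.argsort(-(g @ q))
```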
Adaptive Feature Enhancement Network for Semantic Segmentation

Semantic segmentation is a fundamental and challenging problem in computer vision. Recent studies attempt to integrate feature information from different depths to improve segmentation performance, and a few of them enhance the features before fusion. However, which areas of the features should be strengthened, and how, remains inconclusive. Therefore, in this work we propose an Adaptive Feature Enhancement Module (AFEM) that uses high-level features to adaptively enhance the key areas of low-level features. Meanwhile, an Adaptive Feature Enhancement Network (AFENet) is designed with AFEM to combine all the enhanced features. The proposed method is validated on the representative semantic segmentation datasets Cityscapes and PASCAL VOC 2012. In particular, 79.5% mIoU is achieved on the Cityscapes test set without using fine-val data, which is 1.1% higher than the baseline network, with a smaller model size. The code of AFENet is available at https://github.com/KTMomo/AFENet .

Kuntao Cao, Xi Huang, Jie Shao
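AFEM uses high-level features to decide which regions of low-level features to enhance. The sketch below makes two assumptions. The high-level map has already been projected to the same number of channels, and it sits on a grid coarser by an integer `factor`. It gates the low-level map with a sigmoid of the upsampled high-level response. This illustrates the idea only; it is not the AFENet module itself.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of the last two axes by an integer factor."""
    return x.repeat(factor, axis=-2).repeat(factor, axis=-1)

def enhance_low_level(low, high, factor):
    """Use high-level features to weight spatial regions of low-level ones.

    low:  (C, H, W) low-level feature map
    high: (C, H // factor, W // factor) high-level map with matching channels
    """
    # per-pixel gate in (0, 1) from the channel-averaged high-level response
    gate = 1.0 / (1.0 + np.exp(-upsample_nearest(high, factor).mean(axis=0)))
    return low * (1.0 + gate)[None], gate
```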
BEDNet: Bi-directional Edge Detection Network for Ocean Front Detection

An ocean front is an ocean phenomenon that has an important impact on marine ecosystems and marine fisheries. Hence, ocean front detection is of great significance. Several ocean front detection methods have been proposed so far. However, existing methods face two main problems. One is the lack of labeled ocean front detection datasets. The other is that no deep learning method has been used to locate the accurate position of ocean fronts. In this paper, we design a bi-directional edge detection network (BEDNet) based on our collected ocean front dataset to tackle these two problems. The labeled ocean front dataset, named OFDS365, consists of 365 images based on the gradient of sea surface temperature (SST) images acquired on each day of 2014. BEDNet mainly contains four stages, with a pathway from shallow stages to deep stages and a pathway from deep stages to shallow stages, which together achieve bi-directional multi-scale information fusion. Moreover, we combine the Dice and cross-entropy loss functions to train our network, obtaining fine-grained ocean front detection results. In the experiments, we show that BEDNet achieves better performance on ocean front detection than other existing methods.

Qingyang Li, Zhenlin Fan, Guoqiang Zhong
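BEDNet is trained with a combination of Dice and cross-entropy losses. The abstract does not give the weighting. This sketch therefore assumes a plain weighted sum of binary cross-entropy and soft Dice loss on a per-pixel front probability map, with `weight` as a hypothetical coefficient.

```python
import numpy as np

def dice_ce_loss(prob, target, weight=1.0, eps=1e-7):
    """Binary cross-entropy plus soft Dice loss for an edge/front map.

    prob:   predicted front probability per pixel, values in [0, 1]
    target: binary ground-truth front mask of the same shape
    """
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    target = np.asarray(target, dtype=float)
    bce = -np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob))
    dice = 1 - (2 * np.sum(prob * target) + eps) / (np.sum(prob) + np.sum(target) + eps)
    return bce + weight * dice
```

The Dice term counters the heavy class imbalance of thin fronts, while cross-entropy keeps per-pixel gradients well behaved.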
Convolutional Neural Networks and Periocular Region Image Recognition

There are several benefits to using periocular biometric traits for individual identification. This work describes a novel application of the Neocognitron convolutional neural network to individual recognition using periocular region images. In addition, competitive learning is used, with the extreme points of lines detected during preprocessing of the input images serving as winner positions. The Carnegie Mellon University Pose, Illumination, and Expression Database (CMU-PIE), with 41,368 images of 68 persons, was used. From these images, 57 × 57 periocular images were obtained as training and test samples. The experiments yield a Kappa index of 0.89 for periocular images and 0.91 for complete face images.

Eliana Pereira da Silva, Francisco Fambrini, José Hiroki Saito
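The periocular results are reported as Cohen's Kappa. Kappa compares the observed agreement p_o with the agreement p_e expected by chance, kappa = (p_o - p_e) / (1 - p_e). It can be computed from a confusion matrix as follows:

```python
import numpy as np

def cohen_kappa(confusion):
    """Cohen's Kappa from a square confusion matrix (rows: true, cols: predicted)."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_o = np.trace(c) / n                                  # observed agreement
    p_e = np.sum(c.sum(axis=0) * c.sum(axis=1)) / n ** 2   # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)
```

For a two-class matrix [[45, 5], [5, 45]], p_o = 0.9 and p_e = 0.5, giving a Kappa of 0.8.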
CPCS: Critical Points Guided Clustering and Sampling for Point Cloud Analysis

3D vision based on irregular point sequences has gained increasing attention, and current methods depend on random or farthest point sampling. However, existing sampling methods either measure distance in Euclidean space and ignore high-level properties, or sample from point clouds based only on the largest distance. To tackle these limitations, we introduce the Expectation-Maximization Attention module to find the critical subset of points and cluster the other points around them. Moreover, we explore a point cloud sampling strategy that samples points based on the critical subset. Extensive experiments demonstrate the effectiveness of our method for several popular point cloud analysis tasks. Our module achieves an accuracy of 93.3% on ModelNet40 for the classification task with only 1024 points.

Wei Wang, Zhiwen Shao, Wencai Zhong, Lizhuang Ma
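CPCS uses an Expectation-Maximization Attention module to find critical points and cluster the remaining points around them. The sketch below follows the general EM-attention recipe: softmax responsibilities in the E-step, then responsibility-weighted, normalised bases in the M-step. As our own simplification, it then takes the most responsible point per base as that base's "critical" point. The number of bases `k`, the iteration count and the temperature are assumptions.

```python
import numpy as np

def em_attention(x, k, n_iter=3, temperature=1.0, seed=0):
    """EM-style attention over per-point features x of shape (N, C).

    Returns bases (K, C), responsibilities from the last E-step (N, K),
    one 'critical' point index per base, and a cluster label per point.
    """
    rng = np.random.default_rng(seed)
    mu = x[rng.choice(len(x), size=k, replace=False)].astype(float)  # init bases from points
    for _ in range(n_iter):
        logits = x @ mu.T / temperature                    # E-step: point-base affinity
        logits -= logits.max(axis=1, keepdims=True)
        z = np.exp(logits)
        z /= z.sum(axis=1, keepdims=True)                  # softmax over bases
        mu = (z.T @ x) / (z.sum(axis=0)[:, None] + 1e-8)   # M-step: weighted means
        mu /= np.linalg.norm(mu, axis=1, keepdims=True) + 1e-8
    critical = z.argmax(axis=0)   # most responsible point for each base (may repeat)
    labels = z.argmax(axis=1)     # cluster each point around its strongest base
    return mu, z, critical, labels
```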
Customizable GAN: Customizable Image Synthesis Based on Adversarial Learning

In this paper, we propose a highly flexible and controllable image synthesis method based on a simple contour and a text description. The contour determines the object’s basic shape, and the text describes the specific content of the object. The method is verified on the Caltech-UCSD Birds (CUB) and Oxford-102 flower datasets. The experimental results demonstrate its effectiveness and superiority. In addition, our method can synthesize high-quality images from hand-drawn contours and text descriptions, which further demonstrates its high flexibility and customizability.

Zhiqiang Zhang, Wenxin Yu, Jinjia Zhou, Xuewen Zhang, Jialiang Tang, Siyuan Li, Ning Jiang, Gang He, Gang He, Zhuo Yang
Declarative Residual Network for Robust Facial Expression Recognition

Automatic facial expression recognition is of great importance for human-computer interaction (HCI) in various applications. Due to large variance in head position, age range, illumination, etc., detecting and recognizing human facial expressions in realistic environments remains a challenging task. In recent years, deep neural networks have been applied to this task and have demonstrated state-of-the-art performance. Here we propose a reliable framework for robust facial expression recognition. The basic architecture of our framework is ResNet-18, combined with a declarative $L_p$ sphere/ball projection layer. The proposed framework also includes data augmentation, a voting mechanism, and a YOLO-based face detection module. The performance of the proposed framework is evaluated on the semi-natural static facial expression dataset Static Facial Expressions in the Wild (SFEW), which contains over 800 images extracted from movies. Results show excellent performance, with an average test accuracy of 51.89% over five runs, which indicates the considerable potential of our framework.

Ruikai Cui, Josephine Plested, Jiaxu Liu
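The framework above includes a declarative Lp sphere/ball projection layer. For p = 2 the forward projection has a closed form, sketched below. Other values of p generally need an iterative solver. The declarative layer's backward pass, obtained by implicit differentiation, is not shown.

```python
import numpy as np

def project_l2_sphere(x, radius=1.0, eps=1e-12):
    """Project each row of x onto the L2 sphere of the given radius."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return radius * x / np.maximum(norms, eps)

def project_l2_ball(x, radius=1.0):
    """Project each row of x onto the L2 ball; rows already inside are unchanged."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return x * scale
```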
Deep Feature Compatibility for Generated Images Quality Assessment

Image quality assessment (IQA) for generated images mainly focuses on perceptual quality. Existing methods for evaluating generated images consider the characteristics of the overall image distribution; at present, there is no practical method for assessing a single generated image. To address this issue, this paper proposes a solution based on deep feature compatibility (DFC), which first collects suitable comparison images with a collection model. It then provides an individual score by computing the compatibility between the target image and images with good perceptual quality. This method makes up for the deficiency of the Inception Score (IS) when evaluating a small number of results and/or a single image. Experiments on Caltech-UCSD Birds 200 (CUB) show that our method performs well in assessing generated images. Finally, we analyze various problems of representative IQA methods in evaluating generated images.

Xuewen Zhang, Yunye Zhang, Zhiqiang Zhang, Wenxin Yu, Ning Jiang, Gang He
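The DFC abstract notes that the Inception Score is unreliable for a single image. The standard IS formula is exp(E_x KL(p(y|x) || p(y))), and a short sketch of it makes the reason visible. With one image, the estimated marginal p(y) equals p(y|x), so the KL term is zero and the score is always 1. The class probabilities would normally come from an Inception network; here they are passed in directly.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp(E_x KL(p(y|x) || p(y))) from an (N, classes) probability matrix."""
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0, keepdims=True)   # p(y) estimated from the same N images
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))
```

By contrast, two images confidently assigned to different classes score about 2.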
DRGCN: Deep Relation GCN for Group Activity Recognition

Person-to-person relations are an essential clue for group activity recognition (GAR), and relation graphs and graph convolutional networks (GCNs) have become powerful tools for representing and processing relationships. However, previous methods have difficulty capturing the complex relationships between people. We propose an end-to-end framework called Deep Relation GCN (DRGCN) for recognizing group activities by exploring the high-level relations between individuals. In DRGCN, we use a horizontal slicing strategy to divide each individual into smaller parts, and then apply a deep GCN to learn the relation graph of these individual parts. We perform experiments on two widely used datasets and obtain competitive results that demonstrate the effectiveness of our method.

Yiqiang Feng, Shimin Shan, Yu Liu, Zhehuan Zhao, Kaiping Xu
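DRGCN slices each person horizontally into parts and runs a GCN over the part graph. The sketch below shows both steps with numpy. Part features are average-pooled horizontal strips, followed by one symmetric-normalised graph convolution. The fully connected adjacency and random weights in the test are placeholders; DRGCN's relation graph and depth are not reproduced here.

```python
import numpy as np

def slice_parts(person_feats, n_parts):
    """Horizontally slice each person's (C, H, W) map into n_parts pooled part features."""
    parts = []
    for fmap in person_feats:
        for strip in np.array_split(fmap, n_parts, axis=1):   # split along height
            parts.append(strip.mean(axis=(1, 2)))             # one (C,) vector per strip
    return np.stack(parts)                                     # (persons * n_parts, C)

def gcn_layer(h, adj, w):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a = adj + np.eye(len(adj))                  # add self-loops
    d = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d[:, None] * d[None, :]        # symmetric degree normalisation
    return np.maximum(a_norm @ h @ w, 0.0)
```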
Dual Convolutional Neural Networks for Hyperspectral Satellite Images Classification (DCNN-HSI)

Hyperspectral Satellite Images (HSI) are a very interesting technology for mapping, environmental protection, and security. HSI are very rich in spectral and spatial characteristics, which are non-linear and highly correlated, making classification difficult. In this paper, we propose a new approach to the reduction and classification of HSI. This deep approach consists of dual Convolutional Neural Networks (DCNN) and aims to improve both precision and computing time. The approach involves two main steps. The first extracts the spectral data and reduces it with a CNN until a single value representing the active pixel is obtained. The second classifies the single remaining spatial band with a CNN until the class of each pixel is obtained. Tests on three different hyperspectral datasets showed the effectiveness of the proposed method.

Maissa Hamouda, Med Salim Bouhlel
Edge Curve Estimation by the Nonparametric Parzen Kernel Method

This article concerns the problem of finding the spatial curve along which a 3D shape changes abruptly (jumps), namely the edge curve. There are many real applications in which such problems play a significant role. In computer vision, they arise in detecting edges in monochromatic images used in, e.g., medical diagnostics, biology and physics. In geology, they arise in analysing satellite photographs of the Earth's surface for maps and/or determining the borders of forest areas, water resources, rivers, rock cliffs, etc. In architecture, the curves arising from intersecting surfaces are also often of interest. The main focus of this paper is the detection of abrupt changes in patterns defined by multidimensional functions. Our approach is based on nonparametric Parzen kernel estimation of functions and their derivatives. Appropriate use of nonparametric methodology allows one to establish the shape of the edge curve of interest.

Tomasz Gałkowski, Adam Krzyżak
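The Parzen-kernel approach estimates a function and its derivatives nonparametrically and looks for where the change is abrupt. A one-dimensional sketch of that idea follows. It builds a Gaussian-kernel Nadaraya-Watson estimate of the regression function, then locates the jump where the numerical derivative of the estimate is largest. The bandwidth `h` is an assumption. The paper's multidimensional estimator and kernel-derivative formulas are not reproduced.

```python
import numpy as np

def parzen_regression(x_train, y_train, x_eval, h):
    """Nadaraya-Watson estimate of E[y | x] with a Gaussian Parzen kernel of bandwidth h."""
    u = (x_eval[:, None] - x_train[None, :]) / h
    k = np.exp(-0.5 * u ** 2)
    return (k @ y_train) / k.sum(axis=1)

def locate_jump(x_train, y_train, h=0.02, n_grid=1000):
    """Return the grid point where the estimated derivative is largest in magnitude."""
    grid = np.linspace(x_train.min(), x_train.max(), n_grid)
    m = parzen_regression(x_train, y_train, grid, h)
    deriv = np.gradient(m, grid)          # numerical derivative of the kernel estimate
    return grid[np.argmax(np.abs(deriv))]
```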
Efficient Segmentation Pyramid Network

With the extensive growth of robotics and autonomous industries, the demand for efficient image segmentation is increasing rapidly. Whilst existing methods have been shown to achieve outstanding results on challenging datasets, they cannot scale properly to real-world applications with computational constraints because they rely on a fixed, large backbone network. We propose a novel architecture for semantic scene segmentation suitable for resource-constrained applications. Specifically, we exploit the global contextual prior by applying a pyramid pooling technique on top of the backbone network. We also employ the recently proposed EfficientNet network to make our model efficiently scalable under computational constraints. We show that our newly proposed model, the Efficient Segmentation Pyramid Network (ESPNet), outperforms many existing scene segmentation models and produces 88.5% pixel accuracy on the validation set and 80.9% on the training set of the Cityscapes benchmark.

Tanmay Singha, Duc-Son Pham, Aneesh Krishna, Joel Dunstan
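ESPNet adds a pyramid pooling stage on top of the EfficientNet backbone to capture global context. A PSPNet-style pyramid pooling sketch follows, assuming bin sizes (1, 2, 3, 6) and nearest-neighbour upsampling. The 1×1 convolutions that usually reduce channels after pooling are omitted for brevity.

```python
import numpy as np

def adaptive_avg_pool(x, bins):
    """Average-pool a (C, H, W) map to (C, bins, bins) over near-equal regions."""
    out = np.zeros((x.shape[0], bins, bins))
    for i, rows in enumerate(np.array_split(np.arange(x.shape[1]), bins)):
        for j, cols in enumerate(np.array_split(np.arange(x.shape[2]), bins)):
            out[:, i, j] = x[:, rows][:, :, cols].mean(axis=(1, 2))
    return out

def resize_nearest(x, h, w):
    """Nearest-neighbour resize of a (C, h0, w0) map to (C, h, w)."""
    ri = np.arange(h) * x.shape[1] // h
    ci = np.arange(w) * x.shape[2] // w
    return x[:, ri][:, :, ci]

def pyramid_pooling(x, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-and-upsampled context at several scales."""
    _, h, w = x.shape
    ctx = [resize_nearest(adaptive_avg_pool(x, b), h, w) for b in bin_sizes]
    return np.concatenate([x] + ctx, axis=0)   # (C * (1 + len(bin_sizes)), H, W)
```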
EMOTIONCAPS - Facial Emotion Recognition Using Capsules

Facial emotion recognition plays an important role in day-to-day activities. To address this task, we propose a novel encoder/decoder network, EmotionCaps, which models facial images using matrix capsules, where hierarchical pose relationships between facial parts are built into internal representations. An optimal number of capsules and their dimensions are chosen, as these hyper-parameters play an important role in capturing the complex facial pose relationships. Further, a batch normalization layer is introduced to expedite convergence. To show the effectiveness of our network, EmotionCaps is evaluated on seven basic emotions over a wide range of head orientations. Additionally, our method analyzes facial images quite accurately even in the presence of noise and blur.

Bhavya Shah, Krutarth Bhatt, Srimanta Mandal, Suman K. Mitra
End-to-end Saliency-Guided Deep Image Retrieval

A challenging issue in content-based image retrieval (CBIR) is distinguishing the target object from cluttered backgrounds. Doing so yields more discriminative image embeddings than when feature extraction is distracted by irrelevant objects. To handle this issue, we propose a saliency-guided model with deep image features. The model is fully based on convolutional neural networks (CNNs) and incorporates a visual saliency detection module, making saliency detection a preceding step of feature extraction. The resulting saliency maps are used to refine the original inputs, and compatible image features suitable for ranking are then extracted from the refined inputs. The model suggests a way of incorporating saliency information into existing CNN-based CBIR systems with minimal impact on them. Some works assist image retrieval with other methods, such as object detection or semantic segmentation, but these are not as fine-grained as saliency detection, and some of them require additional annotations for training. In contrast, we train the saliency module in a weakly supervised, end-to-end manner, without saliency ground truth. Extensive experiments on standard image retrieval benchmarks show that our model achieves competitive retrieval results.

Jinyu Ma, Xiaodong Gu
Exploring Spatiotemporal Features for Activity Classifications in Films

Humans are able to appreciate implicit and explicit contexts in a visual scene within a few seconds. How computers can obtain such interpretations of a visual scene is not well understood, so the question remains whether this ability can be emulated. We investigated activity classification of movie clips using a 3D convolutional neural network (CNN) as well as combinations of a 2D CNN and long short-term memory (LSTM). This work was motivated by the observations that CNNs can effectively learn representations of visual features and LSTMs can effectively learn temporal information. Hence, an architecture that combines information from many time slices should provide an effective means of capturing spatiotemporal features from a sequence of images. Eight experiments were carried out on three main architectures: 3DCNN, ConvLSTM2D, and a pipeline of pre-trained CNN-LSTM. We analyzed the empirical output, followed by a critical discussion of the analyses and suggestions for future research directions in this domain.

Somnuk Phon-Amnuaisuk, Shiqah Hadi, Saiful Omar
Feature Redirection Network for Few-Shot Classification

Few-shot classification aims to learn novel categories from only a few labeled samples. How to make the best use of the limited data to obtain a learner with fast learning ability has become a challenging problem. In this paper, we propose a feature redirection network (FRNet) for few-shot classification to make the features more discriminative. The proposed FRNet not only highlights relevant category features of support samples, but also learns how to generate task-relevant features of query samples. Experiments conducted on three datasets demonstrate its superiority over state-of-the-art methods.

Yanan Wang, Guoqiang Zhong, Yuxu Mao, Kaizhu Huang
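FRNet's redirection mechanism is learned and is not specified in the abstract. To illustrate the few-shot setting only, the sketch below swaps in a simple stand-in: nearest-prototype classification. Channels are reweighted by how much they vary across the task's class prototypes, a crude, hypothetical notion of "task-relevant" features. It is not FRNet.

```python
import numpy as np

def classify_queries(support, support_labels, queries, n_way):
    """Nearest-prototype few-shot classification with task-level channel reweighting.

    support: (N_s, C) features; support_labels: (N_s,) ints in [0, n_way);
    queries: (N_q, C). The reweighting is a hypothetical stand-in, not FRNet.
    """
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in range(n_way)])
    # channels that differ most between class prototypes are treated as task-relevant
    weights = protos.std(axis=0)
    weights = weights / (weights.sum() + 1e-12)
    diffs = queries[:, None, :] - protos[None, :, :]      # (N_q, n_way, C)
    dists = np.sum(weights * diffs ** 2, axis=2)          # weighted squared distance
    return dists.argmin(axis=1)
```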
Generative Adversarial Networks for Improving Object Detection in Camouflaged Images

The effectiveness of object detection largely depends on the availability of large annotated datasets to train the deep network successfully; however, obtaining a large-scale dataset is expensive and remains a challenge. In this work, we explore two different GAN-based approaches for data augmentation of agricultural images in a camouflaged environment. Camouflage is the property of an object that makes it hard to detect because of its similarity to its environment. We leverage paired and unpaired image-to-image translation to create synthetic images based on custom segmentation masks. We evaluate the quality of the synthetic images by applying them to the object detection task as additional training samples. The experiments demonstrate that adversarial-based data augmentation significantly improves the accuracy of a region-based convolutional neural network for object detection. Our findings show that, when evaluated on the testing dataset, data augmentation improves detection performance by 3.97%. Given the difficulty of object detection in camouflaged images, the result suggests that combining adversarial-based data augmentation with the original data can theoretically be synergistic in enhancing deep neural network efficiency to address the open problem of detecting objects in camouflaged environments.

Jinky G. Marcelo, Arnulfo P. Azcarraga
Light Textspotter: An Extreme Light Scene Text Spotter

Scene text spotting is a challenging open problem in the computer vision community. Many insightful methods have been proposed, but most of them pursue better performance without considering the enormous computational burden. In this work, an extremely light scene text spotter is proposed with a teacher-student (TS) structure. Specifically, a light convolutional neural network (CNN) architecture, the Shuffle Unit, is adopted with a feature pyramid network (FPN) for feature extraction. Knowledge distillation and attention transfer are designed in the TS framework to boost text detection accuracy. Cascaded with a fully convolutional network (FCN) recognizer, our proposed method can be trained end-to-end. Because resource consumption is halved, our method runs faster. The experimental results demonstrate that our method is more efficient and achieves state-of-the-art detection performance compared with other methods on benchmark datasets.

Jiazhi Guan, Anna Zhu
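Light Textspotter uses attention transfer inside its teacher-student framework. In the usual formulation of attention transfer, a spatial attention map is obtained by summing |A|^p over channels and L2-normalising. The student is then penalised by the distance between its map and the teacher's. A minimal sketch follows, assuming p = 2 and matching spatial sizes; channel counts may differ.

```python
import numpy as np

def attention_map(fmap, p=2):
    """Spatial attention of a (C, H, W) map: sum over channels of |A|^p, L2-normalised."""
    a = (np.abs(fmap) ** p).sum(axis=0).ravel()
    return a / (np.linalg.norm(a) + 1e-12)

def attention_transfer_loss(student_fmap, teacher_fmap, p=2):
    """L2 distance between normalised student and teacher attention maps."""
    return float(np.linalg.norm(attention_map(student_fmap, p) - attention_map(teacher_fmap, p)))
```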
LPI-Net: Lightweight Inpainting Network with Pyramidal Hierarchy

With the development of deep learning, there have been many inspiring and outstanding attempts at image inpainting. However, the models designed in most existing approaches consume considerable computing resources, which results in sluggish inference and poor compatibility with small-scale devices. To deal with this issue, we design a lightweight pyramid inpainting network, called LPI-Net, which applies lightweight modules to an inpainting network with a pyramidal hierarchy. In addition, the operations in the top-down pathway of the proposed pyramid network are lightened and redesigned to realize the lightweight design. According to the qualitative and quantitative comparisons in this paper, the proposed LPI-Net outperforms known advanced inpainting approaches with far fewer parameters. When evaluating inpainting performance on 10–20% damaged regions of the CelebA dataset, LPI-Net improves PSNR by at least 3.52 dB over other advanced approaches.

Siyuan Li, Lu Lu, Kepeng Xu, Wenxin Yu, Ning Jiang, Zhuo Yang
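LPI-Net's gain is reported in PSNR. For images scaled to [0, max_value], PSNR = 10 log10(max_value^2 / MSE), as sketched below.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images with values in [0, max_value]."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(restored, float)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return float(10 * np.log10(max_value ** 2 / mse))
```

By this definition, a 3.52 dB gain corresponds to a mean squared error roughly 2.25 times lower, since 10^0.352 ≈ 2.25.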
MobileHand: Real-Time 3D Hand Shape and Pose Estimation from Color Image

We present an approach for real-time estimation of 3D hand shape and pose from a single RGB image. To achieve real-time performance, we utilize an efficient Convolutional Neural Network (CNN): MobileNetV3-Small to extract key features from an input image. The extracted features are then sent to an iterative 3D regression module to infer camera parameters, hand shapes and joint angles for projecting and articulating a 3D hand model. By combining the deep neural network with the differentiable hand model, we can train the network with supervision from 2D and 3D annotations in an end-to-end manner. Experiments on two publicly available datasets demonstrate that our approach matches the accuracy of most existing methods while running at over 110 Hz on a GPU or 75 Hz on a CPU.

Guan Ming Lim, Prayook Jatesiktat, Wei Tech Ang
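MobileHand's regression module infers camera parameters so that the articulated 3D hand model can be projected and supervised with 2D annotations. We assume a weak-perspective camera (a scale and a 2D translation), which the abstract does not confirm. Under that assumption, the projection and a 2D reprojection loss look like:

```python
import numpy as np

def weak_perspective_project(joints3d, scale, tx, ty):
    """Project (J, 3) joints to 2D with a weak-perspective camera: s * (X, Y) + t."""
    joints3d = np.asarray(joints3d, dtype=float)
    return scale * joints3d[:, :2] + np.array([tx, ty])   # depth is dropped

def reprojection_loss(joints3d, keypoints2d, cam):
    """Mean L2 error between projected joints and 2D annotations; cam = (s, tx, ty)."""
    proj = weak_perspective_project(joints3d, *cam)
    return float(np.mean(np.linalg.norm(proj - keypoints2d, axis=1)))
```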
Monitoring Night Skies with Deep Learning

The surveillance of meteors is important due to the possibility of studying the Universe and identifying hazardous events. The EXOSS initiative monitors the Brazilian sky with cameras in order to identify meteors, leading to a great quantity of non-meteor captures that must be filtered. We approach the task of automatically distinguishing between meteor and non-meteor images with the use of pre-trained convolutional neural networks. Our main contributions are the revision of the methodology for evaluating models on this task, showing that the previous methodology leads to an overestimation of the expected performance for future data on our dataset; and the application of probability calibration in order to improve the selection of most confident predictions, showing that apart from obtaining probabilities that better reflect the confidence of the model, calibration can lead to concrete improvements on both accuracy and coverage. Our method achieves 98% accuracy predicting on 60% of the images, improving upon the performance of the uncalibrated model of 94% accuracy predicting on 70% of the images.

Yuri Galindo, Marcelo De Cicco, Marcos G. Quiles, Ana C. Lorena
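The accuracy/coverage trade-off in the meteor-filtering abstract above can be made concrete with a small sketch. It assumes a binary classifier that outputs a (possibly calibrated) probability that an image is a meteor, and a hypothetical confidence threshold below which the image is left for manual review. The function name and the threshold rule are illustrative, not the EXOSS pipeline, and the abstract does not say which calibration method was used.

```python
from typing import Sequence, Tuple


def accuracy_coverage(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Accuracy on the images the model keeps, and the fraction of images kept.

    probs[i] is the predicted probability that image i is a meteor (class 1),
    labels[i] is its true class (0 or 1). Images whose confidence falls below
    `threshold` are abstained on, i.e. left for manual review.
    """
    kept = correct = 0
    for p, y in zip(probs, labels):
        confidence = max(p, 1.0 - p)  # probability assigned to the predicted class
        if confidence < threshold:
            continue  # abstain on this image
        kept += 1
        prediction = 1 if p >= 0.5 else 0
        correct += int(prediction == y)
    coverage = kept / len(probs) if len(probs) > 0 else 0.0
    accuracy = correct / kept if kept > 0 else 0.0
    return accuracy, coverage
```

Raising the threshold usually trades coverage for accuracy. A calibration step (for example Platt scaling or isotonic regression) would be applied to `probs` before calling this function, so that the threshold refers to better-calibrated confidences.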
MRNet: A Keypoint Guided Multi-scale Reasoning Network for Vehicle Re-identification

With the increasing use of massive surveillance data, vehicle re-identification (re-ID) has become a hot topic in the computer vision community. Vehicle re-ID is challenging because of viewpoint variation: different views greatly affect the visual appearance of a vehicle. To handle this problem, we propose an end-to-end framework called Keypoint Guided Multi-Scale Reasoning Network (MRNet) that infers multi-view vehicle features from a single-view image. In addition to the global branch, our framework learns multi-view vehicle information through a local branch that leverages different vehicle segments for relational reasoning. MRNet infers the latent whole-vehicle feature by increasing the semantic similarity between incomplete vehicle segments. MRNet is evaluated on two benchmarks (VeRi-776 and VehicleID), and the experimental results show that our framework achieves performance competitive with state-of-the-art methods. On the more challenging VeRi-776 dataset, we achieve 72.0% mAP and 92.4% Rank-1 accuracy. Our code is available at https://github.com/panmt/MRNet_for_vehicle_reID.

Minting Pan, Xiaoguang Zhu, Yongfu Li, Jiuchao Qian, Peilin Liu
Multi-modal Feature Attention for Cervical Lymph Node Segmentation in Ultrasound and Doppler Images

Cervical lymph node disease is a cervical disease with a high incidence. Accurate detection of lymph nodes can greatly improve the performance of computer-aided diagnosis systems. Most existing studies focus on classifying lymph nodes in a given ultrasound image. However, ultrasound discriminates poorly between tissues such as blood vessels and lymph nodes, so for easily confused tasks such as detecting cervical lymph nodes, ultrasound imaging alone is inadequate. In this study, we combine two common modalities, ultrasound and Doppler, to detect cervical lymph nodes. We then propose a multimodal fusion method that makes full use of the complementary information between the two modalities to distinguish lymph nodes from other tissues. A total of 1054 pairs of ultrasound and Doppler images were used in the experiments. The proposed multimodal fusion method achieves a Dice score 3% higher than the baseline methods in segmentation.

Xiangling Fu, Tong Gao, Yuan Liu, Mengke Zhang, Chenyi Guo, Ji Wu, Zhili Wang
MultiTune: Adaptive Integration of Multiple Fine-Tuning Models for Image Classification

Transfer learning has been widely used as a deep learning technique for computer vision problems, especially image classification with convolutional neural networks (CNNs). In this paper, we propose MultiTune, a novel transfer learning approach that adaptively integrates multiple models with different fine-tuning settings. To evaluate MultiTune, we compare it with SpotTune, a state-of-the-art transfer learning technique, on two image datasets from the Visual Decathlon Challenge: FGVC-Aircraft, a fine-grained task, and CIFAR100, a more general task. The results show that MultiTune outperforms SpotTune on both tasks. We also evaluate MultiTune on a range of target datasets with fewer images per class, and MultiTune outperforms SpotTune on most of these smaller datasets as well. MultiTune is also less computationally expensive than SpotTune and requires less training time on every dataset used in this paper.

Yu Wang, Jo Plested, Tom Gedeon
No-Reference Quality Assessment Based on Spatial Statistic for Generated Images

In recent years, generative adversarial networks have made remarkable progress in text-to-image synthesis, whose goal is to produce high-quality generated images. Current evaluation metrics in this field mainly evaluate the quality distribution of the generated image dataset rather than the quality of each individual image. As research on text-to-image synthesis deepens, the quality and quantity of generated images will greatly increase, and so will the demand for evaluating generated images. This paper therefore proposes a blind generated image evaluator (BGIE) based on the BRISQUE model and a sparse neighborhood co-occurrence matrix, designed specifically to evaluate the quality of a single generated image. In experiments, BGIE surpasses all previously proposed no-reference methods; compared with the VSS method, it improves SRCC by 8.8% and PLCC by 8.8%. A “One-to-Multi” high-score image screening experiment further shows that BGIE can select the best image from multiple images.

Yunye Zhang, Xuewen Zhang, Zhiqiang Zhang, Wenxin Yu, Ning Jiang, Gang He
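SRCC and PLCC, the two agreement measures quoted in the BGIE abstract above, compare predicted quality scores with human opinion scores. The sketch below computes both with the standard library only. The function names are hypothetical, and many image-quality studies fit a nonlinear (for example logistic) mapping to the predictions before computing PLCC, a step omitted here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because SRCC only looks at ranks, a monotonic but nonlinear relation such as `y = x**2` gives SRCC = 1 while PLCC stays below 1.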
Pairwise-GAN: Pose-Based View Synthesis Through Pair-Wise Training

Three-dimensional face reconstruction is a popular application in computer vision. However, even state-of-the-art models still require frontal faces as input, restricting their use in the wild. A similar dilemma arises in face recognition. New research has emerged that aims to recover the frontal face from a single side-pose facial image; the state of the art in this area is the Face-Transformation generative adversarial network, which is based on CycleGAN. This inspired our research, which explores the performance of two pixel-transformation models in frontal facial synthesis: Pix2Pix and CycleGAN. We experimented with five different loss functions on Pix2Pix to improve its performance and then proposed a new network, Pairwise-GAN, for frontal facial synthesis. Pairwise-GAN uses two parallel U-Nets as the generator and PatchGAN as the discriminator. The detailed hyper-parameters are also discussed. Based on quantitative face-similarity comparison, Pix2Pix with L1 loss, gradient difference loss and identity loss yields a 2.72% improvement in average similarity over the default Pix2Pix model. Additionally, Pairwise-GAN outperforms CycleGAN by 5.4%, Pix2Pix by 9.1% and CR-GAN by 14.22% in average similarity. Further experimental results and code are available at https://github.com/XuyangSHEN/Pairwise-GAN.

Xuyang Shen, Jo Plested, Yue Yao, Tom Gedeon
Pixel-Semantic Revising of Position: One-Stage Object Detector with Shared Encoder-Decoder

Many methods have recently been proposed for object detection. However, they cannot adaptively detect objects using semantic features. Drawing on channel and spatial attention mechanisms, we analyze how different methods can detect objects adaptively. Some state-of-the-art detectors combine different feature pyramids with many mechanisms, but at a higher cost. This work addresses that problem with an anchor-free detector that uses a shared encoder-decoder with an attention mechanism to extract shared features. We take features of different levels from the backbone (e.g., ResNet-50) as the basis features, feed them into a simple module, and pass the result to a detection head to detect objects. Meanwhile, we use the semantic features to revise geometric locations, so the detector performs pixel-semantic revising of position. More importantly, this work analyzes the impact of different pooling strategies (e.g., mean, maximum or minimum) on multi-scale objects and finds that minimum pooling improves detection performance on small objects more than the other strategies. Compared with the state-of-the-art MNC based on ResNet-101 on the standard MSCOCO 2014 baseline, our method improves detection AP by 3.8%.

Qian Li, Nan Guo, Xiaochun Ye, Dongrui Fan, Zhimin Tang
Reduction of Polarization-State Spread in Phase-Distortion Mitigation by Phasor-Quaternion Neural Networks in PolInSAR

This paper shows that phasor-quaternion neural networks (PQNN) reduce not only the phase singular points (SP) in interferometric synthetic aperture radar (InSAR) but also the spread of polarization states in polarimetric SAR (PolSAR). This result reveals that the PQNN appropriately handles the dynamics of transversal waves, which have both phase and polarization. That is, the phasor quaternion is not merely a formal combination of numbers but an effective number that realizes generalization ability in the phase and polarization space of the neural network.

Kohei Oyama, Akira Hirose
RoadNetGAN: Generating Road Networks in Planar Graph Representation

We propose RoadNetGAN, a road network generation method that extends NetGAN, a generative model that generates graphs similar to real-world networks by learning a similarity measure. Our contribution is twofold. First, we add displacement attributes to the random walks so that the model generates not only the sequence of nodes but also their spatial positions as intersections within the generated road network, which increases the diversity of generated road network patterns, including the shape of city blocks. Second, we make the generator and discriminator networks conditional. This allows the initial node of the random walks over a graph to be specified, which is especially important for interactive road network generation, as used mostly in urban planning applications. We demonstrate that the proposed method can generate road networks that mimic real road networks with the desired similarity.

Takashi Owaki, Takashi Machida
Routing Attention Shift Network for Image Classification and Segmentation

Deep neural networks, as fundamental tools of deep learning, have evolved remarkably across various tasks; however, computational complexity and resource costs increase rapidly with deeper networks, which makes deployment on resource-limited devices challenging. Recently, the shift operation has been considered an alternative to depthwise separable convolutions, using 60% fewer parameters than spatial convolutions. Its basic block is composed of shift operations and 1 × 1 convolutions on the intermediate feature maps. Previous work focuses on reducing the redundancy in the correlation between shift groups by making the shift a learnable parameter, which increases training time and computation. In this paper, we propose a “dynamic routing” strategy, termed the Routing Attention Shift Layer (RASL), that seeks the best movement for the shift operation based on an attention mechanism and measures the contribution of channels to the outputs without back propagation. Moreover, the proposed RASL generalizes well to many tasks. Experiments on both classification and semantic segmentation tasks demonstrate the superior performance of the proposed method.

Yuwei Yang, Yi Sun, Guiping Su, Shiwei Ye
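A shift block of the kind described in the RASL abstract above can be sketched with plain Python lists: a parameter-free spatial shift of each channel, followed by a 1 × 1 convolution that mixes channels at every pixel. The fixed direction set, the round-robin assignment of channels to directions and all function names are assumptions for illustration. RASL itself selects the shift through attention-based routing, which this sketch does not implement.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]

# Hypothetical fixed direction set (dy, dx): identity, up, down, left, right.
DIRECTIONS: List[Tuple[int, int]] = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def shift_channel(ch: List[List[float]], dy: int, dx: int) -> List[List[float]]:
    """Move a channel's content by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = len(ch), len(ch[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sr, sc = r - dy, c - dx  # source pixel that lands on (r, c)
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = ch[sr][sc]
    return out


def shift(x: FeatureMap) -> FeatureMap:
    """Parameter-free shift: channel i uses DIRECTIONS[i % len(DIRECTIONS)]."""
    return [shift_channel(ch, *DIRECTIONS[i % len(DIRECTIONS)]) for i, ch in enumerate(x)]


def conv1x1(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1 x 1 convolution: output channel o at (r, c) is sum_k weights[o][k] * x[k][r][c]."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wo[k] * x[k][r][c] for k in range(len(x))) for c in range(w)] for r in range(h)]
        for wo in weights
    ]


def shift_block(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Shift followed by 1 x 1 convolution; only the 1 x 1 weights are learnable."""
    return conv1x1(shift(x), weights)
```

The shift itself has no parameters, which is where the savings over spatial convolutions come from: all learnable weights sit in the 1 × 1 convolution.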
SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading

This paper presents a novel deep learning architecture for word-level lipreading. Previous work suggests the potential of incorporating a pretrained deep 3D convolutional neural network as a front-end feature extractor. We introduce SpotFast networks, a variant of the state-of-the-art SlowFast networks for action recognition, which use a temporal window as a spot pathway and all frames as a fast pathway. The spot pathway uses word boundary information, while the fast pathway implicitly models other contexts. Both pathways are fused with dual temporal convolutions, which speed up training. We further incorporate memory augmented lateral transformers to learn sequential features for classification. We evaluate the proposed model on the LRW dataset. The experiments show that our model outperforms various state-of-the-art models, and incorporating the memory augmented lateral transformers yields a 3.7% improvement over the SpotFast networks and a 16.1% improvement over fine-tuning the original SlowFast networks. The temporal window based on word boundaries improves performance by up to 12.1% by eliminating visual silences from coarticulations.

Peratham Wiriyathammabhum
Towards Online Handwriting Recognition System Based on Reinforcement Learning Theory

In this work, we formalize the problem of online handwriting recognition within reinforcement learning theory. The handwriting trajectory is divided into strokes, from which we extract structural and parametric features based on Freeman codes, visual codes and beta-elliptic features, respectively. The environments were trained using the tabular Q-learning algorithm to compute the optimal state-action values for each handwriting class. The proposed model was evaluated on the LMCA database and achieved very promising results for both the structural and parametric representations.

Ramzi Zouari, Houcine Boubaker, Monji Kherallah
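The tabular Q-learning update mentioned in the handwriting abstract above can be sketched as follows. The toy four-state chain environment, the epsilon-greedy exploration rate and the hyper-parameters are hypothetical stand-ins; in the paper's setting, states and actions would be derived from the stroke features, which the abstract does not specify.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Optional, Tuple

QTable = DefaultDict[Tuple[Hashable, str], float]


def q_learning_update(
    Q: QTable,
    state: Hashable,
    action: str,
    reward: float,
    next_state: Optional[Hashable],
    actions: List[str],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> None:
    """One tabular step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # A terminal transition (next_state is None) has no future value.
    best_next = 0.0 if next_state is None else max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])


def train_chain(episodes: int = 200, epsilon: float = 0.2, seed: int = 0) -> QTable:
    """Hypothetical toy task: states 0..3 on a line, state 3 is terminal with reward 1."""
    rng = random.Random(seed)
    actions = ["left", "right"]
    Q: QTable = defaultdict(float)
    for _ in range(episodes):
        s = 0
        while s != 3:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next = min(s + 1, 3) if a == "right" else max(s - 1, 0)
            reward = 1.0 if s_next == 3 else 0.0
            q_learning_update(Q, s, a, reward, None if s_next == 3 else s_next, actions)
            s = s_next
    return Q
```

After training, acting greedily with respect to `Q` (picking the action with the largest state-action value) recovers the optimal policy of the toy task, here always moving right.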
Training Lightweight yet Competent Network via Transferring Complementary Features

Although deep neural networks have achieved impressive performance in various image detection and classification tasks, their deployment across different scenarios and devices is often constrained by intensive computation and large storage requirements. This paper presents an innovative network that trains a lightweight yet competent student network by transferring multifarious knowledge and features from a large and powerful teacher network. Based on the observation that different vision tasks are often correlated and complementary, we first train a resourceful teacher network that captures both discriminative and generative features for image classification (the main task) and image reconstruction (an auxiliary task). A lightweight yet competent student network is then trained to mimic both the pixel-level and spatial-level feature distributions of the resourceful teacher network, under the guidance of a feature loss and an adversarial loss, respectively. The proposed technique has been extensively evaluated on a number of public datasets, and experiments show that our student network achieves superior image classification performance compared with state-of-the-art methods.

Xiaobing Zhang, Shijian Lu, Haigang Gong, Minghui Liu, Ming Liu
TSGYE: Two-Stage Grape Yield Estimation

Vision-based grape yield estimation provides a cost-effective solution for intelligent orchards. However, unstructured backgrounds, occlusion and dense berries make grape yield estimation challenging. We propose TSGYE, an efficient two-stage pipeline: precise detection of grape clusters followed by efficient counting of grape berries. First, grape clusters are detected with high precision using object detectors such as Mask R-CNN and YOLOv2/v3/v4. Second, berries within the detected clusters are counted using image processing techniques. Experimental results show that TSGYE with YOLOv4 achieves a 96.96% mAP@0.5 score on WGISD, better than state-of-the-art detectors. In addition, we manually annotated all test images of WGISD and made them public as a grape berry counting benchmark. Our work is a milestone in grape yield estimation for two reasons: we propose an efficient two-stage grape yield estimation pipeline, TSGYE, and we offer the first public test set for grape berry counting.

Geng Deng, Tianyu Geng, Chengxin He, Xinao Wang, Bangjun He, Lei Duan
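The mAP@0.5 figure in the grape-yield abstract above counts a detected cluster as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The sketch below shows the IoU computation and a simplified greedy matching of detections to ground truth, highest score first. The box format and function names are assumptions, the matching is simpler than some benchmark protocols, and computing the full average precision from the resulting precision-recall curve is omitted.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(
    dets: List[Tuple[Box, float]], gts: List[Box], thr: float = 0.5
) -> List[bool]:
    """Flag detections (sorted by descending score) as true or false positives.

    Each ground-truth box can be matched at most once, so duplicate
    detections of the same cluster count as false positives.
    """
    used = [False] * len(gts)
    flags: List[bool] = []
    for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not used[j]:
                overlap = iou(box, gt)
                if overlap > best:
                    best, best_j = overlap, j
        if best >= thr:
            used[best_j] = True
            flags.append(True)
        else:
            flags.append(False)
    return flags
```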
Unsupervised Reused Convolutional Network for Metal Artifact Reduction

Computed tomography (CT) is widely used for medical diagnosis and treatment. However, when patients carry metallic implants, CT images are often corrupted by undesirable artifacts, which degrade image quality and increase the possibility of false diagnosis and analysis. Recently, convolutional neural networks (CNNs) have been applied to metal artifact reduction (MAR) using synthesized paired images, which cannot simulate the imaging mechanism accurately enough. ADN, the first unsupervised model, works with unpaired images, but its architecture is complicated and its performance still falls short of existing supervised methods. To narrow the gap between unsupervised and supervised methods, this paper introduces a simpler multi-phase deep learning method that extracts features recurrently to generate both metal-artifact and artifact-free images. An Artifact Generative Network and an Image Generative Network are used jointly to remove metal artifacts. Extensive experiments show better performance than ADN on both synthesized and clinical data.

Binyu Zhao, Jinbao Li, Qianqian Ren, Yingli Zhong
Visual-Based Positioning and Pose Estimation

Recent advances in deep learning and computer vision offer an excellent opportunity to investigate high-level visual analysis tasks such as human localization and human pose estimation. Although the performance of human localization and human pose estimation has improved significantly in recent reports, it is not perfect, and erroneous position and pose estimates can be expected among video frames. Studies on integrating these techniques into a generic pipeline that is robust to such errors are still lacking, and this paper fills that gap. We explored and developed two working pipelines suited to visual-based positioning and pose estimation tasks, and analyzed them on a badminton game. We showed that the concept of tracking by detection works well, and that errors in position and pose can be effectively handled by linear interpolation of information from nearby frames. The results show that Visual-based Positioning and Pose Estimation can deliver position and pose estimates with good spatial and temporal resolution.

Somnuk Phon-Amnuaisuk, Ken T. Murata, La-Or Kovavisaruch, Tiong-Hoo Lim, Praphan Pavarangkoon, Takamichi Mizuhara
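The error-handling step in the badminton abstract above, filling in unreliable positions by linear interpolation from nearby frames, can be sketched as follows. It assumes that frames whose position estimate was judged erroneous (or missing) have already been marked as `None`; how such frames are detected is not covered by the abstract, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def interpolate_track(track: List[Optional[Point]]) -> List[Optional[Point]]:
    """Fill None entries lying between two known positions by linear interpolation."""
    known = [i for i, p in enumerate(track) if p is not None]
    out = list(track)
    # Walk over each pair of consecutive reliable frames and fill the gap between them.
    for a, b in zip(known, known[1:]):
        (xa, ya), (xb, yb) = track[a], track[b]
        for i in range(a + 1, b):
            t = (i - a) / (b - a)  # fraction of the way from frame a to frame b
            out[i] = (xa + t * (xb - xa), ya + t * (yb - ya))
    return out
```

Gaps at the start or end of the track are left unfilled, because there is no reliable frame on one side to interpolate from. The same routine could be applied per joint to pose keypoints.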
Voxel Classification Based Automatic Hip Cartilage Segmentation from Routine Clinical MR Images

Hip osteoarthritis (OA) is a common pathological condition in the elderly population, characterized mainly by cartilage degeneration. Accurate segmentation of cartilage tissue in MRIs facilitates quantitative investigation of disease progression. We propose an automated approach that segments the hip joint cartilage as a single unit from routine clinical MRIs using voxel-based classification. We extracted a rich feature set from the MRIs, consisting of normalized image intensity-based, local image structure-based and geometry-based features. We evaluated the proposed method on routine clinical hip MR images of asymptomatic elderly subjects and diagnosed OA patients. MR images from both cohorts show full or partial loss of cartilage thickness due to aging or hip OA progression. The proposed algorithm shows good accuracy compared with manual segmentations, with a mean DSC of 0.74, even with a high prevalence of cartilage defects in the MRI dataset.

Najini Harischandra, Anuja Dharmaratne, Flavia M. Cicuttini, YuanYuan Wang
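The DSC reported in the hip cartilage abstract above is the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), between the automatic and manual segmentations. A minimal sketch over flattened binary voxel masks follows; returning 1.0 when both masks are empty is an assumed convention, since implementations differ on that edge case.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) for flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Both masks empty: treat as perfect agreement (a convention, not a universal rule).
    return 2.0 * intersection / total if total > 0 else 1.0
```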

Natural Language Processing

Frontmatter
Active Learning Based Relation Classification for Knowledge Graph Construction from Conversation Data

The creation of a knowledge graph (KG) from text and its use in solving several natural language processing (NLP) problems are emerging research areas. Creating a KG from text is a challenging problem that requires several NLP modules working together in unison. The task becomes even more challenging when constructing a KG from conversational data, as facts stated by the user and agent in conversations are often not grounded and can change across dialogue turns. In this paper, we explore KG construction from conversation data in the travel and taxi booking domains. We use a fixed ontology for each conversation domain and extract relation triples from the conversations. Using active learning, we build a state-of-the-art BERT-based relation classifier that uses minimal data yet accurately classifies the extracted relation triples. We further design heuristics for KG construction that use the BERT-based relation classifier and Semantic Role Labelling (SRL) to handle negations in the extracted relation triples. Our experiments show that, using our active-learning-trained classifier and heuristic-based method, KGs can be built with good correctness and completeness scores for domain-specific conversational datasets. To the best of our knowledge, this is the first attempt at creating a KG from conversational data that could be efficiently integrated into a dialogue agent to tackle data sparseness and improve the quality of generated responses.

Zishan Ahmad, Asif Ekbal, Shubhashis Sengupta, Anutosh Mitra, Roshni Rammani, Pushpak Bhattacharyya
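The knowledge-graph abstract above does not state which query strategy drives its active learning loop. One common choice is margin sampling: after each training round, the classifier scores the unlabeled relation triples, and the examples whose two most likely classes are closest in probability are sent for annotation. The sketch below assumes that strategy and a hypothetical list of per-class probabilities; it is not the authors' procedure.

```python
from typing import List, Sequence


def margin_sampling(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k unlabeled examples with the smallest top-2 probability margin."""

    def margin(p: Sequence[float]) -> float:
        top = sorted(p, reverse=True)
        # A small gap between the two most likely classes signals an uncertain prediction.
        return top[0] - top[1] if len(top) > 1 else top[0]

    order = sorted(range(len(probs)), key=lambda i: margin(probs[i]))
    return order[:k]
```

In a full loop, the selected triples would be labelled, added to the training set, and the classifier retrained before the next query round.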
Adversarial Shared-Private Attention Network for Joint Slot Filling and Intent Detection

Spoken language understanding plays an important role in dialogue systems, in which intent detection and slot filling are used to extract semantic components. Previous work on spoken language understanding has investigated approaches ranging from traditional pipeline methods to joint models. These methods usually extract features from a single dataset and cannot jointly optimize over datasets with different distributions. In this paper, we propose a new adversarial shared-private attention network that learns features from two different datasets with shared and private spaces. The adversarial network trains the shared attention network so that the shared distributions of the two datasets are close, thereby reducing the redundancy of the shared features, which helps alleviate interference from the private and shared spaces. A joint training strategy between intent detection and slot filling is also applied to strengthen the relationship between the tasks. Experimental results on the public benchmark corpora ATIS, Snips and MIT show that our proposed models significantly outperform other methods in intent accuracy, slot F1 and sentence accuracy.

Mengfei Wu, Longbiao Wang, Yuke Si, Jianwu Dang
Automatic Classification and Comparison of Words by Difficulty

Vocabulary knowledge is essential for both native and foreign language learning. Classifying words by difficulty helps students progress at different stages of study and gives teachers a standard to adhere to when preparing tutorials. However, classifying word difficulty is time-consuming and labor-intensive. In this paper, we propose to classify and compare word difficulty by analyzing multi-faceted features, including intra-word, syntactic and semantic features. The results show that our method is robust across different language environments.

Shengyao Zhang, Qi Jia, Libin Shen, Yinggong Zhao
Dual-Learning-Based Neural Machine Translation Using Undirected Sequence Model for Document Translation

Document-level machine translation remains challenging owing to the high time complexity of existing models. In this paper, we propose a dual-learning-based neural machine translation (NMT) model that uses an undirected neural sequence model for document-level translation. The dual-learning mechanism enables an NMT system to learn automatically from corpora through a reinforcement learning process. Undirected neural sequence models such as Bidirectional Encoder Representations from Transformers (BERT) have achieved success on several natural language processing (NLP) tasks. Inspired by a BERT-like machine translation model, we employ a constant-time decoding strategy in our model. In addition, we use a two-step training strategy. The experimental results show that our approach decodes much faster than a previous document-level NMT model on several document-level translation tasks, with an acceptable loss in translation quality.

Lei Zhang, Jianhua Xu
From Shortsighted to Bird View: Jointly Capturing All Aspects for Question-Answering Style Aspect-Based Sentiment Analysis

Aspect-based sentiment analysis (ABSA) aims to identify the opinion polarity towards a specific aspect. Traditional approaches formulate ABSA as a sentence classification task. However, the single-sentence classification paradigm has been observed not to take full advantage of pre-trained language models. Previous work suggests it is better to cast ABSA as a question answering (QA) task for each aspect, which can be solved in the sentence-pair classification paradigm. Though QA-style ABSA achieves state-of-the-art (SOTA) results, it naturally separates the prediction of multiple aspects belonging to the same sentence and is thus unable to take full advantage of the correlation between different aspects. In this paper, we propose to replace the original question in QA-style ABSA with a global-perspective (GP) question, which explicitly informs the model of other relevant aspects through additional instructions. In this way, the model can better distinguish the phrases relevant to each aspect and exploit the underlying relationship between different aspects. The experimental results on three benchmark ABSA datasets demonstrate the effectiveness of our method.

Liang Zhao, Bingfeng Luo, Zuo Bai, Xi Yin, Kunfeng Lai, Jianping Shen
Hierarchical Sentiment Estimation Model for Potential Topics of Individual Tweets

Twitter has gradually become a valuable source of people’s opinions and sentiments. Although tremendous progress has been made in sentiment analysis, mainstream methods rarely leverage user information. Moreover, most methods rely heavily on sentiment lexicons in tweets, ignoring non-sentiment words that carry rich topic information. This paper aims to predict individuals’ sentiment towards potential topics on a two-point scale: positive or negative. The analysis is based on their past tweets, with the goal of precise topic recommendation. We propose a hierarchical model of individuals’ tweets (HMIT) to explore the relationship between individual sentiments and different topics. HMIT extracts token representations from a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. It then incorporates topic information into context-aware token representations through a topic-level attention mechanism. A convolutional neural network (CNN) serves as the final binary classifier. Unlike conventional Twitter sentiment classification, HMIT extracts topic phrases through Single-Pass and feeds tweets without sentiment words into the whole model. We build six user models from one benchmark dataset and our own collected datasets. Experimental results demonstrate the superior performance of the proposed method against multiple baselines on both classification and quantification tasks.

Qian Ji, Yilin Dai, Yinghua Ma, Gongshen Liu, Quanhai Zhang, Xiang Lin
Key Factors of Email Subject Generation

Automatic email subject generation is of great significance to both recipients and email systems. Methods that use deep neural networks to automatically generate email subject lines have been proposed recently. We experimentally explored the impact of multiple elements on performance in this task. These results provide guidance for future research on the task. As far as we know, this is the first work to study and analyze the effects of these elements.

Mingfeng Xue, Hang Zhang, Jiancheng Lv
Learning Interactions at Multiple Levels for Abstractive Multi-document Summarization

The biggest obstacles facing multi-document summarization are much more complicated input and excessive redundancy in the source content. Most state-of-the-art systems have attempted to tackle redundancy while treating the entire input as a flat sequence, often neglecting correlations among documents. In this paper, we propose MLT, an end-to-end summarization model that effectively learns interactions at multiple levels and avoids redundant information. Specifically, we use a word-level transformer layer to encode contextual information within each sentence, a sentence-level transformer layer to learn relations between sentences within a single document, and a document-level layer to learn interactions among input documents. Moreover, we use a neural method to enhance Maximal Marginal Relevance (MMR), a powerful algorithm for redundancy reduction; we incorporate MMR into our model and measure redundancy quantitatively based on the sentence representations. On benchmark datasets, our system compares favorably with strong summarization baselines under both automatic metrics and human evaluation.

Yiding Liu, Xiaoning Fan, Jie Zhou, Gongshen Liu
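For reference, classic Maximal Marginal Relevance, as named in the MLT abstract above, greedily picks the sentence that maximizes lam * sim(s, q) - (1 - lam) * max over already-selected s' of sim(s, s'). The sketch below implements that classic form with cosine similarity over given sentence vectors. MLT's neural enhancement of MMR is not reproduced, and the vectors, the query (for example a document centroid) and the value of lam are assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def mmr_select(
    sentences: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
    lam: float = 0.7,
) -> List[int]:
    """Greedily select up to k sentence indices, trading relevance against redundancy."""
    selected: List[int] = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine(sentences[i], query)
            # Redundancy is the similarity to the closest already-selected sentence.
            redundancy = max((cosine(sentences[i], sentences[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)  # ties go to the earliest index
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two identical sentences and one different sentence of equal relevance, plain relevance ranking would pick the duplicate pair, whereas MMR picks one of the duplicates and then the different sentence.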
Multitask Learning Based on Constrained Hierarchical Attention Network for Multi-aspect Sentiment Classification

Aspect-level sentiment classification (ALSC) aims to determine the sentiment polarity of each given aspect in a text. A user-generated review usually contains several aspects, each with a different sentiment, but most existing approaches identify only one aspect-specific sentiment polarity. Moreover, prior work using attention mechanisms introduces inherent noise that reduces performance. We therefore propose Multitask Learning based on Constrained HiErarchical ATtention network (ML-CHEAT), a simple but effective method that uses a regularization unit to constrain the attention weights of each aspect. In addition, ML-CHEAT uses a hierarchical attention network to learn the latent relationship between aspect features and sentiment features. Furthermore, we extend our approach to multitask learning to optimize parameter updates during backpropagation and improve the model's performance. Experimental results on SemEval competition datasets demonstrate the effectiveness and reliability of our approach.

Yang Gao, Jianxun Liu, Pei Li, Dong Zhou, Peng Yuan
Neural Machine Translation with Soft Reordering Knowledge

The Transformer architecture has been widely used in sequence-to-sequence tasks since it was proposed. However, it only adds representations of absolute positions to its inputs to make use of the order information of the sequence, and it lacks explicit structures for exploiting the reordering knowledge of words. In this paper, we propose a simple but effective method to incorporate reordering knowledge into the Transformer translation system. The reordering knowledge of each word is obtained by an additional reordering-aware attention sublayer based on its semantic and contextual information. The proposed approach can be easily integrated into the existing Transformer framework. Experimental results on two public translation tasks demonstrate that our method achieves significant translation improvements over the basic Transformer model and also outperforms existing competitive systems.

Leiying Zhou, Jie Zhou, Wenjie Lu, Kui Meng, Gongshen Liu
Open Event Trigger Recognition Using Distant Supervision with Hierarchical Self-attentive Neural Network

Event trigger recognition plays a crucial role in open-domain event extraction. Prior work is restricted to specific domains and constrained event types; to enable robust open event trigger recognition across domains, we propose a novel distantly supervised, domain-independent framework for event trigger extraction. This framework consists of three components: a trigger synonym generator, a synonym set scorer and an open trigger classifier. Given specific knowledge bases, the trigger synonym generator produces high-quality synonym sets to train the remaining components. We employ distant supervision to produce event trigger instances and then organize them into fine-grained synonym sets. Inspired by recent deep metric learning, we also propose a novel neural method, the hierarchical self-attentive neural network (HiSNN), to score the quality of the generated synonym sets. Experimental results on three datasets (including two cross-domain datasets) demonstrate the superiority of our approach over state-of-the-art approaches.

Xinmiao Pei, Hao Wang, Xiangfeng Luo, Jianqi Gao
Reinforcement Learning Based Personalized Neural Dialogue Generation

In this paper, we present a persona-aware neural reinforcement learning response generation framework capable of optimizing long-term rewards carefully devised by system developers. The proposed model extends the recently introduced Hierarchical Recurrent Encoder-Decoder (HRED) architecture. We leverage insights from Reinforcement Learning (RL) and employ policy gradient methods to optimize rewards defined as simple heuristic approximations of what a human would consider a good conversation. The proposed model is evaluated on two benchmark datasets. Empirical results indicate that the proposed approach outperforms counterparts that do not optimize long-term rewards, that have no access to personas, or that are standard models trained solely with a maximum-likelihood estimation objective.

Tulika Saha, Saraansh Chopra, Sriparna Saha, Pushpak Bhattacharyya
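The framework optimizes heuristic long-term rewards with policy gradient methods. A minimal REINFORCE-style surrogate loss for one sampled response might look as follows, assuming per-step log-probabilities, per-step rewards, discounted returns, and a constant baseline; the function name and this reward layout are illustrative assumptions rather than the authors' formulation.

```python
def reinforce_loss(log_probs, rewards, gamma=1.0, baseline=0.0):
    """REINFORCE surrogate loss for one sampled response.

    log_probs: log pi(a_t | s_t) for each generated step.
    rewards:   per-step rewards (e.g. heuristic conversation scores).
    Minimizing this loss follows the policy gradient
    E[(G_t - b) * grad log pi(a_t | s_t)].
    """
    # Discounted return-to-go: G_t = r_t + gamma * G_{t+1}.
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    # The baseline b reduces gradient variance without biasing it.
    return -sum((G - baseline) * lp for G, lp in zip(returns, log_probs))


print(reinforce_loss([-1.0, -2.0], [0.0, 1.0], gamma=0.5))  # 2.5
```

In a real model the log-probabilities would be differentiable tensors from an autodiff framework; plain floats are used here only to show the arithmetic.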
Sparse Lifting of Dense Vectors: A Unified Approach to Word and Sentence Representations

As the first step in automated natural language processing, representing words and sentences is of central importance and has attracted significant research attention. Despite the successful results achieved by recent distributional dense and sparse vector representations, such vectors face nontrivial challenges in both memory and computational requirements in practical applications. In this paper, we design a novel representation model that projects dense vectors into a higher-dimensional space and favors highly sparse, binary representations, while preserving the pairwise inner products between the original vectors as much as possible. Our model can be relaxed to a symmetric non-negative matrix factorization problem, which admits a fast yet effective solution. In a series of empirical evaluations, the proposed model showed consistent improvements in both accuracy and running speed in downstream applications and exhibited high potential for practical use.

Senyue Hao, Wenye Li
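The model lifts dense vectors into a higher-dimensional sparse binary space while trying to preserve inner products, and the authors solve a relaxed symmetric NMF problem. As a simpler stand-in to convey the idea of sparse lifting, the sketch below uses a different, swapped-in technique: a sparse random expansion followed by winner-take-all binarization (keeping the k largest coordinates). It is not the paper's SymNMF-based solver, and all names and parameters (`m`, `k`, `density`) are hypothetical.

```python
import random


def make_projection(d, m, density=0.1, seed=0):
    """Sparse binary random expansion matrix with m rows and d columns (m > d)."""
    rng = random.Random(seed)
    return [[1 if rng.random() < density else 0 for _ in range(d)] for _ in range(m)]


def sparse_lift(x, projection, k):
    """Lift dense vector x to a k-hot binary vector of dimension m = len(projection)."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in projection]
    # Winner-take-all: keep only the k largest projected coordinates.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    code = [0] * len(scores)
    for i in top:
        code[i] = 1
    return code


def overlap(a, b):
    """Inner product of two binary codes = number of shared active units."""
    return sum(ai * bi for ai, bi in zip(a, b))


P = make_projection(d=4, m=32, density=0.3, seed=0)
code = sparse_lift([0.2, -1.0, 0.7, 0.5], P, k=4)
```

Similar dense inputs tend to activate overlapping units, so `overlap` acts as a cheap proxy for the original inner product; unlike the paper's optimization-based model, this random scheme does not explicitly fit the pairwise inner products.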
Word-Level Error Correction in Non-autoregressive Neural Machine Translation

Non-autoregressive neural machine translation (NAT) enables not only rapid training but also fast decoding. However, parallel decoding comes at the expense of translation quality: to gain speed, NAT discards the dependence on target-side context, which causes the model to lose its ability to perceive contextual positions in the translation. In this paper, we improve the model by adding capsule network layers that extract positional information more effectively and comprehensively, relying on vector neurons to compensate for the limitations of traditional scalar neurons in storing the position information of a single segment. In addition, word-level error correction is applied to the output of the NAT model to refine the generated translation. Experiments show that our model outperforms previous models, achieving a BLEU score of 26.12 on the WMT2014 En-De task and 31.93 on WMT16 Ro-En, while decoding more than six times faster than the autoregressive model.

Ziyue Guo, Hongxu Hou, Nier Wu, Shuo Sun
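The capsule layers rely on vector neurons whose length encodes confidence. Assuming the standard squash nonlinearity from dynamic-routing capsule networks (the abstract does not say which capsule variant or routing scheme is used), a vector neuron's output could be computed as follows; `squash` is an illustrative name.

```python
import math


def squash(s, eps=1e-9):
    """Capsule squash: v = (|s|^2 / (1 + |s|^2)) * s / |s|.

    Keeps the direction of s and maps its length into [0, 1).
    """
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)  # eps avoids division by zero
    return [scale * x for x in s]


v = squash([3.0, 4.0])
print(round(math.sqrt(sum(c * c for c in v)), 4))  # 0.9615, i.e. 25 / 26
```

Unlike a scalar activation, the output keeps a direction, which is what lets a capsule store richer positional information than a single number.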

Recommender Systems

ANN-Assisted Multi-cloud Scheduling Recommender

Cloud computing has been widely adopted, in the form of public and private clouds, for its many benefits, such as availability and cost-efficiency. In this paper, we address the problem of scheduling jobs across multiple clouds, including a private cloud, to optimize cost efficiency while explicitly taking data privacy into account. In particular, the problem in this study involves several factors, such as job data privacy, the varying electricity prices of the private cloud, and the different billing policies/cycles of public clouds, which most, if not all, existing scheduling algorithms do not consider ‘collectively’. Hence, we design an ANN-assisted Multi-Cloud Scheduling Recommender (MCSR) framework that consists of a novel scheduling algorithm and an ANN-based recommender. While the scheduling algorithm can schedule jobs on its own, its output schedules are also used as training data for the recommender. Experiments using both real-world Facebook workload data and larger-scale synthetic data demonstrate that our ANN-based recommender schedules jobs cost-efficiently while respecting privacy.

Amirmohammad Pasdar, Tahereh Hassanzadeh, Young Choon Lee, Bernard Mans
Correlation-Aware Next Basket Recommendation Using Graph Attention Networks

With the increasing number of commodities in daily life, recommender systems play an increasingly important role in selecting items of interest to users. For the next basket recommendation task, we propose in this work the first end-to-end correlation-aware model that predicts the next basket by considering intra-basket correlations using graph attention networks. Specifically, items and the correlations between them are viewed as the nodes and edges of a graph, respectively. By estimating and aggregating intra-basket correlations with the attention layer of the self-attention model, the recommendation can be made at the basket level instead of the item level. We conduct comprehensive experiments on a real-world retail dataset to show the improvement of our proposed method over state-of-the-art baselines.

Yuanzhe Zhang, Ling Luo, Jianjia Zhang, Qiang Lu, Yang Wang, Zhiyong Wang
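The model treats items in a basket as graph nodes and lets attention weights act as learned edge strengths. A minimal sketch, assuming a fully connected basket graph, scaled dot-product self-attention with no learned projection matrices, and mean pooling into a basket vector (all simplifications the abstract does not state), is shown below; the function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(items):
    """Scaled dot-product self-attention over one basket (Q = K = V = items).

    Row i of the attention weights plays the role of soft edge strengths
    from item i to every item in the basket graph.
    """
    d = len(items[0])
    out = []
    for q in items:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in items])
        out.append([sum(w * v[j] for w, v in zip(weights, items)) for j in range(d)])
    return out


def basket_embedding(items):
    """Mean-pool correlation-aware item vectors into a single basket vector."""
    ctx = self_attention(items)
    return [sum(col) / len(ctx) for col in zip(*ctx)]


print(basket_embedding([[1.0, 0.0], [0.0, 1.0]]))  # approximately [0.5, 0.5]
```

A trained model would add learned query, key, and value projections so that the attention scores reflect purchase correlations rather than raw embedding similarity.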
Deep Matrix Factorization on Graphs: Application to Collaborative Filtering

This work addresses the problem of completing a partially filled matrix while incorporating metadata associated with its rows and columns. The basic matrix completion operation is modeled via deep matrix factorization, and the metadata associations are modeled as graphs. The problem is formally modeled as deep matrix factorization regularized by multiple graph Laplacians. Collaborative filtering is an ideal practical candidate for the proposed solution: it requires predicting missing ratings between users and items given user demographic data and item metadata. We show that the proposed solution improves over the state of the art in collaborative filtering.

Aanchal Mongia, Vidit Jain, Angshul Majumdar
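Graph Laplacian regularization typically penalizes connected rows (for example, demographically similar users) for having dissimilar latent factors through the term tr(U^T L U), with L = D - A. Assuming an unnormalized Laplacian and a small hypothetical similarity graph A (the paper may use a different Laplacian or weighting), the sketch checks the identity tr(U^T L U) = 1/2 * sum_ij A_ij * ||u_i - u_j||^2.

```python
def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]


def trace_form(U, L):
    """tr(U^T L U) = sum_ij L_ij <u_i, u_j>, where u_i is row i of U."""
    n = len(U)
    return sum(L[i][j] * sum(a * b for a, b in zip(U[i], U[j])) for i in range(n) for j in range(n))


def pairwise_penalty(U, adj):
    """1/2 * sum_ij A_ij * ||u_i - u_j||^2: large when linked rows have distant factors."""
    n = len(U)
    return 0.5 * sum(
        adj[i][j] * sum((a - b) ** 2 for a, b in zip(U[i], U[j]))
        for i in range(n) for j in range(n)
    )


A = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]     # hypothetical user-user similarity graph
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # latent factors, one row per user
print(trace_form(U, laplacian(A)), pairwise_penalty(U, A))  # 4.0 4.0
```

In a regularized objective, a term like this would be added (with a weight) for each metadata graph, one for user rows and one for item columns.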
Improving Social Recommendations with Item Relationships

Social recommendation has developed rapidly as a means of improving recommender system performance, owing to the growing influence of social networks. However, existing social recommendation methods often neglect substitutable and complementary relationships between items, which could help in understanding items and enhancing recommender systems. We propose a novel graph neural network framework to model the multi-graph data (user-item graph, user-user graph, item-item graph) in social recommendation. In particular, we introduce a viewpoint mechanism to model the relationship between users and items. We conduct extensive experiments on two public benchmarks, demonstrating significant improvements over several state-of-the-art models.

Haifeng Liu, Hongfei Lin, Bo Xu, Liang Yang, Yuan Lin, Yonghe Chu, Wenqi Fan, Nan Zhao
MCRN: A New Content-Based Music Classification and Recommendation Network

Music classification and recommendation have received widespread attention in recent years. However, content-based deep music classification approaches are still very rare. Meanwhile, existing music recommendation systems generally rely on collaborative filtering, which suffers from a serious cold-start problem. In this paper, we propose a simple yet effective convolutional neural network named MCRN (short for music classification and recommendation network) for learning the audio content features of music and facilitating music classification and recommendation. Concretely, to extract the content features of music, the audio is converted into spectrograms by the Fourier transform, and MCRN effectively extracts music content features from these spectrograms. Experimental results show that MCRN outperforms the compared models on music classification and recommendation tasks, demonstrating its superiority over previous approaches.

Yuxu Mao, Guoqiang Zhong, Haizhen Wang, Kaizhu Huang
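MCRN's input is a spectrogram obtained with the Fourier transform. A minimal short-time Fourier transform in pure Python is sketched below, using a naive O(N^2) DFT per frame and a periodic Hann window; the frame size, hop length, and window are assumptions the abstract does not specify, and a practical pipeline would use an FFT library and likely a log or mel scaling.

```python
import cmath
import math


def stft_magnitude(signal, frame_size, hop):
    """Magnitude spectrogram: one list of |X[k]| (k = 0..frame_size//2) per frame."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, keeping only the non-negative frequency bins.
        spectrum = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_size) for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


sr = 64  # sample rate in Hz (toy value)
tone = [math.sin(2 * math.pi * 10 * n / sr) for n in range(128)]  # 10 Hz sine
spec = stft_magnitude(tone, frame_size=32, hop=16)
# Bin spacing is sr / frame_size = 2 Hz, so the 10 Hz tone peaks in bin 5.
print(len(spec), max(range(len(spec[0])), key=spec[0].__getitem__))  # 7 5
```

The resulting frames-by-frequency grid is the kind of 2-D input a convolutional network such as MCRN can treat like an image.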
Order-Aware Embedding Non-sampling Factorization Machines for Context-Aware Recommendation

Factorization machines (FM) can exploit second-order feature interactions. Some researchers combine FM with deep learning to capture high-order interactions; however, these models rely on negative sampling. ENSFM adopts a non-sampling strategy and achieves good results, but it does not consider high-order interactions. In this paper, we add high-order interactions to ENSFM. We also introduce a technique called Order-aware Embedding. Experimental results show the effectiveness of our model.

Qingzhi Hou, Yifeng Chen, Mei Yu, Ruiguo Yu, Jian Yu, Mankun Zhao, Tianyi Xu, Xuewei Li
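Factorization machines model every second-order interaction x_i * x_j through the inner product of latent vectors v_i and v_j. The well-known reformulation from the original FM paper computes this sum in O(kn) time instead of O(kn^2); the sketch below checks it against the naive pairwise sum. It covers only the plain FM term, not ENSFM's non-sampling training or the proposed order-aware embedding, and the function names are hypothetical.

```python
def fm_pairwise_naive(x, V):
    """Sum over i < j of <v_i, v_j> * x_i * x_j (O(k n^2))."""
    n = len(x)
    return sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i in range(n) for j in range(i + 1, n)
    )


def fm_pairwise_fast(x, V):
    """Same quantity in O(k n): 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    total = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s2
    return 0.5 * total


def fm_predict(x, w0, w, V):
    """Second-order FM: bias + linear terms + factorized pairwise interactions."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x)) + fm_pairwise_fast(x, V)


x = [1.0, 0.0, 2.0]                        # feature vector
V = [[0.5, 1.0], [2.0, -1.0], [1.0, 0.5]]  # k = 2 latent factors per feature
print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))  # 2.0 2.0
```

Higher-order extensions, such as the one proposed here, build further terms on top of this second-order core.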

The 13th International Workshop on Artificial Intelligence and Cybersecurity

A Deep Learning Model for Early Prediction of Sepsis from Intensive Care Unit Records

Early and accurate prediction of sepsis could help physicians provide proper treatment and improve patient outcomes. We present a deep learning framework built on a bidirectional long short-term memory (BiLSTM) network to identify septic patients in intensive care unit (ICU) settings. A fixed-value data padding method serves as an indicator that preserves the missingness patterns of the ICU records, and the devised masking mechanism allows the BiLSTM model to learn informative missingness from time series data with missing values. The developed method can better handle two challenging problems: variation in data length and missing information. Quantitative results demonstrate that our method outperforms other state-of-the-art algorithms in predicting the onset of sepsis before clinical recognition. This suggests that deep learning based methods could assist physicians in the early diagnosis of sepsis in real clinical applications.

Rui Zhao, Tao Wan, Deyu Li, Zhengbo Zhang, Zengchang Qin
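The abstract describes fixed-value padding plus a mask so that the BiLSTM can learn from informative missingness and varying sequence lengths. A minimal preprocessing sketch follows, assuming missing measurements are marked `None` and a sentinel of -1.0 (the actual padding value is not given in the abstract); `pad_and_mask` is a hypothetical helper that returns the padded series and a per-feature observation mask.

```python
PAD_VALUE = -1.0  # hypothetical sentinel; the abstract does not state the value used


def pad_and_mask(records, max_len, num_features, pad_value=PAD_VALUE):
    """Pad/truncate variable-length ICU time series and mark observed entries.

    records: one list of time steps per patient; each time step is a list of
             num_features measurements, with None for a missing measurement.
    Returns (padded, masks): padded values use pad_value for both missing
    measurements and padded time steps; masks hold 1 for observed, 0 otherwise.
    """
    padded, masks = [], []
    for seq in records:
        seq = seq[:max_len]
        p_seq = [[pad_value if v is None else v for v in step] for step in seq]
        m_seq = [[0 if v is None else 1 for v in step] for step in seq]
        for _ in range(max_len - len(seq)):
            p_seq.append([pad_value] * num_features)
            m_seq.append([0] * num_features)
        padded.append(p_seq)
        masks.append(m_seq)
    return padded, masks


records = [
    [[80.0, None], [82.0, 37.1]],  # patient 1: heart rate, temperature
    [[95.0, 36.8]],                # patient 2: a shorter stay
]
padded, masks = pad_and_mask(records, max_len=3, num_features=2)
print(masks[0])  # [[1, 0], [1, 1], [0, 0]]
```

Feeding the mask alongside the values (or using a masking layer keyed on the sentinel) is one plausible way for a BiLSTM to distinguish padded steps from genuinely observed readings.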
AdversarialQR Revisited: Improving the Adversarial Efficacy

Deep learning and convolutional neural networks are currently among the fastest-rising trends as tools for a multitude of tasks such as image classification and computer vision. However, vulnerabilities in such networks can be exploited through input modification, with negative consequences for their users. This research aims to demonstrate an adversarial attack method that hides its attack from human intuition in the form of a QR code, an entity likely to conceal the attack from human notice owing to its current widespread use. A methodology was developed to demonstrate the QR-embedded adversarial patch creation process and to attack existing CNN image classification models. Experiments were also performed to investigate trade-offs among different patch shapes and to find the patch's optimal color adjustment, improving scannability while retaining acceptable adversarial efficacy.

Aran Chindaudom, Pongpeera Sukasem, Poomdharm Benjasirimonkol, Karin Sumonkayothin, Prarinya Siritanawan, Kazunori Kotani
Hybrid Loss for Improving Classification Performance with Unbalanced Data

Unbalanced data is widespread in practice and presents challenges that have been widely studied in classical machine learning. A classification algorithm trained on unbalanced data is likely to be biased towards the majority class and thus perform poorly on the minority class. To improve the performance of deep neural network (DNN) models on poorly balanced data, we hybridize two well-performing loss functions specially designed for learning from imbalanced data: mean false error and focal loss. Mean false error can effectively balance between majority and minority classes, while focal loss can reduce the contribution of unnecessary samples, which usually come from the majority class and may bias a DNN model towards that class during learning. We show that hybridizing the two losses can improve the classification performance of the model. Our hybrid loss function was tested on unbalanced datasets extracted from the CIFAR-100 and IMDB review datasets and, overall, performed better than either mean false error or focal loss alone.

Thanawat Lodkaew, Kitsuchart Pasupa
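The two component losses can be sketched for the binary case as follows: focal loss down-weights well-classified examples through the factor (1 - p_t)^gamma, and mean false error averages the error separately over each class before summing, so the minority class is not swamped. The abstract does not say how the two are hybridized, so `hybrid_loss` below uses a hypothetical convex combination with weight `lam`; the binary setting and the default `gamma` and `alpha` values are also assumptions (the paper's CIFAR-100 experiments are multi-class).

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted probabilities of the positive class; y: 0/1 labels.
    """
    total = 0.0
    for pi, yi in zip(p, y):
        p_t = pi if yi == 1 else 1.0 - pi
        a_t = alpha if yi == 1 else 1.0 - alpha
        total += -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(p)


def mean_false_error(p, y):
    """MFE = mean error over negatives (FPE) + mean error over positives (FNE)."""
    neg = [0.5 * (pi - yi) ** 2 for pi, yi in zip(p, y) if yi == 0]
    pos = [0.5 * (pi - yi) ** 2 for pi, yi in zip(p, y) if yi == 1]
    fpe = sum(neg) / len(neg) if neg else 0.0
    fne = sum(pos) / len(pos) if pos else 0.0
    return fpe + fne


def hybrid_loss(p, y, lam=0.5, gamma=2.0, alpha=0.25):
    """Hypothetical convex combination of MFE and focal loss (weighting is an assumption)."""
    return lam * mean_false_error(p, y) + (1.0 - lam) * focal_loss(p, y, gamma, alpha)


p = [0.9, 0.2, 0.1]  # predicted P(positive)
y = [1, 0, 0]        # one minority positive, two majority negatives
print(round(mean_false_error(p, y), 4))  # 0.0175
```

With `gamma=0` and `alpha=0.5`, the focal term reduces to half the ordinary binary cross-entropy, which is a quick sanity check on the implementation.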
Multi-scale Attention Consistency for Multi-label Image Classification

Humans have demonstrated cognitive consistency over image transformations such as flipping and scaling. Seeking to learn from this consistency of human visual perception, researchers have found that a convolutional neural network's discriminative capacity can be further improved by forcing the network to concentrate on certain areas of the image in accordance with natural human visual perception. Attention heatmaps, as a supplementary tool to reveal the essential regions the network chooses to focus on, have been developed and widely adopted in CNNs. Based on this regime of visual consistency, we propose a novel end-to-end trainable CNN architecture with multi-scale attention consistency. Specifically, our model takes an original image and its flipped counterpart as inputs and sends them into a single standard ResNet with additional attention-enhanced modules to generate semantically strong attention heatmaps. We also compute the distance between the multi-scale attention heatmaps of these two images and use it as an additional loss to help the network achieve better performance. Our network shows superiority on the multi-label classification task and attains compelling results on the WIDER Attribute Dataset.

Haotian Xu, Xiaobo Jin, Qiufeng Wang, Kaizhu Huang
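The consistency loss compares the attention heatmaps of an image and its flipped counterpart. Assuming a horizontal flip, mean squared error as the distance, and heatmaps already available at several scales (the abstract specifies none of these), the idea is to flip the flipped image's heatmap back before comparing it with the original; all names below are hypothetical, and `downsample` merely simulates coarser scales.

```python
def hflip(heatmap):
    """Mirror a 2-D heatmap left-to-right."""
    return [row[::-1] for row in heatmap]


def downsample(heatmap, factor):
    """Average-pool by an integer factor (used here only to simulate a coarser scale)."""
    h, w = len(heatmap), len(heatmap[0])
    return [
        [
            sum(heatmap[i + di][j + dj] for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(0, w - factor + 1, factor)
        ]
        for i in range(0, h - factor + 1, factor)
    ]


def consistency_loss(orig_scales, flipped_scales):
    """Sum over scales of the MSE between each original heatmap and the
    flipped image's heatmap mapped back to the original orientation."""
    total = 0.0
    for a, b in zip(orig_scales, flipped_scales):
        b_back = hflip(b)
        n = len(a) * len(a[0])
        total += sum((x - z) ** 2 for ra, rb in zip(a, b_back) for x, z in zip(ra, rb)) / n
    return total


a = [[0.0, 0.25, 0.5, 0.75],
     [1.0, 0.5, 0.25, 0.0],
     [0.5, 0.5, 0.0, 1.0],
     [0.25, 0.75, 1.0, 0.5]]
# A perfectly flip-consistent network would produce hflip(a) for the flipped image.
print(consistency_loss([a, downsample(a, 2)], [hflip(a), downsample(hflip(a), 2)]))  # 0.0
```

Any deviation from flip-equivariance yields a positive loss, which training then pushes down alongside the classification objective.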
Quantile Regression Hindsight Experience Replay

Efficient learning in environments with sparse rewards is one of the most important challenges in Deep Reinforcement Learning (DRL). In continuous DRL environments such as robotic manipulation tasks, multi-goal RL with the accompanying algorithm Hindsight Experience Replay (HER) has been shown to be an effective solution. However, HER and its variants typically suffer from a major challenge: agents may perform well on some goals but poorly on others. The main reason for this phenomenon is what recent DRL works call intrinsic stochasticity. In multi-goal RL, intrinsic stochasticity arises because different initial goals of the environment induce different value distributions that interfere with each other; in this setting, computing the expected return is unsuitable in principle and does not perform as well as usual. To tackle this challenge, we propose in this paper Quantile Regression Hindsight Experience Replay (QR-HER), a novel approach based on quantile regression. The key idea is to select the returns most closely related to the current goal from the replay buffer without additional data, which significantly reduces the interference between different initial goals. We evaluate QR-HER on OpenAI Robotics manipulation tasks with sparse rewards. Experimental results show that, in contrast to HER and its variants, QR-HER achieves better performance by improving the performance on each goal, as expected.

Qiwei He, Liansheng Zhuang, Wei Zhang, Houqiang Li
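QR-HER builds on quantile regression. The sketch below shows the quantile Huber loss used in distributional RL methods such as QR-DQN, with quantile midpoints tau_i = (2i - 1) / (2N); it does not implement QR-HER's goal-specific selection of returns from the replay buffer, and `kappa` and the function names are illustrative assumptions.

```python
def huber(u, kappa=1.0):
    """Huber loss: quadratic for |u| <= kappa, linear beyond."""
    return 0.5 * u * u if abs(u) <= kappa else kappa * (abs(u) - 0.5 * kappa)


def quantile_midpoints(n):
    """tau_i = (2i - 1) / (2N) for i = 1..N (written 0-based here)."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile Huber loss: sum over predicted quantiles, mean over target samples.

    Each pair contributes |tau - 1{u < 0}| * Huber_kappa(u) / kappa, u = target - prediction,
    so under-estimates of high quantiles (and over-estimates of low ones) cost more.
    """
    taus = quantile_midpoints(len(pred_quantiles))
    total = 0.0
    for theta, tau in zip(pred_quantiles, taus):
        for t in target_samples:
            u = t - theta
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u, kappa) / kappa
    return total / len(target_samples)


print(quantile_huber_loss([0.0, 0.0], [1.0]))  # 0.5
```

Learning a set of quantiles instead of a single expected return is what lets a distributional agent represent the goal-dependent value distributions described in the abstract.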
Sustainable Patterns of Pigeon Flights Over Different Types of Terrain

Visual characteristics of terrain affect the properties of pigeon trajectories in medium-distance flights. Pigeon flight often provides a solution to the task of searching for food (foraging), returning home (homing), or exploring territory (surveying). In this work, we considered the flights of single pigeons and pigeon flocks, calculated flight characteristics such as direction, altitude and its deviations, and analyzed reactions to the boundaries between different areas. Based on remote sensing datasets, we identified visual characteristics of terrain, such as the density of surface fill and its distribution over the study terrain, boundaries of single objects, and boundaries between homogeneous areas. Applying spatial analysis, we compared the characteristics of pigeon GPS tracks and features of object distributions on terrain over which birds fly. Our analysis revealed which flight parameters are stable and which, on the contrary, are very sensitive to visually perceived terrain characteristics. We found that the properties of flight over an urbanized area often differ from the properties of flight over a natural landscape. Spatial data—pigeon GPS track records and open-access remote sensing datasets—were processed using the geographical information system QGIS. Our results show that adaptive visual perception can help solve navigation tasks when pigeons fly over mixed terrain. Knowledge of the characteristic features of bird flights can be used both for a better understanding of the spatial behavior of living creatures (humans and animals) and for optimization of artificial intelligence algorithms.

Margarita Zaleshina, Alexander Zaleshin
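The study computes flight characteristics such as direction and altitude deviation from GPS tracks using QGIS. As a hypothetical pure-Python illustration (not the authors' QGIS workflow), the heading between consecutive fixes can be computed with the standard initial great-circle bearing formula, and altitude deviation as a population standard deviation; the `(lat, lon, altitude_m)` tuple layout is an assumption.

```python
import math
import statistics


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing in degrees clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def track_summary(fixes):
    """fixes: list of (lat, lon, altitude_m) tuples from one GPS track."""
    headings = [initial_bearing(a[0], a[1], b[0], b[1]) for a, b in zip(fixes, fixes[1:])]
    altitudes = [f[2] for f in fixes]
    return {
        "headings": headings,  # one heading per track segment
        "mean_altitude": statistics.mean(altitudes),
        "altitude_sd": statistics.pstdev(altitudes),  # altitude deviation
    }


print(round(initial_bearing(0.0, 0.0, 0.0, 1.0), 6))  # 90.0 (due east)
```

Headings are kept per segment rather than averaged, because angles wrap around at 360 degrees and a plain arithmetic mean would be misleading.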
Metadata
Title: Neural Information Processing
Editors: Haiqin Yang, Dr. Kitsuchart Pasupa, Andrew Chi-Sing Leung, Prof. James T. Kwok, Dr. Jonathan H. Chan, Prof. Irwin King
Copyright Year: 2020
Electronic ISBN: 978-3-030-63820-7
Print ISBN: 978-3-030-63819-1
DOI: https://doi.org/10.1007/978-3-030-63820-7
