
2021 | Book

Image and Graphics Technologies and Applications

16th Chinese Conference on Image and Graphics Technologies, IGTA 2021, Beijing, China, June 6–7, 2021, Revised Selected Papers


About this book

This book constitutes the refereed proceedings of the 16th Conference on Image and Graphics Technologies and Applications, IGTA 2021, held in Beijing, China, in June 2021. The 21 papers presented were carefully reviewed and selected from 86 submissions. They provide a forum for sharing progress in the areas of image processing technology; image analysis and understanding; computer vision and pattern recognition; big data mining; computer graphics and VR; and image technology applications. The volume contains the following thematic blocks: image processing and enhancement techniques (image information acquisition, image/video coding, image/video transmission, image/video storage, compression, completion, dehazing, reconstruction and display, etc.); biometric identification techniques (biometric identification and authentication techniques including face, fingerprint, iris, and palm-print, etc.); machine vision and 3D reconstruction (visual information acquisition, camera calibration, stereo vision, 3D reconstruction, and applications of machine vision in industrial inspection, etc.); image/video big data analysis and understanding (object detection and recognition, image/video retrieval, image segmentation, matching, analysis, and understanding); computer graphics (modeling, rendering, algorithm simplification and acceleration techniques, realistic scene generation, 3D reconstruction algorithms, systems and applications, etc.); virtual reality and human-computer interaction (virtual scene generation techniques, tracing and positioning techniques for large-scale spaces, augmented reality techniques, human-computer interaction techniques based on computer vision, etc.); applications of image and graphics (image/video processing and transmission, biomedical engineering applications, information security, digital watermarking, text processing and transmission, remote sensing, telemetering, etc.); and other research works and surveys related to the applications of image and graphics technology.

Table of Contents

Frontmatter

Image Processing and Enhancement Techniques

Frontmatter
Residual Multi-resolution Network for Hyperspectral Image Denoising
Abstract
Hyperspectral image (HSI) denoising is an important tool for improving the quality of HSIs for subsequent tasks. In this paper, we propose a novel method for HSI denoising based on a Residual Multiresolution Network (RMRNet), which better exploits multiscale information from multiresolution versions of HSIs produced by a pixel-shuffle operation. A convolutional neural network (CNN) is used to extract the spatial information from each resolution of the HSI, and an enhanced representation is obtained by fusing these multiresolution features. Dilated convolutions provide wide receptive fields, and spectral information is also considered in the proposed network. To ease the flow of low-frequency information, we use a residual structure in our method. Experimental results on a simulated dataset demonstrate the superiority of our RMRNet.
Shiyong Xiu, Feng Gao, Yong Chen
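As an illustrative aside (an assumed reading of the pixel-shuffle decomposition mentioned in the abstract, not the authors' code), the operation can be sketched as a space-to-depth rearrangement: each sub-image keeps every r-th pixel at a different offset, yielding r² lower-resolution versions of the input.

```python
import numpy as np

def pixel_unshuffle(img, r):
    """Space-to-depth: split an HxW image into r*r sub-images of size
    (H/r)x(W/r), each sampling every r-th pixel at a different offset."""
    h, w = img.shape
    assert h % r == 0 and w % r == 0
    return (img.reshape(h // r, r, w // r, r)
               .transpose(1, 3, 0, 2)
               .reshape(r * r, h // r, w // r))

img = np.arange(16, dtype=float).reshape(4, 4)
subs = pixel_unshuffle(img, 2)   # four 2x2 multiresolution sub-images
```

The rearrangement is lossless: every input pixel appears in exactly one sub-image, so multiresolution features extracted from the sub-images can later be fused without discarding information.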
Skin Reflectance Reconstruction Based on the Polynomial Regression Model
Abstract
Skin spectral reflectance has applications in numerous medical fields, including the diagnosis and treatment of cutaneous disorders and the provision of maxillofacial soft tissue prostheses. This paper describes a polynomial model based on the least-squares (LS) method for reconstructing skin spectral reflectance from RGB values. Furthermore, this paper uses real human skin data, which makes the results more practical. Performance is evaluated by the mean, maximum, and standard deviation of color-difference values under other sets of light sources. The standard deviation of root-mean-square (RMS) errors and the goodness-of-fit coefficient (GFC) between the reproduced and actual spectra were also calculated. Results are compared with Xiao's method, and all metrics show that the proposed method leads to considerable improvements.
Long Ma, Yingying Zhu
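A minimal sketch of a least-squares polynomial mapping of the kind the abstract describes, with an assumed second-order term set and synthetic data (the paper's exact polynomial terms, band count, and skin dataset are not reproduced here):

```python
import numpy as np

def poly_expand(rgb):
    """Assumed second-order polynomial expansion of RGB triplets."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack([np.ones_like(r), r, g, b,
                     r * g, r * b, g * b, r * r, g * g, b * b], axis=1)

def fit_reflectance(rgb_train, spectra_train):
    """Least-squares fit of a matrix mapping polynomial RGB terms to spectra."""
    X = poly_expand(rgb_train)
    M, *_ = np.linalg.lstsq(X, spectra_train, rcond=None)
    return M

def predict_reflectance(rgb, M):
    return poly_expand(rgb) @ M

# Synthetic training data: 50 samples, 31 hypothetical wavelength bands.
rng = np.random.default_rng(0)
rgb = rng.random((50, 3))
true_M = rng.random((10, 31))
spectra = poly_expand(rgb) @ true_M
M = fit_reflectance(rgb, spectra)
pred = predict_reflectance(rgb, M)
```

With noise-free synthetic spectra the LS fit recovers the mapping exactly; on real skin data the residual is what the RMS and GFC metrics in the abstract quantify.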
From Deep Image Decomposition to Single Depth Image Super-Resolution
Abstract
Although many computer vision tasks such as autonomous driving, robot navigation, and object recognition and grasping need accurate depth information to improve performance, depth images captured in practical scenes suffer from low resolution and contamination. To address this problem, we propose a deep single depth image super-resolution method that includes three parts: a depth dual decomposition block, a depth image initialization block, and a depth image rebuilding block. First, we propose a deep dual decomposition network to separate a single low-resolution depth image into two high-quality, high-resolution parts: fine-detail and coarse-structure images. Second, a weighted fusion mechanism is proposed in the depth image rebuilding block for feature integration. Finally, these fused features are fed into a residual learning-based reconstruction block to produce a high-quality depth image. Experimental results demonstrate that the proposed method outperforms several state-of-the-art depth map super-resolution methods in terms of root mean squared error.
Lijun Zhao, Ke Wang, Jinjing Zhang, Huihui Bai, Yao Zhao
Classification of Solar Radio Spectrum Based on VGG16 Transfer Learning
Abstract
Solar radio bursts are an important part of the study of solar activity, and automatic classification of solar radio spectra can greatly improve the efficiency of solar activity research. Based on preprocessing of the original solar radio spectrum images, this paper proposes a classification method based on the VGG16 convolutional neural network and transfer learning. In this method, a pre-trained VGG model is applied to solar radio spectrum recognition, trained on the generated target dataset with fine-tuned parameters. The experimental results show that, compared with the traditional manual classification method and an existing deep learning classification method, VGG16 transfer learning improves the true positive rate (TPR) for solar radio bursts by 12.2%, and the overall classification result is also greatly improved over the original classification.
Min Chen, Guowu Yuan, Hao Zhou, Ruru Cheng, Long Xu, Chengming Tan
A Channel Attention-Based Convolutional Neural Network for Intra Chroma Prediction of H.266
Abstract
Chroma intra prediction is an important module in Versatile Video Coding (VVC), and the Cross-Component Linear Model (CCLM) is an effective coding tool for it, which establishes a linear model between the predicted chroma component and the reconstructed luma component. When the video content has complex textures, however, the chroma prediction performance of CCLM is suppressed. To further improve chroma prediction, we present a simple yet efficient channel attention-based network, named CACNN, to predict the chroma component. The proposed channel attention module controls the contribution of each neighboring reference sample when predicting the chroma component of the current block. We also use multi-line reference samples to further improve prediction performance. The proposed CACNN is incorporated into the VVC test model version 8.2 (VTM 8.2). Experimental results demonstrate that, compared with the VTM 8.2 anchor, the proposed method achieves 2.89% and 2.36% bit-rate savings for the chroma components at high QPs.
Yao Liu, Xin Ma, Hui Yuan, Ye Yang, Qi Liu
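The linear model that CCLM establishes can be sketched as follows. Note this is only an illustration of the idea: the actual VVC tool derives the slope and offset from min/max luma samples with integer arithmetic, whereas this toy version uses a least-squares fit on the neighboring reconstructed samples.

```python
import numpy as np

def cclm_predict(luma_nbr, chroma_nbr, luma_block):
    """Fit chroma ~ a*luma + b on neighboring reconstructed samples,
    then predict the current block's chroma from its reconstructed luma."""
    a, b = np.polyfit(luma_nbr, chroma_nbr, 1)
    return a * luma_block + b

# Toy example with a perfectly linear luma/chroma relation.
luma_nbr = np.array([10.0, 20.0, 30.0, 40.0])
chroma_nbr = 0.5 * luma_nbr + 3.0
luma_block = np.array([[12.0, 18.0], [25.0, 33.0]])
pred = cclm_predict(luma_nbr, chroma_nbr, luma_block)
```

When the luma-chroma relation in the block deviates from the linear model fitted on the neighbors (e.g., complex textures), the prediction degrades, which is the failure case CACNN targets.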

Biometric Identification Techniques

Frontmatter
A Novel Deep Residual Attention Network for Face Mosaic Removal
Abstract
Deep learning approaches to face mosaic removal are developing rapidly. In this paper, a novel deep residual attention network (DRAN) is proposed for face mosaic removal. Inspired by applications of the attention mechanism, we apply channel attention (CA) and pixel attention (PA) in DRAN to make the network focus on more informative features. In addition, we improve conventional pixel attention by superimposing three convolutional kernels of different sizes. DRAN consists of an encoder and a decoder, with which the clean, realistic face image is reconstructed by a convolutional neural network. In the encoder, the feature maps of each convolutional layer are used as the input of CA, the output of CA is sent to PA, and the output of PA is directly concatenated with the corresponding feature maps of the decoder. At the same time, inspired by residual learning, we propose a parallel residual block for more detailed feature extraction. Extensive experiments show that DRAN performs better than state-of-the-art methods; the best PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index) on the test set are 20.67 dB and 0.8509, respectively.
Chen Liu, Shutao Wang
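A minimal sketch of a squeeze-and-excitation-style channel attention gate, the general mechanism the CA module above builds on (the weights here are random placeholders, not DRAN's learned parameters):

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Gate each channel of a (C, H, W) feature map by a learned scalar
    weight derived from the globally pooled channel statistics."""
    pooled = feat.mean(axis=(1, 2))                  # global average pool -> (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)            # bottleneck + ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate per channel
    return feat * weights[:, None, None]

rng = np.random.default_rng(1)
feat = rng.random((8, 4, 4))          # placeholder feature map, C=8
w1 = rng.random((2, 8))               # squeeze to 2 hidden units
w2 = rng.random((8, 2))               # excite back to 8 channel weights
out = channel_attention(feat, w1, w2)
```

Each channel is rescaled by a value in (0, 1), so informative channels can be preserved while less useful ones are attenuated; pixel attention applies the same idea per spatial location instead of per channel.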

Machine Vision and 3D Reconstruction

Frontmatter
Pretrained Self-supervised Material Reflectance Estimation Based on a Differentiable Image-Based Renderer
Abstract
Measuring the material reflectance of surfaces is a key technology in inverse rendering, which can be used in object appearance reconstruction. In this paper we propose a novel deep learning-based method to extract material information represented by a physically-based bidirectional reflectance distribution function from an RGB image of an object. Firstly, we design new deep convolutional neural network architectures to regress material parameters by self-supervised training based on a differentiable image-based renderer. Then we generate a synthetic dataset to train the model as the initialization of the self-supervised system. To transfer the domain from the synthetic data to the real image, we introduce a test-time training strategy to finetune the pretrained model to improve the performance. The proposed architecture only requires one image as input and the experiments are conducted to evaluate the proposed method on both the synthetic data and real data. The results show that our trained model presents dramatic improvement and verifies the effectiveness of the proposed methods.
Tianteng Bi, Yue Liu, Dongdong Weng, Yongtian Wang

Image/Video Big Data Analysis and Understanding

Frontmatter
Object-Aware Attention in Few-Shot Learning
Abstract
Embedding networks trained with a limited number of samples have a poor capability in localizing objects. Therefore, models of few-shot learning (FSL) are easily affected by object-irrelevant information in the background, which will lead to low accuracy. An Object-Aware Attention (OAA) mechanism is proposed to improve the generalization ability of models. In OAA module, object-relevant area is obtained by a fully convolutional network to guide the network in extracting object-relevant features. Besides, a general few-shot learning framework with OAA as a plug-and-play module is proposed, in which original images and object-aware images are fused to get the rectified prototypes. Under the general framework, the performance of most existing few-shot learning methods can be improved effectively. Comprehensive experiments show that the OAA can improve the accuracy of four mainstream baselines significantly. On benchmark mini-ImageNet, the method achieves a state-of-the-art performance on the 5-way-1-shot task and 5-way-5-shot task.
Yeqing Shen, Lisha Mo, Huimin Ma, Tianyu Hu, Yuhan Dong
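The rectified prototypes mentioned above build on standard prototypical classification for few-shot learning, which can be sketched as follows (with made-up 2-D embeddings, not the paper's OAA features):

```python
import numpy as np

def prototypes(support, labels, n_way):
    """Class prototypes as the mean embedding of each class's support samples."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])

def classify(query, protos):
    """Assign each query embedding to the nearest prototype (Euclidean)."""
    d = np.linalg.norm(query[:, None, :] - protos[None, :, :], axis=2)
    return d.argmin(axis=1)

# Toy 2-way-2-shot episode with 2-D embeddings.
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, 2)
pred = classify(np.array([[0.1, 0.1], [4.9, 5.1]]), protos)
```

The OAA framework intervenes before this step: fusing original and object-aware images shifts the prototypes toward object-relevant features, so the nearest-prototype decision is less distracted by background.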
Simultaneously Predicting Video Object Segmentation and Optical Flow Without Motion Annotations
Abstract
Optical flow information is one of the most commonly used temporal cues in video object segmentation algorithms. However, as it is difficult to label real-world video data with motion annotations, video segmentation methods are often forced to use external optical flow datasets and additional flow prediction models. In this paper, we propose an optical flow synthesizing approach which can generate artificial object flow from video segmentation masks, relieving the constraint of manual motion annotations for joint learning of video segmentation and optical flow prediction tasks. Extensive experiments and analysis are carried out on the DAVIS video segmentation datasets and the self-constructed synthetic flow database, demonstrating that the proposed synthetic flow has a better training effect compared with external flow datasets, and that this target-specific flow synthesizing training scheme can help video segmentation networks to better distinguish the motion patterns of certain targets in multiple-instance video segmentation scenes.
Jingchun Cheng, Shengjin Wang, Chunxi Zhang
Recognition of Bending Deformed Pipe Sections in Geological Disaster Area Based on an Ensemble Learning Model
Abstract
At present, manual recognition methods are mainly used to identify IMU strain detection data of a whole pipe section segment by segment, which suffers from low efficiency, high cost, and long cycles. This paper therefore realizes intelligent recognition of the Bending Deformed Pipe Section in a Geological Disaster Area (BDPIGDA) by establishing an ensemble learning model. First, statistics show that pipe sections with bending strain values exceeding 0.125% in an oil pipeline include bends, dent sections, and BDPIGDA. Then, combined with geometric detection data, sample data of different pipe sections are intercepted, and 11 typical data feature values are extracted. Through principal component analysis, kernel principal component analysis, and independent component analysis, the dimensionality of the 11 feature values is reduced. Finally, an ensemble learning model combining a support vector machine and K-means clustering is established. The results show that the model's test-set accuracy is 93.26% and its recognition rate for bent deformed pipe sections in geological disaster areas is 88.70%, which meets engineering requirements and provides a reference for pipeline integrity management.
Zhao Ziqi, Chen Chao, Dai Jinyang, Liu Shen, Li Bo, Liu Xiaoben
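The principal component analysis step in the pipeline above can be sketched as follows (illustrative only: the feature values are random placeholders standing in for the paper's 11 extracted features):

```python
import numpy as np

def pca_reduce(X, k):
    """Project feature vectors onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Eigen-decomposition of the covariance matrix of the centered data.
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)
    top = vecs[:, np.argsort(vals)[::-1][:k]]   # k highest-variance directions
    return Xc @ top

rng = np.random.default_rng(2)
X = rng.random((40, 11))    # e.g. 11 feature values per pipe-section sample
Z = pca_reduce(X, 3)        # reduced representation fed to the classifier
```

Kernel PCA and ICA follow the same reduce-then-classify pattern with different decompositions; the reduced features are what the SVM/K-means ensemble consumes.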
Memory Bank Clustering for Self-supervised Contrastive Learning
Abstract
Contrastive learning aims at embedding positive samples close to each other while pushing away the features of negative samples. This paper analyzes different contrastive learning architectures based on the memory bank network. Existing memory-bank-based models can only store global features across a few data batches due to the limited memory bank size, and updating these features can cause the feature drift problem. To address these issues, a network for contrastive learning of visual representations is proposed. First, the model combines a memory bank with a memory feature clustering mechanism. Second, a new feature clustering method is proposed for the memory bank network to find and store cross-epoch global feature centers over the training epochs. Third, the centers in the memory bank are treated as class features to construct positive and negative samples with the current batch data, and contrastive learning is applied to optimize a feature encoder to learn better representations. Finally, a training pipeline is designed to update the memory bank and the encoder individually to circumvent the feature drift problem. To test the performance of the proposed memory bank clustering method on unsupervised image classification, our experiments use a self-supervised online evaluator with an extra non-linear layer. The experimental results show that the proposed model achieves good performance on image classification tasks.
Yiqing Hao, Gaoyun An, Qiuqi Ruan
Robust Visual Question Answering Based on Counterfactual Samples and Relationship Perception
Abstract
Traditional visual question answering algorithms based on relationship perception help answer questions by modeling the relationships in the input image. Although they obtain better visual question answering performance, such models learn the language bias of image appearance during training and perform slightly worse on test sets with different data distributions. We propose a model based on Counterfactual Samples and Relationship Perception (CSRP) to solve this problem. The counterfactual sample generation mechanism produces a large number of counterfactual samples by masking key objects, forcing the model to focus on key objects to answer questions; used as feature augmentation, counterfactual samples reduce the appearance-language bias learned during training. In addition, the relationships between image objects are used to perceive semantics. Extensive experiments on the VQA-CP v2 and VQA v2 datasets demonstrate that our proposed model outperforms most state-of-the-art methods.
Hong Qin, Gaoyun An, Qiuqi Ruan
Prototype Generation Based Shift Graph Convolutional Network for Semi-supervised Anomaly Detection
Abstract
Semi-supervised networks are an important branch of video anomaly detection. Previous methods are committed to modeling the common features or distribution of normal data. With the introduction of the pose graph, a model can focus on the behavior of the human body. However, graph-embedded networks suffer from heavy computational cost and cannot accurately predict the distribution of normal data. To better tackle these issues, a prototype generation-based graph convolutional network is proposed for anomaly detection, which introduces a shift operation and a prototype generation module to obtain the distribution of normal data while simplifying the model. Extensive experiments on the ShanghaiTech dataset show that the proposed approach (76.7 AUC) outperforms most mainstream models.
Tao Cui, Wenyu Song, Gaoyun An, Qiuqi Ruan

Computer Graphics

Frontmatter
A New Image Super-Resolution Reconstruction Algorithm Based on Hybrid Diffusion Model
Abstract
In image denoising based on anisotropic diffusion models, the problems of edge information loss and the "staircase effect" often appear. Building on the anisotropic diffusion model, this paper combines a fractional diffusion model with a gradient-based integer-order diffusion model and introduces a washout filter as the control term of the model, yielding a new image super-resolution reconstruction algorithm based on a hybrid diffusion model. In the proposed model, the fractional derivative adjusts its order adaptively according to the local variance of the image; and because the threshold k in the traditional diffusion function requires extensive data experiments to obtain the best results, we also propose an adaptive threshold function k whose value changes with the gradient of the image. Simulation results show that, compared with other algorithms, the new model retains image details and edge information well after reconstruction, and the introduced washout filter also accelerates convergence of the system to a stable state, improving its convergence speed and stability.
Jimin Yu, Jiajun Yin, Saiao Huang, Maowei Qin, Xiankun Yang, Shangbo Zhou
Research on Global Contrast Calculation Considering Color Differences
Abstract
Image quality evaluation is an important research topic in the field of image processing. Contrast is a common image quality assessment index that reflects the level of difference between colors. However, traditional global contrast calculation methods often misdescribe the contrast when uniform or nearly uniform color blocks appear in the image. We propose a global contrast method based on RGB differences among regions. First, the image is divided into several regions, and the difference information of the RGB components between regions is obtained. Second, the parameters of a grayscale transformation that preserves the contrast information of the original image are calculated. Finally, the difference information and the grayscale transformation parameters are combined to obtain the global contrast. Compared with the traditional method, our method is less affected by uniform color blocks in the experimental image and describes image contrast more objectively and fairly. In addition, the contrast values obtained by our method are consistent with the trend of human visual judgment and conform to objective laws.
Jing Qian, Bin Kong
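A hypothetical simplification of the region-difference idea above (the paper's actual formulation also incorporates a grayscale transformation, which this sketch omits): split the image into a grid, take each region's mean RGB color, and average the pairwise color distances.

```python
import numpy as np

def regional_rgb_contrast(img, grid=2):
    """Global contrast as the mean pairwise distance between the mean
    RGB colors of grid x grid regions of an (H, W, 3) image."""
    h, w, _ = img.shape
    means = np.array([
        img[i * h // grid:(i + 1) * h // grid,
            j * w // grid:(j + 1) * w // grid].mean(axis=(0, 1))
        for i in range(grid) for j in range(grid)])
    diffs = [np.linalg.norm(means[a] - means[b])
             for a in range(len(means)) for b in range(a + 1, len(means))]
    return float(np.mean(diffs))

flat = np.full((4, 4, 3), 0.5)          # uniform image -> zero contrast
half = flat.copy(); half[:, 2:] = 1.0   # two-tone image -> positive contrast
```

A uniform color block contributes nothing to any pairwise region difference, which is how the region-based view avoids the misdescription the abstract attributes to traditional global methods.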

Virtual Reality and Human-Computer Interaction

Frontmatter
Research on Key Technologies and Function Analysis of Live Interactive Classroom in AI+ Era
Abstract
The intelligent live interactive classroom in the AI+ era is a quality assessment method for precision teaching based on deep learning and big-data behavior analysis, and it has gradually developed into an important teaching approach. Addressing the issues of missing teaching standards in adult open-education online live classes, inadequate supervision of education quality, and lack of interactivity, this paper explores the key technologies of a live interactive teaching platform for open education and puts forward a design method for an AI+ live classroom framework and its functions from the perspectives of teaching management and evaluation, so as to realize learner-centered comprehensive intelligence. This method plays an important role in promoting individualized teaching and accurate services in open education.
Zhou Nan, Zhou Jianshe

Applications of Image and Graphics

Frontmatter
BrainSeg R-CNN for Brain Tumor Segmentation
Abstract
Brain tumor segmentation methods using deep neural networks have recently achieved significant performance breakthroughs. However, existing brain tumor segmentation networks are applied directly to whole brain images, possibly reducing segmentation performance due to the disturbance of background regions. To solve this problem, inspired by Mask R-CNN, a novel brain tumor segmentation model called BrainSeg R-CNN is proposed in this work, which classifies brain tumor areas and boundaries based on a detected region of interest in an end-to-end manner. Also, an effective feature extraction strategy is presented in BrainSeg R-CNN, which extracts various kinds of information from separate channels for each modality and then adopts a cross-connection operator to realize information transmission among the channels. Moreover, concatenation and addition operations are integrated to improve the fusion efficiency of multi-scale features from brain tumor images. Additionally, a multi-weighted, multi-task loss function that fully considers tumor size and overlap labels is introduced, significantly improving segmentation performance. Experimental results on the BraTS 2017 dataset demonstrate that our BrainSeg R-CNN obtains performance competitive with the state of the art.
Jianxin Zhang, Xinchao Cheng, Tao He, Dongwei Liu
A Real-Time Tracking Method for Satellite Video Based on Long-Term Tracking Framework
Abstract
Tracking accuracy and frame rate are the two most important indexes of satellite video tracking. Research on close-range tracking algorithms is gradually developing from short-term to long-term tracking and has been widely applied. This paper draws on the idea of the long-term tracking strategy and introduces it into satellite video target tracking. With a variety of features guaranteeing precision, a redetection function is added to the tracker to ensure robust tracking, and the feature dimensionality is appropriately reduced. The validity of the algorithm is verified on a video sequence obtained from the Skybox-1 satellite. The results show that the tracking effect is good, the performance of the dimension-reduced version shows no significant decrease, and the frame rate is improved by 37.8%. This demonstrates the high feasibility and wide application prospects of the long-term tracking strategy in space-based video tracking.
Yufei Ding, Hongyan He, Shixiang Cao, Yu Wang
Fourier Series Fitting of Space Object Orbit Data
Abstract
Accurate orbit information plays an important role in space and national defense security, for tasks such as space object prediction, maneuver detection, and collision prevention. It is therefore highly necessary to master the characteristics of the orbital elements of space objects. In this paper, a Fourier series fitting method is proposed in which TLE orbit data are used to analyze the orbital elements of GEO, LEO, and HEO objects. According to the orbital elements of different types of targets, their variation rule is approximated by the fitting method, and the resulting variation function can be used for prediction. The experimental results show that the predictions of this method are promising.
Ziwei Zhou, Gaojin Wen, Yun Xu
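Fourier series fitting of a periodic orbital-element time series reduces to linear least squares over sine/cosine basis columns. A minimal sketch with synthetic data (the harmonic count and period here are assumptions, not the paper's TLE-derived values):

```python
import numpy as np

def fourier_fit(t, y, n_harmonics, period):
    """Least-squares fit of a truncated Fourier series to sampled data;
    returns the coefficients and the fitted values."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t / period))
        cols.append(np.sin(2 * np.pi * k * t / period))
    A = np.stack(cols, axis=1)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

# Synthetic "orbital element" with a mean value and two harmonics.
t = np.linspace(0.0, 10.0, 200)
y = 1.0 + 0.3 * np.sin(2 * np.pi * t / 10.0) + 0.1 * np.cos(2 * np.pi * 2 * t / 10.0)
coef, y_fit = fourier_fit(t, y, n_harmonics=2, period=10.0)
```

Evaluating the fitted series at future epochs gives the kind of element prediction the abstract describes; the constant term recovers the mean element value.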
Mapping Methods in Teleoperation of the Mars Rover
Abstract
Mapping methods based on multi-source images are the core technology in the teleoperation of Mars rovers. This study discusses the related unique control characteristics and introduces a teleoperation control mode for the Mars rover based on the "perception, detection, movement" patrol cycle. The multi-scale landing site mapping method based on multi-source data (orbit/descent/ground images) supporting teleoperation control is described in detail, and its applications in lunar exploration missions are demonstrated. A wide baseline mapping method aimed at mapping large (e.g., mountain peaks) and long-distance targets after landing is proposed, with relevant experiments conducted by the Yutu-2 rover. The ranging error of the panoramic camera within a 560 m range is about 4.1 m, and the accuracy is about 0.73%. The wide baseline model was experimentally confirmed to effectively guide task implementation with high-precision acquisition of the long-baseline stereo, laying the foundation for high-precision terrain applications.
Jia Wang, Tianyi Yu, Junjie Yuan, Lichun Li, Man Peng, Fan Wu, Shiying Liu, Wenhui Wan, Ximing He

Other Research Works and Surveys Related to the Applications of Image and Graphics Technology

Frontmatter
A Regularized Limited-Angle CT Reconstruction Model Based on Sparse Multi-level Information Groups of the Images
Abstract
Restricted by the scanning environment and the shape of the target to be detected, the projection data obtained from computed tomography (CT) are often incomplete, which leads to a seriously ill-posed problem such as limited-angle CT reconstruction. In this situation, the classical filtered back-projection (FBP) algorithm loses efficacy, especially when the scanning angle is severely limited. By comparison, the simultaneous algebraic reconstruction technique (SART) handles noise better than FBP, but it is also affected by limited-angle artifacts. Meanwhile, the total variation (TV) algorithm can address limited-angle artifacts, since it takes into account prior information about the target to be reconstructed, which alleviates the ill-posedness of the problem. Nonetheless, current algorithms have limitations when dealing with limited-angle CT reconstruction. This paper analyzes the distribution of limited-angle artifacts and finds that they emerge globally. Then, motivated by the TV algorithm, tight-frame wavelet decomposition, and group sparsity, this paper presents a regularization model based on sparse multi-level information groups of the images to address limited-angle CT reconstruction, and the corresponding algorithm, called modified proximal alternating linearized minimization (MPALM), is presented to solve the proposed model. Numerical implementations demonstrate the effectiveness of the presented algorithm compared with the above classical algorithms.
Lingli Zhang, Huichuan Liang, Xiao Hu, Yi Xu
Backmatter
Metadata
Title
Image and Graphics Technologies and Applications
Edited by
Yongtian Wang
Weitao Song
Copyright Year
2021
Publisher
Springer Singapore
Electronic ISBN
978-981-16-7189-0
Print ISBN
978-981-16-7188-3
DOI
https://doi.org/10.1007/978-981-16-7189-0