
About this Book

This three-volume set LNCS 10666, 10667, and 10668 constitutes the refereed conference proceedings of the 9th International Conference on Image and Graphics, ICIG 2017, held in Shanghai, China, in September 2017. The 172 full papers were selected from 370 submissions and focus on advances in theory, techniques, and algorithms, as well as innovative technologies for image, video, and graphics processing, and on fostering innovation, entrepreneurship, and networking.



Computational Imaging


A Double Recursion Algorithm to Image Restoration from Random Limited Frequency Data

One of the main tasks in image restoration is to recover image features such as interfaces and textures from incomplete noisy frequency data. For a cost functional with a data-matching term in the frequency domain and total variation plus Frobenius-norm penalty terms in the spatial domain, we establish the properties of the minimizer and error estimates for the regularized solution. We then propose a double-recursion algorithm to restore piecewise smooth images. Bregman iteration with the lagged diffusivity fixed-point method is used to solve the corresponding nonlinear Euler-Lagrange equation. By running the recursion a few times, satisfactory reconstructions can be obtained from random band-sampling data. Numerical implementations demonstrate the validity of the proposed algorithm, with good edge preservation.
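As a minimal illustration of the data model here (not the authors' double-recursion algorithm), the sketch below samples random frequency bands of an image's 2-D FFT and applies a naive zero-filled inverse transform; the function names and the row-wise band pattern are assumptions for illustration.

```python
import numpy as np

def random_band_mask(shape, keep_frac=0.3, seed=0):
    """Sample a random subset of full frequency bands (rows of the 2-D FFT)."""
    rng = np.random.default_rng(seed)
    n = shape[0]
    keep = rng.choice(n, size=max(1, int(keep_frac * n)), replace=False)
    mask = np.zeros(shape, dtype=bool)
    mask[keep, :] = True  # keep whole rows (bands) of the spectrum
    return mask

def zero_fill_restore(image, mask):
    """Naive baseline: inverse FFT of the masked spectrum (no regularization)."""
    spec = np.fft.fft2(image) * mask
    return np.real(np.fft.ifft2(spec))
```

A regularized method such as the one in the paper improves on this zero-filled baseline, which simply discards the missing bands.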

Xiaoman Liu, Jijun Liu

RGB-D Saliency Detection with Multi-feature-fused Optimization

This paper proposes a three-stage method that uses color, joint entropy, and depth to detect salient regions in an image. In the first stage, coarse saliency maps are computed through multi-feature-fused manifold ranking. However, graph-based saliency detection methods such as manifold ranking often suffer from inconsistency and misdetection when parts of the background or objects have relatively high contrast with their surroundings. To address this problem and provide more robust results under varying conditions, depth information is used repeatedly to segment and refine the saliency maps. Specifically, a self-adaptive segmentation method based on the depth distribution is applied in the second stage to filter out less significant areas and enhance the salient objects. Finally, a saliency-depth consistency check suppresses highlighted areas in the background and enhances suppressed parts of objects. Qualitative and quantitative evaluation on a challenging RGB-D dataset demonstrates significant advantages of our algorithm over eight state-of-the-art methods.

Tianyi Zhang, Zhong Yang, Jiarong Song

Research on Color Image Segmentation

The traditional Ncut algorithm suffers from a large amount of computation and poor robustness to noise because it must solve for eigenvectors and the eigenmatrix. To address this problem, a color image segmentation algorithm based on an improved Ncut algorithm is proposed in this paper. First, we apply Mean Shift for pre-segmentation; the pixel blocks produced by pre-segmentation then replace the raw image pixels to form a new block diagram. Next, the Ncut algorithm is employed for the final segmentation. Experimental results show that the improved Ncut algorithm not only improves the efficiency of segmentation but also has excellent robustness to noise.
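The spectral bipartition at the core of Ncut can be sketched as follows; this is a toy illustration on a small affinity matrix (rather than on the block diagram of pre-segmented regions the paper uses), with assumed function names.

```python
import numpy as np

def ncut_bipartition(W):
    """Two-way normalized cut: threshold the second-smallest eigenvector
    (the Fiedler vector) of the symmetric normalized Laplacian
    D^{-1/2} (D - W) D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)      # ascending eigenvalues
    fiedler = vecs[:, 1]
    return (fiedler > np.median(fiedler)).astype(int)
```

The eigen-decomposition is exactly the expensive step the paper avoids repeating on raw pixels by running it on pre-segmented blocks instead.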

Jingwen Yan, Xiaopeng Chen, Tingting Xie, Huimin Zhao

Depth-Based Focus Stacking with Labeled-Laplacian Propagation

Focus stacking is a promising technique for extending the depth of field in general photography by fusing images focused at various depth planes. However, the depth propagation process in existing depth-based focus stacking is affected by colored texture and structure differences in the guide images. In this paper, we propose a novel focus stacking method based on max-gradient flow and labeled-Laplacian depth propagation. We first extract sparse source points with max-gradient flow to remove false edges caused by large blur kernels. Second, we present a depth-edge operator that assigns these sparse points one of two labels: off-plane edges or in-plane edges. Only off-plane edges are then used in our labeled-Laplacian propagation method to refine the final dense depth map and the all-in-focus image. Experiments show that our all-in-focus images are superior to those of other state-of-the-art methods.

Wentao Li, Guijin Wang, Xuanwu Yin, Xiaowei Hu, Huazhong Yang

A Novel Method for Measuring Shield Tunnel Cross Sections

With more metro tunnels being constructed and operated, measuring tunnel deformation becomes ever more imperative. This article proposes a novel method for measuring shield tunnel cross sections based on close-range photogrammetry. The Direct Linear Transformation (DLT) method is suitable for non-metric photography but requires several control points on the linings, which is time-consuming. A new way of setting control points is put forward to overcome this shortcoming. A laser source forms a bright outline on the tunnel's inner surface, and the polar coordinates of control points on the outline are obtained by a laser range-finder mounted on a 360° protractor. These coordinates are used to solve for the unknown parameters of the DLT equations, from which a precise outline of the tunnel cross section can be obtained. A series of tests in the subway tunnel of Shanghai Metro Line 1 was carried out to validate that the method is precise and effective.

Ya-dong Xue, Sen Zhang, Zhe-ting Qi

Image Noise Estimation Based on Principal Component Analysis and Variance-Stabilizing Transformation

Image denoising must take into account the dependence of the noise distribution on the original image, and the performance of most video denoising algorithms depends on the noise parameters of the noisy video, which makes accurate noise-parameter estimation particularly important. We propose a novel noise estimation method that combines principal component analysis (PCA) and a variance-stabilizing transformation (VST), and extend it to mixed-noise estimation. We also introduce the excess kurtosis to ensure the accuracy of noise estimation, estimating the VST parameters by minimizing the excess kurtosis of the noise distribution. Subjective and objective results show that the proposed noise estimation, combined with classic video denoising algorithms, yields better results and broadens the applicability of video denoising.
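Two of the ingredients named here are standard and easy to sketch: the sample excess kurtosis (zero for a Gaussian, which is what minimizing it targets) and the classic Anscombe VST for Poisson noise. This is a generic illustration, not the paper's parameterized VST.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: ~0 for Gaussian data; minimizing it over
    VST parameters drives the transformed noise toward Gaussianity."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s2 = ((x - m) ** 2).mean()
    return ((x - m) ** 4).mean() / s2 ** 2 - 3.0

def anscombe(x):
    """Classic Anscombe VST: maps Poisson noise to approximately
    unit-variance Gaussian noise."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)
```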

Ling Ding, Huying Zhang, Bijun Li, Jinsheng Xiao, Jian Zhou

Efficient High Dynamic Range Video Using Multi-exposure CNN Flow

High dynamic range (HDR) imaging has seen much progress in recent years, but an efficient way to capture and generate HDR video is still needed. In this paper, we present a fast and concise method to generate HDR video from a frame sequence of alternating exposures. It takes advantage of recent advances in deep learning to achieve superior efficiency compared with other state-of-the-art methods. By training an end-to-end CNN model to estimate optical flow between frames of different exposures, we achieve dense image registration between them. Building on this, we develop an efficient method to reconstruct the aligned LDR frames at different exposures and then merge them to produce the corresponding HDR frame. Our approach shows good performance and time efficiency while maintaining a relatively concise framework.
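The final merge step, once frames are aligned, is typically a weighted average in the radiance domain. The sketch below assumes a linear camera response and a triangle weighting; it illustrates the standard merge, not the paper's reconstruction stage.

```python
import numpy as np

def merge_hdr(frames, exposure_times):
    """Merge aligned LDR frames (linear values in [0, 1]) into an HDR
    radiance map; the triangle weight down-weights pixels that are
    nearly under- or over-exposed."""
    num = np.zeros_like(frames[0], dtype=float)
    den = np.zeros_like(frames[0], dtype=float)
    for img, t in zip(frames, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)   # peak weight at mid-gray
        num += w * img / t                  # per-frame radiance estimate
        den += w
    return num / np.maximum(den, 1e-12)
```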

Yuchen Guo, Zhifeng Xie, Wenjun Zhang, Lizhuang Ma

Depth Image Acquisition Method in Virtual Interaction of VR Yacht Simulator

This paper investigates depth image acquisition for realizing natural virtual interaction in a VR yacht simulator. Because existing image denoising algorithms suffer from poor robustness and unstable image edges, an improved bilateral filtering algorithm is proposed. First, the algorithm obtains depth information in the scene using a Kinect sensor to generate a depth image. Second, background removal is applied to keep only the hand data in the depth image. Finally, median filtering and bilateral filtering are used to smooth the depth image. Experimental results show that the proposed method obtains better depth images after background removal and that the algorithm is robust.
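For reference, a brute-force version of the standard (unimproved) bilateral filter that the paper builds on can be sketched as follows; parameter names are illustrative.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter: each output pixel averages its
    neighbours, weighted by both spatial distance and intensity
    difference, so strong edges are preserved."""
    img = np.asarray(img, dtype=float)
    pad = np.pad(img, radius, mode='edge')
    out = np.zeros_like(img)
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    ws = np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma_s ** 2))  # spatial kernel
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            wr = np.exp(-(patch - img[i, j]) ** 2 / (2.0 * sigma_r ** 2))
            weight = ws * wr
            out[i, j] = (weight * patch).sum() / weight.sum()
    return out
```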

Qin Zhang, Yong Yin

An Image Segmentation Method Based on Asynchronous Multiscale Similarity Measure

Image segmentation, a basic operation in computer vision, is widely used in object detection, feature extraction, and other tasks. To improve both the quality and speed of image segmentation, an asynchronous processing mechanism is proposed that uses the image gray histogram and spatial contiguity, avoiding the many iterations of traditional FCM. A multiscale similarity measure, combined with the nonlinear sensitivity of human vision to gray differences and based on a tree-structured representation of irregular rough classifications of image blocks, is proposed to merge the image blocks into the segmentation result. Experimental results show that the proposed algorithm outperforms FCM in terms of both segmentation quality and computation speed.
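The baseline being improved on, FCM, becomes much cheaper when run on the 256-bin gray histogram instead of on every pixel; the sketch below shows that standard histogram-weighted variant (a generic baseline, not the paper's asynchronous mechanism).

```python
import numpy as np

def fcm_histogram(hist, c=2, m=2.0, iters=50):
    """Fuzzy c-means on a gray-level histogram: memberships and centers
    are computed per gray level, weighted by the bin counts."""
    levels = np.arange(len(hist), dtype=float)
    # deterministic init: centers spread across the gray range
    centers = np.linspace(levels.min(), levels.max(), c + 2)[1:-1]
    for _ in range(iters):
        d = np.abs(levels[None, :] - centers[:, None]) + 1e-9   # c x bins
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=0, keepdims=True)                       # memberships
        um = (u ** m) * hist[None, :]                           # weight by counts
        centers = (um * levels[None, :]).sum(axis=1) / um.sum(axis=1)
    return np.sort(centers)
```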

Min Li, Zhongwai Xu, Hongwen Xie, Yuhang Xing

Lagrange Detector in Image Processing

Edge detection is a basic operation in image processing and computer vision. However, to the best of our knowledge, little mathematical work beyond first- and second-order derivative operators has been proposed for edge detection in the past decades. We propose a mathematical model called the Lagrange detector. Based on Lagrange polynomial interpolation theory, this detector computes the Lagrange remainder as the edge strength and captures features at various orders of discrete data or signals. The Lagrange remainder combines the first- and second-order derivatives of the neighborhood by multiplication, and we estimate it using the truncation error of polynomial interpolation. The Lagrange detector performs well in detecting both outlines and fine details. Furthermore, it can detect high-frequency information such as corners, points, and Moiré patterns. The Lagrange detector opens a new window for low-level image processing and will serve as a basis for further studies.
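The "multiply first- and second-order derivatives" idea can be illustrated in 1-D with finite differences; this toy sketch is an interpretation of the abstract for illustration, not the paper's actual remainder estimate.

```python
import numpy as np

def lagrange_edge_strength(f):
    """Toy 1-D edge strength in the spirit of the abstract: multiply the
    magnitudes of the first and second finite differences, so only points
    where both are large respond strongly."""
    f = np.asarray(f, dtype=float)
    d1 = np.zeros_like(f)
    d2 = np.zeros_like(f)
    d1[1:-1] = (f[2:] - f[:-2]) / 2.0          # central first difference
    d2[1:-1] = f[2:] - 2.0 * f[1:-1] + f[:-2]  # second difference
    return np.abs(d1) * np.abs(d2)
```

On a step signal, the response is zero in flat regions and peaks at the two samples flanking the jump.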

Feilong Ma, Linmi Tao, Wu Xia

New Tikhonov Regularization for Blind Image Restoration

Blind image restoration is a challenging problem when the blurring kernel is unknown. In this paper, we propose a new algorithm based on a new Tikhonov regularization term, combining three techniques, namely the split Bregman technique, the fast Fourier transform, and spectral decomposition, to accelerate the computation. Numerical results demonstrate that the proposed algorithm is simple, fast, and effective for blind image restoration.

Yuying Shi, Qiao Liu, Yonggui Zhu

Real-Time Multi-camera Video Stitching Based on Improved Optimal Stitch Line and Multi-resolution Fusion

In this paper, we propose a multi-camera video stitching method based on an improved optimal stitch line and multi-resolution fusion for real-time applications. First, phase correlation is used to estimate the overlapping fields between multiple videos, where SURF feature points are extracted for image registration to save computation time and improve matching accuracy. Then, a fusion algorithm combining the improved optimal stitch line and multi-resolution algorithms is devised to improve visual quality by eliminating ghosting and visible seams. In the fusion stage, GPU acceleration is employed to speed up the stitching. Experiments show that the proposed algorithm achieves better, real-time performance compared with traditional video stitching methods.
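Phase correlation, the first step above, estimates a translation from the peak of the normalized cross-power spectrum. A minimal sketch of the standard technique (cyclic shifts, single global translation) follows; it is not the paper's full registration pipeline.

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the cyclic shift (dy, dx) such that b ~ roll(a, (dy, dx)),
    from the peak of the normalized cross-power spectrum."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    cross = np.conj(A) * B
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep phase only
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy), int(dx)
```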

Dong-Bin Xu, He-Meng Tao, Jing Yu, Chuang-Bai Xiao

Image Quality Assessment of Enriched Tonal Levels Images

The quality assessment of high dynamic range images is a challenging task. The few available no-reference image quality methods for high dynamic range images are generally still at the evaluation stage, and most available image quality assessment methods are designed for low dynamic range images. In this paper, we assess high dynamic range images generated by applying a virtually flexible fill factor on the sensor images. We present a new assessment method and evaluate how much the generated high dynamic range images improve on the originals. The results show that the generated images not only have more tonal levels than the originals but also a significantly increased dynamic range, as reflected in the measured improvement values.

Jie Zhao, Wei Wen, Siamak Khatibi

Computer Graphics and Visualization


A Variational Model to Extract Texture from Noisy Image Data with Local Variance Constraints

Variational image denoising is one of the most successful approaches to recovering an image that has been blurred and corrupted with additive noise. However, the Lagrange multiplier in many variational models is global, so some image regions are restored satisfactorily while others are not. To avoid this, we propose an image denoising model with a set of local constraints (Lagrange multipliers), each corresponding to a dyadic region of the image. The proposed model can thus denoise the image according to the type of each region. The model is solved by a fast gradient-descent-based algorithm. We further propose a hybrid denoising scheme combining a state-of-the-art model with the proposed model. Experimental results demonstrate that the proposed method achieves better restoration quality.
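The idea of per-region multipliers can be sketched with a smoothed-TV gradient descent in which the fidelity weight is a per-pixel map (constant within each region); step sizes, the smoothing parameter, and function names are assumptions, not the authors' exact scheme.

```python
import numpy as np

def tv_denoise_local(f, lam_map, iters=200, tau=0.02, eps=0.1):
    """Gradient descent on sum |grad u|_eps + 0.5 * lam(x) * (u - f)^2,
    where lam_map holds a local (e.g. per dyadic region) fidelity weight."""
    u = f.astype(float).copy()
    for _ in range(iters):
        ux = np.roll(u, -1, axis=1) - u             # forward differences
        uy = np.roll(u, -1, axis=0) - u
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps ** 2)  # smoothed TV magnitude
        px, py = ux / mag, uy / mag
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u += tau * (div - lam_map * (u - f))         # descent step
    return u
```

Regions with a small `lam_map` value are smoothed more aggressively; regions with a large value stay close to the data.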

Tao Zhang, Qiuli Gao

Joint Visualization of UKF Tractography Data

Tractography methods provide ways to explore the white matter of the brain, and UKF tractography is a promising approach for handling crossing fibers and edema regions. To gain more insight into UKF tractography, we present a joint visualization scheme that integrates and visualizes fiber tracts, diffusion tensors, and multiple tensor measures at the same time. In this scheme, a new kind of tensor glyph called the cylingon, similar to a cylinder but with more lateral faces, is designed to hold multiple scalar measures through color mapping of its faces. Cylingons are then combined with fiber tracts to provide global and local views under different visualization options. Initial tests and applications show that our approach can support most visual analysis of UKF tractography data and help uncover findings worthy of further study.

Wen Zhao, Wenyao Zhang, Na Wang, Pin Liao

Semantic Segmentation Based Automatic Two-Tone Portrait Synthesis

This paper presents a semantic-segmentation-based method for automatically synthesizing two-tone cartoon portraits in black-and-white style. Synthesizing two-tone portraits from photographs can be considered a heterogeneous image transformation problem, whose result should be vivid portraits with distinct freehand-like features such as clean backgrounds and continuous lines. To achieve this goal, our system connects two subsystems: semantic segmentation and portrait synthesis. In the semantic segmentation phase, photographs are segmented into background, hair, and skin regions using a multiple-segmentation method. In the portrait synthesis phase, different regions are treated with different strategies. Experimental results demonstrate that our system can precisely segment the input photo and produce visually pleasing two-tone portraits.

Zhuoqi Ma, Nannan Wang, Xinbo Gao, Jie Li

Parameters Sharing Multi-items Non-parametric Factor Microfacet Model for Isotropic and Anisotropic BRDFs

The reflection model of object surfaces is an essential part of photorealistic rendering. Since analytical models cannot express all significant effects of all materials, we turn to data-driven models, which, however, consume a large amount of memory and computational resources. The non-parametric factor microfacet model designed by Bagher et al. [1] is intended to solve these problems. In this paper, we present a new non-parametric factor microfacet model that has three specular lobes yet retains the original number of parameters by sharing factors among the three color channels. The fitting method, AWLS, is also extended to solve for the G factor, which makes the fitting process more robust. Moreover, we use the D factor of our model for importance sampling, as with analytical models, and find it effective for specular materials. Finally, we generalize our model and fitting method to fit 150 anisotropic materials. With only 2010 parameters (8 KB), it reconstructs the original data (2 MB) well, which further demonstrates the expressiveness of our microfacet model.

Junkai Peng, Changwen Zheng, Pin Lv

SRG and RMSE-Based Automated Segmentation for Volume Data

Feature visualization plays an important role in volume visualization, as it can alleviate problems in direct volume rendering, yet extracting predefined features is a difficult issue. Volume segmentation is of great significance for feature visualization because it is the pre-processing step of feature extraction. Since volume segmentation can essentially be viewed as generalized clustering, this paper proposes an automated method based on symmetric region growing (SRG) and root mean square error (RMSE). The hybrid approach has two steps. First, the volume dataset is over-segmented into a series of subsets by SRG, with three strategies applied to optimize the region-growing criteria. Second, the subsets are clustered by K-Means using a new distance metric derived from RMSE, completing the segmentation quickly and efficiently. The method is unsupervised, and experiments show that the proposed distance metric is well founded and that the method achieves better segmentation performance.

Wang Li, Xiaoan Tang, Junda Zhang

Shape Recovery of Endoscopic Videos by Shape from Shading Using Mesh Regularization

Endoscopic videos are widely used for stomach diagnosis. Obtaining 3D shapes is particularly important, as it enables observation from different perspectives and thus facilitates comprehensive and accurate diagnoses. However, obtaining 3D shapes is challenging for traditional multi-view 3D reconstruction methods due to strong motion blur, reflections, low spatial resolution, non-rigid surfaces, and limited view-angle shifts. In this work, we propose a mesh regularization for shape recovery based on cues derived from Shape-from-Shading (SfS). We recover shapes for all frames to generate a 3D video. In particular, a 3D mesh is optimized for every frame according to the raw 3D data obtained from SfS. Although the raw data contain errors and temporal jitter, our spatially and temporally optimized meshes closely approximate the underlying non-rigid surfaces, yielding temporally stabilized meshes for 3D video display. Our experiments demonstrate the effectiveness of our method on many challenging endoscopic videos.

Zhihang Ren, Tong He, Lingbing Peng, Shuaicheng Liu, Shuyuan Zhu, Bing Zeng

Lazy Recoloring

This paper presents a simple, intuitive, and interactive tool that allows both non-experts and experts to recolor an image or image series using a slider-based GUI. Unlike state-of-the-art methods, the proposed method has two unique advantages: a slider-based GUI that is very easy to use and guides users in dragging the sliders to their favorite recolored images, and automatic, consecutive recoloring transitions across a series of images. The method comprises a slider-based GUI that even novice users can understand, and an efficient optimization algorithm for creating a recolored image from the source color image. The proposed method is shown to be much faster and easier to use than state-of-the-art methods, making it a genuinely lazy recoloring tool. In particular, it also suits colorblind users, who can tentatively drag the sliders to obtain images with more contrast and detail.

Guanlei Xu, Xiaotong Wang, Xiaogang Xu, Lijia Zhou

Similar Trademark Image Retrieval Integrating LBP and Convolutional Neural Network

Trademarks play a very important role in commerce: they distinguish goods among different producers and operators and represent the reputation, quality, and reliability of firms. In this paper, we utilize a convolutional neural network to extract visual features. We then present a method to extract uniform LBP features from the feature maps of each convolutional layer of the pre-trained CNN model. The experiments indicate that the proposed methods enhance the robustness of the features and overcome the drawbacks of the comparison approach. The proposed methods also achieve better recall, precision, and F-measure on trademark databases comprising 7139 trademark images and the METU trademark database.

Tian Lan, Xiaoyi Feng, Zhaoqiang Xia, Shijie Pan, Jinye Peng

Adaptive Learning Compressive Tracking Based on Kalman Filter

Object tracking has theoretical and practical value in video surveillance, virtual reality, and automatic navigation. Compressive tracking (CT) is widely used because of its accuracy and efficiency, but it suffers from tracking drift under object occlusion, abrupt motion, blur, and similar objects. In this paper, we propose adaptive learning compressive tracking based on the Kalman filter (ALCT-KF). CT is used to locate the object, and the classifier parameters are updated adaptively via the confidence map. When heavy occlusion occurs, a Kalman filter is used to predict the location of the object. Experimental results show that ALCT-KF has better tracking accuracy and robustness than current state-of-the-art algorithms, with an average tracking speed of 39 frames/s, which meets real-time requirements.
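The occlusion-bridging role of the Kalman filter can be sketched with the standard constant-velocity model over image coordinates; the class name and noise parameters are assumptions for illustration, not the ALCT-KF configuration.

```python
import numpy as np

class ConstantVelocityKF:
    """Constant-velocity Kalman filter over (x, y): predict-only steps
    bridge frames where the detector is occluded."""
    def __init__(self, x0, y0, q=1e-3, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])            # pos + velocity
        self.P = np.eye(4)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0                # dt = 1 frame
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0                # observe position
        self.Q = q * np.eye(4)
        self.R = r * np.eye(2)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, zx, zy):
        z = np.array([zx, zy])
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```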

Xingyu Zhou, Dongmei Fu, Yanan Shi, Chunhong Wu

Online High-Accurate Calibration of RGB+3D-LiDAR for Autonomous Driving

Vision+X has become a promising approach to scene understanding in autonomous driving, where X is some other non-vision sensor. However, it is difficult to exploit the full capability of different sensors, mainly because of their heterogeneous and asynchronous properties. To this end, this paper calibrates the commonly used RGB+3D-LiDAR data through synchronization and an online spatial-structure alignment, achieving highly accurate calibration. The main highlights are that (1) we rectify the 3D points with the aid of a differential inertial measurement unit (IMU) and increase the frequency of the 3D laser data to match that of the RGB data, and (2) the external calibration parameters are updated online with high accuracy through a more reliable spatial-structure matching of the RGB and 3D-LiDAR data. In-depth experimental analysis validates the superiority of the proposed method.

Tao Li, Jianwu Fang, Yang Zhong, Di Wang, Jianru Xue

Run-Based Connected Components Labeling Using Double-Row Scan

This paper presents a novel run-based connected-components labeling algorithm that uses a double-row scan. In this algorithm, the run is defined over double rows and the binary image is scanned twice. In the first scan, provisional labels are assigned to runs according to the connectivity between the current run and the runs in the previous two rows, and equivalent provisional labels are recorded. The adjacency matrix of the provisional labels is then generated and decomposed with the Dulmage-Mendelsohn decomposition to find the equivalent-label sets in linear time. In the second scan, each equivalent-label set is labeled with a number starting from 1, which can be efficiently done in parallel. The proposed algorithm is compared with state-of-the-art algorithms on both synthetic images and real image datasets. Results show that it outperforms the other algorithms on images with a low density of foreground pixels and a small number of connected components.
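For context, the classic single-row run-based scheme that the double-row variant extends can be sketched as follows, using union-find for the equivalences instead of the paper's Dulmage-Mendelsohn step; this is the textbook baseline, not the proposed algorithm.

```python
import numpy as np

def label_runs(img):
    """Classic run-based connected-components labeling (single-row runs,
    4-connectivity): merge a run with any overlapping run of the
    previous row via union-find, then flatten to labels 1, 2, ..."""
    img = np.asarray(img)
    parent = [0]                       # parent[0] unused; labels start at 1
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    labels = np.zeros(img.shape, dtype=int)
    prev_runs = []                     # (start, end, label) of previous row
    for i, row in enumerate(img):
        runs, j, w = [], 0, len(row)
        while j < w:
            if row[j]:
                s = j
                while j < w and row[j]:
                    j += 1
                lab = len(parent)
                parent.append(lab)
                for ps, pe, pl in prev_runs:
                    if ps < j and pe > s:       # vertical overlap
                        union(lab, pl)
                labels[i, s:j] = lab
                runs.append((s, j, lab))
            else:
                j += 1
        prev_runs = runs
    remap = {}                          # second pass: consecutive labels
    for i in range(labels.shape[0]):
        for jj in range(labels.shape[1]):
            if labels[i, jj]:
                r = find(labels[i, jj])
                labels[i, jj] = remap.setdefault(r, len(remap) + 1)
    return labels
```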

Dongdong Ma, Shaojun Liu, Qingmin Liao

A 3D Tube-Object Centerline Extraction Algorithm Based on Steady Fluid Dynamics

Three-dimensional tubular objects are widely used in industrial design, medical simulation, virtual reality, and other fields. Because of complex tubular structures with bifurcations, irregular surfaces, and unevenly distributed inner diameters, accurately extracting the centerlines of tubular objects is challenging. In this paper, we propose a novel two-stage algorithm for efficient and accurate centerline extraction based on steady fluid dynamics. First, liquid pressure cloud data are obtained with the Finite Volume Method (FVM) by simulating a Newtonian fluid in the inner space of the 3D tube, and Delaunay tetrahedralization and the marching tetrahedra method are used to extract isobaric surfaces. Second, selected center points of these isosurfaces are organized in order to construct a directed centerline tree, from which the final continuous, smooth centerline is automatically generated with Catmull-Rom splines. Experimental results show that our approach extracts the centerlines of tubular objects with high accuracy and little manual intervention, and is especially robust on complex tubular structures.
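The final smoothing step uses Catmull-Rom splines, whose uniform segment form is standard; the sketch below evaluates one segment between control points p1 and p2 (a generic textbook formula, not the paper's tree construction).

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, t):
    """Evaluate the uniform Catmull-Rom segment between p1 and p2 at
    t in [0, 1]; the spline interpolates its control points with C1
    continuity, which yields a smooth centerline through center points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    t2, t3 = t * t, t * t * t
    return 0.5 * ((2.0 * p1)
                  + (-p0 + p2) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t2
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t3)
```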

Dongjin Huang, Ruobin Gong, Hejuan Li, Wen Tang, Youdong Ding

Moving Objects Detection in Video Sequences Captured by a PTZ Camera

To detect moving objects in video sequences captured by a pan-tilt-zoom (PTZ) camera, a modified ViBe (Visual Background Extractor) algorithm, a pixel-based background modelling algorithm, is proposed in this paper. We divide a changing background scene into three parts. The first part is the new background region that appears when the PTZ camera's field of view changes; we re-initialize the background model for this part. The second is the area that disappears from the current frame; we discard its models to save memory. The third part is the overlapping background region of consecutive frames. By matching SURF feature points extracted only in the background region, we obtain an accurate homography matrix between consecutive frames. To ensure that the corresponding model from the former frame can be used for the current pixel, the homography matrix should provide a forward mapping between adjacent frames. Experimental results show that, compared with the original ViBe algorithm and other state-of-the-art background subtraction methods, our method is more effective for video sequences captured by a PTZ camera. More importantly, it can be incorporated into most pixel-based background modelling algorithms to enhance their performance on videos captured by a moving camera.
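The per-pixel classification test at the core of the original ViBe model can be sketched as follows; the parameter values mirror the commonly cited defaults and are assumptions here, and this is the static-camera baseline, not the paper's PTZ extension.

```python
import numpy as np

def vibe_classify(sample_bank, frame, radius=20, min_matches=2):
    """Core ViBe test: a pixel is background if at least `min_matches` of
    its stored past samples lie within `radius` of the current value.
    sample_bank: (N, H, W) samples per pixel; frame: (H, W)."""
    close = np.abs(sample_bank.astype(int) - frame.astype(int)) < radius
    return close.sum(axis=0) >= min_matches   # True = background
```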

Li Lin, Bin Wang, Fen Wu, Fengyin Cao

Fast Grid-Based Fluid Dynamics Simulation with Conservation of Momentum and Kinetic Energy on GPU

Since the computation of fluid animation is often too heavy for real-time simulation, we propose a fast grid-based method with parallel acceleration. To reduce the computational cost while keeping a balance between fluid stability and diversity, we solve the Navier-Stokes equations on a grid structure with momentum conservation, and introduce kinetic energy into collision handling and boundary conditions. Our algorithm avoids mass loss during energy transfer and can be applied to two-way coupling with solid bodies. Importantly, we propose a forward-tracing-based motion scheme designed for parallel computing on the Graphics Processing Unit (GPU). Experiments illustrate the benefits of our method in conserving both fluid density and momentum, and show that it is suitable for resolving energy transfer when object interaction is considered during fluid simulation.

Ka-Hou Chan, Sio-Kei Im

Adaptive Density Optimization of Lattice Structures Sustaining the External Multi-load

In recent years, additive manufacturing has attracted increasing attention and promoted the development of lightweight modeling methods. Studies have used internal filling structures to optimize 3D models: while reducing the weight of a model, they can also satisfy physical requirements such as withstanding external loads. This paper presents an adaptive infill-structure optimization method based on the triply periodic minimal surface (TPMS), which can be represented by a purely mathematical implicit function. The morphology of these lattice structures in different stress regions can be adjusted adaptively, reducing the weight of 3D-printed objects while sustaining external multi-load constraints. First, the finite element method is used to analyze the stress distribution of the original model infilled with a uniform lattice structure. According to the stress values, the internal lattice structure is divided into three regions: a high region (HR), a transition region (TR), and a low region (LR). Then, the inner structure in the different stress regions is adjusted adaptively to relieve stress concentration. Finally, we demonstrate that the proposed algorithm can effectively reduce the weight of a 3D model while sustaining its mechanical strength.

Li Shi, Changdong Zhang, Tingting Liu, Wenhe Liao, Xiuyi Jia

Hyperspectral Image Processing


Hyperspectral Image Classification Based on Deep Forest and Spectral-Spatial Cooperative Feature

Recently, deep-learning-based methods have shown promising performance for hyperspectral image (HSI) classification. However, these methods usually require a large number of training samples, and their complex structure and long training times have restricted their application. Deep forest, a decision-tree ensemble approach, achieves performance highly competitive with deep neural networks and works well and efficiently even with only small-scale training data. In this paper, a novel simplified deep framework is proposed that achieves higher accuracy when the number of training samples is small. The framework employs local binary patterns (LBP) and Gabor filters to extract local-global image features. The extracted features are stacked with the original spectral features to concatenate multiple features. Finally, deep forest extracts deeper features and uses a layer-by-layer voting strategy for HSI classification.
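The basic 8-neighbour LBP code used as a texture feature here can be sketched as follows; this is the plain (non-uniform, non-rotation-invariant) variant for illustration, not the exact descriptor of the paper.

```python
import numpy as np

def lbp_codes(img):
    """8-neighbour LBP: threshold each neighbour against the centre pixel
    and pack the eight results into an 8-bit code (border pixels skipped)."""
    img = np.asarray(img, dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    code = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy,
                 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= center).astype(int) << bit
    return code
```

A histogram of these codes over a window is the usual LBP feature vector; the "uniform" variant the paper names additionally bins codes by their number of 0/1 transitions.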

Mingyang Li, Ning Zhang, Bin Pan, Shaobiao Xie, Xi Wu, Zhenwei Shi

Hyperspectral Image Classification Using Multi Vote Strategy on Convolutional Neural Network and Sparse Representation Joint Feature

Classification is one of the most popular topics in hyperspectral image (HSI) analysis. This paper proposes a method that uses a multi-vote strategy on a joint convolutional neural network and sparse representation feature for hyperspectral image classification. First, the labeled spectral information and the spatial information are extracted by Principal Component Analysis (PCA); at the same time, the joint convolutional-neural-network and sparse-representation feature is fed to an SVM. Then, the multi-vote strategy is used to obtain the final result. Experimental results on public databases demonstrate that the proposed method provides better classification accuracy than previous hyperspectral classification methods.

Daoming Ye, Rong Zhang, Dixiu Xue

Efficient Deep Belief Network Based Hyperspectral Image Classification

Hyperspectral Image (HSI) classification plays a key role in the remote sensing field. Recently, deep learning has demonstrated its effectiveness in HSI classification. This paper presents a spectral-spatial HSI classification technique built on a deep belief network (DBN) for deep, abstract feature extraction, combined with adaptive boundary-adjustment-based segmentation. The proposed approach focuses on integrating deep-learning-based spectral features and segmentation-based spatial features into one framework for improved performance. Specifically, the deep DBN model is first exploited as a spectral-feature-based classifier to extract deep spectral features. Second, spatial contextual features are obtained with an effective adaptive boundary-adjustment-based segmentation technique. Finally, a maximum-voting criterion is applied to integrate the extracted spectral and spatial information for improved HSI classification. Exploiting spectral features from the DBN, spatial features from segmentation, and their integration by maximum voting has a substantial effect on the performance of HSI classification. Experiments on real and widely used hyperspectral datasets with different contexts and resolutions demonstrate the accuracy of the proposed technique, with performance comparable to several recently proposed HSI classification techniques.

Atif Mughees, Linmi Tao

Classification of Hyperspectral Imagery Based on Dictionary Learning and Extended Multi-attribute Profiles

In recent years, sparse representation has shown its competitiveness in the field of image processing, and attribute profiles have also demonstrated their reliable performance in utilizing spatial information in hyperspectral image classification. In order to fully integrate spatial information, we propose a novel framework which integrates the above-mentioned methods for hyperspectral image classification. Specifically, sparse representation is used to learn a posteriori probability with extended attribute profiles as input features. A classification error term is added to the sparse representation-based classifier model and is solved by the k-singular value decomposition algorithm. The spatial correlation of neighboring pixels is incorporated by a maximum a posteriori scheme to obtain the final classification results. Experimental results on two benchmark hyperspectral images suggest that the proposed approach outperforms the related sparsity-based methods and support vector machine-based classifiers.

Qishuo Gao, Samsung Lim, Xiuping Jia

Deep Residual Convolutional Neural Network for Hyperspectral Image Super-Resolution

Hyperspectral images are very useful for many computer vision tasks; however, it is often difficult to obtain high-resolution hyperspectral images with existing hyperspectral imaging techniques. In this paper, we propose a deep residual convolutional neural network to increase the spatial resolution of hyperspectral images. Our network consists of 18 convolution layers and requires only one low-resolution hyperspectral image as input. Super-resolution is achieved by minimizing the difference between the estimated image and the ground-truth high-resolution image. Besides the mean square error between these two images, we introduce a loss term that measures the angle between the estimated spectrum vector and the ground-truth one to maintain the correctness of spectral reconstruction. In experiments on two public datasets we show that the proposed network delivers better hyperspectral super-resolution results than several state-of-the-art methods.

Chen Wang, Yun Liu, Xiao Bai, Wenzhong Tang, Peng Lei, Jun Zhou
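The spectral-angle loss term described in the abstract above can be illustrated with a small sketch (function name, array shapes, and the epsilon guard are our assumptions, not the authors' code): the angle between the estimated and ground-truth spectrum vectors, averaged over all pixels.

```python
import numpy as np

def spectral_angle_loss(est, ref, eps=1e-8):
    # est, ref: (H, W, B) hyperspectral cubes.
    # Per-pixel angle between the B-dimensional spectrum vectors,
    # averaged over the image; 0 means spectra are perfectly aligned.
    dot = (est * ref).sum(-1)
    denom = np.linalg.norm(est, axis=-1) * np.linalg.norm(ref, axis=-1) + eps
    ang = np.arccos(np.clip(dot / denom, -1.0, 1.0))
    return ang.mean()
```

Note that the angle is invariant to per-pixel scaling of the spectrum, which is why such a term complements, rather than replaces, the mean-square-error term.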

Multi-view and Stereoscopic Processing


Stereoscopic Digital Camouflage Pattern Generation Algorithm Based on Color Image Segmentation

Pattern-painting camouflage is an important way to improve the survivability of military targets. However, existing flat pattern painting is not sufficient to counter stereo-photograph reconnaissance. This paper therefore studies stereoscopic digital pattern painting, which can distort the 3D appearance of a target. The 3D models of the stereoscopic digital camouflage pattern are introduced. The design principles of parallax in the pattern are analyzed and used to design the sequence-image generation algorithm. The results show that the stereoscopic camouflage pattern can distort the appearance of the target in three dimensions.

Qin Lei, Wei-dong Xu, Jiang-hua Hu, Chun-yu Xu

Uncertain Region Identification for Stereoscopic Foreground Cutout

This paper presents a method that automatically segments the foreground objects in stereoscopic images. Given a stereo pair, a disparity map can be estimated, which encodes the depth information. Objects close to the camera are considered foreground, while regions with larger depths are deemed background. Although the raw disparity map is usually noisy, incomplete, and inaccurate, it facilitates the automatic generation of trimaps for both views, in which the images are partitioned into three regions: definite foreground, definite background, and uncertain region. Our job is thus reduced to labelling the pixels in the uncertain region, whose number has been greatly decreased. We propose an MRF-based energy minimization for labelling the unknown pixels, which involves both local and global color probabilities within and across views. Results are evaluated with objective metrics on a ground-truth stereo segmentation dataset, which validates the effectiveness of our proposed method.

Taotao Yang, Shuaicheng Liu, Chao Sun, Zhengning Wang, Bing Zeng
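The disparity-driven trimap generation step described above can be sketched as a simple thresholding (the two thresholds and the 0/128/255 coding are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def disparity_trimap(disp, fg_thresh, bg_thresh):
    # Definite foreground: large disparity (close to the camera);
    # definite background: small disparity; everything in between
    # becomes the uncertain region left for the MRF labelling stage.
    trimap = np.full(disp.shape, 128, dtype=np.uint8)   # uncertain
    trimap[disp >= fg_thresh] = 255                     # definite foreground
    trimap[disp <= bg_thresh] = 0                       # definite background
    return trimap
```

Only the pixels coded 128 then need to be resolved by the energy minimization, which is what makes the overall labelling problem tractable.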

Map-Build Algorithm Based on the Relative Location of Feature Points

Dynamically building a road map is a challenging task in computer vision. In this paper we propose an effective algorithm to build a road map dynamically from a vehicular camera. We obtain a top view of the video by Inverse Perspective Mapping (IPM), then extract feature points from the top view and generate a feature map. The feature map describes the distribution of feature points and provides information on the camera status. By detecting the camera status, we apply different degrees of constraint on the geometric transformation to avoid unnecessary time cost and enhance robustness. Experiments demonstrate our algorithm's effectiveness and robustness.

Cheng Zhao, Fuqiang Zhao, Bin Kong

Sparse Acquisition Integral Imaging System

Obtaining an elemental image (EI) array with 3DS MAX in a virtual integral imaging system requires a large-scale camera array, which is difficult to apply in practice. To solve this problem, we establish a sparse-acquisition integral imaging system. To improve the accuracy of disparity calculation, we propose a method that uses color segmentation and integral projection to calculate the average disparity of each color object between two adjacent images. First, the virtual scene and the microlens array model are built in 3DS MAX. According to the mapping relationship between EIs and sub images (SIs), we obtain the SIs first and then compute the EIs. The average disparity of the different color objects between adjacent images is obtained with the color-segmentation and integral-projection methods, and a fixed-size rectangular window is then translated according to the average disparities to crop the rendered output images into the sub images (SIs). Finally, after stitching and mapping the SIs, we obtain the elemental images (EIs) and feed them to the display device to show the 3-dimensional (3D) scene. The experimental results show that only 12 * 12 cameras are needed instead of 59 * 41 to obtain the EIs, and the 3D display effect is evident. The error rate of the disparity calculation is 0.433% in both horizontal and vertical directions, clearly better than other methods with disparity error rates of 2.597% and 4.762%. The sparse-acquisition integral imaging system is more accurate and more convenient, and can be used for EI content acquisition for large-screen 3D display.

Bowen Jia, Shigang Wang, Wei Wu, Tianshu Li, Lizhong Zhang

Marker-Less 3D Human Motion Capture in Real-Time Using Particle Swarm Optimization with GPU-Accelerated Fitness Function

In model-based 3D motion tracking, the most computationally demanding operation is the evaluation of the objective function, which expresses the similarity between the projected 3D model and the image observations. In this work, marker-less tracking of the full body has been realized in a multi-camera system using Particle Swarm Optimization. To accelerate the calculation of the fitness function, the rendering of the 3D model in the requested poses has been realized using OpenGL. The experimental results show that the calculation of the fitness score with CUDA-OpenGL is up to 40 times faster than calculating it on a multi-core CPU using OpenGL-based model rendering. Thanks to this CUDA-OpenGL acceleration of the fitness function, the reconstruction of full-body motion can be achieved in real time.

Bogdan Kwolek, Boguslaw Rymut
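A generic Particle Swarm Optimization loop of the kind used above for fitness minimization can be sketched as follows (a textbook variant with assumed inertia and acceleration coefficients, not the authors' GPU implementation; in the tracker, `fitness` would be the rendered-model-vs-observation score):

```python
import numpy as np

def pso(fitness, dim, n_particles=30, iters=100, bounds=(-5, 5), seed=0):
    # Standard global-best PSO: each particle keeps its personal best,
    # and the swarm shares one global best that pulls all velocities.
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()
```

The fitness evaluations inside the loop are independent across particles, which is exactly what makes the GPU batching described in the abstract pay off.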

Warping and Blending Enhancement for 3D View Synthesis Based on Grid Deformation

This paper proposes an efficient view synthesis scheme based on image warping, which uses grid-mesh deformation to guide the mapping process. As the first contribution, we use the moving least squares algorithm to get the initial warping position of the reference image. As the second contribution, a novel grid-line constraint is added to the energy equation predefined in a typical image-domain warping algorithm proposed by Disney Research. As the third contribution, we propose a novel image blending method based on correlation matching to directly solve the stretch problem that emerges at the image border of the final synthesis result. Experimental results show that our proposed method achieves better visual quality in image space only, a significant advantage over the state-of-the-art view synthesis methods, which need not only the corresponding depth maps but also additional camera intrinsic and extrinsic parameters.

Ningning Hu, Yao Zhao, Huihui Bai

A Vehicle-Mounted Multi-camera 3D Panoramic Imaging Algorithm Based on Ship-Shaped Model

A 3D panoramic driving system provides real-time monitoring of a vehicle's surroundings, allowing drivers to drive more safely without blind spots. The system captures images of the surroundings with cameras mounted on the vehicle, then maps and stitches the collected images as textures onto a 3D model. In this process, the key tasks are to construct a 3D surface model and to design an efficient and accurate texture-mapping algorithm. This paper presents a ship-shaped 3D surface model with less distortion and better visual effects. Based on the ship-shaped model, a texture-mapping algorithm is proposed which obtains the mapping relation between the corrected image and the 3D surface model indirectly by setting up a “virtual imaging plane”. The texture-mapping algorithm is accurate and runs fast. Finally, an improved weighted-average image fusion algorithm is used to eliminate splicing traces. Experiments show that the proposed algorithm based on the ship-shaped model yields a better 3D panoramic effect.

Xin Wang, Chunyu Lin, Yi Gao, Yaru Li, Shikui Wei, Yao Zhao

A Quality Evaluation Scheme to 3D Printing Objects Using Stereovision Measurement

The paper presents a comprehensive evaluation method for shape consistency using three-dimensional scanning, reverse engineering and post-processing. The complete evaluation scheme includes data collection, model alignment and quality-consistency evaluation. First, the point cloud data is obtained by 3D scanning. Second, the printed object and the model are aligned, and the visual point-to-point deviation is obtained. Third, several surface roughness parameters are defined to evaluate the overall quality of a personalized 3D printed object. Two objects printed by FDM and DLP printers are used to test the proposed scheme. The experimental results show that DLP printing is more precise than FDM printing, which is consistent with common sense, and also demonstrate the efficiency of the proposed scheme to some extent.

Li-fang Wu, Xiao-hua Guo, Li-dong Zhao, Meng Jian

Representation, Analysis and Applications of Large-Scale 3D Multimedia Data


Secure Image Denoising over Two Clouds

Multimedia processing in the cloud is now prevalent, as cloud servers can provide abundant resources for various multimedia processing tasks. However, privacy issues must be considered in cloud computing: for a secret image, the image content should be kept secret while the processing is carried out in the cloud. Multimedia processing in the encrypted domain is therefore essential to protect privacy in cloud computing. Hu et al. proposed a novel framework to perform complex image processing algorithms on encrypted images with two cryptosystems: additive homomorphic encryption and a privacy-preserving transform. The additive homomorphic cryptosystem used in their scheme causes huge ciphertext expansion and greatly increases the cloud's computation. In this paper, we modify their framework into a two-cloud scheme and also implement the random nonlocal means denoising algorithm. The complexity analysis and simulation results demonstrate that our new scheme is more efficient than Hu's under the same denoising performance.

Xianjun Hu, Weiming Zhang, Honggang Hu, Nenghai Yu

Aesthetic Quality Assessment of Photos with Faces

Aesthetic quality assessment of photos is a challenging task owing to the complexity of photos and the subjectivity of human aesthetic perception. Recent research has suggested that photos with different contents have different aesthetic characteristics, yet these differences have not been considered in previous work on aesthetic assessment. Meanwhile, photos with human faces have become increasingly popular and constitute an important part of social photo collections. In this work, we analyze the characteristics of this particular category of photos and study their impact on aesthetic quality estimation. This study has many potential applications, including the selection of highly aesthetic face photos, face photo editing, and so on. We design new handcrafted features and fine-tune a deep Convolutional Neural Network (CNN) to extract features, then build a decision fusion model that employs all the proposed features for aesthetic estimation. In addition, we analyze the effectiveness of different groups of features in a face photo classification task to better understand their differences. Experimental results show that our proposed features are effective and the classifier outperforms several up-to-date approaches in aesthetic quality assessment.

Weining Wang, Jiexiong Huang, Xiangmin Xu, Quanzeng You, Jiebo Luo

Sensitive Information Detection on Cyber-Space

The fast development of big data brings abundant information to numerous Internet users, but also new problems and challenges for cyber-security. Under the cover of Internet big data, many lawbreakers disseminate violent and religious-extremist content through the Internet, polluting the network space and harming social stability. In this paper, we propose two algorithms, an iterative semi-supervised deep learning model and a humming-melody based search model, to detect abnormal visual and audio objects respectively. Experiments on different datasets show the effectiveness of our algorithms.

Mingbao Lin, Xianming Lin, Yunhang Shen, Rongrong Ji

Optimization Algorithm Toward Deep Features Based Camera Pose Estimation

Deep convolutional neural networks have proven to be an end-to-end localization method that tolerates large baselines. However, such methods rely on the training data, and the resulting camera pose estimates are not robust or smooth enough, which stems from the absence of back-end optimization (e.g., local bundle adjustment). This paper proposes a deep-features-based SLAM optimization method and improves pose estimation precision via a constraint function. The contribution of our paper is two-fold: (1) We present a similarity-based constraint function for fast 2D-3D point mapping and a new optimization approach that estimates camera exterior parameters using multiple-feature fusion. (2) For the instability problem in CNN-based SLAM, a multiple-feature ensemble bundle adjustment optimization algorithm is presented. Most existing localization approaches simply approximate pose confidence from reference point distances. Unlike previous work, we employ reconstruction data as a reference: the visible 3D points and their related key-points from offline datasets are mapped by random forests, and multiple-feature fusion is used to measure the assessment score with a constraint function. This method is used to optimize deep-features-based SLAM. Experimental results demonstrate the robustness of our algorithm in handling various challenges in camera pose estimation.

Han Chen, Feng Guo, Ying Lin, Rongrong Ji



A Robust 3D Video Watermarking Scheme Based on Multi-modal Visual Redundancy

Image watermarking is a popular research topic in signal processing. This paper presents a blind watermarking scheme for 3D videos. Given a 3D video, each frame of both views is divided into blocks, and watermark information is embedded by modulating selected DCT coefficients of each block. The modulation strength is controlled by the multi-modal visual redundancies of the 3D video. Specifically, we compute an intra-frame Just-Noticeable Distortion (JND) value and an inter-frame reference value for each block to determine the strength. The former reflects the visual redundancies in the image plane; the latter represents the visual redundancies of the block in terms of motion between sequential frames and disparity between the left and right views. We validate the robustness of the proposed watermarking scheme under various attacks through experiments. More importantly, the visual quality of the 3D videos watermarked by our scheme is shown to be as good as that of the original videos, according to a proposed Loss of Disparity Quality (LDQ) criterion specially designed for 3D videos, as well as the PSNR of single views.

Congxin Cheng, Wei Ma, Yuchen Yang, Shiyang Zhang, Mana Zheng

Partial Secret Image Sharing for (n, n) Threshold Based on Image Inpainting

Shamir’s polynomial-based secret image sharing (SIS) scheme and visual secret sharing (VSS), also called visual cryptography scheme (VCS), are the primary branches of SIS. In traditional (k, n) threshold secret sharing, a secret image is entirely encoded into n shadow images (shares) distributed to n associated participants, and the secret image can be recovered by collecting any k or more shadow images. Previous SIS schemes deal with the full secret image, neglecting the possible situation in which only part of the secret image needs protection: in some applications, only a target part of the secret image must be protected while the other parts need not be. In this paper, we consider the partial secret image sharing (PSIS) issue and propose a PSIS scheme for (n, n) threshold based on image inpainting and linear congruence (LC). First, the target part is manually selected or marked in the color secret image. Second, the target part is automatically removed from the original secret image to obtain identical input cover images (unpainted shadow images). Third, the target secret part is encoded into the corresponding pixels of the shadow images by LC during shadow-image texture synthesis (inpainting), so as to obtain the shadow images in a visually plausible way. As a result, the full secret image, including the target secret part and the other parts, can be recovered losslessly by adding all the inpainted meaningful shadow images. Experiments are conducted to evaluate the efficiency of the proposed scheme.

Xuehu Yan, Yuliang Lu, Lintao Liu, Shen Wang, Song Wan, Wanmeng Ding, Hanlin Liu
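The linear-congruence (n, n) sharing idea above, where the modular sum of all shadow pixels recovers the secret pixel, can be sketched as follows (an illustrative construction consistent with additive sharing mod 256, not the paper's exact scheme, which additionally hides the shares inside inpainted textures):

```python
import numpy as np

def lc_share(secret, n, seed=0):
    # secret: uint8 array. Returns n shadow arrays whose mod-256 sum
    # equals the secret: n-1 shares are uniform random, the last one
    # is chosen so the congruence holds pixel-wise.
    rng = np.random.default_rng(seed)
    shares = [rng.integers(0, 256, secret.shape, dtype=np.uint8)
              for _ in range(n - 1)]
    last = (secret.astype(np.int64)
            - sum(s.astype(np.int64) for s in shares)) % 256
    shares.append(last.astype(np.uint8))
    return shares

def lc_recover(shares):
    # Lossless recovery: add all shadow images modulo 256.
    return (sum(s.astype(np.int64) for s in shares) % 256).astype(np.uint8)
```

Because n-1 of the shares are uniformly random, any proper subset of them reveals nothing about the secret; all n are required, which is the (n, n) threshold property.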

A New SMVQ-Based Reversible Data Hiding Scheme Using State-Codebook Sorting

A reversible data hiding scheme makes it possible to extract secret data and recover the cover image without any distortion. This paper presents a novel reversible data hiding scheme based on side-match vector quantization (SMVQ). If the original SMVQ index value of an image block is larger than a threshold, a technique called state-codebook sorting (SCS) is used to create two state-codebooks to re-encode the block with a smaller index. In this way, more indices are distributed around zero, so the embedding scheme only needs a few bits to encode the indices, which produces more extra space for hiding secret data. The experimental results indicate that the proposed scheme is superior to some previous schemes in embedding performance while maintaining the same visual quality as that of VQ compression.

Juan-ni Liu, Quan Zhou, Yan-lang Hu, Jia-yuan Wei

An Efficient Privacy-Preserving Classification Method with Condensed Information

Privacy preservation is a challenging problem in real-world data classification. Among existing classification methods, the support vector machine (SVM) is a popular approach with high generalization ability. However, when datasets are private and complex, the processing capacity of the SVM is not satisfactory. In this paper, we propose a new method, CI-SVM, to achieve efficient privacy preservation for the SVM. While ensuring classification accuracy, we condense the original dataset by a new method that transforms the private information into condensed information with little additional computation. The condensed information carries the class characteristics of the original information without exposing the detailed original data. The time cost of classification is greatly reduced because the condensed information contains far fewer samples. Our experimental results on several datasets show that the proposed CI-SVM algorithm has obvious advantages in classification efficiency.

Xinning Li, Zhiping Zhou
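The condensation idea above, replacing raw samples by a much smaller set of class-wise representatives before SVM training, can be sketched with per-class k-means (an assumed stand-in for the paper's condensation method; function names and parameters are illustrative):

```python
import numpy as np

def condense(X, y, k_per_class, iters=20, seed=0):
    # Replace each class by k cluster centroids ("condensed information"):
    # far fewer samples for the SVM, class structure preserved,
    # and no raw record is exposed directly.
    rng = np.random.default_rng(seed)
    Xc, yc = [], []
    for label in np.unique(y):
        pts = X[y == label]
        centers = pts[rng.choice(len(pts), k_per_class, replace=False)]
        for _ in range(iters):
            d = ((pts[:, None, :] - centers[None]) ** 2).sum(-1)
            assign = d.argmin(1)
            for j in range(k_per_class):
                sel = pts[assign == j]
                if len(sel):
                    centers[j] = sel.mean(0)
        Xc.append(centers)
        yc.extend([label] * k_per_class)
    return np.vstack(Xc), np.array(yc)
```

Training on the `(Xc, yc)` pairs instead of `(X, y)` is what cuts the classification time while keeping each class's characteristic regions.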

Cross-Class and Inter-class Alignment Based Camera Source Identification for Re-compression Images

With the development of sophisticated machine learning technology, model-based camera source identification has achieved a high level of accuracy in the matching scenario, where the feature vectors of the training and test sets follow the same statistical distribution. In a more practical scenario, identifying the camera source of an image transmitted via social media applications and the Internet is a much more interesting and challenging task: after a series of manipulations, re-compression for instance, the feature vectors of the training and test sets mismatch, which decreases identification accuracy. In this paper, cross-class and inter-class alignment based algorithms, inspired by transfer learning, are proposed to minimize the distribution difference between the training and test sets. Experiments on four cameras with five image quality factors indicate that the proposed cross-class and inter-class alignment based algorithms and their combination outperform the existing LBP method and present high identification accuracy on re-compressed images.

Guowen Zhang, Bo Wang, Yabin Li

JPEG Photo Privacy-Preserving Algorithm Based on Sparse Representation and Data Hiding

The wide spread of electronic imaging equipment and the Internet has made photo sharing a popular and influential activity. However, privacy leaks caused by photo sharing have raised serious concerns, and effective privacy-protection methods are required. In this paper, a new privacy-preserving algorithm for photo sharing, especially for human faces in group photos, is proposed based on sparse representation and data hiding. Our algorithm uses a cartoon image to mask the region that contains privacy information and performs sparse representation to find a more concise expression for this region. Furthermore, the sparse coefficients and part of the residual errors obtained from sparse representation are encoded and embedded into the photo by means of data hiding with a secret key, which avoids extra storage overhead. In this way, the privacy-protected photo is obtained, and only people with the correct key can restore it to the original version. Experimental results demonstrate that the proposed privacy-preserving algorithm does not increase storage or bandwidth requirements, while ensuring good quality of the reconstructed image.

Wenjie Li, Rongrong Ni, Yao Zhao

Surveillance and Remote Sensing


An Application Independent Logic Framework for Human Activity Recognition

Cameras may be employed to facilitate data collection, to serve as a data source for controlling actuators, or to monitor the status of a process, including tracking. To recognize interesting events across different domains, in this study we propose a cross-domain framework supported by relevant theory, which leads to an Open Surveillance concept: a systemic organization of components that will streamline future system development. The main contribution is the logic reasoning framework together with a new set of context-free LLEs that can be utilized across different domains. Human action datasets from MSR and a synthetic human interaction dataset are used for the experiments, and the results demonstrate the effectiveness of our approach.

Wengang Feng, Yanhui Xiao, Huawei Tian, Yunqi Tang, Jianwei Ding

An Altitude Based Landslide and Debris Flow Detection Method for a Single Mountain Remote Sensing Image

The altitude information of a single remote sensing image may aid in detecting natural disasters such as landslides or debris flows. Accordingly, in this paper, an altitude-based approach is proposed to detect landslides and debris flows in a single mountain remote sensing image. First, we extract the features of landslide and debris flow areas and introduce slow feature analysis (SFA) to improve feature distinguishability. Then, machine learning with a trained model is used to detect suspected landslide and debris flow areas. Using the altitude information calculated by the dark channel prior, we analyze the altitude distribution of the suspected areas to judge whether a landslide or debris flow has occurred in these regions. Experimental results on multiple mountain remote sensing images with landslides or debris flows demonstrate that the proposed algorithm can accurately detect landslide and debris flow areas in a single mountain remote sensing image.

Tingting Sheng, Qiang Chen

Improved Fully Convolutional Network for the Detection of Built-up Areas in High Resolution SAR Images

High resolution synthetic aperture radar (SAR) images have been widely used in urban mapping and planning, and built-up areas are the key element for urban planning. Because of the high dynamics and multiplicative noise in high resolution SAR images, it is always difficult to detect built-up areas. To address this, we put forward an Improved Fully Convolutional Network (FCN) to detect built-up areas in high resolution SAR images. Our improved FCN model adopts a context network to expand the receptive fields of feature maps, since the contextual field of feature maps has been demonstrated to play a critical role in semantic segmentation performance. Besides, transfer learning is applied to improve the performance of our model given the limited number of high resolution SAR images. Experimental results on TerraSAR-X high resolution images of Beijing show that our method outperforms traditional methods, the Convolutional Neural Network (CNN) method and the original FCN method.

Ding-Li Gao, Rong Zhang, Di-Xiu Xue

SAR Image Registration with Optimized Feature Descriptor and Reliable Feature Matching

The scale-invariant feature transform (SIFT) algorithm has been widely used in feature-based remote sensing image registration. However, it may be difficult to find sufficient correct matches for SAR image pairs in some cases that exhibit significant intensity difference and geometric distortion. In this letter, a new robust feature descriptor extracted with Sobel operator and improved gradient location orientation hologram (GLOH) feature is introduced to overcome nonlinear difference of image intensity between SAR images. Then, an effective false correspondences removal method by improving the analysis of bivariate histogram is used to refine the initial matches. Finally, a reliable method for affine transformation error analysis of adjacent features is put forward to increase the number of correct matches. The experimental results demonstrate that the proposed method provides better registration performance compared with the standard SIFT algorithm and SAR-SIFT algorithm in terms of number of correct matches, correct match rate and aligning accuracy.

Yanzhao Wang, Juan Su, Bichao Zhan, Bing Li, Wei Wu
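The Sobel-based gradient computation underlying the descriptor above can be sketched in pure NumPy (an illustrative implementation; the improved GLOH-style descriptor built on top of these gradients is not reproduced here):

```python
import numpy as np

def sobel_grad(img):
    # Per-pixel gradient magnitude and orientation via 3x3 Sobel kernels,
    # with edge-replication padding so the output matches the input size.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(img, 1, mode='edge')
    H, W = img.shape
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for i in range(3):
        for j in range(3):
            win = pad[i:i + H, j:j + W]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx)
    return mag, ori
```

Descriptors of the GLOH family then accumulate `mag` into spatial-orientation histogram bins keyed by `ori`, which is where the robustness to intensity differences between SAR images comes from.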

An Improved Feature Selection Method for Target Discrimination in SAR Images

Owing to synthetic aperture radar (SAR) imaging principles, at particular azimuth or depression angles targets and clutter may become very hard to distinguish. To solve this problem, many complicated features have been developed; this is not only laborious, but yields little improvement in discrimination accuracy. In this paper, an improved target discrimination method is proposed that uses a one-class quadratic discriminator (OCQD). Compared with the traditional method using a Bayes discriminator, when the number of features is limited our method achieves higher target classification accuracy; considering that correct classification of targets matters more than correct classification of clutter, the proposed method performs well on target discrimination. First, a discrimination scheme based on a genetic algorithm (GA) is introduced. Second, feature extraction algorithms for SAR images are described. Third, an improved GA-based feature selection method is proposed, in which the OCQD is used and a new fitness function is designed. Finally, the theory of the OCQD algorithm is explained. In experiments on the moving and stationary target acquisition and recognition (MSTAR) database, our method reduces the target undetected rate by 1.5% compared with state-of-the-art target discrimination methods, and the efficiency of GA-based feature selection is improved by 77%.

Yanyan Li, Aihua Cai

The Detection of Built-up Areas in High-Resolution SAR Images Based on Deep Neural Networks

The detection of built-up areas is an important task for high-resolution Synthetic Aperture Radar (SAR) applications such as urban planning and environment evaluation. In this paper, we propose a deep neural network based on convolutional neural networks for the detection of built-up areas in SAR images. Since the labels of neighboring pixels in SAR images are strongly correlated, information on neighboring labels helps make better predictions. In addition, built-up areas in SAR images appear at various scales, so multiscale representation is critical for their detection. Based on these observations, we introduce structured prediction into our network, so that the network classifies multiple pixels simultaneously, and we adopt multi-level features. Experiments on TerraSAR-X high-resolution SAR images over Beijing show that our method outperforms traditional methods and CNN methods.

Yunfei Wu, Rong Zhang, Yue Li

SAR Automatic Target Recognition Based on Deep Convolutional Neural Network

In the past years, researchers have shown more and more interest in synthetic aperture radar (SAR) automatic target recognition (ATR), and many methods have been proposed and studied for radar target recognition. Recently, deep learning methods, especially deep convolutional neural networks (CNNs), have proven extremely competitive in image and speech recognition tasks. In this paper, a deep CNN model is proposed for SAR automatic target recognition. The proposed model, named SARnet, has two convolutional-pooling stages and two fully connected layers. Because deep learning demands large-scale data, we propose an augmentation method to build a large-scale database for training the CNN model, through which the model can learn more useful features. Experimental results on the MSTAR database show the effectiveness of the proposed model, which achieves an encouraging correct recognition rate of 95.68%.

Ying Xu, Kaipin Liu, Zilu Ying, Lijuan Shang, Jian Liu, Yikui Zhai, Vincenzo Piuri, Fabio Scotti
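A data-augmentation step of the kind described above, enlarging a SAR target-chip database by simple geometric perturbations, might look like this sketch (the specific transforms, flips and small circular shifts, are our assumptions, not the paper's exact augmentation):

```python
import numpy as np

def augment_chip(chip, rng):
    # Randomly flip and translate a target chip; each call yields one
    # new training sample of the same shape as the input.
    out = chip
    if rng.random() < 0.5:
        out = out[:, ::-1]          # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]          # vertical flip
    dy, dx = rng.integers(-4, 5, size=2)
    out = np.roll(np.roll(out, dy, axis=0), dx, axis=1)  # small shift
    return out
```

Applying such a function many times per original chip is a cheap way to multiply the effective size of a database like MSTAR before CNN training.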

