
## About this book

This book constitutes the refereed conference proceedings of the 8th International Conference on Image and Graphics, ICIG 2015, held in Tianjin, China, in August 2015. The 164 revised full papers and 6 special issue papers were carefully reviewed and selected from 339 submissions. The papers focus on various advances in theory, techniques and algorithms in the fields of images and graphics.

## Table of Contents

### Blind Motion Deblurring Based on Fused $$\ell_0$$-$$\ell_1$$ Regularization

In blind motion deblurring, various regularization models using different priors of either the image or the blur kernel are proposed to estimate the blur kernel, with tendency towards naive $$\ell_0$$ norm or its approximations. In this paper, we propose a novel fused $$\ell_0$$-$$\ell_1$$ regularization approach to estimate the motion blur kernel by imposing sparsity and continuation properties on natural images. A fast numerical scheme is then deduced by coupling operator splitting and the Augmented Lagrangian method to solve the proposed model efficiently. Experimental results on both synthetic and real data demonstrate the effectiveness of the proposed method and the superiority over the state-of-the-art methods.

Kai Wang, Yu Shen, Liang Xiao, Zhihui Wei, Linxue Sheng

### Blood Vessel Segmentation of Retinal Images Based on Neural Network

Blood vessel segmentation of retinal images plays an important role in the diagnosis of eye diseases. In this paper, we propose an automatic unsupervised blood vessel segmentation method for retinal images. Firstly, a multi-dimensional feature vector is constructed from the green channel intensity and the vessel-enhanced intensity feature obtained by morphological operations. Secondly, a self-organizing map (SOM), which is an unsupervised neural network, is used for pixel clustering. Finally, we classify each neuron in the output layer of the SOM as a vessel neuron or a non-vessel neuron with Otsu’s method, and obtain the final segmentation result. Our proposed method is validated on the publicly available DRIVE database and compared with state-of-the-art algorithms.

Jingdan Zhang, Yingjie Cui, Wuhan Jiang, Le Wang
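The final step of the abstract above relies on Otsu’s method, which picks the threshold that maximizes the between-class variance of a one-dimensional set of values. The following is only a minimal sketch of that standard technique, not the authors’ implementation: the abstract does not say which per-neuron quantity is thresholded, so a single integer score per neuron in the range 0 to 255 is assumed here, and the names `otsu_threshold` and `neuron_scores` are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Values <= t form the lower class, values > t the upper class.
    `values` must be integers in the range [0, levels).
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of samples in the lower class
    sum0 = 0  # sum of the values in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # lower class is still empty
        if w1 == 0:
            break     # upper class is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Hypothetical per-neuron scores (assumption: one quantized value per neuron).
neuron_scores = [10, 11, 12, 200, 201, 202]
t = otsu_threshold(neuron_scores)
labels = ["vessel" if s > t else "non-vessel" for s in neuron_scores]
```

With clearly bimodal scores like these, the threshold falls in the gap between the two groups, so each neuron receives one of the two labels without any manually chosen cut-off.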

### Compressed Binary Discriminative Feature for Fast UAV Image Registration

Efficient UAV image mosaicking is of critical importance for disaster management applications, in which fast image registration plays an important role. For fast and accurate image registration, the key design choice lies in the keypoint description, for which SIFT and SURF are widely used in the related literature. However, their expensive computation and memory costs restrict their potential in disaster management. In this paper, we propose a novel keypoint descriptor termed CBDF (Compressed Binary Discriminative Feature). A cascade of binary strings is computed by efficiently comparing image gradients static information over a log-polar location grid pattern. Extensive evaluations on benchmark datasets and real-world UAV images show that CBDF yields performance similar to SIFT and SURF while being much more efficient in terms of both computation time and memory.

Li-Chuan Geng, Ze-xun Geng, Guo-xi Wu

### Context-Aware Based Mobile Augmented Reality Browser and its Optimization Design

The latest research shows that despite users’ increasing interest in Augmented Reality services, their usage frequency is low and the usage time is short. The intrinsic reason lies in the neglect of human factors, which causes defects in the design stage of Augmented Reality browsers. In order to improve the quality of user experience, this paper introduces the design and optimization of a mobile Augmented Reality browser system by adopting the concept of context-aware service pushing based on scene classification, and proposes a novel interactive mode which combines Virtual Reality and Augmented Reality (AR/VR) on the basis of mental model theory. The whole design flow is verified by the design of the AR/VR hybrid interface. Experimental results show that the proposed system, which adopts a hybrid interactive mode based on context-aware frameworks, can not only increase user acceptance of the Augmented Reality browser, but also significantly improve the quality of experience of human-computer interaction.

Yi Lin, Yue Liu, Yong-tian Wang

### Contour-Based Plant Leaf Image Segmentation Using Visual Saliency

Segmentation based on active contours has received widespread attention recently for its flexibility. However, most available active contour models lack an adaptive initial contour and prior information about the target region. In this paper, we present a new method based on active contours combined with a saliency map for plant leaf segmentation. Firstly, prior shape information of the target objects in the input leaf image is extracted with a visual saliency detection method and used to describe the initial curve adaptively, which reduces the influence of the initial contour position. Furthermore, the proposed active contour model can segment images adaptively and automatically. Experiments on two applications demonstrate that the proposed model can achieve a better segmentation result.

Zhou Qiangqiang, Wang Zhicheng, Zhao Weidong, Chen Yufei

### Cooperative Target Tracking in Dual-Camera System with Bidirectional Information Fusion

The Dual-Camera system, which consists of a static camera and a pan-tilt-zoom (PTZ) camera, plays an important role in public area monitoring. The superiority of this system lies in that it can offer wide area coverage and highly detailed images of the target of interest simultaneously. Most existing works on Dual-Camera systems only consider simplistic scenarios, which are not robust in real situations, and no quantitative comparison between different tracking algorithms is provided. In this paper, we propose a cooperative target tracking algorithm with bidirectional information fusion which is robust even in moderately crowded scenes. Moreover, we propose a method to compare the algorithms quantitatively by generating a virtual PTZ camera. The experimental results on realistic simulations and the implementation on a real surveillance system validate the effectiveness of the proposed algorithm.

Jingjing Wang, Nenghai Yu

### Cute Balloons with Thickness

Based on the finite element method, we present a simple volume-preserving thin shell deformation algorithm to simulate the process of inflating a balloon. Unlike other thin shells, the material of balloons has special features: large stretch, small bend and shear, and incompressibility. Previous deformation methods often focus on typical three-dimensional models or thin plate models such as cloth. The remaining thin shell methods are complex or ignore the special features of thin shells, especially balloons. We modify the triangle element into a simple three-prism element, ignore bending and shearing deformation, and use a volume preservation algorithm to match the incompressibility of balloons. A simple gas model is used, which interacts with the shells to inflate the balloons. Different balloon examples have been tested in our experiments and the results are compared with those of other methods. The experiments show that our algorithm is simple and effective.

Qingyun Wang, Xuehui Liu, Xin Lu, Jianwen Cao, Wen Tang

### Decision-Tree Based Hybrid Filter-Wrapping Method for the Fusion of Multiple Feature Sets

This paper proposes a decision-tree based hybrid filter-wrapping method to solve multiple-feature recognition problems. The decision tree is constructed from various feature sets. Each tree node comprises all possibilities of individual feature combinations: the original features, the serial fusion of original features, and the parallel fusion of original features. In order to generate the most discriminative feature set, a two-stage feature searching algorithm is developed. The first stage is a feature filtering method that finds the locally optimal individual features in each level of the tree using an LDA-motivated discrimination criterion. The second stage generates a globally optimal feature vector based on a forward wrapping method. In contrast to feature fusion methods in the literature, which consider filtering and wrapping separately, our method combines them. Furthermore, since our method takes all possibilities of feature combinations into consideration, it is more likely to generate the most discriminative feature set than other feature fusion methods. In addition, our method also compensates for the loss of discrimination if some portions of the original features are missing. The effectiveness of our method is evaluated on a 3D dataset. Comparative experimental results show that our method can markedly improve the recognition accuracy and performs better than existing methods.

Cuicui Zhang, Xuefeng Liang, Naixue Xiong

### Depth Map Coding Method for Scalable Video Plus Depth

Depth map coding along with its associated texture video plays an important role in the display of 3D scenes by the Depth Image Based Rendering technique. In this paper, the inter-layer motion prediction in Scalable Video Coding (SVC) is applied in order to utilize the similarity of the motion data between the texture video and its associated depth map. Additionally, in order to preserve the edges in the depth map, an edge detection algorithm is proposed for depth map coding, which combines conventional Sobel edge detection and the block-based coding scheme of SVC. A dynamic quantization algorithm is also proposed to preserve the information of the boundary regions of the depth map. Simulation results show that the proposed method achieves BDPSNR gains from 0.463 dB to 0.941 dB for the depth map, and the Bjøntegaard bitrate savings range from 9.72 % to 19.36 % for the video plus depth.

Panpan Li, Ran Ma, Yu Hou, Ping An

### Depth Map Upsampling Using Segmentation and Edge Information

A depth upsampling method based on Markov Random Field is proposed, considering the depth and color information. First, since the initially interpolated depth map is inaccurate and oversmooth, we use a rectangular window centered on every pixel to search for the maximum and minimum depth values of the depth map in order to find the edge pixels. Then, we use the depth information to guide the segmentation of the color image and build different data terms and smoothness terms for the edge and non-edge pixels. The resulting depth map is piecewise smooth and the edges are sharp. Meanwhile, the result is good where the color information is consistent while the depth is not, or where the depth information is consistent while the color is not. Experiments show that the proposed method performs better than other upsampling methods in terms of mean absolute difference (MAD).

Shuai Zheng, Ping An, Yifan Zuo, Xuemei Zou, Jianxin Wang

### Design of a Simulated Michelson Interferometer for Education Based on Virtual Reality

The physical experiment based on science inquiry learning shows its advantages compared with traditional class demonstration. However, the participation in physical experiments is often stylized and there are still many limitations when promoting the laboratory learning. This paper presents the construction of a Virtual Optics Laboratory which is applied to the experiment of the Michelson interferometer. On the current platform, the phenomena of interference and background theories are visualized for the purpose of efficiently promoting conceptual understanding. Factors of the experimental environment such as wavelengths, properties of optical elements and perspectives can be altered with interactive tools, enabling students to set up the system, observe the phenomena of interference as well as measure wavelengths and small displacements. The simulated laboratories can be of great assistance in the development of inquiry skills to enhance school education.

Hongling Sun, Xiaodong Wei, Yue Liu

### Detecting Deterministic Targets by Combination of Polarimetric Fork and Cloude-Pottier Decomposition for Polarimetric Synthetic Aperture Radar Imagery

For detecting deterministic targets in polarimetric synthetic aperture radar imagery, an improved method is suggested in this paper. First, the principle of detecting single targets using the polarimetric fork is briefly introduced. The first drawback of the classic method is that the coherence threshold cannot be obtained automatically. The second drawback is that diverse scattering mechanisms share the same threshold. A revised scheme using Cloude-Pottier decomposition is suggested. The distribution of the entropy value of the Cloude-Pottier decomposition is used to calculate the threshold. In addition, a coarse classification is performed on the image data, and this procedure can be employed to obtain thresholds for the various scattering mechanisms. An experiment is carried out to measure the validity of detection. In this experiment, the proposed method outperforms the classic method.

Sheng Sun, Zhijia Xu, Taizhe Tan

### Detection of Secondary Structures from 3D Protein Images of Medium Resolutions and its Challenges

Protein secondary structures such as α-helices and β-strands are major structural components in most proteins. The position of secondary structures provides important constraints in computing the tertiary structure of a protein. Electron cryomicroscopy is a biophysical technique that produces 3-dimensional images of large molecular complexes. For images at medium resolutions, such as 5–10 Å, major secondary structures may be computationally detected. This paper summarizes our recent work in detection of secondary structures using SSETracer, SSELearner, StrandTwister and StrandRoller. The detection of helices and β-strands is illustrated using SSETracer and StrandTwister with a small dataset.

Jing He, Dong Si, Maryam Arab

### Determination of Focal Length for Targets Positioning with Binocular Stereo Vision

Targets positioning with binocular stereo vision has the potential of usage for surveillance on airdrome surface. This paper analyses the impact of focal length of a camera on the accuracy of positioning targets on the ground and proposes a way of determining the value of the focal length, with which measuring errors on the edges of the surveillance area could be mitigated. With profound analysis of the impact of many system parameters on the coordinates of targets, it is found that the focal length predominates. In order to establish the basis for calibration, a set of non-linear equations are solved for relevant system parameters, thus yielding their explicit expressions. An on-site experiment is implemented for calibration for parameters, as well as measuring the positions of targets. A test environment is established and data are obtained for calculating curves of coordinates versus the focal length, which gives us a clearer indication of determining the focal length of the camera. In a similar way, the curve gives also an implication of selecting the appropriate points where to place calibration objects, thus reducing errors of targets positioning.

Wang Jian, Wang Yu-sheng, Liu Feng, Li Qing-jia, Wang Guang-chao

### Dimensionality Reduction for Hyperspectral Image Based on Manifold Learning

A novel dimensionality reduction method named spectral angle and geodesic distance-based locality preserving projection (SAGD-LPP) is proposed in this paper. Considering the physical characteristics of hyperspectral imagery, the proposed method first selects neighbor pixels in the image based on the spectral angle distance. Then, a weight matrix between pixels is constructed using the geodesic distance matrix. Finally, based on this weight matrix, the idea of the locality preserving projection algorithm is applied to reduce the dimensions of the hyperspectral image data. The use of the spectral angle to measure the distance between pixels can effectively overcome the spectral amplitude error caused by uncertainty. At the same time, the use of the geodesic distance to construct the weight matrix can better reflect the internal structure of the data manifold than the use of the Euclidean distance. Therefore, the proposed method can effectively preserve the original characteristics of the dataset with less loss of useful information and less distortion of the data structure. Experimental results on real hyperspectral data demonstrate that the proposed method has higher detection accuracy than the other methods when applied to the target detection of hyperspectral imagery after dimensionality reduction.

Yiting Wang, Shiqi Huang, Hongxia Wang, Daizhi Liu, Zhigang Liu
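The claim above that the spectral angle is insensitive to spectral amplitude errors follows from its definition: the angle between two spectra does not change when either spectrum is multiplied by a positive constant. The snippet below is a minimal sketch of that standard distance only, not of SAGD-LPP itself. The function name `spectral_angle` and the sample spectra are hypothetical.

```python
import math


def spectral_angle(x, y):
    """Angle in radians between two spectra, treated as vectors.

    Assumes both spectra have the same number of bands and are non-zero.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cos_theta = dot / (norm_x * norm_y)
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


# Hypothetical four-band spectra.
pixel = [0.2, 0.4, 0.6, 0.3]
brighter = [2.5 * v for v in pixel]   # same material, different amplitude
other = [0.6, 0.1, 0.2, 0.7]          # differently shaped spectrum

same_shape_angle = spectral_angle(pixel, brighter)
different_shape_angle = spectral_angle(pixel, other)
```

Here `same_shape_angle` is zero up to rounding, whereas the Euclidean distance between `pixel` and `brighter` would be large. That is the property that makes the spectral angle suitable for selecting neighbors under varying illumination.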

### Discriminative Feature Learning with Constraints of Category and Temporal for Action Recognition

Recently, with the availability of depth cameras, many studies of human action recognition have been conducted on depth sequences. Motivated by the observations that each pose has its relative location within a complete action sequence and that similar actions differ in fine spatio-temporal details, we propose a novel method to recognize human actions based on depth information. Representations of depth maps are learned and reconstructed using a stacked denoising autoencoder. By adding category and temporal constraints, the learned features become more discriminative, are able to capture the subtle but significant differences between actions, and mitigate the nuisance variability of temporal misalignment. A greedy layer-wise training strategy is used to train the deep neural network. Then we employ temporal pyramid matching on the feature representation to generate a temporal representation. Finally, a linear SVM is trained to classify each sequence into actions. We compare our proposal with previous methods on the MSR Action3D dataset, and the results show that the proposed method significantly outperforms the traditional model and is comparable to state-of-the-art action recognition performance. Experimental results also indicate the great power of our model to restore highly noisy input data.

Zhize Wu, Shouhong Wan, Peiquan Jin, Lihua Yue

### Discriminative Neighborhood Preserving Dictionary Learning for Image Classification

In this paper, a discriminative neighborhood preserving dictionary learning method is proposed. The geometrical structure of the feature space is used to preserve the similarity information of the features, and the features’ class information is employed to enhance the discriminative power of the learned dictionary. The Laplacian matrix which expresses the similarity information and the class information of the features is constructed and used in the objective function. Experimental results on four public datasets demonstrate the effectiveness of the proposed method.

Shiye Zhang, Zhen Dong, Yuwei Wu, Mingtao Pei

### Distribution of FRFT Coefficients of Natural Images

Because it provides temporal and spectral information through a single variable, the fractional Fourier transform (FRFT) is increasingly applied to image processing. This paper focuses on the statistical regularity of the FRFT coefficients of natural images and proposes that the real and imaginary parts of the FRFT coefficients of natural images follow the generalized Gaussian distribution, the coefficient moduli follow the gamma distribution, and the coefficient phase angles tend to the uniform distribution. Moreover, the real and imaginary parts of the coefficient phases are similar to the extended beta distribution. These underlying statistics can provide a theoretical basis for image processing in the FRFT domain, such as dimensionality reduction, feature extraction, smooth denoising, digital forensics, watermarking, etc.

Li Jiang, Guichi Liu, Lin Qi

### Dual-Projection Based High-Precision Integral Imaging Pickup System

In this paper, a high-precision integral imaging (II) pickup system for the real scene is proposed. The dual-projection optical pickup method is utilized to obtain the elemental image array for the II display. The proposed method is robust to the position deviations of the projectors and camera. The calibration of the camera is simplified. Furthermore, the pickup of the II is not limited by the complex optical and mechanical structures. Experimental results show that the proposed system can generate the continuous and tunable parallaxes. With the proposed II pickup and display system, the high-quality 3D images for the real scene can be reconstructed efficiently.

Zhao-Long Xiong, Qiong-Hua Wang, Huan Deng, Yan Xing

### Eave Tile Reconstruction and Duplication by Image-Based Modeling

We present a pipeline for reconstructing the geometry of eaves tiles from a single binary image collected in a monotype album. This pipeline consists of shape-from-shading based spatial point reconstruction, feature-preserving point cloud smoothing for high-quality models, and 3D printing techniques to duplicate the eaves tiles. Compared with other reverse engineering methods, we reduce the high demand for original point cloud data in modeling, efficiently reconstruct a large number of eaves tiles, and, through duplication, restore the appearance of cultural relics that were lost for thousands of years. We illustrate the performance of our pipeline on Qin Dynasty eaves tiles.

Li Ji Jun Nan, Geng Guo Hua, Jia Chao

### Edge Directed Single Image Super Resolution Through the Learning Based Gradient Regression Estimation

Single image super resolution (SR) aims to estimate a high resolution (HR) image from a low resolution (LR) one, and the accuracy of the estimated HR image gradient is very important for edge directed image SR methods. In this paper, we propose a novel edge directed image SR method based on learned gradient estimation. In the proposed method, the gradient of the HR image is estimated using an example-based ridge regression model. Recognizing that the training samples of a given subset for regression should have similar local geometric structure, which is obtained by clustering, we perform this clustering on the high-frequency components of LR image patches after removing the mean value. Moreover, precomputing the projection matrix of the ridge regression further reduces the computational complexity. Experimental results suggest that the proposed method can achieve better gradient estimation of the HR image and competitive SR quality compared with other SR methods.

Dandan Si, Yuanyuan Hu, Zongliang Gan, Ziguan Cui, Feng Liu

### Evacuation Simulation Incorporating Safety Signs and Information Sharing

In this paper, we combine local and global techniques to simulate crowd behavior. At the local level, an improved weighting method to calculate the preferred velocity is proposed, which reflects the current motion trend of a pedestrian. At the global level, a decision tree is implemented to model the dynamic decision-making process, which plays an important role in choosing between several paths to the destination. At the same time, we consider the influence of safety signs and information sharing on the behavior of pedestrians and provide a safety analysis for evacuation planning.

Yu Niu, Hongyu Yang, Jianbo Fu, Xiaodong Che, Bin Shui, Yanci Zhang

### Facial Stereo Processing by Pyramidal Block Matching

This paper describes the Pyramidal Block Matching (PBM) stereo method. It uses a pyramidal approach and a global energy function to obtain the disparity image. First, the input images are rectified to obtain row-aligned epipolar geometry. Then the face is segmented out of each image and a face pyramid is generated. The main distinguishing feature of our approach is that the first layer of the pyramid is the whole face. The matching result of the first layer provides input to the next layer, where it is used to constrain the search area for matching. This process continues on each layer. After that, a global energy function is designed to remove wrong pixels and obtain a smoother result. A comparison on face images shows that the generated projection results of PBM are the closest to the ground truth images. A face recognition experiment is also performed, and PBM achieves the best recognition rates.

Jing Wang, Qiwen Zha, Yubo Yang, Yang Liu, Bo Yang, Dengbiao Tu, Guangda Su

### Fast Algorithm for Finding Maximum Distance with Space Subdivision in $$E^2$$

Finding the exact maximum distance between two points in a given set is a fundamental computational problem that arises in many applications. This paper presents a fast, simple to implement and robust algorithm for finding this maximum distance of two points in $$E^2$$. This algorithm is based on a polar subdivision followed by a division of the remaining points into a uniform grid. The main idea of the algorithm is to eliminate as many input points as possible before finding the maximum distance. The proposed algorithm gives a significant speed-up compared to the standard algorithm.

Vaclav Skala, Zuzana Majdisova
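The idea of eliminating points before the final search can be illustrated with a different, well-known elimination step. This sketch does not implement the polar subdivision or uniform grid from the abstract above. Instead it uses the fact that the two farthest points of a planar set always lie on its convex hull, so interior points can be discarded before an exhaustive comparison. All function names are hypothetical, and the brute-force routine is assumed to stand in for the "standard algorithm".

```python
from itertools import combinations
from math import dist


def max_distance_brute_force(points):
    """O(n^2) baseline: compare every pair (needs at least two points)."""
    return max(dist(p, q) for p, q in combinations(points, 2))


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices of 2D tuples."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def max_distance_pruned(points):
    """Discard interior points first, then search only the hull vertices."""
    hull = convex_hull(points)
    if len(hull) < 2:
        return 0.0  # all input points coincide
    return max_distance_brute_force(hull)


points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (3, 1)]
hull = convex_hull(points)
farthest = max_distance_pruned(points)
```

In this example only the four rectangle corners survive the elimination step, and the maximum distance is the rectangle's diagonal. For inputs where most points are interior, the final pairwise search therefore runs over a much smaller set.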

### Fast Unconstrained Vehicle Type Recognition with Dual-Layer Classification

This paper tackles the problem of vision-based vehicle type recognition, which aims at outputting a semantic label for a given vehicle. Most existing methods assume a setting in which vehicle viewpoints do not change noticeably and the foreground regions can be segmented well enough to extract texture, edges or the length-width ratio. However, this underlying assumption faces severe challenges when the vehicle viewpoint varies noticeably or the background is cluttered. Thus we propose a dual-layer framework that can jointly handle the two challenges in a more natural way. In the training stage, each viewpoint of each type of vehicle is denoted as a sub-class, and we treat a pre-divided region of the images as a sub-sub-class. In the first layer, we train a fast Exemplar-LDA classifier for each sub-sub-class. In the second layer of the training stage, all the Exemplar-LDA scores are concatenated for the subsequent training of each sub-class. Owing to the use of Exemplar-LDA, our approach is fast in both training and testing. Evaluations of the proposed dual-layer approach are conducted on challenging non-homologous multi-view images, and yield impressive performance.

Xiao-Jun Hu, Bin Hu, Chun-Chao Guo, Jian-Huang Lai

### Feature Matching Method for Aircraft Positioning on Airdrome

Binocular stereoscopic vision technology could be used to locate targets on an aerodrome. To position targets, the pixels in the images from the two vision sensors must be matched. Combining the advantages of both SIFT (scale-invariant feature transform) and an epipolar line constraint equation, a method of pixel matching is proposed. Roughly matched points are first obtained using SIFT, among which some mismatched ones exist. Matched points can be distinguished from mismatched ones once an epipolar line constraint equation is introduced. Although matched points do not always satisfy the epipolar line constraint equation exactly because of system measuring errors, the difference between the values of a matrix and a critical threshold can be used to distinguish the mismatched pixels from roughly matched ones. This method can not only ensure the accuracy of target matching, but also automatically find feature points of the aircraft used for positioning.

Jian Wang, Yubo Ni
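The filtering step described above can be sketched with the textbook form of the epipolar constraint, in which a correct match between homogeneous image points satisfies x_r^T F x_l = 0 for the fundamental matrix F. The abstract does not give the exact matrix expression or threshold used by the authors, so the snippet below is only an illustration under stated assumptions: the residual is the plain algebraic value of that product, `F_RECTIFIED` is the fundamental matrix of an ideal rectified camera pair, and the match coordinates and threshold are made up.

```python
def epipolar_residual(F, left, right):
    """Algebraic residual x_r^T F x_l for one candidate match.

    `left` and `right` are (x, y) pixel coordinates in the two images.
    """
    xl = (left[0], left[1], 1.0)
    xr = (right[0], right[1], 1.0)
    F_xl = [sum(F[i][j] * xl[j] for j in range(3)) for i in range(3)]
    return sum(xr[i] * F_xl[i] for i in range(3))


def filter_matches(F, matches, threshold):
    """Keep only matches whose absolute residual is within the threshold."""
    return [(l, r) for l, r in matches
            if abs(epipolar_residual(F, l, r)) <= threshold]


# Assumption: ideal rectified pair, where the residual reduces to y_l - y_r.
F_RECTIFIED = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]

# Hypothetical rough matches, e.g. as produced by a SIFT matcher.
matches = [
    ((120.0, 45.0), (98.0, 45.2)),    # rows nearly equal: consistent
    ((300.0, 80.0), (310.0, 140.0)),  # rows differ by 60: mismatch
    ((55.0, 200.0), (31.0, 199.6)),   # rows nearly equal: consistent
]
kept = filter_matches(F_RECTIFIED, matches, threshold=1.0)
```

A non-zero threshold is used because, as the abstract notes, measurement errors keep even correct matches from satisfying the constraint exactly.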

### Flow Feature Extraction Based on Entropy and Clifford Algebra

Feature extraction is important to the visualization of large scale flow fields. To extract flow field features, we propose a new method that is based on Clifford algebra and information entropy theory. Given an input 3D flow field defined on uniform grids, it is first converted to a multi-vector field. We then compute its flow entropy field according to information theory, and choose high entropy regions for Clifford convolution with predefined multi-vector filter masks. Features are determined from the convolution results. With this method, we can locate, identify, and visualize a set of flow features. Test results show that our method can reduce computation time and find more features than the topology-based method.

Xiaofan Liu, Wenyao Zhang, Ning Zheng

### Frame Rate Up-Conversion Using Motion Vector Angular for Occlusion Detection

In this paper, we study the handling of occlusions in frame rate up-conversion (FRUC), which has been widely used to reconstruct high-quality videos presented on liquid crystal displays. Depending on the type of occlusion problem, adaptive motion estimation is carried out to reduce computational complexity. A luminance-information based RGB angular distance color difference formula is proposed for improving Unsymmetrical-cross multi-hexagon grid search (UMHexagonS) motion estimation, which reduces the occlusion regions caused by wrong motion vectors. Non-occlusion regions are determined by the motion vector angular. Furthermore, exposed and occluded areas are located by comparing interpolated frames within directional motion estimation. Consequently, adaptive motion compensation is introduced to calculate interpolated frames. Experimental results demonstrate that our scheme has superior performance compared with previously proposed FRUC schemes.

Yue Zhao, Ju Liu, Guoxia Sun, Jing Ge, Wenbo Wan

### Fusion of Skeletal and STIP-Based Features for Action Recognition with RGB-D Devices

Along with the popularization of the Kinect sensor, the usage of marker-less body pose estimation has been enormously eased and complex human actions can be recognized based on the 3D skeletal information. However, due to errors in tracking and occlusion, the obtained skeletal information can be noisy. In this paper, we compute posture, motion and offset information from skeleton positions to represent the global information of action, and build a novel depth cuboid feature (called HOGHOG) to describe the 3D cuboid around the STIPs (spatiotemporal interest points) to handle cluttered backgrounds and partial occlusions. Then, a fusion scheme is proposed to combine the two complementary features. We test our approach on the public MSRAction3D and MSRDailyActivity3D datasets. Experimental evaluations demonstrate the effectiveness of the proposed method.

Ting Liu, Mingtao Pei

### GA Based Optimal Design for Megawatt-Class Wind Turbine Gear Train

In this paper, we propose a novel optimal design method for megawatt-class wind turbine gear trains based on a genetic algorithm. Firstly, we construct an objective function to obtain the optimal quality of the gear train under constraint conditions on ratio, adjacency, assembly, gear bending fatigue and gear surface contact fatigue. Then, the genetic algorithm is applied to find the optimal parameters of the objective function. Finally, finite element analysis is used to verify the feasibility of the optimization results. One type of wind turbine gearbox is given as a detailed example to verify the effectiveness of the proposed method, and the verification results show that the presented method can achieve good results for wind turbine gearboxes.

Jianxin Zhang, Zhange Zhang

### Generalized Contributing Vertices-Based Method for Minkowski Sum Outer-Face of Two Polygons

A new method is presented for computing the Minkowski sum outer-face of two polygons of any shape. Stemming from the contributing vertex concept, the concept of generalized contributing vertex is proposed. Based on the new concept, an efficient algorithm is developed, which starts from the construction of the superset of the Minkowski sum edges. The superset is composed of three types of edges: translated-corner edges, translated edges and corner edges. Then the Minkowski sum outer-face is extracted from the arrangement of the superset edges. The algorithm is implemented using C++ and the Computational Geometry Algorithms Library (CGAL). The experiments including very complicated polygons are conducted, suggesting that the proposed algorithm is more efficient than other existing algorithms in most cases.

Peng Zhang, Hong Zheng

### Handwritten Character Recognition Based on Weighted Integral Image and Probability Model

In this paper, an off-line handwritten character recognition system based on a weighted integral image and a probability model is built. The system is divided into image preprocessing and character recognition, and the objects of recognition are digits and letters. In the image preprocessing section, an adaptive binarization method based on the weighted integral image is proposed, which overcomes the drawbacks of classic binarization algorithms: noise sensitivity, edge coarseness, artifacts, etc. In the character recognition section, a probability model that combines statistical and structural features is developed, based on the Bayes classifier and the principle of similar shapes. This method achieves a high recognition rate with rapid processing, strong anti-interference ability and fault tolerance.

Jia Wu, Feipeng Da, Chenxing Wang, Shaoyan Gai

### Hard Exudates Detection Method Based on Background-Estimation

Hard exudates (HEs) are one of the most important symptoms of Diabetic Retinopathy (DR). A new method based on background estimation for hard exudate detection is presented. First, through background estimation, a foreground map containing all bright objects is acquired. We use edge information based on the Kirsch operator to obtain HE candidates, and then we remove the optic disc. Finally, the shape features, histogram statistic features and phase features of the HE candidates are extracted, and an SVM classifier is used to extract the HEs accurately. The proposed method has been evaluated on the public databases DIARETDB1 and HEI-MED. The experimental results show that the method’s sensitivity is 97.3 % and the specificity is 90 % at the image level, and the mean sensitivity is 84.6 % and the mean predictive value is 94.4 % at the lesion level.

Zhitao Xiao, Feng Li, Lei Geng, Fang Zhang, Jun Wu, Xinpeng Zhang, Long Su, Chunyan Shan, Zhenjie Yang, Yuling Sun, Yu Xiao, Weiqiang Du

### Hierarchical Convolutional Neural Network for Face Detection

In this paper, we propose a new hierarchical convolutional neural network (CNN) approach for face detection. The first layer of our architecture is a binary classifier built on a deep convolutional neural network with spatial pyramid pooling (SPP). Spatial pyramid pooling reduces the computational complexity and removes the fixed-size constraint of the network. We only need to compute the feature maps from the entire image once and generate a fixed-length representation regardless of the image size and scale. To improve the localization effectiveness, in the second layer, we design a bounding box regression network to refine the relatively high-scored non-face output from the first layer. The proposed approach is evaluated on the AFW dataset, FDDB dataset and Pascal Faces, and it reaches state-of-the-art performance. We also apply our bounding box regression network to refine other detectors and find that it generalizes effectively.

Dong Wang, Jing Yang, Jiankang Deng, Qingshan Liu

### Human-Object Interaction Recognition by Modeling Context

In this paper, we present a new method to recognize human-object interactions by modeling the context between human actions and manipulated objects. It is a challenging task due to severe occlusion between humans and objects during the interaction. However, human actions and objects provide strong context information for each other, since a given action is usually related to a certain object, and this context can improve the recognition accuracy of both. We use global and local temporal features from skeleton sequences to model actions, and kernel features are applied to describe objects. We optimize over all possible solutions from actions and objects by modeling the context between them. The experimental results show the effectiveness of our method.

Qun Zhang, Wei Liang, Xiabing Liu, Yumeng Wang

### Image Annotation Based on Multi-view Learning

With the explosive growth of image data collections on the web, labeling each image with an appropriate semantic description based on the image content for image indexing and image retrieval has become an increasingly difficult and laborious task. To deal with this issue, we propose a novel multi-view semi-supervised learning scheme to improve the performance of image annotation by using multiple views of an image and leveraging the information contained in pseudo-labeled images. In the training process, labeled images are first adopted to train view-specific classifiers independently using uncorrelated and sufficient views, and each view-specific classifier is then iteratively re-trained using initial labeled samples and additional pseudo-labeled samples based on a measure of confidence. In the annotation process, each unlabeled image is assigned appropriate semantic annotations based on the maximum vote entropy principle and the correlation between annotations with respect to the results of each optimally trained view-specific classifier. Experimental results on a general-purpose image database demonstrate the effectiveness and efficiency of the proposed multi-view semi-supervised scheme.

Zhe Shi, Songhao Zhu, Chengjian Sun

### Image Denoising with Higher Order Total Variation and Fast Algorithms

In this paper, we propose an efficient higher order total variation regularization scheme for the image denoising problem. By relaxing the constraints appearing in the traditional infimal convolution regularization, the proposed higher order total variation can remove the staircasing effects caused by total variation while preserving sharp edges and finer details in the restored image. We characterize the solution of the proposed model using fixed point equations (via the proximity operator) and develop convergent proximity algorithms for solving the model. Our numerical experiments demonstrate the efficiency of the proposed method.

Wenchao Zeng, Xueying Zeng, Zhen Yue

### Improve Neural Network Using Saliency

In traditional neural networks for image classification, every input image pixel is treated the same way. However, the human visual system tends to pay more attention to what it is actually focusing on. This paper proposes a novel saliency-based network architecture for image classification named Sal-Mask Connection. After learning raw feature maps from input images using a convolutional connection, we use the saliency data as a mask for the raw feature maps. By doing an element-by-element multiplication of the saliency data with the raw feature maps, corresponding enhanced feature maps are generated, which helps the network to filter information and to ignore noise. In this way we may simulate the human visual system more appropriately and gain better performance. In this paper, we evaluate this new architecture on two common image classification benchmark networks, and we verify them on the STL-10 dataset. Experimental results show that this method outperforms traditional CNNs.

Yunong Wang, Nenghai Yu, Taifeng Wang, Qing Wang
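The masking step described in the abstract above can be sketched as follows. This is a minimal illustration that assumes the saliency map has the same height and width as the feature maps and holds values in [0, 1]; the function name and the nested-list data layout are hypothetical, and the paper's actual layer may differ.

```python
def apply_saliency_mask(feature_maps, saliency):
    """Multiply every channel of feature_maps element-wise by the saliency map.

    feature_maps: list of C channels, each an H x W nested list of floats.
    saliency:     one H x W nested list, assumed to hold values in [0, 1].
    """
    return [
        [
            [f * s for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(channel, saliency)
        ]
        for channel in feature_maps
    ]


# Usage: two 2 x 2 channels masked by one shared saliency map.
raw = [[[1.0, 2.0], [3.0, 4.0]], [[-1.0, 0.5], [2.0, -2.0]]]
sal = [[1.0, 0.0], [0.5, 0.25]]
enhanced = apply_saliency_mask(raw, sal)
```

Positions with zero saliency are suppressed in every channel, while fully salient positions pass through unchanged.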

### Improved Spread Transform Dither Modulation Using Luminance-Based JND Model

In the quantization-based watermarking framework, the perceptual just noticeable distortion (JND) model has been widely used to determine the quantization step size to provide a better tradeoff between fidelity and robustness. However, the perceptual parameters computed in the embedding procedure and the detecting procedure are different, as the image has been altered by watermark embedding. In this paper, we incorporate a new DCT-based perceptual JND model, which not only shows better consistency with the HVS characteristics compared to the conventional models, but is also invariant to the changes in the watermark framework. Furthermore, an improved spread transform dither modulation (STDM) watermarking scheme based on the new JND model is proposed. Experimental results show that the proposed scheme provides strong resistance against common attacks, especially Gaussian noise, amplitude scaling and JPEG compression.

Wenhua Tang, Wenbo Wan, Ju Liu, Jiande Sun

### Incorporation of 3D Model and Panoramic View for Gastroscopic Lesion Surveillance

Natural Orifice Transluminal Endoscopic Surgery (NOTES) is widely used for clinical diagnoses. However, NOTES has two main problems: difficulties caused by the endoscope’s flexibility and the endoscope’s narrow view. An image-guided system helps to deal with these problems. In our previous work, a computer aided endoscopic navigation system (CAEN) was developed for gastroscopic lesion surveillance. In this paper, a 3D model and a panoramic view are incorporated into CAEN with three improvements: selection of reference and tracking features; perspective projection for constructing local and global panoramic views; 3D surface modeling using structure from motion. The system is evaluated in three clinical applications: broadening the view, non-invasive retargeting, and overall lesion locations. The evaluation results show that the mean accuracy of broadening the view is 0.43 mm, the mean accuracy of non-invasive retargeting is 7.5 mm, and the mean accuracy for overall lesion diagnosis is 3.71 ± 0.35 mm.

Yun Zong, Weiling Hu, Jiquan Liu, Xu Zhang, Bin Wang, Huilong Duan, Jianmin Si

### Interactive Browsing System of 3D Lunar Model with Texture and Labels on Mobile Device

This paper is devoted to developing an interactive visualization system for a 3D lunar model with texture and labels on mobile devices. Using OpenGL ES 2.0 and OSG for Android, we implement 3D lunar mesh model construction, lunar texture and mapping, GPU shader illumination programming, multi-level terrain labels and interactive browsing. In particular, a technique for terrain labels on mobile devices is presented by developing a vertex shader and a fragment shader dedicated to rendering texture, where the vertex shader is mainly used to determine the vertex position attribute of the texture while the fragment shader is responsible for the color, font and transparency of the text, etc. The developed browsing system can be rendered on mobile devices in real time.

Yankui Sun, Kan Zhang, Ye Feng

### Interactive Head 3D Reconstruction Based Combine of Key Points and Voxel

In 3D reconstruction of the head, we can extract a large number of key points from the face, but not enough key points from the hair. Key-point-based 3D reconstruction methods do well in facial reconstruction, but are not able to reconstruct the hair. Because the traditional voxel-based 3D reconstruction method exploits more silhouette information, it can reconstruct the hair better than key-point-based methods. A method based on the combination of key points and voxels is presented to reconstruct head models. The main process is as follows: first, extract key points and silhouettes; second, reconstruct the head by the key-point method and interactively obtain the space that needs to be reconstructed by the voxel method; finally, reconstruct the space obtained in the previous step. Through this combined 3D reconstruction method, we can reconstruct a good head model.

Yanwei Pang, Kun Li, Jing Pan, Yuqing He, Changshu Liu

### Lighting Alignment for Image Sequences

Lighting is one of the challenges for image processing. Even though some algorithms have been proposed to deal with lighting variation in images, most of them are designed for a single image rather than for image sequences. In fact, the correlation between frames can provide useful information for removing the illumination diversity, which is not available for a single image. In this paper, we propose a 2-step lighting alignment algorithm for image sequences. Based on entropy, a perception-based lighting model is initialized according to the lighting condition of the first frame. Then the difference between frames is applied to optimize the parameters of the lighting model, and consequently the lighting conditions can be aligned for the sequence. At the same time, the local features of each frame can be enhanced. Experimental results show the effectiveness of the proposed algorithm.

Xiaoyue Jiang, Xiaoyi Feng, Jun Wu, Jinye Peng

### Modelling and Tracking of Deformable Structures in Medical Images

This paper presents a new method based both on Active Shape Model and a priori knowledge about the spatio-temporal shape variation for tracking deformable structures in medical imaging. The main idea is to exploit a priori knowledge of shape that exists in ASM and introduce new knowledge about the shape variation over time. The aim is to define a new more stable method, allowing the reliable detection of structures whose shape changes considerably in time. This method can also be used for the three-dimensional segmentation by replacing the temporal component by the third spatial axis (z). The proposed method is applied for the functional and morphological study of the heart pump. The functional aspect was studied through temporal sequences of scintigraphic images and morphology was studied through MRI volumes. The obtained results are encouraging and show the performance of the proposed method.

Saïd Ettaïeb, Kamel Hamrouni, Su Ruan

### Moving Object Extraction in Infrared Video Sequences

In order to extract the moving object in an infrared video sequence, this paper presents a scheme based on sparse and low-rank decomposition. By transforming each frame of the infrared video sequence into a column and combining all columns into a new matrix, the problem of extracting moving objects in infrared video sequences is converted to a sparse and low-rank matrix decomposition problem. The resulting nuclear norm and $$\ell _1$$ norm related minimization problem can also be efficiently solved by some recently developed numerical methods. The effectiveness of our proposed scheme is illustrated on different infrared video sequences. The experiments show that, compared to the ALM algorithm, our algorithm has distinct advantages in extracting moving objects from infrared videos.

Jinli Zhang, Min Li, Yujie He
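For orientation, the sparse and low-rank decomposition mentioned in the abstract above is commonly written in the following standard form (principal component pursuit). This is the textbook formulation, not necessarily the exact model of the paper; the symbols $$D$$, $$A$$, $$E$$, $$I_k$$ and $$\lambda$$ are assumptions introduced here. Each frame $$I_k$$ is vectorized into one column of $$D$$, the low-rank part $$A$$ captures the near-static background, and the sparse part $$E$$ captures the moving objects:

$$D = [\operatorname{vec}(I_1), \ldots, \operatorname{vec}(I_n)], \qquad \min _{A, E} \; \Vert A \Vert _* + \lambda \Vert E \Vert _1 \quad \text{s.t.} \quad D = A + E$$

Here $$\Vert A \Vert _*$$ is the nuclear norm (the sum of the singular values of $$A$$), $$\Vert E \Vert _1$$ is the sum of the absolute values of the entries of $$E$$, and $$\lambda > 0$$ balances the two terms.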

### Moving Object Segmentation by Length-Unconstrained Trajectory Analysis

Background subtraction for moving cameras is an unsolved key problem in intelligent video analysis. Trajectory analysis has demonstrated a significant difference between background and foreground motion models. However, owing to the limitations of trajectory-tracking techniques, long-term trajectories are rarely dense and well distributed enough, which may cause inaccuracy in boundary discrimination. To address these problems, in this paper we propose a robust algorithm, “length unconstrained trajectory analysis” (LUCTA), to recapture the “invalidated” information of short trajectories. Extensive experiments demonstrate the competitive performance of our framework in both accuracy and time cost.

Qiyu Liao, BingBing Zhuang, Jingjing Wang, Nenghai Yu

### Multidimensional Adaptive Sampling and Reconstruction for Realistic Image Based on BP Neural Network

A novel adaptive sampling and reconstruction algorithm is presented to address the noise artifacts of Monte Carlo rendering. A BP neural network is adopted to estimate the per-pixel error that guides the sampling rate in the multidimensional spaces. In the sampling stage, coarse samples are first generated to train the BP network. Then the per-pixel error is estimated from the BP predicted value. Additional samples are distributed to the slices of pixels which have large errors. These slices are extracted through a heuristic distance. A warping procedure is then carried out to remove individual light paths that result in significant spikes of noise. In the reconstruction stage, the final image is reconstructed by filtering each pixel with an appropriate anisotropic filter. Filters with small scales are used to keep clarity while the large ones smooth out noise. Experimental results demonstrate that, compared to the state-of-the-art methods, our algorithm achieves better results in numerical error and visual image quality.

Yu Liu, Changwen Zheng, Fukun Wu

### Modeling and Simulation of Multi-frictional Interaction Between Guidewire and Vasculature

In a cardiovascular interventional operation, the surgeon steers the tip of a long, thin guidewire to reach the clinical targets while traveling through the inside of blood vessels, and performs a wide range of minimally invasive procedures. However, real-time simulation of the physical deformation behaviours of the guidewire caused by large areas of frictional contact between guidewire and vasculature during insertion is a challenging task. From the microscopic view, this paper builds a novel multi-frictional contact dynamics model based on a flexible multi-body system to address the multi-frictional interaction between them. In the model, the guidewire and vasculature form a flexible multi-body system, and the process of contact and collision is divided into three stages: contact detection, contact handling and separation. In the first stage, a continuous collision detection algorithm based on an adaptive layer is proposed to obtain a set of “point-surface” contact pairs quickly. After confirming the contact areas, a multi-frictional contact dynamics algorithm based on nonlinear equivalent spring damping is put forward. In the normal direction, a nonlinear spring damping model is used to compute the spring restoring force and nonlinear damping force. In the tangential direction, sliding friction, static friction and rolling friction are calculated during the collision between two bodies by the Coulomb friction model. Finally, all frictional forces in the contact areas are added to the physical models of the guidewire for further simulation of various non-linear deformation behaviors. The experimental results show that this algorithm is feasible and can simulate the multi-frictional interaction between guidewire and blood vessels very well with real-time performance.

Dongjin Huang, Yin Wang, Pengbin Tang, Zhifeng Xie, Wen Tang, Youdong Ding

### Multi-modal Brain Image Registration Based on Subset Definition and Manifold-to-Manifold Distance

Image registration is an important procedure in multi-modal brain image processing. The main challenge is the variation of intensity distributions across different image modalities. The efficient SSD-based method cannot handle this kind of variation, and other approaches based on modality-independent descriptors and metrics are usually time-consuming. In this article, we propose a novel similarity metric based on manifold-to-manifold distance imposed on a subset of the original images. We define a subset for a compact representation of the original image. A manifold learning technique is employed to reveal the intrinsic structure of the sampled data. Instead of comparing the images in the original feature space, we use the manifold-to-manifold distance to measure the difference. By minimizing the distance between the manifolds, we iteratively obtain the optimal registration of the original image pair. Experimental results show that our approach is effective in dealing with multi-modal image registration on the BrainWeb dataset.

Weiwei Liu, Yuru Pei, Hongbin Zha

### Multimodal Speaker Diarization Utilizing Face Clustering Information

Multimodal clustering/diarization tries to answer the question “who spoke when” by using audio and visual information. Diarization consists of two steps: first, segmentation of the audio information and detection of the speech segments; then, clustering of the speech segments to group the speakers. This task has been mainly studied on audiovisual data from meetings, news broadcasts or talk shows. In this paper, we use visual information to aid speaker clustering. We tested the proposed method on three full-length movies, i.e. a scenario much more difficult than the ones used so far, where there is no certainty that speech segments and video appearances of actors will always overlap. The results show that the visual information can improve the speaker clustering accuracy and hence the diarization process.

Ioannis Kapsouras, Anastasios Tefas, Nikos Nikolaidis, Ioannis Pitas

### Multi-object Template Matching Using Radial Ring Code Histograms

In this paper, a novel template matching algorithm named radial ring code histograms (RRCH) is proposed for multi-object positioning. It is invariant to translation, rotation and illumination changes. To improve the ability to identify multiple objects with different rotation angles, radial gradient codes using the relative angle between the gradient direction and the position vector are proposed. Adjustable weights in different regions make it possible to adapt to various types of objects. Experiments using LED sorting equipment demonstrate that our algorithm positions multiple objects correctly in complicated environments with noise and illumination changes.

Shijiao Zheng, Buyang Zhang, Hua Yang
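The rotation invariance claimed in the abstract above follows from measuring the gradient direction relative to the position vector: rotating the object turns both by the same angle, so their difference is unchanged. The sketch below illustrates only this relative-angle coding. The position vector is assumed to be measured from the template centre, and the 8-bin quantization and function names are hypothetical choices, not details from the paper.

```python
import math


def radial_gradient_code(px, py, gx, gy, bins=8):
    """Quantize the angle from the position vector (px, py) to the gradient (gx, gy).

    The position vector is assumed to point from the template centre to the pixel.
    """
    rel = (math.atan2(gy, gx) - math.atan2(py, px)) % (2 * math.pi)
    return int(rel / (2 * math.pi) * bins) % bins


def rotate(x, y, theta):
    """Rotate the vector (x, y) counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


# Usage: the code of a pixel stays the same when the whole object is rotated.
code_before = radial_gradient_code(3.0, 1.0, -1.0, 2.0)
p_rot = rotate(3.0, 1.0, math.radians(40))
g_rot = rotate(-1.0, 2.0, math.radians(40))
code_after = radial_gradient_code(p_rot[0], p_rot[1], g_rot[0], g_rot[1])
```

A histogram of such codes collected over rings around the centre would then not depend on the object's orientation, apart from quantization effects for angles that fall close to a bin boundary.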

### Novel DCT Features for Detecting Spatial Embedding Algorithms

Traditionally, discrete cosine transform (DCT) features and Cartesian calibration are mainly utilized in Joint Photographic Experts Group (JPEG) steganalysis. It is well known that such a steganalyzer can also work for spatial steganography without any modification. However, since JPEG and spatial embedding have different influences on DCT coefficients, this direct generalization is not quite reasonable. This paper studies the statistical moments and histogram variances of DCT coefficients under pixel domain modification. A novel steganalyzer for detecting spatial steganography is built from all alternating current (AC) coefficient features and Cartesian calibration. The experimental results show that the proposed steganalyzer outperforms the old DCT feature based method, as well as several spatial and wavelet feature based methods.

Hao Zhang, Tao Zhang, Xiaodan Hou, Xijian Ping

### Novel Software-Based Method to Widen Dynamic Range of CCD Sensor Images

In the past twenty years, CCD sensors have made huge progress in resolution and low-light performance through hardware improvements. However, due to physical limits of sensor design and fabrication, the fill factor has become the bottleneck for improving the quantum efficiency of CCD sensors to widen the dynamic range of images. In this paper we propose a novel software-based method to widen dynamic range by a virtual increase of the fill factor, achieved by a resampling process. The CCD images are rearranged onto a new grid of virtual pixels composed of subpixels. A statistical framework consisting of a local learning model and Bayesian inference is used to estimate the new subpixel intensities. CCD images were obtained with different known fill factors. Then new resampled images were computed and compared to the respective CCD and optical images. The results show that the proposed method can significantly widen the recordable dynamic range of CCD images and virtually increase the fill factor to 100 %.

Wei Wen, Siamak Khatibi

### Object Contour Extraction Based on Merging Photometric Information with Graph Cuts

The graph cuts algorithm is one of the highly effective optimization methods in image segmentation. To improve segmentation under uneven illumination, a contour extraction method which merges photometric information with graph cuts is proposed. First, the method obtains the color values and brightness values of each pixel from the color image, and represents the photometric value as the average of these values. Then, the photometric information is integrated into the energy function of the active contour model, and a new energy function is built. Finally, the new energy function is minimized with the max-flow/min-cut algorithm to obtain the global and local contours of the target object. Experimental results show that the proposed method makes the initial contour converge to the target object more accurately and faster.

Rongguo Zhang, Meimei Ren, Jing Hu, Xiaojun Liu, Kun Liu

### Object-Based Multi-mode SAR Image Matching

Owing to the imaging mechanism and imaging conditions of synthetic aperture radar (SAR), images can differ significantly, so inconsistent features and relationship correspondence are key problems for traditional image matching algorithms. This study proposes an object-based SAR image matching method. Two images are matched through the same ground objects, by means of property and shape information of the objects, which are obtained via object extraction and morphological operations. We utilize a shape context descriptor to compare contours of objects and detected invariant control points. The experimental results show that the proposed method achieves reliable and stable matching performance, and can alleviate the deformation and nonlinear distortion effects of different systems.

Jie Rui, Chao Wang, Hong Zhang, Bo Zhang, Fan Wang, Fei Jin

Previous work has shown that OCR benefits from reducing the dictionary size. In this paper, a framework is proposed for improving OCR performance with an adaptive dictionary, in which text categorization is utilized to construct dictionaries using web data and to identify the category of the imaged documents. To facilitate comparison with other existing methods that focus on language identification, an implementation is presented to improve OCR performance with language-adaptive dictionaries. Experimental results demonstrate that the performance of the OCR system is significantly improved by the reduced dictionary. Compared with other existing methods for language identification, the proposed method shows better performance. In addition, any other categorization methodology is expected to further reduce the dictionary size. For example, an imaged document in a specific language can be further categorized into sport, law, entertainment, etc. by its content.

Chenyang Wang, Yanhong Xie, Kai Wang, Tao Li

### One Simple Virtual Avatar System Based on Single Image

To establish virtual avatar systems in computing environments with limited resources, we design a system based on a single image which can generate speech animation with different facial expressions. First, facial feature points are extracted automatically or manually based on MPEG-4 facial animation to build the face model. Then, according to the Facial Animation Parameters of the visual phoneme and expression stream, the feature points after image warping are calculated. The next step is to generate the in-between frames by using an image warping algorithm based on scan lines, combined with the corresponding time parameters. Finally, by adding actions such as nodding, head shaking and winking, the face animation is implemented. Experimental results show that the proposed system can generate smooth, real-time facial expressions on platforms with low computing capacity.

Lanfang Dong, Jianfu Wang, Kui Ni, Yatao Wang, Xian Wu, Mingxue Xu

### Optimized Laplacian Sparse Coding for Image Classification

Laplacian sparse coding exhibits good performance for image classification, because of its outstanding ability to preserve the locality and similarity information of the codes. However, there still exist two drawbacks in the Laplacian graph construction: (1) It has an expensive computational cost, which significantly limits the applicability of Laplacian sparse coding to large-scale data sets. (2) Euclidean distance does not necessarily reflect the inherent distribution of the data. To construct a more robust Laplacian graph, we introduce a local landmarks approximation method instead of the traditional k-nearest neighbor algorithm, and design a new form of adjacency matrix. Based on Nesterov’s gradient projection algorithm, we develop an effective numerical solver to optimize the local landmarks approximation problem with guaranteed quadratic convergence. The obtained codes have more discriminating power than traditional sparse coding approaches for image classification. Comprehensive experimental results on publicly available datasets demonstrate the effectiveness of our method.

Lei Chen, Sheng Gao, Baofeng Yuan, Zhe Qi, Yafang Liu, Fei Wang

### Backmatter

Further Information