
About this book

This book constitutes the refereed proceedings of the 4th Chinese Conference, IVS 2016, held in Beijing, China, in October 2016.

The 19 revised full papers presented were carefully reviewed and selected from 45 submissions. The papers are organized in topical sections on low-level preprocessing, surveillance systems; tracking, robotics; identification, detection, recognition; behavior, activities, crowd analysis.



Low-Level Preprocessing, Surveillance Systems


Occluded Object Imaging Based on Collaborative Synthetic Aperture Photography

Occlusion has long posed a critical challenge in computer vision. Camera array based synthetic aperture photography has been regarded as a promising way to address the problem of occluded object imaging. However, the application of this technique is limited by the building cost and the immobility of the camera array system. To build a more practical synthetic aperture photography system, this paper proposes a novel collaborative synthetic aperture photography method based on multiple moving cameras. The main characteristics of our work include: (1) to the best of our knowledge, this is the first multiple moving camera based collaborative synthetic aperture photography system; (2) by building a sparse 3D map of the occluded scene using one camera, the information from the subsequent cameras can be incrementally utilized to estimate the warping induced by the focal plane; (3) the compatibility with different types of cameras, such as hand-held action cameras or quadrotor on-board cameras, shows the generality of the proposed framework. Extensive experiments demonstrate the see-through-occlusion performance of the proposed approach in different scenarios.
Xiaoqiang Zhang, Yanning Zhang, Tao Yang, Zhi Li, Dapeng Tao

Video Synchronization with Trajectory Pulse

This paper presents a method to temporally synchronize two independently moving cameras with overlapping views. Temporal variations between image frames (such as moving objects) are powerful cues for alignment. We first generate pulse images by tracking moving objects and examining the trajectories for changes in speed. We then integrate a rank-based constraint and the pulse-based matching, to derive a robust approximation of spatio-temporal alignment quality for all pairs of frames. By folding both spatial and temporal cues into a single alignment framework, finally, the nonlinear temporal mapping is found using a graph-based approach that supports partial temporal overlap between sequences. We verify the robustness and performance of the proposed approach on several challenging real video sequences. Compared to state-of-the-art techniques, our approach is robust to tracking error and can handle non-rigid scene alignment in complex dynamic scenes.
Xue Wang, Qing Wang

Gestalt Principle Based Change Detection and Background Reconstruction

Gaussian mixture model based detection algorithms can easily produce fragmentary detections because of the fixed number of Gaussian components. In this paper, we propose a gestalt principle based change target extraction method and further present a background reconstruction algorithm. First, we apply the Gaussian mixture model to extract the moving target, as others have done, although this alone may lead to incomplete extraction. Second, we apply the frame difference method to extract the moving target more precisely. Finally, we build a static background according to the relationships between frames of a moving target. Experimental results reveal that the proposed detection method outperforms three other representative detection methods. Moreover, our background reconstruction algorithm proves effective and robust in reconstructing the backgrounds of a video.
Shi Qiu, Yongsheng Dong, Xiaoqiang Lu, Ming Du
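
The frame-difference step mentioned in this abstract can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists of integers; the function name `frame_difference_mask` and the threshold of 25 are illustrative choices, not taken from the paper, and the gestalt-based grouping itself is not shown.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Return a binary mask: 1 where the intensity changed by more than threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Two tiny 2x3 grayscale frames; two pixels change noticeably between them.
prev_frame = [[10, 10, 10], [10, 10, 10]]
curr_frame = [[10, 200, 10], [10, 10, 90]]
mask = frame_difference_mask(prev_frame, curr_frame)
```

Here `mask` marks only the two changed pixels; a real system would typically combine such a mask with a background model such as the Gaussian mixture model named above.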

\(L_0\)-Regularization Based on Sparse Prior for Image Deblurring

In this paper we propose a novel \(L_0\) penalty function on both the gradient and the image itself as the regularization term in the total energy function. This regularization term is based on a sparse prior and is solved as part of a mathematical optimization problem. Our method not only preserves the structural information of the image but also avoids over-smoothing in the final restoration. We illustrate the applicability and validity of our method through experiments on both synthetic and natural blurry images. Although our method does not require numerous iterations, its convergence rate and result quality outperform most state-of-the-art methods.
Hongzhang Song, Sheng Liu
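
To make the idea of an \(L_0\) penalty on both the gradient and the image concrete, the following sketch simply counts non-zero entries. It is an assumption-laden illustration: the forward-difference gradient, the `weight_intensity` parameter, and the function names are hypothetical, and the paper's actual energy function and solver are not reproduced here.

```python
def l0_norm(values, eps=0.0):
    """Count entries whose magnitude exceeds eps (the L0 'norm')."""
    return sum(1 for v in values if abs(v) > eps)


def l0_regularizer(image, weight_intensity=1.0):
    """L0 count of forward-difference gradients plus a weighted L0 count of intensities."""
    height, width = len(image), len(image[0])
    gradients = []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:   # horizontal forward difference
                gradients.append(image[y][x + 1] - image[y][x])
            if y + 1 < height:  # vertical forward difference
                gradients.append(image[y + 1][x] - image[y][x])
    pixels = [v for row in image for v in row]
    return l0_norm(gradients) + weight_intensity * l0_norm(pixels)


# A dark image with one bright 2x2 block: few non-zero gradients, few non-zero pixels.
image = [[0, 0, 0], [0, 5, 5], [0, 5, 5]]
penalty = l0_regularizer(image)
```

A penalty of this kind is small for images that are mostly flat with a few sharp edges, which is why sparse priors are commonly used to discourage over-smoothed restorations.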

A Large-Scale Distributed Video Parsing and Evaluation Platform

Visual surveillance systems have become one of the largest sources of Big Visual Data in the real world. However, existing systems for video analysis still struggle with scalability, extensibility, and error-proneness, even though great advances have been achieved in a number of visual recognition tasks and surveillance applications, e.g., pedestrian/vehicle detection and people/vehicle counting. Moreover, few algorithms explore the specific values/characteristics of large-scale surveillance videos. To address these problems in large-scale video analysis, we develop a scalable video parsing and evaluation platform by combining several advanced techniques for Big Data processing, including Spark Streaming, Kafka, and the Hadoop Distributed File System (HDFS). In addition, a Web User Interface is included in the system to collect users’ degrees of satisfaction with the recognition tasks, so as to evaluate the performance of the whole system. Furthermore, the highly extensible platform, running on long-term surveillance videos, makes it possible to develop more intelligent incremental algorithms to enhance the performance of various visual recognition tasks.
Kai Yu, Yang Zhou, Da Li, Zhang Zhang, Kaiqi Huang

Tracking, Robotics


Autonomous Wheeled Robot Navigation with Uncalibrated Spherical Images

This paper focuses on the use of spherical cameras for autonomous robot navigation tasks. Previous work mainly falls into two categories: scene-oriented simultaneous localization and mapping, and robot-oriented heading fields, lane detection, and trajectory tracking. Those methods face the challenges of either high computation cost or heavy labelling and calibration requirements. In this paper, we propose to formulate spherical image navigation as an image classification problem, which significantly simplifies the orientation estimation and path prediction procedure and accelerates the navigation process. More specifically, we train an end-to-end convolutional network on our spherical image dataset with novel orientation category labels. The trained network can give precise predictions of potential path directions from single spherical images. Experimental results on our Spherical-Navi dataset demonstrate that the proposed approach outperforms the compared methods in realistic applications.
Lingyan Ran, Yanning Zhang, Tao Yang, Peng Zhang

Cascaded Tracking with Incrementally Learned Projections

A convention in visual object tracking is to favor only the candidate with the maximum similarity score and take it as the tracking result, while ignoring the rest. However, surrounding samples also provide valuable information for locating the target, and the combination of their votes can produce more stable results. In this paper, we propose a novel method based on the supervised descent method (SDM). We search for the target from multiple start positions and locate it with their votes. To evaluate each predicted descent direction, we present a confidence estimation scheme for SDM. To adapt the tracking model to appearance variations, we further present an incremental cascaded support vector regression (ICSVR) algorithm for model updating. Experimental results on a recent benchmark demonstrate the superior performance of our tracker against state-of-the-art trackers.
Lianghua Huang

Tracking Multiple Players in Beach Volleyball Videos

Multi-object tracking has been a difficult problem in recent years, especially in complex scenes such as player tracking in sports videos. Player movements are often complex and abrupt. In this paper, we focus on the problem of tracking multiple players in beach volleyball videos. To handle the difficulties of player tracking, we follow the popular tracking-by-detection framework in multi-object tracking and adopt the multiple hypotheses tracking (MHT) algorithm to solve the data association problem. To improve the efficiency of MHT, we use motion information from a Kalman filter and train an online appearance model for each track hypothesis. An auxiliary particle filter method is adopted to handle the missing detection problem. Furthermore, we obtain significant performance on our beach volleyball datasets, which demonstrates the effectiveness and efficiency of the proposed method.
Xiaokang Jiang, Zheng Liu, Yunhong Wang
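
The Kalman filter motion model mentioned in this abstract can be sketched in one dimension. This is a simplified illustration under stated assumptions: a constant-velocity model along a single image axis, a diagonal process noise `q`, measurement noise `r`, and a hypothetical class name; the paper's actual state vector and its integration with MHT are not described in the abstract.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position-only measurements."""

    def __init__(self, position=0.0, velocity=0.0, p0=100.0, q=1e-3, r=1.0):
        self.x = [position, velocity]
        self.P = [[p0, 0.0], [0.0, p0]]  # state covariance
        self.q = q                        # process noise added to the diagonal
        self.r = r                        # measurement noise variance

    def predict(self, dt=1.0):
        """Propagate the state with F = [[1, dt], [0, 1]]; returns predicted position."""
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + q * I, written out for the 2x2 case.
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r               # innovation variance
        k0, k1 = a / s, c / s        # Kalman gain
        y = z - self.x[0]            # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


# A player moving 2 pixels per frame; the filter starts with no velocity knowledge.
kf = ConstantVelocityKalman1D()
for t in range(1, 51):
    kf.predict()
    kf.update(2.0 * t)
next_position = kf.predict()  # motion-only prediction for frame 51
```

The predicted position can be used to gate which detections are plausible continuations of a track hypothesis, which is one common way motion information reduces the number of hypotheses to evaluate.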

Multi-object Tracking Within Air-Traffic-Control Surveillance Videos

In this paper, we address the multi-object tracking (MOT) problem within Air-Traffic-Control (ATC) surveillance videos. The uniqueness and challenge of this specific problem are two-fold. First, the targets within ATC surveillance videos are small and have homogeneous appearance. Second, the number of targets within the tracking scene varies severely for multiple reasons. To solve this problem, we propose a method that combines the advantages of a fast association algorithm and a local adjustment technique under a general energy minimization framework. Specifically, a comprehensive and discriminative energy function is established to measure the probability of hypothetical target movements, and the optimum of the function yields the most reasonable target state configuration. Extensive experiments demonstrate the effectiveness of our method on this new dataset.
Yan Li, Siyuan Chen, Xiaolong Jiang

Identification, Detection, Recognition


Person Re-identification by Multiple Feature Representations and Metric Learning

Person re-identification is the problem of matching pedestrian images captured by multiple cameras. Feature representation and metric design are two critical aspects of person re-identification. In this paper, we first propose an effective Convolutional Neural Network and train it on mixed datasets as a general deep feature extractor. Second, we extract hand-crafted image features as a supplement, and then learn independent distance metrics for the deep feature representation and the hand-crafted feature representation, respectively. Finally, we validate our method on three challenging person re-identification datasets. Experimental results show the effectiveness of our approach: we achieve the best rank-1 matching rates on all three datasets compared with state-of-the-art methods.
Meibin Qi, Jingxian Han, Jianguo Jiang

Deep Multi-level Hashing Codes for Image Retrieval

In this paper, we propose a deep siamese convolutional neural network (DSCNN) to learn semantic-preserving global-level and local-level hashing codes simultaneously for effective image retrieval. In particular, we analyze the visual attention characteristics inside hash bits using the activation maps of deep convolutional features and propose a novel bit selection approach to reinforce the pertinence of the local-level code. Finally, unlike most existing retrieval methods, which use global or unsupervised local descriptors separately and therefore suffer from unsatisfactory precision, we present a multi-level hash search method that takes advantage of both local and global properties of deep features. The experimental results show that our method outperforms several state-of-the-art methods on the Oxford 5k/105k and Paris 6k datasets.
Zhenjiang Dong, Ge Song, Xia Jia, Xiaoyang Tan
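
One plausible reading of a multi-level hash search is a coarse-to-fine lookup: shortlist by global-code Hamming distance, then re-rank by local-code distance. The sketch below illustrates only that generic scheme. The function names, the integer encoding of codes, and `shortlist_size` are assumptions; the abstract does not specify how the paper combines the two levels.

```python
def hamming(a, b):
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def multi_level_search(query, database, shortlist_size=2):
    """Rank database indices: shortlist by global code, then re-rank by local code.

    query and each database entry are (global_code, local_code) integer pairs.
    """
    q_global, q_local = query
    coarse = sorted(range(len(database)),
                    key=lambda i: hamming(q_global, database[i][0]))
    shortlist = coarse[:shortlist_size]
    return sorted(shortlist, key=lambda i: hamming(q_local, database[i][1]))


database = [
    (0b1111, 0b0000),  # same global code as the query, very different local code
    (0b1110, 0b1111),  # close global code, identical local code
    (0b0000, 0b1111),  # far global code, dropped at the coarse stage
]
ranking = multi_level_search((0b1111, 0b1111), database)
```

In this toy example the second entry is ranked first because it survives the coarse stage and matches the query's local code exactly.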

Salient Object Detection from Single Haze Images via Dark Channel Prior and Region Covariance Descriptor

Due to degraded visibility and low contrast, object detection from single haze images faces great challenges. This paper proposes a computational model of visual saliency to cope with this issue. A superpixel-level saliency map is first extracted via the dark channel prior. Then, region covariance descriptors are used to estimate the local and global saliency of each superpixel. In addition, a graph model is incorporated as a constraint to optimize the correlation between superpixels. Experimental results verify the validity and efficiency of the proposed saliency computational model.
Nan Mu, Xin Xu, Xiaolong Zhang
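
The dark channel used here is, in its standard form, the minimum over color channels followed by a minimum over a local patch. The following sketch computes it at pixel level on a tiny image. The nested-list image layout and the `patch_radius` parameter are illustrative assumptions, and the superpixel-level aggregation described in the abstract is not shown.

```python
def dark_channel(image, patch_radius=1):
    """Per-pixel dark channel: min over RGB, then min over a square neighborhood.

    image is a list of rows, each row a list of (r, g, b) tuples.
    """
    height, width = len(image), len(image[0])
    channel_min = [[min(pixel) for pixel in row] for row in image]
    dark = []
    for y in range(height):
        row = []
        for x in range(width):
            y0, y1 = max(0, y - patch_radius), min(height, y + patch_radius + 1)
            x0, x1 = max(0, x - patch_radius), min(width, x + patch_radius + 1)
            row.append(min(channel_min[j][i]
                           for j in range(y0, y1) for i in range(x0, x1)))
        dark.append(row)
    return dark


# A single row: bright hazy pixels on both sides of a darker, more saturated pixel.
image = [[(200, 210, 220), (50, 60, 70), (200, 200, 200)]]
pointwise = dark_channel(image, patch_radius=0)
patched = dark_channel(image, patch_radius=1)
```

Regions with a high dark channel value tend to correspond to haze or bright, low-saturation content, which is why the quantity is useful as a cue in hazy scenes.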

Hybrid Patch Based Diagonal Pattern Geometric Appearance Model for Facial Expression Recognition

Automatic Facial Expression Recognition (FER) is an imperative process in next generation Human-Machine Interaction (HMI) for clinical applications. Detailed information analysis and maximization of the labeled database are the major concerns in FER approaches. This paper proposes a novel Patch-Based Diagonal Pattern (PBDP) method on Geometric Appearance Models (GAM) that extracts features in multiple directions for detailed information analysis. In addition, this paper adopts co-training to learn the complementary information from RGB-D images. Finally, a Relevance Vector Machine (RVM) classifier is used to recognize the facial expression. In experiments, we validate the proposed methods on two RGB-D facial expression datasets, i.e., the EURECOMM dataset and the biographer dataset. Compared to other methods, the comparative analysis of recognition and error rates demonstrates the effectiveness of the proposed PBDP-GAM in FER applications.
Deepak Kumar Jain, Zhang Zhang, Kaiqi Huang

Multi-object Detection Based on Binocular Stereo Vision

This paper proposes a new multi-object detection system based on binocular stereo vision. First, we calibrate the two cameras to obtain their intrinsic and extrinsic parameters and the transformation matrix between them. Second, stereo rectification and stereo matching are performed on image pairs acquired synchronously by the binocular camera to obtain a disparity map, from which the 3D coordinates of the objects are obtained. We then project these 3D points onto the ground to generate a top-view projection image. Lastly, we propose a distance and color based mean shift clustering approach to classify the projected points, after which the number and positions of objects can be determined. Binocular stereo vision based methods can overcome the problems of object occlusion, illumination variation, and shadow interference. Experiments in both indoor and corridor scenes show the advantages of the proposed system.
Zhannan He, Qiang Ren, Tao Yang, Jing Li, Yanning Zhang
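
The step from disparity map to 3D coordinates and then to a top view can be sketched with the standard rectified-stereo relation Z = f · B / d. The sketch assumes a pinhole model with focal length in pixels, a principal point (cx, cy), and a camera whose Y axis is vertical so that dropping Y gives ground-plane coordinates; these parameter names and that camera orientation are assumptions, and the mean shift clustering stage is not shown.

```python
def disparity_to_point(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Back-project pixel (u, v) with the given disparity into camera coordinates."""
    if disparity <= 0:
        return None  # no valid stereo match at this pixel
    z = focal_px * baseline_m / disparity   # depth from Z = f * B / d
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


def top_view(points):
    """Drop the vertical axis: (X, Y, Z) -> (X, Z) ground-plane coordinates."""
    return [(x, z) for x, _, z in points]


# Hypothetical calibration: f = 500 px, baseline = 0.1 m, principal point (320, 240).
point = disparity_to_point(420, 240, 10, 500, 0.1, 320, 240)
ground = top_view([point])
```

With these numbers the pixel lies about 5 m in front of the camera and 1 m to the side, so objects that overlap in the image can still separate cleanly in the top view.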

Fast Vehicle Detection in Satellite Images Using Fully Convolutional Network

Detecting small targets such as vehicles in high resolution satellite images is a significant but challenging task. In the past decade, several detection frameworks have been proposed to solve this problem. However, like the traditional approaches to object detection in natural images, those methods all consist of multiple separate stages: region proposals are first produced, then fed into a feature extractor, and finally classified. Such multi-stage detection schemes are complicated and time-consuming. In this paper, we propose a unified single-stage vehicle detection framework using a fully convolutional network (FCN) to simultaneously predict vehicle bounding boxes and class probabilities from an arbitrary-sized satellite image. We describe our FCN architecture, which replaces the fully connected layers in traditional CNNs with convolutional layers, and design a vehicle object-oriented training methodology with reference boxes (anchors). The whole model can be trained end-to-end by minimizing a multi-task loss function. Comparative experiments on a common dataset demonstrate that our FCN model, which has far fewer parameters, achieves faster detection with lower false alarm rates than traditional methods.
Jingao Hu, Tingbing Xu, Jixiang Zhang, Yiping Yang
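
Training with reference boxes (anchors) usually relies on intersection over union (IoU) to decide which anchors count as positives. The sketch below shows that generic labeling step. The `(x0, y0, x1, y1)` box format, the 0.5 threshold, and the function names are assumptions rather than details from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.5):
    """Label an anchor 1 if it overlaps any ground-truth box with IoU >= pos_thresh."""
    return [1 if any(iou(a, g) >= pos_thresh for g in gt_boxes) else 0
            for a in anchors]


anchors = [(0, 0, 2, 2), (10, 10, 12, 12)]
gt_boxes = [(0, 0, 2, 2.5)]
labels = label_anchors(anchors, gt_boxes)
```

Positive anchors then contribute to both the classification and box regression terms of a multi-task loss, while the remaining anchors contribute only as background.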

Behavior, Activities, Crowd Analysis


Semi-supervised Hessian Eigenmap for Human Action Recognition

Dimensionality reduction has been attracting growing attention with the explosive growth of high-dimensional data in many areas, including web image annotation, video object detection, and human action recognition. Compared with traditional nonlinear dimensionality reduction methods such as Locally Linear Embedding, Isometric Feature Mapping, and Laplacian Eigenmap, semi-supervised nonlinear dimensionality reduction can improve the stability of the solution by taking prior information into account. In this paper, we integrate exact mapping information of certain data points into Hessian Eigenmap and propose semi-supervised Hessian Eigenmap. By considering prior information with physical meaning, semi-supervised Hessian Eigenmap can approximate global low-dimensional coordinates. In addition, compared with the graph Laplacian, the Hessian can exploit high-order information about the local geometry of the data distribution and thus further boost performance. We conduct experiments on both synthetic and real-world datasets. The experimental results demonstrate that the proposed semi-supervised Hessian Eigenmap algorithm outperforms the representative semi-supervised Laplacian Eigenmap algorithm.
Xueqi Ma, Jiaxing Pan, Yue Wang, Weifeng Liu

Surveillance Based Crowd Counting via Convolutional Neural Networks

Video surveillance based crowd counting is important for crowd management and public security. It is a challenging task due to cluttered backgrounds, ambiguous foregrounds, and diverse crowd distributions. In this paper, we propose an end-to-end crowd counting method with convolutional neural networks, which integrates original frames and motion cues to learn a deep crowd counting regressor. The original frames and motion cues are complementary to each other for counting stationary and moving pedestrians. Experimental results on two widely used crowd counting datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance.
Damin Zhang, Zhanming Li, Pengcheng Liu

Jet Trajectory Recognition Based on Dark Channel Prior

The automatic fire-fighting water cannon is an important device for fire extinguishing. By identifying the jet trajectory, closed-loop control of the fire extinguishing process can be realized, which improves the quality and efficiency of the water cannon. In this paper, a novel jet trajectory recognition method is proposed, based on the dark channel prior and the optical property of low scene transmission in the jet trajectory’s coverage area. First, the dark channel prior is used to extract the low scene transmission region. Then, to identify the jet trajectory more accurately, the extracted region is matched with the moving target area recovered by Gaussian mixture background modeling. Finally, a modified cubic curve is used to fit the jet trajectory and predict its end point. The experimental results indicate that the proposed approach can effectively detect the jet trajectory with strong anti-interference ability and higher accuracy.
Wenyan Chong, Ying Hu, Defei Yuan, Yongjun Ma

Real-Time Abnormal Behavior Detection in Elevator

Violent behavior in elevators has been frequently reported by the media in recent years, and it is necessary to provide a safe elevator environment for passengers. This paper proposes a new visual surveillance system with abnormal behavior detection. First, human objects in the surveillance video are extracted by background subtraction, and the number of people in each image is counted. Then, algorithms are presented to deal with different abnormal behaviors. In the one-person case, we detect whether the person has fallen down. In the case of two or more people, we use the image entropy of the Motion History Image (MHI) to detect whether there is violent behavior. Experimental results show that the proposed algorithms offer satisfactory results.
Yujie Zhu, Zengfu Wang
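
The image entropy of a Motion History Image can be sketched as follows. The MHI update rule used here (set moving pixels to `tau`, decay the rest by one per frame) is the common textbook form, and the Shannon entropy is taken over the histogram of MHI values; the decay constant, the function names, and any decision threshold on the entropy are assumptions, not details from the paper.

```python
import math
from collections import Counter


def update_mhi(mhi, motion_mask, tau=5):
    """Set moving pixels to tau and decay all other pixels by 1 (floored at 0)."""
    return [
        [tau if moving else max(0, value - 1)
         for value, moving in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, motion_mask)
    ]


def image_entropy(image):
    """Shannon entropy (in bits) of the histogram of pixel values."""
    values = [v for row in image for v in row]
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


mhi = [[0, 0], [0, 0]]
still_entropy = image_entropy(mhi)              # no motion history at all
mhi = update_mhi(mhi, [[1, 0], [1, 0]])         # motion in the left column
moving_entropy = image_entropy(mhi)
mhi_decayed = update_mhi(mhi, [[0, 0], [0, 0]]) # one frame later, no motion
```

Vigorous, spatially scattered motion spreads the MHI values over many levels and raises the entropy, which is the intuition behind using it as a cue for violent behavior.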

