
About this book

This book constitutes the refereed proceedings of the 11th International Conference on Computer Vision Systems, ICVS 2017, held in Shenzhen, China, in July 2017.

The 61 papers presented were carefully reviewed and selected from 92 submissions. The papers are organized in topical sections on visual control, visual navigation, visual inspection, image processing, human robot interaction, stereo system, image retrieval, visual detection, visual recognition, system design, and 3D vision / fusion.

Table of Contents


Visual Control


Towards a Cloud Robotics Platform for Distributed Visual SLAM

Cloud computing allows robots to offload computation and share information as well as skills. Visual SLAM is one of the computationally intensive tasks for mobile robots and can therefore benefit from the cloud. In this paper, we propose a novel cloud robotics platform named RSE-PF for distributed visual SLAM, with close attention to the infrastructure of the cloud. We implement it with Amazon Web Services and OpenResty. We demonstrate the feasibility, robustness, and elasticity of the proposed platform with a perspective-n-point use case. In this use case, the average round-trip delay is 153 ms, which meets the near real-time requirement of mobile robots.

Peng Yun, Jianhao Jiao, Ming Liu

A Practical Visual Positioning Method for Industrial Overhead Crane Systems

To solve the problem of information acquisition for an industrial overhead crane, this paper uses an industrial camera to measure the height and swing angle of the hook and the distance between the hook and the cargo. To obtain real-time data on the hook’s height and swing angle, the whole image captured by the industrial camera is first processed and the hook’s initial position is obtained by shape matching. As the trolley tracks the hook according to local image information, the height is calculated by interpolation from the number of local pixels. The swing angle is computed from the height of the hook and the distance between the initial and current positions of its upper edge. In addition, the platform calculates the distance between the hook and the cargo with a visual method in which the cargo is identified by features such as length, width and height that are input by operators. The method acquires the static information of the industrial scene, drives the trolley to the cargo, detects whether the hook’s swing is within the proper range, and hoists the hook to the desired position. Experimental results on a 32-ton industrial crane system imply that this algorithm solves the problem of information collection and transfers the hook to a desired position.

Bo He, Yongchun Fang, Ning Sun
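
The crane abstract above states that the swing angle follows from the hook height and the displacement of the hook's upper edge, but it does not give the formula. The following is a minimal sketch under an assumed simple pendulum geometry, where "height" is read as the vertical distance from the trolley to the hook and both quantities are already expressed in metres. The function name and parameters are hypothetical and are not taken from the paper.

```python
import math


def swing_angle_deg(horizontal_offset_m, vertical_drop_m):
    """Estimate the swing angle of a hook hanging below a trolley.

    Assumption (not from the paper): the hook behaves like a simple
    pendulum, so the angle from the vertical is atan(offset / drop).
    """
    if vertical_drop_m <= 0:
        raise ValueError("vertical_drop_m must be positive")
    return math.degrees(math.atan2(horizontal_offset_m, vertical_drop_m))


if __name__ == "__main__":
    # A hook displaced 0.2 m sideways while hanging 4 m below the trolley.
    print(round(swing_angle_deg(0.2, 4.0), 2))
```

In a real system the pixel displacement would first have to be converted to metres, which depends on the camera calibration and the current hook height.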

A Testbed for Vision-Based Networked Control Systems

With the availability of low-latency wireless communication and low-delay video communication solutions, vision-based networked control systems (NCS) become feasible. In this paper, we describe an NCS testbed which is suitable for evaluating the interplay of computer vision algorithms, network protocols and control algorithms under a delay constraint. The system comprises an inverted pendulum which is monitored by a video camera. The H.264-encoded video is sent over a network to an image processing computer. This computer extracts the angle of inclination of the pendulum from the decoded video and sends it over a wireless link to the pendulum. The pendulum uses the angle in a control algorithm to keep itself in a vertical position. We provide a detailed description of the system, including the control algorithm and the image processing algorithms, and analyse the latency contributors of the system. The build instructions and source code of the testbed are publicly available. As the testbed is based on standard low-priced components, it is particularly suitable for educational purposes.

Christoph Bachhuber, Simon Conrady, Michael Schütz, Eckehard Steinbach

Visual Servoing of Wheeled Mobile Robots Under Dynamic Environment

In this paper, a novel visual servoing method is proposed for a monocular wheeled mobile robot system that can deal with dynamic visual targets. With existing methods, the mobile robot reaches the desired pose correctly when the feature points are fixed, but fails to do so once the feature points have been moved. By adding a fixed camera to the scene to monitor the movement of the feature points, the proposed approach still works well when the feature points are moved. From the images of the monitor camera, the POSIT algorithm is used to calculate the movement of the feature points. Likewise, we can calculate the relationship between the relevant coordinate systems with respect to the feature points. By coordinate transformation, the translation vector and rotation matrix between the current pose and the desired pose of the mobile robot are obtained from real-time image feedback. Finally, using a polar coordinate representation, a motion controller is adopted to drive the mobile robot to the desired pose. The feasibility of the proposed method is verified by simulation results.

Chenghao Yin, Baoquan Li, Wuxi Shi, Ning Sun

Towards Space Carving with a Hand-Held Camera

With the rise of VR applications, dense reconstruction of 3D object models becomes an increasingly important subproblem of computer vision. Most existing methods focus on the reconstruction of the actual object and assume that camera poses are known and the observed object is clearly dominant in the image. The goal of this paper is to extend these technologies to less artificial data, and enable dense 3D object modeling from an ordinary hand-held camera observing an object on top of a structured, unknown planar background. The key of our method consists of recovering highly accurate camera poses from structure from motion based on a planar scene assumption, and modeling the structure on the planar background with implicitly smooth Bezier splines. We present a complete end-to-end pipeline able to produce meaningful dense 3D models from a simple space carving approach in near real-time.

Zhirui Wang, Laurent Kneip

Monocular Epipolar Constraint for Optical Flow Estimation

In this paper, the use of monocular epipolar geometry for the calculation of optical flow is investigated. We derive the formulation needed to use the epipolar constraint for the calculation of differential optical flow with the total variational model in a multi-resolution pyramid scheme. To this end, we minimize an objective function which contains the epipolar constraint together with a residual function based on different types of descriptors (brightness, HOG, CENSUS and MLDP). For the calculation of epipolar lines, the relevant fundamental matrices are calculated with the 7- and 8-point methods. Moreover, the SIFT and Lucas-Kanade methods are used to obtain matched features between two consecutive frames, from which the fundamental matrices can be calculated. The effects of using different combinations of feature matching methods, fundamental matrix calculations and descriptors are evaluated on the KITTI 2012 dataset.

Mahmoud A. Mohamed, M. Hossein Mirabdollah, Bärbel Mertsching
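
The epipolar constraint mentioned in the optical flow abstract above can be illustrated with a small sketch. Given a fundamental matrix F and a pixel x in the first frame, the matched pixel x + flow in the second frame should lie on the epipolar line F·x. The code below measures how far a candidate flow vector lands from that line. It illustrates only the geometric constraint, not the paper's variational objective; the function names and the example matrix `F_SIDEWAYS` (a pure sideways camera translation) are assumptions chosen for illustration.

```python
import math

# Hypothetical fundamental matrix for a camera translating purely along
# its x axis: matched points must stay on the same image row.
F_SIDEWAYS = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]


def epipolar_line(F, x):
    """Return line coefficients (a, b, c) of F @ [x, y, 1] in frame two."""
    xh = (x[0], x[1], 1.0)
    return tuple(sum(F[r][c] * xh[c] for c in range(3)) for r in range(3))


def epipolar_distance(F, x, flow):
    """Distance in pixels of the point x + flow from the epipolar line of x."""
    a, b, c = epipolar_line(F, x)
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    u2, v2 = x[0] + flow[0], x[1] + flow[1]
    return abs(a * u2 + b * v2 + c) / norm


if __name__ == "__main__":
    # Flow along the row satisfies the constraint; vertical drift violates it.
    print(epipolar_distance(F_SIDEWAYS, (10.0, 20.0), (5.0, 0.0)))
    print(epipolar_distance(F_SIDEWAYS, (10.0, 20.0), (5.0, 3.0)))
```

A penalty of this kind could be added to a data term so that flow vectors far from their epipolar lines become costly; how the paper weights or integrates it is not stated in the abstract.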

Visual Servo Tracking Control of Quadrotor with a Cable Suspended Load

To follow a moving target, visual servo control of a quadrotor with a cable-suspended load is proposed. A monocular camera with one rotational degree of freedom about the y axis is mounted on the quadrotor. The dynamic model of the whole system is presented in order to design the target tracking controller. An image-based visual servoing controller is proposed to provide position and yaw reference information for the quadrotor control when the quadrotor is too far away from the target. OpenTLD is used to provide the visual feedback information for visual servoing. When the quadrotor is close enough to the target, the AprilTag technology is applied to provide the pose information for position control. Based on the dynamic model and the reference information from the visual servoing or AprilTag, a quadrotor-load PD controller is presented. Finally, simulation results are presented to illustrate the effectiveness of the proposed approaches.

Erping Jia, Haoyao Chen, Yanjie Li, Yunjiang Lou, Yunhui Liu

Global Localization for Future Space Exploration Rovers

In the context of robotic space exploration, the problem of autonomous global, or absolute, localization remains unsolved. Current rovers require human-in-the-loop approaches to acquire global positioning. In this paper we address this problem by refining our previous work in a way that advances the performance of the system while making the procedure feasible for real implementation on rovers. A map of semantic landmarks (the Global Network, GN) is extracted prior to the mission over the area that the rover traverses and, during the exploration, a Local Network (LN) is built and matched to estimate the rover’s global location. We have optimized several aspects of the system: the motion estimation, the detection and the classification (by benchmarking several classifiers), and we have tested the system in a Mars-like scenario. To make the scenario realistic, a custom robotic platform was developed with operating features similar to those of ESA’s ExoMars. Our results indicate that the proposed system is able to perform global localization and converges relatively fast to an accurate solution.

Evangelos Boukas, Athanasios S. Polydoros, Gianfranco Visentin, Lazaros Nalpantidis, Antonios Gasteratos

Visual Navigation


Vision-Based Robot Path Planning with Deep Learning

In this paper, a new method based on a deep convolutional neural network (CNN) is proposed for robot path planning. Its aim is to transform the path planning problem into an environment classification task. First, road images are collected from cameras installed as required, and comprehensive features are extracted directly from the original images by the CNN. Then, according to the classification results, the moving direction of the robot is output. In this way, we build an end-to-end recognition system which maps raw data to the motion behavior of the robot. Furthermore, experiments are provided to demonstrate the performance of the proposed method on different roads.

Ping Wu, Yang Cao, Yuqing He, Decai Li

A Method for Improving Target Tracking Accuracy of Vehicle Radar

Compared with other vehicle sensors, vehicle radar is highly adaptable and can detect targets in harsh environments, which has made it popular in the field of vehicle sensing. However, large errors can occur in target tracking when targets on the road are detected by vehicle radar. To solve this problem, this paper presents a method which uses an Elman neural network to learn the historical trajectory of a target and then predict its next coordinates. Finally, an improved nearest neighbor method removes false alarms by combining the predicted coordinates with the measured coordinates, which completes the target tracking. Matlab simulation results show that the method can improve the target tracking accuracy of vehicle radar.

Yue Guan, Shanshan Feng, Haibo Huang, Liguo Chen

A Robust RGB-D Image-Based SLAM System

Visual SLAM is widely used in robotics and computer vision. Although there have been many excellent achievements over the past few decades, some challenges remain. 2D feature-based SLAM algorithms suffer from inaccurate or insufficient correspondences in textureless or frequently repeating regions. Furthermore, most SLAM systems cannot be used for long-term localization over a wide range of environments because of their heavy computation and memory burden. In this paper, we propose a robust RGB-D keyframe-based SLAM algorithm. The novelty of the proposed approach lies in using both 2D and 3D features for tracking, pose estimation and bundle adjustment. By using 2D and 3D features, the SLAM system can achieve high accuracy and robustness in some challenging environments. The experimental results on the TUM RGB-D dataset [1] and the ICL-NUIM dataset [2] verify the effectiveness of our algorithm.

Liangliang Pan, Jun Cheng, Wei Feng, Xiaopeng Ji

Robust Relocalization Based on Active Loop Closure for Real-Time Monocular SLAM

Remarkable performance has been achieved by state-of-the-art monocular Simultaneous Localization and Mapping (SLAM) algorithms. However, tracking failure is still a challenging problem during the monocular SLAM process, and it seems to be inevitable when carrying out long-term SLAM in large-scale environments. In this paper, we propose an active loop closure based relocalization system, which enables monocular SLAM to detect and recover from tracking failures automatically, even in previously unvisited areas where no keyframe exists. We test our system in extensive experiments on the popular KITTI dataset and on our own dataset, acquired by a hand-held camera in large-scale outdoor and small-scale indoor real-world environments where man-made shakes and interruptions were added. The experimental results show that, compared to other relocalization systems, our system achieves the shortest recovery time (within 5 ms) and the longest success distance (up to 46 m). Furthermore, our system is more robust than others, as it can handle different kinds of situations, i.e., tracking failures caused by blur, sudden motion and occlusion. Besides robots and autonomous vehicles, our system can also be employed in other applications, such as mobile phones and drones.

Xieyuanli Chen, Huimin Lu, Junhao Xiao, Hui Zhang, Pan Wang

On Scale Initialization in Non-overlapping Multi-perspective Visual Odometry

Multi-perspective camera systems pointing into all directions represent an increasingly interesting solution for visual localization and mapping. They combine the benefits of omni-directional measurements with a sufficient baseline for producing measurements in metric scale. However, the observability of metric scale suffers from degenerate cases if the cameras do not share any overlap in their field of view. This problem is of particular importance in many relevant practical applications, and it impacts most heavily on the difficulty of bootstrapping the structure-from-motion process. The present paper introduces a complete real-time pipeline for visual odometry with non-overlapping, multi-perspective camera systems, and in particular presents a solution to the scale initialization problem. We evaluate our method on both simulated and real data, thus proving robust initialization capacity as well as best-in-class performance regarding the overall motion estimation accuracy.

Yifu Wang, Laurent Kneip

Visual Inspection


A Vision Inspection System for the Defects of Resistance Spot Welding Based on Neural Network

The appearance of a spot weld reflects the quality of the welding to a large extent. In this study, we developed a vision inspection system which recognizes weld defects in electronic components based on a neural network. First, images of the weld are acquired by a color camera. Then, we extract 15 features from the welding images after they have been corrected and enhanced. Finally, we use 1800 training samples to train the neural network. The neural network classifier, which has 15 input nodes, 4 hidden nodes and 2 output nodes, achieves an accuracy of 95.82% on 407 testing samples.

Shaofeng Ye, Zhiye Guo, Peng Zheng, Lei Wang, Chun Lin
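
The 15-4-2 classifier topology reported in the spot welding abstract above is small enough to sketch in a few lines. The example below builds a network with that layer layout, counts its trainable parameters and runs one forward pass. The abstract does not state the activation functions, the initialization or the training procedure, so the sigmoid hidden layer, softmax output and random weights used here are assumptions, and all names are hypothetical.

```python
import math
import random

# Layer sizes taken from the abstract: 15 inputs, 4 hidden nodes, 2 outputs.
LAYER_SIZES = (15, 4, 2)


def init_network(sizes, seed=0):
    """Create (weights, biases) per layer with small random weights (assumed)."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(network, features):
    """Forward pass: sigmoid on hidden layers, softmax on the output (assumed)."""
    activations = list(features)
    for index, (weights, biases) in enumerate(network):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        if index < len(network) - 1:
            activations = [1.0 / (1.0 + math.exp(-v)) for v in z]
        else:
            peak = max(z)
            exps = [math.exp(v - peak) for v in z]
            total = sum(exps)
            activations = [e / total for e in exps]
    return activations


def parameter_count(sizes):
    """Number of weights plus biases for a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    net = init_network(LAYER_SIZES)
    print(parameter_count(LAYER_SIZES))
    print(forward(net, [0.1] * 15))
```

With only 74 parameters (15·4 + 4 weights and biases in the hidden layer, 4·2 + 2 in the output layer), such a model is easily trained on 1800 samples.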

Resistance Welding Spot Defect Detection with Convolutional Neural Networks

A convolutional neural network based method is proposed in this paper to classify images of resistance welding spots. The features of resistance welding spots are complex and diverse, which makes it difficult to separate good spots from bad ones with a hard threshold. Several types of convolutional neural networks with different depths and layer nodes are built to learn the features of welding spots. 10,000 labeled images are used for training and 300 images are used to test the network. As a result, we get a 99.01% accuracy on test images, which is 97.70% better than human inspection.

Zhiye Guo, Shaofeng Ye, Yiju Wang, Chun Lin

Semi-automatic Training of an Object Recognition System in Scene Camera Data Using Gaze Tracking and Accelerometers

Object detection and recognition algorithms usually require large, annotated training sets. The creation of such datasets requires expensive manual annotation. Eye tracking can help in the annotation procedure. Humans use vision constantly to explore the environment and plan motor actions, such as grasping an object. In this paper we investigate the possibility of semi-automatically training object recognition in scene camera data using eye tracking and accelerometers, learning from the natural hand-eye coordination of humans. Our approach involves three steps. First, sensor data are recorded using eye tracking glasses in combination with accelerometers and surface electromyography sensors of the kind usually applied when controlling prosthetic hands. Second, a set of patches is extracted automatically from the scene camera data while an object is being grasped. Third, a convolutional neural network is trained and tested using the extracted patches. Results show that the parameters of eye-hand coordination can be used to train an object recognition system semi-automatically. These can be exploited with proper sensors to fine-tune a convolutional neural network for object detection and recognition. This approach opens interesting options for training computer vision and multi-modal data integration systems and lays the foundations for future applications in robotics. In particular, this work targets the improvement of prosthetic hands by recognizing the objects that a person may wish to use. However, the approach can easily be generalized.

Matteo Cognolato, Mara Graziani, Francesca Giordaniello, Gianluca Saetta, Franco Bassetto, Peter Brugger, Barbara Caputo, Henning Müller, Manfredo Atzori

A Surface Defect Detection Based on Convolutional Neural Network

Surface defect detection is a common task in industrial production. Generally, a designer has to find a suitable feature to separate defects in the image. Such hand-designed features change with the surface properties, which leads to poor generalization to other datasets. In this paper, we first present a general detection method based on a convolutional neural network (CNN) to overcome this common shortcoming. The CNN performs image patch classification, and features are extracted automatically in this step. Then, we build a voting mechanism to perform the final classification and localization. The good performance obtained on both arbitrarily textured images and specially structured images shows that our algorithm is better than traditional case-by-case detection. Subsequently, we accelerate the algorithm in order to meet real-time requirements. Finally, multi-scale detection is proposed to obtain a more detailed localization boundary and a higher accuracy.

Xiaojun Wu, Kai Cao, Xiaodong Gu

A Computer Vision System to Localize and Classify Wastes on the Streets

Littering quantification is an important step for improving the cleanliness of cities. When human interpretation is too cumbersome or in some cases impossible, an objective index of cleanliness could reduce littering through awareness actions. In this paper, we present a fully automated computer vision application for littering quantification based on images taken of streets and sidewalks. We have employed a deep learning based framework to localize and classify different types of waste. Since no waste dataset was available, we built our own acquisition system, mounted on a vehicle. The collected images contain different types of waste and are annotated for training and benchmarking the developed system. Our results on real case scenarios show accurate detection of littering on varied backgrounds.

Mohammad Saeed Rad, Andreas von Kaenel, Andre Droux, Francois Tieche, Nabil Ouerhani, Hazım Kemal Ekenel, Jean-Philippe Thiran

Image Processing


Two Effective Algorithms for Color Image Denoising

We present two effective algorithms for removing impulse noise from color images. Our proposed algorithms take a two-step approach: in the first step, noise color pixel candidates are identified by an impulse detector, and in the second step, only those identified noise candidates in the image are restored by using a modified weighted vector median filter. Extensive experiments indicate that our proposed algorithms have good performance, and are more effective than most of the existing algorithms in removing impulse noise from color images.

Jian-jun Zhang, Jian-li Zhang, Meng Gao
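
The two-step scheme in the color image denoising abstract above (detect impulse candidates, then restore only those pixels with a vector median type filter) can be sketched as follows. The abstract specifies neither the impulse detector nor the modification and weighting of the filter, so this example uses the plain, unweighted vector median with an L1 distance and takes the detector's decision as a boolean input. These choices and all names are assumptions made for illustration.

```python
def vector_median(pixels):
    """Return the pixel whose summed L1 distance to all others is smallest.

    This is the classic, unweighted vector median; the paper's modified
    weighted variant is not specified in the abstract.
    """
    def l1(p, q):
        return sum(abs(a - b) for a, b in zip(p, q))

    return min(pixels, key=lambda p: sum(l1(p, q) for q in pixels))


def restore_pixel(window, is_noisy):
    """Replace the window's center pixel only if it was flagged as noisy."""
    center = window[len(window) // 2]
    return vector_median(window) if is_noisy else center


if __name__ == "__main__":
    # 3x3 window of RGB pixels, flattened row by row, with an impulse at
    # the center position.
    window = [(100, 100, 100)] * 4 + [(255, 0, 0)] + [(102, 98, 100)] * 4
    print(restore_pixel(window, is_noisy=True))
    print(restore_pixel(window, is_noisy=False))
```

Because the output is always one of the input pixels, the vector median does not introduce new colors, which is the usual motivation for preferring it over per-channel median filtering.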

An Image Mosaic Method Based on Deviation Splitting and Transferring for Vehicle Panoramic System

A method for look-around image mosaicking is proposed in order to solve the problem that dislocation distortion still exists in the overlap regions of panoramic systems. First, the principle of the panoramic system and the difficulties in constructing it are analyzed. On this basis, and with reference to the common panoramic image stitching method, the causes of poor image stitching are analyzed. Afterwards, in order to reduce distortion in the overlap region, a panoramic image mosaic method is proposed which divides the left and right images so as to transfer and split the original stitching deviation. In addition, a detailed implementation plan is given. The proposed stitching method is implemented in MATLAB, and the experiment shows that the dislocation distortion is smaller than with the conventional method. The proposed panoramic image mosaic method yields small dislocation and no blind areas, so it can satisfy the requirements of human vision and contribute to the realization of panoramic systems.

Shanshan Feng, Ting Wang, Haibo Huang, Liguo Chen

Edge Detection Using Convolutional Neural Networks for Nematode Development and Adaptation Analysis

The Antarctic nematode Plectus murrayi is an excellent model organism for the study of stress and molecular mechanisms. Biologists analyze its development and adaptation by measuring the body length and volume. This work proposes an edge detection algorithm to automate this labor-intensive task. Traditional edge detection techniques use predefined filters to calculate the edge strength and apply a threshold to it to identify edge pixels. These classic edge detection techniques work independently of the image data, and their results are sometimes inconsistent when edge contrast varies. Convolutional Neural Networks (CNNs) are regarded as powerful visual models that yield hierarchies of features learned from image data, and they perform well for edge detection. Most CNN-based edge detection methods rely on classification networks to determine whether an edge point exists at the center of a small image patch. This patch-by-patch classification approach is slow and inconsistent. In this paper, we propose an efficient CNN-based regression network that is able to produce accurate edge detection results. This network learns a direct end-to-end mapping between the original image and the desired edge image. This image-to-edge mapping is represented as a CNN that takes the original image as the input and outputs its edge map. The feature-based mapping rules of the network are learned directly from the training images and their accompanying ground truth. Experimental results show that this architecture achieves accurate edge detection and is faster than other CNN-based methods.

Yao Chou, Dah Jye Lee, Dong Zhang

Dynamic Environments Localization via Dimensions Reduction of Deep Learning Features

Localizing a robot autonomously, quickly and accurately in dynamic environments is a primary problem for reliable robot navigation. Monocular visual localization combined with deep learning has achieved impressive results. However, the features extracted by deep learning have very high dimensionality, and the matching algorithm is complex. Reducing the dimensionality while keeping the localization precise is one of the difficulties. This paper presents a novel approach for robot localization by training in large-scale dynamic environments. We extract features from AlexNet and reduce their dimensionality with IPCA. In addition, we reduce ambiguities by applying a kernel method, normalization and morphological processing to the matching matrix. Finally, we detect the best matching sequence online in dynamic environments across seasons. Our localization algorithm can localize robots quickly and with high accuracy.

Hui Zhang, Xiangwei Wang, Xiaoguo Du, Ming Liu, Qijun Chen

Human Robot Interaction


A Gesture Recognition Method Based on Binocular Vision System

This paper demonstrates a gesture recognition approach based on a binocular camera. The binocular vision system handles the stereo imaging problem using a disparity map. After the cameras are calibrated, the approach uses a skin color model and depth information to separate the hand from the environment in the image. The features of the gestures are then extracted by a feature extraction algorithm. These gestures and their features constitute a set of training examples for machine learning. A Support Vector Machine (SVM), which is a supervised learning model, is used to classify the gestures, which are labeled with their meaning, such as digit gestures. In both the training and classification processes, we use the same feature extraction algorithm to process the gesture image, so that the SVM can recognize the meaning of a gesture. The gesture recognition method presented in this paper achieves a high accuracy in recognizing number gestures.

Liqian Feng, Sheng Bi, Min Dong, Yunda Liu

A Body Emotion-Based Human-Robot Interaction

In order to achieve reasonable and natural interaction when facing vague human actions, a body emotion-based human-robot interaction (BEHRI) algorithm was developed in this paper. Laban movement analysis and fuzzy logic inference were used to extract the movement emotion and torso pose emotion. A finite state machine model was constructed to describe the paradigm of the robot emotion, and then the interactive strategy was designed to generate suitable interactive behaviors. The algorithm was evaluated on UTD-MHAD, and the overall system was tested via questionnaire. The experimental results indicated that the proposed BEHRI algorithm was able to analyze the body emotion precisely, and the interactive behaviors were accessible and satisfying. BEHRI was shown to have good application potential.

Tehao Zhu, Qunfei Zhao, Jing Xiong

Robot’s Workspace Enhancement with Dynamic Human Presence for Socially-Aware Navigation

The incorporation of service robots in human-populated environments calls for the adoption of cruise strategies that allow robots to move in a natural, secure and ordinary manner among their cohabitants. Therefore, robots should firstly apprehend their space similarly to people and, secondly, should adopt human motion anticipation strategies in their planning mechanism. The paper at hand introduces a closed-loop, human-oriented robot navigation strategy in which multimodal human detection and tracking methods are deployed on board a moving robot to predict human motion intention in the shared workspace. The human-occupied space is probabilistically constrained following the proxemics theory. The impact of human presence in the commonly shared space is imprinted on the robot’s navigation behaviour after undergoing a social filtering step based on the inferred walking pattern. The proposed method has been integrated with a robotic platform and extensively evaluated in terms of socially acceptable behaviour in real-life experiments, exhibiting increased navigation capacity in human-populated environments.

Ioannis Kostavelis, Andreas Kargakos, Dimitrios Giakoumis, Dimitrios Tzovaras

Speaker Identification System Based on Lip-Motion Feature

Traditionally, lip features have been used in speech recognition, but lately they have also been found useful as a new biometric identifier in computer vision applications. Firstly, we locate the lips according to the geometric distribution of human faces. Then, we propose an algorithm for extracting representative frames based on gray-level changes during speech. The scale-invariant feature transform (SIFT) feature is introduced into the speaker identification system. Based on the SIFT algorithm, we extract lip features including texture and motion information, which describe the lip deformation process during speech well. Finally, this paper presents a simple classification algorithm which compares the ratio of the eigenvalue to the reference value. Experimental results show that, compared with the local binary pattern (LBP) feature and the histogram of oriented gradients (HOG) feature, the improved feature extraction and classification algorithm works effectively and achieves a satisfactory performance.

Xinjun Ma, Chenchen Wu, Yuanyuan Li, Qianyuan Zhong

Integrating Stereo Vision with a CNN Tracker for a Person-Following Robot

In this paper, we introduce a stereo vision based CNN tracker for a person-following robot. The tracker is able to track a person in real-time using an online convolutional neural network. Our approach enables the robot to follow a target under challenging situations such as occlusions, appearance changes, pose changes, crouching, illumination changes or people wearing the same clothes in different environments. The robot follows the target around corners even when it is momentarily unseen by estimating and replicating the local path of the target. We build an extensive dataset for person-following robots under challenging situations. We evaluate the proposed system quantitatively by comparing our tracking approach with existing real-time tracking algorithms.

Bao Xin Chen, Raghavender Sahdev, John K. Tsotsos

Recognition of Human Continuous Action with 3D CNN

With the boom in service robots, human continuous action recognition has become an indispensable research topic. In this paper, we propose a continuous action recognition method based on a multi-channel 3D CNN that extracts multiple features, which are then classified with KNN. First, we use fragmentary actions, which can be identified while the action is still in progress, as training samples. The training samples are then processed by gray-scale conversion, an improved L-K optical flow and a Gabor filter in order to extract diverse characteristics using a priori knowledge. Next, the 3D CNN is constructed to process the multi-channel features, which are formed into 128-dimensional feature maps. Finally, we use KNN to classify those samples. We find that the identification of fragmentary actions within continuous action shows good robustness. The proposed method is verified on HMDB-51 and UCF-101 to be more accurate than Gaussian Bayes or a single 3D CNN in action recognition.

Gang Yu, Ting Li

Stereo System


Multi-view Shape from Shading Constrained by Stereo Image Analysis

In this paper we present the combination of Shape from Shading and stereo vision based on a fully integrated approach. The surface gradients of two camera views of an object are employed to refine an initial disparity map subject to the constraint of integrability of the resulting surface. The gradient field of the object’s surface is computed using Photometric Stereo and analytical reflectance models with spatially varying parameters. We evaluate the proposed algorithm on three data sets including a metallic object and objects with depth discontinuities and small details. We achieve compelling results on all data sets including the cast iron where our method is less noise-sensitive than the reference 3D scanner. However, since the scanner exhibits high-frequency noise, we use its low-passed depth data as reference. The mean error of all data sets is 1 mm and below with a low-cost acquisition setup, consisting of two cameras and 18 light sources only. Furthermore, a new method to calibrate the lighting of a multi-view Photometric Stereo setup is briefly introduced.

Malte Lenoch, Pia Biebrach, Arne Grumpe, Christian Wöhler

A Wi-Fi Indoor Positioning Modeling Based on Location Fingerprint and Cluster Analysis

Wi-Fi indoor positioning modeling based on location fingerprints and cluster analysis is studied. Specific locations are calculated using the RSSI nearest neighbor estimation method, and the positioning accuracies of different terminals are compared. The RSSI signal intensity is used to cluster the fingerprint database, and the noise signals in the database are filtered out. Three databases are established: a traditional location fingerprint database, a probability estimation fingerprint database and a fingerprint database built with an improved clustering algorithm. Comparing the positioning errors of the test data across the three fingerprint databases shows that the accuracy of indoor positioning is improved. Finally, the Wi-Fi data receiving module, the positioning server module and the positioning display module of the positioning terminal are established, and the positioning app is tested in a real environment.

Zhili Long, Xuanyu Men, Jin Niu, Xing Zhou, Kuanhong Ma
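
A minimal sketch of RSSI nearest neighbor estimation against a fingerprint database, as mentioned in the abstract above. The database values, the access point count, the function name `locate` and the averaging of the k nearest reference points are all assumptions made for illustration; the paper's clustering and noise filtering steps are not reproduced.

```python
import numpy as np

def locate(fingerprints, positions, rssi, k=1):
    """Estimate a position as the mean of the k reference points closest in RSSI space."""
    dist = np.linalg.norm(fingerprints - rssi, axis=1)
    nearest = np.argsort(dist)[:k]
    return positions[nearest].mean(axis=0)

# Hypothetical database: 4 reference points, RSSI (dBm) from 3 access points
fingerprints = np.array([
    [-40.0, -70.0, -80.0],
    [-70.0, -40.0, -80.0],
    [-80.0, -70.0, -40.0],
    [-60.0, -60.0, -60.0],
])
# Known (x, y) coordinates in metres of each reference point
positions = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])

measurement = np.array([-42.0, -69.0, -81.0])
estimate = locate(fingerprints, positions, measurement)
estimate_k2 = locate(fingerprints, positions, measurement, k=2)
print(estimate, estimate_k2)
```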

A Real-Time and Energy-Efficient Embedded System for Intelligent ADAS with RNN-Based Deep Risk Prediction using Stereo Camera

The advanced driver assistance system (ADAS) has been actively researched to enable adaptive cruise control and collision avoidance. However, conventional ADAS is not capable of more advanced functions due to the absence of intelligent decision-making algorithms such as behavior analysis. Moreover, most algorithms in automotive applications are accelerated by GPUs, whose power consumption exceeds the power budget for practical usage. In this paper, we present a deep risk prediction algorithm, which predicts risky objects prior to collision by behavior prediction. Also, a real-time embedded system with high energy efficiency is proposed to provide practical application of our algorithm to the intelligent ADAS, consuming only ~1 W on average. For validation, we build the risky urban scene stereo (RUSS) database, which includes 50 stereo video sequences captured under various risky road situations. The system is tested with various databases including RUSS, and it achieves a maximum throughput of 30 frames/s on 720p stereo images with 98.1% risk prediction accuracy.

Kyuho Lee, Gyeongmin Choe, Kyeongryeol Bong, Changhyeon Kim, In So Kweon, Hoi-Jun Yoo

Open-Source Development of a Low-Cost Stereo-Endoscopy System for Natural Orifice Transluminal Endoscopic Surgery

As a minimally invasive procedure, Natural Orifice Transluminal Endoscopic Surgery (NOTES) offers many significant benefits over traditional open surgery, including reduced risks of post-operative complication and a faster recovery rate. However, one major challenge commonly faced when performing such procedures is the lack of depth perception provided by standard monocular endoscopes, which can in turn limit the effectiveness of such endoscopic surgery. To overcome this lack of depth perception during endoscopic imaging, stereoscopic vision can be introduced into current endoscopy technology to assist surgeons in performing safer and faster operations with better depth judgement. While a vast range of highly advanced stereo-endoscopy systems is already commercially available, practical adoption of these systems remains minimal as a result of their high costs. This paper presents our approach for integrating affordability with functionality, through the development of a simple, low-cost stereo-endoscopy system. Constructed using common off-the-shelf materials, the system runs in real time to present stereoscopic images acquired from the stereo-endoscope cameras to the surgeon’s eyes simultaneously, thereby equipping the surgeon with binocular vision for depth perception during endoscopic surgery.

Jia Xin Koh, Hongliang Ren

Image Retrieval


Self-adaptive Feature Fusion Method for Improving LBP for Face Identification

In a recent paper, a multi-scale information fusion method was presented to improve LBP for face identification. However, the additional parameters employed in that method cannot be automatically optimised. In this paper, a novel self-adaptive feature fusion method is proposed which extends the mLBP method by removing the need to optimise these parameters. Our method involves four steps. Firstly, a large number of initial features are generated. Then, we propose a Fisher criterion-based method for evaluating the discriminative capabilities of different feature groups. After that, we propose a model based on prism volume for selecting the optimal parameter combination. Finally, the resulting multi-scale features are fused by an extended Euclidean distance fusion. Extensive experiments on two face databases show that the proposed self-adaptive feature fusion method can find parameters that are optimal for the data in question and can produce excellent classification performance.

Xin Wei, Hui Wang, Huan Wan, Bryan Scotney

Bridging Between Computer and Robot Vision Through Data Augmentation: A Case Study on Object Recognition

Despite the impressive progress brought by deep networks in visual object recognition, robot vision is still far from being a solved problem. The most successful convolutional architectures are developed starting from ImageNet, a large-scale collection of images of object categories downloaded from the Web. Such images are very different from the situated and embodied visual experience of robots deployed in unconstrained settings. To reduce the gap between these two visual experiences, this paper proposes a simple yet effective data augmentation layer that zooms in on the object of interest and simulates the object detection outcome of a robot vision system. The layer, which can be used with any convolutional deep architecture, yields an increase in object recognition performance of up to 7% in experiments performed on three different benchmark databases. An implementation of our robot data augmentation layer has been made publicly available.

Antonio D’Innocente, Fabio Maria Carlucci, Mirco Colosi, Barbara Caputo

Design and Optimization of the Model for Traffic Signs Classification Based on Convolutional Neural Networks

Recently, convolutional neural networks (CNNs) have demonstrated state-of-the-art performance in computer vision tasks such as classification, recognition and detection. In this paper, a traffic sign classification system based on CNNs is proposed. A convolutional network usually has a large number of parameters, which need millions of data samples and a great deal of time to train. To solve this problem, the strategy of transfer learning is used in this paper. In addition, the chosen model is further improved by changing some fully connected layers into convolutional connections, because the weight sharing of convolutional layers reduces the number of parameters contained in a network. These convolutional kernels are also decomposed into multiple layers of smaller convolutional kernels to obtain better performance. Finally, the performance of the final optimized network is compared with that of unoptimized networks. Experimental results demonstrate that the final optimized network presents the best performance.

Jiarong Song, Zhong Yang, Tianyi Zhang, Jiaming Han

Selection and Execution of Simple Actions via Visual Attention and Direct Parameter Specification

Can early visual attention processes facilitate the selection and execution of simple robotic actions? We believe that this is the case. Following the selection–for–action agenda known from human attention, we show that central perceptual processing can be avoided or at least relieved from managing simple motor processes. In an attention–classification–action cycle, salient pre-attentional structures are used to provide features to a set of classifiers. Their action proposals are coordinated, parametrized (via direct parameter specification), and executed. We evaluate the system with a simulated mobile robot.

Jan Tünnermann, Steffen Grüne, Bärbel Mertsching

Visual Detection


Fully Convolutional Networks for Surface Defect Inspection in Industrial Environment

In this paper, we propose a reusable and high-efficiency two-stage deep learning based method for surface defect inspection in industrial environments. Aiming to achieve a trade-off between efficiency and accuracy, our method makes a novel combination of a segmentation stage (stage1) and a detection stage (stage2), each of which consists of a fully convolutional network (FCN). In the segmentation stage we use a lightweight FCN to make a spatially dense pixel-wise prediction that infers the defect area coarsely and quickly. The predicted defect areas act as the initialization of stage2, guiding the detection process to refine the segmentation results. We also use an unusual training strategy: training with patches cropped from the images. Such a strategy has great utility in industrial inspection, where training data may be scarce. We validate our findings by analyzing the performance obtained on the DAGM 2007 dataset.

Zhiyang Yu, Xiaojun Wu, Xiaodong Gu

The New Detection Algorithm for an Obstacle’s Information in Low Speed Vehicles

Moving Object Detection (MOD) methods have typically relied on detecting motion regions in the image, but collision avoidance in a low speed vehicle requires detecting the position and the size of obstacles in a warning area. Therefore, this paper proposes a new obstacle detection algorithm. First, the proposed algorithm detects the motion region using the MHI (Motion History Image) algorithm, which is based on motion information between image frames. After high-speed, real-time image processing of a moving obstacle, a warning logic system receives the position and the size of the obstacle nearest to the car. Finally, it determines whether or not to send a warning signal to the control part. The proposed algorithm recognizes both fixed and moving obstacles such as cars and buildings using 4-channel AVM camera images and has a fast calculation speed. In simulations with the image DBs and the simulation tool, we obtained an average detection rate of 80.07%.

Sinjae Lee, Seok-Cheol Kee
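
The Motion History Image step named in the abstract above can be sketched as follows. This is a generic textbook-style MHI update driven by simple frame differencing, not the authors' implementation; the function name `update_mhi`, the duration `tau`, the difference threshold and the toy frames are assumptions. The bounding box at the end illustrates how a position and size could be read from the motion region.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=5, threshold=30):
    """Set moving pixels to tau and let all other pixels decay by one, floored at zero."""
    diff = np.abs(frame.astype(np.int32) - prev_frame.astype(np.int32))
    moving = diff > threshold
    decayed = np.maximum(mhi - 1, 0)
    return np.where(moving, tau, decayed)

# Toy sequence: one bright pixel moving right by one column per frame
frames = np.zeros((4, 1, 6), dtype=np.uint8)
for t in range(4):
    frames[t, 0, t] = 255

mhi = np.zeros((1, 6), dtype=np.int32)
for t in range(1, 4):
    mhi = update_mhi(mhi, frames[t - 1], frames[t])

# Bounding box (x, y, width, height) of the region with recent motion
ys, xs = np.nonzero(mhi)
bbox = (int(xs.min()), int(ys.min()),
        int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))
print(mhi, bbox)
```

Larger MHI values mark more recent motion, so the gradient of the image also hints at the direction of movement.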

Trajectory-Pooled Deep Convolutional Networks for Violence Detection in Videos

Violence detection in videos is of great importance in many applications, ranging from teenager protection to online media filtering and searching to surveillance systems. Typical methods mostly rely on hand-crafted features, which may lack enough discriminative capacity for the specific task of violent action recognition. Inspired by the good performance of deep models for human action recognition, we propose a novel method for detecting human violent behaviour in videos by integrating trajectories and deep convolutional neural networks, which takes advantage of both hand-crafted features [21] and deep-learned features [23]. To evaluate this method, we carry out experiments on two different violence datasets: the Hockey Fights dataset and the Crowd Violence dataset. The results demonstrate the advantage of our method over state-of-the-art methods on these datasets.

Zihan Meng, Jiabin Yuan, Zhen Li

Hybrid Distance Metric Learning for Real-Time Pedestrian Detection and Re-identification

Cross-camera pedestrian re-identification (re-ID) is of paramount importance for surveillance tasks. Although considerable progress has been made in improving re-ID accuracy, real-time pedestrian detection and re-ID remains a challenging problem. In this work, we first propose enhanced aggregated channel features (ACF+), based on the ACF pedestrian detector [1], for real-time pedestrian detection and re-ID. Second, to further improve the representation power of the combined multiple channel features, we propose a novel hybrid distance metric learning method. Extensive experiments have been carried out on two public datasets, VIPeR and PRID2011. The experimental results show that our proposed method can achieve state-of-the-art accuracy while being computationally efficient for real-time applications. The proposed hybrid distance metric learning is general and can thus be applied to any metric learning approach.

Xinyu Huang, Jiaolong Xu, Gang Guo, Ergong Zheng

RGB-D Saliency Detection by Multi-stream Late Fusion Network

In this paper we aim to address the problem of saliency detection on RGB-D image pairs based on a multi-stream late fusion network. With the prevalence of RGB-D sensors, leveraging additional depth information to facilitate the saliency detection task has drawn increasing attention. However, the key challenge of how to fuse RGB data and depth data in an optimum manner is still under-studied. Conventional wisdom simply regards depth information as an undifferentiated channel and models RGB-D saliency detection by using existing RGB saliency detection models directly. However, this paradigm is incapable of capturing specific representations in the depth modality and is also powerless in fusing multi-modal information. In this paper, we address this problem by proposing a simple yet principled late fusion strategy carried out in conjunction with convolutional neural networks (CNNs). The proposed network is able to learn discriminant representations and explore the complementarity between RGB and depth modalities. Comprehensive experiments on two public datasets demonstrate the benefits of the proposed RGB-D saliency detection network.

Hao Chen, Youfu Li, Dan Su

Visual Recognition


A Cloud-Based Visual SLAM Framework for Low-Cost Agents

Constrained by on-board resources, most low-cost robots cannot autonomously navigate in unknown environments. In recent years, cloud computing and storage have been developing rapidly, making it possible to offload parts of visual SLAM processing to a server. However, most cloud-based vSLAM frameworks are not suitable or fully tested for applications with poorly equipped agents. In this paper, we describe an online localization service on a novel cloud-based framework, where the expensive map storage and global feature matching are provided as a service to agents. It enables a scenario in which only sensor data collection is executed on agents, while the cloud helps the agents to localize and navigate. Finally, we evaluate the localization service quantitatively and qualitatively. The results indicate that the proposed cloud framework can meet the requirements of real-time applications.

Jianhao Jiao, Peng Yun, Ming Liu

Vision System for Robotized Weed Recognition in Crops and Grasslands

In this paper, we introduce a novel vision system for robotized weed control on various weed recognition tasks. Initially, we present a robotic platform and its camera setup, which can be used in crop-based and grassland-based weed control tasks. Then, we develop our proposed vision system for robotic application, using a weed recognition framework. The resulting system derives from a sequence of state-of-the-art processes including image preprocessing, feature extraction and detection, codebook learning, feature encoding, image representation and classification. Our novel system is optimized using a dataset which represents a crop-based weed control problem of thistles in a sugar beet plantation. Moreover, we apply the proposed vision system to a grassland-based weed recognition problem, the control of the Broad-leaved Dock (Rumex obtusifolius L.). It is experimentally shown that our proposed vision system yields state-of-the-art recognition on both examined datasets, while presenting advantages in terms of autonomy and precision over competing methodologies.

Tsampikos Kounalakis, Georgios A. Triantafyllidis, Lazaros Nalpantidis

Pedestrian Detection Over 100 fps with C4 Algorithm

In this paper a novel pedestrian detection algorithm on GPU is presented, which takes advantage of census transform histogram (CENTRIST) features rather than the common HOG feature. The proposed algorithm uses the NVIDIA CUDA framework and can process VGA images at a speed of 108 fps on a low-cost notebook computer with a GPU, without using any other auxiliary technique. Our implementation enables a 17-fold speedup over the original CENTRIST detector without compromising accuracy.

Fei Wang, Caifang Lin, Qian Huang
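
The census transform histogram feature named in the abstract above can be illustrated with a small CPU sketch. It is not the paper's CUDA implementation. The bit convention used here (a bit is 1 when the centre pixel is greater than or equal to the neighbour, with neighbours read left to right, top to bottom) is one common choice and is an assumption, as are the function names.

```python
import numpy as np

def census_transform(img):
    """Return the 8-bit census code of every interior pixel of a 2D image."""
    img = img.astype(np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    out = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Neighbour offsets in reading order: top row, middle row, bottom row
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for dy, dx in offsets:
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Shift previous bits left and append the new comparison bit
        out = (out << 1) | (center >= neighbour)
    return out.astype(np.uint8)

def centrist(img):
    """Histogram of census codes over the image (256 bins)."""
    return np.bincount(census_transform(img).ravel(), minlength=256)

# Centre pixel darker than the top row, brighter than the rest
patch = np.array([[9, 9, 9],
                  [1, 5, 1],
                  [1, 1, 1]])
codes = census_transform(patch)
hist = centrist(patch)
print(codes, hist[31])
```

Each pixel's code depends only on its 3x3 neighbourhood, so the transform parallelises naturally, which is consistent with the GPU approach the abstract describes.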

Fire and Smoke Dynamic Textures Characterization by Applying Periodicity Index Based on Motion Features

Dynamic texture has been described as an image sequence that demonstrates continuous movement of pixel intensity change patterns over time. We consider the motion features of smoke and fire dynamic textures, which are important for fire calamity surveillance systems to analyze the fire situation. We propose a method to understand the motion of intensity change. The objective is not only classification but also characterization of the motion patterns of fire and smoke dynamic textures. The radius of a motion vector describes how fast the intensity changes. A motion coherence index has been developed to assess the motion coherency between an observed vector and its neighborhood. We implement strategic motion coherence analysis to determine the motion coherence index of the motion vector field in each video frame. In practice, both the covariance stationarity of the average radius and the motion coherence index are used efficiently to investigate fire and smoke characteristics by applying a periodicity index for analysis.

Kanoksak Wattanachote, Zehang Lin, Mingchao Jiang, Liuwu Li, Gongliang Wang, Wenyin Liu

A Novel Real-Time Gesture Recognition Algorithm for Human-Robot Interaction on the UAV

This paper presents a new real-time gesture recognition technology for Unmanned Aerial Vehicle (UAV) control. Unlike traditional robot control systems, which use a pre-defined program to control the UAV, this system allows users to design commands on-line and control the UAV with different gestures to complete sudden, urgent tasks. The system is composed of three parts: an on-line personal feature training system, a gesture recognition system and a UAV motion control system. In the first part, we collect and analyze user gestures, extract feature data and train the recognition program in real time. In the second part, a multi-feature hierarchical filtering algorithm is applied to guarantee both the accuracy and the real-time processing speed of our gesture recognition method. In the last part, the gesture recognition result is sent to the UAV through a data transmitter based on the Mavlink protocol to achieve on-line human control of the UAV. The effectiveness and efficiency of our method have been confirmed through two extensive experiments.

Bo Chen, Chunsheng Hua, Jianda Han, Yuqing He

Unsupervised Local Linear Preserving Manifold Reduction with Uncertainty Pretraining for Image Recognition

Manifold learning is an efficient approach to dimensionality reduction. In real applications, however, the difficulty lies in learning the parameters with limited supervised samples. Our proposed algorithm focuses on a sparse representation of the local linear preserving manifold dimensionality reduction algorithm and can solve the problem of unsupervised clustering. The manifold preserving methods make use of labeled data in manifold reduction except for the final classifier, which produces an unsupervised manifold reduction algorithm. Another solution for limited data is a novel pretraining scheme that uses Bayesian nets to construct the initial parameters for manifold learning and is also robust to uncertain perturbations of the data. We then validate it in experiments and finally apply the algorithm to real-world data. The algorithm performs better on noisy input with limited labeled data.

Qianwen Yang, Fuchun Sun

System Design


Visual Tracking and Servoing System for Experiment of Optogenetic Control of Brain Activity

To study the wireless optogenetic control of neural activity using fully implantable devices, we designed experiments in which a laser emits 980-nm light onto the brain of an experimental mouse, where upconversion nanoparticles that work as transducers converting near-infrared energy to visible light are implanted, while we observe the mouse’s activity and record its trajectory. To aid and speed up the experiment, we propose and implement an automatic visual tracking and servoing system. Usually, a PTZ unit is driven for active surveillance tracking, which aims to keep the object in the middle of the field of view. In this work, we use a PTZ unit to cast the laser beam on the target object, with the actuator (PTZ) and the sensor (camera) decoupled so that they can be installed arbitrarily. We also present an automatic parameter calibration method and the mathematical modeling for this system to maintain high accuracy.

Qinghai Liao, Ming Liu, Wenchong Zhang, Peng Shi

Design and Implementation of the Three-Dimensional Observation System for Adult Zebrafish

In recent years, researchers have paid more attention to the neurobehavioral study of the adult zebrafish. A convenient observation system is very helpful for zebrafish experiments. However, the existing commercial observation systems are very expensive and the homemade systems are not flexible enough for different experiments. In this paper, we provide an observation system that offers uniform illumination, multiple functions, better flexibility and lower cost. Firstly, we designed a lighting system that achieves uniform illumination through optical simulation and polynomial fitting. Secondly, we designed the observation system, which includes tank modules and a fixed bracket, so that the system can meet the requirements of many experiments. Finally, we chose white LEDs as the light source and aluminum profiles to implement the system, which makes the system cheap and lightweight. The observation system we designed achieves good results in adult zebrafish experiments.

Teng Li, Xuefeng Wang, Mingzhu Sun, Xin Zhao

A Novel Visual Detecting and Positioning Method for Screw Holes

A new visual detecting and positioning method is proposed to solve the positioning problem of screw holes in dark-colored workpieces. Firstly, a red LED lighting system was designed and built to make the screw holes distinct even against the dark background of the workpieces. Then an improved Hough transform was applied to detect screw holes, and a template matching method based on the features of the gradation histogram was used for precise positioning of the screw holes. After that, a sub-pixel positioning method was adopted to locate the screw holes with sub-pixel accuracy. Further, the image coordinates of the screw holes were transformed into the world coordinate system according to the camera model. Finally, the experimental results demonstrated the effectiveness, accuracy and robustness of the proposed method.

Yuntao Wang, Guoyuan Liang, Sheng Huang, Can Wang, Xinyu Wu

Learning the Floor Type for Automated Detection of Dirt Spots for Robotic Floor Cleaning Using Gaussian Mixture Models

While small floor cleaning robots cover area rather than detect actual dirt, larger floor cleaning robots in commercial settings need to actively detect and clean dirt spots. Floor types that have a single colour or simple texture could be tackled with an approach based on a fixed pattern. However, this restricts the use of the robots considerably. In terms of ease of use it is desirable to automatically adapt to a new floor type while still detecting dirt spots. We approach this problem as a one-class classification problem and exploit the capability of the Gaussian Mixture Model (GMM) for learning the floor pattern. The advantage of the method is that it operates in an unsupervised way, which allows the robot to adapt to new floor types while moving. An extensive evaluation shows that our method detects dirt spots on different floor types and that it outperforms state-of-the-art approaches, especially for floor types with a high-frequency texture.

Andreas Grünauer, Georg Halmetschlager-Funek, Johann Prankl, Markus Vincze

Wind Disturbance Rejection in Position Control of Unmanned Helicopter by Nonlinear Damping

This paper presents a new design of a Lyapunov-redesigned control system for large horizontal wind disturbance rejection on a small-scale unmanned autonomous helicopter (UAH). In this paper, the wind disturbance can no longer be treated as a small perturbation around the equilibrium state. Instead, wind disturbances are considered as force/moment disturbances in the state equation. The force/moment caused by the wind can be estimated from experimental data obtained in a wind tunnel. The whole control system consists of a nominal system controller and a wind disturbance controller. The nominal system controller is designed with a back-stepping algorithm, while the wind disturbance controller is designed with a nonlinear damping algorithm. The nonlinear damping is introduced to ensure that the whole system has a uniformly bounded solution under uncertain large horizontal wind disturbances. Both longitudinal and lateral wind disturbances are considered in the simulation. The simulation results show that the wind disturbances are well rejected and that the proposed method can be effective for the position control of a UAH in windy environments.

Xiaorui Zhu, Lu Yin, Fucheng Deng

3D Vision/Fusion


Calibration of a Structured Light Measurement System Using Binary Shape Coding

In this paper, a calibration method for a structured light system is proposed, which uses pseudo-random coding theory to generate a binary shape-coded pattern. In this method, the checkerboard and the binary shape-coded pattern are captured by the camera of the structured light system. Based on the geometric features of the binary shape-coded pattern, a feature point detector is designed. Then, the feature points in the binary geometric image are extracted, and the topological structure is constructed. After that, the pattern elements are extracted with affine transformation theory and a bilinear interpolation algorithm. The identification of pattern elements is modeled as a supervised classification problem, and a convolutional neural network trained on a large number of collected samples is adopted to recognize the pattern elements. Thus, the code-words of the feature points are confirmed. According to the projective transformation principle, the correspondence between the camera image plane and the projector image plane is determined. Then, the corner points in the camera image plane are transformed into the projector image plane using this correspondence. The camera and projector are then calibrated with Zhang’s calibration method, and the system calibration is achieved. The experimental results show that the calibration accuracy can reach about 0.2 pixels with the proposed method and that the quality of the reconstructed surface is high.

Hai Zeng, Suming Tang, Zhan Song, Feifei Gu, Ziyu Huang

An Automatic 3D Textured Model Building Method Using Stripe Structured Light System

This paper presents a novel textured model building method using a stripe structured light system. It is implemented by automatic registration of multiple point clouds obtained by the structured light system. Firstly, point clouds captured from different viewpoints are pairwise coarsely registered by feature matching of their corresponding RGB images. Secondly, we use an appropriate function to evaluate the quality of every pairwise coarse registration, and construct a pairwise coarse registration graph which uses point clouds as nodes and the evaluation function between them to weight the corresponding edges. Thirdly, an optimal registration tree is generated by finding the maximum weight spanning tree of the graph and selecting as root the node that minimizes the depth of the tree. Finally, global fine registration is performed by applying the ICP algorithm along the optimal registration tree. Median filtering in luminance space is also applied to the color of the integrated point model to adjust the RGB values. Experiments show that this approach can automatically build full models which are well registered and consistent in color for textured objects, even when those objects are not rich in geometrical information.

Hualie Jiang, Yuping Ye, Zhan Song, Suming Tang, Yuming Dong
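
The third step of the abstract above (a maximum weight spanning tree followed by a root choice that minimizes tree depth) can be sketched in a few lines. The edge weights stand in for the pairwise registration quality scores and are invented for illustration; Kruskal's algorithm and a breadth-first depth search are assumed here as one straightforward way to realise the step, and the function names are hypothetical.

```python
from collections import deque

def max_spanning_tree(n, edges):
    """Kruskal's algorithm on edges (weight, a, b), taking the heaviest edges first."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, a, b in sorted(edges, reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:            # edge joins two components, so keep it
            parent[ra] = rb
            tree.append((a, b))
    return tree

def best_root(n, tree):
    """Return the node whose greatest distance to any other node in the tree is smallest."""
    adj = {i: [] for i in range(n)}
    for a, b in tree:
        adj[a].append(b)
        adj[b].append(a)

    def depth_from(root):
        seen = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        return max(seen.values())

    return min(range(n), key=depth_from)

# Four point clouds; weights are made-up registration quality scores
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 2, 3), (0.2, 0, 3), (0.1, 0, 2)]
tree = max_spanning_tree(4, edges)
root = best_root(4, tree)
print(tree, root)
```

A shallow tree keeps the chain of pairwise transforms from the root to any point cloud short, which limits how far registration error can accumulate.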

An Efficient Method to Find a Triangle with the Least Sum of Distances from Its Vertices to the Covered Point

Depth sensors are used to acquire a scene from various viewpoints, with the resultant depth images integrated into a 3D model. Generally, due to surface reflectance properties, absorption, occlusions and accessibility limitations, certain areas of scenes are not sampled, leading to holes and introducing undesirable artifacts. An efficient algorithm for filling holes in organized depth images is therefore of high significance. Points far away from a covered point are usually of low probability in terms of spatial information, due to contamination by outliers and distortion. The paper presents an algorithm to find a triangle whose vertices are nearest to the covered point.

Guoyi Chi, KengLiang Loi, Pongsak Lasang

An On-Line Calibration Technique for General Infrared Camera

Infrared thermal imaging technology has been widely used in the industrial and military fields because of its strong anti-interference ability. One problem in infrared camera applications is the calibration process, especially for long-focal-length infrared cameras. In this paper, we propose an on-line calibration method for general infrared cameras, where the infrared camera is installed on a Pan-Tilt Unit (PTU). The major advantage of the proposed method is that no calibration board is needed. First, an infrared image matching algorithm uses the edge oriented histogram (EOH) descriptor to find correspondences between frames captured with the PTU set to various angles. Then we demonstrate the Pan-Tilt (PT) image matching and calibration algorithm, which is used to calculate the intrinsic matrix of the infrared camera. The experiments are carried out on infrared cameras with different wavelengths and focal lengths, and the results of an infrared calibration board and of our proposed method are compared. The experimental results show that the proposed method is robust and efficient. We also used the on-line calibration technique for long-distance UAV (Unmanned Aerial Vehicle) detection and localization.

Dianle Zhou, Xiangrong Zeng, Zhiwei Zhong, Yan Liu

