
2015 | Book

Computer Vision - ACCV 2014 Workshops

Singapore, Singapore, November 1-2, 2014, Revised Selected Papers, Part II


About this book

The three-volume set, consisting of LNCS 9008, 9009, and 9010, contains carefully reviewed and selected papers presented at 15 workshops held in conjunction with the 12th Asian Conference on Computer Vision, ACCV 2014, in Singapore, in November 2014. The 153 full papers presented were selected from numerous submissions. LNCS 9008 contains the papers selected for the Workshop on Human Gait and Action Analysis in the Wild, the Second International Workshop on Big Data in 3D Computer Vision, the Workshop on Deep Learning on Visual Data, the Workshop on Scene Understanding for Autonomous Systems, and the Workshop on Robust Local Descriptors for Computer Vision. LNCS 9009 contains the papers selected for the Workshop on Emerging Topics on Image Restoration and Enhancement, the First International Workshop on Robust Reading, the Second Workshop on User-Centred Computer Vision, the International Workshop on Video Segmentation in Computer Vision, the Workshop: My Car Has Eyes: Intelligent Vehicle with Vision Technology, the Third Workshop on E-Heritage, and the Workshop on Computer Vision for Affective Computing. LNCS 9010 contains the papers selected for the Workshop on Feature and Similarity for Computer Vision, the Third International Workshop on Intelligent Mobile and Egocentric Vision, and the Workshop on Human Identification for Surveillance.

Table of Contents

Frontmatter

Emerging Topics on Image Restoration and Enhancement

Frontmatter
Multi-view Image Restoration from Plenoptic Raw Images

We present a reconstruction algorithm that can restore the captured 4D light field from a portable plenoptic camera without the need for calibration images. An efficient and robust estimator is proposed to accurately detect the centers of microlens images. Based on that estimator, parameters that model the centers of the microlens array images are obtained by solving a global optimization problem. To further enhance the quality of the reconstructed multi-view images, a novel 4D demosaicing algorithm based on kernel regression is also proposed. Our experimental results show that it outperforms state-of-the-art algorithms.

Shan Xu, Zhi-Liang Zhou, Nicholas Devaney
On the Choice of Tensor Estimation for Corner Detection, Optical Flow and Denoising

Many image processing methods such as corner detection, optical flow and iterative enhancement make use of image tensors. Generally, these tensors are estimated using the structure tensor. In this work we show that the gradient energy tensor can be used as an alternative to the structure tensor in several cases. We apply the gradient energy tensor to common applications such as corner detection, optical flow and image enhancement. Our experimental results suggest that the gradient energy tensor enables real-time tensor-based image enhancement on the graphics processing unit (GPU), where we obtain a 40 % increase in frame rate without loss of image quality.

Freddie Åström, Michael Felsberg
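For context, here is a minimal sketch of the structure-tensor baseline that the gradient energy tensor is compared against: a Harris-style corner response built from the Gaussian-smoothed outer product of image gradients. The sigma and k values are illustrative assumptions, not the authors' settings.

```python
# Structure-tensor corner response (the baseline the paper compares against).
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_corners(img, grad_sigma=1.0, window_sigma=2.0, k=0.05):
    img = img.astype(np.float64)
    # Image gradients via Gaussian derivative filters
    Ix = gaussian_filter(img, grad_sigma, order=(0, 1))
    Iy = gaussian_filter(img, grad_sigma, order=(1, 0))
    # Structure tensor: window-averaged outer product of the gradient
    Jxx = gaussian_filter(Ix * Ix, window_sigma)
    Jxy = gaussian_filter(Ix * Iy, window_sigma)
    Jyy = gaussian_filter(Iy * Iy, window_sigma)
    # Harris response: det(J) - k * trace(J)^2; peaks mark corners
    return Jxx * Jyy - Jxy ** 2 - k * (Jxx + Jyy) ** 2
```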
Feature-Preserving Image Restoration from Adaptive Triangular Meshes

The triangulation of images has become an active research area in recent years owing to its compressive representation and its convenience for image processing and visualization. However, little work has been done on how to faithfully recover image intensities from a triangulated mesh of an image, a process also known as image restoration or decoding from meshes. Existing methods such as linear interpolation, least-squares interpolation, or interpolation based on radial basis functions (RBFs) work to some extent, but often yield blurred features (edges, corners, etc.). The main reason for this problem is the isotropically-defined Euclidean distance on which these methods rely, which ignores the anisotropy of feature intensities in an image. Moreover, most existing methods use intensities defined at mesh nodes, which are often ambiguous on or near image edges (or feature boundaries). In this paper, a new method of restoring an image from its triangulation representation is proposed, utilizing anisotropic radial basis functions (ARBFs). This method considers not only the geometric (Euclidean) distances but also the local feature orientations (anisotropic intensities). Additionally, it is based on the intensities of mesh faces instead of mesh nodes and thus provides a more robust restoration. Together, the two strategies guarantee excellent feature-preserving restoration of an image at arbitrary super-resolutions from its triangulation representation, as demonstrated by various experiments in the paper.

Ke Liu, Ming Xu, Zeyun Yu
Image Enhancement by Gradient Distribution Specification

We propose to use gradient distribution specification for image enhancement. The specified gradient distribution is learned from natural-scene image datasets. This enhances image quality based on two facts: First, the specified distribution is independent of image content. Second, the distance between the learned distribution and the empirical distribution of a given image correlates with subjectively perceived image quality. Based on those two facts, remapping an image such that the distribution of its gradients (and therefore also Laplacians) matches the specified distribution is expected to improve the quality of that image. We call this process “image naturalization”. Our experiments confirm that naturalized images are more appealing to visual perception. Moreover, “naturalness” can be used as a measure of image quality when ground-truth is unknown.

Yuanhao Gong, Ivo F. Sbalzarini
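One ingredient of "image naturalization" can be illustrated as a 1-D histogram specification on gradient magnitudes. The sketch below is a simplified reading of the abstract: the learned target distribution is passed in as a bag of samples, and the final reintegration of the remapped gradients into an image (e.g., a Poisson solve) is omitted.

```python
# Remap an image's gradient-magnitude distribution onto target samples.
import numpy as np

def specify_gradient_distribution(img, target_samples):
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    mag = np.hypot(gx, gy)
    flat = mag.ravel()
    # Exact histogram specification by rank matching against target quantiles
    order = np.argsort(flat)
    target = np.quantile(target_samples, np.linspace(0.0, 1.0, flat.size))
    remapped = np.empty_like(flat)
    remapped[order] = target
    # Rescale gradient vectors to the specified magnitudes
    scale = remapped.reshape(mag.shape) / np.maximum(mag, 1e-8)
    return gx * scale, gy * scale  # reintegration step omitted
```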
A Two-Step Image Inpainting Algorithm Using Tensor SVD

In this paper, we present a novel exemplar-based image inpainting algorithm using the higher order singular value decomposition (HOSVD). The proposed method performs inpainting of the target image in two steps. In the first step, the target region is inpainted using HOSVD-based filtering of the candidate patches selected from the source region. This helps to propagate structure and color smoothly into the target region and restricts the appearance of unwanted artifacts. However, a smoothing effect may be visible in texture regions due to the filtering. In the second step, we recover the texture by an efficient heuristic approach using the already inpainted image. The experimental results show the superiority of the proposed method compared to state-of-the-art methods.

Mrinmoy Ghorai, Sekhar Mandal, Bhabatosh Chanda
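The core operation of the first step can be illustrated by HOSVD hard-threshold filtering of a stack of similar patches. How candidate patches are grouped, the threshold value, and the aggregation of filtered patches back into the image are assumptions here, not the authors' exact procedure.

```python
# HOSVD filtering of a (n, h, w) stack of similar patches.
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd_filter(stack, tau=30.0):
    # Mode factors from SVDs of the three unfoldings
    factors = [np.linalg.svd(unfold(stack, m), full_matrices=False)[0]
               for m in range(3)]
    # Core tensor: multiply each mode by the transposed factor
    core = stack
    for m, U in enumerate(factors):
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, m, 0), axes=1), 0, m)
    core[np.abs(core) < tau] = 0.0  # hard-threshold small transform coefficients
    # Reconstruct by multiplying each mode by the factor
    out = core
    for m, U in enumerate(factors):
        out = np.moveaxis(np.tensordot(U, np.moveaxis(out, m, 0), axes=1), 0, m)
    return out
```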
Image Interpolation Based on Weighted and Blended Rational Function

Conventional linear interpolation methods produce interpolated images with blurred edges, while edge-directed interpolation methods produce enlarged images with good-quality edges but distorted details in some cases. We propose an adaptive rational-function-based algorithm for the interpolation of digital images with arbitrary scaling factors. In order to remove artifacts, we construct a new weighted and blended interpolation model that preserves clear edges and details. The proposed model blends a basic rational interpolation model with three rotated rational models. The weight coefficients are determined by edge information from different scales based on point sampling. Experimental results show that the proposed method produces images with high objective quality-assessment values and good visual quality.

Yifang Liu, Yunfeng Zhang, Qiang Guo, Caiming Zhang

First International Workshop on Robust Reading (IWRR2014)

Frontmatter
Text Localization Based on Fast Feature Pyramids and Multi-Resolution Maximally Stable Extremal Regions

Text localization from scene images is a challenging task that finds application in many areas. In this work, we propose a novel hybrid text localization approach that exploits Multi-resolution Maximally Stable Extremal Regions to discard false-positive detections from the text confidence maps generated by a Fast Feature Pyramid based sliding window classifier. The use of a multi-scale approach during both feature computation and connected component extraction allows our method to identify uncommon text elements that are usually not detected by competing algorithms, while the adoption of approximated features and appropriately filtered connected components assures a low overall computational complexity of the proposed system.

Alessandro Zamberletti, Lucia Noce, Ignazio Gallo
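A rough sketch of the multi-resolution MSER component using OpenCV is given below; the pyramid scales and default MSER settings are assumptions, and the Fast Feature Pyramid classifier that produces the confidence maps is not shown.

```python
# Extract MSERs at several scales and map their boxes back to full resolution.
import cv2

def multiresolution_mser(gray, scales=(1.0, 0.5, 0.25)):
    mser = cv2.MSER_create()
    boxes = []
    for s in scales:
        scaled = cv2.resize(gray, None, fx=s, fy=s, interpolation=cv2.INTER_AREA)
        regions, bboxes = mser.detectRegions(scaled)
        # Rescale bounding boxes to original image coordinates
        boxes += [tuple(int(v / s) for v in bb) for bb in bboxes]
    return boxes  # candidate text regions, to be filtered by the confidence maps
```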
A Hybrid Approach to Detect Texts in Natural Scenes by Integration of a Connected-Component Method and a Sliding-Window Method

Text detection in images of natural scenes is important for scene understanding, content-based image analysis, assistive navigation and automatic geocoding. Achieving such text detection is challenging due to complex backgrounds, non-uniform illumination, and variations in text font, size, and orientation. In this paper, we present a novel hybrid approach for detecting text robustly in natural scenes. We connect two text-detection methods in a parallel structure: (1) a connected-component method and (2) a sliding-window method, and output the results of both. The connected-component method generates text lines based on local relations of connected components. The sliding-window method, built around a novel Hough-transform-based technique, generates text lines based on global structure. These two text-detection methods output complementary results, which enables the system to detect various texts in natural scenes.

Testing with the ICDAR2013 text localization dataset shows that the proposed scheme outperforms the latest published algorithms, and that the parallel structure combining the two different methods contributes to decreasing false negatives and improving the recall rate.

Yojiro Tonouchi, Kaoru Suzuki, Kunio Osada
Robust Text Segmentation in Low Quality Images via Adaptive Stroke Width Estimation and Stroke Based Superpixel Grouping

Text segmentation is an important step in the process of character recognition. In the literature, there are numerous methods that work very well in practical applications. However, when an image includes strong noise or surface reflection distraction, accurate text segmentation still faces many challenges. Observing that the stroke width of text is generally stable and significantly different from that of reflective regions, we present a novel method for text segmentation using adaptive stroke width estimation and simple linear iterative clustering superpixel (SLIC-superpixel) region growing. It consists of the following four steps: the first is to normalize image intensity to overcome the influence of gray-level changes; the second utilizes intensity consistency to compute a normalized stroke width (NSW) map; the third is to estimate the optimal stroke width by searching for the peak of the histogram of normalized stroke widths, at which stage the text polarity is also determined; finally, we propose a local region growing method for text extraction using SLIC-superpixels. Unlike existing methods of computing stroke width, such as gray-level jumps on a horizontal scan line or gradient-based SWT methods, the proposed method is based on the statistics of stroke width over the whole image. Hence the stroke width estimation is not only invariant to scale and rotation, but also more robust to surface reflection and noise than methods based only on pairs of sudden changes of intensity or gradient maps. Experiments with many real images, such as laser-marked detonator codes, notice signatures, vehicle license plates, etc., have shown that the proposed algorithm works well on noisy images and achieves performance comparable with current state-of-the-art methods on text segmentation from low-quality images.

Anna Zhu, Guoyou Wang, Yangbo Dong
Efficient Character Skew Rectification in Scene Text Images

We present an efficient method for character skew rectification in scene text images. The method is based on novel skew estimators, which exploit intuitive glyph properties and can be computed efficiently in linear time. The estimators are evaluated on synthetically generated data (including Latin, Cyrillic, Greek and Runic scripts) and on real scene text images, where skew rectification by the proposed method improves the accuracy of a state-of-the-art scene text recognition pipeline.

Michal Bušta, Tomáš Drtina, David Helekal, Lukáš Neumann, Jiří Matas
Performance Improvement of Dot-Matrix Character Recognition by Variation Model Based Learning

This paper describes an effective learning technique for optical dot-matrix character recognition. An automatic reading system for dot-matrix characters is promising for reducing the cost and labor required for quality control of products. Although dot-matrix characters are constructed from specific dot patterns, the variation in character appearance due to three-dimensional rotation of the printing surface, bleeding of ink and missing parts of characters is not negligible. This appearance variation degrades recognition accuracy. The authors propose a technique that improves the accuracy and robustness of dot-matrix character recognition against such variation, using variation-model-based learning. The variation-model-based learning generates training samples containing four types of appearance variation and trains a Modified Quadratic Discriminant Function (MQDF) classifier using the generated samples. The effectiveness of the proposed learning technique is empirically evaluated on a dataset which contains 38 classes (2030 character samples) captured from actual products by standard digital cameras. The recognition accuracy improved from 78.37 % to 98.52 % by introducing the variation-model-based learning.

Koji Endo, Wataru Ohyama, Tetsushi Wakabayashi, Fumitaka Kimura
Scene Text Recognition: No Country for Old Men?

It is a generally accepted fact that off-the-shelf OCR engines do not perform well in unconstrained scenarios like natural scene imagery, where text appears among the clutter of the scene. However, recent research demonstrates that a conventional shape-based OCR engine can produce competitive results in the end-to-end scene text recognition task when provided with a suitably preprocessed image. In this paper we confirm this finding with a set of experiments in which two off-the-shelf OCR engines are combined with an open implementation of a state-of-the-art scene text detection framework. The obtained results demonstrate that in such a pipeline, conventional OCR solutions still perform competitively compared to solutions specifically designed for scene text recognition.

Lluís Gómez, Dimosthenis Karatzas
A Machine Learning Approach to Hypothesis Decoding in Scene Text Recognition

Scene Text Recognition (STR) is the task of localizing and transcribing textual information captured in real-world images. With its increasing accuracy, it becomes a new source of textual data for standard Natural Language Processing tasks, and poses new problems because of the specific nature of scene text. In this paper, we learn a string hypothesis decoding procedure in an STR pipeline using structured prediction methods that have proved useful in automatic speech recognition and machine translation. The model allows us to employ a wide range of typographical and language features in the decoding process. The proposed method is evaluated on a standard dataset and improves both character and word recognition performance over the baseline.

Jindřich Libovický, Lukáš Neumann, Pavel Pecina, Jiří Matas
Perspective Scene Text Recognition with Feature Compression and Ranking

In this paper we propose a novel character representation for scene text recognition. In order to recognize each individual character, we first adopt a bag-of-words approach, in which rotation-invariant circular Fourier-HOG features are densely extracted from an individual character and compressed into middle-level features. Then we train a set of two-class linear Support Vector Machines in a one-vs-all scheme to rank the compressed features by their contributions to the classification. Based on the ranking result we select and keep the top-rated features to build a compact and discriminative codebook. By using densely extracted features that are rotation-invariant and efficient, our method is capable of recognizing perspective texts of arbitrary orientations, and can be combined with existing word recognition methods. Experimental results demonstrate that our method is highly efficient and achieves state-of-the-art performance on several benchmark datasets.

Yu Zhou, Shuang Liu, Yongzheng Zhang, Yipeng Wang, Weiyao Lin
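The ranking step can be sketched with scikit-learn's one-vs-rest linear SVM: train one classifier per character class and keep the feature dimensions with the largest aggregate absolute weights. Dataset shapes and the number of kept dimensions are illustrative assumptions.

```python
# Rank compressed feature dimensions by their one-vs-all SVM weights.
import numpy as np
from sklearn.svm import LinearSVC

def rank_features(X, y, n_keep=200):
    """X: (n_samples, n_features) compressed features; y: character labels."""
    clf = LinearSVC().fit(X, y)                   # one weight row per class
    importance = np.abs(clf.coef_).sum(axis=0)    # contribution per dimension
    keep = np.argsort(importance)[::-1][:n_keep]  # top-rated dimensions
    return keep, X[:, keep]                       # compact codebook features
```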

Second Workshop on User-Centred Computer Vision (UCCV 2014)

Frontmatter
3D Interaction Through a Real-Time Gesture Search Engine

3D gesture recognition and tracking are highly desired features of interaction design in future mobile and smart environments. Specifically, in virtual/augmented reality applications, intuitive interaction with the physical space seems unavoidable, and 3D gestural interaction might be the most effective alternative to current input facilities such as touchscreens. In this paper, we introduce a novel solution for real-time 3D gesture-based interaction by finding the best match from an extremely large gesture database. This database includes images of various articulated hand gestures with annotated 3D position/orientation parameters of the hand joints. Our unique matching algorithm is based on hierarchical scoring of low-level edge-orientation features between the query frames and the database, retrieving the best match. Once the best match is found in the database at each moment, the pre-recorded 3D motion parameters can instantly be used for natural interaction. The proposed bare-hand interaction technology performs in real time with high accuracy using an ordinary camera.

Shahrouz Yousefi, Haibo Li
Debugging Object Tracking Results by a Recommender System with Correction Propagation

Achieving error-free object tracking is almost impossible for state-of-the-art tracking algorithms in challenging scenarios such as tracking a large amount of cells over months in microscopy image sequences. Meanwhile, manually debugging (verifying and correcting) tracking results object-by-object and frame-by-frame in thousands of frames is too tedious. In this paper, we propose a novel scheme to debug automated object tracking results with humans in the loop. Tracking data that are highly erroneous are recommended to annotators based on their debugging histories. Since an error found by an annotator may have many analogous errors in the tracking data and the error can also affect its nearby data, we propose a correction propagation scheme to propagate corrections from all human annotators to unchecked data, which efficiently reduces human efforts and accelerates the convergence to high tracking accuracy. Our proposed approach is evaluated on three challenging datasets. The quantitative evaluation and comparison validate that the recommender system with correction propagation is effective and efficient to help humans debug tracking results.

Mingzhong Li, Zhaozheng Yin
An Abstraction for Correspondence Search Using Task-Based Controls

The correspondence problem (finding matching regions in images) is a fundamental task in computer vision. While the concept is simple, the complexity of feature detectors and descriptors has increased as they provide more efficient and higher quality correspondences. This complexity is a barrier to developers or system designers who wish to use computer vision correspondence techniques within their applications. We have designed a novel abstraction layer which uses a task-based description (covering the conditions of the problem) to allow a user to communicate their requirements for the correspondence search. This is mainly based on the idea of "variances", which describe how sets of images vary in blur, intensity, angle, etc. Our framework interprets the description and chooses from a set of algorithms those that satisfy the description. Our proof-of-concept implementation demonstrates the link between the description set by the user and the result returned. The abstraction is also at a high enough level to hide implementation and device details, allowing the simple use of hardware acceleration.

Gregor Miller, Sidney Fels
Interactive Shadow Editing from Single Images

We present a system for interactive shadow editing from single images, which includes manipulation of the shape, distribution, sharpness and darkness of shadows according to the features of existing shadows. We first obtain a shadow-free image, the shadow boundary and its registered sparse shadow scales using an existing shadow removal method. The modifiable features of the shadow are synthesised from the sparse shadow scales. According to the user-specified shadow shape and its attributes, our system generates a new shadow matte and composites it into the original image, while also allowing editing of existing shadows. We share our executable for open comparison within the community.

Han Gong, Darren Cosker
Hand Part Classification Using Single Depth Images

Hand pose recognition has received increasing attention as an area of HCI. Recently, with the spread of low-cost 3D cameras, research on understanding more natural gestures has intensified. In this paper we present a method for hand part classification and joint estimation from a single depth image. We apply random decision forests (RDF) for hand part classification. Foreground pixels in the hand image are labeled by the RDF, which is called per-pixel classification. Hand joints are then estimated based on the classified hand parts. We suggest a robust feature extraction method for per-pixel classification, which enhances the accuracy of hand part classification. Depth images and label images synthesized from a 3D hand mesh model are used for algorithm verification. Finally, we apply our algorithm to real depth images from a conventional 3D camera and report the experimental results.

Myoung-Kyu Sohn, Dong-Ju Kim, Hyunduk Kim
Human Tracking Using a Far-Infrared Sensor Array and a Thermo-Spatial Sensitive Histogram

We propose a human body tracking method using a far-infrared sensor array, which captures the spatial distribution of temperature as a low-resolution image. Since it is difficult to identify a person from such a low-resolution thermal image, privacy issues can be avoided, so the sensor is expected to be applicable to the analysis of human behavior in various places. However, it is difficult to track humans accurately because the low-resolution thermal image lacks sufficient information to describe the features of the target human body. In order to solve this problem, we propose a thermo-spatial sensitive histogram suited to representing the target in the low-resolution thermal image. Unlike conventional histograms, in the thermo-spatial sensitive histogram each voting value is weighted depending on the distance to the target's position and the difference from the target's temperature. This histogram allows accurate tracking by representing the target with multiple histograms and reducing the influence of background pixels. Based on this histogram, the proposed method tracks humans robustly against occlusions, pose variations, and background clutter. We demonstrate the effectiveness of the method through experiments using various image sequences.

Takashi Hosono, Tomokazu Takahashi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase, Tomoyoshi Aizawa, Masato Kawade
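The weighted-voting idea can be sketched as follows; the Gaussian weight forms and bandwidths are assumptions, since the abstract does not give the exact weighting.

```python
# Thermo-spatial sensitive histogram: votes weighted by distance to the
# target position and by difference from the target temperature.
import numpy as np

def thermo_spatial_histogram(thermal, center, target_temp,
                             n_bins=16, sigma_s=5.0, sigma_t=2.0):
    h, w = thermal.shape
    yy, xx = np.mgrid[0:h, 0:w]
    d2 = (yy - center[0]) ** 2 + (xx - center[1]) ** 2
    weight = (np.exp(-d2 / (2 * sigma_s ** 2)) *
              np.exp(-(thermal - target_temp) ** 2 / (2 * sigma_t ** 2)))
    hist, _ = np.histogram(thermal, bins=n_bins, weights=weight)
    return hist / max(hist.sum(), 1e-8)  # normalized target descriptor
```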
Feature Point Tracking Algorithm Evaluation for Augmented Reality in Handheld Devices

In augmented reality applications for handheld devices, accuracy and speed of the tracking algorithm are two of the most critical parameters to achieve realism. This paper presents a comprehensive framework to evaluate feature tracking algorithms on these two parameters. While there is a substantial body of knowledge on these aspects, a novel feature introduced in this paper is the use of error associated with the estimated directional movement in performance measurements to improve the evaluation framework. The work described in this paper is a comparative evaluation of nine widely used feature point tracking algorithms using the developed measurement framework and the results are interpreted based on the characteristics of the algorithms as well as the characteristics of test image sequences.

Amila Perera, Akila Pemasiri, Sameera Wijayarathna, Chameera Wijebandara, Chandana Gamage
Colour Matching Between Stereo Pairs of Images

This paper outlines the process of colour matching stereo pairs of images using a disparity map as input along with the original images. We describe the functionality of a plugin developed for Nuke, which we call EyeMatch, that performs this automatic colour matching under conditions set by the user. The user is presented with various parameters to fine tune the matching process, but no prior knowledge of any of the underlying techniques is necessary. Results are produced quickly, allowing a trial-and-error based approach to fine tuning these parameters, and results are sufficiently accurate to be used in the post-production pipeline.

Stephen Willey, Phil Willis, Jeff Clifford, Ted Waine
User Directed Multi-view-stereo

Depth reconstruction from video footage and image collections is a fundamental part of many modelling and image-based rendering applications. However, real-world scenes often contain limited texture information, repeated elements and other ambiguities which remain challenging for fully automatic algorithms. This paper presents a technique that combines intuitive user constraints with dense multi-view stereo reconstruction. By providing annotations in the form of simple paint strokes, a user can guide a multi-view stereo algorithm and avoid common failure cases. We show how smoothness, discontinuity and depth ordering constraints can be incorporated directly into a variational optimization framework for multi-view stereo. Our method avoids the need for heuristic approaches that edit a depth-map in a sequential process, and avoids requiring the user to accurately segment object boundaries or to directly model geometry. We show how, with a small amount of intuitive input, a user may create improved depth maps in challenging cases for multi-view stereo.

Yotam Doron, Neill D. F. Campbell, Jonathan Starck, Jan Kautz
Towards Efficient Feedback Control in Streaming Computer Vision Pipelines

Stream processing is currently an active research direction in computer vision. This is due to the existence of many computer vision algorithms that can be expressed as a pipeline of operations, and the increasing demand for online systems that process image and video streams. Recently, a formal stream algebra has been proposed as an abstract framework that mathematically describes computer vision pipelines. The algebra defines a set of concurrent operators that can describe a pipeline of vision tasks, with image and video streams as operands. In this paper, we extend this algebra framework by developing a formal and abstract description of feedback control in computer vision pipelines. Feedback control allows vision pipelines to perform adaptive parameter selection, iterative optimization and performance tuning. We show how our extension can describe feedback control in the vision pipelines of two state-of-the-art techniques.

Mohamed A. Helala, Ken Q. Pu, Faisal Z. Qureshi

International Workshop on Video Segmentation in Computer Vision

Frontmatter
Background Subtraction: Model-Sharing Strategy Based on Temporal Variation Analysis

This paper presents a new approach for moving object detection in complex scenes. Unlike previous methods, which compare a pixel only with its own model and make the model increasingly complex, we adopt an iterative model-sharing strategy in the foreground decision process: the current pixel is not only compared with its own model, but may also be compared with other pixels' models that have similar temporal variation. Experiments show that the proposed approach leads to a lower false positive rate and higher precision, and performs better than the traditional approach.

Yufeng Chen, Kun Zhao, Wenzhe Wu, Shikai Liu
A Fast Object Detecting-Tracking Method in Compressed Domain

Traditional pixel-domain tracking algorithms are often applied to rigid objects which move slowly against simple backgrounds, but they perform poorly for non-rigid object tracking. To solve this problem, this paper proposes a tracking method with rapid detection in the compressed domain. A convex hull formed by a self-adaptive boundary searching method and rule-based clustering are adopted in the detector in order to reduce the complexity of the algorithm. At the tracking stage, Kalman filtering is used to forecast the location of the object. Meanwhile, as the whole process is completed in the compressed domain, the method meets real-time requirements better than comparable algorithms while tracking the target more precisely. The experimental results show that the proposed method has the following properties: (1) advantages in tracking small-sized objects; (2) better performance when tracking fast-moving objects; (3) faster tracking speed.

Zenglei Qian, Jiuzhen Liang, Zhiguo Niu, Yongcun Xu, Qin Wu
Automatic RoI Detection for Camera-Based Pulse-Rate Measurement

Remote photoplethysmography (rPPG) enables contactless measurement of pulse-rate by detecting pulse-induced colour changes on human skin using a regular camera. Most existing rPPG methods exploit the subject's face as the Region of Interest (RoI) for pulse-rate measurement by automatic face detection. However, face detection is a suboptimal solution since (1) not all subregions in a face contain the skin pixels from which the pulse-signal can be extracted, and (2) it fails to locate the RoI when the frontal face is invisible (e.g., side-view faces). In this paper, we present a novel automatic RoI detection method for camera-based pulse-rate measurement, which consists of three main steps: subregion tracking, feature extraction, and clustering of skin regions. To evaluate the robustness of the proposed method, 36 video recordings were made of 6 subjects with different skin types performing 6 types of head motion. Experimental results show that for the video sequences containing subjects with brighter skin types and modest body motions, the accuracy of the pulse-rates measured by our method (94 %) is comparable to that obtained by a face detector (92 %), while the average SNR is significantly improved from 5.8 dB to 8.6 dB.

Ron van Luijtelaar, Wenjin Wang, Sander Stuijk, Gerard de Haan
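For orientation, here is a sketch of the pulse-rate estimation that typically follows RoI detection in rPPG: an FFT peak search on the RoI-averaged green channel. The frame rate and pulse band are assumptions, and the paper's actual contribution, the RoI detection itself, is not shown.

```python
# Estimate pulse rate (bpm) from a trace of RoI-averaged green values.
import numpy as np

def pulse_rate_bpm(green_means, fps=20.0, lo=0.7, hi=4.0):
    x = green_means - np.mean(green_means)       # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    band = (freqs >= lo) & (freqs <= hi)         # 42-240 bpm pulse band
    peak = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak
```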
Sparse Optimization for Motion Segmentation

In this paper, we propose a new framework for segmenting feature-based multiple moving objects with subspace models in affine views. Since feature data in real video sequences is high-dimensional and complex, most traditional approaches to motion segmentation use conventional PCA to obtain a low-dimensional representation. Our proposed framework instead applies sparse PCA (SPCA) to obtain a projected subspace, which is a low-dimensional global subspace on a Stiefel manifold with sparse entries. Then, local subspace separation is achieved by automatically selecting the sparse nearest neighbours. By combining the two sparse techniques, the proposed framework segments different motions through simple spectral clustering on an affinity matrix built from principal angles. To the best of our knowledge, our framework is the first to apply sparse optimization to the global and local subspaces simultaneously. We test our method extensively on the Hopkins 155 dataset and compare its performance to several state-of-the-art motion segmentation methods. Our results are comparable with theirs, and in many cases exceed them both in segmentation accuracy and in computational speed.

Michael Ying Yang, Sitong Feng, Bodo Rosenhahn
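The final grouping step can be sketched as spectral clustering on an affinity matrix built from principal angles, assuming the per-trajectory local subspace bases have already been estimated; the exponential affinity form is an assumption.

```python
# Segment motions by spectral clustering on a principal-angle affinity matrix.
import numpy as np
from scipy.linalg import subspace_angles
from sklearn.cluster import SpectralClustering

def segment_motions(local_bases, n_motions):
    """local_bases: list of (d, k) orthonormal bases, one per trajectory."""
    n = len(local_bases)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            theta = subspace_angles(local_bases[i], local_bases[j])
            A[i, j] = A[j, i] = np.exp(-np.sum(theta ** 2))
    return SpectralClustering(n_clusters=n_motions,
                              affinity='precomputed').fit_predict(A)
```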
Adaptive Foreground Extraction for Crowd Analytics Surveillance on Unconstrained Environments

Background modeling is one of the key steps in any visual surveillance system. A good background modeling algorithm should be able to detect objects/targets under any environmental condition. Illumination variation has been a major challenge for many background modeling algorithms, which either produce poor object segmentation or consume a substantial amount of computational time, making them impractical for real-time use. In this paper we propose a novel background modeling method based on the Gaussian Mixture Model (GMM). The proposed method uses Phase Congruency (PC) edge features to overcome the effect of illumination variation while preserving efficient background/foreground segmentation. Moreover, our method combines the pixel information of the GMM with the phase texture information of PC to construct a foreground that is invariant to illumination variation.

Mohamed Abul Hassan, Aamir Saeed Malik, Walter Nicolas, Ibrahima Faye
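The GMM component corresponds to the classic MOG2 background subtractor available in OpenCV; a minimal usage sketch follows. The fusion with Phase Congruency edge features, which is the paper's contribution, is not shown, and the video filename is hypothetical.

```python
# Baseline GMM background subtraction with OpenCV's MOG2.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)
cap = cv2.VideoCapture("surveillance.avi")  # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)  # per-pixel foreground/shadow labels
cap.release()
```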

My Car Has Eyes: Intelligent Vehicle with Vision Technology

Frontmatter
Driver Assistance System Providing an Intuitive Perspective View of Vehicle Surrounding

Driver assistance systems can help drivers to avoid car accidents by providing warning signals or visual cues about the surrounding situation. Instead of the fixed bird's-eye view monitoring proposed in many previous works, we developed a real-time vehicle surrounding monitoring system that assists drivers in perceiving the vehicle surroundings from third-person viewpoints. Four fisheye cameras were mounted around the vehicle in our system. We developed a simple and accurate fisheye camera calibration method to dewarp the captured images into perspective-projection ones. Next, we estimated the intrinsic parameters of each undistorted virtual camera by using planar calibration patterns and then obtained the extrinsic camera parameters by using global patterns on a ground plane. A new method was proposed to tackle the brightness uniformity problem caused by the varying lighting conditions across cameras. Finally, we projected the undistorted images onto a 3D hybrid projection model, stitched these images together, and then rendered the images from a third-person viewpoint selected by the driver. The proposed hybrid projection model is composed of a paraboloid model and a columnar model and achieves rendering results with less distortion. Compared to conventional around-vehicle monitoring systems, our system provides adaptive, integrated, and intuitive views of the vehicle surroundings in a more realistic way.

Yen-Ting Yeh, Chun-Kang Peng, Kuan-Wen Chen, Yong-Sheng Chen, Yi-Ping Hung
Part-Based RDF for Direction Classification of Pedestrians, and a Benchmark

This paper proposes a new benchmark dataset for pedestrian body-direction classification, introduces a new framework for intra-class classification aimed directly at pedestrian body-direction classification, and shows that the proposed framework outperforms a state-of-the-art method. It also proposes the use of DCT-HOG features (combining a discrete cosine transform with the histogram of oriented gradients) as a novel approach for defining a random decision forest.

Junli Tao, Reinhard Klette
Path Planning for Unmanned Vehicle Motion Based on Road Detection Using Online Road Map and Satellite Image

This article presents a new methodology for detecting the road network and planning the path for vehicle motion using road maps and satellite/aerial images. The method estimates road regions based on network models, which are created from road maps and satellite images using image-processing techniques such as color filters, difference of Gaussians, and the Radon transform. When using road map images, this method can estimate not only the shape but also the direction of the road network, which cannot be estimated from satellite images alone. However, some road segments that branch from the main road are not annotated in road map services. Therefore, it is necessary to detect roads in the satellite image, which is then used to construct a full path for motion. The method consists of several stages. First, a road network is detected using road map images collected from online map services. Second, the detected road network is used to learn a model for road detection in the satellite images; the road network in the satellite images is estimated based on filter models and geometric road structures. Third, the road regions are converted into the Mercator coordinate system and a heuristic based on Dijkstra's algorithm is used to provide the shortest path for vehicle motion. This methodology is tested on large outdoor scenes and the results are documented.

Van-Dung Hoang, Danilo Caceres Hernandez, Alexander Filonenko, Kang-Hyun Jo
Detection and Recognition of Road Markings in Panoramic Images

The detection of road lane markings has many practical applications, such as advanced driver assistance systems and road maintenance. In this paper we propose an algorithm to detect and recognize road lane markings from panoramic images. Our algorithm consists of four steps. First, an inverse perspective mapping is applied to the image, and potential road markings are segmented based on their intensity difference compared to the surrounding pixels. Second, for each potential road marking segment we extract a feature vector of the distances between the center and the boundary at regular angular steps. Third, each segment is classified using a Support Vector Machine (SVM). Finally, by modeling the lane markings, previously detected false-positive segments can be rejected based on their orientation and position relative to the lane markings. Our experiments show that the system is capable of recognizing 93 %, 95 % and 91 % of striped line segments, blocks and arrows respectively, as well as 94 % of the lane markings.

Cheng Li, Ivo Creusen, Lykele Hazelhoff, Peter H. N. de With
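The first step, inverse perspective mapping, can be sketched with OpenCV as below; the four point correspondences and output size are hypothetical values that would in practice come from camera calibration.

```python
# Warp the camera view to a top-down view so lane markings become parallel.
import cv2
import numpy as np

src = np.float32([[420, 560], [860, 560], [1280, 960], [0, 960]])  # road trapezoid
dst = np.float32([[300, 0], [500, 0], [500, 800], [300, 800]])     # rectangle
H = cv2.getPerspectiveTransform(src, dst)
image = cv2.imread("panorama.jpg")                 # hypothetical input image
birds_eye = cv2.warpPerspective(image, H, (800, 800))
```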
A Two Phase Approach for Pedestrian Detection

Most current pedestrian detectors have pursued a high detection rate without carefully considering sample distributions. In this paper, we argue that the following characteristics must be considered: (1) the large intra-class variation of pedestrians (multi-modality), and (2) the data imbalance between positives and negatives. Pedestrian detection can be regarded as a "finding needles in a haystack" problem (rare class detection). Inspired by a rare class detection technique, we propose a two-phase classifier integrating an existing baseline detector and a hard negative expert, conquering recall and precision separately. The main idea behind the hard negative expert is to reduce the sample space to be learned, so that informative decision boundaries can be learned effectively. The multi-modality problem is dealt with by a simple variant of LDA-based random forests as the hard negative expert. We optimally integrate the two models by learned integration rules. By virtue of the two-phase structure, our method achieves competitive performance with only a little additional computation. Our approach achieves a 38.44 % mean miss-rate for the reasonable setting of the Caltech Pedestrian Benchmark.

Soonmin Hwang, Tae-Hyun Oh, In So Kweon
Uncertainty Estimation for KLT Tracking

The Kanade-Lucas-Tomasi tracker (KLT) is commonly used for tracking feature points due to its excellent speed and reasonable accuracy. It is a standard algorithm in applications such as video stabilization, image mosaicing, egomotion estimation, structure from motion and Simultaneous Localization and Mapping (SLAM). However, our understanding of errors in the output of KLT tracking is incomplete. In this paper, we perform a theoretical error analysis of KLT tracking. We first focus our analysis on the standard KLT tracker and then extend it to the pyramidal KLT tracker and multiple frame tracking. We show that a simple local covariance estimate is insufficient for error analysis and a Gaussian Mixture Model is required to model the multiple local minima in KLT tracking. We perform Monte Carlo simulations to verify the accuracy of the uncertainty estimates.

Sameer Sheorey, Shalini Keshavamurthy, Huili Yu, Hieu Nguyen, Clark N. Taylor
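The simple local covariance estimate that the paper finds insufficient on its own can be written down directly: the inverse of the KLT normal-equation matrix (the windowed structure tensor), scaled by the residual noise variance. This sketch assumes the gradient patches and residual variance are given.

```python
# Local covariance of a KLT displacement estimate.
import numpy as np

def klt_local_covariance(Ix, Iy, residual_var):
    """Ix, Iy: gradient arrays over the tracking window."""
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    # Near-singular G indicates an aperture-problem region where this
    # single-Gaussian model (and the tracker itself) is unreliable.
    return residual_var * np.linalg.inv(G)
```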

Third ACCV Workshop on E-Heritage

Frontmatter
Combined Hapto-visual and Auditory Rendering of Cultural Heritage Objects

In this work, we develop a multi-modal rendering framework comprising hapto-visual and auditory data. The prime focus is to haptically render point cloud data representing virtual 3-D models of cultural significance and to handle their affine transformations. Cultural heritage objects can be very large, and one may be required to render the object at various scales of detail. Further, surface effects such as texture and friction are incorporated in order to provide a realistic haptic perception to the users. Moreover, the proposed framework includes appropriate sound synthesis to bring out the acoustic properties of the object, as well as a graphical user interface with varied options such as choosing the desired orientation of 3-D objects and selecting the desired level of spatial resolution adaptively at runtime. A fast, point proxy-based haptic rendering technique is proposed, with the proxy update loop running 100 times faster than the required haptic update frequency of 1 kHz. The surface properties are integrated into the system by applying a bilateral filter on the depth data of the virtual 3-D models. Position-dependent sound synthesis is incorporated using appropriate audio clips.

Praseedha Krishnan Aniyath, Sreeni Kamalalayam Gopalan, Priyadarshini Kumari, Subhasis Chaudhuri
Mesh Denoising Using Multi-scale Curvature-Based Saliency

3D mesh data acquisition is often afflicted by undesirable measurement noise. Such noise has an adverse impact on further processing and on human perception, and its removal is hence a pivotal step in mesh processing. We present here a fast saliency-based algorithm that can reduce the noise while preserving the finer details of the original object. In order to capture object features at multiple scales, our mesh denoising algorithm estimates the mesh saliency from Gaussian-weighted curvatures of vertices at fine and coarse scales. The proposed algorithm finds wide application in the digitization of archaeological artifacts, such as statues and sculptures, where it is of paramount importance to capture the 3D surface with all its details as accurately as possible. We have tested the algorithm on several datasets, and the results demonstrate its speed and efficiency.

Somnath Dutta, Sumandeep Banerjee, Prabir K. Biswas, Partha Bhowmick
A Performance Evaluation of Feature Descriptors for Image Stitching in Architectural Images

We present a performance comparison of 4 feature descriptors for the task of feature matching in panorama stitching on images taken from architectural scenes and archaeological sites. Such scenes are generally characterized by structured objects that vary in depth and by large homogeneous regions. We test SIFT, LIOP, HRI and HRI-CSLTP on 4 different categories of images: well-structured with some depth variations, partially homogeneous with large depth variations, nearly homogeneous with a small amount of structural detail, and illumination-variant. These challenges test the distinctiveness and the intensity normalization schemes adopted by these descriptors. HRI-CSLTP and SIFT perform on par with each other and are better than the others in many of the test scenarios, while LIOP performs well when the intensity changes are complex. The results of LIOP also show that the order computations of the pixels have to be made in a noise-resilient manner, especially in homogeneous regions.

Prashanth Balasubramanian, Vinay Kumar Verma, Anurag Mittal
Enhancement and Retrieval of Historic Inscription Images

In this paper we present a technique for the enhancement and retrieval of historic inscription images. Inscription images generally have no distinction between the text layer and the background layer due to the absence of color difference, and they possess highly correlated signals and noise; consequently, retrieval of such images using search based on feature matching returns inaccurate results. Hence, there is a need to first enhance the readability and then binarize the images to create a digital database for retrieval. Our technique provides a suitable method for this by separating the text layer from the non-text layer using the proposed cumulants-based Blind Source Extraction (BSE) method, and storing the results in a digital library with their corresponding historic information. These images are retrieved from the database using image search based on the Bag-of-Words (BoW) method.

S. Indu, Ayush Tomar, Aman Raj, Santanu Chaudhury
A BRDF Representing Method Based on Gaussian Process

In recent years, digital reconstruction of cultural heritage has provided an effective way of protecting historical relics, in which high-fidelity modeling of the surface reflection of historical heritage plays a very important role. In this paper a Gaussian process (GP) regression based approach is proposed to model the reflection properties of real materials. Simulation data generated by an existing model are used both as training data and as evidence that a Gaussian process model can describe material reflection. Matusik's MERL database is also adopted for training and inference to obtain the reflection model of real materials. Simulation results show that the proposed GP regression approach can fit the reflection properties of certain materials well, greatly reduce the BRDF measurement time, and ensure highly realistic rendering at the same time.

Jianying Hao, Yue Liu, Dongdong Weng
Realistic Walkthrough of Cultural Heritage Sites-Hampi

In this paper we discuss a framework for realistic walkthroughs of cultural heritage sites. The framework includes 3D data acquisition, various data processing steps, coarse-to-fine 3D reconstruction, and rendering to generate a realistic walkthrough. Digital preservation of cultural heritage sites has become an important area of research given the accessibility of state-of-the-art techniques in computer vision and graphics. We propose a coarse-to-fine 3D reconstruction of heritage sites using different 3D data acquisition techniques. We have developed geometry-based data processing algorithms for 3D data super-resolution and hole filling, using the Riemannian metric tensor and Christoffel symbols as a novel set of features. We generate a walkthrough of the cultural heritage sites using the coarse-to-fine 3D reconstructed models. We demonstrate the proposed framework with a walkthrough generated for the Vittala Temple at Hampi.

Uma Mudenagudi, Syed Altaf Ganihar, Shreyas Joshi, Shankar Setty, G. Rahul, Somashekhar Dhotrad, Meera Natampally, Prem Kalra
Categorization of Aztec Potsherds Using 3D Local Descriptors

We introduce the Tepalcatl project, an ongoing bi-disciplinary effort conducted by archaeologists and computer vision researchers, which focuses on developing statistical methods for the automatic categorization of potsherds; more precisely, potsherds from ancient Mexico including the Teotihuacan and Aztec civilizations. We captured 3D models of several potsherds, and annotated them using seven taxonomic criteria appropriate for categorization. Our first task consisted in exploiting the descriptive power of two state-of-the-art 3D descriptors. Then, we evaluated their retrieval and classification performance. Finally, we investigated the effects of dimensionality reduction for categorization of our data. Our results are promising and demonstrate the potential of computer vision techniques for archaeological classification of potsherds.

Edgar Roman-Rangel, Diego Jimenez-Badillo, Estibaliz Aguayo-Ortiz
Image Parallax Based Modeling of Depth-Layer Architecture

We present a method to generate a textured 3D model of architecture with a structure of multiple floors and depth layers from image collections. Images are usually used to reconstruct a 3D point cloud or to analyze facade structure. However, architecture with a depth-layer structure remains a challenging problem: planar walls and curved roofs appear alternately, and front and back layers occlude each other while having different depth values, similar materials, and irregular boundaries. A statistics-based top-bottom segmentation algorithm is proposed to divide the 3D point cloud generated by the structure-from-motion (SfM) method into different floors. For each floor with depth layers, a repetition-based depth-layer decomposition algorithm using parallax shift is proposed to separate the front and back layers, especially along irregular boundaries. Finally, architectural components are modeled to construct a textured 3D model, utilizing the parameters extracted from the segmentation results. Our system has the distinct advantage of producing realistic 3D architecture models with accurate depth information between front and back layers, as demonstrated by multiple examples in the paper.

Yong Hu, Bei Chu, Yue Qi
A Method for Extracting Text from Stone Inscriptions Using Character Spotting

A novel interactive technique for the extraction of text characters from images of stone inscriptions is introduced in this paper. It is designed particularly for on-site processing of inscription images acquired at various historic palaces, monuments, and temples. Its underlying principle rests on several robust character-analytic elements such as HoG features, vowel diacritics, and location-bounded scan lines. Since the process involves character spotting and extraction of the inscribed information into editable text, it can subsequently help archaeologists with the epigraphy, transliteration, and translation of rock inscriptions, particularly those with heavy degradation, noise, and a variety of styles depending on the mason's origin and the reign. The spotted characters can also be used to create a database for ancient script analysis and related archaeological work. We have tested our method on various stone inscriptions collected from heritage sites of Karnataka, India, and the results are quite promising. An Android application of the proposed work has also been developed to aid epigraphers in the study of inscriptions using a tablet or a mobile phone.

Shashaank M. Aswatha, Ananth Nath Talla, Jayanta Mukhopadhyay, Partha Bhowmick
3D Model Automatic Exploration: Smooth and Intelligent Virtual Camera Control

In a dense 3D point-cloud model, taking a virtual tour without assistance is a complex and difficult task that discourages users from doing so. The aim of this work is to provide a virtual navigation support tool that helps users explore the 3D model; in particular, the tool guides the camera automatically. We assume that the user is attracted to information-rich areas in the model, an assumption we model using entropy. Furthermore, to achieve realistic automatic navigation we must avoid obstacles, ensure a relevant camera orientation during motion, and regulate the visual movement in the produced image. In this paper, we propose a solution to this problem based on a hierarchical algorithm, which combines the main task to be achieved with these realism constraints. We validate the system on different complex 3D models: a lab, an urban environment and a cathedral.

Zaynab Habibi, Guillaume Caron, El Mustapha Mouaddib

Workshop on Computer Vision for Affective Computing (CV4AC)

Frontmatter
A Robust Learning Framework Using PSM and Ameliorated SVMs for Emotional Recognition

This paper proposes a novel machine-learning framework for facial-expression recognition which is capable of processing images fast and accurately, even without relying on a large-scale dataset. The framework is derived from Support Vector Machines (SVMs) but distinguishes itself in three key technical points. First, sample normalization is based on the Perturbed Subspace Method (PSM), which is an effective way to improve the robustness of a training system. Second, the framework adopts SURF (Speeded Up Robust Features) as features, which is more suitable for real-time situations. Third, we use region attributes to revise incorrectly detected visual features (described by invisible image attributes at segmented regions of the image). Combining these approaches improves the efficiency of machine learning. Experiments show that the proposed approach is capable of reducing the number of samples effectively, resulting in an obvious reduction in training time.

Jinhui Chen, Yosuke Kitano, Yiting Li, Tetsuya Takiguchi, Yasuo Ariki
Subtle Expression Recognition Using Optical Strain Weighted Features

Optical strain characterizes the relative amount of displacement of a moving object within a time interval. Its ability to capture even small muscular movements on faces is advantageous for subtle expression research. This paper proposes a novel optical strain weighted feature extraction scheme for subtle facial micro-expression recognition. Motion information is derived from optical strain magnitudes, which are then pooled spatio-temporally to obtain block-wise weights for the spatial image plane. By a simple product with the weights, the resulting feature histograms are intuitively scaled to reflect the importance of the block regions. Experiments conducted on two recent spontaneous micro-expression databases, CASME II and SMIC, demonstrate promising improvement over the baseline results.

Sze-Teng Liong, John See, Raphael C.-W. Phan, Anh Cat Le Ngo, Yee-Hui Oh, KokSheik Wong
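The optical strain magnitude can be sketched from a dense optical flow field as below; Farneback flow is an assumed stand-in for whichever flow estimator the authors used.

```python
# Per-pixel optical strain magnitude from dense optical flow.
import cv2
import numpy as np

def optical_strain_magnitude(prev_gray, next_gray):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    exy = 0.5 * (du_dy + dv_dx)  # shear component of the strain tensor
    return np.sqrt(du_dx ** 2 + dv_dy ** 2 + 2.0 * exy ** 2)
```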
Task-Driven Saliency Detection on Music Video

We propose a saliency model to estimate task-driven eye movement. Human eye-movement patterns are affected by the observer's task and mental state [1]. However, existing saliency models are computed from low-level image features such as bright regions, edges, colors, etc. In this paper, tasks (e.g., evaluation of a piano performance) are given to observers watching music videos. Unlike existing visual-based methods, we use musical score features in addition to image features to detect saliency. We show that our saliency model outperforms existing models in predicting the observed eye-movement patterns.

Shunsuke Numano, Naoko Enami, Yasuo Ariki
Recognition of Facial Action Units with Action Unit Classifiers and an Association Network

Most previous work on facial action recognition has focused only on verifying whether a certain facial action unit appears on a face image. In this paper, we report our investigation of the semantic relationships of facial action units and introduce a novel method for facial action unit recognition based on action unit classifiers and a Bayes network called the Facial Action Unit Association Network (FAUAN). Compared with other methods, the proposed method attempts to identify the whole set of facial action units of a face image simultaneously. We achieve this goal in three steps. First, the histogram of oriented gradients (HOG) is extracted as features; then, a Multi-Layer Perceptron (MLP) is trained for the preliminary detection of each individual facial action unit. Finally, FAUAN fuses the responses of all the facial action unit classifiers to determine the best set of facial action units. The proposed method achieves promising performance on the extended Cohn-Kanade dataset. Experimental results also show that when the individual unit classifiers are weak, performance can improve by nearly 10 % in some cases when FAUAN is used.

Junkai Chen, Zenghai Chen, Zheru Chi, Hong Fu
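The per-AU detection stage can be sketched as HOG features feeding a small MLP, one binary classifier per action unit. Image size, HOG settings, and the MLP width are illustrative assumptions, and the FAUAN fusion stage on top is not shown.

```python
# Train one HOG + MLP detector for a single facial action unit.
import numpy as np
from skimage.feature import hog
from sklearn.neural_network import MLPClassifier

def train_au_classifier(face_crops, au_labels):
    """face_crops: list of (128, 128) grayscale faces; au_labels: 0/1 per face."""
    X = np.array([hog(f, orientations=9, pixels_per_cell=(16, 16),
                      cells_per_block=(2, 2)) for f in face_crops])
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    return clf.fit(X, au_labels)  # responses are later fused by FAUAN
```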
A Non-invasive Facial Visual-Infrared Stereo Vision Based Measurement as an Alternative for Physiological Measurement

Our main aim is to propose a vision-based measurement as an alternative to physiological measurement for recognizing mental stress. The development of this emotion recognition system involved three stages: an experimental setup for vision and physiological sensing, facial feature extraction in the visual-thermal domain, and a mental stress stimulus experiment with data analysis and classification based on a Support Vector Machine. In this research, three vision-based measurements and two physiological measurements were implemented in the system. The vision-based measurements consist of eye blinking in the facial visual domain and, in the facial thermal domain, the temperature values of 3 RoIs and the blood vessel volume at the supraorbital area. Two physiological measurements, heart rate and salivary amylase level, were taken as ground truth. We also propose a new calibration chessboard, attached with fever plaster, to locate calibration points in the stereo view. A new method of integrating two different sensors for detecting facial features in both the thermal and visual domains is also presented, applying a nostril mask, which allows one to find a facial feature, namely the nose area, in both domains. Extraction of thermal-visual feature images was performed using the SIFT feature detector and extractor to verify the nostril-mask method; in the experiment conducted, 88.6 % correct matching was achieved. In the eye blinking experiment, matches were detected successfully in almost 98 % of cases for subjects without glasses and 89 % for subjects with glasses. A graph cut algorithm was applied to remove unwanted RoIs. The recognition rate for the 3 RoIs was about 90 %-96 %. We also present a new method for the automatic detection of blood vessel volume at the supraorbital area monitored by an LWIR camera, with a correctly-detected-pixel rate of about 93 %. An experiment measuring mental stress with the proposed system, classified by a Support Vector Machine, was conducted and showed promising results.

Mohd Norzali Haji Mohd, Masayuki Kashima, Kiminori Sato, Mutsumi Watanabe
A Delaunay-Based Temporal Coding Model for Micro-expression Recognition

Micro-expression recognition has been a challenging problem in computer vision research due to the briefness and subtlety of micro-expressions. Previous psychological studies show that even human beings can only recognize micro-expressions with low average recognition rates. In this paper, we propose an effective and efficient method to encode micro-expressions for recognition. The proposed method, referred to as the Delaunay-based temporal coding model (DTCM), encodes texture variations corresponding to muscle activities on the face due to dynamic micro-expressions. Image sequences of micro-expressions are normalized not only temporally but also spatially, based on Delaunay triangulation, so that the influence of personal appearance irrelevant to micro-expressions can be suppressed. Encoding temporal variations at local subregions and selecting spatially salient subregions in the face area increases the capacity of our method to locate spatiotemporally important features related to the micro-expressions of interest. Extensive experiments on publicly available datasets, including SMIC, CASME, and CASME II, verified the effectiveness of the proposed model.

Zhaoyu Lu, Ziqi Luo, Huicheng Zheng, Jikai Chen, Weihong Li
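The spatial-normalization ingredient rests on a Delaunay triangulation over facial landmarks, which lets each frame be warped triangle-by-triangle onto a common template. The sketch below only builds the triangulation; landmark detection and the warps themselves are omitted, and the random landmarks are placeholders.

```python
# Delaunay triangulation over (hypothetical) facial landmarks.
import numpy as np
from scipy.spatial import Delaunay

landmarks = np.random.rand(68, 2) * 256  # placeholder 68-point landmarks
tri = Delaunay(landmarks)
print(tri.simplices.shape)  # (n_triangles, 3) vertex indices for piecewise warps
```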
Backmatter
Metadata
Title
Computer Vision - ACCV 2014 Workshops
Edited by
C.V. Jawahar
Shiguang Shan
Copyright Year
2015
Electronic ISBN
978-3-319-16631-5
Print ISBN
978-3-319-16630-8
DOI
https://doi.org/10.1007/978-3-319-16631-5