
About this Book

The four-volume set LNCS 7724-7727 constitutes the thoroughly refereed post-conference proceedings of the 11th Asian Conference on Computer Vision, ACCV 2012, held in Daejeon, Korea, in November 2012. The total of 226 contributions presented in these volumes was carefully reviewed and selected from 869 submissions. The papers are organized in topical sections on object detection, learning and matching; object recognition; feature, representation, and recognition; segmentation, grouping, and classification; image representation; image and video retrieval and medical image analysis; face and gesture analysis and recognition; optical flow and tracking; motion, tracking, and computational photography; video analysis and action recognition; shape reconstruction and optimization; shape from X and photometry; applications of computer vision; low-level vision and applications of computer vision.

Table of Contents

Frontmatter

Oral Session 6: Optical Flow and Tracking

Adaptive Integration of Feature Matches into Variational Optical Flow Methods

Despite the significant progress in terms of accuracy achieved by recent variational optical flow methods, the correct handling of large displacements still poses a severe problem for many algorithms. In particular if the motion exceeds the size of an object, standard coarse-to-fine estimation schemes fail to produce meaningful results. While the integration of point correspondences may help to overcome this limitation, such strategies often deteriorate the performance for small displacements due to false or ambiguous matches. In this paper we address the aforementioned problem by proposing an adaptive integration strategy for feature matches. The key idea of our approach is to use the matching energy of the baseline method to carefully select those locations where feature matches may potentially improve the estimation. This adaptive selection not only reduces the runtime compared to an exhaustive search, it also improves the reliability of the estimation by identifying unnecessary and unreliable features and thus excluding spurious matches. Results for the Middlebury benchmark and several other image sequences demonstrate that our approach succeeds in handling large displacements in such a way that the performance for small displacements is not compromised. Moreover, experiments even indicate that image sequences with small displacements can benefit from carefully selected point correspondences.

Michael Stoll, Sebastian Volz, Andrés Bruhn

Efficient Learning of Linear Predictors Using Dimensionality Reduction

Using Linear Predictors for template tracking enables fast and reliable real-time processing. However, not being able to learn new templates online limits their use in applications where the scene is not known a priori and multiple templates have to be added online, such as SLAM or SfM. This especially holds for applications running on low-end hardware such as mobile devices. Previous approaches either had to learn Linear Predictors offline [1], or start with a small template and iteratively grow it over time [2]. We propose a fast and simple learning procedure which reduces the necessary training time by up to two orders of magnitude while also slightly improving the tracking robustness with respect to large motions and image noise. This is illustrated in an exhaustive evaluation where we compare our approach with state-of-the-art approaches. Additionally, we show the learning and tracking in mobile phone applications which demonstrates the efficiency of the proposed approach.

Stefan Holzer, Slobodan Ilic, David Joseph Tan, Nassir Navab

Robust Visual Tracking Using Dynamic Classifier Selection with Sparse Representation of Label Noise

Recently, a category of tracking methods based on “tracking-by-detection” has been widely used for the visual tracking problem. Most of these methods update the classifier online using the samples generated by the tracker to handle appearance changes. However, the self-updating scheme makes these methods suffer from the drifting problem because of incorrect labels of weak classifiers in the training samples. In this paper, we split the class labels into true labels and noise labels and model them by sparse representation. A novel dynamic classifier selection method, robust to noisy training data, is proposed. Moreover, we apply the proposed classifier selection algorithm to visual tracking by integrating a part-based online boosting framework. We have evaluated our proposed method on 12 challenging sequences involving severe occlusions, significant illumination changes and large pose variations. Both the qualitative and quantitative evaluations demonstrate that our approach tracks objects accurately and robustly and outperforms state-of-the-art trackers.

Yuefeng Chen, Qing Wang

Poster Session 6: Motion, Tracking, and Computational Photography

Dynamic Objectness for Adaptive Tracking

A fundamental problem of object tracking is to adapt to unseen views of the object while not getting distracted by other objects. We introduce Dynamic Objectness in a discriminative tracking framework to sporadically re-discover the tracked object based on motion. In doing so, drifting is effectively limited since tracking becomes more aware of objects as independently moving entities in the scene. The approach not only follows the object, but also the background, so as not to easily adapt to other distracting objects. Finally, an appearance model of the object is incrementally built for an eventual re-detection after a partial or full occlusion. We evaluate the method on several well-known tracking sequences and demonstrate results with superior accuracy, especially in difficult sequences with changing aspect ratios, varying scale, partial occlusion and non-rigid objects.

Severin Stalder, Helmut Grabner, Luc Van Gool

Visual Tracking in Continuous Appearance Space via Sparse Coding

Particle Filter is the most widely used framework for object tracking. Despite its advantages in handling complex cases, the discretization of the object appearance space makes it difficult to search the solution efficiently, and the number of particles is also greatly limited in consideration of computational cost, especially for some time-consuming object representations, e.g. sparse representation. In this paper, we propose a novel tracking method in which the appearance space is relaxed to be continuous; the solution can then be searched efficiently via iterative sparse coding. Like the particle filter, our method can be combined with many generic tracking methods; typically, we adopt the ℓ1 tracker, and demonstrate that with our method both its efficiency and accuracy can be improved in comparison to the version based on the particle filter. Another advantage of our method is that it can handle dynamic change of object appearance by adaptively updating the object template model using the learned dictionary, and at the same time can avoid drifting by using the representation error for supervision. Our method thus performs more robustly than previous methods in dynamic scenes with gradual changes. Both qualitative and quantitative evaluations demonstrate the efficiency and robustness of the proposed method.

Guofeng Wang, Fan Zhong, Yue Liu, Qunsheng Peng, Xueying Qin
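As a concrete reference for the ℓ1 tracker adopted above, the sketch below scores a candidate patch by its sparse reconstruction error over a dictionary of target templates. This is a minimal illustration of sparse-representation tracking in general, not the authors' continuous-appearance-space method; the template matrix, the LASSO solver choice, and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coding_score(candidate, templates, alpha=0.01):
    """Score a candidate patch by its sparse reconstruction error over a
    dictionary of target templates (one vectorized template per column)."""
    lasso = Lasso(alpha=alpha, positive=True, max_iter=1000)
    lasso.fit(templates, candidate)            # LASSO: least squares + l1 penalty
    residual = candidate - templates @ lasso.coef_
    return float(np.linalg.norm(residual))     # lower error -> better candidate

# toy usage: ten vectorized 16x16 templates, one noisy candidate
rng = np.random.default_rng(0)
templates = rng.random((256, 10))
candidate = templates[:, 3] + 0.05 * rng.standard_normal(256)
print(sparse_coding_score(candidate, templates))
```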

Robust Object Tracking in Crowd Dynamic Scenes Using Explicit Stereo Depth

In this paper, we exploit robust depth information together with a simple color-shape appearance model for single object tracking in crowd dynamic scenes. Since binocular video streams are captured from a moving camera rig, background subtraction cannot provide a reliable enhancement of the region of interest. Our main contribution is a novel tracking strategy that employs explicit stereo depth to track and segment the object in crowd dynamic scenes with occlusion handling. Appearance cues including color and shape play a secondary role, further extracting the foreground acquired by the preceding depth-based segmentation. The proposed depth-driven tracking approach can largely alleviate the drifting issue, especially when the object frequently interacts with similar background in long sequence tracking. The problems caused by rapid object appearance change can also be avoided due to the stability of the depth cue. Furthermore, we propose a new, yet simple and effective depth-based scheme to cope with complete occlusion in tracking. From experiments on a large collection of challenging outdoor and indoor sequences, our algorithm demonstrates accurate and reliable tracking performance which outperforms other state-of-the-art competing algorithms.

Chi Li, Le Lu, Gregory D. Hager, Jianyu Tang, Hanzi Wang

Structured Visual Tracking with Dynamic Graph

Structure information has been increasingly incorporated into the computer vision field, whereas only a few tracking methods have employed the inner structure of the target. In this paper, we introduce a dynamic graph with the pairwise Markov property to model the structure information between the inner parts of the target. Target tracking is viewed as tracking a dynamic undirected graph whose nodes are the target parts and whose edges are the interactions between parts. The target parts within the graph that await matching are separated from the background with graph cut, and a spectral matching technique is exploited to accomplish the graph tracking. With the help of an intuitive updating mechanism, our dynamic graph can robustly adapt to variations of the target structure. Experimental results demonstrate that our structured tracker outperforms several state-of-the-art trackers under occlusion and structure deformations.

Zhaowei Cai, Longyin Wen, Jianwei Yang, Zhen Lei, Stan Z. Li

Online Multi-target Tracking by Large Margin Structured Learning

We present an online data association algorithm for multi-object tracking using structured prediction. This problem is formulated as a bipartite matching and solved by a generalized classification, specifically, Structural Support Vector Machines (S-SVM). Our structural classifier is trained based on matching results given the similarities between all pairs of objects identified in two consecutive frames, where the similarity can be defined by various features such as appearance, location, motion, etc. With an appropriate joint feature map and loss function in the S-SVM, finding the most violated constraint in training and predicting structured labels in testing are modeled by the simple and efficient Kuhn-Munkres (Hungarian) algorithm in a bipartite graph. The proposed structural classifier can be generalized effectively for many sequences without re-training. Our algorithm also provides a method to handle entering/leaving objects, short-term occlusions, and misdetections by introducing virtual agents—additional nodes in a bipartite graph. We tested our algorithm on multiple datasets and obtained comparable results to the state-of-the-art methods with great efficiency and simplicity.

Suna Kim, Suha Kwak, Jan Feyereisl, Bohyung Han
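The bipartite matching step described above can be reproduced with the standard Kuhn-Munkres solver in SciPy. The sketch below is a minimal version of that association step under assumed inputs: a precomputed dissimilarity matrix and a flat `miss_cost` standing in for the paper's virtual agents; the learned S-SVM similarities are not modeled here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost, miss_cost=1.0):
    """Match objects in frame t to detections in frame t+1 by min-cost
    bipartite matching. The matrix is padded with `miss_cost` rows and
    columns so objects/detections may stay unmatched, mimicking the role
    of the paper's virtual agents (entering/leaving, misdetections)."""
    n, m = cost.shape
    padded = np.full((n + m, n + m), miss_cost)
    padded[:n, :m] = cost
    rows, cols = linear_sum_assignment(padded)   # Kuhn-Munkres (Hungarian)
    return [(r, c) for r, c in zip(rows, cols) if r < n and c < m]

cost = np.array([[0.1, 0.9, 5.0],
                 [0.8, 0.2, 5.0]])
print(associate(cost))   # [(0, 0), (1, 1)]; detection 2 enters unmatched
```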

An Anchor Patch Based Optimization Framework for Reducing Optical Flow Drift in Long Image Sequences

Tracking through long image sequences is a fundamental research issue in computer vision. This task relies on estimating correspondences between image pairs over time, where error accumulation in tracking can result in drift. In this paper, we propose an optimization framework that utilises a novel Anchor Patch algorithm which significantly reduces overall tracking errors given long sequences containing highly deformable objects. The framework may be applied to any tracking algorithm that calculates dense correspondences between images, e.g. optical flow. We demonstrate the success of our approach by showing significant tracking error reduction using 6 existing optical flow algorithms applied to a range of benchmark ground truth sequences. We also provide quantitative analysis of our approach given synthetic occlusions and image noise.

Wenbin Li, Darren Cosker, Matthew Brown

One-Class Multiple Instance Learning and Applications to Target Tracking

Existing work in the field of Multiple Instance Learning (MIL) has only looked at the standard two-class problem, assuming both positive and negative bags are available. In this work, we propose the first analysis of the one-class version of the MIL problem, where one is only provided input data in the form of positive bags. We also propose an SVM-based formulation to solve this problem setting. To make the approach computationally tractable we further develop an iterative heuristic algorithm using instance priors. We demonstrate the validity of our approach with synthetic data and compare it with the two-class approach. While previous work in target tracking using MIL has made certain run-time assumptions (such as motion) to address the problem, we generalize the approach and demonstrate the applicability of our work to this problem domain. We develop a scene prior modeling technique to obtain foreground-background priors to aid our one-class MIL algorithm and demonstrate its performance on standard tracking sequences.

Karthik Sankaranarayanan, James W. Davis

Dense Scene Flow Based on Depth and Multi-channel Bilateral Filter

There is a close relationship between depth information and scene flow; however, it is not fully utilized in most scene flow estimators. In this paper, we propose a method to estimate scene flow from monocular appearance images and corresponding depth images. We combine a global energy optimization and a bilateral filter into a two-step framework. Occluded pixels are detected by the consistency of appearance and depth, and the corresponding data errors are excluded from the energy function. The appearance and depth information are also utilized in anisotropic regularization to suppress over-smoothing. A multi-channel bilateral filter is introduced to correct the scene flow using various information in non-local areas. The proposed approach is tested on the Middlebury dataset and on sequences captured by KINECT. Experimental results show that it can estimate dense and accurate scene flow in challenging environments and keep the discontinuity around motion boundaries.

Xiaowei Zhang, Dapeng Chen, Zejian Yuan, Nanning Zheng
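To make the filtering step concrete, here is a minimal joint (multi-channel) bilateral filter that smooths a flow field while respecting edges in several guidance channels at once, e.g. appearance and depth. It is a simplified sketch, not the paper's exact formulation: the Gaussian spatial/range weights, the shared `sigma_r`, and the unfiltered border are all illustrative choices.

```python
import numpy as np

def multichannel_bilateral(flow, guides, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Joint bilateral filtering of a flow field (H, W, 2). The range weight
    multiplies one Gaussian term per guidance channel, so flow is averaged
    only across pixels that look alike in every channel."""
    H, W, _ = flow.shape
    out = np.zeros_like(flow)                 # border left unfiltered for brevity
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma_s ** 2))
    for y in range(radius, H - radius):
        for x in range(radius, W - radius):
            w = spatial.copy()
            for g in guides:                  # one range term per channel
                patch = g[y - radius:y + radius + 1, x - radius:x + radius + 1]
                w = w * np.exp(-(patch - g[y, x]) ** 2 / (2 * sigma_r ** 2))
            w /= w.sum()
            patch_f = flow[y - radius:y + radius + 1, x - radius:x + radius + 1]
            out[y, x] = (w[..., None] * patch_f).sum(axis=(0, 1))
    return out

flow = np.random.default_rng(5).standard_normal((32, 32, 2))
appearance = np.zeros((32, 32))
appearance[:, 16:] = 1.0                      # two homogeneous regions
depth = appearance.copy()
print(multichannel_bilateral(flow, [appearance, depth])[16, 16])
```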

Object Tracking within the Framework of Concept Drift

It is well known that the backgrounds or the targets always change in real scenes, which weakens the effectiveness of classical tracking algorithms because of frequent model mismatches. In this paper, an object tracking algorithm within the framework of concept drift is proposed to solve this problem. We detect the drift points using a simple message-passing algorithm based on a Bayesian approach. The analyzed probability distribution lays the foundation for the self-adaptation of our new model. Our tracking algorithm within the framework of concept drift improves tracking robustness and accuracy, as illustrated by experiments on two real-world changing scenes.

Li Chen, Yue Zhou, Jie Yang

Multiple Target Tracking Using Frame Triplets

This paper addresses the problem of multi-frame, multi-target video tracking. Unlike recent approaches that use only unary and pairwise costs, we propose a solution based on three-frame tracklets to leverage constant-velocity motion constraints while keeping computation time low. Tracklets are solved for within a sliding window of frame triplets, each having a two frame overlap with neighboring triplets. Any inconsistencies in these local tracklet solutions are resolved by considering a larger temporal window, and the remaining tracklets are then merged globally using a min-cost network flow formulation. The result is a set of high-quality trajectories capable of spanning gaps caused by missed detections and long-term occlusions. Our experimental results show good performance in complex scenes.

Asad A. Butt, Robert T. Collins
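The constant-velocity constraint that the tracklets exploit can be written as a simple per-triplet cost: under constant velocity, the middle detection should be the midpoint of its two neighbours. The sketch below scores candidate triplets that way; the exact cost, the brute-force enumeration and the `max_cost` gate are illustrative assumptions, not the paper's full network-flow formulation.

```python
import numpy as np

def triplet_cost(p1, p2, p3):
    """Deviation of three detections from constant-velocity motion:
    under constant velocity, p2 should be the midpoint of p1 and p3."""
    return np.linalg.norm((np.asarray(p1) + np.asarray(p3)) / 2.0 - np.asarray(p2))

def best_triplets(frame1, frame2, frame3, max_cost=2.0):
    """Enumerate all detection triplets across three consecutive frames
    and keep those consistent with near-constant velocity."""
    triplets = []
    for i, p1 in enumerate(frame1):
        for j, p2 in enumerate(frame2):
            for k, p3 in enumerate(frame3):
                c = triplet_cost(p1, p2, p3)
                if c <= max_cost:
                    triplets.append((c, (i, j, k)))
    return sorted(triplets)

f1, f2, f3 = [(0, 0)], [(1.0, 0.1)], [(2.0, 0.0), (5.0, 5.0)]
print(best_triplets(f1, f2, f3))   # only the near-linear triplet survives
```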

Spatio-Temporal Clustering Model for Multi-object Tracking through Occlusions

Occlusion in dynamic or cluttered scenes is a critical issue in multi-object tracking. By formulating this problem with latent variables, some methods have achieved state-of-the-art performance, while making an exact solution computationally intractable. In this paper, we present a hierarchical association framework to address the problem of occlusion in a complex scene taken by a single camera. At the first stage, reliable tracklets are obtained by frame-to-frame association of detection responses in a flow network. After that, we propose to formulate the tracklet association problem as a spatio-temporal clustering model which presents the problem as faithfully as possible. Due to the important role that the affinity model plays in our formulation, we then construct a sparsity-induced affinity model under the assumption that a detection sample in a tracklet can be efficiently represented by another tracklet belonging to the same object. Furthermore, we give a near-optimal algorithm based on a globally greedy strategy to deal with the spatio-temporal clustering, which runs linearly in the number of tracklets. We quantitatively evaluate the performance of our method on three challenging data sets and achieve a significant improvement compared to state-of-the-art tracking systems.

Lei Zhang, Qing Wang

Robust Object Tracking Using Constellation Model with Superpixel

Tracking objects under occlusion or non-rigid deformation poses a major problem: appearance variation of the target makes existing bounding-rectangle based representations vulnerable to background noise imported during adaptive appearance updates. We address the object tracking problem by exploring superpixel-based visual information around the target. Instead of representing each object with a single holistic appearance model, we propose to track each target with multiple related parts and model the tracking system as a Dynamic Bayesian Network (DBN). Based on visual features from superpixels, we propose a constellation appearance model with multiple parts which is adaptable to appearance variations. A particle-based approximate inference algorithm over the DBN is proposed for tracking. Experimental results show that the proposed algorithm performs favorably against existing object trackers, especially during deformation and occlusion.

Weijun Wang, Ramakant Nevatia

Robust Registration-Based Tracking by Sparse Representation with Model Update

Object tracking by image registration based on the Lucas-Kanade method has been studied for decades. The classical method is known to be sensitive to illumination changes, pose variation and occlusion. A great number of papers have been presented to address this problem. Despite great advances achieved thus far, robust registration-based tracking in challenging conditions remains unsolved. This paper presents a novel method which extends Lucas-Kanade using sparse representation. Our objective function involves joint optimization of the warp function and the optimal linear combination of the test image with a set of basis vectors in a dictionary. The objective function is regularized by the ℓ1 norm of the linear combination coefficients. It is a non-linear and non-convex problem, and we minimize it by alternating between the warp function and the coefficients. We thus achieve an efficient algorithm which iteratively solves the LASSO and classical Lucas-Kanade by optimizing one while keeping the other fixed. Unlike existing sparsity-based work that uses exemplar templates as the object model, we explore the low-dimensional linear subspace of the object appearances for object representation. For adaptation to dynamic scenarios, the mean vector and basis vectors of the appearance subspace are updated online by incremental SVD. Experiments demonstrate the promising performance of the proposed method in challenging image sequences.

Peihua Li, Qilong Wang

Robust and Efficient Pose Estimation from Line Correspondences

We propose a non-iterative solution for the Perspective-n-Line (PnL) problem, which can efficiently and accurately estimate the camera pose for both small and large numbers of line correspondences. By selecting a rotation axis in the camera framework, the reference lines are divided into triplets to form a sixteenth-order cost function, and the optimum is then retrieved from the roots of the derivative of the cost function by evaluating the orthogonal errors and the reprojection errors of the local minima. The final pose estimate is normalized by a 3D alignment approach. The advantages of the proposed method are as follows: (1) it stably retrieves the optimum of the solution with very little computational complexity and high accuracy; (2) small line sets can be robustly handled to achieve highly accurate results; and (3) large line sets can be efficiently handled because the method is O(n).

Lilian Zhang, Chi Xu, Kok-Meng Lee, Reinhard Koch

Nonlocal Spectral Prior Model for Low-Level Vision

Image nonlocal self-similarity has been widely adopted as a natural image prior in various low-level vision tasks such as image restoration, and low-rank matrix recovery theory has been drawing much attention as a way to describe and utilize image nonlocal self-similarities. However, it is not yet clear whether low-rank prior models exist that characterize the nonlocal self-similarity of a wide range of natural images. In this paper we investigate this issue by evaluating the heavy-tailed distributions of singular values of the matrices of nonlocal similar patches collected from natural images. A novel image prior model, namely the nonlocal spectral prior (NSP) model, is then proposed to characterize the singular values of nonlocal similar patches. We consequently apply the NSP model to typical image restoration tasks, including denoising, super-resolution and deblurring, and the experimental results demonstrate the highly competitive performance of NSP in solving these low-level vision problems.

Shenlong Wang, Lei Zhang, Yan Liang

Simultaneous Multiple Rotation Averaging Using Lagrangian Duality

Multiple rotation averaging is an important problem in computer vision. The problem is challenging because of the nonlinear constraints required to represent the set of rotations. To our knowledge no one has proposed any globally optimal solution for the case of simultaneous updates of the rotations. In this paper we propose a simple procedure based on Lagrangian duality that can be used to verify global optimality of a local solution, by solving a linear system of equations. We show experimentally on real and synthetic data that unless the noise levels are extremely high this procedure always generates the globally optimal solution.

Johan Fredriksson, Carl Olsson
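For context on the rotation-averaging objective itself, the sketch below computes the L2-chordal mean of a set of rotations: average the matrices and project back onto SO(3) with an SVD. This is the basic single-rotation averaging building block, not the paper's Lagrangian-duality optimality certificate for simultaneous multiple rotation averaging; the helper names and the toy z-axis rotations are illustrative.

```python
import numpy as np

def chordal_mean(rotations):
    """L2-chordal mean of 3x3 rotation matrices: the arithmetic mean
    projected back onto SO(3) via SVD (Frobenius-nearest rotation)."""
    M = np.mean(rotations, axis=0)
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:                  # enforce det(R) = +1
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    return R

def rot_z(theta):
    """Rotation about the z-axis by angle theta (toy test data)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

noisy = [rot_z(0.5 + e) for e in (-0.05, 0.0, 0.04)]
print(chordal_mean(noisy))   # close to rot_z(0.5)
```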

Observation-Driven Adaptive Differential Evolution for Robust Bronchoscope 3-D Motion Tracking

This paper proposes an observation-driven adaptive differential evolution (OADE) algorithm for accurate and robust bronchoscope 3-dimensional (3-D) motion tracking during electromagnetically navigated bronchoscopy. Two advantages distinguish our framework from other adaptive differential evolution methods: (1) current observation information, including the sensor measurement and the video image, is used in the mutation equation and the selection function, respectively, and (2) the mutation factors and crossover rate are adaptively determined in terms of current image information. Experimental results demonstrate our OADE method to be an effective and promising tracking scheme. Our approach can reduce the tracking position error from 3.9 to 2.8 mm, and the position smoothness from 4.2 to 1.4 mm.

Xiongbiao Luo, Kensaku Mori

Tracking Growing Axons by Particle Filtering in 3D + t Fluorescent Two-Photon Microscopy Images

Analyzing the behavior of axons in the developing nervous systems is essential for biologists to understand the biological mechanisms underlying how growing axons reach their target cells. The analysis of the motion patterns of growing axons requires detecting axonal tips and tracking their trajectories within complex and large data sets. When performed manually, the tracking task is arduous and time-consuming. To this end, we propose a tracking method, based on the particle filtering technique, to follow the traces of axonal tips that appear as small bright spots in 3D + t fluorescent two-photon microscopy images exhibiting low signal-to-noise ratios (SNR) and complex background. The proposed tracking method uses multiple dynamic models in the proposal distribution to predict the positions of the growing axons. Furthermore, it incorporates object appearance, motion characteristics of the growing axons, and filament information in the computation of the observation model. The integration of these three sources prevents the tracker from being distracted by other objects that have appearances similar to the tracked objects, resulting in improved accuracy of recovered trajectories. The experimental results obtained from the microscopy images show that the proposed method can successfully estimate trajectories of growing axons, demonstrating its effectiveness even under the presence of noise and complex background.

Huei-Fang Yang, Xavier Descombes, Charles Kervrann, Caroline Medioni, Florence Besse
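As background for the particle-filtering machinery, here is a generic bootstrap particle filter tracking a bright blob in a 2D image. It is deliberately much simpler than the paper's tracker: a single random-walk dynamic model replaces their multiple proposal models, and raw image intensity replaces their appearance/motion/filament observation model. All parameters are illustrative.

```python
import numpy as np

def particle_filter_step(particles, weights, image, motion_std=2.0):
    """One bootstrap particle-filter step for tracking a bright spot:
    predict with a random-walk motion model, reweight by the image
    intensity at each particle, then resample."""
    # predict: random walk (the paper uses multiple dynamic models instead)
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    particles[:, 0] = np.clip(particles[:, 0], 0, image.shape[0] - 1)
    particles[:, 1] = np.clip(particles[:, 1], 0, image.shape[1] - 1)
    # update: likelihood ~ local brightness (stand-in observation model)
    idx = particles.astype(int)
    weights = weights * (image[idx[:, 0], idx[:, 1]] + 1e-9)
    weights /= weights.sum()
    # resample to fight weight degeneracy
    chosen = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[chosen], np.full(len(particles), 1.0 / len(particles))

# toy image: a Gaussian blob plays the role of a fluorescent axonal tip
yy, xx = np.mgrid[0:64, 0:64]
image = np.exp(-((yy - 40.0) ** 2 + (xx - 30.0) ** 2) / (2 * 5.0 ** 2))
particles = np.random.uniform(0, 64, (500, 2))
weights = np.full(500, 1.0 / 500)
for _ in range(10):
    particles, weights = particle_filter_step(particles, weights, image)
print(particles.mean(axis=0))   # converges near the blob centre (40, 30)
```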

Image Upscaling Using Multiple Dictionaries of Natural Image Patches

We propose a new high-quality upscaling technique that extends the existing example-based super-resolution (SR) framework. Our approach is based on the fundamental idea that a low-resolution (LR) image could be generated from any of multiple possible high-resolution (HR) images. Therefore it is more natural to use multiple predictors of the HR patch from an LR patch instead of a single one. In this work we build a generic framework to estimate an HR image from an LR one using an adaptive prior (selecting the predictor locally) based on the local statistics of LR images. We use a natural image patch prior as the HR image statistics. We partition the natural images into documents and group them to discover the inherent topics using probabilistic Latent Semantic Analysis (pLSA), and also learn dual dictionaries of HR and LR image patch pairs for each of the topics using a sparse dictionary learning technique. Then, for a test image, we infer locally which topic it corresponds to and use the corresponding learned dual dictionary to generate the HR image. Experimental results show the effectiveness of our method over existing state-of-the-art methods.

Pulak Purkait, Bhabatosh Chanda

A Biologically Motivated Double-Opponency Approach to Illumination Invariance

In this paper we propose a biologically inspired computational model based upon the human visual pathway in order to achieve a feature pair that is robust to changes in scene illumination. Here, we draw inspiration from the V4 area in the visual cortex and utilise an approach based upon both the colour opponency and the spatially opponent centre-surround receptive field mechanisms present in the human visual system. We do this by making use of an optimisation setting which yields the optimal synaptic strength of the centre-surround neurons based on the colour discrimination for the double-opponent feature pair. This approach greatly reduces the effects of the illuminant in terms of discrimination of perceptually similar colours. We illustrate the utility of our approach for purposes of recognising perceptually similar colours, colour-based object recognition and skin detection under widely varying illumination conditions using benchmark data sets. We also compare our results to those yielded by a number of alternatives.

Sivalogeswaran Ratnasingam, Antonio Robles-Kelly

Measuring Linearity of Closed Curves and Connected Compound Curves

In this paper we define a new linearity measure for closed curves. We start with simple closed curves which represent the boundaries of bounded planar regions. It turns out that the method can be extended to closed curves which self-intersect and also to certain configurations consisting of several curves, including open curve segments. In all cases, the measured linearities range over the interval (0,1], and do not change under translation, rotation and scaling transformations of the considered curve. In addition, the highest possible linearity (which is 1) is reached if and only if the measured curve consists of two overlapping (i.e. coincident) straight line segments. The new linearity measure is theoretically well founded and all related statements are supported with rigorous mathematical proofs.

Paul L. Rosin, Jovanka Pantović, Joviša Žunić

Patch Mosaic for Fast Motion Deblurring

This paper proposes using a mosaic of image patches composed of the most informative edges found in the original blurry image for the purpose of estimating a motion blur kernel with minimum computational cost. To select these patches we develop a new image analysis tool to efficiently locate informative patches, which we call the informative-edge map. The combination of patch mosaic and informative patch selection enables a new motion blur kernel estimation algorithm to recover blur kernels far more quickly and accurately than existing state-of-the-art methods. We also show that patch mosaic can form a framework for reducing the computation time of other motion deblurring algorithms with minimal modification. Experimental results with various test images show our algorithm to be 5-100 times faster than previously published blind motion deblurring algorithms while achieving equal or better estimation accuracy.

Hyeoungho Bae, Charless C. Fowlkes, Pai H. Chou

Single-Image Blind Deblurring for Non-uniform Camera-Shake Blur

In this paper we address the problem of estimating the latent sharp image and unknown blur kernel from a single motion-blurred image. The blur results from camera shake and is spatially variant, while the motion blur kernel has three degrees of freedom, i.e., translations and in-plane rotation. To solve this problem, we first analyze the homography blur model for non-uniform camera-shake blur. We simplify the model to 3-dimensional camera motion, which can be accelerated by exploiting the fast Fourier transform in the subsequent image deconvolution. We then propose an effective method to handle the blind image-deblurring problem by image decomposition, which does not need to segment the image into local subregions under the assumption of spatially invariant blur. Experimental results on both synthetic and real blurred images show that the presented approach can successfully remove various kinds of blur.

Yuquan Xu, Lu Wang, Xiyuan Hu, Silong Peng

Image Super-Resolution Using Local Learnable Kernel Regression

In this paper, we address the problem of learning-based image super-resolution and propose a novel approach called Local Learnable Kernel Regression (LLKR). The proposed model employs a local metric learning method to improve kernel regression for reconstructing high resolution images. We formulate the learning problem as seeking multiple optimal Mahalanobis metrics to minimize the total kernel regression error on the training images. By learning local metrics in the space of low resolution image patches, our method is capable of building a precise data-adaptive kernel regression model in the space of high resolution patches. Since the local metrics split the whole data set into several subspaces and the training process can be executed off-line, our method is very efficient at runtime. We demonstrate that the newly developed method is comparable to or even outperforms other super-resolution algorithms on benchmark test images. The experimental results also show that our algorithm can still achieve good performance even with a large magnification factor.

Renjie Liao, Zengchang Qin
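To make the regression step concrete, the sketch below runs Nadaraya-Watson kernel regression with a Gaussian kernel under a Mahalanobis metric M. LLKR learns a separate metric per local subspace of LR patches; here a single, fixed M (the identity in the toy example) is assumed for illustration, and the feature/target shapes are toy stand-ins for LR patch features and HR targets.

```python
import numpy as np

def mahalanobis_kernel_regression(x, X, Y, M, h=1.0):
    """Nadaraya-Watson regression with a Gaussian kernel under a
    Mahalanobis metric M: predict y(x) as the weighted mean of training
    outputs Y, weights decaying with d_M(x, x_i)^2 = (x-x_i)^T M (x-x_i)."""
    diff = X - x                                    # (n, d)
    d2 = np.einsum('nd,de,ne->n', diff, M, diff)    # squared Mahalanobis distances
    w = np.exp(-d2 / (2 * h ** 2))
    return (w @ Y) / w.sum()

rng = np.random.default_rng(1)
X = rng.random((200, 2))                       # toy LR patch features
Y = X @ np.array([2.0, -1.0]) + 0.01 * rng.standard_normal(200)
M = np.eye(2)                                  # LLKR would learn M per subspace
print(mahalanobis_kernel_regression(np.array([0.5, 0.5]), X, Y, M))  # ~0.5
```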

MRF-Based Blind Image Deconvolution

This paper proposes an optimization-based blind image deconvolution method. The proposed method relies on imposing a discrete MRF prior on the deconvolved image. The use of such a prior leads to a very efficient and powerful deconvolution algorithm that carefully combines advanced optimization techniques. We demonstrate the extreme effectiveness of our method by applying it on a wide variety of very challenging cases that involve the inference of large and complicated blur kernels.

Nikos Komodakis, Nikos Paragios

Efficient Image Appearance Description Using Dense Sampling Based Local Binary Patterns

This work presents a novel image appearance description method based on the highly popular local binary pattern (LBP) texture features. The key idea consists of introducing a dense sampling encoding strategy for extracting more stable and discriminative texture patterns in local regions. Compared to the conventional sparse sampling scheme commonly used in basic LBP, our proposed dense sampling aims to generate, through a form of up-sampling, more neighboring pixels so that more stable LBP codes, carrying richer information, are computed. This yields a significantly enhanced image description which is less prone to noise and to sparse and unstable histograms. Another interesting property of the dense sampling scheme is that it can be easily integrated with many existing LBP variants. Extensive experiments on three different classification problems, namely face recognition, texture classification and age group estimation, on various challenging benchmark databases clearly demonstrate the efficiency of the proposed scheme, showing very promising results compared not only to the original LBP but also to the state-of-the-art, especially in the very demanding task of human age estimation.

Juha Ylioinas, Abdenour Hadid, Yimo Guo, Matti Pietikäinen
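For reference, the sketch below implements the basic 8-neighbour LBP code and a crude dense-sampling flavour that first up-samples the image so each original pixel contributes several interpolated neighbours before the codes are histogrammed. The bilinear `zoom` stand-in and the histogram normalization are illustrative assumptions, not the paper's exact dense sampling scheme.

```python
import numpy as np
from scipy.ndimage import zoom

def basic_lbp(img):
    """8-neighbour LBP: threshold each pixel's 3x3 neighbourhood at the
    centre value and pack the comparison bits into a code in [0, 255]."""
    img = img.astype(float)
    c = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)
    return code

def dense_lbp_histogram(img, factor=2):
    """Dense-sampling flavour: up-sample first so each original pixel
    gains interpolated neighbours, then histogram the resulting codes."""
    up = zoom(img.astype(float), factor, order=1)   # bilinear up-sampling
    codes = basic_lbp(up)
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()

img = np.random.default_rng(2).integers(0, 256, (32, 32))
print(dense_lbp_histogram(img)[:8])
```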

Navigation toward Non-static Target Object Using Footprint Detection Based Tracking

The destination of a traditional robot navigation task is usually a static location. However, many real-life applications require a robot to continuously identify and find its way toward a non-static target, e.g., following a walking person. In this paper, we present a navigation framework for this task which is based on simultaneous navigation and tracking. It consists of iterations of data acquisition, perception/cognition and motion execution. In the perception/cognition step, visual tracking is introduced to keep track of the target object. This setting is much more challenging than regular tracking tasks, because the target object shows much larger variance in location, shape and size in consecutive images acquired while navigating. A Footprint Detection based Tracker (FD-Tracker) is proposed to robustly track the target object in such scenarios. We first perform object footprint detection in the plan-view map to grasp possible target locations. The information is then fused into a Bayesian tracking framework to prune target candidates. Compared to previous methods, our results demonstrate that using footprints can boost the performance of a visual tracker. Promising experimental results of navigating a robot to various goals in an office environment further prove the robustness of our navigation framework.

Meng Yi, Yinfei Yang, Wenjing Qi, Yu Zhou, Yunfeng Li, Zygmunt Pizlo, Longin Jan Latecki

Single Image Super Resolution Reconstruction in Perturbed Exemplar Sub-space

This paper presents a novel single image super resolution method that reconstructs a super resolution image in an exemplar sub-space. The proposed method first synthesizes LR patches by perturbing the image formation model, and stores them in a dictionary. An SR image is generated by replacing the input image patchwise with the HR patch in the dictionary whose LR patch best matches the input. The abundance of exemplars enables the proposed method to synthesize SR images within the exemplar sub-space. This gives numerous advantages over previous methods, such as robustness against noise. Experiments on document images show the proposed method outperforms previous methods not only in image quality, but also in recognition rate, which is about 30% higher than that of previous methods.

Takashi Shibata, Akihiko Iketani, Shuji Senda
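The patchwise replacement at the core of this kind of exemplar SR can be sketched with an off-the-shelf nearest-neighbour index: match each input LR patch against the LR exemplars and return the paired HR patch. The dictionary contents below are random toy data standing in for the perturbed exemplars, and the single-neighbour lookup is an illustrative simplification.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_index(lr_patches):
    """Index the LR exemplars for fast nearest-neighbour lookup."""
    return NearestNeighbors(n_neighbors=1).fit(lr_patches)

def super_resolve_patches(input_lr_patches, index, hr_patches):
    """Replace every input LR patch by the HR patch paired with its
    nearest LR exemplar in the dictionary."""
    _, idx = index.kneighbors(input_lr_patches)
    return hr_patches[idx[:, 0]]

rng = np.random.default_rng(6)
lr_dict = rng.random((1000, 25))    # 5x5 LR exemplars (vectorized)
hr_dict = rng.random((1000, 100))   # paired 10x10 HR patches
index = build_index(lr_dict)
queries = lr_dict[:3] + 0.01 * rng.standard_normal((3, 25))
print(super_resolve_patches(queries, index, hr_dict).shape)   # (3, 100)
```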

Image Super-Resolution: Use of Self-learning and Gabor Prior

Recent approaches to single image super-resolution (SR) have attempted to exploit self-similarity to avoid the use of multiple images. In this paper, we propose an SR method based on self-learning and a Gabor prior. Given a low resolution (LR) test image I0 and its coarser resolution version I−1, both captured from the same camera, we first estimate the degradation between LR and HR (I1) images by constructing the LR-HR patches from the LR test image, I0. The HR patches are obtained from I0 by searching for similar patches (of I0) of the same size in I−1. A nearest neighbor search is used to find the best LR match, which is then used to obtain the parent HR patch from I0. All such LR-HR patches form self-learned dictionaries. The HR patches that do not find an LR match in I−1 are estimated using self-learned dictionaries constructed from the already found LR-HR patches. A compressive sensing-based method is used to obtain the missing HR patches. The estimated LR-HR pairs are used to obtain the LR image formation model by computing the degradation for each pair. A new prior, called the Gabor Prior, based on the outputs of a Gabor filter bank is proposed that restricts the solution space by imposing the condition of preserving the SR features at different frequencies. The experimental results show the effectiveness of the proposed approach.

Nilay Khatri, Manjunath V. Joshi

Oral Session 7: Video Analysis and Action Recognition

Action Disambiguation Analysis Using Normalized Google-Like Distance Correlogram

Classifying realistic human actions in video remains challenging due to the intra-class variability and inter-class ambiguity of action classes. Recently, Spatial-Temporal Interest Point (STIP) based local features have shown great promise in complex action analysis. However, these methods typically rely on the Bag-of-Words (BoW) algorithm, which can hardly resolve action ambiguity because it ignores the spatial-temporal occurrence relations of visual words. In this paper, we propose a new model to capture this contextual relationship in terms of pairwise feature co-occurrence. Normalized Google-Like Distance (NGLD) is proposed to numerically measure this co-occurrence, owing to its effectiveness in semantic correlation analysis. All pairwise distances compose an NGLD correlogram, and its normalized form is incorporated into the final action representation. It proves to be a much richer descriptor, observably reducing action ambiguity in experiments conducted on the WEIZMANN dataset and the more challenging UCF Sports. Results also demonstrate the proposed model is more effective and robust than BoW under different setups.

Qianru Sun, Hong Liu
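The distance underlying NGLD follows the classic Normalized Google Distance, computed from document (here, clip) frequencies of two words and their co-occurrence. The sketch below implements that standard formula on toy counts; it illustrates the co-occurrence measure NGLD is modeled on, not the paper's full correlogram construction.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance between two (visual) words:
    fx, fy -- number of clips containing word x / word y
    fxy    -- number of clips containing both
    n      -- total number of clips indexed."""
    if fxy == 0:
        return float('inf')                  # the words never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# toy counts: words appear in 90 and 120 of 1000 clips, together in 60
print(ngd(90, 120, 60, 1000))   # ~0.29; small value -> strongly related
```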

Alpha-Flow for Video Matting

This work addresses the problem of video matting, that is, extracting the opacity layer of a foreground object from a video sequence. We introduce the notion of alpha-flow, which corresponds to the flow in the opacity layer. The idea is derived from the process of rotoscoping, where a user-supplied object mask is smoothly interpolated between keyframes while preserving its correspondence with the underlying image. Our key contribution is an algorithm which infers both the opacity masks and the alpha-flow in an efficient and unified manner. We embed our algorithm in an interactive video matting system where the first and last frame of a sequence are given as keyframes, and additional user strokes may be provided in intermediate frames. We show high quality results on various challenging sequences, and give a detailed comparison to competing techniques.

Mikhail Sindeev, Anton Konushin, Carsten Rother

Combinational Subsequence Matching for Human Identification from General Actions

Except for gait analysis in a controlled environment, few have considered the use of motion characteristics for human identification, due to the complexity caused by the spatial nonrigidity and temporal randomness of human action. This work is a new attempt at mining biometric information from more general actions. A novel method for calculating the distance between two time series is proposed, where automatic segmentation and matching are conducted simultaneously. Given a query sequence, our method can efficiently match it against the gallery dataset. Local continuity and global optimality are both considered. The matching algorithm is efficiently solved by Linear Programming (LP). Synthetic data sequences and challenging broadcast sports videos are used to validate the effectiveness of our algorithm. The results show that action-based biometrics are promising for human identification, and the proposed approach is effective for this application.

Maodi Hu, Yunhong Wang, James J. Little

Poster Session 7: Video Analysis and Action Recognition

Iterative Semi-Global Matching for Robust Driver Assistance Systems

Semi-global matching (SGM) is a technique of choice for dense stereo estimation in current industrial driver-assistance systems due to its real-time processing capability and its convincing performance. In this paper we introduce iSGM as a new cost integration concept for semi-global matching. In iSGM, accumulated costs are iteratively evaluated and intermediate disparity results serve as input to generate semi-global distance maps. This novel data structure supports fast analysis of spatial disparity information and allows for reliable search space reduction in consecutive cost accumulation. As a consequence, horizontal costs are stabilized, which improves the robustness of the matching result. We demonstrate the superiority of this iterative integration concept against a standard configuration of semi-global matching and compare our results to current state-of-the-art methods on the KITTI Vision Benchmark Suite.

Simon Hermann, Reinhard Klette

Action Recognition Using Canonical Correlation Kernels

In this paper, we propose the canonical correlation kernel (CCK), which seamlessly integrates the advantages of a lower-dimensional representation of videos with a discriminative classifier like SVM. In the process of defining the kernel, we learn a low-dimensional (linear as well as nonlinear) representation of the video data, which is originally represented as a tensor. We densely compute features at the single (or two) frame level, and avoid any explicit tracking. The tensor representation provides a holistic view of the video data, which is the starting point for computing the CCK. Our kernel is defined in terms of the principal angles between the lower-dimensional representations of the tensor, and captures the similarity of two videos in an efficient manner. We test our approach on four public data sets and demonstrate consistently superior results over state-of-the-art methods, including those that use canonical correlations.

G. Nagendar, Sai Ganesh Bandiatmakuri, Mahesh Goud Tandarpally, C. V. Jawahar
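Since the kernel is defined through principal angles, the sketch below shows the standard way to compute their cosines (the canonical correlations): orthonormalize each representation with a QR factorization and take the singular values of the product of the bases. The product-of-cosines similarity in `cck_similarity` is one common choice, offered as an illustration rather than the paper's exact kernel.

```python
import numpy as np

def principal_angle_cosines(A, B):
    """Cosines of the principal angles between the column spaces of A and B:
    orthonormalize with QR, then the singular values of Qa^T Qb are the
    cosines (canonical correlations)."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def cck_similarity(A, B):
    """One simple kernel value built from the principal angles: the
    product of their cosines (one of several standard choices)."""
    return float(np.prod(principal_angle_cosines(A, B)))

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 5))       # low-dimensional video representations
B = A + 0.1 * rng.standard_normal((100, 5))
print(cck_similarity(A, B))             # close to 1 for similar subspaces
```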

A New Framework for Background Subtraction Using Multiple Cues

In this work, to effectively detect moving objects in a fixed-camera scene, we propose a novel background subtraction framework employing diverse cues: pixel texture, pixel color and region appearance. The texture information of the scene is clustered by the conventional codebook-based background modeling technique and utilized to detect initial foreground regions. In this process, we employ a new texture operator, namely the scene adaptive local binary pattern (SALBP), which provides more consistent and accurate texture-code generation by applying scene-adaptive multiple thresholds. Background statistics of the color cues are also modeled by the codebook scheme and employed to refine the texture-based detection results by integrating color and texture characteristics. Finally, the appearance of each refined foreground blob is verified by measuring the partial directed Hausdorff distance between the shape of the blob boundary and the edge map of the corresponding sub-image region in the input frame. The proposed method is compared with other state-of-the-art background subtraction techniques, and the results demonstrate that our method outperforms the others in complicated environments in video surveillance applications.

SeungJong Noh, Moongu Jeon

Weighted Interaction Force Estimation for Abnormality Detection in Crowd Scenes

In this paper, we propose a weighted interaction force estimation in the social force model (SFM) based framework, in which the properties of surrounding individuals in terms of motion consistency, distance apart, and angle-of-view along moving directions are fully utilized in order to more precisely discriminate normal and abnormal behaviors of a crowd. To avoid the challenges of object tracking in crowded videos, we first perform particle advection to capture the continuity of crowd flow and use these moving particles as individuals for the interaction force estimation. For a more reasonable interaction force estimation, we jointly consider the properties of surrounding individuals, assuming that individuals with consistent motion (as a particle group) and those out of the angle-of-view have no influence on each other, while farther-apart individuals have weaker influence. In particular, particle groups are clustered by a spectral clustering algorithm, in which a novel and highly discriminative gait feature in the frequency domain, combined with spatial and motion features, is used. The estimated interaction forces are mapped to the image span to form a force flow, from which bag-of-words features are extracted. A Sparse Topical Coding (STC) model is used to find abnormal events. Experiments conducted on three datasets demonstrate the promising performance of our work against other related approaches.

Xiaobin Zhu, Jing Liu, Jinqiao Wang, Wei Fu, Hanqing Lu

Egocentric Activity Monitoring and Recovery

This paper presents a novel approach for real-time egocentric activity recognition in which component atomic events are characterised in terms of binary relationships between parts of the body and manipulated objects. The key contribution is to summarise, within a histogram, the relationships that hold over a fixed time interval. This histogram is then classified into one of a number of atomic events. The relationships encode both the types of body parts and objects involved (e.g. wrist, hammer) together with a quantised representation of their distance apart and the normalised rate of change in this distance. The quantisation and classifier are both configured in a prior learning phase from training data. An activity is represented by a Markov model over atomic events. We show the application of the method in the prediction of the next atomic event within a manual procedure (e.g. assembling a simple device) and the detection of deviations from an expected procedure. This could be used for example in training operators in the use or servicing of a piece of equipment, or the assembly of a device from components. We evaluate our approach (‘Bag-of-Relations’) on two datasets: ‘labelling and packaging bottles’ and ‘hammering nails and driving screws’, and show superior performance to existing Bag-of-Features methods that work with histograms derived from image features [1]. Finally, we show that the combination of data from vision and inertial (IMU) sensors outperforms either modality alone.

Ardhendu Behera, David C. Hogg, Anthony G. Cohn

Spatiotemporal Salience via Centre-Surround Comparison of Visual Spacetime Orientations

Early delineation of the most salient portions of a temporal image stream (e.g., a video) could serve to guide subsequent processing to the most important portions of the data at hand. Toward such ends, the present paper documents an algorithm for spatiotemporal salience detection. The algorithm is based on a definition of salient regions as those that differ from their surrounding regions, with the individual regions characterized in terms of 3D, (x, y, t), measurements of visual spacetime orientation. The algorithm has been implemented in software and evaluated empirically on a publicly available database for visual salience detection. The results show that the algorithm outperforms a variety of alternative algorithms and even approaches human performance.

Andrei Zaharescu, Richard Wildes

Temporal-Spatial Refinements for Video Concept Fusion

Context-based concept fusion (CBCF) is increasingly used in video semantic indexing; it uses various relations among different concepts to refine the original detection results. In this paper, we present a CBCF method called the Temporal-Spatial Node Balance algorithm (TSNB). This method is based on a physical model in which the concepts are regarded as nodes and the relations as forces. All the spatial and temporal relations and the moving costs of the nodes are then balanced. This method is intuitive and makes it observable how a concept influences others or is influenced by them. It uses both the spatial and temporal information to describe the semantic structure of the video. We apply the TSNB algorithm to the datasets of TRECVid 2005-2010. The results show that this method outperforms all existing works known to us, and it is also faster.

Jie Geng, Zhenjiang Miao, Hai Chi

Features with Feelings—Incorporating User Preferences in Video Categorization

The rapid growth of video content on the internet has created an immediate need to organize these large databases into meaningful categories. In this paper, we explore the benefits of leveraging social attitudes (beliefs, opinions, interests and evaluations of people) with machine learning concepts (audio/video features) in the challenging and pressing task of organizing online video databases. Through the analysis of view counts, we model social participation (people's choices) towards a video's contents. Observations reveal that viewership patterns are correlated with video genres. We propose logistic growth models to characterize videos based on usage and obtain a probability of video category. We then combine these subjectively assessed priors with the likelihood of video class (as estimated from objective audio/video features) to establish the final category in a Bayesian framework. We provide a comparative analysis of classification accuracies when (a) categories are known a priori and (b) when they are not. Experimentally, we establish an improvement in classification accuracy upon incorporating social attitudes with state-of-the-art audio/video features.

Ramya Srinivasan, Amit K. Roy-Chowdhury

A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition

Bag of visual words (BoVW) models have been widely and successfully used in video based action recognition. One key step in constructing BoVW representation is to encode feature with a codebook. Recently, a number of new encoding methods have been developed to improve the performance of BoVW based object recognition and scene classification, such as soft assignment encoding [1], sparse encoding [2], locality-constrained linear encoding [3] and Fisher kernel encoding [4]. However, their effects for action recognition are still unknown. The main objective of this paper is to evaluate and compare these new encoding methods in the context of video based action recognition. We also analyze and evaluate the combination of encoding methods with different pooling and normalization strategies. We carry out experiments on KTH dataset [5] and HMDB51 dataset [6]. The results show the new encoding methods can significantly improve the recognition accuracy compared with classical VQ. Among them, Fisher kernel encoding and sparse encoding have the best performance. By properly choosing pooling and normalization methods, we achieve the state-of-the-art performance on HMDB51.

Xingxing Wang, LiMin Wang, Yu Qiao
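As a baseline for the encoding comparison, the sketch below implements the classical VQ pipeline the paper measures the newer encodings against: hard-assign each local descriptor to its nearest codeword, sum-pool into a histogram, and L2-normalize. The k-means codebook size, descriptor dimensionality and normalization choice are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def vq_encode(descriptors, codebook):
    """Hard-assignment (VQ) encoding: map each local descriptor to its
    nearest codeword, sum-pool the one-hot codes into a histogram, and
    L2-normalize the result into a BoVW vector."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    hist /= np.linalg.norm(hist) + 1e-12
    return hist

rng = np.random.default_rng(4)
train = rng.standard_normal((5000, 64))           # stand-in local descriptors
codebook = KMeans(n_clusters=32, n_init=4, random_state=0).fit(train).cluster_centers_
video_descriptors = rng.standard_normal((300, 64))
print(vq_encode(video_descriptors, codebook).shape)   # (32,) BoVW vector
```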

Dynamic Saliency Models and Human Attention: A Comparative Study on Videos

Significant progress has been made in terms of computational models of bottom-up visual attention (saliency). However, efficient ways of comparing these models for still images remain an open research question. The problem is even more challenging when dealing with videos and dynamic saliency. The paper proposes a framework for dynamic-saliency model evaluation, based on a new database of diverse videos for which eye-tracking data has been collected. In addition, we present evaluation results obtained for 4 state-of-the-art dynamic-saliency models, two of which have not been verified on eye-tracking data before.

Nicolas Riche, Matei Mancas, Dubravko Culibrk, Vladimir Crnojevic, Bernard Gosselin, Thierry Dutoit

Horror Video Scene Recognition Based on Multi-view Multi-instance Learning

Compared with research on pornographic content filtering on the Web, Web horror content filtering, and especially horror video scene recognition, is still at the exploration stage. Most existing methods identify horror scenes only from independent frames, ignoring the context cues among frames in a video scene. In this paper, we propose a Multi-view Multi-Instance Learning (M2IL) model based on a joint sparse coding technique that simultaneously takes into account the bags of instances from both an independent view and a contextual view, and we apply it to horror scene recognition. Experiments on a horror video dataset collected from the internet demonstrate that our method's performance is superior to that of other existing algorithms.

Xinmiao Ding, Bing Li, Weiming Hu, Weihua Xiong, Zhenchong Wang

Learning Object Appearance from Occlusions Using Structure and Motion Recovery

Visual effect creation as used in movie production often requires structure and motion recovery and video segmentation. Both techniques are essential for integrating virtual objects between scene elements. In this paper, a new method for video segmentation is presented. It incorporates 3D scene information from the structure and motion recovery. By connecting and evaluating discontinued feature tracks, occlusion and reappearance information is obtained during sequential camera and scene estimation.

The foreground is characterized as image regions which temporarily occlude the rigid scene structure. The scene structure is represented by reconstructed object points. Their projections onto the camera images provide the cues for regions classified as foreground or background. The knowledge of occluded parts of a connected feature track is used to feed the object segmentation which crops the foreground image regions automatically.

Two applications are presented: the occlusion of integrated virtual objects and the blurred background effect. Several demonstrations on official and self-made data show very realistic results in augmented reality.

Kai Cordes, Björn Scheuermann, Bodo Rosenhahn, Jörn Ostermann

Exploring the Similarities of Neighboring Spatiotemporal Points for Action Pair Matching

In this paper we present a novel similarity measure between two image sequences that is (a) robust to different viewpoints and recording conditions (illumination variations and clothing), (b) robust to geometric transformations (translation, scale and rotation), and (c) invariant to the number of frames of the image sequence as well as to its time scaling. More precisely, we create a similarity measure that exploits the underlying relationships among neighborhoods of detected spatiotemporal points in a frame of an image sequence. We find the space in which the similarities of neighboring spatiotemporal points lie, and map it to another space of smaller dimensionality. In the new space the projected similarities are of fixed dimensionality, depending on the number of neighbors we have considered. We use the information about that newly extracted space to define a novel similarity measure between two image sequences and in that way create a similarity vector that can be used as input to a classifier. We apply the proposed similarity measure to the ‘action pair matching’ problem, in which we try to decide whether two action image sequences contain the same action or not. Experiments conducted using the Action Similarity Labeling (ASLAN) dataset verify the superiority of the proposed method over state-of-the-art techniques in terms of accuracy.

Irene Kotsia, Ioannis Patras

Sequential Reconstruction Segment-Wise Feature Track and Structure Updating Based on Parallax Paths

This paper presents a novel method for multi-view sequential scene reconstruction scenarios, such as aerial video, that exploits the constraints imposed by the path of a moving camera to allow for a new way of detecting and correcting inaccuracies in the feature tracking and structure computation processes. The main contribution of this paper is to show that for short, planar segments of a continuous camera trajectory, the parallax movement corresponding to a viewed scene point should ideally form a scaled and translated version of this trajectory when projected onto a parallel plane. This yields two constraints, which differ from those of standard factorization, that allow for the detection and correction of inaccurate feature tracks and for the improvement of scene structure. Results are shown for real and synthetic aerial video and turntable sequences, where the proposed method is shown to correct outlier tracks, detect and correct tracking drift, and enable a novel improvement of scene structure, additionally resulting in improved convergence for bundle adjustment optimization.
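
The core constraint is easy to state in code: over a short planar camera segment, a track's parallax path should be a scaled and translated copy of the camera path, so fitting only a scale and a translation and measuring the residual flags inaccurate tracks. The following is a minimal sketch under that reading; the 2D path representation and the use of an RMS residual are assumptions.

```python
# A minimal sketch of the scaled-and-translated path constraint; only a scale
# s and a translation t are fitted (no rotation), as the constraint suggests.
import numpy as np

def fit_scale_translation(cam_path, track_path):
    """Least-squares fit of track_path ~ s * cam_path + t for 2D paths (N,2)."""
    c_mean, p_mean = cam_path.mean(0), track_path.mean(0)
    cc, pc = cam_path - c_mean, track_path - p_mean
    s = (cc * pc).sum() / (cc * cc).sum()
    t = p_mean - s * c_mean
    return s, t

def track_residual(cam_path, track_path):
    """RMS deviation from the ideal parallax path; large values flag
    inaccurate feature tracks or tracking drift."""
    s, t = fit_scale_translation(cam_path, track_path)
    r = track_path - (s * cam_path + t)
    return np.sqrt((r ** 2).sum(axis=1).mean())
```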

Mauricio Hess-Flores, Mark A. Duchaineau, Kenneth I. Joy

Generic Active Appearance Models Revisited

The proposed Active Orientation Models (AOMs) are generative models of facial shape and appearance. Their main differences from the well-known paradigm of Active Appearance Models (AAMs) are that (i) they use a different statistical model of appearance, (ii) they are accompanied by a robust algorithm for model fitting and parameter estimation, and (iii), most importantly, they generalize well to unseen faces and variations. Their main similarity is computational complexity. The project-out version of AOMs is as computationally efficient as the standard project-out inverse compositional algorithm, which is admittedly the fastest algorithm for fitting AAMs. We show that not only does the AOM generalize well to unseen identities, but it also outperforms state-of-the-art algorithms for the same task by a large margin. Finally, we prove our claims by providing Matlab code for reproducing our experiments (http://ibug.doc.ic.ac.uk/resources).
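
For readers unfamiliar with the project-out idea that both AAM and AOM fitting rely on, the sketch below shows the generic project-out inverse compositional update: the appearance subspace is removed from the steepest-descent images once, offline, so every iteration reduces to a small fixed linear solve. This illustrates the standard algorithm, not the AOM-specific appearance model; all names are illustrative.

```python
# A minimal sketch of the project-out step, assuming an orthonormal
# appearance basis A and precomputed steepest-descent images J.
import numpy as np

def project_out(A, M):
    """Remove the span of the orthonormal appearance basis A (pixels x modes)
    from the columns of M (pixels x n)."""
    return M - A @ (A.T @ M)

def precompute(A, J):
    """J: steepest-descent images (pixels x n_shape_params), computed offline."""
    Jp = project_out(A, J)
    H_inv = np.linalg.inv(Jp.T @ Jp)     # small n_shape_params square matrix
    return Jp, H_inv

def param_update(Jp, H_inv, error_image):
    """One inverse-compositional update from the residual image (pixels,)."""
    return H_inv @ (Jp.T @ error_image)
```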

Georgios Tzimiropoulos, Joan Alabort-i-Medina, Stefanos Zafeiriou, Maja Pantic

Tracking Pedestrian with Multi-component Online Deformable Part-Based Model

In this work we present a novel online algorithm to track pedestrians by integrating both a bottom-up and a top-down model of the pedestrian. Motivated by the observation that the appearance of a pedestrian changes considerably across perspectives and poses, the proposed bottom-up model has multiple components to represent distinct groups of pedestrian appearances. Moreover, similar pedestrian appearances share several salient local patterns whose structure is relatively stable. Therefore, each component of the proposed bottom-up model uses an online deformable part-based model (OLDPM) containing one root and several shared parts to represent the flexible structure and salient local patterns of an appearance. We term the bottom-up model a multi-component OLDPM in this paper. We borrow an offline-trained class-specific pedestrian model [19] as the top-down model, which is used to extend the bottom-up model with a new OLDPM when a new appearance cannot be covered by the bottom-up model. The multi-component OLDPM has three advantages compared with other models. First, through an incremental support vector machine (INCSVM) [2] associated with each component, the OLDPM of each component can effectively adapt to the pedestrian appearance variations of a specific perspective and pose. Second, the OLDPM can efficiently generate match-penalty maps of parts preserving the 2bit binary pattern (2bitBP) [10] through a robust real-time pattern matching algorithm [16], and can search over all possible configurations in an image in linear time using the distance transform algorithm [5]. Last but not least, parts can be shared among components to reduce the computational complexity of matching. We compare our method with four state-of-the-art tracking algorithms over seven visual sequences and provide quantitative and qualitative performance comparisons.
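
The linear-time configuration search rests on the generalized distance transform of Felzenszwalb and Huttenlocher (the technique cited as [5]); a minimal sketch of the 1-D transform and its separable 2-D application to a part's match-penalty map follows. The quadratic deformation weight `w` is an illustrative parameter.

```python
# A minimal sketch of the 1-D generalized distance transform and its
# separable 2-D application to a part's match-penalty map.
import numpy as np

def dt1d(f, w=1.0):
    """d[p] = min_q f[q] + w * (p - q)^2, computed in O(n) via the
    lower envelope of parabolas."""
    n = len(f)
    d = np.empty(n)
    v = np.zeros(n, dtype=int)          # locations of parabolas in the envelope
    z = np.empty(n + 1)                 # boundaries between parabolas
    k = 0
    z[0], z[1] = -np.inf, np.inf
    for q in range(1, n):
        s = ((f[q] + w*q*q) - (f[v[k]] + w*v[k]**2)) / (2*w*(q - v[k]))
        while s <= z[k]:
            k -= 1
            s = ((f[q] + w*q*q) - (f[v[k]] + w*v[k]**2)) / (2*w*(q - v[k]))
        k += 1
        v[k], z[k], z[k+1] = q, s, np.inf
    k = 0
    for p in range(n):
        while z[k+1] < p:
            k += 1
        d[p] = f[v[k]] + w*(p - v[k])**2
    return d

def dt2d(penalty, w=1.0):
    """Separable 2-D transform: turns a part's match-penalty map into a
    deformation-aware score map for the configuration search."""
    tmp = np.apply_along_axis(dt1d, 1, penalty, w)
    return np.apply_along_axis(dt1d, 0, tmp, w)
```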

Yi Xie, Mingtao Pei, Zhao Liu, Tianfu Wu

Local Distance Comparison for Multiple-shot People Re-identification

In this paper, we propose a novel approach for multiple-shot people re-identification. To deal with the multimodal properties of the people appearance distribution, we formulate re-identification as a local distance comparison problem and introduce an energy-based loss function that measures the similarity between appearance instances by calculating the distance between corresponding subsets (with the same semantic meaning) in feature space. While the loss function favors short distances, which indicate high similarity between different appearances of people, it penalizes large distances and overlaps between subsets, which reflect low similarity between different appearances. In this way, fast people re-identification can be achieved in a manner that is robust to appearance variation. The performance of our approach has been evaluated on the public benchmark datasets ETHZ and CAVIAR4REID. Experimental results show significant improvements over previously reported results.
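
A hedged sketch of an energy of this flavor is given below: it favors short distances between feature subsets of the same person and penalizes insufficient separation between subsets of different people. The mean pairwise distance and the margin are assumptions for illustration, not the paper's exact formulation.

```python
# A minimal sketch of a pull/push energy over feature subsets; the margin
# and the mean pairwise distance are illustrative assumptions.
import numpy as np

def subset_distance(A, B):
    """Mean pairwise Euclidean distance between two feature subsets (n,d), (m,d)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(-1)).mean()

def energy(same_pairs, diff_pairs, margin=1.0):
    """Low energy = matching appearances are close and non-matching ones are
    separated by at least the margin."""
    pull = sum(subset_distance(A, B) for A, B in same_pairs)
    push = sum(max(0.0, margin - subset_distance(A, B)) for A, B in diff_pairs)
    return pull + push
```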

Guanwen Zhang, Yu Wang, Jien Kato, Takafumi Marutani, Kenji Mase

Non-sequential Multi-view Detection, Localization and Identification of People Using Multi-modal Feature Maps

We present a novel multi-modal fusion framework for non-sequential person detection, localization and identification from multiple views. Our goal is independent processing of randomly-accessed sections of video, either individual frames or small batches thereof. This way, we aim to limit the error propagation that makes the existing approaches unsuitable for fully-autonomous tracking of multiple people in long video sequences. Our framework uses one or more trained classifiers to fuse multiple weak feature maps. We perform experimental validation on a challenging dataset, demonstrating how the framework can, depending on the provided feature maps, be used either only to improve generic person detection, or enable simultaneous detection and recognition of individuals. Finally, we show that tracking-by-identification using the output of the proposed framework outperforms the state-of-the-art identification-by-tracking approach in terms of preserved track identities.
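
As an illustration of classifier-based fusion of weak feature maps, the sketch below stacks per-location map responses into a design matrix and trains a single discriminative model to output a per-location person probability; logistic regression and the grid layout are stand-ins for whatever classifier and maps the framework is actually given.

```python
# A minimal sketch of fusing weak feature maps with one trained classifier;
# logistic regression and the ground-plane grid are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def stack_maps(maps):
    """maps: list of (H, W) feature maps -> (H*W, n_maps) design matrix."""
    return np.stack([m.ravel() for m in maps], axis=1)

def train_fusion(maps_per_frame, occupancy_per_frame):
    """Train on annotated frames: binary ground-truth occupancy per grid cell."""
    X = np.vstack([stack_maps(m) for m in maps_per_frame])
    y = np.concatenate([o.ravel() for o in occupancy_per_frame])
    return LogisticRegression(max_iter=1000).fit(X, y)

def fuse(clf, maps):
    """Fused per-location probability that a person is present."""
    H, W = maps[0].shape
    return clf.predict_proba(stack_maps(maps))[:, 1].reshape(H, W)
```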

Rok Mandeljc, Stanislav Kovačič, Matej Kristan, Janez Perš

Full 6DOF Pose Estimation from Geo-Located Images

Estimating the external calibration, the pose, of a camera with respect to its environment is a fundamental task in Computer Vision (CV). In this paper, we propose a novel method for estimating the unknown 6DOF pose of a camera with known intrinsic parameters from epipolar geometry only. For a set of geo-located reference images, we assume the camera position, but not the orientation, to be known. We estimate the epipolar geometry between the image of the query camera and the individual reference images using image features. Epipolar geometry inherently contains information about the relative positioning of the query camera with respect to each of the reference cameras, giving rise to a set of relative pose estimates. Combining the set of pose estimates and the positions of the reference cameras in a robust manner allows us to estimate a full 6DOF pose for the query camera. We evaluate our algorithm on different datasets of real imagery in indoor and outdoor environments. Since our pose estimation method does not rely on an explicit reconstruction of the scene, our approach offers several significant advantages over existing pose estimation algorithms.
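
One building block of such a combination can be sketched directly: once each epipolar geometry yields a world-frame bearing ray from a known reference position towards the query camera, the query position is the least-squares intersection of those rays. How the rays are obtained and made robust is the paper's contribution and is not shown here; the function below is a generic geometric utility.

```python
# A minimal sketch: least-squares intersection of bearing rays from the
# known reference positions towards the query camera.
import numpy as np

def intersect_rays(origins, directions):
    """origins, directions: (n,3); returns the point minimizing the summed
    squared orthogonal distance to all rays."""
    I = np.eye(3)
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = I - np.outer(d, d)          # projector orthogonal to the ray
        A += P
        b += P @ c
    return np.linalg.solve(A, b)
```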

Clemens Arth, Gerhard Reitmayr, Dieter Schmalstieg

Learning a Quality-Based Ranking for Feature Point Trajectories

Long-term motion analysis poses many open challenges that need to be addressed to advance this field. One of these challenges is finding algorithms that correctly handle occlusion and can detect when a pixel trajectory needs to be stopped; very few optical flow algorithms provide an occlusion map and are appropriate for this task. Another challenge is finding a framework for the accurate evaluation of the motion field produced by an algorithm. This work makes two contributions in these directions. First, it presents an RMSE-based error measure for evaluating feature tracking algorithms on sequences with rigid motion under the affine camera model. The proposed measure was observed to be consistent with the relative ranking of a number of optical flow algorithms on the Middlebury dataset. Second, it introduces a feature tracking algorithm based on RankBoost that automatically prunes bad trajectories obtained by an optical flow algorithm. The proposed feature tracker is observed to outperform many feature trackers based on optical flow, using both the proposed measure and an indirect measure based on motion segmentation.
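
Under the affine camera model, rigid motion confines the centred measurement matrix of tracked points to rank three, so the residual of its best rank-3 approximation yields an RMSE-style score. The sketch below illustrates this classical construction; it is loosely inspired by, and not identical to, the paper's measure.

```python
# A minimal sketch of a rank-3 residual score for rigid sequences under the
# affine camera model; loosely inspired by the paper's measure.
import numpy as np

def affine_rmse(W):
    """W: (2*n_frames, n_tracks) stacked x/y coordinates of tracked points."""
    Wc = W - W.mean(axis=1, keepdims=True)      # remove per-frame centroid
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    W3 = (U[:, :3] * s[:3]) @ Vt[:3]            # best rank-3 reconstruction
    return np.sqrt(np.mean((Wc - W3) ** 2))

def per_track_error(W):
    """Per-trajectory residuals, usable as a ranking target for pruning."""
    Wc = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    R = Wc - (U[:, :3] * s[:3]) @ Vt[:3]
    return np.sqrt((R ** 2).mean(axis=0))
```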

Liangjing Ding, Adrian Barbu, Anke Meyer-Baese

Backmatter
