
2010 | Book

Computer Vision – ECCV 2010

11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part V

Editors: Kostas Daniilidis, Petros Maragos, Nikos Paragios

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

The 2010 edition of the European Conference on Computer Vision was held in Heraklion, Crete. The call for papers attracted an absolute record of 1,174 submissions. We describe here the selection of the accepted papers. Thirty-eight Area Chairs were selected, coming from Europe (18), the USA and Canada (16), and Asia (4). Their selection was based on the following criteria: (1) researchers who had served at least twice as Area Chairs within the past two years at major vision conferences were excluded; (2) researchers who served as Area Chairs at the 2010 Computer Vision and Pattern Recognition conference were also excluded (exception: ECCV 2012 Program Chairs); (3) overlap introduced by Area Chairs being former students and advisors was minimized; (4) 20% of the Area Chairs had never served before at a major conference; (5) the Area Chair selection process made all possible efforts to achieve a reasonable geographic distribution across countries, thematic areas, and trends in computer vision. Each Area Chair was assigned between 28 and 32 papers by the Program Chairs. Based on paper content, the Area Chair recommended up to seven potential reviewers per paper. This assignment was made using all reviewers in the database, including the conflicting ones; the Program Chairs manually entered the missing conflict domains of approximately 300 reviewers. Based on the recommendations of the Area Chairs, three reviewers were selected per paper (with at least one being among the top three suggestions).

Table of Contents

Frontmatter

Spotlights and Posters W2

Towards Computational Models of the Visual Aesthetic Appeal of Consumer Videos

In this paper, we tackle the problem of characterizing the aesthetic appeal of consumer videos and automatically classifying them into high or low aesthetic appeal. First, we conduct a controlled user study to collect ratings on the aesthetic value of 160 consumer videos. Next, we propose and evaluate a set of low-level features that are combined in a hierarchical way in order to model the aesthetic appeal of consumer videos. After selecting the 7 most discriminative features, we successfully classify aesthetically appealing vs. aesthetically unappealing videos with 73% classification accuracy using a support vector machine.
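As a purely illustrative sketch of the feature-selection step (this is not the authors' code, and the feature names and values are made up), one could rank candidate low-level features by a simple Fisher-style discriminability score and keep the top k before training the SVM:

```python
# Illustrative sketch: rank features by a Fisher-style score
# (class-mean gap squared over summed class variances) and keep
# the k most discriminative ones.
def fisher_score(pos, neg):
    """Discriminability of one feature given values for each class."""
    mp = sum(pos) / len(pos)
    mn = sum(neg) / len(neg)
    vp = sum((x - mp) ** 2 for x in pos) / len(pos)
    vn = sum((x - mn) ** 2 for x in neg) / len(neg)
    return (mp - mn) ** 2 / (vp + vn + 1e-12)

def select_top_features(feats_pos, feats_neg, k=7):
    """feats_*: dict mapping feature name -> list of values per video."""
    scores = {name: fisher_score(feats_pos[name], feats_neg[name])
              for name in feats_pos}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The surviving features would then be fed to an off-the-shelf SVM trainer; the paper's hierarchical feature combination is not modeled here.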

Anush K. Moorthy, Pere Obrador, Nuria Oliver
Object Recognition Using Junctions

In this paper, we propose an object detection/recognition algorithm based on a new set of shape-driven features and morphological operators. Each object class is modeled by the corner points (junctions) on its contour. We design two types of shape-context like features between the corner points, which are efficient to compute and effective in capturing the underlying shape deformation. In the testing stage, we use a recently proposed junction detection algorithm [1] to detect corner points/junctions on natural images. The detection and recognition of an object are then done by matching learned shape features to those in the input image with an efficient search strategy. The proposed system is robust to a certain degree of scale change and we obtained encouraging results on the ETHZ dataset. Our algorithm also has advantages of recognizing object parts and dealing with occlusions.

Bo Wang, Xiang Bai, Xinggang Wang, Wenyu Liu, Zhuowen Tu
Using Partial Edge Contour Matches for Efficient Object Category Localization

We propose a method for object category localization by partially matching edge contours to a single shape prototype of the category. Previous work in this area either relies on piecewise contour approximations, requires meaningful supervised decompositions, or matches coarse shape-based descriptions at local interest points. Our method avoids error-prone pre-processing steps by using all obtained edges in a partial contour matching setting. The matched fragments are efficiently summarized and aggregated to form location hypotheses. The efficiency and accuracy of our edge fragment based voting step yields high quality hypotheses in low computation time. The experimental evaluation achieves excellent performance in the hypotheses voting stage and yields competitive results on challenging datasets like ETHZ and INRIA horses.

Hayko Riemenschneider, Michael Donoser, Horst Bischof
Active Mask Hierarchies for Object Detection

This paper presents a new object representation, Active Mask Hierarchies (AMH), for object detection. In this representation, an object is described using a mixture of hierarchical trees where the nodes represent the object and its parts in pyramid form. To account for shape variations at a range of scales, a dictionary of masks with varied shape patterns is attached to the nodes at different layers. The shape masks are “active” in that they enable parts to move with different displacements. The masks in this active hierarchy are associated with histograms of words (HOWs) and oriented gradients (HOGs) to enable a rich appearance representation of both structured (e.g., cat face) and textured (e.g., cat body) image regions. Learning the hierarchical model is a latent SVM problem which can be solved by the incremental concave-convex procedure (iCCCP). The resulting system is comparable to state-of-the-art methods when evaluated on the challenging public PASCAL 2007 and 2009 datasets.

Yuanhao Chen, Long (Leo) Zhu, Alan Yuille
From a Set of Shapes to Object Discovery

This paper presents an approach to object discovery in a given unlabeled image set, based on mining repetitive spatial configurations of image contours. Contours that similarly deform from one image to another are viewed as collaborating, or, otherwise, conflicting. This is captured by a graph over all pairs of matching contours, whose maximum a posteriori multicoloring assignment is taken to represent the shapes of discovered objects. Multicoloring is conducted by our new Coordinate Ascent Swendsen-Wang cut (CASW). CASW uses Metropolis-Hastings (MH) reversible jumps to probabilistically sample graph edges and color nodes. CASW extends the SW cut by introducing a regularization in the posterior of multicoloring assignments that prevents the MH jumps from arriving at trivial solutions. Also, CASW seeks to learn parameters of the posterior by maximizing a lower bound of the MH acceptance rate. This speeds up multicoloring iterations and facilitates MH jumps out of local minima. On benchmark datasets, we outperform all existing approaches to unsupervised object discovery.

Nadia Payet, Sinisa Todorovic
What Does Classifying More Than 10,000 Image Categories Tell Us?

Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This paper presents a study of large scale categorization including a series of challenging experiments on classification with more than 10,000 image classes. We find that a) computational issues become crucial in algorithm design; b) conventional wisdom from a couple of hundred image categories on relative performance of different classifiers does not necessarily hold when the number of categories increases; c) there is a surprisingly strong relationship between the structure of WordNet (developed for studying language) and the difficulty of visual categorization; d) classification can be improved by exploiting the semantic hierarchy. Toward the future goal of developing automatic vision algorithms to recognize tens of thousands or even millions of image categories, we make a series of observations and arguments about dataset scale, category density, and image hierarchy.

Jia Deng, Alexander C. Berg, Kai Li, Li Fei-Fei
Modeling and Analysis of Dynamic Behaviors of Web Image Collections

Can we model the temporal evolution of topics in Web image collections? If so, can we exploit the understanding of dynamics to solve novel visual problems or improve recognition performance? These two challenging questions are the motivation for this work. We propose a nonparametric approach to modeling and analysis of topical evolution in image sets. A scalable and parallelizable sequential Monte Carlo based method is developed to construct the similarity network of a large-scale dataset, which provides a base representation for a wide range of dynamics analyses. In this paper, we provide several experimental results to support the usefulness of image dynamics, using datasets of 47 topics gathered from Flickr. First, we present some interesting observations, such as tracking of subtopic evolution and outbreak detection, which cannot be achieved with conventional image sets. Second, we present the complementary benefits that the images can introduce over the associated text analysis. Finally, we show that training using the temporal association significantly improves recognition performance.

Gunhee Kim, Eric P. Xing, Antonio Torralba
Non-local Characterization of Scenery Images: Statistics, 3D Reasoning, and a Generative Model

This work focuses on characterizing scenery images. We semantically divide the objects in natural landscape scenes into background and foreground and show that the shapes of the regions associated with these two types are statistically different. We then focus on the background regions. We study statistical properties such as size and shape, location and relative location, the characteristics of the boundary curves and the correlation of the properties to the region’s semantic identity. Then we discuss the imaging process of a simplified 3D scene model and show how it explains the empirical observations. We further show that the observed properties suffice to characterize the gist of scenery images, propose a generative parametric graphical model, and use it to learn and generate semantic sketches of new images, which indeed look like those associated with natural scenery.

Tamar Avraham, Michael Lindenbaum
Efficient Highly Over-Complete Sparse Coding Using a Mixture Model

Sparse coding of sensory data has recently attracted notable attention in research on learning useful features from unlabeled data. Empirical studies show that mapping the data into a significantly higher-dimensional space with sparse coding can lead to superior classification performance. However, it is computationally challenging to learn a set of highly over-complete dictionary bases and to encode the test data with the learned bases. In this paper, we describe a mixture sparse coding model that can produce high-dimensional sparse representations very efficiently. Besides the computational advantage, the model effectively encourages data that are similar to each other to have similar sparse representations. Moreover, the proposed model can be regarded as an approximation to the recently proposed local coordinate coding (LCC), which states that sparse coding can approximately learn the nonlinear manifold of the sensory data in a locally linear manner. Therefore, the features learned by the mixture sparse coding model work well with linear classifiers. We apply the proposed model to the PASCAL VOC 2007 and 2009 datasets for the classification task, achieving state-of-the-art performance on both.

Jianchao Yang, Kai Yu, Thomas Huang
Attribute-Based Transfer Learning for Object Categorization with Zero/One Training Example

This paper studies the one-shot and zero-shot learning problems, where each object category has only one training example or no training example at all. We approach this problem by transferring knowledge from known categories (a.k.a. source categories) to new categories (a.k.a. target categories) via object attributes. Object attributes are high-level descriptions of object categories, such as color, texture, shape, etc. Since they represent common properties across different categories, they can be used to transfer knowledge from source categories to target categories effectively. Based on this insight, we propose an attribute-based transfer learning framework in this paper. We first build a generative attribute model to learn the probabilistic distributions of image features for each attribute, which we consider as attribute priors. These attribute priors can be used to (1) classify unseen images of target categories (zero-shot learning), or (2) facilitate learning classifiers for target categories when there is only one training example per target category (one-shot learning). We demonstrate the effectiveness of the proposed approaches on the Animals with Attributes dataset and show state-of-the-art performance in both zero-shot and one-shot learning tests.

Xiaodong Yu, Yiannis Aloimonos
Image Classification Using Super-Vector Coding of Local Image Descriptors

This paper introduces a new framework for image classification using local visual descriptors. The pipeline first performs a nonlinear feature transformation on the descriptors, then aggregates the results to form image-level representations, and finally applies a classification model. For all three steps we suggest novel solutions, which make our approach appealing in theory, more scalable in computation, and transparent in classification. Our experiments demonstrate that the proposed classification method achieves state-of-the-art accuracy on the well-known PASCAL benchmarks.
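The describe-encode-aggregate-classify pipeline can be illustrated with a toy stand-in that uses hard vector quantization and histogram pooling in place of the paper's super-vector coding (the codebook and descriptors below are invented values, not the authors' method):

```python
# Toy coding pipeline: quantize each local descriptor against a small
# codebook, pool assignments into an image-level histogram, then score
# the pooled representation with a linear classifier.
def nearest_codeword(desc, codebook):
    """Index of the codeword closest to the descriptor (squared L2)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(desc, codebook[i])))

def encode_image(descriptors, codebook):
    """Aggregate per-descriptor assignments into a normalized histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def linear_score(representation, weights, bias=0.0):
    """Final step: a linear model on the image-level representation."""
    return sum(r * w for r, w in zip(representation, weights)) + bias
```

Super-vector coding replaces the hard histogram with a higher-dimensional nonlinear transform of each descriptor, but the three-stage structure is the same.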

Xi Zhou, Kai Yu, Tong Zhang, Thomas S. Huang
A Discriminative Latent Model of Object Classes and Attributes

We present a discriminatively trained model for joint modelling of object class labels (e.g. “person”, “dog”, “chair”, etc.) and their visual attributes (e.g. “has head”, “furry”, “metal”, etc.). We treat attributes of an object as latent variables in our model and capture the correlations among attributes using an undirected graphical model built from training data. The advantage of our model is that it allows us to infer object class labels using the information of both the test image itself and its (latent) attributes. Our model unifies object class prediction and attribute prediction in a principled framework. It is also flexible enough to deal with different performance measurements. Our experimental results provide quantitative evidence that attributes can improve object naming.

Yang Wang, Greg Mori
Seeing People in Social Context: Recognizing People and Social Relationships

The people in an image are generally not strangers, but instead often share social relationships such as husband-wife, siblings, grandparent-child, father-child, or mother-child. Further, the social relationship between a pair of people influences the relative position and appearance of the people in the image. This paper explores using familial social relationships as context for recognizing people and for recognizing the social relationships between pairs of people. We introduce a model for representing the interaction between social relationship, facial appearance, and identity. We show that the family relationship a pair of people share influences the relative pairwise features between them. Experiments on a set of personal collections show that significant improvement in people recognition is achieved by modeling social relationships, even in a weak-label setting that is attractive in practical applications. Furthermore, we show that social relationships are effectively recognized in images from a separate test image collection.

Gang Wang, Andrew Gallagher, Jiebo Luo, David Forsyth
Discovering Multipart Appearance Models from Captioned Images

Even a relatively unstructured captioned image set depicting a variety of objects in cluttered scenes contains strong correlations between caption words and repeated visual structures. We exploit these correlations to discover named objects and learn hierarchical models of their appearance. Revising and extending a previous technique for finding small, distinctive configurations of local features, our method assembles these co-occurring parts into graphs with greater spatial extent and flexibility. The resulting multipart appearance models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. We demonstrate improved annotation precision and recall on datasets to which the non-hierarchical technique was previously applied and show extended spatial coverage of detected objects.

Michael Jamieson, Yulia Eskin, Afsaneh Fazly, Suzanne Stevenson, Sven Dickinson
Voting by Grouping Dependent Parts

Hough voting methods efficiently handle the high complexity of multi-scale, category-level object detection in cluttered scenes. The primary weakness of this approach, however, is that mutually dependent local observations vote independently for intrinsically global object properties such as object scale. All the votes are added up to obtain object hypotheses. The assumption is thus that object hypotheses are a sum of independent part votes. Popular representation schemes are, however, based on an overlapping sampling of semi-local image features with large spatial support (e.g., SIFT or geometric blur). Features are thus mutually dependent, and we incorporate these dependencies into probabilistic Hough voting by presenting an objective function that combines three intimately related problems: i) grouping of mutually dependent parts, ii) solving the correspondence problem jointly for dependent parts, and iii) finding concerted object hypotheses using extended groups rather than local observations alone. Experiments demonstrate that state-of-the-art Hough voting and even sliding-window methods are significantly improved by utilizing part dependencies and jointly optimizing groups, correspondences, and votes.
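The independent-vote baseline that this paper improves on can be sketched in a few lines: each local part casts a weighted vote for an object centre, votes are summed in an accumulator, and the peak becomes the detection hypothesis (all coordinates and weights below are toy values):

```python
from collections import defaultdict

def hough_peak(part_votes, cell=4):
    """Standard independent-part Hough voting.
    part_votes: list of ((px, py), (dx, dy), weight) triples, where
    (dx, dy) is the part's learned offset to the object centre."""
    acc = defaultdict(float)
    for (px, py), (dx, dy), w in part_votes:
        # Quantize the voted centre into an accumulator cell.
        key = ((px + dx) // cell, (py + dy) // cell)
        acc[key] += w
    # The highest-scoring cell is the object hypothesis.
    return max(acc, key=acc.get)
```

The paper's point is precisely that the `acc[key] += w` step treats overlapping, mutually dependent features as if their votes were independent; its objective function couples grouping, correspondence, and voting instead.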

Pradeep Yarlagadda, Antonio Monroy, Björn Ommer
Superpixels and Supervoxels in an Energy Optimization Framework

Many methods for object recognition, segmentation, etc., rely on a tessellation of an image into “superpixels”. A superpixel is an image patch which is better aligned with intensity edges than a rectangular patch. Superpixels can be extracted with any segmentation algorithm; however, most algorithms produce highly irregular superpixels, with widely varying sizes and shapes. A more regular space tessellation may be desired. We formulate the superpixel partitioning problem in an energy minimization framework and optimize with graph cuts. Our energy function explicitly encourages regular superpixels. We explore variations of the basic energy, which allow a trade-off between a less regular tessellation but more accurate boundaries, or better efficiency. Our advantages over previous work are computational efficiency, principled optimization, and applicability to 3D “supervoxel” segmentation. We achieve high boundary recall on images and spatial coherence on video. We also show that compact superpixels improve accuracy on a simple application of salient object segmentation.
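The compactness idea (though not the paper's graph-cut optimization) can be illustrated by clustering pixels on joint position-intensity features with a few k-means iterations: the spatial terms keep segments regular while the intensity term snaps them to edges. Seeds, weights, and the toy image are assumptions for the sketch:

```python
# Illustration only: compact superpixels via Lloyd iterations on
# (x, y, m * intensity) features. Larger m favours intensity edges,
# smaller m favours spatial regularity.
def compact_superpixels(img, centers, m=2.0, iters=5):
    """img: 2D list of intensities; centers: initial (row, col) seeds.
    Returns a 2D list assigning each pixel a superpixel index."""
    h, w = len(img), len(img[0])
    cents = [(float(x), float(y), m * img[x][y]) for x, y in centers]
    for _ in range(iters):
        assign = [[0] * w for _ in range(h)]
        sums = [[0.0, 0.0, 0.0, 0] for _ in cents]
        for x in range(h):
            for y in range(w):
                f = (x, y, m * img[x][y])
                k = min(range(len(cents)),
                        key=lambda i: sum((a - b) ** 2
                                          for a, b in zip(f, cents[i])))
                assign[x][y] = k
                s = sums[k]
                s[0] += x; s[1] += y; s[2] += f[2]; s[3] += 1
        # Move each centre to the mean of its assigned pixels.
        cents = [(s[0] / s[3], s[1] / s[3], s[2] / s[3])
                 for s in sums if s[3]]
    return assign
```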

Olga Veksler, Yuri Boykov, Paria Mehrani

Segmentation

Convex Relaxation for Multilabel Problems with Product Label Spaces

Convex relaxations for continuous multilabel problems have attracted a lot of interest recently [1,2,3,4,5]. Unfortunately, in previous methods the runtime and memory requirements scale linearly in the total number of labels, making them very inefficient and often inapplicable for problems with higher-dimensional label spaces. In this paper, we propose a reduction technique for the case where the label space is a product space, and introduce proper regularizers. The resulting convex relaxation requires orders of magnitude less memory and computation time than previous methods, which enables us to apply it to large-scale problems like optic flow, stereo with occlusion detection, and segmentation into a very large number of regions. Despite the drastic gain in performance, we do not arrive at less accurate solutions than the original relaxation. Using the novel method, we can for the first time efficiently compute solutions to the optic flow functional which are within provable bounds of typically 5% of the global optimum.

Bastian Goldluecke, Daniel Cremers
Graph Cut Based Inference with Co-occurrence Statistics

Markov and conditional random fields (CRFs) used in computer vision typically model only local interactions between variables, as this is computationally tractable. In this paper we consider a class of global potentials defined over all variables in the CRF. We show how they can be readily optimised using standard graph cut algorithms at little extra expense compared to a standard pairwise field.

This result can be directly used for the problem of class-based image segmentation, which has seen increasing recent interest within computer vision. Here the aim is to assign a label to each pixel of a given image from a set of possible object classes. Typically these methods use random fields to model local interactions between pixels or superpixels. One of the cues that helps recognition is global object co-occurrence statistics, a measure of which classes (such as chair or motorbike) are likely to occur in the same image together. Several approaches have been proposed to exploit this property, but all of them suffer from different limitations and typically carry a high computational cost, preventing their application to large images. We find that the new model we propose produces an improvement in the labelling compared to just using a pairwise model.
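The structure of such an energy, with a global co-occurrence potential defined over the *set* of labels present in the image, can be sketched as follows (the costs are toy values, and the graph-cut optimisation itself is omitted; this only evaluates a candidate labelling):

```python
# Sketch of a labelling energy: per-pixel unary costs, a Potts penalty
# on disagreeing neighbours, and a global cost on the set of labels
# present in the image (the co-occurrence potential).
def labeling_energy(labels, unary, edges, potts_w, cooccurrence_cost):
    """labels[i]: label of pixel i; unary[i][l]: data cost of pixel i
    taking label l; edges: neighbouring pixel pairs (i, j);
    cooccurrence_cost: maps a frozenset of labels to a global cost."""
    e = sum(unary[i][labels[i]] for i in range(len(labels)))
    e += potts_w * sum(1 for i, j in edges if labels[i] != labels[j])
    e += cooccurrence_cost(frozenset(labels))
    return e
```

A co-occurrence term that penalises unlikely label combinations can make a slightly worse per-pixel labelling cheaper overall, which is exactly the behaviour the paper shows can still be optimised with standard graph cuts.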

Lubor Ladicky, Chris Russell, Pushmeet Kohli, Philip H. S. Torr
Ambrosio-Tortorelli Segmentation of Stochastic Images

We present an extension of the classical Ambrosio-Tortorelli approximation of the Mumford-Shah approach for the segmentation of images with uncertain gray values resulting from measurement errors and noise. Our approach yields a reliable precision estimate for the segmentation result, and it allows us to quantify the robustness of edges in noisy images and under gray value uncertainty. We develop an ansatz space for such images by identifying gray values with random variables. The use of these stochastic images in the minimization of energies of Ambrosio-Tortorelli type leads to stochastic partial differential equations for the stochastic smoothed image and a stochastic phase field for the edge set. For their discretization we utilize the generalized polynomial chaos expansion and the generalized spectral decomposition (GSD) method. We demonstrate the performance of the method on artificial data as well as real medical ultrasound data.

Torben Pätz, Tobias Preusser
Multiple Hypothesis Video Segmentation from Superpixel Flows

Multiple Hypothesis Video Segmentation (MHVS) is a method for the unsupervised photometric segmentation of video sequences. MHVS segments arbitrarily long video streams by considering only a few frames at a time, and handles the automatic creation, continuation and termination of labels with no user initialization or supervision. The process begins by generating several pre-segmentations per frame and enumerating multiple possible trajectories of pixel regions within a short time window. After assigning each trajectory a score, we let the trajectories compete with each other to segment the sequence. We determine the solution of this segmentation problem as the MAP labeling of a higher-order random field. This framework allows MHVS to achieve spatial and temporal long-range label consistency while operating in an on-line manner. We test MHVS on several videos of natural scenes with arbitrary camera and object motion.

Amelio Vazquez-Reina, Shai Avidan, Hanspeter Pfister, Eric Miller
Object Segmentation by Long Term Analysis of Point Trajectories

Unsupervised learning requires a grouping step that defines which data belong together. A natural way of grouping in images is the segmentation of objects or parts of objects. While pure bottom-up segmentation from static cues is well known to be ambiguous at the object level, the story changes as soon as objects move. In this paper, we present a method that uses long-term point trajectories based on dense optical flow. Defining pairwise distances between these trajectories allows us to cluster them, which results in temporally consistent segmentations of moving objects in a video shot. In contrast to multi-body factorization, points and even whole objects may appear or disappear during the shot. We provide a benchmark dataset and an evaluation method for this previously uncovered setting.

Thomas Brox, Jitendra Malik

Spotlights and Posters R1

Exploiting Repetitive Object Patterns for Model Compression and Completion

Many man-made and natural structures consist of similar elements arranged in regular patterns. In this paper we present an unsupervised approach for discovering and reasoning about repetitive patterns of objects in a single image. We propose an unsupervised detection technique based on a voting scheme of image descriptors. We then introduce the concept of latticelets: minimal sets of arcs that generalize the connectivity of repetitive patterns. Latticelets are used for building polygonal cycles, where the smallest cycles define the sought groups of repetitive elements. The proposed method can be used for pattern prediction and completion and for high-level image compression. Conditional Random Fields are used as a formalism to predict the location of elements at places where they are partially occluded or detected with very low confidence. Model compression is achieved by extracting and efficiently representing the repetitive structures in the image. Our method has been tested on simulated and real data, and the quantitative and qualitative results show the effectiveness of the approach.

Luciano Spinello, Rudolph Triebel, Dizan Vasquez, Kai O. Arras, Roland Siegwart
Feature Tracking for Wide-Baseline Image Retrieval

We address the problem of large scale image retrieval in a wide-baseline setting, where for any query image all the matching database images will come from very different viewpoints. In such settings traditional bag-of-visual-words approaches are not equipped to handle the significant feature descriptor transformations that occur under large camera motions. In this paper we present a novel approach that includes an offline step of feature matching which allows us to observe how local descriptors transform under large camera motions. These observations are encoded in a graph in the quantized feature space. This graph can be used directly within a soft-assignment feature quantization scheme for image retrieval.

Ameesh Makadia
Crowd Detection with a Multiview Sampler

We present a Bayesian approach for simultaneously estimating the number of people in a crowd and their spatial locations by sampling from a posterior distribution over crowd configurations. Although this framework can be naturally extended from single- to multiview detection, we show that the naive extension leads to an inefficient sampler that is easily trapped in local modes. We therefore develop a set of novel proposals that leverage multiview geometry to propose global moves that jump more efficiently between modes of the posterior distribution. We also develop a statistical model of crowd configurations that can handle dependencies among people while not requiring discretization of their spatial locations. We quantitatively evaluate our algorithm on a publicly available benchmark dataset with different crowd densities and environmental conditions, and show that our approach outperforms other state-of-the-art methods for detecting and counting people in crowds.

Weina Ge, Robert T. Collins
A Unified Contour-Pixel Model for Figure-Ground Segmentation

The goal of this paper is to provide an accurate pixel-level segmentation of a deformable foreground object in an image. We combine state-of-the-art local image segmentation techniques with a global object-specific contour model to form a coherent energy function over the outline of the object and the pixels inside it. The energy function includes terms from a variant of the TextonBoost method, which labels each pixel as either foreground or background. It also includes terms over landmark points from a LOOPS model [1], which combines global object shape with landmark-specific detectors. We allow the pixel-level segmentation and object outline to inform each other through energy potentials so that they form a coherent object segmentation with globally consistent shape and appearance. We introduce an inference method to optimize this energy that proposes moves within the complex energy space based on multiple initial oversegmentations of the entire image. We show that this method achieves state-of-the-art results in precisely segmenting articulated objects in cluttered natural scenes.

Ben Packer, Stephen Gould, Daphne Koller
SuperParsing: Scalable Nonparametric Image Parsing with Superpixels

This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach requires no training, and it can easily scale to datasets with tens of thousands of images and hundreds of labels. It works by scene-level matching with global image descriptors, followed by superpixel-level matching with local features and efficient Markov random field (MRF) optimization for incorporating neighborhood context. Our MRF setup can also compute a simultaneous labeling of image regions into semantic classes (e.g., tree, building, car) and geometric classes (sky, vertical, ground). Our system outperforms the state-of-the-art nonparametric method based on SIFT Flow on a dataset of 2,688 images and 33 labels. In addition, we report per-pixel rates on a larger dataset of 15,150 images and 170 labels. To our knowledge, this is the first complete evaluation of image parsing on a dataset of this size, and it establishes a new benchmark for the problem.
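A minimal, context-free sketch of the retrieval-plus-transfer idea (with invented toy descriptors, and without the paper's MRF smoothing step) might look like:

```python
# Nonparametric label transfer: retrieve the most similar training
# scenes by a global descriptor, then label each query superpixel by
# its nearest superpixel within that retrieval set.
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def parse_image(query_global, query_sp_feats, database, k_scenes=2):
    """database: list of (global_desc, [(sp_feat, sp_label), ...]).
    Returns one label per query superpixel feature."""
    # Scene-level matching: keep only the k most similar scenes.
    ranked = sorted(database, key=lambda rec: sqdist(query_global, rec[0]))
    pool = [sp for _, sps in ranked[:k_scenes] for sp in sps]
    # Superpixel-level matching within the retrieval set.
    out = []
    for f in query_sp_feats:
        _, best_label = min(pool, key=lambda sp: sqdist(f, sp[0]))
        out.append(best_label)
    return out
```

Restricting the label pool to retrieved scenes is what lets the approach scale to tens of thousands of images without any training; the full system replaces the final nearest-neighbour vote with MRF optimization over neighbouring superpixels.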

Joseph Tighe, Svetlana Lazebnik
Segmenting Salient Objects from Images and Videos

In this paper we introduce a new salient object segmentation method, which is based on combining a saliency measure with a conditional random field (CRF) model. The proposed saliency measure is formulated using a statistical framework and local feature contrast in illumination, color, and motion information. The resulting saliency map is then used in a CRF model to define an energy-minimization-based segmentation approach, which aims to recover well-defined salient objects. The method is efficiently implemented using the integral histogram approach and graph cut solvers. Compared to previous approaches, the introduced method is among the few that are applicable to both still images and videos, including motion cues. The experiments show that our approach outperforms the current state-of-the-art methods in both qualitative and quantitative terms.

Esa Rahtu, Juho Kannala, Mikko Salo, Janne Heikkilä
ClassCut for Unsupervised Class Segmentation

We propose a novel method for unsupervised class segmentation on a set of images. It alternates between segmenting object instances and learning a class model. The method is based on a segmentation energy defined over all images at the same time, which can be optimized efficiently by techniques used before in interactive segmentation. Over iterations, our method progressively learns a class model by integrating observations over all images. In addition to appearance, this model captures the location and shape of the class with respect to an automatically determined coordinate frame common across images. This frame allows us to build stronger shape and location models, similar to those used in object class detection. Our method is inspired by interactive segmentation methods [1], but it is fully automatic and learns models characteristic for the object class rather than specific to one particular object/image. We experimentally demonstrate on the Caltech4, Caltech101, and Weizmann horses datasets that our method (a) transfers class knowledge across images and this improves results compared to segmenting every image independently; (b) outperforms Grabcut [1] for the task of unsupervised segmentation; (c) offers competitive performance compared to the state-of-the-art in unsupervised segmentation and in particular it outperforms the topic model [2].

Bogdan Alexe, Thomas Deselaers, Vittorio Ferrari
A Dynamic Programming Approach to Reconstructing Building Interiors

A number of recent papers have investigated reconstruction under the Manhattan world assumption, in which surfaces in the world are assumed to be aligned with one of three dominant directions [1,2,3,4]. In this paper we present a dynamic programming solution to the reconstruction problem for “indoor” Manhattan worlds (a sub-class of Manhattan worlds). Our algorithm deterministically finds the global optimum and exhibits computational complexity linear in both model complexity and image size. This is an important improvement over previous methods that were either approximate [3] or exponential in model complexity [4]. We present results for a new dataset containing several hundred manually annotated images, which is released in conjunction with this paper.
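The "linear in model complexity and image size" claim comes from a dynamic program over image columns. The paper's exact cost structure is not reproduced here; the following Viterbi-style sketch shows the generic pattern such solvers share: one wall label per column, a data cost per (column, label) pair, and a constant penalty per label change (all names illustrative):

```python
import numpy as np

def column_dp(unary, switch_cost):
    """Viterbi-style DP: pick one label per column minimizing the sum of
    unary costs plus switch_cost per label change.
    Runs in O(n_cols * n_labels^2) time."""
    n_cols, n_labels = unary.shape
    cost = unary[0].copy()
    back = np.zeros((n_cols, n_labels), dtype=int)
    ids = np.arange(n_labels)
    for c in range(1, n_cols):
        # transition cost: stay on the same label for free, pay to switch
        trans = cost[None, :] + switch_cost * (ids[:, None] != ids[None, :])
        back[c] = trans.argmin(axis=1)
        cost = unary[c] + trans.min(axis=1)
    labels = [int(cost.argmin())]
    for c in range(n_cols - 1, 0, -1):
        labels.append(int(back[c][labels[-1]]))
    return labels[::-1], float(cost.min())
```

Because the optimum is found exactly by exhaustive-but-shared enumeration, there is no approximation, which mirrors the deterministic global optimality stated in the abstract.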

Alex Flint, Christopher Mei, David Murray, Ian Reid
Discriminative Mixture-of-Templates for Viewpoint Classification

Object viewpoint classification aims at predicting an approximate 3D pose of objects in a scene and is receiving increasing attention. State-of-the-art approaches to viewpoint classification use generative models to capture relations between object parts. In this work we propose to use a mixture of holistic templates (e.g. HOG) and discriminative learning for joint viewpoint classification and category detection. Inspired by the work of Felzenszwalb et al. (2009), we discriminatively train multiple components simultaneously for each object category. A large number of components are learned in the mixture, and they are associated with canonical viewpoints of the object through different levels of supervision: fully supervised, semi-supervised, or unsupervised. We show that discriminative learning is capable of producing mixture components that directly provide robust viewpoint classification, significantly outperforming the state of the art: we improve the viewpoint accuracy on the Savarese et al. 3D Object database from 57% to 74%, and that on the VOC 2006 car database from 73% to 86%. In addition, the mixture-of-templates approach to object viewpoint/pose has a natural extension to the continuous case by discriminatively learning a linear appearance model locally at each discrete view. We evaluate continuous viewpoint estimation on a dataset of everyday objects collected using IMUs for ground-truth annotation: our mixture model shows great promise compared to a number of baselines, including discrete nearest neighbor and linear regression.

Chunhui Gu, Xiaofeng Ren
Efficient Non-consecutive Feature Tracking for Structure-from-Motion

Structure-from-motion (SfM) is an important computer vision problem and relies heavily on the quality of feature tracking. In image sequences, disjointed tracks, caused by objects moving in and out of view, occasional occlusion, or image noise, can significantly degrade SfM if they are not handled well. In this paper, we address the non-consecutive feature point tracking problem and propose an effective method to match interrupted tracks. Our framework consists of two steps: solving the feature ‘dropout’ problem when indistinctive structures, noise, or even large image distortion exist, and rapidly recognizing and joining common features located in different subsequences. Experimental results on several challenging and large-scale video sets show that our method notably improves SfM.

Guofeng Zhang, Zilong Dong, Jiaya Jia, Tien-Tsin Wong, Hujun Bao
P2Π: A Minimal Solution for Registration of 3D Points to 3D Planes

This paper presents a class of minimal solutions for the 3D-to-3D registration problem in which the sensor data are 3D points and the corresponding object data are 3D planes. In order to compute the 6 degrees-of-freedom transformation between the sensor and the object, we need at least six points on three or more planes. We systematically investigate and develop pose estimation algorithms for several configurations, including all minimal configurations, that arise from the distribution of points on planes. The degenerate configurations are also identified. We point out that many existing and unsolved 2D-to-3D and 3D-to-3D pose estimation algorithms involving points, lines, and planes can be transformed into the problem of registering points to planes. In addition to simulations, we also demonstrate the algorithm’s effectiveness in two real-world applications: registration of a robotic arm with an object using a contact sensor, and registration of 3D point clouds that were obtained using multi-view reconstruction of planar city models.

Srikumar Ramalingam, Yuichi Taguchi, Tim K. Marks, Oncel Tuzel
Boosting Chamfer Matching by Learning Chamfer Distance Normalization

We propose a novel technique that significantly improves the performance of oriented chamfer matching on images with cluttered backgrounds. Unlike other matching methods, which only measure how well a template fits an edge map, we evaluate the score of the template in comparison to auxiliary contours, which we call normalizers. We utilize AdaBoost to learn a Normalized Oriented Chamfer Distance (NOCD). Our experimental results demonstrate that it boosts the detection rate of the oriented chamfer distance. The simplicity of NOCD and the ease of training it on a small number of samples promise that it can replace the chamfer distance and oriented chamfer distance in any template matching application.
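For context, the plain chamfer distance that NOCD normalizes can be sketched as follows (brute force and unoriented for clarity; practical implementations precompute a distance transform of the edge map instead of searching all edge points):

```python
import numpy as np

def chamfer_distance(template_pts, edge_pts, offset=(0.0, 0.0)):
    """Mean distance from each translated template point to its nearest
    edge point. Low scores mean the template fits the edge map well."""
    shifted = np.asarray(template_pts, float) + np.asarray(offset, float)
    diffs = shifted[:, None, :] - np.asarray(edge_pts, float)[None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    return float(nearest.mean())
```

The paper's point is that this raw score is misleading in clutter, since dense background edges make almost any template fit somewhere; comparing the score against normalizer contours compensates for that.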

Tianyang Ma, Xingwei Yang, Longin Jan Latecki
Geometry Construction from Caustic Images

In this work we investigate an inverse geometry problem. Given a light source, a diffuse plane, and a caustic image, what must a geometric object (transmissive or reflective) look like in order to project the desired caustic onto the diffuse plane when lit by the light source? To construct the geometry we apply an analysis-by-synthesis approach, exploiting the GPU to accelerate caustic rendering based on the current geometry estimate. The optimization is driven by simultaneous perturbation stochastic approximation (SPSA). We confirm that this algorithm converges to the global minimum with high probability even in this ill-posed setting. We demonstrate results for precise geometry reconstruction given a caustic image and for reflector design producing an intended light distribution.
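SPSA itself is generic and easy to sketch: it estimates a full gradient from only two function evaluations per iteration, which is what makes it attractive when each evaluation is an expensive GPU caustic rendering. A minimal version with the standard gain schedules (the objective below is a stand-in quadratic, not a caustic error):

```python
import numpy as np

def spsa_minimize(f, x0, a=0.2, c=0.1, n_iter=600, seed=0):
    """Simultaneous Perturbation Stochastic Approximation: approximates
    the gradient from two evaluations of f per iteration by perturbing
    all coordinates at once with a random +/-1 vector."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float).copy()
    for k in range(1, n_iter + 1):
        ak = a / k ** 0.602          # standard SPSA gain schedules
        ck = c / k ** 0.101
        delta = rng.choice([-1.0, 1.0], size=x.shape)
        g_hat = (f(x + ck * delta) - f(x - ck * delta)) / (2.0 * ck * delta)
        x -= ak * g_hat
    return x
```

The two-evaluation gradient estimate is what keeps the cost per iteration independent of the number of geometry parameters, at the price of gradient noise that the decaying gains average out.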

Manuel Finckh, Holger Dammertz, Hendrik P. A. Lensch
Archive Film Restoration Based on Spatiotemporal Random Walks

We propose a novel restoration method for defects and missing regions in video sequences, particularly in application to archive film restoration. Our statistical framework is based on random walks to examine the spatiotemporal path of a degraded pixel, and uses texture features in addition to intensity and motion information traditionally used in previous restoration works. The degraded pixels within a frame are restored in a multiscale framework by updating their features (intensity, motion and texture) at each level with reference to the attributes of normal pixels and other defective pixels in the previous scale as long as they fall within the defective pixel’s random walk-based spatiotemporal neighbourhood. The proposed algorithm is compared against two state-of-the-art methods to demonstrate improved accuracy in restoring synthetic and real degraded image sequences.

Xiaosong Wang, Majid Mirmehdi
Reweighted Random Walks for Graph Matching

Graph matching is an essential problem in computer vision and machine learning. In this paper, we introduce a random walk view on the problem and propose a robust graph matching algorithm against outliers and deformation. Matching between two graphs is formulated as node selection on an association graph whose nodes represent candidate correspondences between the two graphs. The solution is obtained by simulating random walks with reweighting jumps enforcing the matching constraints on the association graph. Our algorithm achieves noise-robust graph matching by iteratively updating and exploiting the confidences of candidate correspondences. In a practical sense, our work is of particular importance since the real-world matching problem is made difficult by the presence of noise and outliers. Extensive and comparative experiments demonstrate that it outperforms the state-of-the-art graph matching algorithms especially in the presence of outliers and deformation.

Minsu Cho, Jungmin Lee, Kyoung Mu Lee
Rotation Invariant Non-rigid Shape Matching in Cluttered Scenes

This paper presents a novel and efficient method for locating deformable shapes in cluttered scenes. The shapes to be detected may undergo arbitrary translational and rotational changes, and they can be non-rigidly deformed, occluded, and corrupted by clutter. All of these problems make accurate and robust shape matching very difficult. By using a new shape representation, which involves a powerful feature descriptor, the proposed method overcomes the above difficulties successfully, and it possesses the property of global optimality. Experiments on both synthetic and real data validate that the proposed algorithm is robust to various types of disturbance and can reliably detect the desired shapes in complex and highly cluttered scenes.

Wei Lian, Lei Zhang
Loosely Distinctive Features for Robust Surface Alignment

Many successful feature detectors and descriptors exist for 2D intensity images. However, obtaining the same effectiveness in the domain of 3D objects has proven to be a more elusive goal. In fact, the smoothness often found in surfaces and the lack of texture information on the range images produced by conventional 3D scanners hinder both the localization of interesting points and the distinctiveness of their characterization in terms of descriptors. To overcome these limitations several approaches have been suggested, ranging from the simple enlargement of the area over which the descriptors are computed to the reliance on external texture information. In this paper we offer a change in perspective, where a game-theoretic matching technique that exploits global geometric consistency allows us to obtain an extremely robust surface registration even when coupled with simple surface features exhibiting very low distinctiveness. To assess the performance of the whole approach, we compare it with state-of-the-art alignment pipelines. Furthermore, we show that using the novel feature points with well-known alternative non-global matching techniques leads to poorer results.
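Game-theoretic matching of the kind mentioned above is often run with discrete replicator dynamics: candidate correspondences that are mutually consistent (high pairwise payoff) reinforce each other, while inconsistent ones are driven to zero mass. An illustrative sketch with a hand-built payoff matrix (not the authors' feature pipeline):

```python
import numpy as np

def replicator_dynamics(A, n_iter=200):
    """Discrete replicator dynamics on a symmetric payoff matrix A: the
    mass on each candidate grows in proportion to its average payoff
    against the current population, so mutually consistent candidates
    survive and inconsistent ones vanish."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)          # start from the uniform mixture
    for _ in range(n_iter):
        payoff = A @ x
        x = x * payoff / (x @ payoff)
    return x
```

The surviving support of the population is read off as the selected set of correspondences, which is how global geometric consistency is enforced without any combinatorial search.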

Andrea Albarelli, Emanuele Rodolà, Andrea Torsello
Accelerated Hypothesis Generation for Multi-structure Robust Fitting

Random hypothesis generation underpins many geometric model fitting techniques. Unfortunately, it is also computationally expensive. We propose a fundamentally new approach to accelerate hypothesis sampling by guiding it with information derived from residual sorting. We show that residual sorting innately encodes the probability that two points arose from the same model, and is obtained without recourse to domain knowledge (e.g. keypoint matching scores) typically used in previous sampling enhancement methods. More crucially, our approach is naturally capable of handling data with multiple model instances and excels in applications (e.g. multi-homography fitting) which easily frustrate other techniques. Experiments show that our method provides superior efficiency on various geometric model estimation tasks. An implementation of our algorithm is available on the authors’ homepage.
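The core observation, that residual sorting encodes the probability of two points belonging to the same structure, can be illustrated with a small sketch: each point ranks all hypotheses by its residual, and the overlap of two points' top-k preference lists serves as their similarity (illustrative only, not the paper's full guided-sampling scheme):

```python
import numpy as np

def preference_similarity(residuals, k):
    """residuals: (n_points, n_hypotheses) absolute fitting residuals.
    Entry (i, j) of the result is the fraction of hypotheses shared by
    the k best-fitting hypotheses of points i and j; points on the same
    structure tend to prefer the same hypotheses."""
    top_k = np.argsort(residuals, axis=1)[:, :k]
    n = residuals.shape[0]
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = len(set(top_k[i]) & set(top_k[j])) / k
    return sim
```

Sampling minimal subsets with probability biased by this similarity concentrates hypotheses on single structures, which is why the approach copes with multiple model instances without keypoint scores or other domain knowledge.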

Tat-Jun Chin, Jin Yu, David Suter
Aligning Spatio-Temporal Signals on a Special Manifold

We investigate the spatio-temporal alignment of videos, or of features/signals extracted from them. Specifically, we formally define an alignment manifold and formulate the alignment problem as an optimization procedure on this non-linear space by exploiting its intrinsic geometry. We focus our attention on semantically meaningful videos or signals, e.g., those describing or capturing human motion or activities, and propose a new formalism for temporal alignment that accounts for execution-rate variations among realizations of the same video event. By construction, we address this static and deterministic alignment task in a dynamic and stochastic manner: we regard the search for optimal alignment parameters as a recursive state estimation problem for a particular dynamic system evolving on the alignment manifold. Consequently, a Sequential Importance Sampling iteration on the alignment manifold is designed for effective and efficient alignment. We demonstrate the performance on several types of input data that arise in vision problems.

Ruonan Li, Rama Chellappa
Supervised Label Transfer for Semantic Segmentation of Street Scenes

In this paper, we propose a robust supervised label transfer method for the semantic segmentation of street scenes. Given an input image of a street scene, we first find multiple image sets in an annotated training database, each of which covers all semantic categories in the input image. Then, we establish dense correspondence between the input image and each found image set with a proposed KNN-MRF matching scheme. This is followed by a correspondence classification step that reduces the number of semantically incorrect correspondences, using classification models trained for the different categories. With the correspondences classified as semantically correct, we infer the confidence of each superpixel belonging to the different semantic categories, and integrate these confidences with a spatial smoothness constraint in a Markov random field to segment the input image. Experiments on three datasets show that our method outperforms traditional learning based methods and the previous nonparametric label transfer method for the semantic segmentation of street scenes.

Honghui Zhang, Jianxiong Xiao, Long Quan
Category Independent Object Proposals

We propose a category-independent method to produce a bag of regions and rank them, such that top-ranked regions are likely to be good segmentations of different objects. Our key objectives are completeness and diversity: every object should have at least one good proposed region, and a diverse set should be top-ranked. Our approach is to generate a set of segmentations by performing graph cuts based on a seed region and a learned affinity function. Then, the regions are ranked using structured learning based on various cues. Our experiments on BSDS and PASCAL VOC 2008 demonstrate our ability to find most objects within a small bag of proposed regions.

Ian Endres, Derek Hoiem
Photo-Consistent Planar Patches from Unstructured Cloud of Points

Planar patches are a very compact and stable intermediate representation of 3D scenes, as they are a good starting point for a complete automatic reconstruction of surfaces. This paper presents a novel method for extracting planar patches from an unstructured cloud of points that is produced by a typical structure and motion pipeline. The method integrates several constraints inside J-linkage, a robust algorithm for multiple models fitting. It makes use of information coming both from the 3D structure and the images. Several results show the effectiveness of the proposed approach.

Roberto Toldo, Andrea Fusiello
Contour Grouping and Abstraction Using Simple Part Models

We address the problem of contour-based perceptual grouping using a user-defined vocabulary of simple part models. We train a family of classifiers on the vocabulary, and apply them to a region oversegmentation of the input image to detect closed contours that are consistent with some shape in the vocabulary. Given such a set of consistent cycles, they are both abstracted and categorized through a novel application of an active shape model also trained on the vocabulary. From an image of a real object, our framework recovers the projections of the abstract surfaces that comprise an idealized model of the object. We evaluate our framework on a newly constructed dataset annotated with a set of ground truth abstract surfaces.

Pablo Sala, Sven Dickinson
Dynamic Color Flow: A Motion-Adaptive Color Model for Object Segmentation in Video

Accurately modeling object colors, and features in general, plays a critical role in video segmentation and analysis. Commonly used color models, such as global Gaussian mixtures, localized Gaussian mixtures, and pixel-wise adaptive ones, often fail to accurately represent the object appearance in complicated scenes, thereby leading to segmentation errors. We introduce a new color model, Dynamic Color Flow, which, unlike previous approaches, incorporates motion estimation into color modeling in a probabilistic framework and adaptively changes model parameters to match the local properties of the motion. The proposed model accurately and reliably describes changes in the scene’s appearance caused by motion across frames. We show how to apply this color model to both foreground and background layers in a balanced way for efficient object segmentation in video. Experimental results show that when compared with previous approaches, our model provides more accurate foreground and background estimations, leading to more efficient video object cutout systems.

Xue Bai, Jue Wang, Guillermo Sapiro
What Is the Chance of Happening: A New Way to Predict Where People Look

Visual attention is an important issue in image and video analysis and remains an open problem in computer vision. Motivated by the famous Helmholtz principle, a new approach to visual attention analysis is proposed in this paper, based on the low-level feature statistics of natural images and a Bayesian framework. First, two priors, the Surrounding Feature Prior (SFP) and the Single Feature Probability Distribution (SFPD), are learned and integrated in a Bayesian framework to compute the chance of happening (CoH) of each pixel in an image. Then another prior, the Center Bias Prior (CBP), is learned and applied to the CoH to compute the saliency map of the image. The experimental results demonstrate that the proposed approach is both effective and efficient, providing more accurate and faster localization of visual attention. We make three major contributions in this paper: (1) a set of simple but powerful priors, SFP, SFPD and CBP, presented in an intuitive way; (2) a computational model of CoH based on the Bayesian framework that integrates SFP and SFPD; (3) a computationally plausible way to obtain the saliency map of natural images based on CoH and CBP.

Yezhou Yang, Mingli Song, Na Li, Jiajun Bu, Chun Chen
Supervised and Unsupervised Clustering with Probabilistic Shift

We present a novel scale-adaptive, nonparametric approach to clustering point patterns. Clusters are detected by moving all points to their cluster cores using shift vectors. First, we propose a novel scale selection criterion based on local density isotropy, which determines the neighborhoods over which the shift vectors are computed. We then construct a directed graph induced by these shift vectors. Clustering is obtained by simulating random walks on this digraph. We also examine the spectral properties of a similarity matrix obtained from the directed graph to obtain a K-way partitioning of the data. Additionally, we use the eigenvector alignment algorithm of [1] to automatically determine the number of clusters in the dataset. We also compare our approach with supervised [2] and completely unsupervised spectral clustering [1], normalized cuts [3], K-Means, and adaptive-bandwidth mean shift [4] on MNIST digits, USPS digits, and UCI machine learning data.
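The shift-vector idea can be illustrated with a fixed-bandwidth, Gaussian-weighted variant (the paper's contribution is precisely the adaptive, isotropy-based scale selection, which this sketch omits):

```python
import numpy as np

def shift_to_cores(points, bandwidth, n_iter=30):
    """Repeatedly move each point by a shift vector: the Gaussian-weighted
    mean of the original points in its neighborhood. Points belonging to
    the same cluster converge onto a common core."""
    pts = points.astype(float).copy()
    for _ in range(n_iter):
        d2 = ((pts[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        pts = (w @ points) / w.sum(axis=1, keepdims=True)
    return pts
```

Once points have collapsed onto their cores, reading off the clusters (here by proximity, in the paper by random walks on the shift-vector digraph) becomes trivial.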

Sanketh Shetty, Narendra Ahuja
Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery

Detecting objects, estimating their pose and recovering 3D shape information are critical problems in many vision and robotics applications. This paper addresses the above needs by proposing a new method called DEHV - Depth-Encoded Hough Voting detection scheme. Inspired by the Hough voting scheme introduced in [13], DEHV incorporates depth information into the process of learning distributions of image features (patches) representing an object category. DEHV takes advantage of the interplay between the scale of each object patch in the image and its distance (depth) from the corresponding physical patch attached to the 3D object. DEHV jointly detects objects, infers their categories, estimates their pose, and infers/decodes objects depth maps from either a single image (when no depth maps are available in testing) or a single image augmented with depth map (when this is available in testing). Extensive quantitative and qualitative experimental analysis on existing datasets [6,9,22] and a newly proposed 3D table-top object category dataset shows that our DEHV scheme obtains competitive detection and pose estimation results as well as convincing 3D shape reconstruction from just one single uncalibrated image. Finally, we demonstrate that our technique can be successfully employed as a key building block in two application scenarios (highly accurate 6 degrees of freedom (6 DOF) pose estimation and 3D object modeling).

Min Sun, Gary Bradski, Bing-Xin Xu, Silvio Savarese
Shape Analysis of Planar Objects with Arbitrary Topologies Using Conformal Geometry

The study of 2D shapes is a central problem in the field of computer vision. In 2D shape analysis, classification and recognition of objects from their observed silhouettes are extremely crucial and yet difficult. It usually involves an efficient representation of 2D shape space with a natural metric, so that its mathematical structure can be used for further analysis. Although significant progress has been made on the study of 2D simply-connected shapes, little work has been done on the study of 2D objects with arbitrary topologies. In this work, we propose a representation of general 2D domains with arbitrary topologies using conformal geometry. A natural metric can be defined on the proposed representation space, which gives a metric to measure dissimilarities between objects. The main idea is to map the exterior and interior of the domain conformally to unit disks and circle domains, using holomorphic 1-forms. A set of diffeomorphisms from the unit circle $\mathbb{S}^1$ to itself can be obtained, which together with the conformal modules are used to define the shape signature. We prove mathematically that our proposed signature uniquely represents shapes with arbitrary topologies. We also introduce a reconstruction algorithm to obtain shapes from their signatures. This completes our framework and allows us to move back and forth between shapes and signatures. Experiments show the efficacy of our proposed algorithm as a stable shape representation scheme.

Lok Ming Lui, Wei Zeng, Shing-Tung Yau, Xianfeng Gu
A Coarse-to-Fine Taxonomy of Constellations for Fast Multi-class Object Detection

In order for recognition systems to scale to a larger number of object categories, building visual class taxonomies is important to achieve running times logarithmic in the number of classes [1,2]. In this paper we propose a novel approach for speeding up recognition times of multi-class part-based object representations. The main idea is to construct a taxonomy of constellation models cascaded from coarse to fine resolution and to use it in recognition with an efficient search strategy. The taxonomy is built automatically in a way that minimizes the number of expected computations during recognition by optimizing the cost-to-power ratio [3]. The structure and the depth of the taxonomy are not pre-determined but are inferred from the data. The approach is applied to the hierarchy-of-parts model [4], achieving efficiency both in the representation of the structure of objects and in the number of modeled object classes. We achieve a speed-up even for a small number of object classes on the ETHZ and TUD datasets. On a larger scale, our approach achieves detection time that is logarithmic in the number of classes.

Sanja Fidler, Marko Boben, Aleš Leonardis
Object Classification Using Heterogeneous Co-occurrence Features

Co-occurrence features are effective for object classification because observing the co-occurrence of two events is far more informative than observing the occurrence of each event separately. For example, a color co-occurrence histogram captures co-occurrences of pairs of colors at a given distance, while a color histogram just expresses the frequency of each color. As one such co-occurrence feature, CoHOG (co-occurrence histograms of oriented gradients) has been proposed, and a method using CoHOG with a linear classifier has shown performance comparable to state-of-the-art pedestrian detection methods. Recent studies suggest that combining heterogeneous features such as texture, shape, and color is useful for object classification. We therefore introduce three heterogeneous co-occurrence features, called color-CoHOG, CoHED, and CoHD. Each heterogeneous feature is evaluated on the INRIA person dataset and the Oxford 17/102 category flower datasets. The experimental results show that color-CoHOG is effective for the INRIA person dataset and CoHED is effective for the Oxford flower datasets. By combining the above heterogeneous features, the proposed method achieves classification performance comparable to state-of-the-art methods on these datasets. The results suggest that the proposed method using heterogeneous features can be used as an off-the-shelf method for various object classification tasks.

Satoshi Ito, Susumu Kubota
Converting Level Set Gradients to Shape Gradients

The level set representation of shapes is useful for shape evolution and is widely used for the minimization of energies with respect to shapes. Many algorithms consider energies depending explicitly on the signed distance function (SDF) associated with a shape, and differentiate these energies with respect to the SDF directly in order to make the level set representation evolve. This framework is known as the “variational level set method”. We show that this gradient computation is actually mathematically incorrect, and can lead to undesirable performance in practice. Instead, we derive the expression of the gradient with respect to the shape, and show that it can be easily computed from the gradient of the energy with respect to the SDF. We discuss some problematic gradients from the literature, show how they can easily be fixed, and provide experimental comparisons illustrating the improvement.
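The discrepancy the abstract describes can be seen in a standard shape-calculus identity (a textbook sketch, not the paper's specific derivation): for a region-integral energy, the naive level set gradient and the true shape derivative take different forms.

```latex
E(\Omega) \;=\; \int_{\Omega} f(x)\,dx
        \;=\; \int_{D} H\!\big(-\phi(x)\big)\,f(x)\,dx,
\qquad \phi \text{ the SDF of } \Omega .

% Naive variational level set gradient (phi treated as a free function):
\frac{\delta E}{\delta \phi} \;=\; -\,\delta(\phi)\,f .

% Shape derivative under a normal boundary velocity v:
dE(\Omega; v) \;=\; \int_{\partial\Omega} f(x)\,v(x)\,ds .
```

Evolving $\phi$ directly by the naive gradient perturbs it only on the zero level set and does not preserve the signed distance property; converting the level set gradient into a proper shape gradient, as the paper advocates, avoids this inconsistency.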

Siqi Chen, Guillaume Charpiat, Richard J. Radke
A Close-Form Iterative Algorithm for Depth Inferring from a Single Image

Inferring depth from a single image is a difficult task in computer vision that needs to utilize the monocular cues contained in the image. Inspired by Saxena et al.’s work, this paper presents a close-form iterative algorithm that alternates between multi-scale image segmentation and depth inference, which can significantly improve both segmentation and depth estimation results. First, an EM-based algorithm is applied to obtain an initial multi-scale image segmentation. Then, a multi-scale Markov random field (MRF) model, trained by supervised learning, is used to infer both the depths and the relations between depths at different image regions. Next, a graph-based region merging algorithm is applied to merge the segmentations at the larger scales by incorporating the inferred depths. Finally, the refined multi-scale image segmentations are used as input to the MRF model and the depths are re-inferred. These steps are iterated until the expected results are achieved. Since the segmentation at the finest scale does not change during the iterations, the method can still capture detailed 3D structure, while the refined segmentations at the other scales help recover more global structure in the image. Comparative experiments verify the validity of our method: it infers quantitatively better depth estimates for 62.7% of 134 images downloaded from Saxena’s database. Our method also improves image segmentation results in the sense of scene interpretation. Moreover, the paper extends the method to estimate the depth of scenes with foreground objects.

Yang Cao, Yan Xia, Zengfu Wang
Learning Shape Segmentation Using Constrained Spectral Clustering and Probabilistic Label Transfer

We propose a spectral learning approach to shape segmentation. The method is composed of a constrained spectral clustering algorithm that is used to supervise the segmentation of a shape from a training data set, followed by a probabilistic label transfer algorithm that is used to match two shapes and to transfer cluster labels from a training shape to a test shape. The novelty resides both in the use of the Laplacian embedding to propagate must-link and cannot-link constraints, and in the segmentation algorithm, which is based on a learn, align, transfer, and classify paradigm. We compare the results obtained with our method with those of other constrained spectral clustering methods, and we assess its performance based on ground-truth data.

Avinash Sharma, Etienne von Lavante, Radu Horaud
Weakly Supervised Shape Based Object Detection with Particle Filter

We describe an efficient approach to constructing shape models composed of contour parts with partially supervised learning. The proposed approach can easily transfer part structures to different object classes as long as they have similar shapes. The spatial layout between parts is described by a non-parametric density, which is more flexible and easier to learn than the commonly used Gaussian or other parametric distributions. We express object detection as state estimation inference executed in a novel Particle Filter (PF) framework with static observations, which is quite different from previous PF methods. Although the underlying graph structure of our model is a fully connected graph, the proposed PF algorithm efficiently linearizes it by exploring the conditional dependencies of the nodes representing contour parts. Experimental results demonstrate that the proposed approach not only yields very good detection results but also accurately locates the contours of target objects in cluttered images.

Xingwei Yang, Longin Jan Latecki
Geodesic Shape Retrieval via Optimal Mass Transport

This paper presents a new method for 2-D and 3-D shape retrieval based on geodesic signatures. These signatures are high-dimensional statistical distributions computed by extracting several features from the set of geodesic distance maps to each point. The resulting high-dimensional distributions are matched to perform retrieval using a fast approximate Wasserstein metric. This allows us to propose a unifying framework for the compact description of planar shapes and 3-D surfaces.
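Fast approximate Wasserstein metrics between high-dimensional distributions are commonly computed in sliced form: project both point sets onto random directions and average the closed-form 1-D Wasserstein distances. Whether this matches the authors' exact approximation is an assumption; the sketch only illustrates the general idea:

```python
import numpy as np

def wasserstein_1d(a, b):
    """Exact W1 distance between two equal-size 1-D samples: match the
    sorted samples in order and average the displacements."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    return float(np.mean(np.abs(a - b)))

def sliced_wasserstein(X, Y, n_proj=64, seed=0):
    """Approximate the distance between d-D point clouds by averaging
    1-D Wasserstein distances over random projection directions."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        total += wasserstein_1d(X @ theta, Y @ theta)
    return total / n_proj
```

The 1-D case needs only a sort, which is what makes this family of approximations fast enough for retrieval over large shape collections.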

Julien Rabin, Gabriel Peyré, Laurent D. Cohen

Spotlights and Posters R2

Image Segmentation with Topic Random Field

Recently, there has been increasing interest in applying aspect models (e.g., PLSA and LDA) to image segmentation. However, these models ignore spatial relationships among local topic labels in an image and suffer from information loss by representing each image feature with the index of its closest match in the codebook. In this paper, we propose the Topic Random Field (TRF) to tackle these two problems. Specifically, TRF defines a Markov Random Field over the hidden labels of an image to enforce spatial coherence between the topic labels of neighboring regions. Moreover, TRF utilizes a noise channel to model the generation of local image features, avoiding the off-line process of building a visual codebook. We provide details of variational inference and parameter learning for TRF. Experimental evaluations on three image data sets show that TRF achieves better segmentation performance.

Bin Zhao, Li Fei-Fei, Eric P. Xing
Backmatter
Metadata
Title: Computer Vision – ECCV 2010
Editors: Kostas Daniilidis, Petros Maragos, Nikos Paragios
Copyright Year: 2010
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-15555-0
Print ISBN: 978-3-642-15554-3
DOI: https://doi.org/10.1007/978-3-642-15555-0
