2017 | Book

Computer Vision – ACCV 2016

13th Asian Conference on Computer Vision, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part I

About this book

The five-volume set LNCS 10111-10115 constitutes the thoroughly refereed post-conference proceedings of the 13th Asian Conference on Computer Vision, ACCV 2016, held in Taipei, Taiwan, in November 2016.

The total of 143 contributions presented in these volumes was carefully reviewed and selected from 479 submissions. The papers are organized in topical sections on Segmentation and Classification; Segmentation and Semantic Segmentation; Dictionary Learning, Retrieval, and Clustering; Deep Learning; People Tracking and Action Recognition; People and Actions; Faces; Computational Photography; Face and Gestures; Image Alignment; Computational Photography and Image Processing; Language and Video; 3D Computer Vision; Image Attributes, Language, and Recognition; Video Understanding; and 3D Vision.

Table of Contents

Frontmatter

Segmentation and Classification

Frontmatter
Realtime Hierarchical Clustering Based on Boundary and Surface Statistics
Abstract
Visual grouping is a key mechanism in human scene perception. There, it belongs to the subconscious, early processing and is a key prerequisite for other high-level tasks such as recognition. In this paper, we introduce an efficient, realtime-capable algorithm which likewise agglomerates a valuable hierarchical clustering of a scene, while using purely local appearance statistics.
To speed up the processing, we first subdivide the image into meaningful, atomic segments using a fast Watershed transform. Starting from there, our rapid, agglomerative clustering algorithm prunes and maintains the connectivity graph between clusters so that it contains only those pairs that directly touch in the image domain and are reciprocal nearest neighbors (RNN) with respect to a distance metric. The core of this approach is our novel cluster distance: it combines boundary and surface statistics both in terms of appearance as well as spatial linkage. This yields state-of-the-art performance, as we demonstrate in conclusive experiments conducted on BSDS500 and Pascal-Context datasets.
Dominik Alexander Klein, Dirk Schulz, Armin Bernd Cremers
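Illustrative sketch for the entry above: the abstract describes keeping only cluster pairs that touch in the image and are reciprocal nearest neighbors. The following minimal Python example shows one way such pairs could be found. The input layout (`adjacent`, `dist`), the function name, and the toy distances are assumptions made for illustration; the paper's actual cluster distance and data structures are not given in the abstract.

```python
def reciprocal_nearest_neighbors(dist, adjacent):
    """Return sorted pairs (i, j), i < j, that touch and are mutual nearest neighbors.

    dist:     dict mapping frozenset({i, j}) -> distance between touching clusters
              (hypothetical input; any symmetric cluster distance would do)
    adjacent: dict mapping cluster id -> set of ids of touching clusters
    """
    # For every cluster, find its nearest touching neighbor (ties go to the smaller id).
    nearest = {}
    for i, neighbors in adjacent.items():
        if neighbors:
            nearest[i] = min(neighbors, key=lambda j: (dist[frozenset((i, j))], j))

    # A pair is reciprocal when each cluster is the other's nearest neighbor.
    pairs = []
    for i, j in nearest.items():
        if i < j and nearest.get(j) == i:
            pairs.append((i, j))
    return sorted(pairs)


# Toy chain of four segments: 0 - 1 - 2 - 3
adjacent = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
dist = {
    frozenset((0, 1)): 0.2,
    frozenset((1, 2)): 0.9,
    frozenset((2, 3)): 0.1,
}
pairs = reciprocal_nearest_neighbors(dist, adjacent)
print(pairs)
```

In an agglomerative loop, each reciprocal pair would be merged and the adjacency and distances updated before the next round; that update step is omitted here.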
Weakly-Supervised Video Scene Co-parsing
Abstract
In this paper, we propose a scene co-parsing framework to assign pixel-wise semantic labels in weakly-labeled videos, i.e., only video-level category labels are given. To exploit rich semantic information, we first collect all videos that share the same video-level labels and segment them into supervoxels. We then select representative supervoxels for each category via a supervoxel ranking process. This ranking problem is formulated with a submodular objective function and a scene-object classifier is incorporated to distinguish scenes and objects. To assign each supervoxel a semantic label, we match each supervoxel to these selected representatives in the feature domain. Each supervoxel is then associated with a series of category potentials and assigned to a semantic label with the maximum one. The proposed co-parsing framework extends scene parsing from single images to videos and exploits mutual information among a video collection. Experimental results on the Wild-8 and SUNY-24 datasets show that the proposed algorithm performs favorably against the state-of-the-art approaches.
Guangyu Zhong, Yi-Hsuan Tsai, Ming-Hsuan Yang
Supervoxel-Based Segmentation of 3D Volumetric Images
Abstract
While computer vision has made noticeable advances in the state of the art for 2D image segmentation, the same cannot be said for 3D volumetric datasets. In this work, we present a scalable approach to volumetric segmentation. The methodology, driven by supervoxel extraction, combines local and global gradient-based features together to first produce a low level supervoxel graph. Subsequently, an agglomerative approach is used to group supervoxel structures into a segmentation hierarchy with explicitly imposed containment of lower level supervoxels in higher level supervoxels. Comparisons are conducted against state of the art 3D segmentation algorithms. The considered applications are 3D spatial and 2D spatiotemporal segmentation scenarios.
Chengliang Yang, Manu Sethi, Anand Rangarajan, Sanjay Ranka
Message Passing on the Two-Layer Network for Geometric Model Fitting
Abstract
In this paper, we propose a novel model fitting method to recover multiple geometric structures from data corrupted by noise and outliers. Instead of analyzing each model hypothesis or each data point separately, the proposed method combines both the consensus information in all model hypotheses and the preference information in all data points into a two-layer network, in which the vertices in the first layer represent the data points and the vertices in the second layer represent the model hypotheses. Based on this formulation, the clusters in the second layer of the network, corresponding to the true structures, are detected by using an effective Two-Stage Message Passing (TSMP) algorithm. TSMP can not only accurately detect multiple structures in data without specifying the number of structures, but also handle data even with a large number of outliers. Experimental results on both synthetic data and real images further demonstrate the superiority of the proposed method over several state-of-the-art fitting methods.
Xing Wang, Guobao Xiao, Yan Yan, Hanzi Wang
Deep Supervised Hashing with Triplet Labels
Abstract
Hashing is one of the most popular and powerful approximate nearest neighbor search techniques for large-scale image retrieval. Most traditional hashing methods first represent images as off-the-shelf visual features and then produce hashing codes in a separate stage. However, off-the-shelf visual features may not be optimally compatible with the hash code learning procedure, which may result in sub-optimal hash codes. Recently, deep hashing methods have been proposed to simultaneously learn image features and hash codes using deep neural networks and have shown superior performance over traditional hashing methods. Most deep hashing methods are given supervised information in the form of pairwise labels or triplet labels. The current state-of-the-art deep hashing method DPSH [1], which is based on pairwise labels, performs image feature learning and hash code learning simultaneously by maximizing the likelihood of pairwise similarities. Inspired by DPSH [1], we propose a triplet label based deep hashing method which aims to maximize the likelihood of the given triplet labels. Experimental results show that our method outperforms all the baselines on CIFAR-10 and NUS-WIDE datasets, including the state-of-the-art method DPSH [1] and all the previous triplet label based deep hashing methods.
Xiaofang Wang, Yi Shi, Kris M. Kitani
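Illustrative sketch for the entry above: the abstract speaks of maximizing the likelihood of triplet labels. One common way to model such a likelihood is a logistic function of the difference between query-positive and query-negative similarities. The example below assumes that form, with a half inner product as similarity and a hypothetical `margin` parameter; the abstract does not state the paper's exact formulation, so this is a sketch of the general idea only.

```python
import math


def half_inner(u, v):
    """Similarity used in this sketch: half the inner product of two code vectors."""
    return 0.5 * sum(a * b for a, b in zip(u, v))


def triplet_nll(query, positive, negative, margin=0.0):
    """Negative log-likelihood that `query` is closer to `positive` than to `negative`.

    Assumed model: P(triplet) = sigmoid(sim(q, p) - sim(q, n) - margin).
    """
    z = half_inner(query, positive) - half_inner(query, negative) - margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), evaluated in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


q = [1.0, 1.0, -1.0, -1.0]
p = [1.0, 1.0, -1.0, -1.0]   # same code as the query
n = [-1.0, -1.0, 1.0, 1.0]   # opposite code

good = triplet_nll(q, p, n)  # triplet respected: small loss
bad = triplet_nll(q, n, p)   # triplet violated: large loss
print(good < bad)
```

Minimizing the sum of this quantity over all training triplets is equivalent to maximizing the assumed triplet likelihood.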
Boosting Zero-Shot Image Classification via Pairwise Relationship Learning
Abstract
Zero-shot image classification (ZSIC) is one of the emerging challenges in the communities of computer vision, artificial intelligence and machine learning. In this paper, we propose to exploit the pairwise relationships between test instances to increase the performance of conventional methods, e.g. direct attribute prediction (DAP), for the ZSIC problem. To infer pairwise relationships between test instances, we introduce two different methods, a binary classification based method and a metric learning based method. Based on the inferred relationships, we construct a similarity graph to represent test instances, and then employ an adaptive graph anchors voting method to refine the results of DAP iteratively: In each iteration, we partition the similarity graph with the normalized spectral clustering method, and determine the class label of each cluster via the voting of graph anchors. Extensive experiments validate the effectiveness of our method: with the properly learned pairwise relationships, we successfully boost the mean class accuracy of DAP on two standard benchmarks for the ZSIC problem, Animals with Attributes and aPascal-aYahoo, from \(57.46\%\) to \(84.43\%\) and \(26.59\%\) to \(70.09\%\), respectively. Besides, experimental results on the SUN Attribute also suggest our method can obtain considerable performance improvement for the large-scale ZSIC problem.
Hanhui Li, Hefeng Wu, Shujin Lin, Liang Lin, Xiaonan Luo, Ebroul Izquierdo

Segmentation and Semantic Segmentation

Frontmatter
Hierarchical Supervoxel Graph for Interactive Video Object Representation and Segmentation
Abstract
In this paper, we study the problem of how to represent and segment objects in a video. To handle the motion and variations of the internal regions of objects, we present an interactive hierarchical supervoxel representation for video object segmentation. First, a hierarchical supervoxel graph with various granularities is built based on local clustering and region merging to represent the video, in which both color histogram and motion information are leveraged in the feature space, and visual saliency is also taken into account as merging guidance to build the graph. Then, a supervoxel selection algorithm is introduced to choose supervoxels with diverse granularities to represent the object(s) labeled by the user. Finally, based on the above representations, an interactive video object segmentation framework is proposed to handle complex and diverse scenes with large motion and occlusions. The experimental results show the effectiveness of the proposed algorithms in supervoxel graph construction and video object segmentation.
Xiang Fu, Changhu Wang, C.-C. Jay Kuo
Learning to Generate Object Segment Proposals with Multi-modal Cues
Abstract
This paper presents a learning-based object segmentation proposal generation method for stereo images. Unlike existing methods which mostly rely on low-level appearance cues and handcrafted similarity functions to group segments, our method makes use of learned deep features and designed geometric features to represent a region, as well as a learned similarity network to guide the grouping process. Given an initial segmentation hierarchy, we sequentially merge adjacent regions in each level based on their affinity measured by the similarity network. This merging process generates new segmentation hierarchies, which are then used to produce a pool of regional proposals by taking region singletons, pairs, triplets and 4-tuples from them. In addition, we learn a ranking network that predicts the objectness score of each regional proposal and diversify the ranking based on Maximum Marginal Relevance measures. Experiments on the Cityscapes dataset show that our approach performs significantly better than the baseline and the current state-of-the-art.
Haoyang Zhang, Xuming He, Fatih Porikli
Saliency Detection via Diversity-Induced Multi-view Matrix Decomposition
Abstract
In this paper, a diversity-induced multi-view matrix decomposition model (DMMD) for salient object detection is proposed. In order to make the background cleaner, the Schatten-\(p\) norm with an appropriate value of \(p \in (0,1]\) is used to constrain the background part. A group sparsity induced norm is imposed on the foreground (salient part) to describe potential spatial relationships of patches. Most importantly, a diversity-induced multi-view regularization based on the Hilbert-Schmidt Independence Criterion (HSIC) is employed to explore the complementary information of different features, which enhances the independence between the multiple features. The optimization problem can be solved through an augmented Lagrange multipliers method. Finally, high-level priors are merged to boost salient region detection. Experiments on the widely used MSRA-5000 dataset show that the DMMD model outperforms other state-of-the-art methods.
Xiaoli Sun, Zhixiang He, Xiujun Zhang, Wenbin Zou, George Baciu
Parallel Accelerated Matting Method Based on Local Learning
Abstract
Pursuing an effective and fast matting method is of great importance in digital image editing. This paper proposes a scheme to accelerate learning-based digital matting and implement it in parallel on a modern GPU; the scheme involves a learning stage and a solving stage. First, we present a GPU-based method to accelerate the pixel-wise learning stage. Then, a trimap-skeleton-based algorithm is proposed to divide the image into blocks and process the blocks in parallel to speed up the solving stage. Experimental results demonstrate that the proposed scheme achieves a maximal 12+ speedup over previous serial methods without degrading segmentation precision.
Xiaoqiang Li, Qing Cui
Semi-supervised Domain Adaptation for Weakly Labeled Semantic Video Object Segmentation
Abstract
Deep convolutional neural networks (CNNs) have been immensely successful in many high-level computer vision tasks given large labelled datasets. However, for video semantic object segmentation, a domain where labels are scarce, effectively exploiting the representation power of CNNs with limited training data remains a challenge. Simply borrowing an existing pre-trained CNN image recognition model for the video segmentation task can severely hurt performance. We propose a semi-supervised approach to adapting a CNN image recognition model trained on labelled image data to the target domain, exploiting both the semantic evidence learned from the CNN and the intrinsic structures of video data. By explicitly modelling and compensating for the domain shift from the source domain to the target domain, the proposed approach underpins a semantic object segmentation method that is robust against changes in appearance, shape and occlusion in natural videos. We present extensive experiments on challenging datasets that demonstrate the superior performance of our approach compared with the state-of-the-art methods.
Huiling Wang, Tapani Raiko, Lasse Lensu, Tinghuai Wang, Juha Karhunen
Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks
Abstract
This work investigates the use of deep fully convolutional neural networks (DFCNN) for pixel-wise scene labeling of Earth Observation images. Specifically, we train a variant of the SegNet architecture on remote sensing data over an urban area and study different strategies for performing accurate semantic segmentation. Our contributions are the following: (1) we efficiently transfer a DFCNN from generic everyday images to remote sensing images; (2) we introduce a multi-kernel convolutional layer for fast aggregation of predictions at multiple scales; (3) we perform data fusion from heterogeneous sensors (optical and laser) using residual correction. Our framework improves state-of-the-art accuracy on the ISPRS Vaihingen 2D Semantic Labeling dataset.
Nicolas Audebert, Bertrand Le Saux, Sébastien Lefèvre
Object Boundary Guided Semantic Segmentation
Abstract
Semantic segmentation is critical to image content understanding and object localization. Recent development in fully-convolutional neural network (FCN) has enabled accurate pixel-level labeling. One issue in previous works is that the FCN based method does not exploit the object boundary information to delineate segmentation details since the object boundary label is ignored in the network training. To tackle this problem, we introduce a double branch fully convolutional neural network, which separates the learning of the desirable semantic class labeling with mask-level object proposals guided by relabeled boundaries. This network, called object boundary guided FCN (OBG-FCN), is able to integrate the distinct properties of object shape and class features elegantly in a fully convolutional way with a designed masking architecture. We conduct experiments on the PASCAL VOC segmentation benchmark, and show that the end-to-end trainable OBG-FCN system offers great improvement in optimizing the target semantic segmentation quality.
Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu, C.-C. Jay Kuo
FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture
Abstract
In this paper we address the problem of semantic labeling of indoor scenes on RGB-D data. With the availability of RGB-D cameras, it is expected that additional depth measurements will improve the accuracy. Here we investigate how to incorporate complementary depth information into a semantic segmentation framework by making use of convolutional neural networks (CNNs). Recently, encoder-decoder type fully convolutional CNN architectures have achieved great success in the field of semantic segmentation. Motivated by this observation, we propose an encoder-decoder type network, where the encoder part is composed of two branches of networks that simultaneously extract features from RGB and depth images and fuse depth features into the RGB feature maps as the network goes deeper. Comprehensive experimental evaluations demonstrate that the proposed fusion-based architecture achieves competitive results with the state-of-the-art methods on the challenging SUN RGB-D benchmark, obtaining 76.27% global accuracy, 48.30% average class accuracy and 37.29% average intersection-over-union score.
Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers
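Illustrative sketch for the entry above: the abstract reports global accuracy, average class accuracy and average intersection-over-union. The example below computes these three scores from a confusion matrix using their standard definitions. The function name and the toy matrix are assumptions for illustration; details of the benchmark's protocol (for example, how unlabeled pixels are treated) are not stated in the abstract and are not modeled here.

```python
def segmentation_metrics(conf):
    """Compute (global accuracy, mean class accuracy, mean IoU).

    conf[t][p] is the number of pixels with true class t predicted as class p.
    Classes that never occur (zero denominator) are skipped when averaging.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    diag = [conf[c][c] for c in range(n)]                         # correctly labeled pixels
    row = [sum(conf[c]) for c in range(n)]                        # pixels of true class c
    col = [sum(conf[r][c] for r in range(n)) for c in range(n)]   # pixels predicted as c

    global_acc = sum(diag) / total
    class_acc = [diag[c] / row[c] for c in range(n) if row[c] > 0]
    union = [row[c] + col[c] - diag[c] for c in range(n)]
    iou = [diag[c] / union[c] for c in range(n) if union[c] > 0]
    return global_acc, sum(class_acc) / len(class_acc), sum(iou) / len(iou)


# Toy two-class confusion matrix (rows: ground truth, columns: prediction).
conf = [
    [8, 2],
    [1, 9],
]
global_acc, mean_class_acc, mean_iou = segmentation_metrics(conf)
print(global_acc, mean_class_acc, mean_iou)
```

Mean IoU is typically the lowest of the three scores because it penalizes both false positives and false negatives for every class, which is consistent with the ordering of the figures quoted in the abstract.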
Point-Cut: Interactive Image Segmentation Using Point Supervision
Abstract
Interactive image segmentation is a fundamental task in many applications in graphics, image processing, and computational photography. Many leading methods formulate elaborate energy functionals, achieving high performance while reflecting the user's intention. However, they show limitations in practical usage, since the user interaction needed to obtain segments is labor intensive. We present an interactive segmentation method to handle this problem. Our approach, called point cut, requires only minimal point supervision. To this end, we use off-the-shelf object proposal methods that generate object candidates with high recall. With the single point supervision, foreground appearance can be estimated with high accuracy, and then integrated into a graph cut optimization to generate binary segments. Intensive experiments show that our approach outperforms existing methods for interactive object segmentation both qualitatively and quantitatively.
Changjae Oh, Bumsub Ham, Kwanghoon Sohn
A Holistic Approach for Data-Driven Object Cutout
Abstract
Object cutout is a fundamental operation for image editing and manipulation, yet it is extremely challenging to automate it in real-world images, which typically contain considerable background clutter. In contrast to existing cutout methods, which are based mainly on low-level image analysis, we propose a more holistic approach, which considers the entire shape of the object of interest by leveraging higher-level image analysis and learnt global shape priors. Specifically, we leverage a deep neural network (DNN) trained for objects of a particular class (chairs) for realizing this mechanism. Given a rectangular image region, the DNN outputs a probability map (P-map) that indicates for each pixel inside the rectangle how likely it is to be contained inside an object from the class of interest. We show that the resulting P-maps may be used to evaluate how likely a rectangle proposal is to contain an instance of the class, and further process good proposals to produce an accurate object cutout mask. This amounts to an automatic end-to-end pipeline for category-specific object cutout. We evaluate our approach on segmentation benchmark datasets, and show that it significantly outperforms the state-of-the-art on them.
Huayong Xu, Yangyan Li, Wenzheng Chen, Dani Lischinski, Daniel Cohen-Or, Baoquan Chen
Interactive Segmentation from 1-Bit Feedback
Abstract
This paper presents an efficient algorithm for interactive image segmentation that responds to 1-bit user feedback. The goal of this type of segmentation is to propose a sequence of yes-or-no questions to the user. Then, according to the 1-bit answers from the user, the segmentation algorithm progressively revises the questions and the segments, so that the segmentation result can approach the ideal region of interest (ROI) in the mind of the user. We define a question as the event of whether a chosen superpixel hits the ROI. In general, an interactive image segmentation algorithm should achieve high segmentation accuracy, low response time, and simple manipulation. We fulfill these demands by designing an efficient interactive segmentation algorithm from 1-bit user feedback. Our algorithm employs techniques from over-segmentation, entropy calculation, and transductive inference. Over-segmentation reduces the solution set of questions and the computational costs of transductive inference. Entropy calculation provides a way to characterize the query order of superpixels. Transductive inference is used to estimate the similarity between superpixels and to partition the superpixels into ROI and region of uninterest (ROU). Following the clues from the similarity between superpixels, we design the query-superpixel selection mechanism for human-machine interaction. Our key idea is to narrow down the solution set of questions, and then to propose the most informative question based on the clues of the similarities among the superpixels. We assess our method on four publicly available datasets. The experiments demonstrate that our method provides a plausible solution to the problem of interactive image segmentation with merely 1-bit user feedback.
Ding-Jie Chen, Hwann-Tzong Chen, Long-Wen Chang
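Illustrative sketch for the entry above: the abstract says entropy is used to characterize the query order of superpixels, so that the most informative yes-or-no question is asked next. A simple reading of that idea is to query the superpixel whose ROI membership is most uncertain, i.e. has the highest binary entropy. The probabilities, names and selection rule below are assumptions for illustration; how the paper combines entropy with transductive inference is not described in the abstract.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no outcome with probability p of 'yes'."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def most_informative(roi_probability):
    """Pick the superpixel whose ROI membership is most uncertain.

    roi_probability: dict mapping superpixel id -> estimated probability
                     that the superpixel lies inside the ROI (hypothetical input).
    """
    return max(roi_probability, key=lambda sp: binary_entropy(roi_probability[sp]))


roi_probability = {"a": 0.95, "b": 0.55, "c": 0.10}
query = most_informative(roi_probability)
print(query)
```

After the user answers, the probability for the queried superpixel would be set to 0 or 1 and the remaining estimates re-inferred before the next question is selected.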
Geodesic Distance Histogram Feature for Video Segmentation
Abstract
This paper proposes a geodesic-distance-based feature that encodes global information for improved video segmentation algorithms. The feature is a joint histogram of intensity and geodesic distances, where the geodesic distances are computed as the shortest paths between superpixels via their boundaries. We also incorporate adaptive voting weights and spatial pyramid configurations to include spatial information into the geodesic histogram feature and show that this further improves results. The feature is generic and can be used as part of various algorithms. In experiments, we test the geodesic histogram feature by incorporating it into two existing video segmentation frameworks. This leads to significantly better performance in 3D video segmentation benchmarks on two datasets.
Hieu Le, Vu Nguyen, Chen-Ping Yu, Dimitris Samaras
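Illustrative sketch for the entry above: the abstract describes geodesic distances computed as shortest paths between superpixels and a joint histogram of intensity and geodesic distance. The example below uses Dijkstra's algorithm on a small superpixel adjacency graph and then bins each superpixel by intensity and distance. The graph, the edge weights (standing in for boundary strength), the bin edges and all function names are assumptions for illustration; the adaptive voting weights and spatial pyramids mentioned in the abstract are not modeled.

```python
import heapq
from bisect import bisect_right


def geodesic_distances(graph, source):
    """Shortest-path distance from `source` to every reachable superpixel.

    graph: dict mapping node -> dict of neighbor -> non-negative edge weight.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def joint_histogram(distances, intensity, dist_edges, int_edges):
    """Count superpixels per (intensity bin, distance bin)."""
    hist = [[0] * (len(dist_edges) + 1) for _ in range(len(int_edges) + 1)]
    for node, d in distances.items():
        i_bin = bisect_right(int_edges, intensity[node])
        d_bin = bisect_right(dist_edges, d)
        hist[i_bin][d_bin] += 1
    return hist


# Toy superpixel graph; weights play the role of boundary strength.
graph = {
    0: {1: 1.0, 2: 4.0},
    1: {0: 1.0, 2: 1.5},
    2: {0: 4.0, 1: 1.5, 3: 0.5},
    3: {2: 0.5},
}
intensity = {0: 0.1, 1: 0.8, 2: 0.2, 3: 0.9}

dist = geodesic_distances(graph, source=0)
hist = joint_histogram(dist, intensity, dist_edges=[2.0], int_edges=[0.5])
print(dist, hist)
```

Computing one such histogram per superpixel (using each superpixel in turn as the source) would give a per-superpixel descriptor that reflects the global layout of the frame.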
HF-FCN: Hierarchically Fused Fully Convolutional Network for Robust Building Extraction
Abstract
Automatic building extraction from remote sensing images plays an important role in a diverse range of applications. However, it is significantly challenging to extract arbitrary-size buildings with largely variant appearances or occlusions. In this paper, we propose a robust system employing a novel hierarchically fused fully convolutional network (HF-FCN), which effectively integrates the information generated from a group of neurons with multi-scale receptive fields. Our architecture takes an aerial image as the input without warping or cropping it and directly generates the building map. Experimental results on a public aerial imagery dataset demonstrate that our method surpasses state-of-the-art methods in building detection accuracy and significantly reduces the time cost.
Tongchun Zuo, Juntao Feng, Xuejin Chen

Dictionary Learning, Retrieval, and Clustering

Frontmatter
Dictionary Reduction: Automatic Compact Dictionary Learning for Classification
Abstract
A complete and discriminative dictionary can achieve superior performance. However, it also consumes extra processing time and memory, especially for large datasets. Most existing compact dictionary learning methods need to set the dictionary size manually; therefore, an appropriate dictionary size is usually obtained by exhaustive search. How to automatically learn a compact dictionary with high fidelity is still an open challenge. We propose an automatic compact dictionary learning (ACDL) method which can guarantee a more compact and discriminative dictionary while at the same time maintaining state-of-the-art classification performance. We incorporate two innovative components in the formulation of the dictionary learning algorithm. First, an indicator function is introduced that automatically removes highly correlated dictionary atoms with weak discrimination capacity. Second, two additional constraints, namely the sum-to-one and the non-negative constraints, are imposed on the sparse coefficients. On one hand, this achieves the same functionality as the \(L_2\)-normalization on the raw data to maintain a stable sparsity threshold. On the other hand, this effectively preserves the geometric structure of the raw data which would be otherwise destroyed by the \(L_2\)-normalization. Extensive evaluations have shown that the preservation of the geometric structure of the raw data plays an important role in achieving high classification performance with the smallest dictionary size. Experimental results on four recognition problems demonstrate that the proposed ACDL can achieve competitive classification performance using a drastically reduced dictionary (https://github.com/susanqq/ACDL.git).
Yang Song, Zhifei Zhang, Liu Liu, Alireza Rahimpour, Hairong Qi
A Vote-and-Verify Strategy for Fast Spatial Verification in Image Retrieval
Abstract
Spatial verification is a crucial part of every image retrieval system, as it accounts for the fact that geometric feature configurations are typically ignored by the Bag-of-Words representation. Since spatial verification quickly becomes the bottleneck of the retrieval process, runtime efficiency is extremely important. At the same time, spatial verification should be able to reliably distinguish between related and unrelated images. While methods based on RANSAC’s hypothesize-and-verify framework achieve high accuracy, they are not particularly efficient. Conversely, verification approaches based on Hough voting are extremely efficient but not as accurate. In this paper, we develop a novel spatial verification approach that uses an efficient voting scheme to identify promising transformation hypotheses that are subsequently verified and refined. Through comprehensive experiments, we show that our method is able to achieve a verification accuracy similar to state-of-the-art hypothesize-and-verify approaches while providing faster runtimes than state-of-the-art voting-based methods.
Johannes L. Schönberger, True Price, Torsten Sattler, Jan-Michael Frahm, Marc Pollefeys
SSP: Supervised Sparse Projections for Large-Scale Retrieval in High Dimensions
Abstract
As “big data” transforms the way we solve computer vision problems, the question of how we can efficiently leverage large labelled databases becomes increasingly important. High-dimensional features, such as the convolutional neural network activations that drive many leading recognition frameworks, pose particular challenges for efficient retrieval. We present a novel method for learning compact binary codes in which the conventional dense projection matrix is replaced with a discriminatively-trained sparse projection matrix. The proposed method achieves two to three times faster encoding than modern dense binary encoding methods, while obtaining comparable retrieval accuracy, on SUN RGB-D, AwA, and ImageNet datasets. The method is also more accurate than unsupervised high-dimensional binary encoding methods at similar encoding speeds.
Frederick Tung, James J. Little
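Illustrative sketch for the entry above: the abstract describes compact binary codes obtained with a sparse projection matrix instead of a dense one. The example below shows only the encoding step, thresholding sparse projections at zero and comparing codes by Hamming distance. The hand-written weights, the row-wise dictionary storage and the function names are assumptions for illustration; the discriminative training that the paper uses to learn the matrix is not shown.

```python
def encode(x, sparse_rows):
    """Binary code of feature vector x under a sparse projection.

    sparse_rows: one dict per output bit, mapping feature index -> weight.
                 Only the non-zero weights are stored, so the cost per bit
                 is proportional to the number of non-zeros, not to len(x).
    """
    return [
        1 if sum(w * x[i] for i, w in row.items()) >= 0 else 0
        for row in sparse_rows
    ]


def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(p != q for p, q in zip(a, b))


# Hypothetical 3-bit projection over 4-dimensional features.
sparse_rows = [
    {0: 1.0, 2: -0.5},
    {1: 2.0},
    {2: 1.0, 3: 1.0},
]

code_a = encode([0.5, -1.0, 2.0, 0.0], sparse_rows)
code_b = encode([0.5, 1.0, 2.0, 0.0], sparse_rows)
print(code_a, code_b, hamming(code_a, code_b))
```

Because each bit touches only a few feature dimensions, encoding time scales with the number of stored weights, which is the source of the speedup over dense projections that the abstract reports.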
An Online Algorithm for Efficient and Temporally Consistent Subspace Clustering
Abstract
We present an online algorithm for the efficient clustering of data drawn from a union of arbitrary dimensional, non-static subspaces. Our algorithm is based on an online min-Mahalanobis distance classifier, which simultaneously clusters and is updated from subspace data. In contrast to most existing methods, our algorithm can cope with large amounts of batch or sequential data and is temporally consistent when dealing with time-varying data (i.e. time series). Starting from an initial condition, the classifier provides a first estimate of the subspace clusters in the current time-window. From this estimate, we update the classifier using stochastic gradient descent. The updated classifier is applied back onto the data to refine the subspace clusters, while at the same time we recover the explicit rotations that align the subspaces between time-windows. The whole procedure is repeated until convergence, resulting in a fast, efficient and accurate algorithm. We have tested our algorithm on synthetic and three real datasets and compared it with competing methods from the literature. Our results show that our algorithm outperforms the competition with superior clustering accuracy and computation speed.
Vasileios Zografos, Kai Krajsek, Bjoern Menze
Sparse Gradient Pursuit for Robust Visual Analysis
Abstract
Many high-dimensional data analysis problems, such as clustering and classification, usually involve the minimization of a Laplacian regularization, which is equivalent to minimizing the squared errors of the gradient on a graph, i.e., the disparity among the adjacent nodes in a data graph. However, the Laplacian criterion usually preserves the locally homogeneous data structure but suppresses the discrimination among samples across clusters, which accordingly leads to undesirable confusion among similar observations belonging to different clusters. In this paper, we propose a novel criterion, named Sparse Gradient Pursuit (SGP), to simultaneously preserve the within-class homogeneity and the between-class discrimination for unsupervised data clustering. In addition, we show that the proposed SGP criterion is generic and can be extended to handle semi-supervised learning problems by incorporating the label information into the data graph. Though this unified semi-supervised learning model leads to a nonconvex optimization problem, we develop a new numerical scheme for the SGP-related nonconvex optimization problem and analyze the convergence property of the proposed algorithm under mild conditions. Extensive experiments demonstrate that the proposed algorithm performs favorably against the state-of-the-art unsupervised and semi-supervised methods.
Jiangxin Dong, Risheng Liu, Kewei Tang, Yiyang Wang, Xindong Zhang, Zhixun Su
F-SORT: An Alternative for Faster Geometric Verification
Abstract
This paper presents a novel geometric verification approach coined Fast Sequence Order Re-sorting Technique (F-SORT), capable of rapidly validating matches between images under arbitrary viewing conditions. By using a fundamental framework of re-sorting image features into local sequence groups for geometric validation along different orientations, we simulate the enforcement of geometric constraints within each sequence group in various views and rotations. While conventional geometric verification (e.g. RANSAC) and state-of-the-art fully affine invariant image matching approaches (e.g. ASIFT) are high in computational cost, our approach is several times less computationally expensive. We evaluate F-SORT on the Stanford Mobile Visual Search (SMVS) and the Zurich Buildings (ZuBuD) image databases comprising a total of 9 image categories, and report competitive performance with respect to PROSAC, RANSAC and ASIFT. Out of the 9 categories, F-SORT outperforms PROSAC in 9 categories, RANSAC in 8 categories and ASIFT in 7 categories, with a significant reduction in computational cost of over nine-fold, thirty-fold and hundred-fold, respectively.
Jacob Chan, Jimmy Addison Lee, Kemao Qian
Clustering Symmetric Positive Definite Matrices on the Riemannian Manifolds
Abstract
Using structured features such as symmetric positive definite (SPD) matrices to encode visual information has been found to be effective in computer vision. Traditional pattern recognition methods developed in the Euclidean space are not suitable for directly processing SPD matrices because they lie in Riemannian manifolds of negative curvature. The main contribution of this paper is the development of a novel framework, termed Riemannian Competitive Learning (RCL), for SPD matrices clustering. In this framework, we introduce a conscious competition mechanism and develop a robust algorithm termed Riemannian Frequency Sensitive Competitive Learning (rFSCL). Compared with existing methods, rFSCL has three distinctive advantages. Firstly, rFSCL inherits the online nature of competitive learning, making it capable of handling very large data sets. Secondly, rFSCL inherits the advantage of conscious competitive learning, which means that it is less sensitive to the initial values of the cluster centers and that all clusters are fully utilized without the “dead unit” problem associated with many clustering algorithms. Thirdly, as an intrinsic Riemannian clustering method, rFSCL operates along the geodesic on the manifold and the algorithm is completely independent of the choice of local coordinate systems. Extensive experiments show its superior performance compared with other state-of-the-art SPD matrices clustering methods.
Ligang Zheng, Guoping Qiu, Jiwu Huang
Subspace Learning Based Low-Rank Representation
Abstract
Subspace segmentation has been a hot topic in the past decades. Recently, spectral-clustering-based methods have aroused broad interest; however, they usually consider the similarity extraction in the original space. In this paper, we propose subspace learning based low-rank representation to learn a subspace favoring the similarity extraction for the low-rank representation. The process of learning the subspace and achieving the representation is conducted simultaneously and thus they can benefit from each other. After extending the linear projection to a nonlinear mapping, our method can handle the manifold clustering problem, which is a general case of subspace segmentation. Moreover, our method can also be applied to the problem of recognition by adding a suitable penalty on the learned subspace. Extensive experimental results confirm the effectiveness of our method.
Kewei Tang, Xiaodong Liu, Zhixun Su, Wei Jiang, Jiangxin Dong
Backmatter
Metadata
Title
Computer Vision – ACCV 2016
Edited by
Shang-Hong Lai
Vincent Lepetit
Ko Nishino
Yoichi Sato
Copyright year
2017
Electronic ISBN
978-3-319-54181-5
Print ISBN
978-3-319-54180-8
DOI
https://doi.org/10.1007/978-3-319-54181-5