
2017 | Book

Visual Attributes


About this Book

This unique text/reference provides a detailed overview of the latest advances in machine learning and computer vision related to visual attributes, highlighting how this emerging field intersects with other disciplines, such as computational linguistics and human-machine interaction. Topics and features:

- presents attribute-based methods for zero-shot classification, learning using privileged information, and methods for multi-task attribute learning;
- describes the concept of relative attributes, and examines the effectiveness of modeling relative attributes in image search applications;
- reviews state-of-the-art methods for estimation of human attributes, and describes their use in a range of different applications;
- discusses attempts to build a vocabulary of visual attributes;
- explores the connections between visual attributes and natural language;
- provides contributions from an international selection of world-renowned scientists, covering both theoretical aspects and practical applications.

Table of Contents

Frontmatter
Chapter 1. Introduction to Visual Attributes
Abstract
This chapter serves as an introduction to the content of the book.
Rogerio Schmidt Feris, Christoph Lampert, Devi Parikh

Attribute-Based Recognition

Frontmatter
Chapter 2. An Embarrassingly Simple Approach to Zero-Shot Learning
Abstract
Zero-shot learning concerns learning how to recognise new classes from just a description of them. Many sophisticated approaches have been proposed to address the challenges this problem comprises. Here we describe a zero-shot learning approach that can be implemented in just one line of code, yet it is able to outperform state-of-the-art approaches on standard datasets. The approach is based on a more general framework which models the relationships between features, attributes, and classes as a network with two linear layers, where the weights of the top layer are not learned but are given by the environment. We further provide a learning bound on the generalisation error of such approaches by casting them as domain adaptation methods. In experiments carried out on three standard real datasets, we found that our approach is able to perform significantly better than the state of the art on all of them.
Bernardino Romera-Paredes, Philip H. S. Torr
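The "one line of code" referred to above is the closed-form minimiser of a regularised bilinear compatibility model linking features, attributes, and class signatures. Below is a minimal numpy sketch of one such closed form, consistent with the two-linear-layer framework the abstract describes; the matrix names and the two regularisation weights are illustrative choices, not notation taken from the chapter itself.

```python
import numpy as np

def eszsl_train(X, Y, S, gamma=1.0, lam=1.0):
    """Closed-form fit of the feature-to-attribute mapping V.

    X : (d, m) feature matrix for m training images
    Y : (m, z) ground-truth membership matrix over the z seen classes
    S : (a, z) attribute signatures, one column per seen class (fixed, not learned)
    """
    d, _ = X.shape
    a, _ = S.shape
    # The "one line": V = (X X^T + gamma I)^-1  X Y S^T  (S S^T + lam I)^-1
    return np.linalg.solve(X @ X.T + gamma * np.eye(d), X @ Y @ S.T) \
           @ np.linalg.inv(S @ S.T + lam * np.eye(a))

def eszsl_predict(V, x, S_unseen):
    # Score each unseen class by the compatibility x^T V s_c and pick the best.
    return np.argmax(x @ V @ S_unseen)
```

At test time only the attribute signatures of the unseen classes are needed, which is what makes the model zero-shot.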
Chapter 3. In the Era of Deep Convolutional Features: Are Attributes Still Useful Privileged Data?
Abstract
Our answer is, if used for challenging computer vision tasks, attributes are useful privileged data. We introduce a learning framework called learning using privileged information (LUPI) to the computer vision field to solve the object recognition task in images. We want computers to be able to learn more efficiently at the expense of providing extra information during training time. In this chapter, we focus on semantic attributes as a source of additional information about image data. This information is privileged with respect to the image data, as it is not available at test time. Recently, image features from deep convolutional neural networks (CNNs) have become primary candidates for many visual recognition tasks. We therefore analyze the usefulness of attributes as privileged information in the context of deep CNN features as the image representation. We explore two maximum-margin LUPI techniques and provide a kernelized version of them to handle nonlinear binary classification problems. We interpret LUPI methods as learning to identify easy and hard objects in the privileged space and transferring this knowledge to train a better classifier in the original data space. We provide a thorough analysis and comparison of information transfer from the privileged to the original data space for two maximum-margin LUPI methods and a recently proposed probabilistic LUPI method based on Gaussian processes. Our experiments show that in a typical recognition task such as deciding whether an object is “present” or “not present” in an image, attributes do not lead to improvement in the prediction performance when used as privileged information. In an ambiguous vision task such as determining how “easy” or “difficult” it is to spot an object in an image, we show that attribute representation is useful privileged information for deep CNN image features.
Viktoriia Sharmanska, Novi Quadrianto
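The abstract's reading of LUPI, identifying easy and hard examples in the privileged attribute space and transferring that knowledge to the image-feature classifier, can be illustrated with a short sketch. This is not the chapter's maximum-margin or Gaussian-process formulation; it is a simplified reweighting scheme built on scikit-learn, with the function name and the exponential weighting chosen purely for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def lupi_easy_hard_transfer(X_img, X_attr, y, C=1.0):
    """Sketch: use privileged attribute vectors (train-time only) to reweight
    training of a classifier that sees only deep CNN image features.

    X_img  : (n, d) image features, available at train and test time
    X_attr : (n, a) attribute vectors, privileged (train time only)
    y      : (n,) binary labels in {0, 1}
    """
    # 1. How separable is each example in the privileged attribute space?
    priv = LinearSVC(C=C).fit(X_attr, y)
    signed_margin = priv.decision_function(X_attr) * (2 * y - 1)

    # 2. Emphasise the examples the privileged model finds hard.
    weights = np.exp(-signed_margin)
    weights /= weights.mean()

    # 3. Train the deployable classifier on image features with those weights.
    return LinearSVC(C=C).fit(X_img, y, sample_weight=weights)
```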
Chapter 4. Divide, Share, and Conquer: Multi-task Attribute Learning with Selective Sharing
Abstract
Existing methods to learn visual attributes are plagued by two common issues: (i) they are prone to confusion by properties that are correlated with the attribute of interest among training samples and (ii) they often learn generic, imprecise “lowest common denominator” attribute models in an attempt to generalize across classes where a single attribute may have very different visual manifestations. Yet, many proposed applications of attributes rely on being able to learn the precise and correct semantic concept corresponding to each attribute. We argue that these issues are both largely due to indiscriminate “oversharing” amongst attribute classifiers along two axes—(i) visual features and (ii) classifier parameters. To address both these issues, we introduce the general idea of selective sharing during multi-task learning of attributes. First, we show how selective sharing helps learn decorrelated models for each attribute in a vocabulary. Second, we show how selective sharing permits a new form of transfer learning between attributes, yielding a specialized attribute model for each individual object category. We validate both these instantiations of our selective sharing idea through extensive experiments on multiple datasets. We show how they help preserve semantics in learned attribute models, benefitting various downstream applications such as image retrieval or zero-shot learning.
Chao-Yeh Chen, Dinesh Jayaraman, Fei Sha, Kristen Grauman

Relative Attributes and Their Application to Image Search

Frontmatter
Chapter 5. Attributes for Image Retrieval
Abstract
Image retrieval is a computer vision application that people encounter in their everyday lives. To enable accurate retrieval results, a human user needs to be able to communicate in a rich and noiseless way with the retrieval system. We propose semantic visual attributes as a communication channel for search because they are commonly used by humans to describe the world around them. We first propose a new feedback interaction where users can directly comment on how individual properties of retrieved content should be adjusted to more closely match the desired visual content. We then show how to ensure this interaction is as informative as possible, by having the vision system ask those questions that will most increase its certainty over what content is relevant. To ensure that attribute-based statements from the user are not misinterpreted by the system, we model the unique ways in which users employ attribute terms, and develop personalized attribute models. We discover clusters among users in terms of how they use a given attribute term, and consequently discover the distinct “shades of meaning” of these attributes. Our work is a significant step in the direction of bridging the semantic gap between high-level user intent and low-level visual features. We discuss extensions to further increase the utility of attributes for practical search applications.
Adriana Kovashka, Kristen Grauman
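As a rough illustration of the attribute-based feedback interaction described above, the sketch below re-ranks a database by counting how many relative-attribute constraints of the form "more/less <attribute> than reference image r" each image satisfies. The function and its inputs are hypothetical simplifications; the chapter's system additionally reasons about which question to ask next and about user-specific attribute models.

```python
import numpy as np

def rank_by_feedback(attr_scores, feedback):
    """attr_scores : dict attribute name -> (n,) predicted attribute strengths
    feedback      : list of (attribute, direction, ref_index), direction +1 = "more", -1 = "less"
    Returns database indices sorted by the number of satisfied constraints."""
    n = next(iter(attr_scores.values())).shape[0]
    satisfied = np.zeros(n)
    for attr, direction, ref in feedback:
        s = attr_scores[attr]
        satisfied += (direction * (s - s[ref]) > 0).astype(float)
    return np.argsort(-satisfied)

# e.g. "show me shoes more ornate than image 12 and less shiny than image 7":
# order = rank_by_feedback(attr_scores, [("ornate", +1, 12), ("shiny", -1, 7)])
```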
Chapter 6. Fine-Grained Comparisons with Attributes
Abstract
Given two images, we want to predict which exhibits a particular visual attribute more than the other—even when the two images are quite similar. For example, given two beach scenes, which looks more calm? Given two high-heeled shoes, which is more ornate? Existing relative attribute methods rely on global ranking functions. However, rarely will the visual cues relevant to a comparison be constant for all data, nor will humans’ perception of the attribute necessarily permit a global ordering. At the same time, not every image pair is even orderable for a given attribute. Attempting to map relative attribute ranks to “equality” predictions is nontrivial, particularly since the span of indistinguishable pairs in attribute space may vary in different parts of the feature space. To address these issues, we introduce local learning approaches for fine-grained visual comparisons, where a predictive model is trained on the fly using only the data most relevant to the novel input. In particular, given a novel pair of images, we develop local learning methods to (1) infer their relative attribute ordering with a ranking function trained using only analogous labeled image pairs, (2) infer the optimal “neighborhood,” i.e., the subset of the training instances most relevant for training a given local model, and (3) infer whether the pair is even distinguishable, based on a local model for just noticeable differences in attributes. Our methods outperform state-of-the-art methods for relative attribute prediction on challenging datasets, including a large newly curated shoe dataset for fine-grained comparisons. We find that for fine-grained comparisons, more labeled data is not necessarily preferable to isolating the right data.
Aron Yu, Kristen Grauman
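The local-learning idea, training a ranking function on the fly from only the labeled pairs most analogous to the novel input pair, can be sketched as follows. This is a bare-bones illustration with a fixed Euclidean notion of analogy and a linear rank-SVM surrogate; the chapter's methods also learn the neighborhood itself and handle indistinguishable ("just noticeable difference") pairs.

```python
import numpy as np
from sklearn.svm import LinearSVC

def local_compare(x1, x2, labeled_pairs, k=100, C=1.0):
    """labeled_pairs : list of (a, b) meaning "a shows the attribute more than b".
    Returns a score > 0 if x1 is predicted to show the attribute more than x2."""
    query = np.concatenate([x1, x2])
    # Analogy: distance between the query pair and each labeled pair (either ordering).
    dists = [min(np.linalg.norm(query - np.concatenate([a, b])),
                 np.linalg.norm(query - np.concatenate([b, a])))
             for a, b in labeled_pairs]
    nearest = np.argsort(dists)[:k]

    # Rank-SVM surrogate on difference vectors: (a - b) positive, (b - a) negative.
    diffs, labels = [], []
    for i in nearest:
        a, b = labeled_pairs[i]
        diffs.extend([a - b, b - a])
        labels.extend([1, 0])
    ranker = LinearSVC(C=C).fit(np.vstack(diffs), labels)
    return ranker.decision_function((x1 - x2).reshape(1, -1))[0]
```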
Chapter 7. Localizing and Visualizing Relative Attributes
Abstract
In this chapter, we present a weakly supervised approach that discovers the spatial extent of relative attributes, given only pairs of ordered images. In contrast to traditional approaches that use global appearance features or rely on keypoint detectors, our goal is to automatically discover the image regions that are relevant to the attribute, even when the attribute’s appearance changes drastically across its attribute spectrum. To accomplish this, we first develop a novel formulation that combines a detector with local smoothness to discover a set of coherent visual chains across the image collection. We then introduce an efficient way to generate additional chains anchored on the initial discovered ones. Finally, we automatically identify the visual chains that are most relevant to the attribute (those whose appearance has high correlation with attribute strength), and create an ensemble image representation to model the attribute. Through extensive experiments, we demonstrate our method’s promise relative to several baselines in modeling relative attributes.
Fanyi Xiao, Yong Jae Lee

Describing People Based on Attributes

Frontmatter
Chapter 8. Deep Learning Face Attributes for Detection and Alignment
Abstract
Describable face attributes are labels that can be given to a face image to describe its characteristics. Examples of face attributes include gender, age, ethnicity, face shape, and nose size. Predicting face attributes in the wild is challenging due to complex face variations. This chapter aims to provide an in-depth presentation of recent progress and the current state-of-the-art approaches to solving some of the fundamental challenges in face attribute recognition, particularly from the angle of deep learning. We highlight effective techniques for training deep convolutional networks to predict face attributes in the wild and for addressing the problem of the imbalanced distribution of attributes. In addition, we discuss the use of face attributes as rich contexts to facilitate accurate face detection and face alignment in return. The chapter ends by posing an open question for the face attribute recognition challenge arising from emerging and future applications.
Chen Change Loy, Ping Luo, Chen Huang
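One common way to handle the imbalanced attribute distributions mentioned above is to re-weight the per-attribute binary cross-entropy by label frequency. The PyTorch sketch below shows a multi-label attribute head plus such a weighted loss; it is a generic illustration of the imbalance problem, not the specific architecture or loss proposed in the chapter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeHead(nn.Module):
    """Multi-label head on a CNN backbone's feature vector: one logit per attribute."""
    def __init__(self, feat_dim, num_attributes):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_attributes)

    def forward(self, features):          # features: (batch, feat_dim)
        return self.fc(features)          # logits:   (batch, num_attributes)

def imbalance_weighted_loss(logits, targets, pos_frequency):
    """pos_frequency: (num_attributes,) fraction of positive training labels per attribute.
    Rare positives receive a proportionally larger weight."""
    pos_weight = (1.0 - pos_frequency) / pos_frequency.clamp(min=1e-6)
    return F.binary_cross_entropy_with_logits(logits, targets, pos_weight=pos_weight)
```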
Chapter 9. Visual Attributes for Fashion Analytics
Abstract
In this chapter, we describe methods that leverage clothing and facial attributes as mid-level features for fashion recommendation and retrieval. We introduce a system called Magic Closet for recommending clothing for different occasions, and a system called Beauty E-Expert for hairstyle and facial makeup recommendation. For fashion retrieval, we describe a cross-domain clothing retrieval system, which receives as input a user photo of a particular clothing item taken in unconstrained conditions, and retrieves the exact same or similar item from online shopping catalogs. In each of these systems, we show the value of attribute-guided learning and describe approaches to transfer semantic concepts from large-scale uncluttered annotated data to challenging real-world imagery.
Si Liu, Lisa M. Brown, Qiang Chen, Junshi Huang, Luoqi Liu, Shuicheng Yan

Defining a Vocabulary of Attributes

Frontmatter
Chapter 10. A Taxonomy of Part and Attribute Discovery Techniques
Abstract
This chapter surveys recent techniques for discovering a set of Parts and Attributes (PnAs) in order to enable fine-grained visual discrimination between instances of a category. Part and Attribute (PnA)-based representations are popular in computer vision as they allow modeling of appearance in a compositional manner, and provide a basis for communication between a human and a machine for various interactive applications. Based on two main properties of these techniques, a unified taxonomy of PnA discovery methods is presented. The first distinction between the techniques is whether the PnAs are semantically aligned, i.e., whether they are human interpretable. In order to achieve this semantic alignment, these techniques rely on additional supervision in the form of annotations. Techniques within this category can be further categorized based on whether the annotations are language-based, such as nameable labels, or language-free, such as relative similarity comparisons. After a brief introduction motivating the need for PnA-based representations, the bulk of the chapter is dedicated to techniques for PnA discovery, categorized into non-semantic, semantic language-based, and semantic language-free methods. Throughout the chapter we illustrate the trade-offs among the various approaches through examples from the existing literature.
Subhransu Maji
Chapter 11. The SUN Attribute Database: Organizing Scenes by Affordances, Materials, and Layout
Abstract
One of the core challenges of computer vision is understanding the content of a scene. Often, scene understanding is demonstrated in terms of object recognition, 3D layout estimation from multiple views, or scene categorization. In this chapter we instead reason about scene attributes—high-level properties of scenes related to affordances (‘shopping,’ ‘studying’), materials (‘rock,’ ‘carpet’), surface properties (‘dirty,’ ‘dry’), spatial layout (‘symmetrical,’ ‘enclosed’), lighting (‘direct sun,’ ‘electric lighting’), and more (‘scary,’ ‘cold’). We describe crowd experiments to first determine a taxonomy of 102 interesting attributes and then to annotate binary attributes for 14,140 scenes. These scenes are sampled from 707 categories of the SUN database and this lets us study the interplay between scene attributes and scene categories. We evaluate attribute recognition with several existing scene descriptors. Our experiments suggest that scene attributes are an efficient feature for capturing high-level semantics in scenes.
Genevieve Patterson, James Hays
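Evaluating attribute recognition "with several existing scene descriptors", as the abstract puts it, typically reduces to training one binary classifier per attribute and scoring it with average precision. The sketch below shows that recipe with scikit-learn; the linear classifier and the AP metric are illustrative assumptions, not details taken from the chapter.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import average_precision_score

def evaluate_attribute_classifiers(X_train, A_train, X_test, A_test, C=1.0):
    """X_*: (n, d) scene descriptors; A_*: (n, num_attributes) binary attribute labels.
    Returns the average precision per attribute on the test split."""
    aps = []
    for j in range(A_train.shape[1]):
        clf = LinearSVC(C=C).fit(X_train, A_train[:, j])
        aps.append(average_precision_score(A_test[:, j], clf.decision_function(X_test)))
    return np.array(aps)   # mean(aps) summarizes performance across all attributes
```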

Attributes and Language

Frontmatter
Chapter 12. Attributes as Semantic Units Between Natural Language and Visual Recognition
Abstract
Impressive progress has been made in the fields of computer vision and natural language processing. However, it remains a challenge to find the best point of interaction for these very different modalities. In this chapter, we discuss how attributes allow us to exchange information between the two modalities and in this way lead to an interaction on a semantic level. Specifically, we discuss how attributes allow using knowledge mined from language resources for recognizing novel visual categories, how we can generate sentence descriptions of images and video, how we can ground natural language in visual content, and finally, how we can answer natural language questions about images.
Marcus Rohrbach
Chapter 13. Grounding the Meaning of Words with Visual Attributes
Abstract
We address the problem of grounding representations of word meaning. Our approach learns higher-level representations in a stacked autoencoder architecture from visual and textual input. The two input modalities are encoded as vectors of attributes and are obtained automatically from images and text. To obtain visual attributes (e.g. has_legs, is_yellow) from images, we train attribute classifiers using our large-scale taxonomy of 600 visual attributes, representing more than 500 concepts and 700K images. We extract textual attributes (e.g. bird, breed) from text with an existing distributional model. Experimental results on tasks related to word similarity show that the attribute-based vectors can be usefully integrated by our stacked autoencoder model to create bimodal representations which are overall more accurate than representations based on the individual modalities or different integration mechanisms. (The work presented in this chapter is based on [89].)
Carina Silberer
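A toy version of the bimodal stacked autoencoder described above: each modality's attribute vector is encoded separately, the two codes are fused in a joint layer, and both inputs are reconstructed from the joint code, which then serves as the word representation. Layer sizes, activations, and the symmetric decoder are illustrative assumptions, not the chapter's exact architecture.

```python
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    def __init__(self, vis_dim, txt_dim, hid=256, joint=128):
        super().__init__()
        self.enc_vis = nn.Sequential(nn.Linear(vis_dim, hid), nn.ReLU())
        self.enc_txt = nn.Sequential(nn.Linear(txt_dim, hid), nn.ReLU())
        self.enc_joint = nn.Sequential(nn.Linear(2 * hid, joint), nn.ReLU())
        self.dec_joint = nn.Sequential(nn.Linear(joint, 2 * hid), nn.ReLU())
        self.dec_vis = nn.Linear(hid, vis_dim)
        self.dec_txt = nn.Linear(hid, txt_dim)

    def forward(self, vis_attrs, txt_attrs):
        joint = self.enc_joint(torch.cat([self.enc_vis(vis_attrs),
                                          self.enc_txt(txt_attrs)], dim=1))
        h_vis, h_txt = self.dec_joint(joint).chunk(2, dim=1)
        # joint is the bimodal word representation; training minimises the
        # reconstruction error of both attribute vectors.
        return joint, self.dec_vis(h_vis), self.dec_txt(h_txt)
```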
Backmatter
Metadata
Title
Visual Attributes
edited by
Rogerio Schmidt Feris
Christoph Lampert
Devi Parikh
Copyright Year
2017
Electronic ISBN
978-3-319-50077-5
Print ISBN
978-3-319-50075-1
DOI
https://doi.org/10.1007/978-3-319-50077-5