1 Introduction
2 Background
2.1 The Problem
2.2 Summary of Progress in the Past Decades
2.3 Key Challenges
3 Bag of Words based Texture Representation
3.1 The BoW Pipeline
Step | Approach | Highlights |
---|---|---|
Local Texture Descriptors (Sect. 3.2) | Sparse Descriptors | |
| | | Keypoint detectors plus novel descriptors SPIN and RIFT |
| | | A comprehensive evaluation of multiple keypoint detectors, feature descriptors, and classifier kernels |
| | Dense Descriptors | |
| | Gabor Wavelets | Joint optimum resolution in time and frequency; Multiscale and multiorientation analysis |
| | LM Filters (Leung and Malik 2001) | First to propose the Bag of Textons (BoT) model (i.e. the BoW model) |
| | Schmid Filters | Gabor-like filters; Rotation invariant |
| | MR8 (Varma and Zisserman 2005) | Rotationally invariant filters and a low-dimensional filter response space |
| | Patch Intensity (Varma and Zisserman 2009) | Challenges the dominant role of filter-based descriptors and proposes raw image intensities as features |
| | LBP (Ojala et al. 2002b) | Fast binary features with gray scale invariance; Predefined codebook |
| | Random Projection (Liu and Fieguth 2012) | First to introduce compressive sensing and random projection into texture classification |
| | Sorted Random Projection (Liu et al. 2011a) | Efficient and effective approach for random projection to achieve rotation invariance |
| | Basic Image Features (BIFs) (Crosier and Griffin 2010) | Introduces the BIFs of Griffin and Lillholm into texture classification; Predefined codebook |
| | Weber Local Descriptor (WLD) (Chen et al. 2010) | A descriptor based on Weber's Law |
| | Fractal Based Descriptors | |
| | MultiFractal Spectrum (Xu et al. 2009b) | Invariant under bi-Lipschitz mappings |
Codebook Generation (Sect. 3.3) | Predefined codebook | No codebook learning step; Computationally efficient |
| | kmeans clustering | Most commonly used method; Cannot capture overlapping distributions in the feature space |
| | GMM | Considers both cluster centers and covariances, which describe the spreads of clusters |
| | Sparse coding | Sparse representation based; Minimizes the reconstruction error of data; Computationally expensive |
Feature Encoding (Sect. 3.4) | Voting Based Methods | Require a large codebook (usually learned by kmeans); Usually combined with a nonlinear SVM |
| | Hard voting | Quantizes each feature to its nearest codeword; Fast to compute; Codes are sparse and high dimensional |
| | Soft voting | Assigns each feature to multiple codewords; Does not minimize reconstruction error |
| | Fisher Vector (FV) Based Methods | Require a small codebook; Very high dimension; Combined with an efficient linear SVM |
| | FV (Perronnin and Dance 2007) | GMM-based; Encodes higher order statistics; Efficient to compute |
| | Improved FV (IFV) | Uses signed square rooting and \(L_2\) normalization; State of the art performance in texture classification |
| | VLAD | A simplified version of FV |
| | Reconstruction Based Methods | Enforce sparse representation; Explore the manifold structure of data; Minimize reconstruction error |
| | Sparse coding | Leverages the fact that natural images are sparse; Optimization is computationally expensive |
| | LLC | Local smooth sparsity; Fast computation through approximated LLC |
Feature Pooling (Sect. 3.5) | Average Pooling | The most widely used pooling scheme in texture representation |
| | Max Pooling | Usually used in combination with sparse coding and LLC |
| | Spatial Pyramid Pooling (SPM) | Preserves more spatial information; Higher feature dimensionality |
Classifier (Sect. 3.5) | Nearest Neighbor Classifier | Simple and elegant nonparametric classifier; Popular in texture classification |
| | Kernel SVM (Zhang et al. 2007) | Usually combined with a Chi Square kernel for BoW based representations |
| | Linear SVM (Cimpoi et al. 2016) | Suitable for high-dimensional feature representations such as FV and VLAD |
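The four BoW stages summarized in the table can be chained in a few lines. The following is a minimal NumPy sketch under stated assumptions, not the implementation of any cited work: local descriptors are assumed to be already extracted (one row per descriptor), the codebook is learned with plain kmeans (Lloyd iterations), encoding uses hard voting, pooling is average pooling, and classification uses a nearest neighbor rule under the chi-square distance. All function names are hypothetical, and `fake_descriptors` is a hypothetical stand-in for a real descriptor extractor.

```python
import numpy as np


def learn_codebook(features, k, n_iter=20, seed=0):
    """Plain kmeans (Lloyd's algorithm) over local descriptors -> k codewords."""
    rng = np.random.default_rng(seed)
    # Initialize codewords with k distinct randomly chosen descriptors.
    centers = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every codeword.
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def encode_hard_voting(features, centers):
    """Hard voting: each descriptor votes for its nearest codeword (one-hot code)."""
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros((len(features), len(centers)))
    codes[np.arange(len(features)), d2.argmin(axis=1)] = 1.0
    return codes


def average_pool(codes):
    """Average pooling of one-hot codes gives a codeword histogram summing to 1."""
    return codes.mean(axis=0)


def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms (smaller means more similar)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))


def nearest_neighbor(query_hist, train_hists, train_labels):
    """Nearest neighbor classifier under the chi-square distance."""
    dists = [chi_square_distance(query_hist, h) for h in train_hists]
    return train_labels[int(np.argmin(dists))]


# Usage with synthetic descriptors (hypothetical stand-in for real texture images).
rng = np.random.default_rng(1)


def fake_descriptors(offset, n=200, dim=8):
    """Hypothetical extractor: Gaussian blob of descriptors around a class offset."""
    return rng.normal(loc=offset, scale=0.3, size=(n, dim))


train = [fake_descriptors(0.0), fake_descriptors(3.0)]
train_labels = ["A", "B"]
codebook = learn_codebook(np.vstack(train), k=4)
train_hists = [average_pool(encode_hard_voting(f, codebook)) for f in train]
query_hist = average_pool(encode_hard_voting(fake_descriptors(3.0), codebook))
print(nearest_neighbor(query_hist, train_hists, train_labels))
```

Each image is reduced to a k-bin histogram, so the classifier never sees the descriptors themselves; replacing hard voting with soft voting, for instance, would change only `encode_hard_voting` in this sketch.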
3.2 Local Texture Descriptors
3.2.1 Sparse Texture Descriptors
3.2.2 Dense Texture Descriptors
- Interest point detectors typically produce a sparse output and could miss important texture elements.
- A sparse output in a small image might not produce sufficient regions for robust statistical characterization.
- There are issues regarding the repeatability of the detectors, the stability of the selected regions and the instability of orientation estimation (Mikolajczyk et al. 2005).
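Dense descriptors sidestep these detector issues by computing a code at every pixel. The following NumPy sketch shows a simplified LBP, assuming the 8 neighbors of the 3x3 square rather than the circularly sampled, interpolated neighborhoods of Ojala et al. (2002b); the function names are hypothetical. Because each bit depends only on whether a neighbor is at least as bright as the center, the codes are unchanged by any strictly increasing gray-level transform, and the fixed set of 256 codes plays the role of the predefined codebook noted in the summary table.

```python
import numpy as np


def lbp_3x3(image):
    """Basic 8-neighbor LBP code for every interior pixel of a 2-D gray image.

    Simplification (assumption): neighbors are the 8 pixels of the 3x3 square
    around the center, with no circular sampling or interpolation.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbor offsets (row, col) visited in a fixed circular order; bit i <- offset i.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(offsets):
        # Shifted view aligned with `center`: the neighbor at (dr, dc) of each pixel.
        neighbor = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # Only the sign of (neighbor - center) matters, hence gray scale invariance.
        codes |= (neighbor >= center).astype(np.int64) << bit
    return codes


def lbp_histogram(image):
    """Dense LBP descriptor: normalized histogram over the fixed 256-code set."""
    codes = lbp_3x3(image)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

For example, for an integer image `img`, `lbp_histogram(img)` and `lbp_histogram(2 * img + 10)` return the same 256-bin histogram.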
3.2.3 Fractal Based Descriptors
3.3 Codebook Generation
3.4 Feature Encoding
3.5 Feature Pooling and Classification
Approach | Highlights |
---|---|
Using Pretrained Generic CNN Models (Sect. 4.1) | |
AlexNet (Krizhevsky et al. 2012) | Achieved a breakthrough image classification result on ImageNet; The historical turning point of feature representation from handcrafted to CNN |
VGGM | Similar complexity to AlexNet, but better texture classification performance |
VGGVD (Simonyan and Zisserman 2015) | Much deeper than AlexNet; Much larger model size than AlexNet and VGGM; Much better texture recognition performance than AlexNet and VGGM |
GoogLeNet (Szegedy et al. 2015) | Much deeper than AlexNet; Small pretrained model size; Not often used in texture classification |
ResNet (He et al. 2016) | Significantly deeper than VGGVD; Smaller model size (ResNet 101) than AlexNet |
Using Finetuned CNN Models (Sect. 4.2) | End-to-end learning |
TCNN (Andrearczyk and Whelan 2016) | Uses global average pooling; Combines outputs from multiple CONV layers |
BCNN | Introduces a novel, orderless bilinear feature pooling method; Generalizes Fisher Vector and VLAD; Good representation ability; Very high feature dimensionality |
Compact BCNN (Gao et al. 2016) | Adopts Random Maclaurin Projection or Tensor Sketch Projection to reduce the dimensionality of bilinear features (e.g. from 262144 (\(512^2\)) to 8192); Maintains performance similar to BCNN |
FASON (Dai et al. 2017) | |
NetVLAD (Arandjelovic et al. 2016) | Plugs a VLAD-like layer into a CNN at the last CONV layer |
DeepTEN (Zhang et al. 2017) | Similar to NetVLAD (Arandjelovic et al. 2016), integrates an encoding layer on top of CONV layers; Generalizes orderless pooling methods such as VLAD and FV in a CNN trained end to end |
Texture Specific Deep Convolutional Models (Sect. 4.3) | |
ScatNet (Bruna and Mallat 2013) | Uses Gabor wavelets for convolution; Mathematical interpretation of CNNs; Features are stable to deformations and preserve high frequency information |
PCANet (Chan et al. 2015) | Inspired by ScatNet (Bruna and Mallat 2013), uses PCA filters in place of Gabor wavelets; Uses LBP and histogramming for feature pooling; No local invariance |
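The orderless bilinear pooling listed for BCNN can be sketched in NumPy as follows. This is a minimal illustration under assumptions, not the BCNN authors' code: the input is a single CONV feature map of shape (H, W, C) taken from one network (a symmetric bilinear model), outer products are averaged over locations, and a signed square root and \(L_2\) normalization are applied afterwards. The 14x14x512 random feature map is a hypothetical stand-in for a real last-CONV output; with C = 512 the descriptor has 262144 (\(512^2\)) dimensions, the figure quoted for Compact BCNN in the table.

```python
import numpy as np


def bilinear_pool(feature_map, eps=1e-12):
    """Orderless bilinear pooling of a CONV feature map of shape (H, W, C).

    Returns a C*C vector: the average outer product x x^T of the local
    feature vectors, followed by signed square root and L2 normalization.
    """
    h, w, c = feature_map.shape
    x = feature_map.reshape(h * w, c)        # one C-dim local feature per location
    b = x.T @ x / (h * w)                    # (C, C) average of outer products
    z = b.reshape(-1)                        # flatten; spatial layout is discarded
    z = np.sign(z) * np.sqrt(np.abs(z))      # signed square root
    return z / max(np.linalg.norm(z), eps)   # L2 normalization


rng = np.random.default_rng(0)
fmap = rng.standard_normal((14, 14, 512))    # hypothetical last-CONV output
desc = bilinear_pool(fmap)
print(desc.shape)                            # (262144,), i.e. 512**2
```

Because the outer products are averaged over all locations, permuting the spatial positions leaves the descriptor unchanged, which is the sense in which the pooling is orderless. Reducing this dimensionality with random projections, as Compact BCNN does, is not sketched here.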
4 CNN Based Texture Representation
- using pretrained generic CNN models,
- using finetuned CNN models, and
- using handcrafted deep convolutional networks.
4.1 Using Pretrained Generic CNN Models
4.2 Using Finetuned CNN Models
4.3 Using Handcrafted Deep Convolutional Networks
5 Attribute-Based Texture Representation
6 Texture Datasets and Performance
6.1 Texture Datasets
No. | Texture dataset | References | Total images | Texture classes | Image size | Gray or color | Imaging environment | Illumination changes | Rotation changes | Viewpoint changes | Scale changes | Image content | Instances or categories | Year | Download link |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Brodatz | Brodatz (1966b) | 111 | 111 | \(640\times 640\) | Gray | Controlled | | | | | Objects | Instances | 1966 | Brodatz (1966a) |
2 | VisTex | − | 167 | 167 | \(786\times 512\) | Color | Wild | \(\surd\) | | | | Objects | Instances | 1995 | VisTex (1995) |
3 | CUReT | Dana et al. (1999) | 5612 | 92 | \(200\times 200\) | Color | Controlled | \(\surd\) | Small | \(\surd\) | | Materials | Instances | 1999 | CUReT (1999) |
4 | Outex | Ojala et al. (2002a) | 8640 | 320 | \(746\times 538\) | Color | Controlled | \(\surd\) | \(\surd\) | | | Materials | Instances | 2002 | Outex (2002) |
5 | KTHTIPS | | 810 | 10 | \(200\times 200\) | Color | Controlled | \(\surd\) | Small | Small | \(\surd\) | Materials | Instances | 2004 | KTHTIPS (2004) |
6 | UIUC | Lazebnik et al. (2005) | 1000 | 25 | \(640\times 480\) | Gray | Wild | \(\surd\) | \(\surd\) | \(\surd\) | \(\surd\) | Materials | Instances | 2005 | UIUC (2005) |
7 | KTHTIPS2a | | 4608 | 11 | \(200\times 200\) | Color | Controlled | \(\surd\) | Small | Small | \(\surd\) | Materials | Categories | 2006 | KTHTIPS (2004) |
8 | KTHTIPS2b | | 4752 | 11 | \(200\times 200\) | Color | Controlled | \(\surd\) | Small | Small | \(\surd\) | Materials | Categories | 2006 | KTHTIPS (2004) |
9 | UMD | Xu et al. (2009b) | 1000 | 25 | \(1280\times 960\) | Gray | Wild | \(\surd\) | \(\surd\) | \(\surd\) | \(\surd\) | Objects | Instances | 2009 | UMD (2009) |
10 | ALOT | Burghouts and Geusebroek (2009) | 25000 | 250 | \(1536\times 1024\) | Color | Controlled | \(\surd\) | | | | Materials | Instances | 2009 | ALOT (2009) |
11 | RawFooT | Cusano et al. (2016) | 3128 | 68 | \(800\times 800\) | Color | Controlled | \(\surd\) | | | | Materials | Instances | 2016 | Raw Food Texture (RFT) (2016) |
12 | FMD | | 1000 | 10 | \(512\times 384\) | Color | Wild | \(\surd\) | \(\surd\) | | | Materials | Categories | 2009 | FMD (2009) |
13 | DreTex | Oxholm et al. (2012) | 40000 | 20 | \(200\times 200\) | Color | Controlled | \(\surd\) | \(\surd\) | \(\surd\) | | Materials | Instances | 2012 | Drexel (2012) |
14 | UBO2014 | Weinmann et al. (2014) | 1915284 | 7 | \(400\times 400\) | Color | Synthesis | \(\surd\) | \(\surd\) | \(\surd\) | | Materials | Categories | 2014 | UBO2014 (2016) |
15 | OpenSurfaces | Bell et al. (2013) | 10422 | 22 | Unfixed | Color | Wild | \(\surd\) | \(\surd\) | \(\surd\) | \(\surd\) | Materials | Clutter | 2013 | Open Surfaces (2013) |
16 | DTD | Cimpoi et al. (2014) | 5640 | 47 | Unfixed | Color | Wild | \(\surd\) | \(\surd\) | \(\surd\) | | Attributes | Categories | 2014 | DTD (2014) |
17 | MINC | Bell et al. (2015) | 2996674 | 23 | Unfixed | Color | Wild | \(\surd\) | \(\surd\) | \(\surd\) | \(\surd\) | Materials | Clutter | 2015 | MINC (2015) |
18 | MINC2500 | Bell et al. (2015) | 57500 | 23 | \(362\times 362\) | Color | Wild | \(\surd\) | \(\surd\) | \(\surd\) | \(\surd\) | Materials | Clutter | 2015 | MINC (2015) |
19 | GTOS | Xue et al. (2017) | 34243 | 40 | \(240\times 240\) | Color | Partially Controlled | \(\surd\) | \(\surd\) | \(\surd\) | | Materials | Instances | 2016 | Ground Terrain in Outdoor Scenes (GTOS) (2016) |
20 | LFMD | Wang et al. (2016) | 1200 | 12 | \(3787\times 2632\) | Color | Uncontrolled | \(\surd\) | | | | Materials | Categories | 2016 | LFMD (2016) |
21 | RDAD | Bormann et al. (2016) | 1488 | 57 | \(2592\times 1944\) | Color | Uncontrolled | \(\surd\) | \(\surd\) | | | Objects | Instances | 2016 | Robotics Domain Attributes Database (RDAD) (2016) |