1 Introduction
1.1 Motivation
-
“Flower-pattern” information is a texture in the photograph
-
“Price” is a relational attribute
-
A “party” relates to an abstract (emotional) visual category of the same photograph, detectable by higher layers of neural networks used for computer vision.
-
Classification requires precisely labelled training data in order to classify unbound inputs into a fixed schema, delivering an assignment of any object into a well-known class. The target can be generic, including domain-specific classes as well as general imagery.
-
Multimedia exploration based on similarity search works with unbound inputs, without any training phase or schema. The result is an interactive process with human controlling the loop.
-
Attribute discovery is in many aspects dual to classification approaches. It works with a well-known dataset, does not differentiate between training and real data and does not need any supervision at design time. Supervision at design time is replaced by acceptance at run time. It gradually delivers an extension of database schema by an unbound number of new attributes, out of which none were known before.
1.2 Database schema augmentation and attribute discovery
Classification | Exploration | Attr. discovery | |
---|---|---|---|
Inputs | Unbound | Unbound | Known upfront |
Training | Supervised | None | Unsupervised |
Schema | Fixed | None | Dynamic |
Examples | Manual labels | None | Equal to real data |
Target | Generic | Generic | Domain-specific |
Delivers | Assignment | Interaction | Schema extension |
1.3 Contribution
2 Related work and background
2.1 Relational data management
2.2 Similarity search
2.3 Multi-modal search
2.4 Deep learning
2.5 Visual pattern discovery
2.6 Product and fashion classification
3 Augmenting database schema by latent visual attributes
3.1 Formal task definition
3.2 Image patches
3.3 Similarity sets
3.4 Refinement of attribute candidates
-
A new attribute should be discriminative within the dataset. Although a new attribute set to the same value for every entity in the database is possible, it would not bring any benefit to end users of the system (e.g. discovering a binary attribute SHOE=true within a dataset of only shoes). In terminology of information retrieval, that attribute would be a stop word (visual stop word, or visual noise in our case). As defined by Li et al. [27], visual patterns in one attribute should be significantly different from patterns found in other attributes.
-
A new attribute should be represented within the dataset. Inversely to the discriminative property, attributes discovered in one or just a few entities would infest the schema and decrease usability of information retrieval. As seen by Li et al. [27], visual patterns of representative attributes should appear frequently among images (but not as frequent to become stop words).
-
A new attribute should be unique within all attributes, both pre-existing and new.
-
Near-duplicate filter: Reducing the number of similarity sets by excluding any sets based on duplicates or near duplicates. This is filtered out by comparing threshold \(T_{nearDuplicate}\) value against the distance values of the smallest ten distances of the similarity set.
-
Large sets filter: Reducing the number of similarity sets by excluding all similarity sets which would have too many close members. This is done by comparing the maximal distance value against \(T_{maxValue}\).
-
Distance derivative filter: The size of similarity sets is reduced using distance derivative as a filter. All distances within a set can be ordered from lowest to greatest, and objects cut-off based on a steep increase of the distance value within the distance distribution. A \(T_{distanceDerivative}\) is compared against the ratio of two consecutive distance values.
-
Similarity set clustering: After executing all previous filtering steps, the number of duplicate sets can be reduced using clustering. The remaining similarity sets can be viewed as edges in a graph of patches, and these sets can be merged together using independent component graph analysis [21].As an alternative, frequent pattern mining algorithms like Market basket analysis [1] can be used to perform a similar task.$$\begin{aligned}&\text {Let } G(P, S) \text { be a graph where }P\text { is a set of patches and S is a set of similarity sets} \\&\qquad \qquad \{G^1,G^2,\ldots G^n\} := connectedComponentAnalysis(G) \\&\text {then removing of too small/large sets leads to the final set of similarity sets } \\&\qquad \qquad S' := \{\ G^i(V^i,E^i) |\ T_{clusterMin} \le |V^i| \le T_{clusterMax} \} \end{aligned}$$
-
De-duplication: The set of similarity sets \(S'\) can be turned into a function which scores all entities in the database (by evaluating a multi-query search in the distance space). This means we could compare them by the rankings/permutations they produce. However, this would not work for pre-existing attributes in the schema, which are not distance-based or ranking-based. We can employ the point-biserial correlation \(r_{pb}\) [51] as an correlation indicator between a pre-existing dichotomous variable/attribute and a distance-based continuous variablewhere n is the total number of entities, \(n_1\) is the number of entities having the flag and \(M_1\) is the mean value of their distance, \(n_0\) is the number of entities not having the flag and \(M_0\) is their mean distance. \(s_n\) represents the standard deviation of the distance values, and it is a member to normalize the results. We can find the most correlated existing attribute for each similarity set and ignore the similarity set if there is a perfect correlation. On top of that, correlation to existing attributes can be further used as a mechanism to rank similarity sets based on what value they could provide to an existing database. The exact same principle and calculation can be done and compared against n-tuples (pairs, triples) of existing attributes. As an example, Fig. 4 illustrates a proposed attribute for blue jeans. The original dataset does not have any definition of that, but it does know a category “jeans” and a tag “blue”. The visual information correlates with that pair better than with any of the pair’s individual components.$$\begin{aligned} r_{pb} = \frac{M_1 - M_0}{s_n} \sqrt{\frac{n_1 n_0}{n^2}} \end{aligned}$$
3.5 Online acceptance by human actors
-
Database architect is the data professional responsible for database schema modelling and application usage of the new attributes which the system will generate. They are also expected to select appropriate feature extractors and distance functions from the available palette given his/her knowledge of the application domain and initial trial runs to fine-tune hyperparameters.
-
Domain administrator (also denoted as domain expert) understands the domain of the database and the entities it contains, and they do not need to have technical knowledge about the system’s software. Their task is to go trough proposed attribute candidates, evaluate them and accept or reject them. Part of the acceptance is also appropriate naming of new attributes.
-
End user utilizes new attributes in a transparent and integrated way, not having to differentiate between pre-existing attributes and the ones provided by the system. Usages of the attributes are application-specific and can, for example, cover searching, comparison or recommendation. By using the application, end user is also implicitly providing feedback for the system, e.g. by search history, basket contents, page operations or time spent on different pages. In Sect. 4.7, we elaborate how this information can be utilized as crowd-based evidence to further improve the process and reduce the workload of the domain administrator.
3.6 Integration of accepted attributes
-
VisualAttributeDefinition represents the new visual attribute. It contains a domain-expert-provided name, their subjective quality evaluation, the original attribute candidate as a compact and repeatable search query, and a distance threshold that was selected by the domain expert.IDNameQualityCandidatesDistanceTreshold42Flower pattern9Img:mpn961-6078.jpg;Patch:6x8@271.7464
-
ProductAttributes is a linkage table between the new attribute and the database entities. It also provides the calculated distance for ranking purposes and a coverage within the image calculated based on the entity image’s patches matching the attribute.ProductAttributeDistanceCoverage143933421.727020
-
DiscardedProducts, DiscardedCategories represent manual filtering done by the domain expert before accepting the attribute.AttributeDiscardedProduct42227680AttributeDiscardedCategory265accessories
4 System architecture
4.1 Extraction of multimedia descriptors
4.2 Generation of image patches
4.3 Constructing similarity sets
4.4 Implementing noise removal
4.5 Extracting frequent patterns
4.6 Interfaces for domain administrator
-
See an initial view with proposed attribute candidates together with the patches defining them.
-
Reject candidates from further considerations, as an explicit feedback to the system.
-
Expand proposed candidates to see how they are applied to other entities in the dataset.
-
Accept an expanded attribute up to a certain threshold while using available mechanism to filter out possible noise.
-
Not meaningful as an attribute.
-
Already covered as an existing category.
-
Too trivial, e.g. just a colour feature.
-
Filter out individual images.
-
Black-list entire entity categories using existing relational data.
-
White-list some of the existing categories and hide everything else.
4.7 Utilisation of system-collected implicit feedback
4.8 Integrating schema augmentation into SQL environment
5 Evaluation
5.1 Dataset
5.2 Offline model selection
feat.extr. | \(T_{dist.Der.}\) | \(T_{clusterMin}\) | \(T_{clusterMax}\) | \(\#candidates\) | coverage |
---|---|---|---|---|---|
conv3 | 0.88 | 8 | 400 | 1531 | 10,584 |
conv3 | 0.91 | 10 | 400 | 1251 | 10,086 |
conv3 | 0.96 | 10 | 50 | 1517 | 10,032 |
conv4 | 0.89 | 8 | 800 | 1776 | 11,901 |
conv4 | 0.91 | 12 | 800 | 983 | 10,418 |
conv4 | 0.94 | 12 | 400 | 1096 | 10,235 |
conv5 | 0.76 | 6 | 50 | 2176 | 10,719 |
conv5 | 0.80 | 8 | 800 | 1270 | 10,024 |
conv5 | 0.92 | 8 | 200 | 1725 | 10,071 |