1 Introduction
- Inference efficiency With the sustained growth of e-commerce, a massive and rapidly expanding volume of clothing items is available online. Since existing methods store a high-dimensional real-valued vector for each item, the persistent and temporary storage costs at inference time become a heavy burden at this data scale. In addition, existing methods compute the nearest neighbours of each query target with the Euclidean distance, so inference over such a huge amount of clothing is very slow. It is therefore necessary to develop a compact feature representation of clothing items that supports highly efficient and scalable fashion matching with limited storage cost.
- Label quality Precise labels that represent matching relationships are important for constructing an effective learning system. In other words, a matching matrix carrying the relationships (i.e. matched, un-matched, unknown) among clothing items is the essential prior knowledge for the learning process of the recommender system. As fashion matching is subjective and has no clear definition, precise matching relationships are generally obtained from fashion expertise. To the best of our knowledge, the existing datasets for fashion matching, i.e. Deep Fashion [37, 38] and Amazon Product Data [41, 52], construct their matching labels purely from customers’ shopping carts in single transactions. Obviously, co-purchased items are not guaranteed to be relevant to or matched with each other, so matching labels generated in this way are unreliable for supervising fashion matching.
- Fashion understanding Individuals may understand fashion differently. From the perspective of automatic fashion matching, fashion needs to be understood by learning over user–clothing interactions and visual features. Accordingly, designing a learning process that effectively captures fashion is in high demand for personalization.
- We propose a supervised learning-to-hash framework that learns discrete binary representations of clothing items from their visual content features and a matching matrix constructed from expert knowledge. An iterative optimization with guaranteed convergence is proposed to solve effectively for the optimal binary representations. The discretization significantly reduces memory cost and accelerates fashion recommendation.
- We construct three real-life fashion datasets with clothing images and professional fashion-coordination advice. These datasets are built from the websites Netaporter,1 Farfetch2 and Mytheresa.3 To the best of our knowledge, this is the first large-scale fashion database with professional advice for fashion recommendation.
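The storage and speed argument behind the first contribution can be sketched in a few lines. This is an illustrative comparison of our own (random vectors and codes, hypothetical sizes), not the paper's pipeline: a 128-bit code packed into 16 bytes replaces a 512-dimensional float vector, and nearest-neighbour search reduces from Euclidean distance to XOR-and-popcount in Hamming space.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, bits = 10_000, 512, 128

# Real-valued features: an n x d float32 matrix, as stored by prior methods.
real_vecs = rng.standard_normal((n, d)).astype(np.float32)
# Binary codes: the same n items, each packed into bits // 8 = 16 bytes.
codes = rng.integers(0, 256, size=(n, bits // 8), dtype=np.uint8)

def euclidean_knn(query, database, k=5):
    """Brute-force k-NN with Euclidean distance over float vectors."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)[:k]

def hamming_knn(query_code, code_db, k=5):
    """k-NN in Hamming space: XOR the packed codes, then count set bits."""
    xor = np.bitwise_xor(code_db, query_code)
    dists = np.unpackbits(xor, axis=1).sum(axis=1)
    return np.argsort(dists)[:k]

print(real_vecs.nbytes // codes.nbytes)  # prints 128: codes are 128x smaller
```

The XOR-and-popcount distance also maps onto single hardware instructions, which is what makes Hamming ranking fast at scale.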
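The iterative optimization in the second contribution can be illustrated with a generic alternating scheme in the spirit of learning-to-hash methods. The ridge-regression and sign updates below are a simplified stand-in of our own choosing, not the exact closed-form steps derived in Sect. 3.3:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 64))   # item features (e.g. a kernelized embedding)
r = 16                               # code length in bits
lam = 1e-3                           # ridge regularizer (illustrative value)

# Initialize binary codes from a random linear projection.
W = rng.standard_normal((64, r))
B = np.where(X @ W >= 0, 1.0, -1.0)

for _ in range(20):
    # Fix B, update W by ridge regression: min_W ||XW - B||^2 + lam ||W||^2.
    W = np.linalg.solve(X.T @ X + lam * np.eye(64), X.T @ B)
    # Fix W, update the discrete codes with a closed-form sign step.
    B_new = np.where(X @ W >= 0, 1.0, -1.0)
    if np.array_equal(B_new, B):     # codes stable: objective cannot decrease
        break
    B = B_new
```

Each step can only decrease a bounded quadratic objective, so the alternation terminates once the codes stop changing; the paper's optimization follows the same fix-one-variable, update-the-other pattern (Sects. 3.3.1 and 3.3.2) with its own update rules.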
2 Related Work
2.1 Fashion Recommendation
2.2 Hashing
3 Methodology
3.1 Problem Formulation
3.2 Kernelized Feature Embedding
3.3 Optimization
3.3.1 Optimizing F
3.3.2 Optimizing B
3.3.3 Initializing B
3.3.4 Precision Parameter \(C_{ij}\)
3.3.5 Online Recommendation
4 Experiment
4.1 Experimental Settings
4.1.1 Dataset
| # of | Netaporter | Farfetch | Mytheresa |
|---|---|---|---|
| Total instances | 20,868 | 105,864 | 12,932 |
| Selected items | 17,488 | 31,788 | 7,548 |
| Categories | 58 | 202 | 60 |
| Matching pairs | 27,490 | 28,978 | 10,193 |
4.1.2 Feature Extraction
4.1.3 Description of Matching Matrix
4.1.4 Matching Matrix Self-Augmentation
4.1.5 Baselines and Implementation Details
4.1.6 Evaluation Metrics
4.2 Overall Comparison with Baselines
| Metric | Bits | KSH | IMH | CCA-ITQ | SDH | DSFCH |
|---|---|---|---|---|---|---|
| (a) Netaporter | | | | | | |
| AUC | 16 | **0.5746** | 0.4559 | 0.5033 | 0.4771 | 0.5575 |
| | 32 | 0.5460 | 0.4714 | 0.5038 | 0.4115 | **0.5707** |
| | 64 | 0.5363 | 0.4854 | 0.5018 | 0.4531 | **0.6475** |
| | 128 | 0.5237 | 0.4933 | 0.5076 | 0.4958 | **0.7220** |
| Training time | 16 | (8.4 ± 2.0)e2 | 1.8e3 ± 25.3 | 1.3e3 ± 0.9 | (5.0 ± 0.2)e2 | 87.7 ± 2.3 |
| | 32 | (1.6 ± 0.3)e3 | 1.8e3 ± 13.3 | 1.3e3 ± 3.6 | (1.3 ± 0.1)e3 | 93.9 ± 2.7 |
| | 64 | (3.5 ± 1.1)e3 | 1.7e3 ± 66.9 | 1.3e3 ± 2.5 | 4.4e3 ± 43.1 | 1.3e2 ± 1.9 |
| | 128 | (9.5 ± 1.8)e3 | 1.6e3 ± 9.2 | 1.3e3 ± 3.5 | 9.5e3 ± 77.7 | 1.6e2 ± 1.4 |
| Test time | 16 | (5.7 ± 0.5)e−2 | 4.0 ± 0.06 | 2.6 ± 0.7 | 0.96 ± 0.03 | (7.7 ± 0.2)e−2 |
| | 32 | (6.0 ± 0.1)e−2 | 6.1 ± 0.2 | 3.9 ± 0.1 | 1.7 ± 0.2 | (8.1 ± 0.3)e−2 |
| | 64 | (6.8 ± 0.3)e−2 | 9.9 ± 0.4 | 8.2 ± 0.2 | 3.4 ± 1.6e−2 | (9.5 ± 0.3)e−2 |
| | 128 | 0.82 ± 9.0e−3 | 19.0 ± 1.0 | 17.4 ± 0.2 | 7.6 ± 4.2e−2 | 0.12 ± 8.9e−3 |
| (b) Farfetch | | | | | | |
| AUC | 16 | **0.5536** | 0.4629 | 0.5079 | 0.4954 | 0.5354 |
| | 32 | **0.5620** | 0.4807 | 0.5111 | 0.4845 | 0.5614 |
| | 64 | 0.5604 | 0.4905 | 0.5077 | 0.4723 | **0.6311** |
| | 128 | 0.6131 | 0.5001 | 0.5027 | 0.5205 | **0.7258** |
| Training time | 16 | (2.9 ± 0.05)e3 | 4.5e3 ± 25.0 | 3.1e3 ± 1.0 | (1.9 ± 0.1)e3 | (7.2 ± 1.0)e2 |
| | 32 | (5.7 ± 0.09)e3 | 4.5e3 ± 9.2 | 3.1e3 ± 1.8 | (4.8 ± 0.1)e3 | (8.3 ± 0.8)e2 |
| | 64 | (1.2 ± 0.08)e4 | 4.6e3 ± 1.4e2 | 3.2e3 ± 1.6 | (1.3 ± 0.01)e4 | (8.1 ± 1.0)e2 |
| | 128 | (2.3 ± 0.01)e4 | 4.8e3 ± 12.5 | 3.2e3 ± 1.2 | (3.2 ± 0.01)e4 | (9.1 ± 0.6)e2 |
| Test time | 16 | 0.25 ± 0.1 | 4.9 ± 1.5 | 3.1 ± 0.2 | 2.5 ± 0.3 | 2.9 ± 8.8e−2 |
| | 32 | 0.24 ± 0.2 | 7.4 ± 1.0 | 6.0 ± 0.2 | 3.3 ± 0.1 | 2.9 ± 6.4e−2 |
| | 64 | 0.47 ± 5.4e−3 | 21.3 ± 6.8 | 12.8 ± 0.5 | 7.4 ± 0.2 | 2.9 ± 9.1e−2 |
| | 128 | 0.49 ± 2.2e−3 | 36.9 ± 5.1 | 26.7 ± 1.2 | 16.03 ± 0.4 | 3.0 ± 0.2 |
| (c) Mytheresa | | | | | | |
| AUC | 16 | 0.5806 | 0.4765 | 0.5001 | 0.5477 | **0.5830** |
| | 32 | 0.5516 | 0.4794 | 0.4959 | 0.5698 | **0.6507** |
| | 64 | 0.5301 | 0.4954 | 0.5005 | 0.5095 | **0.7173** |
| | 128 | 0.5086 | 0.5024 | 0.5115 | 0.5513 | **0.7539** |
| Training time | 16 | 40.7 ± 1.4 | 1e2 ± 69.5 | 210.8 ± 0.4 | 93.9 ± 7.3 | 17.0 ± 0.6 |
| | 32 | 78.0 ± 2.3 | 1e2 ± 70.5 | 249.3 ± 2.5 | 249.1 ± 7.3 | 19.8 ± 0.9 |
| | 64 | 1.5e2 ± 2.9 | 1e2 ± 69.0 | 297.3 ± 2.7 | (6.4 ± 0.1)e2 | 32.2 ± 0.6 |
| | 128 | 3.5e2 ± 8.9 | 1e2 ± 73.5 | 221.2 ± 1.3 | (1.7 ± 0.03)e3 | 44.5 ± 1.6 |
| Test time | 16 | (1.6 ± 0.8)e−2 | 0.12 ± 0.1 | 0.1 ± 6.4e−3 | (8.6 ± 3.9)e−2 | (1.7 ± 0.1)e−2 |
| | 32 | (1.3 ± 0.1)e−2 | 0.61 ± 0.4 | 0.9 ± 1.5e−2 | (3.9 ± 1.3)e−2 | (1.8 ± 0.2)e−2 |
| | 64 | (1.4 ± 0.2)e−2 | 1.15 ± 0.8 | 1.7 ± 6.8e−2 | 0.69 ± 9.3e−3 | (2.1 ± 0.2)e−2 |
| | 128 | (1.9 ± 0.2)e−2 | 2.31 ± 2.3 | 3.6 ± 0.2 | 1.6 ± 0.5 | (3.0 ± 0.3)e−2 |
4.3 Comprehensive Analysis on DSFCH
4.3.1 Self-Augmentation Study
4.3.2 Discrete and Convergence Study
| Constraint | 16 bits | 32 bits | 64 bits | 128 bits |
|---|---|---|---|---|
| AUC | | | | |
| Discrete | **0.5702** | 0.5898 | **0.6684** | **0.7500** |
| Relaxed | 0.5663 | **0.6004** | 0.6655 | 0.7134 |