1 Introduction
1.1 Contributions
- We propose to use specific networks, different from standard visual representation networks, to capture artistic context in paintings.
- We explore two different modalities of our proposed networks, one based on multitask learning and the other based on knowledge graphs.
- We investigate the resulting context-aware embeddings with a visualisation tool, finding insights on how the different artistic attributes are clustered in different embedding spaces.
2 Related work
2.1 Automatic art analysis
2.2 Multitask learning
2.3 Knowledge graphs
3 Multitask learning ContextNet
4 Knowledge graph ContextNet
4.1 Artistic knowledge graph
4.2 Training
4.3 ContextNet at test time
5 Art classification evaluation
5.1 Implementation details
5.2 Evaluation dataset
- Type classification. Using the attribute Type, each painting is classified according to 10 different common types of paintings: portrait, landscape, religious, study, genre, still life, mythological, interior, historical and other.
- School classification. The School attribute is used to assign each painting to one of the schools of art that appear in at least ten samples in the training set: Italian, Dutch, French, Flemish, German, Spanish, English, Netherlandish, Austrian, Hungarian, American, Danish, Swiss, Russian, Scottish, Greek, Catalan, Bohemian, Swedish, Irish, Norwegian, Polish and Other. Paintings with a school different from those are assigned to the class Unknown. In total, there are 25 school classes.
- Timeframe classification. The attribute Timeframe, which corresponds to periods of 50 years evenly distributed between 801 and 1900, is used to classify each painting according to its creation date. We only consider timeframes with at least ten paintings in the training set, obtaining a total of 18 classes, which include an Unknown class for timeframes outside the selection.
- Author identification. The Author attribute is used to classify paintings according to 350 different painters. Although the SemArt dataset provides 3281 unique authors, we only consider those with at least ten paintings in the training set, and include an Unknown class for painters not contained in the final selection.
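The class construction shared by the four tasks above (keep attribute values with at least ten training paintings, send everything else to Unknown, and bin creation years into 50-year timeframes) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `build_classes`, `to_class` and `timeframe_of`, as well as the toy labels, are hypothetical.

```python
from collections import Counter

MIN_SAMPLES = 10  # threshold stated in the text: at least ten training paintings


def build_classes(train_labels, min_samples=MIN_SAMPLES):
    """Return the labels frequent enough to get their own class, plus 'Unknown'."""
    counts = Counter(train_labels)
    kept = sorted(
        label for label, n in counts.items()
        if n >= min_samples and label != "Unknown"
    )
    return kept + ["Unknown"]


def to_class(label, classes):
    """Map a raw attribute value to a class index; rare or unseen values go to 'Unknown'."""
    return classes.index(label) if label in classes else classes.index("Unknown")


def timeframe_of(year):
    """Bin a creation year into a 50-year period aligned with 801-850, ..., 1851-1900."""
    start = ((year - 1) // 50) * 50 + 1
    return f"{start}-{start + 49}"


# Toy example (made-up label counts, not taken from SemArt).
train = ["Italian"] * 12 + ["Dutch"] * 10 + ["Maltese"] * 3
classes = build_classes(train)
print(classes)                       # ['Dutch', 'Italian', 'Unknown']
print(to_class("Maltese", classes))  # 2
print(timeframe_of(1620))            # 1601-1650
```

Building the class list from the training split only means that validation and test paintings with rare attribute values fall into Unknown without changing the label space.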
5.3 Baselines
- Pre-trained Networks. VGG16 [50], ResNet50 [24] and ResNet152 [24] with their pre-trained weights learnt on natural image classification. To adapt the models for art classification, we modified the last fully connected layer to match the number of classes of each task. The weights of the last layer were initialised randomly and fine-tuned during training, whereas the weights of the rest of the network were frozen.
- ResNet50+Attributes. The output of each fine-tuned classification model from above was concatenated to the output of a pre-trained ResNet50 network without the last fully connected layer. The result was a high-dimensional embedding representing the visual content of the image and its attribute predictions. The high-dimensional embedding was input into a last fully connected layer with ReLU to predict the attribute of interest. Only the weights from the pre-trained ResNet50 and the last layer were fine-tuned, whereas the weights of the attribute classifiers were frozen.
- ResNet50+Captions. For each painting, we generated a caption using the captioning model from [57]. Captions were represented by a multi-hot vector with a vocabulary size of 5000 and encoded into a 512-dimensional embedding with a fully connected layer followed by a hyperbolic tangent (tanh) activation. The caption embeddings were then concatenated to the output of a ResNet50 network without the last fully connected layer. The concatenated vector was fed into a fully connected layer with ReLU to obtain the prediction.
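The data flow of the ResNet50+Captions baseline (multi-hot caption vector, fully connected layer with tanh, concatenation with visual features, final fully connected layer with ReLU) can be traced with a small pure-Python sketch. It only illustrates the shapes and operations: the helper names, the four-word vocabulary, the toy dimensions and the random weights are all assumptions, and the visual vector is a random placeholder standing in for real ResNet50 features.

```python
import math
import random


def multi_hot(tokens, vocab):
    """Multi-hot caption vector: 1.0 at each vocabulary word present in the caption."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] = 1.0
    return vec


def dense(x, weights, bias, activation=None):
    """Fully connected layer: one output per weight row, with an optional activation."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(o) for o in out] if activation else out


def relu(z):
    return max(0.0, z)


def random_layer(n_out, n_in, rng):
    """Random weights and zero biases (a stand-in for learnt parameters)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


# Toy sizes; the text uses a 5000-word vocabulary and a 512-d caption embedding.
rng = random.Random(0)
vocab = {"woman": 0, "sitting": 1, "table": 2, "dog": 3}
caption_dim, visual_dim, n_classes = 3, 5, 2

w_cap, b_cap = random_layer(caption_dim, len(vocab), rng)
w_out, b_out = random_layer(n_classes, visual_dim + caption_dim, rng)

visual = [rng.random() for _ in range(visual_dim)]  # placeholder for ResNet50 features
caption = multi_hot("a woman sitting at a table".split(), vocab)

caption_emb = dense(caption, w_cap, b_cap, activation=math.tanh)  # FC + tanh
joint = visual + caption_emb                                      # concatenation
scores = dense(joint, w_out, b_out, activation=relu)              # FC + ReLU, as described
```

In a real setting the two layers would be trained jointly with the visual network in a deep learning framework; plain lists are used here only to keep the sketch dependency-free.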
Method | Type | School | Timeframe (TF) | Author |
---|---|---|---|---|
VGG16 pre-trained | 0.706 | 0.502 | 0.418 | 0.482 |
ResNet50 pre-trained | 0.726 | 0.557 | 0.456 | 0.500 |
ResNet152 pre-trained | 0.740 | 0.540 | 0.454 | 0.489 |
VGG16 fine-tuned | 0.768 | 0.616 | 0.559 | 0.520 |
ResNet50 fine-tuned | 0.765 | 0.655 | 0.604 | 0.515 |
ResNet152 fine-tuned | 0.790 | 0.653 | 0.598 | 0.573 |
ResNet50+Attributes | 0.785 | 0.667 | 0.599 | 0.561 |
ResNet50+Captions | 0.799 | 0.649 | 0.598 | 0.607 |
MTL context-aware | 0.791 | 0.691 | 0.632 | 0.603 |
KGM context-aware | 0.815 | 0.671 | 0.613 | 0.615 |
5.4 Results analysis
6 Art retrieval evaluation
6.1 Implementation details
Model | Text-to-image R@1 | Text-to-image R@5 | Text-to-image R@10 | Text-to-image MR | Image-to-text R@1 | Image-to-text R@5 | Image-to-text R@10 | Image-to-text MR |
---|---|---|---|---|---|---|---|---|
CML | 0.144 | 0.332 | 0.454 | 14 | 0.138 | 0.327 | 0.457 | 14 |
CML* | 0.164 | 0.384 | 0.505 | 10 | 0.162 | 0.366 | 0.479 | 12 |
AMD | ||||||||
Type | 0.114 | 0.304 | 0.398 | 17 | 0.125 | 0.280 | 0.398 | 16 |
School | 0.103 | 0.283 | 0.401 | 19 | 0.118 | 0.298 | 0.423 | 16 |
TF | 0.117 | 0.297 | 0.389 | 20 | 0.123 | 0.298 | 0.413 | 17 |
Author | 0.131 | 0.303 | 0.418 | 17 | 0.120 | 0.302 | 0.428 | 16 |
Res152 | ||||||||
Type | 0.178 | 0.383 | 0.525 | 9 | 0.165 | 0.364 | 0.491 | 11 |
School | 0.192 | 0.386 | 0.507 | 10 | 0.163 | 0.364 | 0.484 | 12 |
TF | 0.127 | 0.322 | 0.432 | 18 | 0.130 | 0.336 | 0.444 | 16 |
Author | 0.236 | 0.451 | 0.572 | 7 | 0.204 | 0.440 | 0.535 | 8 |
MTL | ||||||||
Type | 0.145 | 0.358 | 0.474 | 12 | 0.150 | 0.350 | 0.475 | 12 |
School | 0.196 | 0.428 | 0.536 | 8 | 0.172 | 0.396 | 0.520 | 10 |
TF | 0.171 | 0.394 | 0.525 | 9 | 0.138 | 0.353 | 0.466 | 12 |
Author | 0.232 | 0.452 | 0.567 | 7 | 0.206 | 0.431 | 0.535 | 9 |
KGM | ||||||||
Type | 0.152 | 0.367 | 0.506 | 10 | 0.147 | 0.367 | 0.507 | 10 |
School | 0.162 | 0.371 | 0.483 | 12 | 0.156 | 0.355 | 0.483 | 11 |
TF | 0.175 | 0.399 | 0.506 | 10 | 0.148 | 0.360 | 0.472 | 12 |
Author | 0.247 | 0.477 | 0.581 | 6 | 0.212 | 0.446 | 0.563 | 7 |
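The retrieval columns above can be reproduced from a query-by-candidate similarity matrix. The sketch below assumes the conventional reading of the metrics, which is not spelled out in this section: R@K is the fraction of queries whose ground-truth match appears in the top K results, and MR is the median rank of the ground-truth match. The function names and the toy similarity values are hypothetical.

```python
from statistics import median


def rank_of_match(similarities, true_index):
    """1-based rank of the ground-truth item when sorting by decreasing similarity."""
    true_score = similarities[true_index]
    return 1 + sum(1 for s in similarities if s > true_score)


def recall_at_k(ranks, k):
    """Fraction of queries whose ground-truth item is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy similarity matrix: row i is a query, column i is its ground-truth match.
sims = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
ranks = [rank_of_match(row, i) for i, row in enumerate(sims)]
print(ranks)                  # [1, 2, 1]
print(recall_at_k(ranks, 1))  # 2 of 3 queries are matched at rank 1
print(median(ranks))          # 1
```

Higher R@K and lower MR are better, which is how the rows of the table should be compared.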
6.2 Results analysis
Model | Land | Relig | Myth | Genre | Port | Total |
---|---|---|---|---|---|---|
Easy set | ||||||
CCA [20] | 0.708 | 0.609 | 0.571 | 0.714 | 0.615 | 0.650 |
CML [20] | 0.917 | 0.683 | 0.714 | 1 | 0.538 | 0.750 |
KGM Author | 0.875 | 0.805 | 0.857 | 0.857 | 0.846 | 0.830 |
Human | 0.918 | 0.795 | 0.864 | 1 | 1 | 0.889 |
Difficult set | ||||||
CCA [20] | 0.600 | 0.525 | 0.400 | 0.300 | 0.400 | 0.470 |
CML [20] | 0.500 | 0.875 | 0.600 | 0.200 | 0.500 | 0.620 |
KGM Author | 0.600 | 0.825 | 0.700 | 0.400 | 0.650 | 0.680 |
Human | 0.579 | 0.744 | 0.714 | 0.720 | 0.674 | 0.714 |
7 Discussion and visualisation
7.1 Separability of embeddings
7.2 Knowledge graph visualisation
ResNet node | ResNet degree | node2vec node | node2vec degree | MTL node | MTL degree |
---|---|---|---|---|---|
Still life | 707 | 1551–1600 | 297 | 1651–1700 | 174 |
Oil on canvas | 463 | portrait | 287 | landscape | 167 |
1601–1650 | 321 | Oil on canvas | 174 | Oil on canvas | 112 |
1651–1700 | 210 | Oil on panel | 45 | Dutch | 63 |
Oil on panel | 139 | Italian | 40 | Oil on panel | 34 |
Dutch | 93 | Oil on wood | 28 | BACKHUYSEN, Ludolf | 11 |
1701–1750 | 63 | TINTORETTO | 27 | POST, Frans | 10 |
1851–1900 | 60 | GRECO, El | 26 | KONINCK, Philips | 10 |
1551–1600 | 53 | ARCIMBOLDO, Giuseppe | 17 | Oil on wood | 9 |
Italian | 50 | Oil on oak panel | 16 | VELDE, Adriaen van de | 9 |
Oil on wood | 49 | MORONI, Giovanni Battista | 14 | CAPPELLE, Jan van de | 9 |
Oil on oak panel | 46 | Flemish | 14 | MOUCHERON, Frederick de | 9 |
Flemish | 41 | VERONESE, Paolo | 14 | PYNACKER, Adam | 8 |
French | 34 | MOR VAN DASHORST, Anthonis | 14 | WYNANTS, Jan | 8 |