1 Introduction
Quality | Record ID | Original Italian Description | English Translation |
---|---|---|---|
High | iccd2225343 | Dipinto entro cornice lignea verniciata ocra con bordo interno dorato. Amedeo III è raffigurato di profilo in armatura scura con ceselli in oro, mascheroni dorati sulle spalle e sull’elmo, cimiero con piume rosse e bianche. Nella parte inferiore del dipinto fascia con iscrizione a caratteri stampatello. Personaggi: Amedeo III di Savoia | Painting within an ochre-painted wooden frame with an inner golden border. Amedeo III is depicted in profile in dark armor chiseled in gold, with golden mascarons on the shoulders and on the helmet, and a crest with red and white plumes. In the lower part of the painting, a band with an inscription in block letters. Characters: Amedeo III of Savoy |
Low | work82865 | Congdon si è raramente dedicato al disegno come forma espressiva autonoma, così la mole di disegni raccolti sui taccuini non sono altro che appunti visivi presi durante numerosi viaggi. In questo senso non è possibile, se non raramente, assegnare al singolo disegno un’opera finita direttamente corrispondente, così questi disegni non vengono nemmeno ad essere schizzi preparatori. La sommatoria di tutti i disegni relativi a un luogo danno origine a una serie di dipinti che non hanno un corrispettivo oggettivo nei disegni stessi. Tutto questo giustifica la presenza degli appunti all’interno delle immagini (colori, sfumature e spiegazioni di vario genere). Nel caso probabile veduta di Napoli eseguita durante un viaggio del 1951. | Congdon rarely devoted himself to drawing as an autonomous expressive form, so the many drawings collected in his notebooks are nothing more than visual notes taken during his numerous trips. It is therefore only rarely possible to match a single drawing to a directly corresponding finished work, so these drawings are not even preparatory sketches. Taken together, all the drawings related to a place give rise to a series of paintings that have no direct counterpart in the drawings themselves. All this explains the presence of notes inside the images (colors, shades and explanations of various kinds). In this case, probably a view of Naples made during a trip in 1951. |
- Research Question 1 (RQ1): Which machine learning algorithm should be used to assess the quality of cultural heritage descriptions approximating as much as possible human judgement?
- Research Question 2 (RQ2): Can a classification model trained with descriptions in a given cultural heritage domain be effectively applied to automatically assess description quality in other domains?
- Research Question 3 (RQ3): How many annotated resources are needed to create enough training data to automatically assess the quality of descriptions?
2 State of the art
2.1 Metadata quality frameworks
2.2 NLP and machine learning for description quality
3 Dataset description
Dataset | High-quality | Low-quality (manual) | Low-quality (auto) | Total |
---|---|---|---|---|
Visual Art Works | 30,383 | 19,824 | 9,784 | 59,991 |
Archaeology | 19,280 | 6,334 | 4,264 | 29,878 |
Architecture | 6,908 | 1,842 | 2,202 | 10,952 |
Overall dataset | 56,571 | 28,000 | 16,250 | 100,821 |
- If a description is less than 3 words long, it is labelled as “low quality” (e.g. “Painting”, “Rectangular table”, “View of harbour”). This is done automatically, on the assumption that so few tokens cannot describe both the object and the subject of a record. This rule automatically labels 5,349 descriptions as “low quality”;
- If a description comes from a collection not updated after 2012, it is very likely to be “low quality”. This assumption is based on the annotator’s domain knowledge of the history of the Cultura Italia collections, which makes it possible to identify less curated batches of records. It was checked empirically by randomly sampling 500 records from such collections and manually inspecting each of them: none of the samples could be classified as “high quality”. In this way, 10,901 descriptions are automatically labelled as “low quality”;
- The remaining descriptions are manually annotated one by one and labelled as “high quality” or “low quality”.
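The three annotation steps can be sketched as a small routing function. This is only an illustration under stated assumptions: the `Record` fields, the whitespace-based word count and the reading of “not updated after 2012” as a last-update year of 2012 or earlier are hypothetical choices, not the actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional

MIN_WORDS = 3       # descriptions shorter than this are labelled low quality
CUTOFF_YEAR = 2012  # collections not updated after this year are treated as low quality


@dataclass
class Record:
    # Hypothetical field names; the real record schema is not given in the text.
    record_id: str
    description: str
    collection_last_update: int  # year of the last update of the source collection


def auto_label(record: Record) -> Optional[str]:
    """Return "low" when one of the two heuristics fires, else None (manual annotation)."""
    # Rule 1: too few whitespace-separated tokens to describe object and subject.
    if len(record.description.split()) < MIN_WORDS:
        return "low"
    # Rule 2: the source collection has not been updated after the cutoff year.
    if record.collection_last_update <= CUTOFF_YEAR:
        return "low"
    # Neither rule applies: the record goes to the manual annotation step.
    return None
```

Records for which `auto_label` returns `None` correspond to the remaining descriptions that are annotated by hand.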
4 Classification framework
- Stopword removal: Stopwords include all terms that do not convey semantic meaning, such as articles, prepositions and auxiliaries. They are removed from each description by comparing each token against a predefined list of Italian stopwords imported from the NLTK Python library.
- Punctuation removal: Following the same principle as stopword removal, every punctuation mark is removed from the descriptions.
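A minimal sketch of these two preprocessing steps is given here. To keep it self-contained it uses a tiny hand-written stopword subset (an assumption made for illustration); the full Italian list is available in NLTK as `nltk.corpus.stopwords.words("italian")` once the stopwords corpus has been downloaded. Replacing punctuation with spaces before tokenising, and lowercasing, are also choices of this sketch rather than details stated in the text.

```python
import string
from typing import List

# Illustrative subset only; swap in nltk.corpus.stopwords.words("italian") for real use.
ITALIAN_STOPWORDS = {
    "il", "lo", "la", "i", "gli", "le", "un", "una", "di", "a",
    "da", "in", "con", "su", "per", "e", "è", "sono",
}

# Map ASCII punctuation and common typographic quotes to spaces,
# so that elided forms such as "sull’elmo" are split rather than fused.
PUNCT_TABLE = str.maketrans({c: " " for c in string.punctuation + "’‘“”«»"})


def preprocess(description: str) -> List[str]:
    """Lowercase, strip punctuation and drop stopwords from one description."""
    no_punct = description.translate(PUNCT_TABLE)
    tokens = no_punct.lower().split()
    return [t for t in tokens if t not in ITALIAN_STOPWORDS]


print(preprocess("Dipinto entro cornice lignea, con bordo interno dorato."))
```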
4.1 Support vector machine (SVM)
4.2 FastText implementation of the multinomial logistic regression (MLR\(_\text {ft}\))
4.3 Baseline
5 Experimental setup
5.1 Parameter setting
Dataset | C | G | Kernel |
---|---|---|---|
Visual Art Works | 3 | 3 | RBF |
Archaeology | 3 | 3 | RBF |
Architecture | 32 | 8 | RBF |
Entire dataset | 1 | 3 | RBF |
5.2 Evaluation measures
- Recall (R) \(= \frac{TP}{TP + FN }\). It measures how extensively a certain class is covered by the classifier;
- Precision (P) \(= \frac{TP}{TP + FP }\). It measures how precise a classifier is, independently from its coverage;
- \(F1 = 2 \times \frac{P \times R}{P+R}\). It is the harmonic mean of precision and recall.
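The three measures can be computed from raw counts as in the sketch below. The counts in the usage line are hypothetical and are not taken from the experiments, and returning 0.0 when a denominator is empty is a convention chosen here, not one stated in the text.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for one class from raw counts."""
    # Returning 0.0 on an empty denominator is a convention of this sketch.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for the "low quality" class.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```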
6 Evaluation results
Dataset | System | Embeddings | Dim. | Low-quality P | Low-quality R | Low-quality F1 | High-quality P | High-quality R | High-quality F1 | Overall P | Overall R | Overall F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
VAW | Baseline | - | - | .505 | .446 | .474 | .515 | .574 | .543 | .510 | .510 | .508 |
VAW | SVM | Wikipedia | 50 | .809 | .762 | .785 | .781 | .824 | .802 | .795 | .793 | .793 |
VAW | SVM | Wikipedia | 300 | .850 | .826 | .838 | .835 | .858 | .846 | .843 | .842 | .842 |
VAW | SVM | in-domain | 50 | .809 | .762 | .785 | .780 | .824 | .802 | .794 | .793 | .793 |
VAW | SVM | in-domain | 300 | .850 | .826 | .838 | .835 | .858 | .846 | .843 | .842 | .842 |
VAW | MLR\(_\text {ft}\) | Wikipedia | 50 | .834 | .876 | .854 | .873 | .830 | .851 | .853 | .853 | .853 |
VAW | MLR\(_\text {ft}\) | Wikipedia | 300 | .832 | .875 | .853 | .872 | .828 | .849 | .852 | .852 | .851 |
VAW | MLR\(_\text {ft}\) | in-domain | 50 | .834 | .860 | .847 | .859 | .834 | .846 | .847 | .847 | .847 |
VAW | MLR\(_\text {ft}\) | in-domain | 300 | .838 | .848 | .843 | .850 | .840 | .845 | .844 | .844 | .844 |
Ar | Baseline | - | - | .547 | .194 | .286 | .673 | .912 | .774 | .610 | .553 | .530 |
Ar | SVM | Wikipedia | 50 | .814 | .659 | .728 | .830 | .918 | .872 | .822 | .788 | .800 |
Ar | SVM | Wikipedia | 300 | .850 | .752 | .798 | .872 | .927 | .899 | .861 | .839 | .848 |
Ar | SVM | in-domain | 50 | .815 | .656 | .727 | .829 | .918 | .871 | .822 | .787 | .799 |
Ar | SVM | in-domain | 300 | .850 | .752 | .798 | .872 | .927 | .899 | .861 | .839 | .848 |
Ar | MLR\(_\text {ft}\) | Wikipedia | 50 | .861 | .848 | .854 | .917 | .925 | .921 | .889 | .886 | .888 |
Ar | MLR\(_\text {ft}\) | Wikipedia | 300 | .862 | .843 | .852 | .915 | .926 | .920 | .888 | .884 | .886 |
Ar | MLR\(_\text {ft}\) | in-domain | 50 | .860 | .844 | .852 | .915 | .925 | .920 | .888 | .884 | .886 |
Ar | MLR\(_\text {ft}\) | in-domain | 300 | .861 | .845 | .853 | .916 | .925 | .920 | .888 | .885 | .886 |
A | Baseline | - | - | .530 | .288 | .373 | .671 | .850 | .750 | .600 | .569 | .562 |
A | SVM | Wikipedia | 50 | .796 | .786 | .791 | .875 | .882 | .879 | .836 | .834 | .835 |
A | SVM | Wikipedia | 300 | .816 | .799 | .807 | .883 | .895 | .889 | .850 | .847 | .848 |
A | SVM | in-domain | 50 | .799 | .791 | .795 | .878 | .883 | .880 | .838 | .837 | .838 |
A | SVM | in-domain | 300 | .816 | .799 | .807 | .883 | .895 | .889 | .850 | .847 | .848 |
A | MLR\(_\text {ft}\) | Wikipedia | 50 | .845 | .822 | .833 | .890 | .905 | .897 | .868 | .864 | .865 |
A | MLR\(_\text {ft}\) | Wikipedia | 300 | .843 | .821 | .831 | .889 | .903 | .896 | .866 | .862 | .864 |
A | MLR\(_\text {ft}\) | in-domain | 50 | .843 | .812 | .828 | .884 | .905 | .895 | .864 | .859 | .861 |
A | MLR\(_\text {ft}\) | in-domain | 300 | .844 | .825 | .834 | .891 | .904 | .897 | .868 | .864 | .866 |
All | Baseline | - | - | .493 | .255 | .336 | .577 | .795 | .669 | .535 | .525 | .502 |
All | SVM | Wikipedia | 50 | .755 | .609 | .674 | .734 | .845 | .786 | .744 | .727 | .730 |
All | SVM | Wikipedia | 300 | .794 | .693 | .740 | .782 | .860 | .819 | .788 | .776 | .780 |
All | SVM | in-domain | 50 | .757 | .609 | .675 | .735 | .847 | .787 | .746 | .728 | .731 |
All | SVM | in-domain | 300 | .794 | .693 | .740 | .782 | .860 | .819 | .788 | .776 | .780 |
All | MLR\(_\text {ft}\) | Wikipedia | 50 | .769 | .738 | .753 | .801 | .826 | .813 | .785 | .782 | .783 |
All | MLR\(_\text {ft}\) | Wikipedia | 300 | .767 | .740 | .753 | .801 | .824 | .812 | .784 | .782 | .783 |
All | MLR\(_\text {ft}\) | in-domain | 50 | .769 | .734 | .751 | .798 | .827 | .812 | .784 | .781 | .782 |
All | MLR\(_\text {ft}\) | in-domain | 300 | .771 | .732 | .751 | .798 | .829 | .813 | .784 | .781 | .782 |
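The Overall columns appear to be consistent with an unweighted (macro) average of the low-quality and high-quality scores. This is inferred from the reported numbers and is not stated explicitly in the text, so the following check should be read as a sketch under that assumption. It uses the VAW row for SVM with 300-dimensional Wikipedia embeddings.

```python
def macro_average(low_quality: float, high_quality: float) -> float:
    """Unweighted mean of the two per-class scores."""
    return (low_quality + high_quality) / 2


# Values copied from the VAW / SVM / Wikipedia / 300 row of the results table.
low_q = {"P": 0.850, "R": 0.826, "F1": 0.838}
high_q = {"P": 0.835, "R": 0.858, "F1": 0.846}
reported = {"P": 0.843, "R": 0.842, "F1": 0.842}

for metric in ("P", "R", "F1"):
    value = macro_average(low_q[metric], high_q[metric])
    # Per-class scores are already rounded, so allow one unit in the third decimal.
    matches = abs(value - reported[metric]) <= 0.001
    print(f"{metric}: macro={value:.4f} reported={reported[metric]:.3f} match={matches}")
```

Because the per-class values are themselves rounded to three decimals, the recomputed average can differ from the reported one in the last digit.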
6.1 RQ1: Which machine learning algorithm should be used to assess the quality of cultural heritage descriptions approximating as much as possible human judgement?
6.2 RQ2: Can a classification model trained with descriptions in a given cultural heritage domain be effectively applied to automatically assess description quality in other domains?
Test dataset | Train dataset | P | R | F1 |
---|---|---|---|---|
VAW | Ar | .653 | .645 | .640 |
VAW | A | .488 | .498 | .371 |
Ar | VAW | .644 | .654 | .617 |
Ar | A | .447 | .488 | .414 |
A | VAW | .551 | .552 | .550 |
A | Ar | .560 | .562 | .556 |
VAW | Ar+A | .610 | .609 | .609 |
Ar | VAW+A | .624 | .635 | .613 |
A | VAW+Ar | .573 | .576 | .572 |
VAW+Ar | A | .464 | .494 | .383 |
VAW+A | Ar | .637 | .633 | .627 |
A+Ar | VAW | .610 | .617 | .596 |
VAW+Ar+A | A | .661 | .556 | .495 |
VAW+Ar+A | Ar | .738 | .741 | .735 |
VAW+Ar+A | VAW | .833 | .838 | .831 |
6.3 RQ3: How many annotated resources are needed to create enough training data to automatically assess the quality of descriptions?
7 Discussions
Record ID | Description | Gold | Predicted | Error |
---|---|---|---|---|
work_48470 | Oinochoe with gadrooned body. Applique with female protome matrix at the handle attachment. | HQ | LQ | A |
124472 | Black-figure painted Attic Kylix, Siana type. | HQ | LQ | A |
10530 | Corinthian Amphoriskos with zoomorphic decoration. | HQ | LQ | A |
iccd3415758 | The Saint, kneeling, looks up. At the bottom, to the left, there is a winged putto. | LQ | HQ | B |
iccd3145858 | the base lies on a parallelepiped-shaped base; [...] high volute handle. | LQ | HQ | B |
iccd3165805 | Brocade satin; checkered pattern. The compositional unit derives from [...] with flowers and leaves. | LQ | HQ | B |
iccd3908065 | Rich Oriental with mustache and half-closed mouth, head slightly oriented to [...] Figure: man | LQ | HQ | B |
iccd4413810 | The cycle includes three illustrated tondos, [...] . | LQ | HQ | C |
iccd3913506 | Wooden little angels sitting on a cloud, wrapped in a blue mantle, with wings [...] | HQ | LQ | C |
Record ID | Description | Gold | Predicted |
---|---|---|---|
work_15736 | The big polyptych commissioned by the Guidalotti family for their chapel [...] | LQ | LQ |
work_63812 | Thanks to Shearman it was verified that the painting was located in the building in via Larga where it remained [...] | LQ | LQ |
iccd3906852 | Crib statuette depicting an angel in a flying posture, dressed [...] | HQ | HQ |
iccd2307693 | [...] The man depicted has a mustache and beard and wears a wide-brimmed hat [...] | HQ | HQ |
- Error type A: Descriptions containing Latin and/or Greek terms: misclassifications in these cases (e.g. work_48470 and work_48471 in Table 6) may be due to the fact that these words are infrequent and are therefore not represented in a meaningful way in the embedding space;
- Error type B: Descriptions only partially compliant with the cataloguing guidelines provided by the ICCD: these descriptions are typically annotated as low quality in our gold standard, even if they contain no factual errors about the item. In our experiments, they tend to be automatically classified as high quality (see for example record iccd3908065 in Table 6);
- Error type C: Descriptions where the subject is implicit: in these cases the classifier is not able to properly identify the domain of the item, as there may be no reference to the type of cultural object (see record iccd3913506 in Table 6).