2021 | OriginalPaper | Book chapter

Semantic Analysis of Cultural Heritage Data: Aligning Paintings and Descriptions in Art-Historic Collections

Written by: Nitisha Jain, Christian Bartz, Tobias Bredow, Emanuel Metzenthin, Jona Otholt, Ralf Krestel

Published in: Pattern Recognition. ICPR International Workshops and Challenges

Publisher: Springer International Publishing

Abstract

Art-historic documents often contain multimodal data in terms of images of artworks and metadata, descriptions, or interpretations thereof. Most research efforts have focused either on image analysis or text analysis independently since the associations between the two modes are usually lost during digitization. In this work, we focus on the task of alignment of images and textual descriptions in art-historic digital collections. To this end, we reproduce an existing approach that learns alignments in a semi-supervised fashion. We identify several challenges while automatically aligning images and texts, specifically for the cultural heritage domain, which limit the scalability of previous works. To improve the performance of alignment, we introduce various enhancements to extend the existing approach that show promising results.

Metadata
Title
Semantic Analysis of Cultural Heritage Data: Aligning Paintings and Descriptions in Art-Historic Collections
Written by
Nitisha Jain
Christian Bartz
Tobias Bredow
Emanuel Metzenthin
Jona Otholt
Ralf Krestel
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-68796-0_37