
2022 | Original Paper | Book Chapter

Multimodal Embedding for Lifelog Retrieval

Authors: Liting Zhou, Cathal Gurrin

Published in: MultiMedia Modeling

Publisher: Springer International Publishing


Abstract

Research on lifelog retrieval is attracting increasing attention, with a particular focus on applying machine learning to data annotation and enrichment, which is necessary to facilitate effective retrieval. In this paper, we propose two annotation approaches that apply state-of-the-art text/visual and joint embedding technologies to lifelog query-text retrieval tasks. Both approaches are evaluated on the commonly used NTCIR13-Lifelog dataset, and the results demonstrate that the embedding techniques improve retrieval accuracy over conventional text-matching methods.
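The full chapter details the two annotation approaches; as a rough illustration of the general idea behind joint-embedding retrieval, the sketch below ranks lifelog images against a text query by cosine similarity in a shared vector space. This is a minimal sketch, not the authors' implementation: the encoders (embed_text, embed_image), the embedding dimensionality, and the image file names are hypothetical placeholders standing in for trained text/visual embedding models.

```python
# Minimal sketch of query-text retrieval via a joint embedding space.
# NOT the authors' method: encoders are stubbed with deterministic random
# projections purely so the ranking logic can run end to end.
import numpy as np

DIM = 512  # assumed dimensionality of the shared embedding space


def embed_text(query: str) -> np.ndarray:
    """Placeholder text encoder; a real system would use a trained model."""
    rng = np.random.default_rng(abs(hash(query)) % (2 ** 32))
    return rng.standard_normal(DIM)


def embed_image(image_id: str) -> np.ndarray:
    """Placeholder image encoder; a real system would use a trained model."""
    rng = np.random.default_rng(abs(hash(image_id)) % (2 ** 32))
    return rng.standard_normal(DIM)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_lifelog_images(query: str, image_ids: list[str], k: int = 5):
    """Embed the query, score every image embedding, return the top-k matches."""
    q = embed_text(query)
    scored = [(img, cosine(q, embed_image(img))) for img in image_ids]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical lifelog image identifiers for demonstration only.
    images = [f"lifelog_image_{i:04d}.jpg" for i in range(100)]
    for img, score in rank_lifelog_images("eating lunch at a desk", images):
        print(f"{img}\t{score:.3f}")
```

In a real pipeline the placeholder encoders would be replaced by the trained text and visual models that project both modalities into the same space, and the image embeddings would be precomputed and indexed rather than recomputed per query.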


Metadata
Title
Multimodal Embedding for Lifelog Retrieval
Authors
Liting Zhou
Cathal Gurrin
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-030-98358-1_33
