2022 | OriginalPaper | Chapter

Multimodal Embedding for Lifelog Retrieval

Authors: Liting Zhou, Cathal Gurrin

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

Research on lifelog retrieval is attracting increasing attention, with a focus on applying machine learning, especially to the data annotation and enrichment needed for effective retrieval. In this paper, we propose two annotation approaches that apply state-of-the-art text/visual and joint embedding technologies to lifelog query-text retrieval tasks. Both approaches are evaluated on the commonly used NTCIR13-lifelog dataset, and the results show that embedding techniques improve retrieval accuracy over conventional text-matching methods.

Metadata
Title: Multimodal Embedding for Lifelog Retrieval
Authors: Liting Zhou, Cathal Gurrin
Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-030-98358-1_33
