2022 | OriginalPaper | Chapter

Multimodal Embedding for Lifelog Retrieval

Authors: Liting Zhou, Cathal Gurrin

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

Research on lifelog retrieval is attracting increasing attention, with a focus on applying machine learning, especially to data annotation and enrichment, which is necessary for effective retrieval. In this paper, we propose two annotation approaches that apply state-of-the-art text/visual and joint embedding technologies to lifelog query-text retrieval tasks. Both approaches are evaluated on the commonly used NTCIR13-lifelog dataset, and the results show that embedding techniques improve retrieval accuracy over conventional text-matching methods.

Title: Multimodal Embedding for Lifelog Retrieval
Authors: Liting Zhou, Cathal Gurrin
Publisher: Springer International Publishing
Book: MultiMedia Modeling
Print ISBN: 978-3-030-98357-4
Electronic ISBN: 978-3-030-98358-1
Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-030-98358-1_33
