Abstract
The Microsoft SenseCam is a small, lightweight wearable camera used to passively capture photos and other sensor readings from a user’s day-to-day activities. It captures on average 3,000 images in a typical day, equating to almost 1 million images per year. It can be used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer’s life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. In this work, we explore the applicability of semantic concept detection, a method often used in video retrieval, to the domain of visual lifelogs. Our concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning, and thereby determines the probability of a concept’s presence. We apply detectors for 27 everyday semantic concepts to a lifelog collection of 257,518 SenseCam images from 5 users. The results were evaluated on a subset of 95,907 images to determine the detection accuracy for each semantic concept. We conducted further analysis of the temporal consistency, co-occurrence and relationships among the detected concepts to more extensively investigate the robustness of the detectors in this domain.
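The abstract describes a per-concept detector trained with supervised learning that maps low-level visual features to the probability that a concept is present. As a minimal illustrative sketch only (the paper's actual pipeline uses SVMs with probabilistic outputs; here a simple logistic-regression stand-in is used, and all function names and the synthetic 4-dimensional feature vectors are hypothetical), the idea of fitting one probabilistic detector per concept can be expressed as:

```python
import numpy as np

def train_concept_detector(features, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression stand-in for a per-concept detector:
    learns a mapping from low-level feature vectors to the probability
    that one concept (e.g. 'indoors') is present in an image.

    features: (n_images, n_dims) array of low-level visual features.
    labels:   (n_images,) array of 0/1 concept annotations.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        # sigmoid turns the linear score into a probability estimate
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        # gradient descent on the logistic loss
        w -= lr * (features.T @ (p - labels)) / n
        b -= lr * np.mean(p - labels)
    return w, b

def concept_probability(w, b, feature_vec):
    """Estimated probability of the concept's presence for one image."""
    return float(1.0 / (1.0 + np.exp(-(feature_vec @ w + b))))

# toy usage on synthetic, well-separated feature vectors (hypothetical data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (50, 4)),    # images with the concept
               rng.normal(-2.0, 0.5, (50, 4))])  # images without it
y = np.concatenate([np.ones(50), np.zeros(50)])
w, b = train_concept_detector(X, y)
score = concept_probability(w, b, np.full(4, 2.0))
```

In the paper's setting, one such detector would be trained for each of the 27 everyday concepts, and an image's detector scores can then be thresholded or ranked for retrieval.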
Acknowledgements
We are grateful to the AceMedia project and Microsoft Research for their support. This work is supported by the Irish Research Council for Science, Engineering and Technology, by Science Foundation Ireland under grant 07/CE/I1147, and by the EU IST-CHORUS project. We would also like to thank the participants who made their personal lifelog collections available for these experiments and who took part in the annotation effort.
Cite this article
Byrne, D., Doherty, A.R., Snoek, C.G.M. et al. Everyday concept detection in visual lifelogs: validation, relationships and trends. Multimed Tools Appl 49, 119–144 (2010). https://doi.org/10.1007/s11042-009-0403-8