An innovative web-based collaborative platform for video annotation

Multimedia Tools and Applications

Abstract

Large-scale labeled datasets are of key importance for the development of automatic video analysis tools because, on the one hand, they allow multi-class classifiers to be trained and, on the other hand, they support the evaluation of algorithms. This is widely recognized by the multimedia and computer vision communities, as witnessed by the growing number of available datasets; however, research still lacks annotation tools able to meet user needs, since a great deal of human concentration is necessary to generate high-quality ground truth data. Moreover, it is not feasible to collect large video ground truths, covering as many scenarios and object categories as possible, by exploiting only the effort of isolated research groups. In this paper we present a collaborative web-based platform for video ground truth annotation. It features an easy and intuitive user interface that allows plain video annotation and instant sharing and integration of the generated ground truths, in order not only to alleviate a large part of the effort and time needed, but also to increase the quality of the generated annotations. The tool has been online for the last four months and, to date, we have collected about 70,000 annotations. A comparative performance evaluation has also shown that our system outperforms existing state-of-the-art methods in terms of annotation time, annotation quality and system usability.

Notes

  1. www.fish4knowledge.eu

  2. http://f4k.ing.unict.it/perla.dev

  3. www.fish4knowledge.eu

Acknowledgements

We would like to thank the anonymous reviewers for their constructive and invaluable comments. This research was funded by European Commission FP7 grant 257024, within the Fish4Knowledge project (see Note 3).

Author information

Corresponding author

Correspondence to Isaak Kavasidis.

About this article

Cite this article

Kavasidis, I., Palazzo, S., Salvo, R.D. et al. An innovative web-based collaborative platform for video annotation. Multimed Tools Appl 70, 413–432 (2014). https://doi.org/10.1007/s11042-013-1419-7

  • DOI: https://doi.org/10.1007/s11042-013-1419-7
