International Journal of Computer Vision

Publication date: 27-05-2019

Click Carving: Interactive Object Segmentation in Images and Videos with Point Clicks

Authors: Suyog Dutt Jain, Kristen Grauman

Published in: International Journal of Computer Vision | Issue 9/2019

Abstract

We present a novel form of interactive object segmentation called Click Carving, which enables accurate segmentation of objects in images and videos with only a few point clicks. Whereas conventional interactive pipelines take the user’s initialization as a starting point, we show the value of the system taking the lead even in initialization. In particular, for a given image or video frame, the system precomputes a ranked list of thousands of possible segmentation hypotheses (also referred to as object region proposals) using appearance and motion cues. Then, the user looks at the top-ranked proposals and clicks on the object boundary to carve away erroneous ones. This process iterates (typically 2–3 times), and each time the system revises the top-ranked proposal set, until the user is satisfied with a resulting segmentation mask. In the case of images, this mask is taken as the final object segmentation. However, in the case of videos, the object region proposals rely on motion as well, and the resulting segmentation mask in the first frame is further propagated across the video to obtain a complete spatio-temporal object tube. On six challenging image and video datasets, we provide extensive comparisons with both existing work and simpler alternative methods. In all, the proposed Click Carving approach strikes an excellent balance of accuracy and human effort. It outperforms all similarly fast methods, and is competitive with or better than those requiring 2–12 times the effort.
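The carving loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how clicks are matched against proposals, so the rule used here is an assumption made purely for illustration: a proposal survives only if its boundary passes within a fixed pixel radius of every click, and survivors are ordered by their precomputed score. The names `boundary`, `boundary_distance`, `carve`, `radius`, and `top_k` are hypothetical and are not taken from the paper or its released code.

```python
import numpy as np

def boundary(mask):
    """Foreground pixels with at least one background 4-neighbour."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    # A pixel is interior when all four of its neighbours are foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return m & ~interior

def boundary_distance(mask, click):
    """Euclidean distance from a (row, col) click to the nearest boundary pixel."""
    ys, xs = np.nonzero(boundary(mask))
    if ys.size == 0:
        return float("inf")  # an empty proposal can never match a click
    r, c = click
    return float(np.sqrt(((ys - r) ** 2 + (xs - c) ** 2).min()))

def carve(proposals, scores, clicks, radius=1.5, top_k=3):
    """Indices of the top_k proposals consistent with every click.

    Assumed rule (not from the paper): keep a proposal only if its boundary
    lies within `radius` pixels of each click, then sort by prior score.
    """
    keep = [i for i, m in enumerate(proposals)
            if all(boundary_distance(m, c) <= radius for c in clicks)]
    keep.sort(key=lambda i: -scores[i])
    return keep[:top_k]

# Three toy proposals on a 10x10 frame, already ranked by a prior score.
proposals = [np.zeros((10, 10), dtype=bool) for _ in range(3)]
proposals[0][2:7, 2:7] = True   # too narrow: misses the object's right edge
proposals[1][2:7, 2:9] = True   # the intended object
proposals[2][0:4, 0:4] = True   # an unrelated region
scores = [0.9, 0.8, 0.7]

before = carve(proposals, scores, clicks=[])        # no clicks: prior ranking
after = carve(proposals, scores, clicks=[(4, 8)])   # one click on the right edge
print(before, after)
```

In this toy setting a single click on the object's right edge removes the two proposals whose boundaries do not pass near it, which mirrors the "carve away erroneous ones" step; a real system would presumably use a softer, learned or distance-weighted score rather than a hard radius.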

More details and videos can be found at: http://vision.cs.utexas.edu/projects/clickcarving/.

Code available at: http://vision.cs.utexas.edu/projects/clickcarving/.

The unsupervised NLC method (Faktor and Irani 2014) reports state-of-the-art results on a subset of the Segtrack-v2 dataset. We were unable to reproduce these results using the publicly available NLC code, potentially because of an OS incompatibility.

IVID (Shankar Nagaraja et al. 2015) does not report annotation times for Segtrack-v2. Also, the VSB100 dataset was not used in their experiments.

Acuna, D., Ling, H., Kar, A., & Fidler, S. (2018). Efficient interactive annotation of segmentation datasets with polygon-rnn++.

Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., & Malik, J. (2014). Multiscale combinatorial grouping. In CVPR.

Badrinarayanan, V., Galasso, F., & Cipolla, R. (2010). Label propagation in video sequences. In CVPR.

Bai, X., & Sapiro, G. (2007). Distancecut: Interactive segmentation and matting of images and videos. In 2007 IEEE international conference on image processing.

Bai, X., Wang, J., Simons, D., & Sapiro, G. (2009). Video snapcut: Robust video object cutout using localized classifiers. In SIGGRAPH.

Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). iCoseg: Interactive co-segmentation with intelligent scribble guidance. In CVPR.

Bearman, A., Russakovsky, O., Ferrari, V., & Fei-Fei, L. (2015). What’s the point: Semantic segmentation with point supervision. ArXiv e-prints.

Bell, S., Upchurch, P., Snavely, N., & Bala, K. (2015). Material recognition in the wild with the materials in context database. In Computer Vision and Pattern Recognition (CVPR).

Boykov, Y., & Jolly, M. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In CVPR.

Carreira, J., & Sminchisescu, C. (2012). CPMC: Automatic object segmentation using constrained parametric min-cuts. PAMI, 34(7), 1312–1328.

Castrejón, L., Kundu, K., Urtasun, R., & Fidler, S. (2017). Annotating object instances with a polygon-rnn. In CVPR.

Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2015). Semantic image segmentation with deep convolutional nets and fully connected crfs. In ICLR.

Cheng, M.-M., Zhang, G.-X., Mitra, N. J., Huang, X., & Hu, S.-M. (2011). Global contrast based salient region detection. In CVPR (pp. 409–416).

Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR).

Faktor, A., & Irani, M. (2014). Video segmentation by non-local consensus voting. In Proceedings of the British machine vision conference. BMVA Press.

Fathi, A., Balcan, M., Ren, X., & Rehg, J. (2011). Combining self training and active learning for video segmentation. In BMVC.

Fragkiadaki, K., Arbelaez, P., Felsen, P., & Malik, J. (2015). Learning to segment moving objects in videos. In CVPR.

Galasso, F., Nagaraja, N. S., Cardenas, T. J., Brox, T., & Schiele, B. (2013). A unified video segmentation benchmark: Annotation, metrics and analysis. In ICCV.

Godec, M., Roth, P. M., & Bischof, H. (2011). Hough-based tracking of non-rigid objects. In ICCV.

Grundmann, M., Kwatra, V., Han, M., & Essa, I. (2010). Efficient hierarchical graph based video segmentation. In CVPR.

Gulshan, V., Rother, C., Criminisi, A., Blake, A., & Zisserman, A. (2010). Geodesic star convexity for interactive image segmentation. In CVPR.

Jain, S., & Grauman, K. (2013). Predicting sufficient annotation strength for interactive foreground segmentation. In ICCV.

Jain, S. D., & Grauman, K. (2014). Supervoxel-consistent foreground propagation in video. In ECCV 2014. Lecture notes in computer science (pp. 656–671). Springer.

Jain, S. D., & Grauman, K. (2016). Click carving: Segmenting objects in video with point clicks. In AAAI conference on human computation and crowdsourcing (HCOMP).

Jiang, B., Zhang, L., Lu, H., Yang, C., & Yang, M.-H. (2013). Saliency detection via absorbing markov chain. In ICCV.

Karasev, V., Ravichandran, A., & Soatto, S. (2014). Active frame, location, and detector selection for automated and manual video annotation. In Proceedings of the IEEE conference on computer vision and pattern recognition.

Kass, M., Witkin, A., & Terzopoulos, D. (1988). Snakes: Active contour models. In IJCV (pp. 321–331).

Kohli, P., Nickisch, H., Rother, C., & Rhemann, C. (2012). User-centric learning and evaluation of interactive segmentation systems. IJCV, 100(3), 261–274.

Krähenbühl, P., & Koltun, V. (2014). Geodesic object proposals. In Computer vision – ECCV 2014: 13th European conference, Zurich, Switzerland, September 6–12, 2014, proceedings, part V (pp. 725–739). Cham: Springer.

Krause, A., & Guestrin, C. (2007). Near-optimal observation selection using submodular functions. In National conference on artificial intelligence (AAAI), nectar track.

Lee, Y. J., Kim, J., & Grauman, K. (2011). Key-segments for video object segmentation. In ICCV.

Lempitsky, V. S., Kohli, P., Rother, C., & Sharp, T. (2009). Image segmentation with a bounding box prior. In ICCV.

Levinkov, E., Tompkin, J., Bonneel, N., Kirchhoff, S., Andres, B., & Pfister, H. (2016). Interactive multicut video segmentation. In Proceedings of the 24th Pacific conference on computer graphics and applications: Short papers (pp. 33–38).

Li, F., Kim, T., Humayun, A., Tsai, D., & Rehg, J. M. (2013). Video segmentation by tracking many figure-ground segments. In ICCV.

Li, X., Zhao, L., Wei, L., Yang, M.-H., Fei, W., Zhuang, Y., et al. (2016). DeepSaliency: Multi-task deep neural network model for salient object detection. IEEE TIP, 25(8), 3919–3930.

Li, Y., Hou, X., Koch, C., Rehg, J. M., & Yuille, A. L. (2014). The secrets of salient object segmentation. In CVPR.

Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In ECCV.

Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., et al. (2011). Learning to detect a salient object. PAMI, 33(2), 353–367.

Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In CVPR.

Ma, T., & Latecki, L. (2012). Maximum weight cliques with mutex constraints for video object segmentation. In CVPR.

Malisiewicz, T., & Efros, A. A. (2007). Spatial support for objects via multiple segmentations. In BMVC.

Malmberg, F., Strand, R., & Nyström, I. (2011). Generalized hard constraints for graph segmentation. In SCIA.

McGuinness, K., & O’Connor, N. E. (2010). A comparative evaluation of interactive segmentation algorithms. Pattern Recognition, 43(2), 434–444. Interactive Imaging and Vision.

Mortensen, E., & Barrett, W. (1995). Intelligent scissors for image composition. In SIGGRAPH.

Nickisch, H., Rother, C., Kohli, P., & Rhemann, C. (2010). Learning an interactive segmentation system. In Proceedings of the seventh Indian conference on computer vision, graphics and image processing, ICVGIP ’10 (pp. 274–281). New York, NY: ACM.

Noh, H., Hong, S., & Han, B. (2015). Learning deconvolution network for semantic segmentation. In 2015 IEEE international conference on computer vision (ICCV).

Oneata, D., Revaud, J., Verbeek, J., & Schmid, C. (2014). Spatio-temporal object detection proposals. In ECCV.

Papadopoulos, D., Uijlings, J., Keller, F., & Ferrari, V. (2017). Training object class detectors with click supervision. In CVPR.

Papazoglou, A., & Ferrari, V. (2013). Fast object segmentation in unconstrained video. In ICCV.

Perazzi, F., Krähenbühl, P., Pritch, Y., & Hornung, A. (2012). Saliency filters: Contrast based filtering for salient region detection. In CVPR (pp. 733–740).

Pinheiro, P. O., Collobert, R., & Dollár, P. (2015). Learning to segment object candidates. In NIPS.

Pont-Tuset, J., Farré, M. A., & Smolic, A. (2015). Semi-automatic video object segmentation by advanced manipulation of segmentation hierarchies. In International workshop on content-based multimedia indexing (CBMI).

Ren, X., & Malik, J. (2007). Tracking as repeated figure/ground segmentation. In CVPR.

Rother, C., Kolmogorov, V., & Blake, A. (2004). Grabcut-interactive foreground extraction using iterated graph cuts. In SIGGRAPH.

Russakovsky, O., Li, L.-J., & Fei-Fei, L. (2015). Best of both worlds: Human–machine collaboration for object annotation. In CVPR.

Shankar Nagaraja, N., Schmidt, F. R., & Brox, T. (2015). Video segmentation with just a few strokes. In ICCV.

Sundberg, P., Brox, T., Maire, M., Arbelaez, P., & Malik, J. (2011). Occlusion boundary detection and figure/ground assignment from optical flow. In CVPR, Washington, DC, USA.

Tsai, D., Flagg, M., & Rehg, J. (2010). Motion coherent tracking with multi-label mrf optimization. In BMVC.

The OpenCV reference manual, 2.4.9.0 edition, April 2014.

Uijlings, J. R. R., van de Sande, K. E. A., Gevers, T., & Smeulders, A. W. M. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.

Vijayanarasimhan, S., & Grauman, K. (2012). Active frame selection for label propagation in videos. In ECCV.

Vondrick, C., & Ramanan, D. (2011). Video annotation and tracking with active learning. In NIPS.

Wang, J., Bhat, P., Colburn, A., Agrawala, M., & Cohen, M. F. (2005). Interactive video cutout. ACM Transactions on Graphics, 24(3), 585–594.

Wang, T., Han, B., & Collomosse, J. (2014). Touchcut: Fast image and video segmentation using single-touch interaction. Computer Vision and Image Understanding, 120, 14–30.

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2015). Learning to detect motion boundaries. In CVPR 2015, Boston, United States.

Wen, L., Du, D., Lei, Z., Li, S. Z., & Yang, M.-H. (2015). Jots: Joint online tracking and segmentation. In CVPR.

Wu, Z., Li, F., Sukthankar, R., & Rehg, J. M. (2015). Robust video segment proposals with painless occlusion handling. In CVPR.

Xu, N., Price, B. L., Cohen, S., Yang, J., & Huang, T. S. (2016). Deep interactive object selection. In CVPR (pp. 373–381).

Yu, G., & Yuan, J. (2015). Fast action proposals for human action detection and search. In CVPR.

Zhang, D., Javed, O., & Shah, M. (2013). Video object segmentation through spatially accurate and temporally dense extraction of primary object regions. In CVPR.

Zhao, R., Ouyang, W., Li, H., & Wang, X. (2015). Saliency detection by multi-context learning. In CVPR.

Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., et al. (2015). Conditional random fields as recurrent neural networks.

Title: Click Carving: Interactive Object Segmentation in Images and Videos with Point Clicks
Authors: Suyog Dutt Jain, Kristen Grauman
Publication date: 27-05-2019
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 9/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-019-01184-2
