Published in: The Journal of Supercomputing 14/2023

25-04-2023

Interactive object annotation based on one-click guidance

Authors: Yijin Xiong, Xin Gao, Guoying Zhang

Abstract

Manual annotation of datasets suffers from a heavy workload, uneven data quality, and a high professional threshold. Following the idea of semi-automatic annotation, this article explores interactive methods for obtaining accurate object annotations. We propose a human–machine interactive object annotation method based on one-click guidance: the annotator clicks a point close to the center of the object, and the prior information carried by this point guides the model. The advantages of our method are fourfold: (1) the simulated-click scheme is transferable and supports labeling across datasets; (2) clicks help eliminate irrelevant areas within the bounding box; (3) the operation is more convenient, since no box needs to be drawn manually and only the location information is required; (4) the method supports additional click annotations for further correction. To verify the effectiveness of the proposed method, we conducted extensive experiments on the KITTI and PASCAL VOC2012 datasets; the results show that our model improves average IoU by 18.1% and 14.6% over Anno-Mage and CVAT, respectively. Our method focuses on improving the accuracy and efficiency of annotation and offers a new direction for semi-automatic annotation.
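The abstract does not spell out how a center click is simulated or fed to the model, but the general recipe in click-based interactive annotation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the jitter fraction, the Gaussian encoding of the click as an extra input channel, and all function names are assumptions, while the IoU metric used for evaluation is standard.

```python
import numpy as np

def simulate_click(box, jitter=0.1, rng=None):
    """Simulate a user click near the center of a ground-truth box.

    box: (x1, y1, x2, y2); jitter: max offset as a fraction of box size.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Perturb the exact center so training reflects imprecise human clicks.
    dx = rng.uniform(-jitter, jitter) * (x2 - x1)
    dy = rng.uniform(-jitter, jitter) * (y2 - y1)
    return cx + dx, cy + dy

def click_guidance_map(shape, click, sigma=10.0):
    """Encode a click as a 2D Gaussian heatmap that a network can
    consume as an additional input channel (a common guidance scheme)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = click
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, the metric
    used in the paper's comparison against Anno-Mage and CVAT."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

At annotation time the guidance map would be stacked with the RGB image before the forward pass; a correction click simply adds another Gaussian peak to the guidance channel.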


Metadata
Title
Interactive object annotation based on one-click guidance
Authors
Yijin Xiong
Xin Gao
Guoying Zhang
Publication date
25-04-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 14/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05279-z
