Published in: International Journal of Computer Vision, Issue 2/2023

Published: 11 November 2022

Robots Understanding Contextual Information in Human-Centered Environments Using Weakly Supervised Mask Data Distillation

Authors: Daniel Dworakowski, Angus Fung, Goldie Nejat


Abstract

Contextual information contained within human environments, such as text on signs, symbols, and objects, provides important information for robots to use for exploration and navigation. To identify and segment contextual information from images obtained in these environments, data-driven methods such as Convolutional Neural Networks (CNNs) can be used. However, these methods require significant amounts of human-labeled data, which are time-consuming to obtain. In this paper, we present the novel Weakly Supervised Mask Data Distillation (WeSuperMaDD) architecture for autonomously generating pseudo segmentation labels (PSLs) using CNNs not specifically trained for the task of text segmentation, e.g., CNNs trained instead for object classification or image captioning. WeSuperMaDD is uniquely able to generate PSLs using learned image features from datasets that are sparse and have limited diversity, as is common in robot navigation tasks in human-centered environments (e.g., malls, stores). Our proposed architecture uses a new mask refinement system which automatically searches for the PSL with the fewest foreground pixels that satisfies cost constraints, removing the need for handcrafted heuristic rules. Extensive experiments were conducted to validate the performance of WeSuperMaDD in generating PSLs for datasets containing text of various scales, fonts, orientations, curvatures, and perspectives in several indoor/outdoor environments. A detailed comparison study with existing approaches found a significant improvement in PSL quality. Furthermore, an instance segmentation CNN trained using the WeSuperMaDD architecture achieved measurable improvements in accuracy compared to an instance segmentation CNN trained with Naïve PSLs. Our method also achieved performance comparable to existing text detection methods.
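The abstract describes the mask refinement step only at a high level; the minimal Python sketch below illustrates the general idea of searching over candidate masks for the one with the fewest foreground pixels that still satisfies a cost constraint. It is an assumption-laden illustration, not the authors' implementation: the candidate generation (saliency thresholding), the cost function, and all names (`refine_mask`, `saliency`, `max_cost`) are hypothetical stand-ins for the mechanisms defined in the full paper.

```python
# Sketch of a mask refinement search: among candidate pseudo segmentation
# labels (PSLs) obtained by thresholding a saliency map at increasing levels,
# keep the sparsest mask whose cost stays within a budget. Hypothetical,
# assumption-based example only.
import numpy as np

def refine_mask(saliency: np.ndarray, max_cost: float):
    """Return the feasible candidate mask with the fewest foreground pixels."""
    best_mask, best_fg = None, float("inf")
    for t in np.linspace(0.1, 0.9, 9):      # candidate thresholds over the map
        mask = saliency >= t                # candidate PSL at this threshold
        fg = int(mask.sum())                # foreground pixel count
        if fg == 0:
            continue                        # an empty mask carries no label
        # Illustrative cost: fraction of total saliency mass discarded.
        cost = float(saliency[~mask].sum() / saliency.sum())
        if cost <= max_cost and fg < best_fg:
            best_mask, best_fg = mask, fg   # sparser feasible mask found
    return best_mask

# Usage with a synthetic saliency map standing in for a CNN's attention output.
rng = np.random.default_rng(0)
saliency = rng.random((64, 64))
psl = refine_mask(saliency, max_cost=0.8)
```

Higher thresholds yield sparser masks but discard more saliency mass, so minimizing foreground pixels subject to a cost bound trades label tightness against coverage without any handcrafted per-dataset rules.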


Metadata
Title
Robots Understanding Contextual Information in Human-Centered Environments Using Weakly Supervised Mask Data Distillation
Authors
Daniel Dworakowski
Angus Fung
Goldie Nejat
Publication date
11 November 2022
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2023
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-022-01706-5
