2023 | OriginalPaper | Chapter

CAENet: Efficient Multi-task Learning for Joint Semantic Segmentation and Depth Estimation

Abstract

In this paper, we propose an efficient multi-task method, named Context-aware Attentive Enrichment Network (CAENet), for real-time joint semantic segmentation and depth estimation. Building upon a light-weight encoder backbone, an efficient decoder is devised to fully leverage the information available in multi-scale encoder features. In particular, a new Inception Residual Pooling (IRP) module is designed to efficiently extract contextual information from the high-level features with diverse receptive fields, improving semantic understanding. The context-aware features are then adaptively enriched with spatial details from low-level features via a Light-weight Attentive Fusion (LAF) module that uses a pseudo stereoscopic attention mechanism. These two modules are applied progressively, in a recursive manner, to generate high-resolution shared features, which are further processed by task-specific heads to produce the final outputs. This network design effectively captures information beneficial to both semantic segmentation and depth estimation while largely reducing the computational budget. Extensive experiments across multi-task benchmarks validate that CAENet achieves state-of-the-art performance at an inference speed comparable to that of other competing real-time methods. Code is available at https://github.com/wlx-zju/CAENet.
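The decoder data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that): it works on 1-D lists of floats instead of image feature maps, and every function name, the choice of max pooling with kernel sizes 3, 5 and 7, the sigmoid gate, and the two trivial heads are assumptions made purely for illustration. It only shows the order of operations: context pooling on the deeper features, attention-gated fusion with the next shallower features, repeated until full resolution, followed by two task heads.

```python
import math


def max_pool_same(x, k):
    """Stride-1 max pooling over a 1-D list with odd window k; keeps len(x)."""
    r = k // 2
    return [max(x[max(0, i - r): i + r + 1]) for i in range(len(x))]


def context_pool(x, kernel_sizes=(3, 5, 7)):
    """Assumed stand-in for an IRP-style block: parallel pooling branches with
    different receptive fields, averaged, then added back to x (residual)."""
    branches = [max_pool_same(x, k) for k in kernel_sizes]
    context = [sum(vals) / len(branches) for vals in zip(*branches)]
    return [xi + ci for xi, ci in zip(x, context)]


def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def attentive_fuse(high, low):
    """Assumed stand-in for an LAF-style block: a sigmoid gate computed from
    the upsampled high-level features decides how much low-level detail is
    added at each position."""
    up = upsample2(high)
    if len(up) != len(low):
        raise ValueError("low-level features must have twice the resolution")
    gate = [1.0 / (1.0 + math.exp(-u)) for u in up]
    return [u + g * l for u, g, l in zip(up, gate, low)]


def decode(features):
    """features: encoder outputs ordered from highest resolution (low-level)
    to lowest resolution (high-level). Returns shared full-resolution features."""
    shared = features[-1]
    for low in reversed(features[:-1]):
        shared = context_pool(shared)         # enlarge context on deep features
        shared = attentive_fuse(shared, low)  # enrich with spatial detail
    return shared


def predict(features):
    """Two toy task-specific heads on top of the shared features."""
    shared = decode(features)
    segmentation = [1 if v > 0 else 0 for v in shared]  # toy 2-class labels
    depth = [abs(v) for v in shared]                    # toy non-negative depth
    return segmentation, depth


# Hypothetical three-scale encoder output with resolutions 8, 4 and 2.
features = [[0.0] * 8, [1.0] * 4, [2.0] * 2]
segmentation, depth = predict(features)
```

Both outputs have the resolution of the shallowest encoder feature (8 here), because each fusion step doubles the resolution of the shared features before the heads are applied.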

Metadata
Title
CAENet: Efficient Multi-task Learning for Joint Semantic Segmentation and Depth Estimation
Authors
Luxi Wang
Yingming Li
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-43424-2_25
