07.11.2024 | Vision and Sensors

LDMSNet: Lightweight Dual-Branch Multi-Scale Network for Real-Time Semantic Segmentation of Autonomous Driving

Authors: Haoran Yang, Dan Zhang, Jiazai Liu, Zekun Cao, Na Wang

Published in: International Journal of Automotive Technology | Issue 2/2025

Abstract

Semantic segmentation plays a crucial role in autonomous driving systems, serving as a key technology for understanding and interpreting the road environment. Autonomous driving systems demand extremely fast reaction and real-time processing, since any processing delay may lead to safety risks; yet most existing semantic segmentation networks pursue high accuracy, and achieving true real-time performance while maintaining high accuracy remains a challenge. To address this problem, this paper proposes a lightweight dual-branch multi-scale network (LDMSNet) for real-time semantic segmentation. First, an effective dilated bottleneck (EDB) is proposed that efficiently extracts semantic and spatial information through a complementary dual-branch structure and depth-wise dilated convolution. Second, a multi-scale pyramid pooling module (MSPPM) is proposed, which uses a hierarchical residual structure combined with dilated convolution to extract detailed information from the low-resolution branch. Third, a polarized self-attention (PSA) mechanism is introduced to further strengthen the interaction and correlation between features and improve the perception of global information. Experimental results show that LDMSNet achieves 74.46% MIoU at 113 FPS on the Cityscapes dataset, 71.51% MIoU at 153 FPS on the CamVid dataset, and 77.41% MIoU at 170 FPS on the StreetView dataset, effectively balancing speed and accuracy compared with state-of-the-art models.
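The EDB block described in the abstract builds on depth-wise dilated convolution, which enlarges the receptive field of a 3×3 filter without adding parameters by spacing the kernel taps apart. As a rough illustration of that primitive only (not the authors' implementation; the function and parameter names here are hypothetical), a minimal NumPy sketch:

```python
import numpy as np

def depthwise_dilated_conv(x, kernels, dilation=2):
    """Depth-wise 3x3 dilated convolution with zero ("same") padding.

    x: (C, H, W) feature map; kernels: (C, 3, 3), one filter per channel.
    Each channel is convolved independently with its own filter; a
    dilation of d spaces the kernel taps d pixels apart, so the
    effective receptive field grows to (2d+1) x (2d+1) while the
    parameter count stays at C * 9.
    """
    c, h, w = x.shape
    pad = dilation  # keeps output size equal to input size for a 3x3 kernel
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for ki in range(3):
        for kj in range(3):
            oi, oj = ki * dilation, kj * dilation
            # Accumulate each dilated tap across all spatial positions.
            out += kernels[:, ki, kj, None, None] * xp[:, oi:oi + h, oj:oj + w]
    return out

# Sanity check: a kernel with only the center tap set to 1
# must reproduce the input exactly, for any dilation.
x = np.random.rand(4, 8, 8)
k = np.zeros((4, 3, 3))
k[:, 1, 1] = 1.0
y = depthwise_dilated_conv(x, k, dilation=2)
print(np.allclose(x, y))  # True
```

In a framework such as PyTorch the same operation is expressed with a grouped convolution (`groups` equal to the channel count plus a `dilation` argument); the explicit loop above is only meant to make the tap spacing visible.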

Metadata
Title
LDMSNet: Lightweight Dual-Branch Multi-Scale Network for Real-Time Semantic Segmentation of Autonomous Driving
Authors
Haoran Yang
Dan Zhang
Jiazai Liu
Zekun Cao
Na Wang
Publication date
07.11.2024
Publisher
The Korean Society of Automotive Engineers
Published in
International Journal of Automotive Technology / Issue 2/2025
Print ISSN: 1229-9138
Electronic ISSN: 1976-3832
DOI
https://doi.org/10.1007/s12239-024-00179-4