
2025 | OriginalPaper | Chapter

Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-training

Authors : Qiangqiang Wu, Yan Xia, Jia Wan, Antoni B. Chan

Published in: Computer Vision – ECCV 2024

Publisher: Springer Nature Switzerland


Abstract

3D single object tracking (SOT) is an essential task in autonomous driving and robotics. However, learning robust 3D SOT trackers remains challenging due to the limited category-specific point cloud data and the inherent sparsity and incompleteness of LiDAR scans. To tackle these issues, we propose a unified 3D SOT framework that leverages 3D generative pre-training and learns robust 3D matching abilities from 2D pre-trained foundation trackers. Our framework adopts a target-matching architecture consistent with widely used 2D trackers, facilitating the transfer of 2D matching knowledge. Specifically, we first propose a lightweight Target-Aware Projection (TAP) module, allowing the pre-trained 2D tracker to work well on the projected point clouds without further fine-tuning. We then propose a novel IoU-guided matching-distillation framework that utilizes the powerful 2D pre-trained trackers to guide 3D matching learning in the 3D tracker, i.e., the 3D template-to-search matching should be consistent with its corresponding 2D template-to-search matching obtained from 2D pre-trained trackers. Our designs are applied to two mainstream 3D SOT frameworks: memory-less Siamese and contextual memory-based approaches, respectively named SiamDisst and MemDisst. Extensive experiments show that SiamDisst and MemDisst achieve state-of-the-art performance on the KITTI, Waymo Open Dataset and nuScenes benchmarks, while running above real time at 25 and 90 FPS, respectively, on an RTX 3090 GPU.
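The matching-distillation idea described in the abstract can be sketched in a few lines: compute a template-to-search similarity map for both the 3D student and the 2D teacher, then penalize their disagreement, weighted by the quality (IoU) of the current box prediction. This is a minimal illustration only; the function names, the cosine-similarity matching, and the IoU-weighted MSE form of the loss are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def matching_map(template_feats, search_feats):
    """Template-to-search matching map via cosine similarity (assumed form).

    template_feats: (num_template, dim) feature array
    search_feats:   (num_search, dim) feature array
    Returns a (num_template, num_search) similarity map in [-1, 1].
    """
    t = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    s = search_feats / np.linalg.norm(search_feats, axis=1, keepdims=True)
    return t @ s.T

def matching_distill_loss(map_3d, map_2d, iou):
    """IoU-guided matching distillation (sketch).

    Penalizes disagreement between the 3D student's matching map and the
    frozen 2D teacher's map; the IoU of the current 3D box prediction
    gates the penalty so unreliable frames contribute less (hypothetical
    weighting chosen for illustration).
    """
    return iou * float(np.mean((map_3d - map_2d) ** 2))

# Toy usage with random features standing in for learned embeddings.
rng = np.random.default_rng(0)
tmpl = rng.normal(size=(8, 32))     # 8 template points, 32-dim features
srch = rng.normal(size=(16, 32))    # 16 search points
student_map = matching_map(tmpl, srch)
teacher_map = matching_map(tmpl, srch)  # identical here, so loss is zero
loss = matching_distill_loss(student_map, teacher_map, iou=0.8)
```

When teacher and student maps agree, the loss vanishes regardless of the IoU weight; a mismatched student map yields a positive, IoU-scaled penalty.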

Footnotes
1
We use the same pre-training hyper-parameters, steps and dataset (ShapeNet [6]) as Point-MAE [42].
 
Literature
1.
Anthes, C., García-Hernández, R.J., Wiedemann, M., Kranzlmüller, D.: State of the art of virtual reality technology. In: 2016 IEEE Aerospace Conference, pp. 1–19. IEEE (2016)
2.
Ben-Baruch, E., Karklinsky, M., Biton, Y., Ben-Cohen, A., Lawen, H., Zamir, N.: It’s all in the head: representation knowledge distillation through classifier sharing (2022). arXiv:2201.06945
4.
Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: ICCV, pp. 6182–6191 (2019)
5.
Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621–11631 (2020)
7.
Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., Lu, H.: Transformer tracking. In: CVPR, pp. 8126–8135 (2021)
9.
Cui, Y., Jiang, C., Wang, L., Wu, G.: MixFormer: end-to-end tracking with iterative mixed attention. In: CVPR (2022)
10.
Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ECO: efficient convolution operators for tracking. In: CVPR, pp. 21–26 (2017)
11.
Dosovitskiy, A., et al.: An image is worth \(16\times 16\) words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
12.
Duong, C.N., Luu, K., Quach, K.G., Le, N.: ShrinkTeaNet: million-scale lightweight face recognition via shrinking teacher-student networks (2019). arXiv:1905.10620
13.
Fan, H., Lin, L., Yang, F.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: CVPR, pp. 5374–5383 (2019)
14.
Fan, H., Ling, H.: Siamese cascaded region proposal networks for real-time visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7952–7961 (2019)
15.
Fang, Z., Zhou, S., Cui, Y., Scherer, S.: 3D-SiamRPN: an end-to-end learning method for real-time 3D single object tracking using raw point cloud. IEEE Sens. J. 21(4), 4995–5011 (2020)
16.
Feng, S., Liang, P., Gao, J., Cheng, E.: Multi-correlation siamese transformer network with dense connection for 3D single object tracking. IEEE Robot. Autom. Lett. 8(12), 8066–8073 (2023)
17.
Galoogahi, H., Fagg, A., Lucey, S.: Learning background-aware correlation filters for visual tracking. In: ICCV (2017)
18.
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32(11), 1231–1237 (2013)
19.
Giancola, S., Zarzar, J., Ghanem, B.: Leveraging shape completion for 3D siamese tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1359–1368 (2019)
20.
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
22.
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16000–16009 (2022)
23.
Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583–596 (2015)
25.
Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 43(5), 1562–1577 (2019)
26.
Hui, L., Wang, L., Cheng, M., Xie, J., Yang, J.: 3D siamese voxel-to-BEV tracker for sparse point clouds. In: Advances in Neural Information Processing Systems 34, pp. 28714–28727 (2021)
28.
Jin, X., Peng, B., Wu, Y., Liu, Y., Liu, J.: Knowledge distillation via route constrained optimization. In: IEEE/CVF International Conference on Computer Vision (2019)
30.
Kiran, B.R., et al.: Deep reinforcement learning for autonomous driving: a survey. IEEE Trans. Intell. Transp. Syst. 23(6), 4909–4926 (2021)
31.
Kristan, M., et al.: A novel performance evaluation methodology for single-target trackers. IEEE Trans. Pattern Anal. Mach. Intell. 38(11), 2137–2155 (2016)
32.
Kristan, M., Matas, J., Danelljan, M.: The first visual object tracking segmentation VOTS2023 challenge results. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)
33.
Lan, K., Jiang, H., Xie, J.: Temporal-aware siamese tracker: integrate temporal context for 3D object tracking. In: Proceedings of the Asian Conference on Computer Vision, pp. 399–414 (2022)
35.
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4282–4291 (2019)
36.
Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with siamese region proposal network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8971–8980 (2018)
37.
Li, J., et al.: Rethinking feature-based knowledge distillation for face recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
38.
Lin, L., Fan, H., Zhang, Z., Xu, Y., Ling, H.: SwinTrack: a simple and strong baseline for transformer tracking, pp. 16743–16754 (2022)
39.
Liu, Y., Liang, Y., Wu, Q., Zhang, L., Wang, H.: A new framework for multiple deep correlation filters based object tracking. In: ICASSP (2022)
40.
Muller, M., Bibi, A., Giancola, S.: TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: ECCV, pp. 300–317 (2018)
41.
Osep, A., Mehner, W., Mathias, M., Leibe, B.: Combined image-and world-space tracking in traffic scenes. In: IEEE International Conference on Robotics and Automation, pp. 1988–1995. IEEE (2017)
43.
Pang, Z., Li, Z., Wang, N.: Model-free vehicle tracking and state estimation in point cloud sequences. In: IROS (2021)
44.
Peng, B., et al.: Self-supervised knowledge distillation using singular value decomposition. In: IEEE/CVF International Conference on Computer Vision (2019)
45.
Peng, B., et al.: ShrinkTeaNet: million-scale lightweight face recognition via shrinking teacher-student networks. In: IEEE/CVF International Conference on Computer Vision (2019)
46.
Qi, C., Su, H., Mo, K., Guibas, L.: PointNet: deep learning on point sets for 3D classification and segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
47.
Qi, H., Feng, C., Cao, Z., Zhao, F., Xiao, Y.: P2B: point-to-box network for 3D object tracking in point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6329–6338 (2020)
48.
Qi, Z., et al.: Contrast with reconstruct: contrastive 3D representation learning guided by generative pretraining (2023)
49.
Ran, T., Gavves, E., Smeulders, A.W.: Siamese instance search for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1420–1429 (2016)
51.
Shan, J., Zhou, S., Fang, Z., Cui, Y.: PTT: point-track-transformer module for 3D single object tracking in point clouds. arXiv preprint arXiv:2108.06455 (2021)
52.
Sun, P., et al.: Scalability in perception for autonomous driving: Waymo Open Dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2446–2454 (2020)
54.
Tao, F., Wang, M., Yuan, H.: Overcoming catastrophic forgetting in incremental object detection via elastic response distillation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)
55.
Wang, N., Song, Y., Ma, C.: Unsupervised deep tracking. In: CVPR, pp. 1308–1317 (2019)
56.
Wang, Z., Xie, Q., Lai, Y.K., Wu, J., Long, K., Wang, J.: MLVSNet: multi-level voting siamese network for 3D visual tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3101–3110 (2021)
57.
Wang, Z., Yu, X., Rao, Y., Zhou, J., Lu, J.: P2P: tuning pre-trained image models for point cloud analysis with point-to-pixel prompting. In: Advances in Neural Information Processing Systems 35, pp. 14388–14402 (2022)
58.
Wu, Q., Chan, A.: Meta-graph adaptation for visual object tracking. In: ICME (2021)
60.
Wu, Q., Wan, J., Chan, A.B.: Progressive unsupervised learning for visual object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2993–3002 (2021)
61.
Wu, Q., Yang, T., Liu, Z., Wu, B., Shan, Y., Chan, A.B.: DropMAE: masked autoencoders with spatial-attention dropout for tracking tasks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14561–14571 (2023)
62.
Wu, Q., Yang, T., Wu, W., Chan, A.B.: Scalable video object segmentation with simplified framework. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2023)
63.
Wu, Q., Sun, C., Wang, J.: Multi-level structure-enhanced network for 3D single object tracking in sparse point clouds. IEEE Robot. Autom. Lett. 8(1), 9–16 (2022)
64.
Wu, Y., Lim, J., Yang, M.H.: Online object tracking: a benchmark. In: CVPR, pp. 2411–2418 (2013)
65.
Xia, Y., et al.: CASSPR: cross attention single scan place recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8461–8472 (2023)
66.
Xia, Y., Shi, L., Ding, Z., Henriques, J.F., Cremers, D.: Text2Loc: 3D point cloud localization from natural language. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14958–14967 (2024)
67.
Xia, Y., Wu, Q., Li, W., Chan, A.B., Stilla, U.: A lightweight and detector-free 3D single object tracker on point clouds. IEEE Trans. Intell. Transp. Syst. 24(5), 5543–5554 (2023)
68.
Xu, T.X., Guo, Y.C., Lai, Y.K., Zhang, S.H.: CXTrack: improving 3D point cloud tracking with contextual information. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1084–1093 (2023)
69.
Xu, T.X., Guo, Y.C., Lai, Y.K., Zhang, S.H.: MBPTrack: improving 3D point cloud tracking with memory networks and box priors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9911–9920 (2023)
70.
Yan, B., Peng, H., Fu, J., Wang, D., Lu, H.: Learning spatio-temporal transformer for visual tracking. In: ICCV, pp. 10448–10457 (2021)
71.
Yang, T., Chan, A.B.: Learning dynamic memory networks for object tracking. In: Proceedings of the European Conference on Computer Vision, pp. 152–167 (2018)
73.
Yim, J., Joo, D., Bae, J., Kim, J.: A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
74.
Zhang, L., Gonzalez-Garcia, A., Van De Weijer, J., Danelljan, M., Khan, F.S.: Learning the model update for siamese trackers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4010–4019 (2019)
75.
Zhang, Z., Peng, H.: Deeper and wider siamese networks for real-time visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4591–4600 (2019)
76.
Zheng, C., et al.: Box-aware feature enhancement for single object tracking on point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13199–13208 (2021)
77.
Zheng, C., et al.: Beyond 3D siamese tracking: a motion-centric paradigm for 3D single object tracking in point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8111–8120 (2022)
78.
Zhou, C., et al.: PTTR: relational 3D point cloud object tracking with transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8531–8540 (2022)
79.
Zhou, H., Song, L., Chen, J., Zhou, Y., Wang, G.: Rethinking soft labels for knowledge distillation: a bias-variance tradeoff perspective (2021). arXiv:2102.00650
Metadata
Title
Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-training
Authors
Qiangqiang Wu
Yan Xia
Jia Wan
Antoni B. Chan
Copyright Year
2025
DOI
https://doi.org/10.1007/978-3-031-73254-6_16