Published in: International Journal of Computer Vision, Issue 4/2024

25.10.2023

DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes

Authors: Shengyu Hao, Peiyuan Liu, Yibing Zhan, Kaixun Jin, Zuozhu Liu, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

Abstract

Cross-view multi-object tracking aims to link objects across frames and camera views with substantial overlap. Although cross-view multi-object tracking has received increased attention in recent years, existing datasets still have several issues: (1) they miss real-world scenarios, (2) lack diverse scenes, (3) contain a limited number of tracks, (4) comprise only static cameras, and (5) lack standard benchmarks, which hinders the investigation and comparison of cross-view tracking methods. To address these issues, we introduce DIVOTrack: a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians in realistic, non-experimental environments. Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all currently available cross-view multi-object tracking datasets. Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT, which learns object detection, single-view association, and cross-view matching with an all-in-one embedding model. Finally, we present a summary of current methodologies and a set of standard benchmarks on DIVOTrack to provide a fair comparison and conduct a comprehensive analysis of current approaches and our proposed CrossMOT. The dataset and code are available at https://github.com/shengyuhao/DIVOTrack.
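The cross-view matching step the abstract describes can be pictured as follows: each detection carries an appearance embedding, and detections seen from different cameras are paired when their embeddings are sufficiently similar. The sketch below is a minimal illustration of that idea using cosine similarity and greedy one-to-one matching; the function names, the greedy scheme, and the threshold are illustrative assumptions, not CrossMOT's actual API (the paper's method learns the embeddings jointly with detection and typically uses optimal assignment rather than greedy matching).

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def greedy_cross_view_match(emb_view1, emb_view2, threshold=0.5):
    """Pair detections across two camera views by embedding similarity.

    emb_view1, emb_view2: dicts mapping detection id -> embedding vector.
    Returns a list of (id_view1, id_view2) pairs, matched greedily from
    the most similar pair down, each id used at most once, and only
    while similarity stays above `threshold`.
    """
    # Score every cross-view pair, then sort best-first.
    scored = [(cosine(e1, e2), i, j)
              for i, e1 in emb_view1.items()
              for j, e2 in emb_view2.items()]
    scored.sort(key=lambda t: t[0], reverse=True)

    used1, used2, matches = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs are even less similar
        if i in used1 or j in used2:
            continue  # enforce one-to-one matching
        used1.add(i)
        used2.add(j)
        matches.append((i, j))
    return matches
```

In practice, a production tracker would replace the greedy loop with optimal assignment (e.g. the Hungarian algorithm) and gate matches with motion or geometry cues, but the embedding-similarity core is the same.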


Metadata
Title
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes
Authors
Shengyu Hao
Peiyuan Liu
Yibing Zhan
Kaixun Jin
Zuozhu Liu
Mingli Song
Jenq-Neng Hwang
Gaoang Wang
Publication date
25.10.2023
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 4/2024
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-023-01922-7
