Published in: International Journal of Computer Vision 5/2020

03-12-2019

The Unmanned Aerial Vehicle Benchmark: Object Detection, Tracking and Baseline

Authors: Hongyang Yu, Guorong Li, Weigang Zhang, Qingming Huang, Dawei Du, Qi Tian, Nicu Sebe

Abstract

With the increasing popularity of Unmanned Aerial Vehicles (UAVs) in computer vision applications, intelligent UAV video analysis has recently attracted growing research attention. To facilitate research in the UAV field, this paper presents a UAV dataset with 100 videos featuring approximately 2700 vehicles recorded under unconstrained conditions and 840k manually annotated bounding boxes. These UAV videos were recorded in complex real-world scenarios and pose significant new challenges to existing object detection and tracking methods, such as complex scenes, high object density, small objects, and large camera motion. These challenges motivated us to define a benchmark for three fundamental computer vision tasks on our UAV dataset: object detection, single object tracking (SOT), and multiple object tracking (MOT). Specifically, our UAV benchmark enables evaluation and detailed analysis of state-of-the-art detection and tracking methods on the proposed dataset. Furthermore, we propose a novel approach based on a Context-aware Multi-task Siamese Network (CMSN) model that exploits new cues in UAV videos by judging the degree of consistency between objects and their contexts, and that can be used for both SOT and MOT. Experimental results demonstrate that our model makes tracking more robust in both SOT and MOT, and show that current tracking and detection methods have limitations in dealing with the proposed UAV benchmark, so further research is indeed needed.
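The abstract's core idea of judging the degree of consistency between an object and its surrounding context can be illustrated with a toy matching score. The sketch below is not the paper's CMSN: the `embed` stand-in, the `alpha` weight, and all names are illustrative assumptions, and a real tracker would replace `embed` with a shared, learned Siamese CNN branch.

```python
import numpy as np


def embed(patch: np.ndarray) -> np.ndarray:
    """Stand-in for a learned Siamese branch: flatten and L2-normalize.

    Normalizing makes the dot product below a cosine similarity.
    """
    v = patch.astype(np.float64).ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def match_score(target, candidate, target_ctx, candidate_ctx, alpha=0.5):
    """Blend appearance similarity with a context-consistency term.

    alpha weighs object appearance against agreement of the two
    surrounding-context patches (a hypothetical weighting, not CMSN's).
    """
    appearance = float(embed(target) @ embed(candidate))
    context = float(embed(target_ctx) @ embed(candidate_ctx))
    return alpha * appearance + (1.0 - alpha) * context


rng = np.random.default_rng(0)
obj = rng.random((8, 8))       # target object patch
ctx = rng.random((16, 16))     # patch of the target's surroundings

# An identical candidate in an identical context scores the maximum 1.0;
# an unrelated random patch in an unrelated context scores lower.
same = match_score(obj, obj, ctx, ctx)
diff = match_score(obj, rng.random((8, 8)), ctx, rng.random((16, 16)))
print(same, diff)
```

The context term is what distinguishes this from a plain appearance-only Siamese score: a candidate that looks like the target but sits in an inconsistent context is penalized.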


Footnotes
1
We use a DJI Inspire 2 to collect the videos. More information about the UAV platform can be found at http://www.dji.com/inspire-2.
 
2
Our dataset is available for download at https://sites.google.com/site/daviddo0323/.
 
Metadata
Title
The Unmanned Aerial Vehicle Benchmark: Object Detection, Tracking and Baseline
Authors
Hongyang Yu
Guorong Li
Weigang Zhang
Qingming Huang
Dawei Du
Qi Tian
Nicu Sebe
Publication date
03-12-2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 5/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01266-1
