DOI: 10.1145/3394171.3413896
research-article

University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization

Published: 12 October 2020

ABSTRACT

We consider the problem of cross-view geo-localization. The primary challenge is to learn features that are robust to large viewpoint changes. Existing benchmarks can help, but they are limited in the number of viewpoints: they usually provide image pairs containing only two viewpoints, e.g., satellite and ground, which may compromise feature learning. In this paper, we argue that, besides phone cameras and satellites, drones could serve as a third platform for geo-localization. In contrast to traditional ground-view images, drone-view images encounter fewer obstacles, e.g., trees, and provide a comprehensive view when the drone flies around the target place. To verify the effectiveness of the drone platform, we introduce a new multi-view multi-source benchmark for drone-based geo-localization, named University-1652. University-1652 contains data from three platforms, i.e., synthetic drones, satellites, and ground cameras, covering 1,652 university buildings around the world. To our knowledge, University-1652 is the first drone-based geo-localization dataset, and it enables two new tasks: drone-view target localization and drone navigation. As the name implies, drone-view target localization aims to predict the location of the target place from drone-view images. Conversely, given a satellite-view query image, drone navigation aims to guide the drone to the area of interest shown in the query. We use this dataset to analyze a variety of off-the-shelf CNN features and propose a strong CNN baseline on this challenging dataset. The experiments show that University-1652 helps the model learn viewpoint-invariant features that also generalize well to real-world scenarios.
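The two tasks described in the abstract can both be framed as cross-view image retrieval in a shared embedding space: a query image from one platform is ranked against a gallery from another platform, and a result counts as correct when it shows the same building. The sketch below assumes that framing, cosine similarity over L2-normalized CNN features, and Recall@K plus mean average precision as evaluation metrics. None of these choices is specified in the abstract, all function names are hypothetical, and the toy embeddings merely stand in for features a trained model would produce.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def retrieve(query_feats: np.ndarray, gallery_feats: np.ndarray) -> np.ndarray:
    """For each query row, return gallery indices sorted by descending cosine similarity."""
    sim = l2_normalize(query_feats) @ l2_normalize(gallery_feats).T
    return np.argsort(-sim, axis=1, kind="stable")


def recall_at_k(ranking, query_labels, gallery_labels, k: int) -> float:
    """Fraction of queries with at least one same-building image in the top k results."""
    top_k = gallery_labels[ranking[:, :k]]
    return float((top_k == query_labels[:, None]).any(axis=1).mean())


def mean_average_precision(ranking, query_labels, gallery_labels) -> float:
    """Mean over queries of non-interpolated average precision."""
    aps = []
    for ranked_labels, q_label in zip(gallery_labels[ranking], query_labels):
        hit_ranks = np.flatnonzero(ranked_labels == q_label) + 1  # 1-based ranks of true matches
        if hit_ranks.size == 0:
            aps.append(0.0)
            continue
        precisions = np.arange(1, hit_ranks.size + 1) / hit_ranks  # precision at each hit
        aps.append(float(precisions.mean()))
    return float(np.mean(aps))


# Toy, hypothetical embeddings: 4 buildings, each mapped to its own direction in R^4.
building_dirs = np.eye(4)
sat_feats, sat_labels = 2.0 * building_dirs, np.arange(4)  # one satellite view per building
drone_labels = np.repeat(np.arange(4), 3)                  # three drone views per building
drone_feats = building_dirs[drone_labels] * np.tile([[1.0], [0.5], [3.0]], (4, 1))

# Drone-view target localization: drone query -> satellite gallery.
loc_rank = retrieve(drone_feats, sat_feats)
print(recall_at_k(loc_rank, drone_labels, sat_labels, k=1))  # → 1.0

# Drone navigation: satellite query -> drone gallery.
nav_rank = retrieve(sat_feats, drone_feats)
print(recall_at_k(nav_rank, sat_labels, drone_labels, k=1))  # → 1.0
print(mean_average_precision(nav_rank, sat_labels, drone_labels))  # → 1.0
```

In this sketch, swapping the query and gallery arguments is the only difference between the two tasks. Mean average precision matters mainly for drone navigation, where one satellite query can have several matching drone-view images; the per-row normalization makes the ranking insensitive to feature magnitude, as the differently scaled toy drone features illustrate.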


Supplemental Material

3394171.3413896.mp4 (MP4, 8.2 MB)


Published in
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020, 4889 pages
ISBN: 9781450379885
Proceedings DOI: 10.1145/3394171

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions (24%)
