research-article

University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization

Authors:
Zhedong Zheng, University of Technology Sydney & Southern University of Science and Technology, Ultimo, Australia
Yunchao Wei, University of Technology Sydney & Southern University of Science and Technology, Ultimo, Australia
Yi Yang, University of Technology Sydney & Southern University of Science and Technology, Ultimo, Australia

MM '20: Proceedings of the 28th ACM International Conference on Multimedia, October 2020, Pages 1395–1403
https://doi.org/10.1145/3394171.3413896

Published: 12 October 2020

ABSTRACT

We consider the problem of cross-view geo-localization. The primary challenge is to learn features that are robust to large viewpoint changes. Existing benchmarks help, but they cover only a limited number of viewpoints: they usually provide image pairs from two viewpoints, e.g., satellite and ground, which may compromise feature learning. In this paper, we argue that, besides phone cameras and satellites, drones can serve as a third platform for geo-localization. In contrast to traditional ground-view images, drone-view images are affected by fewer obstacles, e.g., trees, and provide a comprehensive view as the drone flies around the target place. To verify the effectiveness of the drone platform, we introduce a new multi-view multi-source benchmark for drone-based geo-localization, named University-1652. University-1652 contains data from three platforms, i.e., synthetic drones, satellites and ground cameras, covering 1,652 university buildings around the world. To our knowledge, University-1652 is the first drone-based geo-localization dataset, and it enables two new tasks, i.e., drone-view target localization and drone navigation. As the name implies, drone-view target localization aims to predict the location of the target place from drone-view images. Conversely, given a satellite-view query image, drone navigation aims to drive the drone to the area of interest shown in the query. We use this dataset to analyze a variety of off-the-shelf CNN features and propose a strong CNN baseline on this challenging dataset. The experiments show that University-1652 helps the model learn viewpoint-invariant features and that the learned model generalizes well to real-world scenarios.
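
The two tasks named in the abstract can both be read as cross-view image retrieval with the query and gallery roles swapped. The following is a minimal sketch under that reading, not the authors' implementation: the feature vectors are made-up toy values standing in for CNN embeddings, the helper names (`rank_gallery`, `recall_at_1`) are hypothetical, and cosine similarity with a Recall@1 score is assumed here purely for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda i: scores[i], reverse=True)


def recall_at_1(query_feats, query_labels, gallery_feats, gallery_labels):
    """Fraction of queries whose top-ranked gallery item shares their building label."""
    hits = 0
    for feat, label in zip(query_feats, query_labels):
        top = rank_gallery(feat, gallery_feats)[0]
        hits += int(gallery_labels[top] == label)
    return hits / len(query_feats)


# Toy embeddings (assumed values): one drone-view and one satellite-view
# feature per building, with labels 0, 1, 2 identifying the buildings.
drone_feats = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
drone_labels = [0, 1, 2]
satellite_feats = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]
satellite_labels = [0, 1, 2]

# Drone-view target localization: drone-view query, satellite-view gallery.
drone_to_satellite = recall_at_1(
    drone_feats, drone_labels, satellite_feats, satellite_labels
)

# Drone navigation: satellite-view query, drone-view gallery.
satellite_to_drone = recall_at_1(
    satellite_feats, satellite_labels, drone_feats, drone_labels
)

print(drone_to_satellite, satellite_to_drone)
```

In this framing the only difference between the two tasks is which platform supplies the query and which supplies the gallery, so a single viewpoint-invariant embedding could in principle serve both directions.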

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
        Visual content-based indexing and retrieval

Published in
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020, 4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171
General Chairs: Chang Wen Chen (Chinese University of Hong Kong, Shenzhen, China), Rita Cucchiara (UNIMORE, Italy), Xian-Sheng Hua (Alibaba Group, China)
Program Chairs: Guo-Jun Qi (Futurewei Technologies, USA), Elisa Ricci (UNITN & Fondazione Bruno Kessler, Italy), Zhengyou Zhang (Tencent, China), Roger Zimmermann (National University of Singapore, Singapore)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
benchmark
drone
geo-localization
image retrieval
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%