ABSTRACT
We consider the problem of cross-view geo-localization. The primary challenge is to learn features that are robust to large viewpoint changes. Existing benchmarks can help, but they are limited in the number of viewpoints. They usually provide image pairs covering only two viewpoints, e.g., satellite and ground, which may compromise feature learning. In this paper, we argue that, besides phone cameras and satellites, drones could serve as a third platform for the geo-localization problem. In contrast to traditional ground-view images, drone-view images meet fewer obstacles, e.g., trees, and provide a comprehensive view when the drone flies around the target place. To verify the effectiveness of the drone platform, we introduce a new multi-view multi-source benchmark for drone-based geo-localization, named University-1652. University-1652 contains data from three platforms, i.e., synthetic drones, satellites and ground cameras, covering 1,652 university buildings around the world. To our knowledge, University-1652 is the first drone-based geo-localization dataset, and it enables two new tasks, i.e., drone-view target localization and drone navigation. As the name implies, drone-view target localization aims to predict the location of the target place from drone-view images. Conversely, given a satellite-view query image, drone navigation aims to drive the drone to the area of interest shown in the query. We use this dataset to analyze a variety of off-the-shelf CNN features and propose a strong CNN baseline on this challenging dataset. The experiments show that University-1652 helps the model learn viewpoint-invariant features and that the resulting model generalizes well to real-world scenarios.
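The two tasks named in the abstract can be read as the same cross-view retrieval problem run in opposite directions. The sketch below is only an illustration under stated assumptions, not the paper's baseline: it assumes each image has already been mapped to a feature vector, ranks the gallery by cosine similarity, and scores the result with Recall@1. The function names, the toy random features and the choice of Recall@1 are all hypothetical.

```python
import numpy as np


def l2_normalize(feats):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)


def rank_gallery(query_feats, gallery_feats):
    """Return, for each query, gallery indices sorted from most to least similar."""
    sim = l2_normalize(query_feats) @ l2_normalize(gallery_feats).T
    return np.argsort(-sim, axis=1)


def recall_at_k(ranking, query_ids, gallery_ids, k=1):
    """Fraction of queries with at least one correct building in the top k."""
    top_k_ids = gallery_ids[ranking[:, :k]]
    hits = (top_k_ids == query_ids[:, None]).any(axis=1)
    return float(hits.mean())


# Toy stand-in features (assumption): one satellite image per building and
# three drone images per building, generated as noisy copies of the
# satellite feature. Real features would come from a trained CNN.
rng = np.random.default_rng(0)
num_buildings, dim = 5, 16
satellite_ids = np.arange(num_buildings)
satellite_feats = rng.normal(size=(num_buildings, dim))
drone_ids = np.repeat(satellite_ids, 3)
drone_feats = satellite_feats[drone_ids] + 0.05 * rng.normal(size=(drone_ids.size, dim))

# Drone-view target localization: drone query, satellite gallery.
loc_ranking = rank_gallery(drone_feats, satellite_feats)
localization_r1 = recall_at_k(loc_ranking, drone_ids, satellite_ids, k=1)

# Drone navigation: satellite query, drone gallery.
nav_ranking = rank_gallery(satellite_feats, drone_feats)
navigation_r1 = recall_at_k(nav_ranking, satellite_ids, drone_ids, k=1)

print(f"localization Recall@1: {localization_r1:.2f}")
print(f"navigation Recall@1: {navigation_r1:.2f}")
```

Only the roles of query and gallery change between the two tasks, so a single shared embedding space would serve both directions.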
Index Terms
- University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization