Urban Perception: Sensing Cities via a Deep Interactive Multi-task Learning Framework

Authors:
Weili Guan, Monash University, Melbourne, Australia
Zhaozheng Chen, Singapore Management University, Singapore
Fuli Feng, National University of Singapore, Singapore
Weifeng Liu, China University of Petroleum (East China), China
Liqiang Nie, Shandong University, China

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 1s, Article No.: 13, pp 1–20, https://doi.org/10.1145/3424115

Published: 31 March 2021

Abstract

Social scientists have shown that visual perceptions of urban attributes, such as how safe, wealthy, or beautiful a city appears, are highly correlated with residents' behaviors and quality of life. Despite their significance, measuring these perceptions is challenging for three reasons: (1) visual perceptions are subjective and comparative rather than absolute; (2) perception comparisons between image pairs are usually conducted region by region and are closely tied to the specific urban attribute; and (3) the urban attributes carry both shared and attribute-specific information. To address these problems, we present a Deep inteRActive Multi-task leArning scheme, DRAMA for short. DRAMA comparatively quantifies the perceptions of urban attributes by jointly integrating pairwise comparisons, regional interactions, and urban attribute correlations within a unified deep scheme. In DRAMA, each urban attribute is treated as a task, so that both task-sharing and task-specific information are explored. Extensive experiments on a large-scale public benchmark dataset demonstrate that DRAMA outperforms several state-of-the-art baselines. We also apply the pairwise comparisons produced by the DRAMA model to quantify the urban attributes and hence rank cities with respect to a given attribute. We have released the code and parameter settings to facilitate further research.
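
The idea of treating each urban attribute as a task trained on pairwise comparisons can be illustrated with a small sketch. This is not the DRAMA architecture: it assumes a plain linear scorer, a RankNet-style logistic loss, and hypothetical names (`score`, `shared_w`, `task_ws`, `multitask_pairwise_loss`) that do not come from the article. It only shows how a shared component and an attribute-specific component might combine into one comparative objective.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# (attribute, left image features, right image features, label);
# label is +1 if the left image was preferred, -1 if the right one was.
Comparison = Tuple[str, Vector, Vector, int]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def score(features: Vector, shared_w: Vector, task_w: Vector) -> float:
    """Perception score = shared component + attribute-specific component."""
    return dot(shared_w, features) + dot(task_w, features)


def pairwise_logistic_loss(s_left: float, s_right: float, label: int) -> float:
    """Numerically stable log(1 + exp(-label * (s_left - s_right)))."""
    margin = label * (s_left - s_right)
    return max(0.0, -margin) + math.log1p(math.exp(-abs(margin)))


def multitask_pairwise_loss(
    comparisons: List[Comparison],
    shared_w: Vector,
    task_ws: Dict[str, Vector],
) -> float:
    """Average pairwise loss over comparisons drawn from all attributes."""
    total = 0.0
    for attribute, left, right, label in comparisons:
        task_w = task_ws[attribute]  # weights specific to this attribute (task)
        total += pairwise_logistic_loss(
            score(left, shared_w, task_w), score(right, shared_w, task_w), label
        )
    return total / len(comparisons)


# Toy data (made up for illustration): two attributes share one weight vector.
shared_w = [1.0, 0.0]
task_ws = {"safe": [0.0, 1.0], "wealthy": [0.0, -1.0]}
comparisons = [
    ("safe", [1.0, 2.0], [0.0, 0.0], 1),      # left image judged safer
    ("wealthy", [1.0, 2.0], [0.0, 0.0], -1),  # right image judged wealthier
]
loss = multitask_pairwise_loss(comparisons, shared_w, task_ws)
```

In this toy setup the same image pair receives different scores under different attributes because only the task-specific weights differ, while `shared_w` contributes equally to both; a deep model would replace the linear scorer with learned shared and task-specific layers.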

Index Terms

Urban Perception: Sensing Cities via a Deep Interactive Multi-task Learning Framework
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks

Published in
ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 1s
January 2021
353 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3453990
Editor:
Alberto Del Bimbo
University of Firenze, Italy
Copyright © 2021 Association for Computing Machinery.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 March 2021
- Revised: 1 September 2020
- Accepted: 1 September 2020
- Received: 1 January 2020
Author Tags
Urban perception
urban attributes
regional interactions
deep multi-task learning
Qualifiers
- research-article
- Refereed