research-article

Urban Perception: Sensing Cities via a Deep Interactive Multi-task Learning Framework

Published: 31 March 2021

Abstract

Social scientists have shown that visual perceptions of urban attributes, such as how safe, wealthy, or beautiful a city appears, are highly correlated with residents’ behaviors and quality of life. Despite their significance, measuring visual perceptions of urban attributes is challenging for three reasons: (1) visual perceptions are subjective and comparative rather than absolute; (2) perception comparisons between image pairs are usually conducted region by region and depend strongly on the specific urban attribute; and (3) the urban attributes carry both shared and attribute-specific information. To address these problems, in this article we present a Deep inteRActive Multi-task leArning scheme, DRAMA for short. DRAMA comparatively quantifies the perceptions of urban attributes by jointly integrating pairwise comparisons, regional interactions, and urban attribute correlations within a unified deep scheme. In DRAMA, each urban attribute is treated as a task, so that both task-shared and task-specific information is fully explored. Extensive experiments over a public large-scale benchmark dataset demonstrate that the proposed DRAMA scheme outperforms several state-of-the-art baselines. We also apply the pairwise comparisons produced by the DRAMA model to further quantify the urban attributes and hence rank cities with respect to each attribute. As a byproduct, we have released the code and parameter settings to facilitate further research.
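
The abstract does not say how pairwise comparisons are turned into city rankings. The following is only a minimal sketch under stated assumptions: each image is scored by its win rate over the comparisons it appears in, and each city by the mean score of its images. This win-rate aggregation is a stand-in chosen for illustration and is not claimed to be the method used in DRAMA; the function name `rank_cities`, the image identifiers, and the toy data are all hypothetical.

```python
from collections import defaultdict


def rank_cities(comparisons, image_city):
    """Rank cities for one urban attribute from pairwise outcomes.

    comparisons: iterable of (winner_image, loser_image) pairs (hypothetical format).
    image_city:  dict mapping each image id to its city name.
    Returns a list of (city, score) sorted from highest to lowest score.
    """
    wins = defaultdict(int)    # comparisons won by each image
    games = defaultdict(int)   # comparisons each image took part in
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1

    # Assumption: an image's score is its win rate.
    per_city = defaultdict(list)
    for image, played in games.items():
        per_city[image_city[image]].append(wins[image] / played)

    # Assumption: a city's score is the mean win rate of its images.
    scores = {city: sum(rates) / len(rates) for city, rates in per_city.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two cities with two images each.
image_city = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
comparisons = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("b2", "a2")]
ranking = rank_cities(comparisons, image_city)
```

In practice the pairs would come from a model's predicted preferences for a single attribute, and a rating system designed for pairwise data could replace the simple win rate.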


    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 1s
      January 2021
      353 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3453990
      Copyright © 2021 Association for Computing Machinery.

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 31 March 2021
      • Revised: 1 September 2020
      • Accepted: 1 September 2020
      • Received: 1 January 2020

      Qualifiers

      • research-article
      • Refereed
