Published in: AI & SOCIETY 3/2023

10-01-2023 | Main Paper

Urban-semantic computer vision: a framework for contextual understanding of people in urban spaces

Authors: Anthony Vanky, Ri Le

Abstract

Increasing computational power and improving deep learning methods have made computer vision technologies pervasive in urban environments, where they are increasingly applied to policing, traffic management, and the documentation of public spaces (Ridgeway 2018; Coifman et al. 1998; Sun et al. 2020). Alongside the often-discussed biases in the algorithms' training and their unequally borne benefits (Khosla et al. 2012), almost all applications reduce urban experiences to simplistic and mechanistic measures. These practices lack the context, depth, and specificity that would enable semantic knowledge or analysis in urban settings, especially regarding how people use and occupy urban space. This paper critiques existing uses of artificial intelligence and computer vision in urban practice and proposes a new framework for understanding people, action, and public space. It revisits Geertz's (1973) use of thick description in generating interpretive theories of culture and activity, and uses this lens to establish a framework for evaluating the varied uses of computer vision technologies that weigh meaning. Drawing on implemented examples of urban computer vision, from LinkNYC and Numina's urban measurements to the Detroit Police's use of DataWorks Plus's facial recognition technology, it proposes a framework for evaluating the thickness of an algorithm's conclusions against the complexity of the computational method required to produce them. Further, we discuss how the framework's positioning may differ (and conflict) among different users of the technology, from engineer to urban planner, policymaker, and citizen.
This paper also discusses how the current use and training of deep learning algorithms limit semantic learning, and proposes three potential methodologies for gaining a more contextually specific, urban-semantic description of urban space relevant to urbanists. It contributes to critical conversations about the proliferation of artificial intelligence by challenging current applications of these technologies in the urban environment and highlighting their failures in this context, while also proposing an evolution of these algorithms that may ultimately make them sensitive and useful within this spatial and cultural milieu.

Footnotes
1. “Imageable” here is taken to mean a cognitive, memory-based image, as used in the previous reference to Lynch’s work.
2. These together form the commonly used “four V’s” of big data: velocity, veracity, volume, and variety.
3. While this paper will not comprehensively review the technology and the depth of its applications, other papers have sought to categorize the various approaches. See, for instance, Ibrahim et al. (2020).
 
Literature
ACLU NY (2016) NYCLU: city’s public wi-fi raises privacy concerns.
Berlyn DE (1971) Aesthetics and psychobiology. Appleton-Century-Crofts
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-Decem, 3213–3223. https://doi.org/10.1109/CVPR.2016.350
Czarniawska B (1992) Exploring complex organizations: a cultural perspective: toward an anthropological perspective. SAGE, Singapore
Dreyfus HL (1992) What computers still can’t do: a critique of artificial reason. The MIT Press
Geertz C (1973) Thick description: toward an interpretive theory of culture. In: Turning points in qualitative research: Tying knots in a handkerchief, pp 143–168
Gehl J (1987) Life between buildings: using public space. Island Press
Goldsmith S, Crawford S (2014) The city as digital platform. In: The responsive city. Jossey-Bass
Greenfield A (2013) Against the smart city. Do Projects
Hand DJ (2020) Dark data: why what you don’t know matters. Princeton University Press
Jacobs J (1970) The economy of cities. Random House
Jiang S, Fiore GA, Yang Y, Ferreira J, Frazzoli E, González MC (2013) A review of urban computing for mobile phone traces: current methods, challenges and opportunities. UrbComp
Lynch K (1960) The image of the city. MIT Press
Mayor’s Office for New Urban Mechanics (2018) Beta blocks. City of Boston
McDuff D, El Kaliouby R, Demirdjian D, Picard R (2013a) Predicting online media effectiveness based on smile responses gathered over the Internet. 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013. https://doi.org/10.1109/FG.2013.6553750
McDuff D, El Kaliouby R, Senechal T, Amr M, Cohn JF, Picard R (2013b) Affectiva-MIT facial expression dataset (AM-FED): naturalistic and spontaneous facial expressions collected “in-the-wild.” IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 881–888. https://doi.org/10.1109/CVPRW.2013.130
Norden E (1969) Marshall McLuhan—a candid conversation with the high priest of popcult and metaphysician of media. Essential McLuhan 2:233–270
Pasquinelli M (2015) Anomaly detection: the mathematization of the abnormal in the metadata society. Transmed Festiv 2:1–10
Picard RW (1995) Affective computing. In: Perceptual computing section technical reports (Issue 221)
Rice S (1997) Parisian views. The MIT Press
Sun P, Hou R, Lynch JP (2020) Measuring the utilization of public open spaces by deep learning: a benchmark study at the Detroit riverfront. ArXiv 1:2228–2237
Talen E, Ellis C (2015) Beyond relativism reclaiming the search for good city form. 36–49
Venturi R, Brown DS, Izenour S (1972) Learning from Las Vegas. The MIT Press
Yatskar M, Zettlemoyer L, Farhadi A (2016) Situation recognition: visual semantic role labeling for image understanding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-Decem, 5534–5542. https://doi.org/10.1109/CVPR.2016.597
Metadata
Title: Urban-semantic computer vision: a framework for contextual understanding of people in urban spaces
Authors: Anthony Vanky, Ri Le
Publication date: 10-01-2023
Publisher: Springer London
Published in: AI & SOCIETY, Issue 3/2023
Print ISSN: 0951-5666
Electronic ISSN: 1435-5655
DOI: https://doi.org/10.1007/s00146-022-01625-6
