Assistive Apps for the Blind as Constellations of Distributed Agency
Public transport systems and buildings are still often described as lacking features that allow accessible use by blind persons. Given this setup of urban infrastructures, mobility training is thus essential for blind persons.
Hence, practices like the use of the long cane and guide dogs are widespread among people with visual disabilities [
67]. However, as smartphones and other mobile devices are now widely available (and more financially affordable), blind persons have also begun to establish relations with digital technologies regarding their mobilities [
68: p. 90]. This is possible primarily due to the introduction of mobile technologies and software such as VoiceOver (iOS) or TalkBack (Android), which allow for maneuvering the device using oral speech and/or haptic input.
How non-visual practices with digital devices are enacted on a daily basis is, however, an ongoing process from which multifaceted dis-/abilities arise.
In recent years, blindness has been manufactured and addressed socio-technically in novel ways, as a variety of apps have become available in Google’s Play Store and Apple’s App Store that specifically target people with visual disabilities as their users. That also means that—concerning wayfinding practices and mobility—forms of direct touch, indirect touch (through the long cane), or cognitive mapping of spaces (i.e., step counting) are complemented by digital practices involving smartphones and apps [
18]. As Wong puts it, “people’s movements and mobility strategies are configured and reconfigured through their everyday engagements with technology and digital information.” [
68: p. 90].
Consequently, we have to go beyond Paul Rodaway’s approach in his work on
Sensuous Geographies [
70]. According to him, sensory experiences are crucial when it comes to daily practices and the respective geographies that are related to them. Rodaway, following the ecological approach of Gibson, reflects on the sensuous geographies of people with visual disabilities, emphasizing the significance of sound and acoustic spaces [
18]:
“[…] though sound does not provide a continuous or reliable source of environmental information, auditory experience can give the blind a wider geography of spatial dimensions and relationships, an acoustic space, and at particular moments offer vivid evocations of place character […]. Free from the continuous flood of visual information, the blind show a clearer understanding of acoustic space. […]” [
70: p. 104].
Framed in this way, “vision loss” [
71: p. 99] turns into something different: it becomes a resource and is positively connoted, for acoustic geographies might be perceived more clearly when one is not distracted by visual information. The situated makeup of the acoustic space perceived through the auditory system of the respective blind person then both enables and “disables” a relational set of “acoustic affordances” [
72] that allows for the distribution of “blind agencies.” Yet, it is not only important to bear in mind that a “blind style of perception,” as Saerberg puts it [
73: p. 25], relies on a multiplicity of sensory information. We also have to take into account that these specific “ambient” sounds and blind listeners are reciprocally formed and—additionally—shaped in their relation to each other by digital technologies, as will be argued in the following.
When considering apps for blind persons, several aspects of their performance as relational technologies merit attention. Geotagging in electronic maps and GPS location services are key elements enabling applications such as Blindsquare [
74]. The annotation of images and the development of automatic recognition of objects, faces, and complex sceneries are other facets of this economic area in which artificial intelligence will play an important role. SeeingAI, Envision AI, and Aipoly are examples of apps that provide a range of different functions and multiple services.
While all of the mentioned apps rely on the translation of visual information into verbal (synthetic) speech, some applications are heading in a different direction by using sonification, that is, “the usage of sound to represent scientific data” [
77: p. 249;
78,
79]. In these cases, there is no verbal information about the object or surrounding area to be identified. Rather, abstract sounds are produced to deliver information about the needed item or surrounding area.
Thus, such an app technology acts as a “mediator” [
8: p. 39], a relational link between user and environment that translates visual “input” into “auditory” output and reciprocally mediates and transforms the users’ perception and their interconnected environment. These techno-sensory mediation processes ultimately both enact affordances and their respective agencies. In this respect, though functioning as a way of opening up potentialities by mediating a range of possible actions, a media constellation like this also generates social and material constraints, “that is what it is materially and socially [im]possible to do with it” [
81: p. 187].
In what follows, we present an analysis of Camassia, a sonification app that has been available since 2018. This example will be analyzed to demonstrate how users and environment are reciprocally and simultaneously enacted in specific situations. In doing so, we aim to move beyond an understanding of such applications as merely functional articulations between inaccessible environments and pregiven blind persons. App technologies are not mere instruments or “channels” that convey information—in the Latourian sense, they cannot be considered “intermediaries” [
8: p. 39]—but instead are interconnected nodes in ecological constellations that generate and transform information and agencies.
“An intermediary, in my vocabulary, is what transports meaning or force without transformation: defining its inputs is enough to define its outputs. […] Mediators transform, translate, distort, and modify the meaning or the elements they are supposed to carry. […] No matter how apparently simple a mediator may look, it may become complex; it may lead in multiple directions which will modify all the contradictory accounts attributed to its role.” [
8: p. 39].
By proposing to develop a concept of a techno-sensorium (according to Elizabeth Stephens) or “more-than-human sensorium” [
82], in which heterogeneous elements collaborate to enact both the environment and respective users, we question how app technologies figure as relational technologies that manufacture blindness and enact visual dis-/abilities. To explore this problematic, the following section traces various processes of translation and mutual configuration and conceptualizes the app as a mediator that coordinates the production of blind users, a certain acoustic space, and the sensory relations between them. The aim is to critically re-situate the borders between a sensing human subject and a perceived environment [
83] and supposed preceding fixations of dis-/abilities. This reconsideration allows taking into account an app-user-configuration where agency is distributed and approached from a non-anthropocentric stance [
84].
Camassia and Resonating Relationships
The app Camassia is one of several applications designed to support wayfinding and mobility practices among blind persons. It hence addresses the targeted group directly and inscribes the user as having a visual disability and, at the same time, being a competent smartphone user. In particular, as the app is currently only available in the Apple App Store, the application presumes that the user owns an iPhone. Perhaps this is not surprising, as Apple embraced accessibility features as an economic factor and people with disabilities as a relevant target group quite early compared with other technology and software companies. Consequently, Camassia users need to pass a specific “obligatory passage point” [
85]: “Obligatory passage points serve to establish the identities of the actors in relation to the network, and thus serve to assess their indispensability to the network. Unless actors pass the obligatory passage point, they are not enrolled in this network” [
86: p. 175]. Hence, in order to enroll in the actor-network of this app and generate the desired affordances and sensory configurations, users are subject to the economic, social, and technological demands that Apple prescribes.
Regarding its main function, the app allows the processing of visual information and produces auditory feedback to assist the user in finding a desired pathway. Applying a low-latency approach, the app enables real-time scanning of chosen surroundings and provides “immediate feedback” [
87: p. 12]. It was released in 2018 by iXpoint and developed by this software company together with students of the Department of Informatics at the Karlsruhe Institute of Technology (KIT) [
88]. The idea of the application is to provide an assistive system for blind and visually impaired persons that allows independent movement in urban areas while requiring neither a mobile internet connection, GPS navigation, complex sensor systems, nor electronic maps. The key point concerning Camassia is that it relies on the low color saturation that paths have in relation to surrounding areas. The app produced its best results for paths in parks and pavements surrounded by grass.
Thus, instead of identifying obstacles, the app recognizes free space as a possible path.
In this case, “locatability” [
12] is highly dependent on and relational to the respective socio-material environments and the geographical actions and agencies the users can undertake: affordances are thus generated in relation to the infrastructural
Umwelt and the bodily, sensory, and technical configurations of their users.
To process visual information, the app uses the smartphone camera, which captures images in the direction in which the device is pointed and the person is walking. The developers programmed the app’s algorithm to process 30 pictures per second, stabilized by the device’s motion sensor [
87: p. 14]; consequently, a shaking smartphone dangling from the user’s neck will probably not interfere with the output. While the company’s demonstration shows the use of the app in a vertical format [cf.
89], user reviews recommend the horizontal format in order to use a wider angle and capture a broader space in front of the walking person [
90]. This shows that smart technologies by no means inherently possess “universal characteristics” such as the “portability” and “availability” that Schrock [
12] proposes. Rather, in order to be portable and available, the smart device needs to be handled with specific bodily techniques, in relation to how the user perceives holding the device in a particular manner. Portability and availability are hence processually and relationally afforded. Another aspect concerns image cropping, as only the lower two-thirds of the captured image are used by the algorithm. That is, only a limited extract of the available information is used by the app to compute the probability of the path. Thus, the user’s perception and the related affordances are constrained by the technological conditions of this media constellation.
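The crop-and-saturation heuristic described above can be sketched in a few lines. The following Python snippet is an illustrative reconstruction, not iXpoint’s actual implementation: the function name, the HSV representation, and the per-column scoring are assumptions made for the example; only the use of the lower two-thirds of the frame and the reliance on low color saturation are taken from the text.

```python
import numpy as np

def path_likelihood(frame_hsv: np.ndarray) -> np.ndarray:
    """Estimate, per image column, how likely a walkable path is.

    frame_hsv: H x W x 3 array in HSV color space; channel 1 is saturation
    (0-255). Returns per-column scores in [0, 1], where low saturation
    (as on earth paths or pavement) yields high scores.
    """
    h = frame_hsv.shape[0]
    # Only the lower two-thirds of the frame are considered,
    # mirroring the app's cropping of the captured image.
    lower = frame_hsv[h // 3:, :, :]
    saturation = lower[:, :, 1].astype(float) / 255.0
    # Low mean saturation per column -> likely free path.
    return 1.0 - saturation.mean(axis=0)

# Toy frame: left half saturated "grass", right half desaturated "path".
frame = np.zeros((90, 40, 3), dtype=np.uint8)
frame[:, :20, 1] = 220   # high saturation on the left
frame[:, 20:, 1] = 20    # low saturation on the right
scores = path_likelihood(frame)
print(scores[:20].mean() < scores[20:].mean())  # True: path detected on the right
```

The sketch makes the epistemic point of the passage concrete: the algorithm does not “see” obstacles but computes, from a deliberately limited extract of the image, where free space probably is.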
As reviewers suggest [
90,
91], the app is relatively easy to use and access. However, it requires blind persons to adopt a different way of listening (and hearing) as it is necessary for them to learn to understand the signals the app produces as a translation providing information about the area the person is walking through. The auditory soundscape offered by Camassia radically differs from spatial listening, tactile exploration with a long cane, or digital haptic feedback [
92]. Yet, in the twentieth century, one encounters a series of attempts to construct systems that translate written text into sound [
93] and other forms of sonification specifically developed for blind persons [
94]. Moreover, a variety of listening modes can be distinguished regarding sonification [
77], which constitutes an important facet in scientific knowledge production. Specific sonic skills [
78]—both bodily and sensory techniques—are thus required and have to be learned for understanding data that was transformed into sound patterns according to particular software programs and processed across a number of different devices (i.e., smartphone and headphones in this case). That is, affordances, auditory perceptions, modes of listening, and a host of material practices are inextricably entangled. Concerning the translation of a given park area with an earth path into sound through Camassia, one might presume that in order to perceive the respective sound pattern one has to be aware of which sound is an indication of the path and which is not. As a result of the testing phase, training material and sound samples were added to the final version of the app that is now available [
87: p. 16]. Together with the aspects of affording portability and availability, this also entails a particular “learning process”: the sensory information the app provides is interpreted and operated with according to an individual “habitus” [cf.
81: p. 187]. These affordances are produced in the course of “practicing” and “learning” with the app and consequently, according to Pickering, “such practice consists in the reciprocal tuning of human and material agency, tuning that can itself reconfigure human intentions” [
95: p. 21], actions, and perceptions.
We will now consider in more detail how the app’s algorithm processes visual input. After initializing and calibrating, Camassia transforms the received visual input according to a scale of 24 halftones, with stereo sound, variable volume, and variable tone pitch. The surroundings are thus enacted as an abstract, atonal, and arrhythmic acoustic space: lower sounds signal that there is a path on the right side, higher sounds indicate a way on the left, and a smooth pulsating tone notifies the user of a free way ahead of him/her. As the company’s demonstration video shows [
89], when walking on a certain path, a regular pulsating tone is produced by pointing the device to the right or left, and lower or higher tones are superimposed over this rhythmic sound to indicate when the user approaches another area with a different surface. The video suggests that the auditory and the visual information can be compared on an activated smartphone screen. However, such a comparison is not possible for blind users, and thus a different learning process arises when using the accompanying training material and the app in situ. An important aspect in this respect is how users listen to the technologically produced soundscape of Camassia. One way is to use the device’s monophonic loudspeaker [
87]. Users can also connect headphones to their smartphone as the app provides a stereo panorama. As headphones might decouple the user from other sounds in the surrounding environment (other people, cars etc., see [
90]), persons testing the app preferred bone conduction models [
87].
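The sonification mapping described above can likewise be illustrated with a minimal sketch. This is not Camassia’s actual code: the base frequency, the linear mapping of the path offset onto halftone steps, and the pan convention are assumptions made for the example; only the 24-halftone scale and the higher-left/lower-right convention come from the sources cited above.

```python
HALFTONES = 24          # the app's reported scale of 24 halftones
BASE_FREQ = 220.0       # assumed reference pitch (A3); not documented

def sonify_direction(path_offset: float) -> tuple:
    """Map a detected path offset to (frequency_hz, stereo_pan).

    path_offset: -1.0 (path far left) .. +1.0 (path far right); 0.0 = ahead.
    Following the description: higher tones indicate a path on the left,
    lower tones a path on the right; the stereo pan follows the same side.
    """
    # The offset selects a halftone step: left -> positive steps (higher pitch).
    steps = round(-path_offset * (HALFTONES / 2))
    freq = BASE_FREQ * 2 ** (steps / 12)        # equal-tempered halftones
    pan = path_offset                           # -1.0 = left channel, +1.0 = right
    return freq, pan

print(sonify_direction(0.0))    # steady centered reference tone: (220.0, 0.0)
f_left, _ = sonify_direction(-1.0)
f_right, _ = sonify_direction(1.0)
print(f_left > f_right)         # True: a path to the left sounds higher
```

Even this toy mapping shows why a specific mode of listening has to be learned: direction is encoded not in words but in relative pitch and panning, which the user must calibrate against bodily movement.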
The app is advertised as an assistive technology that provides support for the independent mobility of blind people. As is the case with most digital wayfinding practices (use of Blindsquare or other apps, cf. [
18]), such applications can be considered as complementary elements in a diverse range of different mobility practices, where various senses, the long cane, or guide dogs are at play [
67]. What is of interest here is that one might go beyond an approach that emphasizes the question of accessibility or the aspect of assisted mobility, in order to raise the question of a techno-sensorium that is produced in a socio-technical arrangement of users, devices, senses, and surroundings. Hence, we argue for reflecting upon technologies of sonification [
77] as a way of (medially) producing
both different environments and users. While visually captured surrounding areas are translated into stereophonic acoustic events by Camassia, users are enacted as blind listeners who (learn to) navigate through electronic soundscapes—even if they complement their practices with a long cane or other sensory input [
90]. To put it differently, one may propose that, in collaboration with a smartphone, headphones, and a blind person, Camassia produces a techno-sensorium where agency is distributed to human and non-human elements, that is, to a heterogeneous host of actors. We are confronted with a situation where one is able to observe that agency “is not a basic human capacity, not a precondition of the social; it is a relational, ever-changing outcome of its enactment” [
84: p. 4].
Consequently, the app-user-environment hybrid enacted by Camassia consists of a (temporarily) continuous stream of electronically generated sounds that result from the contact of captured camera images with the app’s algorithm. Here, the sound of this contact provides the conditions of possibility for a resonating relationship between a blind user and an acoustic environment—both of which emerge simultaneously and are situated in a specific moment.
Yet, this brief discussion of the Camassia app cannot fully analyze the impact of this sonification application, its economic aspects, or its ambiguous effects on manufacturing the senses within socio-technical arrangements. Furthermore, there are problematic issues in this techno-sensory framework that are worth mentioning. For instance, the continuous sonic space created by the app is more likely to be enacted in parks, that is, in a specific part of the city’s landscape. Also, it adds “different sensory inputs” [
68: p. 91] to actual traffic noises, whereby sonic scenarios rendered sensible, e.g., through distorted echoes elicited by footsteps or the clicking of the cane [
73: p. 22], become more complex. Future research therefore still has to show whether and how such sonically generated affordances and sensory agencies can potentially be “intimately incorporated into routine bodily practices” [
50: p. 362] of mobile media and digital wayfinding practices.