Assisted, partially automated, highly automated, fully automated, autonomous: at the end of these five stages stands the autonomous vehicle. Researchers are working on giving such vehicles human-like perception.
In the not-too-distant future, autonomous cars will conquer our streets and find their way among pedestrians, cyclists, buses, and trams. The navigational capabilities of such autonomous robots in urban environments, using 2D or 3D maps, are already impressive. But even "knowledge of all the traffic rules and maps in the world" will not help the autopilot of an autonomous vehicle drive safely, writes Volker Lang, naming the core traffic-safety problem in the chapter "Artificial Intelligence" of the book Digital Competence.
Artificial intelligence (AI) methods are the key to autonomous driving. So far, however, autonomous vehicles still shy away from encounters with people on urban streets, because the autonomy methods used to date lack robustness. Hopes are pinned in particular on progress in the field of computer vision, which could help autonomous driving achieve a breakthrough. Abhinav Valada, junior professor at the University of Freiburg and head of its Robot Learning Lab, has long been working on the question of how autonomous vehicles can safely navigate among other vehicles and pedestrians in unfamiliar urban environments. The German Research Foundation (DFG) is now supporting him with an Emmy Noether independent junior research group, whose goal is to develop data-efficient and transferable learning techniques for basic autonomous navigation tasks.
For an autonomous vehicle to develop a holistic understanding of a visually presented scene and act "intelligently" on it, it must learn to weight the components of a scene semantically. Which pixels in an image belong to people or objects in the foreground of a self-driving car's surroundings? And which pixels represent the urban landscape? The Freiburg researchers sought answers to such questions and found them in "efficient panoptic segmentation." In early 2021, Valada and his colleague Rohit Mohan presented the architecture of their method, called EfficientPS, in issue 5/2021 of the International Journal of Computer Vision.
On the project's website, Valada's group shows examples of how the team trained different AI models on different data sets. The results are superimposed on the respective camera image, with colors indicating which object class the model assigns to each pixel: cars, for example, are marked in blue, people in red, trees in green, and buildings in gray. In addition, the AI model draws a frame around each object that it considers a separate instance.
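Conceptually, such a visualization maps each pixel's predicted class ID to a display color. The following minimal sketch shows that lookup; the class IDs and RGB values are invented for illustration and are not the project's actual palette:

```python
# Sketch of class-to-color mapping for a segmentation overlay.
# Class IDs and RGB colors are illustrative only, not the actual
# palette used by the Freiburg group.
PALETTE = {
    0: (128, 128, 128),  # building -> gray
    1: (0, 0, 255),      # car      -> blue
    2: (255, 0, 0),      # person   -> red
    3: (0, 128, 0),      # tree     -> green
}

def colorize(class_map):
    """Turn a 2D grid of predicted class IDs into a grid of RGB tuples."""
    return [[PALETTE[c] for c in row] for row in class_map]

# A tiny 2x3 "image" of predicted class IDs.
prediction = [[1, 1, 0],
              [2, 3, 0]]
overlay = colorize(prediction)
print(overlay[0][0])  # the top-left "car" pixel becomes blue: (0, 0, 255)
```

In a real pipeline this color layer would be blended with the camera image, and a separate instance head would supply the per-object frames.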
Scene Understanding with Deep Learning
The task of "scene understanding" can be solved with deep learning (DL), a sub-discipline of machine learning (ML). "In most machine learning methods, including deep neural networks, the learning procedure follows a scheme of three steps: prediction, loss, and optimization. All such learning methods should be able to reproduce the relationship between the values of an input (x) and the corresponding values of the output (y) in the training data," Heinz-Adalbert Krebs and Patricia Hagenweiler explain the basic procedure in their book chapter Artificial Intelligence. In the prediction step, the model computes (predicts) an output value (ŷ) from the training input (x), where the model's behavior is controlled by a parameter (w) whose value is chosen randomly at the start. The predicted value (ŷ) is then compared with the output value (y) in the training data, and the loss between the two values is calculated. In the optimization step, the parameter (w) is modified to make the loss smaller, since the model's prediction depends on the value of (w). "The goal is to find a model with a small loss," the Springer authors summarize. And: "The whole process is called training. With this scheme, a model can be determined that predicts the output y in the training data from the input x with a small error" (page 10).
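The three-step scheme of prediction, loss, and optimization can be sketched for the simplest possible case: a one-parameter model ŷ = w·x fit by gradient descent. This is a generic illustration of the scheme, not code from the cited book or the Freiburg group:

```python
# Sketch of the prediction / loss / optimization loop for a
# one-parameter model y_hat = w * x, trained with gradient descent.
# Purely illustrative; data and learning rate are invented.
import random

# Training data follows y = 3 * x, so training should drive w toward 3.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]

w = random.uniform(-1.0, 1.0)  # parameter starts with a random value
lr = 0.01                      # learning rate (step size)

for epoch in range(200):
    for x, y in data:
        y_hat = w * x                # 1. prediction
        loss = (y_hat - y) ** 2      # 2. loss: squared error between y_hat and y
        grad = 2 * (y_hat - y) * x   # 3. optimization: gradient of the loss w.r.t. w
        w -= lr * grad               #    adjust w to make the loss smaller

print(round(w, 3))  # -> 3.0
```

Deep neural networks repeat exactly this loop, only with millions of parameters instead of one and with the gradients computed by backpropagation.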
Machines predict and make decisions
Deep learning is a special form and subarea of machine learning based on artificial neural networks; it can be used to process complex data such as images or texts, Krebs and Hagenweiler explain. Compared with other ML methods, DL has the advantage that multi-layer networks can be used to learn relationships "which simple machine learning algorithms cannot do." Based on existing information and the neural network, the deep learning method can repeatedly link what has been learned with new content, the Springer authors explain, and the machine thus learns to make predictions or decisions independently and to question them: "Decisions can be confirmed or changed, whereby humans generally no longer intervene in the actual learning process, but merely ensure that the information for learning is available and that the processes are documented. This is achieved by extracting and classifying patterns from the available data and information. Based on the insights gained, data can be linked in a broader context so that the machine is able to make decisions based on these links" (page 14).
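The claim that multi-layer networks can learn relationships that simple (linear) learners cannot is classically illustrated by the XOR function: no single linear decision boundary separates its outputs, but one hidden layer suffices. In the sketch below the weights are wired by hand purely for illustration; in practice they would be learned by training:

```python
# Hand-wired two-layer network computing XOR, a relation that no
# single-layer linear model can represent. Weights are chosen by hand
# for illustration, not learned.
def step(z):
    """Threshold activation: fires (1) if the weighted sum is positive."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hidden layer: one OR-like unit and one NAND-like unit.
    h1 = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h2 = step(-x1 - x2 + 1.5)   # fires unless both inputs are 1
    # Output layer: AND of the two hidden units yields XOR.
    return step(h1 + h2 - 1.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))  # prints 0, 1, 1, 0
```

The extra layer is what lets the network combine intermediate features (OR, NAND) into a relation the raw inputs do not expose linearly, which is the essence of the "multi-layer" advantage the authors describe.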
Public benchmarks play an important role in measuring the maturity and performance of AI technologies. "For many years, research teams from corporations such as Google or Uber have been competing for the top spots," says Rohit Mohan, who proudly points out that EfficientPS climbed to first place on Cityscapes, an influential benchmark for scene-understanding methods in autonomous driving.
On the road to human-like perception
Meanwhile, Abhinav Valada and Rohit Mohan have reached another milestone on the road to human-like perception for self-driving cars: they proposed the task of amodal panoptic segmentation and showed that it can in principle be solved. We humans have the remarkable ability to perceive objects as wholes even when parts of them are occluded. This ability, known as amodal perception, links our perception of the world to our cognitive understanding of it and enables us to cope with everyday life.
Until now, robots and autonomous vehicles have been limited to modal perception, which restricts their ability to mimic the visual experience of humans. With advanced AI algorithms, Valada believes, the visual recognition capability of self-driving cars could now be revolutionized: perception based on amodal panoptic segmentation would give machines a holistic understanding of their environment. Machines would learn to abstract away the partial occlusion of objects and recognize them in their entirety. In short, this new quality of visual environment sensing could vastly improve the road safety of autonomous cars. For their work, Abhinav Valada and Rohit Mohan received the "Most Novel Research" award at the AutoSens conference in Brussels last September.
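The difference between modal and amodal segmentation can be stated concretely: the modal mask covers only an object's visible pixels, while the amodal mask also covers the part hidden behind an occluder. The toy masks below are invented for the example; in the real task, a model predicts the amodal mask from the image:

```python
# Toy masks for one partially occluded object (1 = pixel belongs to it).
# Masks are invented for illustration; in amodal panoptic segmentation
# the model must predict the amodal mask itself.
modal  = [[1, 1, 0, 0],
          [1, 1, 0, 0]]   # visible part only (modal perception)
amodal = [[1, 1, 1, 0],
          [1, 1, 1, 0]]   # whole object, including occluded pixels

# The occluded region is exactly what amodal perception adds:
# pixels in the amodal mask that are not in the modal mask.
occluded = [[1 if a == 1 and m == 0 else 0 for a, m in zip(ra, rm)]
            for ra, rm in zip(amodal, modal)]
print(occluded)  # -> [[0, 0, 1, 0], [0, 0, 1, 0]]
```

Recovering that hidden column is what lets a machine reason about the full extent of, say, a pedestrian partly hidden behind a parked car.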