
2023 | Book

Computer Vision, Imaging and Computer Graphics Theory and Applications

16th International Joint Conference, VISIGRAPP 2021, Virtual Event, February 8–10, 2021, Revised Selected Papers

Editors: A. Augusto de Sousa, Vlastimil Havran, Alexis Paljic, Tabitha Peck, Christophe Hurter, Helen Purchase, Giovanni Maria Farinella, Petia Radeva, Kadi Bouatouch

Publisher: Springer International Publishing

Book Series: Communications in Computer and Information Science


About this book

This book constitutes the refereed proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021, held as a virtual event, February 8–10, 2021.
The 16 full papers presented in this volume were carefully reviewed and selected from 371 submissions. The purpose of VISIGRAPP is to bring together researchers and practitioners interested in both theoretical advances and applications of computer vision, computer graphics and information visualization. VISIGRAPP is composed of four co-located conferences, each specialized in at least one of the aforementioned main knowledge areas, namely GRAPP, IVAPP, HUCAPP and VISAPP.
The contributions were organized in topical sections as follows: Computer Graphics Theory and Applications; Human Computer Interaction Theory and Applications; Information Visualization Theory and Applications; Computer Vision Theory and Applications.

Table of Contents


Computer Graphics Theory and Applications

Impact of Avatar Representation in a Virtual Reality-Based Multi-user Tunnel Fire Simulator for Training Purposes
Virtual Reality (VR) technology is playing an increasingly important role in the field of training. The emergency domain, in particular, can benefit from various advantages of VR with respect to traditional training approaches. One of the most promising features of VR-based training is the possibility to share the virtual experience with other users. In multi-user training scenarios, the trainees have to be provided with a proper representation of both the other peers and themselves, with the aim of fostering mutual awareness, communication and cooperation. Various techniques for representing avatars in VR have been proposed in the scientific literature and employed in commercial applications. However, the impact of these techniques when deployed to multi-user scenarios for emergency training has not been extensively explored yet. In this work, two techniques for avatar representation in VR, i.e., no avatar (VR Kit only) and Full-Body reconstruction (blending of inverse kinematics and animations), are compared in the context of emergency training. Experiments were carried out in a training scenario simulating a road tunnel fire. The participants were requested to collaborate with a partner (controlled by an experimenter) to cope with the emergency, and aspects concerning perceived embodiment, immersion, and social presence were investigated.
Davide Calandra, Filippo Gabriele Pratticò, Gianmario Lupini, Fabrizio Lamberti
Facade Layout Completion with Long Short-Term Memory Networks
In a workflow creating 3D city models, facades of buildings can be reconstructed from oblique aerial images for which the extrinsic and intrinsic parameters are known. If the wall planes have already been determined, e.g., based on airborne laser scanning point clouds, facade textures can be computed by applying a perspective transform. Given these images, doors and windows can be detected and then added to the 3D model. In this study, the “Scaled YOLOv4” neural network is applied to detect facade objects. However, due to occlusions and artifacts from perspective correction, in general not all windows and doors are detected. This leads to the necessity of automatically continuing the pattern of facade objects into occluded or distorted areas. To this end, we propose a new approach based on recurrent neural networks. In addition to applying the Multi-Dimensional Long Short-term Memory network and the Quasi Recurrent Neural Network, we also use a novel architecture, the Rotated Multi-Dimensional Long Short-term Memory network. This architecture combines four two-dimensional Multi-Dimensional Long Short-term Memory networks on rotated images. Independent of the 3D city model workflow, the three networks were additionally tested on the Graz50 dataset, for which the Rotated Multi-Dimensional Long Short-term Memory network delivered better results than the other two networks. The facade texture regions in which windows and doors are added to the set of initially detected facade objects are likely to be occluded or distorted. Before equipping 3D models with these textures, inpainting should be applied to these regions, which then serve as automatically obtained inpainting masks.
Simon Hensel, Steffen Goebbels, Martin Kada

Human Computer Interaction Theory and Applications

Generating Haptic Sensations over Spherical Surface
Haptic imagery, the imagining of haptic sensations in the mind, makes use of and extends human vision, thus enabling a better understanding of multi-dimensional sensorimotor information by strengthening space exploration with “seeing by touch.” Testing this concept was performed on a spherical surface to optimize the generation of localized haptic signals and their propagation across the curved surface, producing dynamic movements of perceivable peak vibrations. Through testing of several spherical structure prototypes, it was found that offset actuations can dynamically amplify vibrations at specific locations. A pilot study followed to understand the impact of haptic stimulation on viewers of video content in a passive VR environment. Results showed a correlation between heart rate and the presented content, complementing the technical data recorded.
Patrick Coe, Grigori Evreinov, Mounia Ziat, Roope Raisamo
Effects of Emotion-Induction Words on Memory and Pupillary Reactions While Viewing Visual Stimuli with Audio Guide
This study aimed to examine the possibility of using emotion-induction words in audio guides for education via visual content. It builds on the findings of a previous study that focused on the provision timings of visual and auditory information [6]. Thirty emotion-induction words were extracted from the database and categorized into positive, negative, and neutral words, and three experiments were performed. The first experiment was conducted to confirm the reliability of emotional values. The results revealed a strong consistency between the values in the database and the ratings given by the participants. The second experiment assessed whether consistency was maintained if the words appeared in sentences. The results confirmed that a certain degree of consistency was maintained, as expected, but showed larger individual differences compared with the first experiment. The third experiment was conducted to probe the effect of emotion-induction words, used in an audio guide explaining the visual content, on memory. Our results revealed that participants who were exposed to positive and negative emotion-induction words remembered the content better than those who were presented with neutral words. Across the three experiments, the emotion value of the neutral words was found to be sensitive to the context in which they were embedded, which was confirmed by observing the changes in pupillary reactions. Suggestions for designing audio and visual content using emotion-induction words for better memory are provided.
Mashiho Murakami, Motoki Shino, Munenori Harada, Katsuko T. Nakahira, Muneo Kitajima
A Bimanual Flick-Based Japanese Software Keyboard Using Direct Kanji Input
Direct kanji input is a Japanese text input method that is totally different from the kana-kanji conversion commonly used in Japan. Direct kanji input is said to enable the user to efficiently input kanji characters after mastering it. In this paper, we propose a bimanual flick-based Japanese software keyboard for a tablet that uses direct kanji input. Once the user masters it, the user can efficiently input kanji characters while holding a tablet with both hands. We present three kanji layouts that we designed for this software keyboard. We show the results of three experiments that we conducted to evaluate the performance of this keyboard. In the first experiment, we compared it with existing software keyboards. In the second experiment, we evaluated how well users could learn it by using its learning support functions. In the third experiment, one of the authors continuously used it for 15 months.
Yuya Nakamura, Hiroshi Hosobe
Comparison of Cardiac Activity and Subjective Measures During Virtual Reality and Real Aircraft Flight
Pilot training requires significant resources, both material and human. Immersive virtual reality is a good way to reduce costs and work around the limited availability of resources. However, the effectiveness of virtual flight simulation has not yet been fully assessed, in particular using physiological measures. In this study, 10 pilots performed standard traffic patterns on both a real aircraft (DR400) and its virtual simulation (head-mounted display and motion platform). We used subjective measures through questionnaires of immersion, presence, and ability to control the aircraft, and objective measures using heart rate and heart rate variability. The results showed that the pilots were able to fully control the aircraft. Points to improve include updating the hardware (better display resolution and hand tracking) and the simulator dynamics for modelling ground effect. During the real experience, the overall heart rate (HR) was higher (+20 bpm on average), and the heart rate variability (HRV) was lower compared to the virtual experience. The flight phases in both virtual and real flights induced similar cardiac responses, with more mental effort during take-off and landing compared to the downwind phase. Overall, our findings indicate that virtual flight reproduces real flight and can be used for pilot training. However, replacing pilot training with exclusively virtual flight hours seems utopian at this point.
Patrice Labedan, Frédéric Dehais, Vsevolod Peysakhovich

Information Visualization Theory and Applications

Improving Self-supervised Dimensionality Reduction: Exploring Hyperparameters and Pseudo-Labeling Strategies
Dimensionality reduction (DR) is an essential tool for the visualization of high-dimensional data. The recently proposed Self-Supervised Network Projection (SSNP) method addresses DR with a number of attractive features, such as high computational scalability, genericity, stability and out-of-sample support, computation of an inverse mapping, and the ability to cluster data. Yet, SSNP has an involved computational pipeline using self-supervision based on labels produced by clustering methods and two separate deep learning networks with multiple hyperparameters. In this paper we explore the SSNP method in detail by studying its hyperparameter space and pseudo-labeling strategies. We show how these affect SSNP’s quality and how to set them to optimal values based on extensive evaluations involving multiple datasets, DR methods, and clustering algorithms.
Artur André A. M. Oliveira, Mateus Espadoto, Roberto Hirata Jr., Nina S. T. Hirata, Alexandru C. Telea
Visualization of Source Code Similarity Using 2.5D Semantic Software Maps
For various program comprehension tasks, software visualization techniques can be beneficial by displaying aspects related to the behavior, structure, or evolution of software. In many cases, the question is related to the semantics of the source code files, e.g., the localization of files that implement specific features or the detection of files with similar semantics. This work presents a general software visualization technique for source code documents, which uses 3D glyphs placed on a two-dimensional reference plane. The relative positions of the glyphs capture their semantic relatedness. Our layout originates from applying Latent Dirichlet Allocation and Multidimensional Scaling on the comments and identifier names found in the source code files. Though different variants for 3D glyphs can be applied, we focus on cylinders, trees, and avatars. We discuss various mappings of data associated with source code documents to the visual variables of 3D glyphs for selected use cases and provide details on our visualization system.
Daniel Atzberger, Tim Cech, Willy Scheibel, Daniel Limberger, Jürgen Döllner
Revisiting Order-Preserving, Gap-Avoiding Rectangle Packing
We present an improved 2D rectangle packing heuristic that preserves the initial ordering of the rectangles while maintaining a left-to-right reading direction. We also present an algorithm configuration that falls back to a simpler algorithm, which works more reliably for simple packing problems, and an option to optimize the result in non-interactive scenarios. This is achieved by checking for stackability, approximating the required width, and using a strip packing algorithm to pack the rectangles, with the option to improve the approximated width iteratively. We present still-existing obviously non-optimal packings and general problems of packings that preserve the reading direction, and discuss the problem of rectangle packing in hierarchical graphs. Moreover, the algorithm without the width approximation step can solve strip packing problems such that a reading direction is maintained.
Sören Domrös, Daniel Lucas, Reinhard von Hanxleden, Klaus Jansen
Exploratory Data Analysis of Population Level Smartphone-Sensed Data
Mobile health involves gathering smartphone-sensor data passively from users’ phones as they live their lives “in-the-wild”, periodically annotating the data with health labels. Such data is used by machine learning models to predict health. Purely computational approaches generally do not support interpretability of the results produced by such models. In addition, interpreting such results may become more difficult with larger study cohorts, which makes population-level insights desirable. We propose Population Level Exploration and Analysis of smartphone DEtected Symptoms (PLEADES), an interactive visual analytics framework to present smartphone-sensed data. Our approach uses clustering and dimension reduction to discover similar days based on objective smartphone sensor data, across participants, for population-level analyses. PLEADES enables analysts to apply various clustering and projection algorithms to several smartphone-sensed datasets. PLEADES overlays human-labelled symptom and contextual information from in-the-wild collected smartphone-sensed data to empower the analyst to interpret findings. Such views enable the contextualization of the symptoms that can manifest in smartphone sensor data. We used PLEADES to visualize two real-world in-the-wild collected datasets with objective sensor data and human-provided health labels. We validate our approach through evaluations with data visualization and human context recognition experts.
Hamid Mansoor, Walter Gerych, Abdulaziz Alajaji, Luke Buquicchio, Kavin Chandrasekaran, Emmanuel Agu, Elke Rundensteiner
Towards Interactive Geovisualization Authoring Toolkit for Industry Use Cases
Interactive visualizations of geospatial data are commonplace in various applications and tools. The visual complexity of these visualizations ranges from simple point markers placed on cartographic maps through visualizing connections, heatmaps, or choropleths to their combination. Designing proper visualizations of geospatial data is often tricky, and the existing approaches either provide only limited support based on pre-defined templates or require extensive programming skills. In our previous work, we introduced the Geovisto toolkit, a novel approach that blends template editing and programmatic approaches, providing tools for authoring reusable multilayered map widgets even for non-programmers. In this paper, we extend our previous work by focusing on Geovisto’s application in industry. Based on a critical assessment of two existing usage scenarios, we summarize the necessary design changes and their impact on the toolkit’s architecture and implementation. We further present a case study where Geovisto was used in a production-ready application for IoT sensor monitoring developed by Logimic, a Czech-US startup company. We conclude by discussing the advantages and limitations of our approach and outlining future work.
Jiří Hynek, Vít Rusňák

Computer Vision Theory and Applications

Global-first Training Strategy with Convolutional Neural Networks to Improve Scale Invariance
Modelled closely on the feedforward conical structure of the primate vision system, Convolutional Neural Networks (CNNs) learn by adopting a local-to-global feature extraction strategy. This makes them view-specific models and results in poor invariance encoding within their learnt weights, limiting their ability to identify objects whose appearance is altered by various transformations such as rotation, translation, and scaling. Recent physiological studies reveal that the visual system first views the scene globally for subsequent processing in its ventral stream, leading to a global-first response strategy in its recognition function. Conventional CNNs generally use small filters, thus losing the global view of the image. A trainable module proposed by Kumar & Sharma [24], called Stacked Filters Convolution (SFC), models this approach by using a pyramid of large multi-scale filters to extract features from wider areas of the image, which are then processed by a normal CNN. The end-to-end model is referred to as Stacked Filter CNN (SFCNN). In addition to improved test results, SFCNN showed promising results on scale-invariant classification. The experiments, however, were performed on small-resolution datasets with a small CNN as backbone. In this paper, we extend this work and test SFC integrated with the VGG16 network on larger-resolution datasets for scale-invariant classification. Our results confirm that integrating SFC with a standard CNN also shows promising results on scale invariance for large-resolution datasets.
Dinesh Kumar, Dharmendra Sharma
Spline-Based Dense Medial Descriptors for Image Simplification Using Saliency Maps
Medial descriptors have attracted increasing interest in image representation, simplification, and compression. Recently, such descriptors have been separately used to (a) increase the local quality of representing salient features in an image and (b) globally compress an entire image via a B-spline encoding. To date, the two desiderata, (a) high local quality and (b) high overall compression of images, have not been addressed by a single medial method. We achieve this integration by presenting Spatial Saliency Spline Dense Medial Descriptors (3S-DMD) for saliency-aware image simplification and compression. Our method significantly improves the trade-off between compression and image quality of earlier medial-based methods while keeping perceptually salient features. We also demonstrate the added value of user-designed, as compared to automatically computed, saliency maps. We show that our method achieves both higher compression and better quality than JPEG for a broad range of images and, for specific image types, yields higher compression and similar quality compared to JPEG 2000.
Jieying Wang, Leonardo de Melo, Alexandre X. Falcão, Jiří Kosinka, Alexandru Telea
BS-GAENets: Brain-Spatial Feature Learning Via a Graph Deep Autoencoder for Multi-modal Neuroimaging Analysis
Understanding how the brain and behavior are related is a central challenge for cognitive neuroscience research, in which functional magnetic resonance imaging (fMRI) has significantly improved our understanding of brain functions and dysfunctions. In this paper, we propose a novel multi-modal spatial cerebral graph based on an attention mechanism, called MSCGATE, that combines both fMRI modalities, task- and rest-fMRI, based on spatial and cerebral features to preserve the rich complex structure between brain voxels. Moreover, it projects the structural-functional brain connections into a new multi-modal latent representation space, which is subsequently input to our trace regression predictive model to output each subject’s behavioral score. Experiments on the InterTVA dataset reveal that our proposed approach outperforms other graph representation learning-based models in terms of effectiveness and performance.
Refka Hanachi, Akrem Sellami, Imed Riadh Farah
Enhancing Backlight and Spotlight Images by the Retinex-Inspired Bilateral Filter SuPeR-B
Backlight and spotlight images are pictures where the light sources generate very bright and very dark regions. The enhancement of such images has been poorly investigated and is particularly hard because it has to brighten the dark regions without over-enhancing the bright ones. The solutions proposed until now generally perform multiple enhancements, or segment the input image into dark and bright regions and enhance the latter with different functions. In both cases, the results are merged into a new image, which often must be smoothed to remove artifacts along the edges. This work describes SuPeR-B, a novel Retinex-inspired image enhancer that improves the quality of backlight and spotlight images without the need for multi-scale analysis, segmentation, or smoothing. According to Retinex theory, SuPeR-B re-works the image channels separately and rescales the intensity of each pixel by a weighted average of intensities sampled from regular sub-windows. Since the rescaling factor depends on both spatial and intensity features, SuPeR-B acts like a bilateral filter. The experiments, carried out on public challenging data, demonstrate that SuPeR-B effectively improves the quality of backlight and spotlight images and also outperforms other state-of-the-art algorithms.
Michela Lecca
Rethinking RNN-Based Video Object Segmentation
Video Object Segmentation is a fundamental task in computer vision that aims at pixel-wise tracking of one or multiple foreground objects within a video sequence. This task is challenging due to real-world requirements such as handling unconstrained object and camera motion, occlusion, fast motion, and motion blur. Recently, methods utilizing RNNs have been successful in accurately and efficiently segmenting the target objects, as RNNs can effectively memorize the object of interest and compute the spatiotemporal features that are useful in processing visual sequential data. However, they have limitations such as lower segmentation accuracy in longer sequences. In this paper, we expand our previous work to develop a hybrid architecture that successfully eliminates some of these challenges by employing additional correspondence matching information, followed by extensively exploring the impact of various architectural designs. Our experimental results confirm the efficacy of our proposed architecture by obtaining an improvement of about 12pp on the YouTube-VOS dataset compared to RNN-based baselines, without a considerable increase in computational costs.
Fatemeh Azimi, Federico Raue, Jörn Hees, Andreas Dengel