
2018 | Book

Advances in Visual Computing

13th International Symposium, ISVC 2018, Las Vegas, NV, USA, November 19 – 21, 2018, Proceedings

Editors: George Bebis, Richard Boyle, Bahram Parvin, Darko Koracin, Matt Turek, Srikumar Ramalingam, Kai Xu, Stephen Lin, Bilal Alsallakh, Jing Yang, Eduardo Cuervo, Jonathan Ventura

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 13th International Symposium on Visual Computing, ISVC 2018, held in Las Vegas, NV, USA in November 2018.

The 66 papers presented in this volume were carefully reviewed and selected from 91 submissions. The papers are organized in topical sections named: ST: computational bioimaging; computer graphics; visual surveillance; pattern recognition; virtual reality; deep learning; motion and tracking; visualization; object detection and recognition; applications; segmentation; and ST: intelligent transportation systems.

Table of Contents

Frontmatter

ST: Computational Bioimaging

Frontmatter
Automatic Registration of Serial Cerebral Angiography: A Comparative Review

Image registration can play a major role in medical imaging, as it can be used to identify changes that have occurred over a period of time, thus mirroring treatment effectiveness, recovery, and the detection of disease onset. While medical image registration algorithms have been evaluated extensively on MRI and CT, less attention has been given to Digital Subtraction Angiography (DSA). DSA of the brain is the method of choice for the diagnosis of numerous neurovascular conditions and is used during neurovascular surgeries. Numerous studies have relied on semi-automated registration that involves manual selection of matching features to compute the mapping between images. A variety of automatic registration methods have been developed, but their performance on DSA has not been fully explored. In this paper, we identify and review a variety of automatic registration methods and evaluate algorithm performance in the context of serial image registration. We find that intensity-based methods are consistent in performance, while feature-based methods can perform better but are also more variable in success. Ultimately, a combined algorithm may be optimal for automatic registration, which can be applied to analyze vasculature information and improve unbiased treatment evaluation in clinical trials.

Alice Tang, Zhiyuan Zhang, Fabien Scalzo
Skull Stripping Using Confidence Segmentation Convolution Neural Network

Skull stripping is an important preprocessing step on cerebral Magnetic Resonance (MR) images because unnecessary structures, such as the eyeballs and muscles, greatly hinder the accuracy of further automatic diagnosis. To extract the important brain tissue quickly, we developed a model named Confidence Segmentation Convolutional Neural Network (CSCNet). CSCNet takes the form of a Fully Convolutional Network (FCN) with an encoder-decoder architecture that produces a reconstructed bitmask with a pixel-wise confidence level. In our experiments, cross-validation was performed on 750 MRI slices of the brain and demonstrated the high accuracy of the model (dice score: $$0.97\pm 0.005$$ ) with a prediction time of less than 0.5 s.

Kaiyuan Chen, Jingyue Shen, Fabien Scalzo
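
The Dice score reported above is a standard overlap measure between a predicted mask and a ground-truth mask. A minimal sketch (not the authors' code; function names are illustrative):

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    # Convention: two empty masks count as a perfect match.
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0
```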
Skin Cancer Segmentation Using a Unified Markov Random Field

Most medical institutions still use manual methods to detect skin cancer tumors. However, melanoma detection using human vision alone can be subjective, inaccurate, and poorly reproducible, even among experienced dermatologists. Automatic segmentation of skin cancer is challenging due to many factors, such as different skin colors, the presence of hair, and diverse lesion characteristics, including lesions of varying sizes and shapes and lesions with fuzzy boundaries. To address these factors, a Unified Markov Random Field (UMRF) is used to segment skin lesions from images using both pixel information and regional information. The UMRF model combines the benefits of pixel-based and region-based Markov Random Field (MRF) models by decomposing the likelihood function into the product of a pixel likelihood function and a regional likelihood function. The experimental results show that the employed method achieves a high precision of $$83.08\%$$ (Jaccard Index).

Omran Salih, Serestina Viriri
Heart Modeling by Convexity Preserving Segmentation and Convex Shape Decomposition

This paper proposes a convexity-preserving level set (CPLS) and a novel modeling of the heart named Convex Shape Decomposition (CSD) for segmentation of the Left Ventricle (LV) and Right Ventricle (RV) from cardiac magnetic resonance images. The main contributions are two-fold. First, we introduce a convexity-preserving mechanism in the level set framework, which is helpful for overcoming the difficulties arising from the overlap between the intensities of the papillary muscles and trabeculae and the intensities of the myocardium. Furthermore, such a generally constrained convexity-preserving level set method can be useful in many other potential applications. Second, by decomposing the heart into two convex structures, and essentially converting RV segmentation into LV segmentation, we can solve both LV and RV segmentation in a unified framework without training any specific shape models for RV. The proposed method has been quantitatively validated on open datasets, and the experimental results and comparisons with other methods demonstrate the superior performance of our method.

Xue Shi, Lijun Tang, Shaoxiang Zhang, Chunming Li

Computer Graphics I

Frontmatter
PSO-Based Newton-Like Method and Iteration Processes in the Generation of Artistic Patterns

In artistic pattern generation one can find many different approaches to the generation process. One such approach is the use of root-finding methods. In this paper, we present a new method of generating artistic patterns with the use of root finding. We modify the classical Newton's method using a Particle Swarm Optimization approach. Moreover, we introduce various iteration processes instead of the standard Picard iteration used in Newton's method. The presented examples show that, using the proposed method, we are able to obtain very interesting and diverse patterns that could have artistic applications, e.g., in texture generation or tapestry and textile design.

Ireneusz Gościniak, Krzysztof Gdawiec
An Evaluation of Smoothing and Remeshing Techniques to Represent the Evolution of Real-World Phenomena

In this paper we investigate the use of morphing techniques to represent the continuous evolution of deformable moving objects, representing the evolution of real-world phenomena. Our goal is to devise processes capable of generating an approximation of the actual evolution of these objects with a known error. We study the use of different smoothing and remeshing methods and analyze various statistics to establish mesh quality metrics with respect to the quality of the approximation (interpolation). The results of the tests and the statistics that were collected suggest that the quality of the correspondence between the observations has a major influence on the quality and validity of the interpolation, and it is not trivial to compare the quality of the interpolation with respect to the actual evolution of the phenomenon being represented. The Angle-Improving Delaunay Edge-Flips method, overall, obtained the best results, but the Remeshing method seems to be more robust to abrupt changes in the geometry.

José Duarte, Paulo Dias, José Moreira
Biomimetic Perception Learning for Human Sensorimotor Control

We present a simulation framework for biomimetic human perception and sensorimotor control. It features a biomechanically simulated, musculoskeletal human model actuated by numerous skeletal muscles, with two human-like eyes whose retinas have spatially nonuniform distributions of photoreceptors. Our prototype sensorimotor system for this model incorporates a set of 20 automatically-trained, deep neural networks (DNNs), half of which are neuromuscular DNN controllers comprising its motor subsystem, while the other half are devoted to visual perception. Within the sensory subsystem, which continuously operates on the retinal photoreceptor outputs, 2 DNNs drive eye and head movements, while 8 DNNs extract the sensory information needed to control the arms and legs. Exclusively by means of its egocentric, active visual perception, our biomechanical virtual human learns efficient, online visuomotor control of its eyes, head, and four limbs to perform tasks involving the foveation and visual pursuit of target objects coupled with visually-guided reaching actions to intercept the moving targets.

Masaki Nakada, Honglin Chen, Demetri Terzopoulos
Porous Structure Design in Tissue Engineering Using Anisotropic Radial Basis Functions

The rapid development of additive manufacturing in recent decades has greatly improved the quality of medical implants and widened their applications in tissue engineering. For the purpose of creating realistic porous scaffolds, a series of diverse methodologies have been attempted to help simplify the manufacturing process and to improve scaffold quality. Among these approaches, implicit surface methods based on geometric models have gained much attention for their flexibility in generating porous structures. In this paper, an innovative heterogeneous modeling method using anisotropic radial basis functions (ARBFs) is introduced for designing porous structures with controlled porosity and various internal architectures. By redefining the distance metric for the radial basis functions, the interpolated porous shape can be customized according to different requirements. Numerous experiments have been conducted to show the effectiveness of the proposed method.

Ye Guo, Ke Liu, Zeyun Yu

Visual Surveillance

Frontmatter
Accurate and Efficient Non-Parametric Background Detection for Video Surveillance

In this paper, we propose an adaptive, non-parametric method of separating background from foreground in static-camera video feeds. Our algorithm processes each frame pixel-wise and calculates a probability density function at each location using previously observed values at that location. This method makes several improvements over the traditional kernel density estimation model, accomplished by applying a dynamic learning weight to the observed intensity values in the function, consequently eliminating the large computational and memory load often associated with non-parametric techniques. In addition, we propose a novel approach to the classic background segmentation issue of “ghosting” by exploiting the spatial relationships among pixels.

William Porr, James Easton, Alireza Tavakkoli, Donald Loffredo, Sean Simmons
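
To make the weighted kernel density idea concrete, here is a minimal per-pixel sketch of a background model of this general kind. The Gaussian kernel, the reinforcement constant, and the threshold are illustrative assumptions, not the authors' exact scheme:

```python
import numpy as np

class WeightedKDEBackground:
    """Per-pixel weighted KDE background model (illustrative sketch)."""

    def __init__(self, n_samples=20, bandwidth=10.0, threshold=1e-3):
        self.n = n_samples
        self.h = bandwidth      # Gaussian kernel bandwidth
        self.t = threshold      # density threshold for background
        self.samples = None     # (H, W, n) intensity samples
        self.weights = None     # (H, W, n) learning weights

    def init(self, frame):
        h, w = frame.shape
        self.samples = np.repeat(frame[..., None], self.n, axis=2).astype(float)
        self.weights = np.full((h, w, self.n), 1.0 / self.n)

    def apply(self, frame):
        d = frame[..., None] - self.samples                  # (H, W, n)
        k = np.exp(-0.5 * (d / self.h) ** 2) / (self.h * np.sqrt(2 * np.pi))
        density = (self.weights * k).sum(axis=2)             # weighted KDE
        fg = density < self.t                                # low density -> foreground
        # Reinforce the closest sample for background pixels (the "dynamic
        # learning weight"), rather than storing an ever-growing history.
        # A full model would also replace stale samples over time.
        idx = np.abs(d).argmin(axis=2)
        ii, jj = np.nonzero(~fg)
        self.weights[ii, jj, idx[~fg]] += 0.05
        self.weights /= self.weights.sum(axis=2, keepdims=True)
        return fg
```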
A Low-Power Neuromorphic System for Real-Time Visual Activity Recognition

We describe a high-accuracy, real-time, neuromorphic method and system for activity recognition in streaming or recorded videos from static and moving platforms, capable of detecting even small objects and activities. Our system modifies and integrates multiple independent algorithms into an end-to-end system consisting of five primary modules: object detection, object tracking, a convolutional neural network image feature extractor, a recurrent neural network sequence feature extractor, and an activity classifier. We also integrate neuromorphic principles of foveated detection, similar to how the retina works in the human visual system, and the use of contextual knowledge about activities to filter the activity recognition results. We mapped the complete activity recognition pipeline to the COTS NVIDIA Tegra TX2 development kit and demonstrate real-time activity recognition from streaming drone videos at less than 10 W power consumption.

Deepak Khosla, Ryan Uhlenbrock, Yang Chen
Video-Based Human Action Recognition Using Kernel Relevance Analysis

This paper presents video-based human action recognition using kernel relevance analysis. Our approach, termed HARK, comprises the conventional pipeline employed in action recognition, with a two-fold post-processing stage: (i) a descriptor relevance ranking based on the centered kernel alignment (CKA) algorithm to match trajectory-aligned descriptors with the output labels (action categories), and (ii) a feature embedding based on the same algorithm to project the video samples into the CKA space, where class separability is preserved and the number of dimensions is reduced. For concrete testing, the UCF50 human action dataset is employed to assess HARK under a leave-one-group-out cross-validation scheme. The attained results show that the proposed approach correctly classifies 90.97% of human action samples using an average input data dimension of 105 in the classification stage, which outperforms state-of-the-art results concerning the trade-off between accuracy and dimensionality of the final video representation. Also, the relevance analysis increases the interpretability of the video data by ranking trajectory-aligned descriptors according to their importance in supporting action recognition.

Jorge Fernández-Ramírez, Andrés Álvarez-Meza, Álvaro Orozco-Gutiérrez
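
For reference, centered kernel alignment between a descriptor's Gram matrix and an ideal label kernel can be computed as below; using it to rank descriptors, as in the abstract, is sketched under the assumption of precomputed per-descriptor Gram matrices (names are illustrative):

```python
import numpy as np

def centered_kernel_alignment(K: np.ndarray, L: np.ndarray) -> float:
    """CKA between two kernel (Gram) matrices of the same size."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H
    return np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

def rank_descriptors(descriptor_grams, y):
    """Rank per-descriptor Gram matrices by alignment with the label kernel."""
    L = (y[:, None] == y[None, :]).astype(float)  # ideal label kernel
    scores = [centered_kernel_alignment(K, L) for K in descriptor_grams]
    return np.argsort(scores)[::-1], scores       # most relevant first
```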
Robust Incremental Hidden Conditional Random Fields for Human Action Recognition

Hidden conditional random fields (HCRFs) are a powerful supervised classification system, able to capture the intrinsic motion patterns of a human action. However, finding the optimal number of hidden states remains a severe limitation for this model. This paper addresses this limitation by proposing a new model, called robust incremental hidden conditional random field (RI-HCRF). A hidden Markov model (HMM) is created for each observation paired with an action label, and its parameters are defined by the potentials of the original HCRF graph. Starting from an initial number of hidden states and increasing their number incrementally, the Viterbi path is computed for each HMM. The method seeks a sequence of hidden states in which each variable participates in a maximum number of optimal paths; variables with low participation in optimal paths are rejected. In addition, a robust mixture of Student's t-distributions is imposed as a regularizer on the parameters of the model. The experimental results on human action recognition show that RI-HCRF successfully estimates the number of hidden states and outperforms all state-of-the-art models.

Michalis Vrigkas, Ermioni Mastora, Christophoros Nikou, Ioannis A. Kakadiaris

Pattern Recognition

Frontmatter
Rotation Symmetry Object Classification Using Structure Constrained Convolutional Neural Network

Rotation symmetry is a salient visual cue for describing and recognizing an object or a structure in an image. Recently, various rotation symmetry detection methods have been proposed based on key-point feature matching schemes. However, hand-crafted representations of rotation symmetry structure have shown limited performance. On the other hand, deep learning based approaches have rarely been applied to symmetry detection due to the huge diversity in the visual appearance of rotation symmetry patterns. In this work, we propose a new convolutional neural network framework based on two core layers: a rotation invariant convolution (RI-CONV) layer and a symmetry structure constrained convolution (SSC-CONV) layer. The proposed network learns structural characteristics from image samples regardless of their appearance diversity. Evaluation is conducted on 32,000 images (after augmentation) of our rotation symmetry classification data set.

Seunghwa Yu, Seungkyu Lee
A Hough Space Feature for Vehicle Detection

This paper addresses the pattern of vehicle images in the Hough space and presents a feature to detect and classify vehicle images against samples containing no vehicles. Instead of detecting straight lines by seeking peaks in the Hough space, the Hough transform is employed in a novel way to extract features from images. The standard deviation of the columns in the Hough data is proposed as a new kind of feature to represent objects in images. The proposed feature is robust with respect to challenges such as object dimension, translation, rotation, occlusion, distance to the camera, and camera view angle. To evaluate the performance of the proposed feature, a Neural Network pattern recognition classifier is employed to classify vehicle images and non-vehicle samples. The success rate is validated in various imaging environments (lighting, distance to camera, view angle, and incompleteness) for different vehicle models.

Chunling Tu, Shengzhi Du
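
A minimal sketch of the column-statistics idea, assuming the "columns" of the Hough accumulator are its angle bins (an interpretation, not the authors' exact definition):

```python
import numpy as np
from skimage.feature import canny
from skimage.transform import hough_line

def hough_column_std_feature(gray: np.ndarray) -> np.ndarray:
    """Per-angle standard deviation of the Hough accumulator columns."""
    edges = canny(gray)
    accumulator, angles, dists = hough_line(edges)  # shape: (n_dists, n_angles)
    return accumulator.std(axis=0)                  # one value per angle column
```

The resulting fixed-length vector can then be fed to a neural network classifier, as the abstract describes.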
Gender Classification Based on Facial Shape and Texture Features

This paper seeks to improve gender classification accuracy by fusing shape features from the Active Shape Model with two appearance-based methods, the Local Binary Pattern (LBP) and the Local Directional Pattern (LDP). A gender classification model based on the fusion of appearance and shape features is proposed. The experimental results show that fusing the LBP and LDP with the Active Shape Model improved the gender classification accuracy rate to 94.5%, from 92.8% before fusion.

Mayibongwe H. Bayana, Serestina Viriri, Raphael Angulu
Authentication-Based on Biomechanics of Finger Movements Captured Using Optical Motion-Capture

In this paper, we propose an authentication approach based on the uniqueness of the biomechanics of finger movements. We use optical-marker-based motion capture as a preliminary setup to capture goniometric (joint-related) and dermatologic (skin-related) features from the flexion and extension of the index and middle fingers of a subject. We use this information to build a personalized authentication model for a given subject. Analysis of our approach using finger motion capture from 8 subjects, with reflective tracking markers placed around the joints of the index and middle fingers, shows its viability. In this preliminary study, we achieve an average equal error rate (EER), the operating point where the false accept rate and the false reject rate are equal, of 6.3% when authenticating a subject immediately after training the authentication model, and an EER of 16.4% after a week.

Brittany Lewis, Christopher J. Nycz, Gregory S. Fischer, Krishna K. Venkatasubramanian
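
The EER metric quoted above can be computed from genuine and impostor match scores by locating the threshold where the false accept and false reject rates cross. A small sketch, assuming higher scores indicate a genuine match:

```python
import numpy as np

def equal_error_rate(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """EER: operating point where false-accept and false-reject rates meet."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```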
Specific Document Sign Location Detection Based on Point Matching and Clustering

In this paper we describe a method for specific document sign location detection based on key point grouping correspondence. The proposed method extracts stable points, determined only by each contour shape, as key points; matches point pairs based on a contour Fourier shape descriptor; clusters point pairs into scale-level sets; and finally detects the sign location by finding a projective matrix for each set of point pairs. The contributions of this paper include (1) a novel concept of key points and their extraction method, (2) a clustering operation for grouping point pairs, and (3) a fuzzy DBSCAN process that responds to the constraint of a maximum clustering radius. The experimental results show that our method is an effective way to detect signs in printed/scanned documents, both in recall rate and in speed.

Huaixin Xiong

Virtual Reality I

Frontmatter
Training in Virtual Environments for Hybrid Power Plant

This article describes a virtual environment application of a hybrid power plant for training professionals in electrical power systems. The application is developed in the Unity 3D game engine and features three modes: immersion, interaction, and failure. These modes enhance professionals' skills through visualization of the plant components and the operation of different processes. The failure mode simulates the consequences and effects of incorrect maneuvers. The generation environment comprises wind turbines and photovoltaic panels that interact through a mathematical model and enable the manipulation of dependent variables, providing a more realistic experience.

Max G. Chiluisa, Rubén D. Mullo, Víctor H. Andaluz
Visualizing Viewpoint Movement on Driving by Space Information Rendering

Automobiles are essential for transportation; however, many people die in automobile accidents, and the number of accidents is not decreasing. Meanwhile, gaze-point measurement is used in attempts to detect a person's interest. Traditional eye-movement measurement systems determine where a person looks on an image, but without additional image analysis they cannot identify what the person is interested in. We propose a method that directly calculates the three-dimensional position of an object of interest from images of a person's eyes using a virtual reality system, without defining the two-dimensional position of the object. We register all candidate objects of interest, and we propose a space information rendering technique that writes viewpoint information directly into the virtual space by voting for an object when a person watches it closely. We developed a driving simulator using a virtual reality model of a driving course in a university neighborhood. By applying the proposed system to an expert driver and a beginner driver, the beginner can learn the eye movements of the expert.

Satoru Morita
Virtual Reality System for Children Lower Limb Strengthening with the Use of Electromyographic Sensors

This article presents a virtual system for children's lower-limb strengthening using electromyographic sensors and the Unity 3D graphics engine. The system acquires and processes electromyographic (EMG) signals through Bluetooth wireless communication, which are also used to control virtual environments. Two easy-to-play video games with different difficulty levels have been designed; interaction with the virtual environments generates muscle strengthening exercises. Five users (3 boys and 2 girls) between 8 and 13 years old performed experimental tests. The inclusion criteria were that users must be older than 7 and younger than 14 years and must have some muscle affectation; the exclusion criteria were visual deficiency and/or severe hearing impairment. Finally, users completed the SEQ usability test with a score of 59.6 ± 0.33, which indicates the level of acceptance of the virtual system for children's lower-limb strengthening.

Eddie E. Galarza, Marco Pilatasig, Eddie D. Galarza, Victoria M. López, Pablo A. Zambrano, Jorge Buele, Jhon Espinoza
A Comparative Study of Virtual UI for Risk Assessment and Evaluation

The simulation of a real-life environment in Virtual Reality (VR) greatly reduces the time and cost of performing experiments. A useful application of VR is training employees and measuring their performance before their assignment in the real work environment. For this study, an experimental environment was created in VR to represent a machine shop in an industrial manufacturing facility. The VR environment provided a safe setting in which trainees could learn to correctly identify the hazards associated with each machine. A comparative study was conducted to evaluate two different ways a trainee can interact with the training system within the VR environment. Participants in the study were asked to perform training tasks with both user interfaces and complete user experience and usability questionnaires. The evaluation of interfaces played an important role in the design and selection of a useful mode of interaction within the VR environment.

Naila Bushra, Daniel Carruth, Shuchisnigdha Deb
Sensory Fusion and Intent Recognition for Accurate Gesture Recognition in Virtual Environments

With the rapid growth of Virtual Reality applications, there is a significant need to bridge the gap between the real world and the virtual environment in which humans are immersed. Activity recognition will be an important factor in delivering models of human actions and operations into the virtual environments. In this paper, we define an activity as being composed of atomic gestures and intents. With this approach, the proposed algorithm detects predefined activities utilizing the fusion of multiple sensors. First, data is collected from both vision and wearable sensors to train Recurrent Neural Networks (RNN) for the detection of atomic gestures. Then, sequences of the gestures, as observable states, are labeled with their associated intents. These intents denote hidden states, and the sequences are used to train and test Hidden Markov Models (HMM). Each HMM is representative of a single activity. Upon testing, the proposed gesture recognition system achieves around 90% average accuracy with 95% mean confidence. The overall activity recognition performs at an average of 89% accuracy for simple and complex activities.

Sean Simmons, Kevin Clark, Alireza Tavakkoli, Donald Loffredo
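
To illustrate the gesture-to-intent layer described above, the following sketch scores a sequence of recognized gestures against per-activity HMMs with the standard forward algorithm in log space. The parameterization is generic, not the authors' trained models:

```python
import numpy as np

def forward_log_likelihood(obs, log_pi, log_A, log_B):
    """Log-likelihood of a gesture sequence under one activity's HMM.

    obs    : sequence of gesture indices (observable states)
    log_pi : (S,)   log initial intent probabilities
    log_A  : (S, S) log intent transition matrix
    log_B  : (S, G) log emission matrix (intent -> gesture)
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # logsumexp over previous intents, then emit the next gesture
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

def classify_activity(obs, hmms):
    """Pick the activity whose HMM best explains the gesture sequence."""
    scores = [forward_log_likelihood(obs, *params) for params in hmms]
    return int(np.argmax(scores))
```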

Deep Learning I

Frontmatter
Accuracy of a Driver-Assistance System in a Collision Scenario

Object tracking for collision avoidance systems benefits from current progress in object detection by deep learning. For the purpose of collision avoidance, a hazard has to be tracked over several frames before the safety system can determine its future trajectory and issue a necessary warning for braking. Because a detected object is defined by a rectangular boundary, the boundary can represent (a non-rectangular) object as well as its background, leading to misleading tracking information. Therefore, we rely on feature points identified in the detected regions over time for performing feature point tracking. Feature points in the background are removed by performing clustering in real-world coordinates using iterative semi-global matching stereo as well as an approximate size of the detected object type. While matching the feature points between consecutive frames, a best match might not be found. In such circumstances, an optimally tracked feature point is initially used for updating the tracking information of the mismatched feature point. However, with too many mismatches (possibly due to occlusion), its information is overwritten by a more recently matched feature point. We evaluated our system on test video data we created involving a controlled collision course.

Waqar Khan, Reinhard Klette
Classify Broiler Viscera Using an Iterative Approach on Noisy Labeled Training Data

Poultry meat is produced and slaughtered at ever higher rates, and manual food safety inspection is now becoming the bottleneck. An automatic computer vision system could not only increase slaughter rates but also lead to more consistent inspection. This paper presents a method for classifying broiler viscera into healthy and unhealthy, in a data set recorded in-line at a poultry processing plant. The results of the on-site manual inspection are used to automatically label the images during the recording. The data set consists of 36,228 images of viscera. The produced labels are noisy, so the labels in the training set are corrected through an iterative approach and ultimately used to train a convolutional neural network. The trained model is tested on a ground truth data set labelled by experts in the field. A classification accuracy of 86% was achieved on a data set with large in-class variation.

Anders Jørgensen, Jens Fagertun, Thomas B. Moeslund
Instance-level Object Recognition Using Deep Temporal Coherence

In this paper we design and evaluate methods for exploiting the temporal coherence present in video data for the task of instance-level object recognition. First, we evaluate the performance and generalisation capabilities of a Convolutional Neural Network for learning individual objects from multiple viewpoints in a video sequence. Then, we exploit the assumption that in video data the same object remains present over a number of consecutive frames. Knowing this number of consecutive frames a priori is a difficult task, however, especially for mobile agents interacting with objects in front of them. Thus, we evaluate the use of temporal filters such as the Cumulative Moving Average and a machine learning approach using Recurrent Neural Networks for this task. We also show that by exploiting temporal coherence, models trained with a few data points perform comparably to when the whole dataset is available.

Miguel Lagunes-Fortiz, Dima Damen, Walterio Mayol-Cuevas
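
A Cumulative Moving Average filter of the kind evaluated here is a one-line update per frame: the running class-probability estimate is nudged toward each new frame's prediction. A minimal sketch (assumes one softmax probability vector per frame):

```python
import numpy as np

def cma_predictions(per_frame_probs):
    """Cumulative moving average over per-frame class probabilities.

    In practice the average would be reset whenever a new object is
    assumed to enter the view; here we average over the whole sequence.
    """
    labels, running = [], None
    for k, p in enumerate(per_frame_probs, start=1):
        running = p if running is None else running + (p - running) / k
        labels.append(int(np.argmax(running)))
    return labels
```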
DUPL-VR: Deep Unsupervised Progressive Learning for Vehicle Re-Identification

Vehicle re-identification (re-ID) is the search for similar vehicles in a multi-camera network, usually with non-overlapping fields of view. Supervised approaches have mostly been used for the re-ID problem, but they have certain limitations when it comes to real-life scenarios. To cope with these limitations, unsupervised learning techniques can be used; such techniques have been successfully applied in the field of person re-identification. With this in mind, this paper presents an unsupervised approach to the vehicle re-ID problem that trains a base network architecture with a self-paced progressive unsupervised learning scheme, which has not previously been applied to vehicle re-ID. The algorithm has been extensively analyzed on two large benchmark datasets for vehicle re-ID, VeRi and VehicleID, with image-to-image and cross-camera search strategies, and the approach achieves better performance on most of the standard evaluation metrics when compared with existing state-of-the-art supervised approaches.

Raja Muhammad Saad Bashir, Muhammad Shahzad, Muhammad Moazam Fraz

Motion and Tracking

Frontmatter
Particle Filter Based Tracking and Mapping

We extended the well-known Kinect Fusion approach [11] with a particle filter framework to improve the tracking of abrupt camera movements, while the estimated camera pose is further refined with the ICP algorithm. All performance-critical algorithms were implemented on modern graphics hardware using the CUDA GPGPU language and are largely parallelized. It has been shown that our procedure has only minimally reduced precision compared to known techniques, but provides higher robustness against abrupt camera movements and dynamic occlusions. Furthermore, the algorithm runs at a frame time of approx. 24.6098 ms on modern hardware, hence enabling real-time capability.

Nils Höhner, Anna Katharina Hebborn, Stefan Müller
Multi-branch Siamese Networks with Online Selection for Object Tracking

In this paper, we propose a robust object tracking algorithm based on a branch selection mechanism to choose the most efficient object representations from multi-branch siamese networks. While most deep learning trackers use a single CNN for target representation, the proposed Multi-Branch Siamese Tracker (MBST) employs multiple branches of CNNs pre-trained for different tasks, and used for various target representations in our tracking method. With our branch selection mechanism, the appropriate CNN branch is selected depending on the target characteristics in an online manner. By using the most adequate target representation with respect to the tracked object, our method achieves real-time tracking, while obtaining improved performance compared to standard Siamese network trackers on object tracking benchmarks.

Zhenxi Li, Guillaume-Alexandre Bilodeau, Wassim Bouachir
Deep Convolutional Correlation Filters for Forward-Backward Visual Tracking

In this paper, we exploit convolutional features extracted from multiple layers of a pre-trained deep convolutional neural network. The outputs of the multiple convolutional layers encode both low-level and high-level information about the targets. The earlier convolutional layers provide accurate positional information, while the later convolutional layers are invariant to appearance changes and provide more semantic information. Specifically, each convolutional layer locates a target through correlation filter-based tracking and then traces the target backward. By analyzing the forward and backward tracking results, we evaluate the robustness of the tracker in each layer. The final position is determined by fusing the locations from each layer. A region proposal network (RPN) is employed whenever a backward tracker failure occurs; the new position is then chosen from the proposal candidates generated by the RPN. Extensive experiments have been conducted on several benchmark datasets. Our proposed tracking method achieves favorable results compared to state-of-the-art methods.

Yong Wang, Robert Laganière, Daniel Laroche, Ali Osman Ors, Xiaoyin Xu, Changyun Zhu
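
The forward-backward consistency idea can be summarized as: track a point to the end of a window, trace it back, and measure how far it lands from where it started. A generic sketch, with `track_fn` standing in for any point tracker (the interface is an assumption, not the authors' API):

```python
import numpy as np

def forward_backward_error(track_fn, frames, point):
    """Forward-backward consistency check for a single tracked point.

    track_fn(frame_a, frame_b, p) -> p' tracks point p from frame_a to
    frame_b. Returns the distance between the starting point and its
    position after the forward-then-backward round trip.
    """
    forward = [np.asarray(point, float)]
    for a, b in zip(frames[:-1], frames[1:]):              # track forward
        forward.append(track_fn(a, b, forward[-1]))
    backward = forward[-1]
    for a, b in zip(frames[::-1][:-1], frames[::-1][1:]):  # trace backward
        backward = track_fn(a, b, backward)
    return float(np.linalg.norm(forward[0] - backward))
```

A large round-trip error flags an unreliable layer, which is when a fallback such as the RPN proposals becomes useful.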
The Bird Gets Caught by the WORM: Tracking Multiple Deformable Objects in Noisy Environments Using Weight ORdered Logic Maps

Object detection and tracking are active and important research areas in computer vision as well as neuroscience. Of particular interest is the detection and tracking of small, poorly lit, deformable objects in the presence of sensor noise and large changes in background and foreground illumination. Such conditions are frequently encountered when an animal moves in its natural environment, or in an experimental arena. The problems are exacerbated with the use of high-speed video cameras, as the exposure time for high-speed cameras is limited by the frame rate, which limits the signal-to-noise ratio (SNR). In this paper we present a set of simple algorithms for detecting and tracking multiple, small, poorly lit, deformable objects in environments that feature drastic changes in background and foreground illumination and poor signal-to-noise ratios. These novel algorithms are shown to exhibit better performance than currently available state-of-the-art algorithms.

Debajyoti Karmaker, Ingo Schiffner, Michael Wilson, Mandyam V. Srinivasan
A Mumford Shah Style Unified Framework for Layering: Pitfalls and Solutions

Layered models are commonly used in computer vision to estimate the shape, appearance, depth ordering, occlusion structure and motion of objects from a set of images, offering computationally simpler alternatives to full 3D scene models. A unified computational framework for the various modeling elements (shape, appearance, motion and depth ordering), which integrates much of the current and prior work on layered models, would aid our understanding and development of layer extraction algorithms. A notable earlier work by Jackson et al. [2008] sought to provide such a framework in the context of variational methods, neatly cast as a single joint optimization problem. However, it did not perform as anticipated and has not been further developed. As the complexity of their formulation may have hindered its continued exploration, we reformulate their diffeomorphic approach within the much simpler framework of active contours. More importantly, though, we uncover a tricky modeling flaw which poorly extended the classical Mumford-Shah segmentation model to layering, causing unexpected performance degradation of their potentially powerful formulation. We elucidate this flaw and demonstrate its unintended consequences (a shrinking effect on foreground layers). We fix this problem by abandoning their unconstrained joint optimization philosophy and implementing an augmented Lagrangian style optimization process with PDE constraints instead. This new approach, which splits the classical Mumford-Shah appearance and geometric priors into two separate cost functions (one to be minimized with the other as a constraint) fixes the unintended shrinking problem and more properly extends the Mumford-Shah modeling paradigm into the layered framework, yielding far superior results. In doing so, we establish a more solid mathematical foundation for a unified variational approach to layering.

Fareed ud din Mehmood Jafri, Martin Fritz Mueller, Anthony Joseph Yezzi
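
For reference, the classical Mumford-Shah functional that the paper extends to layering couples a piecewise-smooth appearance term with a geometric prior on the discontinuity set (standard textbook form, not the paper's layered formulation):

$$E(u, C) = \lambda \int_{\Omega} (u - f)^2 \, dx + \mu \int_{\Omega \setminus C} |\nabla u|^2 \, dx + \nu \, |C|$$

where $$f$$ is the observed image, $$u$$ its piecewise-smooth approximation, $$C$$ the discontinuity set, and $$|C|$$ its length. The paper's fix amounts to splitting the appearance and geometric terms into separate cost functions, minimizing one with the other as a constraint, rather than minimizing them jointly.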

Visualization

Frontmatter
Visualization of Parameter Sensitivity of 2D Time-Dependent Flow

In this paper, we present an approach to analyze 1D parameter spaces of time-dependent flow simulation ensembles. By extending the concept of the finite-time Lyapunov exponent to the ensemble domain, i.e., to the parameter that gives rise to the ensemble, we obtain a tool for quantitative analysis of parameter sensitivity both in space and time. We exemplify our approach using 2D synthetic examples and computational fluid dynamics ensembles.

Karsten Hanser, Ole Klein, Bastian Rieck, Bettina Wiebe, Tobias Selz, Marian Piatkowski, Antoni Sagristà, Boyan Zheng, Mária Lukácová-Medvidová, George Craig, Heike Leitte, Filip Sadlo
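
For readers unfamiliar with the finite-time Lyapunov exponent that the method extends to the ensemble domain, the standard spatial FTLE can be computed from a sampled flow map as below (a sketch of the textbook definition, not the authors' implementation):

```python
import numpy as np

def ftle(flow_map_x, flow_map_y, dx, dy, T):
    """Finite-time Lyapunov exponent from a sampled 2D flow map.

    flow_map_x/y: final particle positions on a regular grid after
    advection time T; dx, dy are the grid spacings (rows along y).
    """
    dphix_dy, dphix_dx = np.gradient(flow_map_x, dy, dx)  # central differences
    dphiy_dy, dphiy_dx = np.gradient(flow_map_y, dy, dx)
    sigma = np.zeros_like(flow_map_x)
    for i in range(flow_map_x.shape[0]):
        for j in range(flow_map_x.shape[1]):
            F = np.array([[dphix_dx[i, j], dphix_dy[i, j]],
                          [dphiy_dx[i, j], dphiy_dy[i, j]]])
            delta = F.T @ F                      # Cauchy-Green strain tensor
            lam_max = max(np.linalg.eigvalsh(delta)[-1], 1e-12)
            sigma[i, j] = np.log(np.sqrt(lam_max)) / abs(T)
    return sigma
```

The paper's extension applies the same stretching-rate idea along the ensemble parameter axis instead of (only) the spatial axes.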
Non-stationary Generalized Wishart Processes for Enhancing Resolution over Diffusion Tensor Fields

The low spatial resolution of diffusion magnetic resonance imaging (dMRI) restricts its clinical applications. Usually, the measurements are obtained in a range from 1 to 2 $$\mathrm{mm}^3$$ per voxel, and some structures cannot be studied in detail. Due to clinical acquisition protocols (exposure time, field strength, among others) and technological limitations, it is not possible to acquire images with high resolution. In this work, we present a methodology for enhancing the spatial resolution of diffusion tensor (DT) fields obtained from dMRI. The proposed methodology assumes that a DT field follows a generalized Wishart process (GWP), which is a stochastic process defined over symmetric and positive definite matrices indexed by spatial coordinates. A GWP is modulated by a set of Gaussian processes (GPs); therefore, the kernel hyperparameters of the GPs control the spatial dynamics of a GWP. Following this notion, we employ a non-stationary kernel for describing DT fields whose statistical properties are not constant over space. We test our proposed method on synthetic and real dMRI data. Results show that a non-stationary GWP can describe complex DT fields (i.e., crossing fibers where the shape, size and orientation properties change abruptly), and it is a competitive methodology for the interpolation of DT fields when compared with methods established in the literature, evaluating Frobenius and Riemannian distances.

Jhon F. Cuellar-Fierro, Hernán Darío Vargas-Cardona, Andrés M. Álvarez, Álvaro A. Orozco, Mauricio A. Álvarez
Reduced-Reference Image Quality Assessment Based on Improved Local Binary Pattern

The structure of an image consists of two aspects: the intensity of structure and the distribution of structure. Image distortions that degrade image quality potentially affect both. Yet most structure-based image quality assessment methods focus only on changes in the intensity of structure. In this paper, we propose an improved structure-based image quality assessment method that takes both into account. First, we employ the image gradient magnitude to describe the intensity of structure, and we explore the distribution of structure with the local binary pattern (LBP) and a newly designed center-surrounding pixels pattern (CSPP, a complementary pattern to LBP). The LBP and CSPP features are mapped into a combined histogram weighted by the intensity of structure to represent the image structure. Finally, the change of structure, which gauges image quality, is measured by calculating the similarity of the histograms of the reference and distorted images. Support vector regression (SVR) is employed to pool the structure features and predict an image quality score. Experimental results on three benchmark databases demonstrate that the proposed structure pattern can effectively represent the intensity and distribution of image structure. The proposed method achieves high consistency with subjective perception using 17 reference values, performing better than existing methods.

Xi-kui Miao, Dah-Jye Lee, Xiang-zheng Cheng, Xiao-yu Yang
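
A sketch of the core representation: an LBP histogram in which each pixel's vote is weighted by its gradient magnitude. The CSPP pattern is the paper's own contribution and is not reproduced here; parameters are illustrative:

```python
import numpy as np
from scipy.ndimage import sobel
from skimage.feature import local_binary_pattern

def weighted_lbp_histogram(gray, P=8, R=1.0):
    """LBP histogram weighted by gradient magnitude (structure intensity)."""
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    g = gray.astype(float)
    grad_mag = np.hypot(sobel(g, axis=1), sobel(g, axis=0))
    n_bins = P + 2                       # uniform patterns + "non-uniform" bin
    hist = np.bincount(lbp.astype(int).ravel(),
                       weights=grad_mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-12)
```

Comparing such histograms for a reference and a distorted image then yields the structure-change features pooled by the SVR.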
Web System for Visualization of Weather Data of the Hydrometeorological Network of Tungurahua, Ecuador

The information provided by hydrometeorological stations can support predictive solutions to adverse weather-related changes in different regions. In this regard, publishing this type of information can help common users, farmers, or populations at high risk of flooding, or support solutions to droughts. This work presents a hydrometeorological visualization system, published at http://rrnn.tungurahua.gob.ec/red , that includes wind parameters, wind direction, precipitation, and temperature in a graphic and documented way. The description of the system covers all the information required prior to programming the two-dimensional and three-dimensional interfaces, detailing all the tools used to generate the graphics layers. Additionally, all the files required to show the necessary graphic details, such as elevations, boundaries between cantons, roads, and water bodies, are presented. Finally, a quick navigation of the two-dimensional and three-dimensional interfaces is presented, in which all the options contained in the developed system are displayed.

Jaime Santana, Fernando A. Chicaiza, Víctor H. Andaluz, Patrick Reuter
Analysis and Visualization of Sports Performance Anxiety in Tennis Matches

According to sports psychology, anxiety has a major impact on an athlete's performance in a sporting event. Although much work has been done in sports data analysis and visualization, the analysis of anxiety has rarely been included in previous work. In this paper, we propose a method to analyze a tennis player's anxiety level during a tennis match. This method is based on psychological theories of anxiety and a database of over 4,000 professional tennis matches. In our model, an athlete's anxiety level is based on three factors: uncertainty, anticipation, and threat. We have also developed data visualizations to help users study the potential correlation between a tennis player's anxiety level and his/her skilled performance, such as unforced errors, forced errors, winners, serve directions, first-serve faults, and double faults.

Shiraj Pokharel, Ying Zhu

Object Detection and Recognition

Frontmatter
Detailed Sentence Generation Architecture for Image Semantics Description

Automatic image captioning addresses the objective of describing an image in human-understandable natural language. The majority of existing approaches to this problem are based on holistic techniques that translate the whole image into a single-sentence description, risking the loss of important aspects of the scene. To enable better and more detailed caption generation, we propose a dense captioning architecture that first extracts and describes the objects of the image, which in turn helps generate dense and detailed image captions. The proposed architecture has two modules: the first generates region descriptions that describe the objects and their relationships, while the other generates object attributes that help produce object details. Both of these outputs are concatenated and given as input to a sentence generation module based on an encoder-decoder formulation, which generates a single meaningful and grammatically detailed sentence. The results achieved with the proposed architecture show superior performance when compared with current state-of-the-art image captioning techniques, e.g., Neural Talk and Show, Attend and Tell, using standard evaluation metrics.

Imran Khurram, Muhammad Moazam Fraz, Muhammad Shahzad
Pupil Localization Using Geodesic Distance

The main contributions of the presented paper can be summarized as follows. Firstly, we introduce a unique and robust dataset of human eyes that can be used in many detection and recognition scenarios, especially for the recognition of driver drowsiness, gaze direction, or eye-blinking frequency. The dataset consists of approximately 85,000 different eye regions that were captured using various near-infrared cameras, various resolutions, and various lighting conditions. The images are annotated into many categories. Secondly, we present a new method for pupil localization that is based on the geodesic distance. The presented experiments show that the proposed method outperforms the state-of-the-art methods in this area.

Radovan Fusek
Parallel Curves Detection Using Multi-agent System

This paper addresses the possibility of modelling the spatial relationships of curve pixels in images using the movement of second-order dynamic systems. A multi-agent system is then employed to control the 'movement' of pixels in a single image to detect parallel curves. Music scripts are used as an example to demonstrate the performance of the proposed method. The experimental results show that the pixel spatial chain (pixels positioned adjacently or nearly connected in sequence) can be reliably modelled by the dynamics of a second-order system, and that the proposed multi-agent method has the potential to detect parallel curves in images.

Shengzhi Du, Chunling Tu
Can Deep Learning Learn the Principle of Closed Contour Detection?

Learning the principle of a task should always be the primary goal of a learning system; otherwise, it reduces to a memorizing system, and edge cases will always exist. In spite of their recent success in visual recognition tasks, convolutional neural networks' (CNNs) ability to learn principles is still questionable. While CNNs exhibit a certain degree of generalization, they eventually break when the variability exceeds their capacity, indicating a failure to learn the underlying principles. We use edge cases of a closed contour detection task to support our arguments. We argue that lateral interactions, which are not part of pure feed-forward CNNs but are common in biological vision, are essential to this task.

Xinhua Zhang, Yijing Watkins, Garrett T. Kenyon

Deep Learning II

Frontmatter
DensSiam: End-to-End Densely-Siamese Network with Self-Attention Model for Object Tracking

Convolutional Siamese neural networks have recently been used to track objects using deep features. A Siamese architecture can achieve real-time speed; however, it is still difficult to find a Siamese architecture that maintains generalization capability, high accuracy, and speed while decreasing the number of shared parameters, especially when it is very deep. Furthermore, a conventional Siamese architecture usually processes one local neighborhood at a time, which makes the appearance model local and non-robust to appearance changes. To overcome these two problems, this paper proposes DensSiam, a novel convolutional Siamese architecture, which uses the concept of dense layers and connects each dense layer to all layers in a feed-forward fashion with a similarity-learning function. DensSiam also includes a self-attention mechanism to force the network to pay more attention to non-local features during offline training. Extensive experiments are performed on tracking benchmarks: OTB2013 and OTB2015 as the validation set, and VOT2015, VOT2016 and VOT2017 as the test set. The obtained results show that DensSiam achieves superior results on these benchmarks compared to other current state-of-the-art methods.

Mohamed H. Abdelpakey, Mohamed S. Shehata, Mostafa M. Mohamed
Convolutional Adaptive Particle Filter with Multiple Models for Visual Tracking

Although particle filters improve the performance of convolutional-correlation trackers, especially in challenging scenarios such as occlusion and deformation, they considerably increase the computational cost. We present an adaptive particle filter to decrease the number of particles in simple frames in which there is no challenging scenario and the target model closely reflects the current appearance of the target. In this method, we consider the estimated position of each particle in the current frame as a particle in the next frame. These refined particles are more reliable than sampling new particles in every frame. In simple frames, target estimation is easier, therefore many particles may converge together. Consequently, the number of particles decreases in these frames. We implement resampling when the number of particles or the weight of the selected particle is too small. We use the weight computed in the first frame as a threshold for resampling because that weight is calculated by the ground truth model. Another contribution of this article is the generation of several target models by applying different adjusting rates to each of the high-likelihood particles. Thus, we create multiple models; some are useful in challenging frames because they are more influenced by the previous model, while other models are suitable for simple frames because they are less affected by the previous model. Experimental results on the Visual Tracker Benchmark v1.1 beta (OTB100) demonstrate that our proposed framework significantly outperforms state-of-the-art methods.

Reza Jalil Mozhdehi, Yevgeniy Reznichenko, Abubakar Siddique, Henry Medeiros
Scale-Aware RPN for Vehicle Detection

In this paper, we develop a scale-aware Region Proposal Network (RPN) model to address the problem of vehicle detection in challenging situations. Our model introduces two built-in sub-networks which detect vehicles with scales from disjoint ranges. Therefore, the model is capable of training specialized sub-networks for large-scale and small-scale vehicles in order to capture their unique characteristics. Meanwhile, high-resolution feature maps are obtained for handling small vehicle instances. The network model is followed by two XGBoost classifiers with a bootstrapping strategy for mining hard negative examples. The method is evaluated on the challenging KITTI dataset and achieves comparable results against state-of-the-art methods.

Lu Ding, Yong Wang, Robert Laganière, Xinbin Luo, Shan Fu
Object Detection to Assist Visually Impaired People: A Deep Neural Network Adventure

Blindness or vision impairment, one of the top ten disabilities among men and women, affects more than 7 million Americans of all ages. Accessible visual information is of paramount importance for improving the independence and safety of blind and visually impaired people, and there is a pressing need to develop smart automated systems to assist their navigation, specifically in unfamiliar healthcare environments such as clinics, hospitals, and urgent cares. This contribution focuses on developing computer vision algorithms built around a deep neural network to assist visually impaired individuals' mobility in clinical environments by accurately detecting doors, stairs, and signage, the most remarkable landmarks. Quantitative experiments demonstrate that, with a sufficient number of training samples, the network recognizes the objects of interest with an accuracy of over 98% within a fraction of a second.

Fereshteh S. Bashiri, Eric LaRose, Jonathan C. Badger, Roshan M. D’Souza, Zeyun Yu, Peggy Peissig
Large Scale Application Response Time Measurement Using Image Recognition and Deep Learning

Application response time is a critical performance metric for assessing the quality of software products. It is also an objective metric for user experience evaluation. In this paper, we present a novel method named CVART (Computer Vision-based Application Response Time measurement) for measuring the response time (latency) of an application. In our solution, we use image recognition and deep learning techniques to capture visible changes in the display of the device running the application and compute the application response time of an operation that triggers these visual changes. Applying CVART brings multiple benefits compared to traditional methods. First, it allows measuring a response time that reflects the real user experience. Second, the solution enables the measurement of operations that are extremely hard or impossible to measure with traditional methods. Third, it does not require application instrumentation, which is infeasible in many use cases. Finally, the method does not depend on any specific application or software platform, which allows building performance measurement and application monitoring tools that work on multiple platforms and multiple devices. For demonstration, we present one use case of applying CVART to measure the application response time of virtual desktops hosted in the cloud or a datacenter, and we evaluate its efficiency for measurement at large scale.

Lan Vu, Uday Kurkure, Hari Sivaraman, Aravind Bappanadu
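
CVART itself uses image recognition and deep learning, but the underlying timing idea can be illustrated with simple frame differencing: the latency is the gap between the user action and the last visible display change it triggers. A simplified stand-in sketch (the threshold and interface are assumptions, not the paper's method):

```python
import numpy as np

def response_time_from_frames(frames, timestamps, action_ts, eps=2.0):
    """Latency between a user action and the last visible screen change.

    frames: grayscale screen captures; timestamps: capture times (s);
    action_ts: time of the triggering operation. A frame "changes" when
    its mean absolute difference to the previous frame exceeds eps.
    """
    last_change = None
    for prev, cur, ts in zip(frames[:-1], frames[1:], timestamps[1:]):
        diff = np.abs(cur.astype(float) - prev.astype(float)).mean()
        if ts >= action_ts and diff > eps:
            last_change = ts
    if last_change is None:
        raise ValueError("no visible change detected after the action")
    return last_change - action_ts
```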

Applications I

Frontmatter
Vision-Depth Landmarks and Inertial Fusion for Navigation in Degraded Visual Environments

This paper proposes a method for tight fusion of visual, depth and inertial data in order to extend robotic capabilities for navigation in GPS-denied, poorly illuminated, and textureless environments. Visual and depth information are fused at the feature detection and descriptor extraction levels to augment one sensing modality with the other. These multimodal features are then further integrated with inertial sensor cues using an extended Kalman filter to estimate the robot pose, sensor bias terms, and landmark positions simultaneously as part of the filter state. As demonstrated through a set of hand-held and Micro Aerial Vehicle experiments, the proposed algorithm is shown to perform reliably in challenging visually-degraded environments using RGB-D information from a lightweight and low-cost sensor and data from an IMU.

Shehryar Khattak, Christos Papachristos, Kostas Alexis
Efficient Nearest Neighbors Search for Large-Scale Landmark Recognition

The problem of landmark recognition has achieved excellent results on small-scale datasets. When dealing with large-scale retrieval, however, issues that were irrelevant with small amounts of data quickly become fundamental for an efficient retrieval phase. In particular, computational time needs to be kept as low as possible, whilst retrieval accuracy has to be preserved as much as possible. In this paper we propose a novel multi-index hashing method called Bag of Indexes (BoI) for Approximate Nearest Neighbors (ANN) search. It drastically reduces the query time and outperforms state-of-the-art methods in accuracy for large-scale landmark recognition. It has been demonstrated that this family of algorithms can be applied to different embedding techniques, such as VLAD and R-MAC, obtaining excellent results in very short times on different public datasets: Holidays+Flickr1M, Oxford105k and Paris106k.

Federico Magliani, Tomaso Fontanini, Andrea Prati
Patient’s Body Motion Study Using Multimodal RGBDT Videos

Automatic analysis of body movement to identify the physical activity of patients who are at bed rest is crucial for treatment and rehabilitation purposes. Existing methods of physical activity analysis have mostly focused on the detection of primitive motion/non-motion states in unimodal video data captured by an RGB, depth, or thermal sensor. In this paper, we propose a multimodal vision-based approach to classify the body motion of a person lying on a bed. We mimicked a real 'patient on bed' scenario by recording multimodal video data from healthy volunteers in a hospital room in a neurorehabilitation center. We first defined a taxonomy of possible physical activities based on observations of patients with acquired brain injuries. We then investigated different motion analysis and machine learning approaches to classify physical activities automatically. A multimodal database including RGB, depth and thermal videos was collected and annotated with eight predefined physical activities. Experimental results show that we can achieve moderately high accuracy (77.68%) in classifying physical activities by tracking body motion using an optical flow-based approach. To the best of our knowledge, this is the first multimodal RGBDT video analysis for such an application.

Mohammad A. Haque, Simon S. Kjeldsen, Federico G. Arguissain, Iris Brunner, Kamal Nasrollahi, Ole Kæseler Andersen, Jørgen F. Nielsen, Thomas B. Moeslund, Anders Jørgensen
Marker Based Thermal-Inertial Localization for Aerial Robots in Obscurant Filled Environments

For robotic inspection tasks in known environments, fiducial markers provide a reliable and low-cost solution for robot localization. However, detection of such markers relies on the quality of RGB camera data, which degrades significantly in the presence of visual obscurants such as fog and smoke. The ability to navigate known environments in the presence of obscurants can be critical for inspection tasks, especially in the aftermath of a disaster. Addressing such a scenario, this work proposes a method for the design of fiducial markers to be used with thermal cameras for the pose estimation of aerial robots. Our low-cost markers are designed to work in the long-wave infrared spectrum, which is not affected by the presence of obscurants, and can be affixed to any object that has a measurable temperature difference with respect to its surroundings. Furthermore, the estimated pose from the fiducial markers is fused with inertial measurements in an extended Kalman filter to remove high-frequency noise and error present in the fiducial pose estimates. The proposed markers and the pose estimation method are experimentally evaluated in an obscurant-filled environment using an aerial robot carrying a thermal camera.

Shehryar Khattak, Christos Papachristos, Kostas Alexis
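As a highly simplified illustration of the fusion step, the sketch below runs a 1-D constant-velocity Kalman filter over noisy position measurements; the actual method fuses full 6-DoF marker poses with inertial measurements in an extended Kalman filter, which this example does not attempt to reproduce, and all noise parameters here are illustrative assumptions.

```python
import numpy as np

def kalman_smooth(measurements, dt=0.02, q=1e-3, r=1e-2):
    # State [position, velocity]; constant-velocity model on one axis.
    F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition
    H = np.array([[1.0, 0.0]])             # we only measure position
    Q = q * np.eye(2)                       # process noise covariance
    R = np.array([[r]])                     # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])
    P = np.eye(2)
    out = []
    for z in measurements:
        # Predict forward one time step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the marker-based position estimate.
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0, 0])
    return np.array(out)  # high-frequency noise in z is attenuated
```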
Shape-Based Smoothing of Binary Digital Objects Using Signed Distance Transform

Digital staircase effects and noisy protrusions and dents on object boundaries pose major challenges for quantitative structural analysis and visual assessment. In this paper, we present a shape-based smoothing algorithm for binary digital objects that eliminates digital staircase artifacts and removes boundary noise. The method uses a signed distance transform image, in which the zero level set defines the object boundary. The key idea of our algorithm is to smooth this zero level set by applying a smoothing filter to the signed distance transform image. The method has been applied to slice-by-slice segmentation results of human proximal femur bone volumes from hip MR imaging. The observed results are encouraging and suggest that the new method is capable of successfully eliminating digital staircase effects while preserving the basic geometry of the target object. Quantitative analysis of phantom experiment results reveals that a notion of “optimum scale” of smoothing exists for the new algorithm, and that it is related to the scale of the noisy protrusions and dents. The quantitative experiments have shown that, at the optimum smoothing scale, the new method achieves a Dice similarity coefficient of 98.5% to 99.6% for noisy protrusions and dents of different sizes.

Xiaoliu Zhang, Cheng Chen, Gregory Chang, Punam K. Saha
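The key idea translates almost directly into code; below is a minimal SciPy sketch, where the smoothing scale `sigma` is illustrative (choosing it relative to the size of the noisy protrusions and dents corresponds to the “optimum scale” discussed above).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

def shape_based_smooth(binary, sigma=2.0):
    binary = binary.astype(bool)
    # Signed distance transform: positive inside the object, negative
    # outside, so the zero level set is exactly the object boundary.
    sdt = distance_transform_edt(binary) - distance_transform_edt(~binary)
    # Smoothing the distance field and re-thresholding at zero smooths
    # the boundary while preserving the object's overall geometry.
    return gaussian_filter(sdt, sigma=sigma) > 0
```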

Segmentation

Frontmatter
Patch-Based Potentials for Interactive Contour Extraction

The problem of interactive contour extraction of targeted objects of interest in images is challenging and finds many applications in image editing tasks. Several methods have been proposed to address this problem with a common objective: performing an accurate contour extraction with minimum user effort. For minimal path techniques, achieving this goal depends critically on the ability of the so-called potential map to capture edges. In this context we propose new patch-based potentials designed to have small values at the boundary of the targeted object. To evaluate these potentials, we consider the livewire framework and quantify their abilities in terms of the number of seed points needed. Both visual and quantitative results demonstrate the strong capability of our proposed potentials to reduce user interaction while preserving good extraction accuracy.

Thoraya Ben Chattah, Sébastien Bougleux, Olivier Lézoray, Atef Hamouda
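In the livewire framework, the potential map drives a shortest-path search between consecutive seed points; below is a minimal Dijkstra sketch on a 4-connected grid, where the patch-based potential itself (the paper's contribution) is abstracted as an input array with small values on object boundaries.

```python
import heapq
import numpy as np

def minimal_path(potential, seed, target):
    # Dijkstra on a 4-connected grid; low potential = likely boundary,
    # so the cheapest path follows the object contour between seeds.
    H, W = potential.shape
    dist = np.full((H, W), np.inf)
    prev = {}
    dist[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == target:
            break
        if d > dist[r, c]:
            continue  # stale heap entry
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < H and 0 <= nc < W:
                nd = d + potential[nr, nc]
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from target to seed to recover the contour segment.
    path, node = [], target
    while node != seed:
        path.append(node)
        node = prev[node]
    path.append(seed)
    return path[::-1]
```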
A New Algorithm for Local Blur-Scale Computation and Edge Detection

Precise and efficient object boundary detection is key to the successful accomplishment of many imaging applications involving object segmentation or recognition. The blur-scale at a given image location represents the transition width of the local object interface. Hence, knowledge of blur-scale is crucial for accurate edge detection and object segmentation. In this paper, we present new theory and algorithms for computing local blur-scales and apply them to scale-based gradient computation and edge detection. The new blur-scale computation method is based on our observation that gradients inside a blur-scale region follow a Gaussian distribution with non-zero mean. New statistical criteria using maximum likelihood functions are established and applied for local blur-scale computation. Gradient vectors over a blur-scale region are summed to enhance gradients at blurred object interfaces while leaving gradients at sharp transitions unaffected. Finally, a blur-scale-based non-maxima suppression method is developed for edge detection. The method has been applied to both natural and phantom images. Experimental results show that the computed blur-scales capture true blur extents at individual image locations. Also, the new scale-based gradient computation and edge detection algorithms successfully detect gradients and edges, especially at blurred object interfaces.

Indranil Guha, Punam K. Saha
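The stated observation can be rendered as a standard maximum-likelihood computation (our reading, not necessarily the authors' exact criterion): if the gradient magnitudes $$g_1, \ldots, g_n$$ inside a candidate blur-scale region follow a Gaussian distribution with non-zero mean, the maximum-likelihood parameter estimates are

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} g_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(g_i - \hat{\mu}\right)^2,$$

and a candidate region can be grown while the resulting likelihood of the observed gradients under this model remains high.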
Semantic Segmentation by Integrating Classifiers for Different Difficulty Levels

Semantic segmentation assigns class labels to all pixels in an input image. In general, when the number of classes is large or when the frequency of each class varies greatly, segmentation accuracy decreases drastically. In this paper, we propose to divide a classification task into sub-tasks according to the difficulty of the classes. Our proposed method consists of two parts: training a network for each sub-task and training an integration network. The difficulty level of a class depends on its number of pixels. By training a network for each difficulty level, we obtain probability maps for each sub-task. We then train the integration network from those maps. In experiments, we evaluate segmentation accuracy on the CamVid dataset, which contains 11 classes. We divide all classes into three groups: easy, normal, and difficult. We compared our method with a conventional method trained on all classes, and confirmed that the proposed method outperforms it.

Daisuke Matsuzuki, Kazuhiro Hotta
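One plausible form of the integration network is sketched below in PyTorch: the per-subtask probability maps are stacked along the channel axis and fused by convolutions into the final 11-class prediction. The layer sizes and the class split across difficulty levels are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class IntegrationNet(nn.Module):
    # Fuses per-subtask probability maps into a final 11-class prediction.
    def __init__(self, n_easy=4, n_normal=4, n_difficult=3, n_classes=11):
        super().__init__()
        in_ch = n_easy + n_normal + n_difficult
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, n_classes, kernel_size=1),
        )

    def forward(self, p_easy, p_normal, p_difficult):
        # Each input: (B, n_sub, H, W) probability maps from one subtask net.
        x = torch.cat([p_easy, p_normal, p_difficult], dim=1)
        return self.fuse(x)  # logits over all 11 classes
```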

Applications II

Frontmatter
Fast Image Dehazing Methods for Real-Time Video Processing

Images of outdoor scenes are usually degraded by atmospheric particles, such as haze, fog and smoke, which fade the colors and reduce the contrast of objects in the scene. This reduces image quality for manual or automated analysis in a variety of outdoor video surveillance applications, for example threat or anomaly detection. Current dehazing techniques, based on atmospheric models and frame-by-frame approaches, perform reasonably well but are slow and unsuitable for real-time processing. This paper addresses the need for an online, robust and fast dehazing algorithm that can improve video quality for a variety of surveillance applications. We build upon and expand state-of-the-art dehazing techniques to develop a robust real-time dehazing algorithm with the following key characteristics and advantages: (1) we leverage temporal correlations and exploit special haze models to achieve a 4× speed-up over the baseline algorithm [1] with no loss in detection performance; (2) we develop a pixel-by-pixel approach that allows us to retain sharp detail near object boundaries, which is essential for both manual and automated object detection and recognition applications; (3) we introduce a method for estimating global atmospheric lighting, which makes the algorithm very robust for a variety of outdoor applications; and (4) we introduce a simple and effective sky segmentation method for improving the global atmospheric light estimation, which has the effect of mitigating color distortion. We evaluate our approach on video data from multiple test locations, demonstrating both qualitative and quantitative improvements in image quality and object detection accuracy.

Yang Chen, Deepak Khosla
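For reference, dehazing methods of this kind start from the standard atmospheric scattering model (a textbook formulation, not specific to this paper):

$$I(x) = J(x)\,t(x) + A\,(1 - t(x)), \qquad t(x) = e^{-\beta d(x)},$$

where $$I$$ is the observed image, $$J$$ the haze-free scene radiance, $$A$$ the global atmospheric light, $$t$$ the transmission, $$\beta$$ the scattering coefficient and $$d$$ the scene depth; dehazing amounts to estimating $$A$$ and $$t(x)$$ in order to recover $$J(x)$$.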
GPU Accelerated Non-Parametric Background Subtraction

Accurate background subtraction is an essential tool for high-level computer vision applications. However, as research continues to increase the accuracy of background subtraction algorithms, computational efficiency has often suffered as a result of the increased complexity. Consequently, many sophisticated algorithms are unable to maintain real-time speeds on increasingly high-resolution video inputs. To combat this unfortunate reality, we propose to exploit the inherently parallelizable nature of background subtraction algorithms by making use of NVIDIA’s parallel computing platform, CUDA. By using the CUDA interface to execute parallel tasks on the Graphics Processing Unit (GPU), we are able to achieve a speed-up of up to two orders of magnitude over traditional techniques. Moreover, the proposed GPU algorithm achieves over an 8× speed-up over the CPU-based background subtraction implementation proposed in our previous work [1].

William Porr, James Easton, Alireza Tavakkoli, Donald Loffredo, Sean Simmons
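To see why such algorithms parallelize so well, consider a minimal NumPy sketch of a per-pixel non-parametric (kernel density) background model: every pixel is processed independently of its neighbours, so each one maps naturally onto its own CUDA thread. The bandwidth and threshold values are illustrative, and this is not the authors' exact model.

```python
import numpy as np

def foreground_mask(frame, history, bandwidth=20.0, thresh=0.05):
    # frame: (H, W) grayscale image; history: (N, H, W) stack of past frames.
    frame = frame.astype(np.float32)
    history = history.astype(np.float32)
    diff = history - frame[None, :, :]
    # Gaussian kernel density estimate of the background at each pixel,
    # computed independently per pixel (hence trivially parallelizable).
    p = np.exp(-0.5 * (diff / bandwidth) ** 2).mean(axis=0)
    return p < thresh  # low background likelihood => foreground pixel
```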
Budget-Constrained Online Video Summarisation of Egocentric Video Using Control Charts

Despite the existence of a large number of approaches for generating summaries from egocentric video, online video summarisation has not yet been fully explored. We present an online video summarisation algorithm that generates keyframe summaries during video capture. Event boundaries are identified using control charts, and a keyframe is subsequently selected for each event. The number of keyframes is bounded from above, which requires constant review and possible reduction of the cumulatively built summary. The new method was compared against a baseline and a state-of-the-art online video summarisation method. The evaluation was done on an egocentric video database (Activity of Daily Living (ADL)). The semantic content of the frames in the video was used to evaluate matches with the ground truth. The summaries generated by the proposed method outperform those generated by the two competitors.

Paria Yousefi, Clare E. Matthews, Ludmila I. Kuncheva
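A Shewhart-style control chart flags an event boundary when the monitored per-frame statistic leaves its control limits; the sketch below is a minimal illustration of that idea, not the authors' exact chart or features.

```python
import numpy as np

def event_boundaries(signal, k=3.0, warmup=30):
    # Flag frame t as an event boundary when the monitored statistic
    # leaves the mean +/- k*std control limits of the current event.
    boundaries, window = [], list(signal[:warmup])
    for t in range(warmup, len(signal)):
        mu, sigma = np.mean(window), np.std(window) + 1e-8
        if abs(signal[t] - mu) > k * sigma:
            boundaries.append(t)     # a new event starts here
            window = [signal[t]]     # reset the chart for the new event
        else:
            window.append(signal[t])
    return boundaries
```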
p-Laplacian Regularization of Signals on Directed Graphs

The graph Laplacian plays an important role in describing the structure of a graph signal from weights that measure the similarity between the vertices of the graph. In the literature, three definitions of the graph Laplacian have been considered for undirected graphs: the combinatorial, the normalized and the random-walk Laplacians. Moreover, a nonlinear extension of the Laplacian, called the p-Laplacian, has also been put forward for undirected graphs. In this paper, we propose several formulations of p-Laplacians on directed graphs, directly inspired by the Laplacians on undirected graphs. We then consider the problem of p-Laplacian regularization of signals on directed graphs. Finally, we provide experimental results to illustrate the effect of the proposed p-Laplacians on different types of graph signals.

Zeina Abu Aisheh, Sébastien Bougleux, Olivier Lézoray
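For the undirected case, p-Laplacian regularization of a signal $$f^0$$ is typically posed as the minimization (a standard formulation; the paper's contribution is its extension to directed graphs)

$$\min_{f} \; \frac{\lambda}{2}\sum_{v \in V}\left(f(v) - f^{0}(v)\right)^{2} + \frac{1}{p}\sum_{(u,v) \in E} w_{uv}\,\left|f(v) - f(u)\right|^{p},$$

where $$w_{uv}$$ are the edge weights and $$\lambda$$ balances data fidelity against smoothness; $$p = 2$$ recovers the usual Laplacian regularization.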
A Dense-Depth Representation for VLAD Descriptors in Content-Based Image Retrieval

The recent advances brought by deep learning have improved performance in image retrieval tasks. Through the many convolutional layers available in a Convolutional Neural Network (CNN), it is possible to obtain a hierarchy of features from the evaluated image. At every level, the extracted patches are smaller and more representative than at the previous levels. Following this idea, this paper introduces a new detector applied to the feature maps extracted from a pre-trained CNN. Specifically, this approach increases the number of features in order to improve the performance of aggregation algorithms such as the well-known and widely used VLAD embedding. The proposed approach is tested on several public datasets: Holidays, Oxford5k, Paris6k and UKB.

Federico Magliani, Tomaso Fontanini, Andrea Prati
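The VLAD embedding mentioned here aggregates local descriptors into a single vector of per-centroid residuals; below is a minimal NumPy sketch of the standard formulation, with the usual power- and L2-normalization.

```python
import numpy as np

def vlad(descriptors, centroids):
    # descriptors: (N, D) local features; centroids: (K, D) visual words.
    K, D = centroids.shape
    # Assign each descriptor to its nearest centroid.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    v = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            v[k] = (members - centroids[k]).sum(axis=0)  # residual sum
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))       # power normalization
    return v / (np.linalg.norm(v) + 1e-12)    # global L2 normalization
```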

Virtual Reality II

Frontmatter
Augmented Reality System for Training and Assistance in the Management of Industrial Equipment and Instruments

This article proposes the development of a smartphone application on the Android platform as a recognition tool, focused on the digitization of real objects using image processing techniques. The application is oriented toward training and assistance in the handling of equipment and industrial instruments within engineering fields such as electronics, mechanics, electromechanics and mechatronics. It is a technological tool that allows users to interact within an augmented reality environment through a friendly and intuitive interface, thus improving the process of handling industrial equipment, moving away from the paradigm of physical manuals, and making use of new technologies such as smartphones to deliver digital information.

Edison A. Chicaiza, Edgar I. De la Cruz, Víctor H. Andaluz
Alternative Treatment of Psychological Disorders Such as Spider Phobia Through Virtual Reality Environments

This article proposes a tool to support the psychotherapist in the treatment of spider phobia through a system that combines software and hardware elements to present immersive virtual reality environments to the treated patient. To create the feeling of immersion, environments and models created in Unity are used in conjunction with the patient’s movement tracked through the Kinect motion sensor. The system is built around the psychotherapeutic method of systematic desensitization, so that the patient can overcome their fear and exhibit non-phobic interactions with spiders. The process of developing the system and writing this document was supported and supervised by a psychologist specialized in the treatment of phobias. Finally, tests were performed to obtain feedback from specialists and potential patients with a medium degree of phobia, and the results were very positive and satisfactory.

Joseph Armas, Víctor H. Andaluz
The Skyline as a Marker for Augmented Reality in Urban Context

In recent years, augmented reality (AR) technologies have emerged as powerful tools for visualizing the future impact of new constructions on cities. Many approaches that use costly sensors and high-end platforms to run AR in real time have been developed, but little effort has been made to embed AR on mobile phones. In this paper, we present a novel approach that uses the skyline as a marker in an AR system. This lightweight feature enables real-time matching of the virtual and real skyline on smartphones. We use the device’s embedded instruments to estimate the user’s pose, and this approximation is used to insert a synthetic object into the live video stream. On its own, this first approach gives a very unrealistic impression of the viewed scene: the inserted objects appear to hover and float with the user’s movements. To address this problem, we use the live camera feed as an additional source of information that provides redundancy with respect to the instrument estimates. We extract the skyline (the set of pixels that defines the boundary between the buildings and the sky) as the main visual feature. Our proposal is to track these automatically extracted points throughout the video sequence and anchor synthetic objects to them, making it possible to simulate a landscape from multiple viewpoints using a smartphone. We use images of the city of Lyon (France) to illustrate our proposal.

Mehdi Ayadi, Leo Valque, Mihaela Scuturici, Serge Miguet, Chokri Ben Amar
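Given a binary sky mask, the skyline extraction step reduces to finding, for each image column, the first non-sky pixel from the top; the toy sketch below shows only this step and assumes a sky segmentation is already available, which in practice is the hard part.

```python
import numpy as np

def extract_skyline(sky_mask):
    # sky_mask: (H, W) boolean array, True where the pixel is sky.
    # The skyline is, per column, the row of the first non-sky pixel.
    H, W = sky_mask.shape
    building = ~sky_mask
    rows = building.argmax(axis=0)        # first True per column
    rows[~building.any(axis=0)] = H - 1   # columns with no building
    return rows  # one skyline row index per image column
```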
Oil Processes VR Training

In this paper a virtual reality solution is developed to emulate the environment and operations of launching and receiving traps for pipeline scrapers (PIGs), with the aim of reinforcing the training of operators in oil camps. To develop this, information was collected on various launching and receiving traps from real pipeline operating companies in the country, thus defining the basic and specific parameters for the virtual recreation of a typical trap model. The 3D models are obtained from P&ID diagrams so that the user can interact with them. The environment, the interaction and the behavior of the pipes were developed in a graphics engine in order to carry out training tasks with realistic procedures from the oil industry. The goal is to save time, money and resources in oil-industry-specific training and learning, and to provide a base for simulating other complex processes.

Víctor H. Andaluz, José L. Amaquiña, Washington X. Quevedo, Jorge Mora-Aguilar, Daniel Castillo-Carrión, Roberto J. Miranda, María G. Pérez

ST: Intelligent Transportation Systems

Frontmatter
Multiple Object Tracking in Urban Traffic Scenes with a Multiclass Object Detector

Multiple object tracking (MOT) in urban traffic aims to produce the trajectories of the different road users that move across the field of view with different directions and speeds and that can have varying appearances and sizes. Occlusions and interactions among the different objects are expected and common due to the nature of urban road traffic. In this work, a tracking framework employing classification label information from a deep learning detection approach is used to associate the different objects, in addition to object position and appearance. We investigate the performance of a modern multiclass object detector for the MOT task in traffic scenes. Results show that the object labels improve tracking performance, but that the output of object detectors is not always reliable.

Hui-Lee Ooi, Guillaume-Alexandre Bilodeau, Nicolas Saunier, David-Alexandre Beaupré
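One plausible way to fold the class labels into the association step is a combined cost matrix solved with the Hungarian algorithm, as sketched below; the weights and the label-mismatch penalty are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, w_pos=1.0, w_app=1.0, w_lab=0.5):
    # tracks/detections: lists of dicts with 'pos' (x, y), 'app' (feature
    # vector) and 'label' (class id from the multiclass detector).
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            pos = np.linalg.norm(np.subtract(t['pos'], d['pos']))
            app = np.linalg.norm(np.subtract(t['app'], d['app']))
            lab = 0.0 if t['label'] == d['label'] else 1.0
            cost[i, j] = w_pos * pos + w_app * app + w_lab * lab
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))  # matched (track, detection) pairs
```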
Autonomous Bus Boarding Robotic Wheelchair Using Bidirectional Sensing Systems

Research interest in robotic wheelchairs is driven in part by their potential for improving the independence and quality of life of persons with disabilities and the elderly. Moreover, smart wheelchair systems aim to reduce the workload of the caregiver. In this paper, we propose a novel technique for 3D sensing using a conventional Laser Range Finder (LRF). We mounted two sensing systems onto our new six-wheeled robotic bus-boarding wheelchair to locate the bus door and determine its dimensions. Additionally, we implemented a Single Shot MultiBox Detector (SSD) to detect the bus doorsteps for boarding. For precise movements, we successfully measure the height of the bus doorsteps and the width of the bus door. Our step measurement and doorstep detection techniques enable the wheelchair to board a bus autonomously. Our experiments show the effectiveness and applicability of our system to real-world robotic wheelchair mobility.

Shamim Al Mamun, Hisato Fukuda, Antony Lam, Yoshinori Kobayashi, Yoshinori Kuno
Road User Abnormal Trajectory Detection Using a Deep Autoencoder

In this paper, we focus on the development of a method that detects abnormal trajectories of road users at traffic intersections. The main difficulty is that there are very few abnormal data, and the normal data are insufficient for training most kinds of machine learning models. To tackle these problems, we propose using a deep autoencoder network trained solely on augmented data considered normal. By generating artificial abnormal trajectories, our method is tested on four different outdoor urban scenes and performs better than several classical outlier detection methods.

Pankaj Raj Roy, Guillaume-Alexandre Bilodeau
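A minimal PyTorch sketch of the reconstruction-error criterion is given below; the network dimensions, the fixed-length trajectory encoding and the threshold rule are assumptions made for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class TrajectoryAE(nn.Module):
    # Autoencoder over flattened fixed-length trajectories (T points, x/y).
    def __init__(self, n_points=32, hidden=16):
        super().__init__()
        d = 2 * n_points
        self.enc = nn.Sequential(nn.Linear(d, 64), nn.ReLU(),
                                 nn.Linear(64, hidden))
        self.dec = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(),
                                 nn.Linear(64, d))

    def forward(self, x):
        return self.dec(self.enc(x))

def is_abnormal(model, traj, threshold):
    # Trained only on normal trajectories, the AE reconstructs them well;
    # a high reconstruction error therefore signals an abnormal trajectory.
    with torch.no_grad():
        err = torch.mean((model(traj) - traj) ** 2).item()
    return err > threshold
```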
Traffic Flow Classification Using Traffic Cameras

Traffic flow classification is an integral task of traffic management and network mobility. In this work, a feature collection system is developed to collect motion- and appearance-based features from traffic images, and their performance is evaluated with different machine learning techniques, including Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN). Experimental results on a challenging highway video with three traffic flow classes (light, medium and heavy) indicate that the CNN achieves the highest performance, with $$90\%$$ accuracy.

Mohammad Shokrolah Shirazi, Brendan Morris
Backmatter
Metadata
Title
Advances in Visual Computing
Editors
Dr. George Bebis
Richard Boyle
Bahram Parvin
Darko Koracin
Matt Turek
Srikumar Ramalingam
Kai Xu
Stephen Lin
Bilal Alsallakh
Jing Yang
Eduardo Cuervo
Ph.D. Jonathan Ventura
Copyright Year
2018
Electronic ISBN
978-3-030-03801-4
Print ISBN
978-3-030-03800-7
DOI
https://doi.org/10.1007/978-3-030-03801-4
