
About this Book

The two volume set LNCS 10072 and LNCS 10073 constitutes the refereed proceedings of the 12th International Symposium on Visual Computing, ISVC 2016, held in Las Vegas, NV, USA in December 2016.

The 102 revised full papers and 34 poster papers presented in this book were carefully reviewed and selected from 220 submissions. The papers are organized in topical sections: Part I (LNCS 10072) comprises computational bioimaging; computer graphics; motion and tracking; segmentation; pattern recognition; visualization; 3D mapping; modeling and surface reconstruction; advancing autonomy for aerial robotics; medical imaging; virtual reality; computer vision as a service; visual perception and robotic systems; and biometrics. Part II (LNCS 10073) comprises applications; visual surveillance; computer graphics; and virtual reality.



ST: Computational Bioimaging


Similarity Metric Learning for 2D to 3D Registration of Brain Vasculature

2D to 3D image registration techniques are useful in the treatment of neurological diseases such as stroke. Image registration can aid physicians and neurosurgeons in the visualization of the brain for treatment planning, provide 3D information during treatment, and enable serial comparisons. In the context of stroke, image registration is challenged by occluded vessels and anatomy deformed by the ischemic process. In this paper, we present an algorithm to register 2D digital subtraction angiography (DSA) with 3D magnetic resonance angiography (MRA) based upon local point cloud descriptors. The similarity between these local descriptors is learned using a machine learning algorithm, allowing flexibility in the matching process. In our experiments, the error rate of 2D/3D registration using our machine learning similarity metric (52.29) shows significant improvement when compared to a Euclidean metric (152.54). The proposed similarity metric is versatile and could be applied to a wide range of 2D/3D registration problems.

Alice Tang, Fabien Scalzo

Automatic Optic Disk Segmentation in Presence of Disk Blurring

Fundus image analysis has emerged as a very useful tool for analyzing the structure of the retina to detect different eye-related abnormalities. The detection of these abnormalities requires the segmentation of basic retinal structures, including blood vessels and the optic disk. Optic disk segmentation becomes a challenging task when the optic disk boundary is degraded due to deviations such as optic disk edema and papilledema. This paper focuses on the segmentation of the optic disk in the presence of disk blurring. The proposed method makes use of gradients extracted from line profiles that pass through the optic disk margin. Initially, the optic disk is enhanced using morphological operations, and the location of the optic disk region is detected automatically using a vessel density property. Finally, line profiles are extracted at different angles and their gradients are evaluated to estimate the optic disk boundary. The proposed method has been applied to 28 images taken from the Armed Forces Institute of Ophthalmology.

Samra Irshad, Xiaoxia Yin, Lucy Qing Li, Umer Salman

An Object Splitting Model Using Higher-Order Active Contours for Single-Cell Segmentation

Determining the number and morphology of individual cells on microscopy images is one of the most fundamental steps in quantitative biological image analysis. Cultured cells used in genetic perturbation and drug discovery experiments can pile up, and nuclei can touch or even grow on top of each other. Similarly, in tissue sections cell nuclei can be very close and touch each other as well. This makes single cell nuclei detection extremely challenging for current segmentation methods, such as classical edge- and threshold-based methods, which can only detect separate objects and fail to separate touching ones. The pipeline we present here can segment individual cell nuclei by splitting touching ones. The two-step approach is based purely on energy minimization principles within an active contour framework. In a presegmentation phase we use a local region data term with strong edge tracking capability, while in the splitting phase we introduce a higher-order active contour model. This model prefers high-curvature contour locations on opposite sides of joined objects and grows “cutting arms” that evolve toward one another until they split the objects. Synthetic and real experiments show the strong segmentation and splitting ability of the proposed pipeline and that it outperforms currently used segmentation models.

Jozsef Molnar, Csaba Molnar, Peter Horvath

Tensor Voting Extraction of Vessel Centerlines from Cerebral Angiograms

The extraction of vessel centerlines from cerebral angiograms is a prerequisite for 2D-3D reconstruction and computational fluid dynamic (CFD) simulations. Many researchers have studied vessel segmentation and centerline extraction on retinal images, while less attention and effort have been devoted to cerebral angiography images. Since cerebral angiograms contain vessels that are much noisier because of possible patient movement, working on them is often more challenging than working on retinal images. In this study, we propose a multi-scale tensor voting framework to extract the vessel centerlines from cerebral angiograms. The developed framework is evaluated on a dataset of routinely acquired angiograms and reaches an accuracy of 91.75% ± 5.07% in our experiments.

Yu Ding, Mircea Nicolescu, Dan Farmer, Yao Wang, George Bebis, Fabien Scalzo

Stacked Autoencoders for Medical Image Search

Medical images can be a valuable resource for reliable information to support medical diagnosis. However, the large volume of medical images makes it challenging to retrieve relevant information for a particular scenario. To address this challenge, content-based image retrieval (CBIR) attempts to characterize images (or image regions) with invariant content information in order to facilitate image search. This work presents a feature extraction technique for medical images using stacked autoencoders, which encode images to binary vectors. The technique is applied to the IRMA dataset, a collection of 14,410 x-ray images, in order to demonstrate the ability of autoencoders to retrieve similar x-rays given test queries. Using the IRMA dataset as a benchmark, it was found that stacked autoencoders gave excellent results, with a retrieval error of 376 for 1,733 test images at a compression of 74.61%.

S. Sharma, I. Umar, L. Ospina, D. Wong, H. R. Tizhoosh
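
The abstract above describes encoding images as binary vectors for search. As a rough sketch (not the paper's implementation), retrieval over such codes is commonly done by Hamming distance; the 4-bit codes below are toy stand-ins for real autoencoder outputs:

```python
import numpy as np

def hamming_retrieve(query_code, db_codes, k=3):
    """Return indices of the k database codes closest to the query
    in Hamming distance (binary codes as 0/1 numpy arrays)."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")[:k]

# toy database of 4-bit codes standing in for autoencoder outputs
db = np.array([[0, 0, 0, 0],
               [1, 1, 1, 1],
               [0, 0, 1, 1],
               [1, 0, 0, 0]])
q = np.array([0, 0, 0, 1])
print(hamming_retrieve(q, db, k=2))  # indices of the two nearest codes
```

Binary codes make this comparison cheap (bit counts rather than floating-point distances), which is what motivates the compression figure quoted in the abstract.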

CutPointVis: An Interactive Exploration Tool for Cancer Biomarker Cutpoint Optimization

In the field of medical and epidemiological research, it is common practice to dichotomize a continuous variable clinically or statistically. By dichotomizing a continuous variable, a researcher can build eligibility criteria for potential studies, predict disease likelihood, or predict treatment response. Dichotomization methods can be classified into data-dependent methods and outcome-based methods. The data-dependent methods are considered arbitrary and lack generality, while the outcome-based methods compute an optimal cut point that maximizes the statistical difference between the two dichotomized groups. There is as yet no standard software for expedited cut point determination. In this work, we present CutPointVis, a visualization platform for fast and convenient optimal cut point determination. Compared to existing work, CutPointVis distinguishes itself with its real-time operation and better user interactivity. A case study is presented to demonstrate the usability of CutPointVis.

Lei Zhang, Ying Zhu
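
The outcome-based search described above can be sketched in a few lines. The choice of test statistic here (a two-sample t-statistic) is an assumption for illustration; the paper does not commit to a specific one in this abstract:

```python
import numpy as np

def optimal_cutpoint(x, y):
    """Outcome-based dichotomization: pick the threshold on biomarker x
    that maximizes the absolute t-statistic between the two outcome
    groups of y (an assumed, illustrative choice of statistic)."""
    best_t, best_cut = -1.0, None
    for cut in np.unique(x)[:-1]:           # candidate thresholds
        g1, g2 = y[x <= cut], y[x > cut]
        if len(g1) < 2 or len(g2) < 2:      # need variance estimates
            continue
        se = np.sqrt(g1.var(ddof=1) / len(g1) + g2.var(ddof=1) / len(g2))
        t = abs(g1.mean() - g2.mean()) / (se + 1e-12)
        if t > best_t:
            best_t, best_cut = t, cut
    return best_cut

x = np.array([1., 2., 3., 4., 10., 11., 12., 13.])  # biomarker values
y = np.array([0., 0., 0., 0., 1., 1., 1., 1.])      # outcome
print(optimal_cutpoint(x, y))  # the cut separating the two outcome groups
```

An interactive tool like CutPointVis would recompute this scan as the user filters or brushes the data, which is why real-time evaluation matters.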

Computer Graphics


Adding Turbulence Based on Low-Resolution Cascade Ratios

In this paper we propose a novel method of adding turbulence to low-resolution smoke simulations. We consider the physical properties of such low-resolution simulations and add turbulence only at appropriate positions, where the value of the energy cascade ratio is judged to be physically correct. Our method can prevent the noise across the whole fluid surface region that appeared with previous methods. We also demonstrate that our method can be combined with a variety of existing methods, such as wavelet turbulence and vorticity confinement.

Masato Ishimuroya, Takashi Kanai

Creating Feasible Reflectance Data for Synthetic Optical Flow Datasets

Optical flow ground truth generated by computer graphics has many advantages. For example, we can systematically vary scene parameters to understand algorithm sensitivities. But is synthetic ground truth realistic enough? Appropriate material models have been established as one of the major challenges for the creation of synthetic datasets: previous research has shown that highly sophisticated reflectance field acquisition methods yield results which various optical flow methods cannot distinguish from real scenes. However, such methods are costly in both acquisition and rendering time and are thus infeasible for large datasets. In this paper we find the simplest reflectance models (RM) for different groups of materials which still provide sufficient accuracy for optical flow performance analysis. It turns out that a spatially varying Phong RM is sufficient for simple materials. Normal estimation combined with an anisotropic RM can handle even very complex materials.

Burkhard Güssefeld, Katrin Honauer, Daniel Kondermann
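
For readers unfamiliar with the Phong reflectance model mentioned above, a minimal single-channel version is sketched below (the spatially varying variant in the paper would give each pixel its own kd/ks maps; the constants here are illustrative):

```python
import numpy as np

def phong(normal, light, view, kd=0.7, ks=0.3, shininess=16):
    """Classic Phong reflectance: diffuse term (N.L) plus specular
    term (R.V)^n, with unit-normalized direction vectors."""
    n = normal / np.linalg.norm(normal)
    l = light / np.linalg.norm(light)
    v = view / np.linalg.norm(view)
    diffuse = max(float(n @ l), 0.0)
    r = 2.0 * (n @ l) * n - l            # mirror reflection of l about n
    specular = max(float(r @ v), 0.0) ** shininess
    return kd * diffuse + ks * specular

# head-on light and view: diffuse is maximal and r aligns with v
i = phong(np.array([0., 0., 1.]), np.array([0., 0., 1.]), np.array([0., 0., 1.]))
print(i)  # 0.7 * 1 + 0.3 * 1 = 1.0
```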

Automatic Web Page Coloring

We present a new tool for automatic recoloring of web pages. Automatic application of different color palettes to web pages is essential for both professional and amateur web designers. However, no existing recoloring tool provides full recoloring for web pages. To recolor a web page entirely, we replace colors in .css, .html, and .svg files, and recolor images such as background and navigation elements. We create the new color theme based on a color guide image provided by the user. Evaluation shows a high level of satisfaction with the quality of palettes and results of recoloring. Our tool is available at

Polina Volkova, Soheila Abrishami, Piyush Kumar

Automatic Content-Aware Non-photorealistic Rendering of Images

Non-photorealistic rendering techniques work on image features and often manipulate a set of characteristics such as edges and texture to achieve a desired depiction of the scene. Most computational photography methods decompose an image using edge preserving filters and work on the resulting base and detail layers independently to achieve desired visual effects. We propose a new approach for content-aware non-photorealistic rendering of images where we manipulate the visually salient and non-salient regions separately. We propose a novel content-aware framework in order to render an image for applications such as detail exaggeration, artificial smoothing, and image abstraction. The processed regions of the image are blended seamlessly with the rest of the image for all these applications. We demonstrate that content awareness of the proposed method leads to automatic generation of non-photorealistic rendering of the same image for the different applications mentioned above.

Akshay Gadi Patil, Shanmuganathan Raman

Improved Aircraft Recognition for Aerial Refueling Through Data Augmentation in Convolutional Neural Networks

As machine learning techniques increase in complexity, their hunger for more training data is ever-growing. Deep learning for image recognition is no exception. In some domains, training images are expensive or difficult to collect. When training image availability is limited, researchers naturally turn to synthetic methods of generating new imagery for training. We evaluate several methods of training data augmentation in the context of improving performance of a Convolutional Neural Network (CNN) in the domain of fine-grain aircraft classification. We conclude that randomly scaling training imagery significantly improves performance. Also, we find that drawing random occlusions on top of training images confers a similar improvement in our problem domain. Further, we find that these two effects seem to be approximately additive, with our results demonstrating a 45.7% reduction in test error over basic horizontal flipping and cropping.

Robert Mash, Brett Borghetti, John Pecarina
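
The two augmentations the abstract identifies as beneficial, random scaling and random occlusions, can be sketched as follows. This is a toy illustration, not the authors' pipeline; parameter ranges are assumptions:

```python
import random
import numpy as np

def augment(img, rng, scale_range=(0.8, 1.2), occlusion_frac=0.3):
    """Randomly rescale a grayscale image (nearest-neighbour resampling
    back onto the original grid) and draw one random rectangular
    occlusion, zeroing the covered pixels."""
    h, w = img.shape
    s = rng.uniform(*scale_range)
    # nearest-neighbour rescale via index mapping onto the original grid
    ys = np.clip((np.arange(h) / s).astype(int), 0, h - 1)
    xs = np.clip((np.arange(w) / s).astype(int), 0, w - 1)
    out = img[np.ix_(ys, xs)].astype(float)
    # random occlusion covering up to occlusion_frac of each side length
    oh = max(1, int(h * occlusion_frac * rng.random()))
    ow = max(1, int(w * occlusion_frac * rng.random()))
    y0 = rng.randrange(h - oh + 1)
    x0 = rng.randrange(w - ow + 1)
    out[y0:y0 + oh, x0:x0 + ow] = 0.0
    return out

rng = random.Random(0)
img = np.arange(64, dtype=float).reshape(8, 8)
aug = augment(img, rng)
```

Applying several such randomized copies per training image is what lets a CNN see more pose and occlusion variation than the raw dataset contains.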

Motion and Tracking


Detecting Tracking Failures from Correlation Response Maps

Tracking methods based on correlation filters have gained popularity in recent years due to their robustness to rotations, occlusions, and other challenging aspects of visual tracking. Such methods generate a confidence or response map which is used to estimate the new location of the tracked target. By examining the features of this map, important details about the tracker status can be inferred and compensatory measures can be taken in order to minimize failures. We propose an algorithm that uses the mean and entropy of this response map to prevent bad target model updates caused by problems such as occlusions and motion blur as well as to determine the size of the target search area. Quantitative experiments demonstrate that our method improves success plots over a baseline tracker that does not incorporate our failure detection mechanism.

Ryan Walsh, Henry Medeiros
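
The two response-map statistics the abstract relies on, mean and entropy, are easy to compute; a minimal sketch (with toy 5x5 maps, not a real tracker's output) is:

```python
import numpy as np

def response_stats(response):
    """Mean and entropy of a non-negative correlation response map.
    A diffuse, high-entropy map suggests the tracker is uncertain,
    so the model update should be skipped."""
    p = response / response.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return response.mean(), entropy

sharp = np.zeros((5, 5)); sharp[2, 2] = 1.0   # confident, peaked response
flat = np.ones((5, 5)) / 25.0                 # ambiguous, diffuse response
_, e_sharp = response_stats(sharp)
_, e_flat = response_stats(flat)
print(e_sharp, e_flat)  # entropy is higher for the diffuse map
```

Thresholds on these statistics (to decide when to freeze the model or widen the search area) would be tuned empirically, as the paper's evaluation implies.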

Real-Time Multi-object Tracking with Occlusion and Stationary Objects Handling for Conveying Systems

Multiple object tracking has a broad range of applications ranging from video surveillance to robotics. In this work, we extend the application field to automated conveying systems. Inspired by tracking methods applied to video surveillance, we follow an on-line tracking-by-detection approach based on background subtraction. Logistics applications turn out to be a challenging scenario for existing methods. This challenge is twofold: first, conveyed objects tend to have a similar appearance, which makes occlusion handling difficult. Second, they are often stationary, which makes them hard to detect with background subtraction techniques. This work aims to improve the occlusion handling by using the order of the conveyed objects. Besides, to handle stationary objects, we propose a feedback loop from tracking to detection. Finally, we provide an evaluation of the proposed method on a real-world video.

Adel Benamara, Serge Miguet, Mihaela Scuturici

Fast, Deep Detection and Tracking of Birds and Nests

We present a visual object detector based on a deep convolutional neural network that quickly outputs bounding box hypotheses without a separate proposal generation stage [1]. We modify the network for better performance, specialize it for a robotic application involving “bird” and “nest” categories (including the creation of a new dataset for the latter), and extend it to enforce temporal continuity for tracking. The system exhibits very competitive detection accuracy and speed, as well as robust, high-speed tracking on several difficult sequences.

Qiaosong Wang, Christopher Rasmussen, Chunbo Song

Camera Motion Estimation with Known Vertical Direction in Unstructured Environments

We propose a novel approach to the problem of relative camera motion estimation with known vertical direction in unstructured environments, using the technique of 2D structure from motion (SFM). The known vertical direction (gravity direction) makes it possible to transform the cameras into virtual cameras whose vertical axis is parallel to the vertical direction. Moreover, feature point measurements can also be transformed into bearing angles and vertical coordinates with respect to these cameras. Then, the 2D pose of the camera and the 2D positions of point features can be estimated in closed form with the 2D trifocal tensor method. After obtaining those estimates, the remaining 1D information about the camera and point features is estimated easily. Results of experiments with simulated and real images are presented to demonstrate the feasibility of the proposed method. We also compare the proposed method with the state-of-the-art method.

Jae-Hean Kim, Jin Sung Choi

A Multiple Object Tracking Evaluation Analysis Framework

Recently, CLEAR and trajectory-based evaluation protocols, which generate scores such as MOTA and MOTP, are often used to evaluate multiple object tracking (MOT) methods. These scores, which indicate how good the tracking methods are, seem good enough for comparing their performances. However, we argue that this is insufficient, since the failure causes of tracking methods are not discovered. Understanding failure causes will definitely not only help improve the algorithms but also assess their merits and demerits explicitly. Thus this paper presents Tracking Evaluation Analysis (TEA), answering the question: “why do tracking failures happen?” TEA comes out as an automatic solution, rather than the conventional way of manually analyzing tracking results, which is notorious for being time-consuming and tedious. In this preliminary version, we demonstrate the validity of TEA by comparing the performances of MOT methods submitted to the MOT 2015 Challenge, tested on the TownCentre dataset.

Dao Huu Hung, Do Anh Tuan, Nguyen Ngoc Khanh, Tran Duc Hien, Nguyen Hai Duong


Segmentation

Stereo-Image Normalization of Voluminous Objects Improves Textile Defect Recognition

The visual detection of defects in textiles is an important application in the textile industry. Existing systems require textiles to be spread flat, so that they appear as 2D surfaces, in order to detect defects. In contrast, we show classification of textiles and textile feature extraction methods that can be used when textiles are in an inhomogeneous, voluminous shape. We present a novel approach to image normalization for use in stain-defect recognition. The acquired database consists of images of piles of textiles, taken using stereo vision. The results show that a simple classifier using normalized images outperforms other machine learning approaches in classification accuracy.

Dirk Siegmund, Arjan Kuijper, Andreas Braun

Reliability-Based Local Features Aggregation for Image Segmentation

Local features are used for describing the visual information in a local neighborhood of image pixels. Although using various types of local features can provide complementary information about the pixels, effective integration of these features has remained as a challenging issue. In this paper, we propose a novel segmentation algorithm which aggregates the information obtained from different local features. Starting with an over-segmentation of the input image, local features are fed into a factorization-based framework to construct multiple new representations. We then introduce a novel aggregation model to integrate the new representations. Our proposed model jointly learns the reliability of representations and infers final representation. Final segmentation is obtained by applying post-processing steps on the inferred final representation. Experimental results demonstrate the effectiveness of our algorithm on the Berkeley Segmentation Dataset.

Fariba Zohrizadeh, Mohsen Kheirandishfard, Kamran Ghasedidizaji, Farhad Kamangar

Chan-Vese Revisited: Relation to Otsu’s Method and a Parameter-Free Non-PDE Solution via Morphological Framework

Chan-Vese is an important and well-established segmentation method. However, it tends to be challenging to implement, with issues such as initialization problems and establishing the values of several free parameters. The paper presents a detailed analysis of the Chan-Vese framework. It establishes a relation between the Otsu binarization method and the fidelity terms of the Chan-Vese energy functional, allowing for intelligent initialization of the scheme. An alternative, fast, and parameter-free morphological segmentation technique is also suggested. Our experiments indicate the soundness of the proposed algorithm.

Arie Shaus, Eli Turkel
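
One way to see the relation the abstract refers to: with the regularization terms dropped, the Chan-Vese fidelity energy for a threshold-induced partition is the within-class sum of squared deviations, which is exactly what Otsu's criterion minimizes. A small numeric sketch (illustrative, not the paper's derivation):

```python
import numpy as np

def cv_fidelity(img, t):
    """Chan-Vese fidelity energy for threshold t: squared deviation of
    each pixel from the mean intensity of its class (inside/outside)."""
    lo, hi = img[img <= t], img[img > t]
    e = 0.0
    if lo.size:
        e += np.sum((lo - lo.mean()) ** 2)
    if hi.size:
        e += np.sum((hi - hi.mean()) ** 2)
    return e

def otsu_like(img):
    """Threshold minimizing the Chan-Vese fidelity terms; equivalent to
    Otsu's minimal within-class variance criterion."""
    cands = np.unique(img)[:-1]
    return min(cands, key=lambda t: cv_fidelity(img, t))

img = np.array([10, 11, 12, 200, 201, 202], dtype=float)
t = otsu_like(img)
print(t)  # splits the two intensity clusters
```

This equivalence is what allows an Otsu threshold to serve as an intelligent initialization for the Chan-Vese scheme.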

Image Enhancement by Volume Limitation in Binary Tomography

We introduce two methods to limit the reconstruction volume in binary tomography. The first one can be used when the CT scanner is equipped with a laser distance measurement device which gives information of the outer boundary of the object to reconstruct. The second one uses dilated versions of a blueprint of the object under investigation. Such a blueprint is often available in non-destructive testing of industrial objects. By experiments we show that the proposed methods can enhance the quality of the reconstructed images.

László Varga, Zoltán Ozsvár, Péter Balázs

Resolution-Independent Superpixels Based on Convex Constrained Meshes Without Small Angles

The over-segmentation problem for images is studied in the new resolution-independent formulation when a large image is approximated by a small number of convex polygons with straight edges at subpixel precision. These polygonal superpixels are obtained by refining and extending subpixel edge segments to a full mesh of convex polygons without small angles and with approximation guarantees. Another novelty is the objective error difference between an original pixel-based image and the reconstructed image with a best constant color over each superpixel, which does not need human segmentations. The experiments on images from the Berkeley Segmentation Database show that new meshes are smaller and provide better approximations than the state-of-the-art.

Jeremy Forsythe, Vitaliy Kurlin, Andrew Fitzgibbon
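
The objective error the abstract proposes, the difference between the original image and its reconstruction with a best constant color per superpixel, can be sketched directly (toy grayscale example, not the paper's polygonal meshes):

```python
import numpy as np

def reconstruction_error(img, labels):
    """Approximation error when each superpixel is filled with its best
    constant colour (its mean intensity)."""
    recon = np.zeros_like(img, dtype=float)
    for lab in np.unique(labels):
        mask = labels == lab
        recon[mask] = img[mask].mean()
    return np.abs(img - recon).mean(), recon

img = np.array([[10., 10., 90.],
                [10., 90., 90.]])
labels = np.array([[0, 0, 1],
                   [0, 1, 1]])
err, _ = reconstruction_error(img, labels)
print(err)  # 0.0 here: the segmentation matches the constant regions exactly
```

Because this measure needs no human-drawn ground truth, it can rank over-segmentations objectively, which is the point the abstract makes.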

Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation

We consider the problem of learning deep neural networks (DNNs) for object category segmentation, where the goal is to label each pixel in an image as being part of a given object (foreground) or not (background). Deep neural networks are usually trained with simple loss functions (e.g., softmax loss). These loss functions are appropriate for standard classification problems where the performance is measured by the overall classification accuracy. For object category segmentation, the two classes (foreground and background) are very imbalanced. The intersection-over-union (IoU) is usually used to measure the performance of any object category segmentation method. In this paper, we propose an approach for directly optimizing this IoU measure in deep neural networks. Our experimental results on two object category segmentation datasets demonstrate that our approach outperforms DNNs trained with standard softmax loss.

Md Atiqur Rahman, Yang Wang
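
A common differentiable surrogate for IoU, consistent with the goal described above though not necessarily the paper's exact formulation, replaces hard set operations with products and sums over soft predictions:

```python
import numpy as np

def soft_iou(pred, target, eps=1e-8):
    """Differentiable IoU surrogate: with pred in [0,1] and binary target,
    intersection ~ sum(pred*target), union ~ sum(pred + target - pred*target).
    The training loss would be 1 - soft_iou."""
    inter = np.sum(pred * target)
    union = np.sum(pred + target - pred * target)
    return inter / (union + eps)

target = np.array([0., 0., 1., 1.])
good = np.array([0.1, 0.1, 0.9, 0.9])   # mostly correct foreground scores
bad = np.array([0.9, 0.9, 0.1, 0.1])    # inverted scores
print(soft_iou(good, target), soft_iou(bad, target))
```

Unlike per-pixel softmax loss, this objective is insensitive to the large background class, which is why it suits the imbalanced foreground/background setting the abstract describes.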

Pattern Recognition


A Mobile Recognition System for Analog Energy Meter Scanning

The work presents a mobile-platform-based system for scanning electricity, gas, and water meters. The motivation is the automation of the manual procedure, increasing the reading accuracy and decreasing the human effort. The methodology comprises two stages: digit detection and Optical Character Recognition. The detection of digits is accomplished by a pipeline of operations. Optical Character Recognition is achieved employing two different approaches: Tesseract OCR and a Convolutional Neural Network. The performance evaluation on a vast number of images reports high precision for the algorithms of both stages. Furthermore, the Convolutional Neural Network significantly outperforms Tesseract OCR for all types of meters. The objective of operating within the limited speed and data storage of mobile devices is also successfully met.

Martin Cerman, Gayane Shalunts, Daniel Albertini

Towards Landmine Detection Using Ubiquitous Satellite Imaging

Despite the tremendous number of landmines worldwide, existing methods for landmine detection still suffer from high scanning costs and times. Utilizing ubiquitous thermal infrared satellite imaging might potentially be an alternative low-cost method, relying on processing big image data collected over decades. In this paper we study this alternative, focusing on assessing the utility of resolution enhancement using state-of-the-art super-resolution algorithms in landmine detection. The major challenge is the relatively limited number of thermal satellite images available for a given location, which makes the possible magnification factor extremely low for landmine detection. To facilitate the study, we generate equivalent satellite images for various landmine distributions. We then estimate the detection accuracy of a naive landmine detector on the super-resolution images. While our proposed methodology might not be useful for anti-personnel landmines, the experimental results show promising detection rates for large anti-tank landmines.

Sahar Elkazaz, Mohamed E. Hussein, Ahmed El-Mahdy, Hiroshi Ishikawa

Robustness of Rotation Invariant Descriptors for Texture Classification

In this paper, we present an evaluation of texture descriptors’ robustness when interpolation methods are applied to rotated images. We propose a novel rotation invariant texture descriptor called Sampled Local Mapped Pattern Magnitude (SLMP_M) and compare it with well-known published texture descriptors. The compared descriptors are the Completed Local Binary Pattern (CLBP) and two Discrete Fourier Transform (DFT)-based methods, the Local Ternary Pattern DFT and the Improved Local Ternary Pattern DFT. Experiments were performed on the Kylberg Sintorn Rotation Dataset, a database of natural textures that were rotated using hardware and computational procedures. Five interpolation methods were investigated: Lanczos, B-spline, Cubic, Linear, and Nearest Neighbor, with nine directions. Experimental results show that our proposed method achieves robust texture discrimination, outperforming traditional texture descriptors, and works better under different interpolation methods.

Raissa Tavares Vieira, Tamiris Trevisan Negri, Adilson Gonzaga

Feature Evaluation for Handwritten Character Recognition with Regressive and Generative Hidden Markov Models

Hidden Markov Models constitute an established approach often employed for offline handwritten character recognition in digitized documents. The current work aims at evaluating a number of procedures frequently used to define features in the character recognition literature, within a common Hidden Markov Model framework. By separating model and feature structure, this should give a clearer indication of the relative advantage of different families of visual features used for character classification. The effects of model topologies and data normalization are also studied over two different handwritten datasets. The Hidden Markov Model framework is then used to generate images of handwritten characters, to give an accessible visual illustration of the power of different features.

Kalyan Ram Ayyalasomayajula, Carl Nettelblad, Anders Brun

DeTEC: Detection of Touching Elongated Cells in SEM Images

A probabilistic framework using two random fields, DeTEC (Detection of Touching Elongated Cells), is proposed to detect cells in scanning electron microscopy images with inhomogeneous illumination. The first random field provides a binary segmentation of the image into superpixels that are candidates for belonging to cells and superpixels that are part of the background, by imposing a prior on the smoothness of the texture features. The second random field selects the superpixels whose boundaries are more likely to form elongated cell walls by imposing a smoothness prior on the orientations of the boundaries. The method is evaluated on a dataset of Clostridium difficile cell images and is compared to CellDetect.

A. Memariani, C. Nikou, B. T. Endres, E. Bassères, K. W. Garey, I. A. Kakadiaris

Object Detection Based on Image Blur Using Spatial-Domain Filtering with Haar-Like Features

In general, out-of-focus blur is considered to be a disturbance that reduces detection accuracy, and many researchers have tried to remove such noise. The authors previously proposed an object detection scheme that exploits information contained in image blur. This scheme showed good accuracy for object detection, but it has a critical problem: a huge computational cost is required owing to the DFT needed to evaluate image blur. This paper proposes a novel object detection scheme using the difference in image blur evaluated with simple spatial-domain filtering. Experimental results using synthetic images show that the scheme achieves perfect classification, whereas our previous scheme has about a 2.40% miss rate at 0.1 FPPI for circle detection. In addition to the improvement in accuracy, the processing speed becomes about 431 times faster than that of the old scheme.

Ryusuke Miyamoto, Shingo Kobayashi
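
To illustrate the general idea of measuring blur with spatial-domain filtering, the sketch below uses a discrete Laplacian as a stand-in for the paper's Haar-like filters (an assumption for illustration only): sharp edges give large filter responses, smooth gradients give small ones.

```python
import numpy as np

def blur_score(img):
    """Mean absolute response of a 5-point discrete Laplacian over the
    image interior; sharp images score high, blurred ones low."""
    lap = (-4.0 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return np.abs(lap).mean()

sharp = np.zeros((6, 6)); sharp[:, 3:] = 1.0        # hard vertical edge
blurred = np.tile(np.linspace(0, 1, 6), (6, 1))     # smooth ramp
print(blur_score(sharp), blur_score(blurred))  # sharp scores higher
```

Because such filters are small local sums, they avoid the per-window DFT that made the earlier scheme expensive.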

Rare Class Oriented Scene Labeling Using CNN Incorporated Label Transfer

In natural scene images, rare class objects have low occurrence frequencies and limited spatial coverage, and they may be easily ignored during scene labeling. However, rare class objects are often more important to semantic labeling and image understanding than background areas. In this work, we present a rare class-oriented scene labeling framework (RCSL) that involves two new techniques pertaining to rare classes. First, scene-assisted rare class retrieval is introduced in label transfer, intended to enrich the retrieval set with scene-relevant rare classes. Second, a complementary rare class balanced CNN is incorporated to address the unbalanced training data issue, where rare classes are usually dominated by common ones in natural scene images. Furthermore, a superpixel-based re-segmentation is implemented to produce perceptually meaningful object boundaries. Experimental results demonstrate promising scene labeling performance of the proposed framework on the SIFTflow dataset, both qualitatively and quantitatively, especially for rare class objects.

Liangjiang Yu, Guoliang Fan

Pollen Grain Recognition Using Deep Learning

Pollen identification helps forensic scientists solve elusive crimes, provides data for climate-change modelers, and even hints at potential sites for petroleum exploration. Despite its wide range of applications, most pollen identification is still done by time-consuming visual inspection by well-trained experts. Although partial automation is currently available, automatic pollen identification remains an open problem. Current pollen-classification methods use pre-designed features of texture and contours, which may not be sufficiently distinctive. Instead of using pre-designed features, our pollen-recognition method learns both features and classifier from training data under the deep-learning framework. To further enhance our network’s classification ability, we use transfer learning to leverage knowledge from networks that have been pre-trained on large datasets of images. Our method achieved an ≈94% classification rate on a dataset of 30 pollen types. These rates are among the highest obtained in this problem.

Amar Daood, Eraldo Ribeiro, Mark Bush

Classifying Pollen Using Robust Sequence Alignment of Sparse Z-Stack Volumes

The identification of pollen grains is a task needed in many scientific and industrial applications, ranging from climate research to petroleum exploration. It is also a time-consuming task. To produce data, pollen experts spend hours, sometimes months, visually counting thousands of pollen grains from hundreds of images acquired by microscopes. Most current automated pollen-identification methods rely on single-focus images. While this type of image contains characteristic texture and shape, it lacks information about how these visual cues vary across the grain’s surface. In this paper, we propose a method that recognizes pollen species from stacks of multi-focal images. Here, each pollen grain is represented by a multi-focal stack. Our method matches unknown stacks to pre-learned ones using the Longest-Common Sub-Sequence (LCSS) algorithm. The matching process relies on the variations of visual texture and contour that occur along the image stack, which are captured by a low-rank and sparse decomposition technique. We tested our method on 392 image stacks from 10 species of pollen grains. The proposed method achieves a remarkable recognition rate of 99.23%.

Amar Daood, Eraldo Ribeiro, Mark Bush

Complementary Keypoint Descriptors

We examine the use of complementary descriptors for keypoint recognition in digital images. The descriptors combine multiple types of information, including shape, color, and texture. We first review several keypoint descriptors and propose new descriptors that use normalized brightness/color spatial histograms. Individual and combined descriptors are compared on a standard data set that varies blur, viewpoint, zoom, rotation, brightness, and compression. Results indicate that substantially improved results can be achieved without greatly increasing keypoint descriptor length, but that the best results combine information from complementary descriptors.

Clark F. Olson, Sam A. Hoover, Jordan L. Soltman, Siqi Zhang

Two Phase Classification for Early Hand Gesture Recognition in 3D Top View Data

This work classifies top-view hand gestures observed by a Time of Flight (ToF) camera using the Long Short-Term Memory (LSTM) neural-network architecture. We demonstrate a performance improvement with a two-phase classification: we reduce the number of classes to be separated in each phase and combine the output probabilities. The modified system architecture achieves an average cross-validation accuracy of 90.75% on a 9-gesture dataset, an improvement over the single all-class LSTM approach. The networks are trained to predict the class label continuously during the sequence. A frame-based gesture prediction, using accumulated gesture probabilities per frame of the video sequence, is introduced. This eliminates the latency incurred by predicting the gesture only at the end of the sequence, as is usually the case with majority-voting-based methods.
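The frame-based prediction idea can be sketched as follows. The per-frame probability vectors stand in for LSTM softmax outputs, and the running-mean accumulation rule and `threshold` are our own simplified reading of the accumulated-probability scheme, not the paper's exact decision rule.

```python
import numpy as np

def accumulate_prediction(frame_probs, threshold=0.6):
    """Accumulate per-frame class probabilities (e.g. LSTM softmax
    outputs) and emit a gesture label as soon as the running mean for
    one class crosses `threshold`, instead of waiting for sequence end."""
    running = np.zeros(frame_probs.shape[1])
    for t, p in enumerate(frame_probs):
        running += p
        mean = running / (t + 1)
        best = int(np.argmax(mean))
        if mean[best] >= threshold:
            return best, t  # early decision at frame t
    return int(np.argmax(running)), len(frame_probs) - 1
```

Because a decision can fire mid-sequence, the latency of an end-of-sequence majority vote is avoided.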

Aditya Tewari, Bertram Taetz, Frederic Grandidier, Didier Stricker



Adaptive Isosurface Reconstruction Using a Volumetric-Divergence-Based Metric

This paper proposes a new adaptive isosurface extraction algorithm for 3D rectilinear volumetric datasets, with the intent of improving accuracy and maintaining topological correctness of the extracted isosurface against the trilinear interpolation isosurface while keeping the mesh triangle count from becoming excessive. The new algorithm first detects cubes where the extracted mesh has large error using a volumetric-divergence-based metric, which estimates the volume between the extracted mesh and the trilinear interpolation isosurface. Then, it adaptively subdivides those cubes to refine the mesh. A new strategy is developed to remove cracks in the mesh caused by neighboring cubes processed with different subdividing levels.

Cuilan Wang, Shuhua Lai

Large Image Collection Visualization Using Perception-Based Similarity with Color Features

This paper introduces the basic steps to build a similarity-based visualization tool for large image collections. We build the similarity metrics based on human perception. Psychophysical experiments have shown that human observers can recognize the gist of scenes within 100 milliseconds (ms) by comprehending the global properties of an image. Color also plays an important role in human rapid scene recognition. However, previous works often neglect color features. We propose new scene descriptors that preserve the information from coherent color regions, as well as the spatial layouts of scenes. Experiments show that our descriptors outperform existing state-of-the-art approaches. Given the similarity metrics, a hierarchical structure of an image collection can be built in a top-down manner. Representative images are chosen for image clusters and visualized using a force-directed graph.

Zeyuan Chen, Christopher G. Healey

Chasing Rainbows: A Color-Theoretic Framework for Improving and Preserving Bad Colormaps

The scientific visualization community increasingly questions the use of rainbow colormaps. This is not unfounded, as significant problems are readily seen in a luminance plot of the rainbow colormap. Many good, generally applicable colormaps have been proposed as direct replacements for the rainbow. However, there are still many who choose rainbows and like them. Would a colormap with perfect luminance and the chromaticity of a rainbow find a wider audience? This was our motivation in studying the range of chromatic effects arising from luminance corrections. Consequently, we developed a framework for adjusting colormaps to various degrees, which produces favorable results on a wide range of colormaps. In this work we detail this framework and demonstrate its effectiveness on several colormaps.

Robert Sisneros, Mohammad Raji, Mark W. Van Moer, David Bock

Interpolation-Based Extraction of Representative Isosurfaces

We propose a novel technique for the automatic, similarity-based selection of representative surfaces. While our technique can be applied to any set of manifolds, we particularly focus on isosurfaces from volume data. We select representatives from sets of surfaces stemming from varying isovalues or time-dependent data. For selection, our approach interpolates between surfaces using a minimum cost flow solver, and determines whether the interpolate adequately represents the actual surface in-between. For this, we employ the Hausdorff distance as an intuitive measure of the similarity of two components. In contrast to popular contour tree-based approaches which are limited to changes in topology, our approach also accounts for geometric deviations. For interactive visualization, we employ a combination of surface renderings and a graph view that depicts the selected surfaces and their relation. We finally demonstrate the applicability and utility of our approach by means of several data sets from different areas.
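The similarity measure named above is the standard Hausdorff distance. For two surfaces represented by point samplings, a brute-force version (adequate for small samplings; the paper's efficient pipeline is not reproduced here) is:

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets sampled from
    surfaces: the largest distance from any point to the other set."""
    # pairwise distances between the two samplings
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

Unlike a contour-tree comparison, this measure responds to purely geometric deviation between an interpolated surface and the actual in-between surface, even when their topology is identical.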

Oliver Fernandes, Steffen Frey, Thomas Ertl

Image-Based Post-processing for Realistic Real-Time Rendering of Scenes in the Presence of Fluid Simulations and Image-Based Lighting

Two methods are currently available for real-time fluid simulation: grid-based and particle-based simulation. Both approximate the simulation of a fluid, and both share the drawback that they do not directly generate a visually pleasing surface. Due to time constraints, the subsequent generation of the fluid surface must not consume much time. What is usually generated is an approximate surface that consists of many individual mesh elements and has none of the optical properties of a fluid. The visualization of a fluid in image space may contain different detail densities depending on the distance between the observer and the fluid. Therefore, filters need to be applied to smooth these details into a consistent surface. Many approaches use strong filters in this step, which results in an overly smooth surface, to which noise is then added to restore a rough appearance. To avoid this ad-hoc approach, we present a post-processing method that visualizes the simulation data directly, using both smoothing filters and an image pyramid. The image pyramid provides access to various levels of detail, which are used as a controllable low-pass filter. Thus, different amounts of smoothing can be selected depending on the distance to the viewer, granting a better surface reconstruction.
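The pyramid-as-controllable-low-pass idea can be sketched as follows. The 2x2 box-filter downsampling and the linear distance-to-level mapping are simplifications assumed for illustration, not the paper's actual filters or screen-space pipeline.

```python
import numpy as np

def build_pyramid(img, levels):
    """Build an image pyramid by repeated 2x2 box downsampling; each
    level acts as a progressively stronger low-pass filter."""
    pyr = [img]
    for _ in range(levels - 1):
        i = pyr[-1]
        h, w = (i.shape[0] // 2) * 2, (i.shape[1] // 2) * 2
        i = i[:h, :w]
        pyr.append(0.25 * (i[0::2, 0::2] + i[1::2, 0::2]
                           + i[0::2, 1::2] + i[1::2, 1::2]))
    return pyr

def level_for_distance(dist, near=1.0, far=10.0, levels=4):
    """Map viewer distance to a pyramid level: near fluid keeps detail
    (level 0), distant fluid is smoothed more aggressively."""
    t = np.clip((dist - near) / (far - near), 0.0, 1.0)
    return int(round(t * (levels - 1)))
```

Sampling the pyramid at a distance-dependent level gives the smoothing amount a single continuous control, instead of one fixed strong filter plus added noise.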

Julian Puhl, Martin Knuth, Arjan Kuijper

A Bioplausible Model for Explaining Café Wall Illusion: Foveal vs. Peripheral Resolution

Optical illusions highlight sensitivities and limitations of human visual processing and studying them leads to insights about perception that can potentially help computer vision match or exceed human performance. Geometric illusions are a subclass of illusions in which orientations and angles are distorted and misperceived. In this paper, a quantifiable prediction is presented of the degree of tilt for the Café Wall pattern, a typical geometric illusion, in which the mortar between the tiles seems to converge and diverge. Our study employs a bioplausible model of ON-center retinal processing, using an analytic processing pipeline to measure, quantitatively, the angle of tilt content in the model. The model also predicts different perceived tilts in different areas of the fovea and periphery as the eye saccades to different parts of the image. This variation is verified and quantified in simulations using two different sampling methods. Several sampling sizes and aspect ratios, modeling variant foveal views, are investigated across multiple scales in order to provide confidence intervals around the predicted tilts, and to contrast local tilt detection with a global average across the whole Café Wall image.

Nasim Nematzadeh, David M. W. Powers

Automated Reconstruction of Neurovascular Networks in Knife-Edge Scanning Microscope Rat Brain Nissl Data Set

Analyzing mammalian brain images can help us understand the interaction between cerebral blood flow and its surrounding tissue. However, extracting the geometry of the vasculature and the cells is difficult because of the complexity of the brain. In this paper, we propose an approach for reconstructing neurovascular networks from the Knife-Edge Scanning Microscope (KESM) rat Nissl data set. The proposed method includes the following steps. First, we enhance the raw image data using homomorphic filtering, the fast Fourier transform, and anisotropic diffusion. Next, we extract initial vessel cross sections from the image using dynamic global thresholding. Subsequently, we compute local properties of the connected components to remove various sources of noise. Finally, the proposed method connects small and large discontinuities in the vascular traces. To validate the performance of the proposed method, we compared its reconstruction results with an existing method (Lim’s method [1, 2]). The comparison shows that the proposed method outperforms the previous one: it is faster and more robust to noise.

Wookyung An, Yoonsuck Choe

Spatiotemporal LOD-Blending for Artifact Reduction in Multi-resolution Volume Rendering

High-quality raycasting of multi-resolution volumetric datasets benefits from a well-informed working set selection that accounts for occlusions as well as output sensitivity. In this work, we suggest a feedback mechanism that provides a fine-grained level-of-detail selection for restricted working sets. To mitigate multi-resolution artifacts, our rendering solution combines spatial and temporal level-of-detail blending to provide smooth transitions between adjacent bricks of differing levels of detail and during working set adjustments. We also show how the sampling along rays needs to be adapted to produce a consistent result. Our implementation demonstrates that our spatiotemporal blending in combination with consistent sampling significantly reduces visual artifacts.

Sebastian Thiele, Carl-Feofan Matthes, Bernd Froehlich

Visual Analytics Using Graph Sampling and Summarization on Multitouch Displays

Private industry datasets and public records contain more information than any algorithm can efficiently process or any person can reasonably interpret. This is a basic problem faced by researchers in visual analytics. Graph visualizations (a common large dataset representation) can organize relationships and entities in a visually accessible manner. Our work applies graph sampling and summarization to the interactive visualization of complex networks. We implemented several unbiased sampling techniques to facilitate large-scale graph analysis. Moreover, we show that biased sampling techniques can improve visualization by emphasizing key graph nodes. We combine algorithmic processing with human interpretation by allowing users to adjust sampling parameters, inspect sample graph visualizations, and compare sample distributions. Summarization also reduces graph complexity. By adjusting the rendered graph density, users can navigate large graphs while maintaining a constant on-screen density.
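The contrast between unbiased and biased sampling can be sketched minimally. The adjacency-list format, parameters, and the specific biased strategy (a random walk, which revisits high-degree nodes more often) are illustrative choices, not the paper's implementation.

```python
import random

def uniform_node_sample(adj, k, seed=0):
    """Unbiased sample: k nodes chosen uniformly at random from the
    adjacency-list graph `adj`."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(adj), k))

def random_walk_sample(adj, k, start, seed=0):
    """Biased sample: a random walk visits high-degree nodes more
    often, emphasizing key nodes in the rendered sample graph."""
    rng = random.Random(seed)
    node, visited = start, {start}
    while len(visited) < k:
        node = rng.choice(adj[node])
        visited.add(node)
    return visited
```

In an interactive setting, `k` and the sampling strategy would be the user-adjustable parameters, and the resulting sample distributions can be compared side by side.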

Nicholas G. Lipari, Christoph W. Borst, Mehmet Engin Tozal

Evaluation of Collaborative Actions to Inform Design of a Remote Interactive Collaboration Framework for Immersive Data Visualizations

Data visualization and interaction is an important part of understanding and analyzing complex data. Immersive display systems provide benefits to exploration. In this paper, we present a user study to analyze co-located tasks and behaviors during a collaboration task. Results from this study helped us to identify patterns in co-located collaborative interaction to be able to better design remote collaborative environments. We discuss the challenges while interacting with data in an immersive data visualization environment and our design of a remote interactive collaboration framework for analysis and workflow of data visualizations. The goal of this framework is to preserve the benefits of physically co-located collaboration actions and add the benefits of virtual components that do not conform to real world restrictions.

Rajiv Khadka, Nikhil Shetty, Eric T. Whiting, Amy Banic

ST: 3D Mapping, Modeling and Surface Reconstruction


An Efficient Algorithm for Feature-Based 3D Point Cloud Correspondence Search

Searching for correspondences between 3D point clouds is computationally expensive for two reasons: the complexity of geometric feature extraction operations and the large search space. To tackle this challenging problem, we propose a novel and efficient 3D point cloud matching algorithm. Our algorithm is inspired by PatchMatch [1], which is designed for correspondence search between 2D images. However, PatchMatch relies on the natural scanline order of 2D images to propagate good solutions across the images, whereas no such order exists for 3D point clouds. Hence, unlike PatchMatch, which conducts search at different pixels sequentially in scanline order, our algorithm searches for the best correspondences for different 3D points in parallel using a variant of the Artificial Bee Colony (ABC) [2] algorithm and propagates good solutions found at one point to its k-nearest neighbors. In addition, noting that correspondences found using geometric features extracted at individual points alone can be prone to noise, we add a novel smoothness term to the objective function. Experiments on multiple datasets show that the new smoothness term can effectively suppress matching noise and that the ABC-based parallel search can significantly reduce the computational time compared to brute-force search.

Zili Yi, Yang Li, Minglun Gong

Extraction of Vascular Intensity Directional Derivative on Computed Tomography Angiography

Collateral flow has been shown to have positive effects in ischemic intracranial vessel disease and can compensate for moderate stenosis and even complete occlusion of a major artery. Despite this, the common method of evaluating collaterals - computed tomography angiography (CTA) - is not effective in fully visualizing collaterals, making evaluation difficult. The spatial derivative of signal intensity, in the direction of flow, computed from standard, single-phase CTA may provide hemodynamic information that can be used to grade collaterals without directly visualizing them. In this paper we present software to compute the directional derivative, as well as to map it and the signal intensity onto a color-coded surface mesh for a 3D visualization. Our approach uses precomputed centerlines to simplify the computation and interpretation. To see whether the derivative provided information that was not redundant with intensity, the software was run on a set of 43 CTA cases with stenosis, where the VOI of each was segmented by a neurology expert. Whereas Kolmogorov-Smirnov (KS) tests comparing the intensity distributions of the healthy and affected hemispheres indicated that the two were different for 93% of cases, the distributions of directional derivative values were only different for 52.5% of cases. Therefore, this derivative may be used as a tool to discriminate the severity of such cases, although its effectiveness as a collateral evaluation tool remains to be seen. While surface segmentation is time-consuming, the software can otherwise process and render color-coded 3D visualizations quickly.
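A minimal sketch of the directional-derivative computation along a precomputed centerline is a finite difference of intensity with respect to arc length; the point ordering, sampling, and units here are illustrative assumptions, not the paper's software.

```python
import numpy as np

def centerline_derivative(points, intensity):
    """Finite-difference derivative of CTA signal intensity along a
    precomputed vessel centerline, i.e. in the direction of flow.
    `points` are ordered centerline coordinates and `intensity` the
    sampled intensity at each point."""
    # cumulative arc length along the centerline
    arc = np.concatenate(([0.0], np.cumsum(
        np.linalg.norm(np.diff(points, axis=0), axis=1))))
    return np.gradient(np.asarray(intensity, dtype=float), arc)
```

The resulting per-point values could then be mapped, like the raw intensity, onto a color-coded surface mesh.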

Elijah Agbayani, Baixue Jia, Graham Woolf, David Liebeskind, Fabien Scalzo

Capturing Photorealistic and Printable 3D Models Using Low-Cost Hardware

Recent advances in low-cost RGB-D sensors and progress in reconstruction approaches pave the way for creating real-time 3D models of people. It is equally important to enhance the visual appeal of such 3D models with textures. Most existing approaches use per-vertex colors, such that the color resolution is limited by the mesh resolution. In this paper, we propose a feasible solution for texturing 3D models of people (3D busts) using a low-cost RGB-D sensor setup that automatically constructs the 3D geometry and textures the model in just a few minutes. Experimental evaluations assess the performance of the approach on synthetic and real-world data in terms of computational time and visual appeal.

Christoph Heindl, Sharath Chandra Akkaladevi, Harald Bauer

Improved Stereo Vision of Indoor Dense Suspended Scatterers Scenes from De-scattering Images

Stereo vision is important in robotics, since retrieving depth is necessary in many robotics applications. Most state-of-the-art stereo vision algorithms assume clear images and do not handle images corrupted by scattering. In this paper, we propose a stereo vision system for robots working in environments with dense suspended scatterers. We analyze an imaging model for images taken in such environments under an active light source, based on the single-scattering phenomenon. Based on this model, the scattering signal can be removed from the images. The recovered images are then used as input for stereo matching. The proposed method is evaluated by the quality of the resulting stereo depth maps.

Chanh D. Tr. Nguyen, Kyeong Yong Cho, You Hyun Jang, Kyung-Soo Kim, Soohyun Kim

Fully Automatic and Robust 3D Modeling for Range Scan Data of Complex 3D Objects

3D surface registration of two or more range scans is an important step in building a complete 3D model of an object. When the overlaps between multi-view scans are insufficient, good initial alignment is necessary that usually requires some prior assumption such as pre-defined initial camera configuration or the use of landmarks. Specifically, this paper addresses the problem of registering two or more range scans captured from complex 3D objects which have small overlaps. The proposed technique is based on the integration of a new Partial Artificial Heat Kernel Signature (PA-HKS) and a Modified Multi-view Iterative Contour Coherence (MM-ICC) algorithm. This unique combination allows us to handle multi-view range scan data with large out-of-plane rotation and with limited overlaps between every two adjacent views. The experimental results on several complex 3D objects show the effectiveness and robustness of the proposed approach.

Jungjae Yim, Guoliang Fan

ST: Advancing Autonomy for Aerial Robotics


Real-Time Detection and Tracking of Multiple Humans from High Bird’s-Eye Views in the Visual and Infrared Spectrum

We propose a real-time system to detect and track multiple humans from high bird’s-eye views. First, we present a fast pipeline to detect humans observed from large distances by efficiently fusing information from a visual and an infrared spectrum camera. The main contribution of our work is a new tracking approach. Its novelty lies in online learning of an objectness model which is used for updating a Kalman filter. We show that an adaptive objectness model outperforms a fixed model. Our system achieves a mean tracking loop time of 0.8 ms per human on a 2 GHz CPU, which makes real-time tracking of multiple humans possible.
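The detection-driven Kalman update can be sketched as a constant-velocity filter. Scaling the measurement noise by an objectness score is our own simplified stand-in for the adaptive objectness model; the state layout and all parameters are illustrative assumptions.

```python
import numpy as np

def kalman_step(x, P, z, score, dt=1.0, q=1e-2, r0=1.0):
    """One constant-velocity Kalman filter step for a tracked human.
    State x = (px, py, vx, vy); z is a detected position. The
    measurement noise shrinks as the (hypothetical) objectness score
    grows, so confident detections pull the state harder."""
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                  [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
    # predict
    x = F @ x
    P = F @ P @ F.T + q * np.eye(4)
    # update, with noise scaled by objectness
    R = (r0 / max(score, 1e-3)) * np.eye(2)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

With a fixed R the filter would weight every detection equally; making R adaptive is the simplest way to express "trust confident detections more".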

Julius Kümmerle, Timo Hinzmann, Anurag Sai Vempati, Roland Siegwart

Combining Visual Tracking and Person Detection for Long Term Tracking on a UAV

Visual object tracking performance has improved significantly in recent years. Most trackers are based on one of two paradigms: online learning of an appearance model or the use of a pre-trained object detector. Methods based on online learning provide high accuracy but are prone to model drift, which occurs when the tracker fails to correctly estimate the tracked object’s position. Methods based on a detector, on the other hand, typically have good long-term robustness but reduced accuracy compared to online methods. Despite the complementarity of these approaches, the problem of fusing them into a single framework is largely unexplored. In this paper, we propose a novel fusion of an online tracker and a pre-trained detector for tracking humans from a UAV. The system operates in real time on a UAV platform. In addition, we present a novel dataset for long-term tracking in a UAV setting that includes scenarios typically not well represented in standard visual tracking datasets.

Gustav Häger, Goutam Bhat, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg, Piotr Rudl, Patrick Doherty

Monocular Visual-Inertial SLAM for Fixed-Wing UAVs Using Sliding Window Based Nonlinear Optimization

Precise real-time information about the position and orientation of robotic platforms, as well as locally consistent point clouds, is essential for control, navigation, and obstacle avoidance. For years, GPS has been the central source of navigational information in airborne applications, yet as we aim for robotic operations close to the terrain and in urban environments, alternatives to GPS need to be found. Fusing data from cameras and inertial measurement units in a nonlinear recursive estimator has been shown to allow precise estimation of 6-Degree-of-Freedom (DoF) motion without relying on GPS signals. While related methods have been shown to work under lab conditions for several years, only recently have real-world robotic applications using visual-inertial state estimation found wider adoption. Due to computational constraints and the required robustness and reliability, it remains a challenge to employ a visual-inertial navigation system in the field. This paper presents our tightly integrated system, involving hardware and software efforts, to provide an accurate visual-inertial navigation system for low-altitude fixed-wing unmanned aerial vehicles (UAVs) without relying on GPS or visual beacons. In particular, we present a sliding window based visual-inertial Simultaneous Localization and Mapping (SLAM) algorithm which provides real-time 6-DoF estimates for control. We demonstrate the performance on a small unmanned aerial vehicle and compare the estimated trajectory to a GPS based reference solution.

Timo Hinzmann, Thomas Schneider, Marcin Dymczyk, Andreas Schaffner, Simon Lynen, Roland Siegwart, Igor Gilitschenski

Change Detection and Object Recognition Using Aerial Robots

This work proposes a strategy for autonomous change detection and classification using aerial robots. For aerial robotic missions that were conducted in different spatio-temporal conditions, the pose-annotated camera data are first compared for similarity in order to identify the correspondence map among the different image sets. Then efficient feature matching techniques relying on binary descriptors are used to estimate the geometric transformations among the corresponding images, and subsequently perform image subtraction and filtering to robustly detect change. To further decrease the computational load, the known poses of the images are used to create local subsets within which similar images are expected to be found. Once change detection is accomplished, a small set of the images that present the maximum levels of change are used to classify the change by searching to recognize a list of known objects through a bag-of-features approach. The proposed algorithm is evaluated using both handheld-smartphone collected data, as well as experiments using an aerial robot.

Shehryar Khattak, Christos Papachristos, Kostas Alexis

Parallelized Iterative Closest Point for Autonomous Aerial Refueling

The Iterative Closest Point (ICP) algorithm is a widely used approach to aligning the geometry of two three-dimensional objects. The capability of aligning two geometries in real time on low-cost hardware will enable new applications in computer vision and graphics. The execution time of many modern approaches is dominated by either the k-nearest-neighbor (kNN) search or the point alignment phase. This work presents an accelerated alignment variant that parallelizes multiple kNN approaches on a Graphics Processing Unit (GPU), augmented with a novel Delaunay traversal, to achieve real-time estimates.
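One ICP iteration, with the kNN phase that dominates execution time made explicit, can be sketched as follows. This is a serial brute-force reference; the paper's GPU parallelization and Delaunay traversal are not reproduced.

```python
import numpy as np

def icp_step(src, dst):
    """One ICP iteration: brute-force nearest-neighbor correspondences
    (the costly phase that is usually parallelized), then a closed-form
    rigid alignment via SVD (Kabsch)."""
    # kNN phase (k = 1): nearest dst point for every src point
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
    matched = dst[d.argmin(axis=1)]
    # alignment phase: best rigid transform src -> matched
    mu_s, mu_m = src.mean(axis=0), matched.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (matched - mu_m))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_m - R @ mu_s
    return src @ R.T + t, R, t
```

Iterating this step to convergence recovers the alignment; the brute-force distance matrix is exactly the O(n*m) cost that GPU kNN variants attack.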

Jace Robinson, Matt Piekenbrock, Lee Burchett, Scott Nykl, Brian Woolley, Andrew Terzuoli

Distributed Optimal Flocking Design for Multi-agent Two-Player Zero-Sum Games with Unknown System Dynamics and Disturbance

In this paper, distributed flocking strategies are exploited for multi-agent two-player zero-sum games. Two main challenges are addressed: (a) handling system uncertainties and disturbances, and (b) achieving optimality. Adopting the emerging Approximate Dynamic Programming (ADP) technology, a novel distributed adaptive flocking design is proposed to optimize multi-agent two-player zero-sum games even when the system dynamics and disturbances are unknown. First, to evaluate the multi-agent flocking performance and the effects of disturbances, a novel flocking cost function is developed. Next, an online neural network (NN) based identifier is proposed to approximate the multi-agent zero-sum game system dynamics effectively. Subsequently, another NN is proposed to approximate the optimal flocking cost function by using the Hamilton-Jacobi-Isaacs (HJI) equation in a forward-in-time manner. Moreover, an additional term is designed and included in the NN update law to relax the stringent requirement of an initial admissible control. Finally, the distributed adaptive optimal flocking design is obtained by using the learned multi-agent zero-sum game system dynamics and the approximated optimal flocking cost function. Simulation results demonstrate the effectiveness of the proposed scheme.

Hao Xu, Luis Rodolfo Garcia Carrillo

Medical Imaging


MinMax Radon Barcodes for Medical Image Retrieval

Content-based medical image retrieval can support diagnostic decisions by clinical experts. Examining similar images may provide clues that help the expert remove uncertainties in his/her final diagnosis. Beyond conventional feature descriptors, binary features have recently been proposed in different ways to encode image content. A recent proposal is “Radon barcodes”, which employ binarized Radon projections to tag/annotate medical images with content-based binary vectors, called barcodes. In this paper, MinMax Radon barcodes are introduced, which are superior to the “local thresholding” scheme suggested in the literature. Using the IRMA dataset, with 14,410 x-ray images from 193 different classes, the advantage of MinMax Radon barcodes over thresholded Radon barcodes is demonstrated. The retrieval error for direct search drops by more than 15%. In addition, SURF, a well-established non-binary approach, and BRISK, a recent binary method, are examined and their results compared with MinMax Radon barcodes when retrieving images from the IRMA dataset. The results demonstrate that MinMax Radon barcodes are faster and more accurate when applied to IRMA images.
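The barcode idea can be sketched in heavily simplified form. Here only two axis-aligned projections stand in for full Radon projections, and per-projection median thresholding stands in for the binarization schemes the paper compares; none of this is the authors' exact MinMax encoding.

```python
import numpy as np

def radon_barcode(img, bins=8):
    """Simplified Radon-style barcode: axis-aligned projections (0 and
    90 degrees only, a stand-in for full Radon projections), resampled
    to `bins` values, binarized against each projection's median, and
    concatenated into one binary tag."""
    code = []
    for proj in (img.sum(axis=0), img.sum(axis=1)):
        # resample the projection to a fixed number of bins
        idx = np.linspace(0, len(proj), bins + 1).astype(int)
        p = np.array([proj[a:b].mean() for a, b in zip(idx[:-1], idx[1:])])
        code.append((p > np.median(p)).astype(np.uint8))
    return np.concatenate(code)

def hamming(a, b):
    """Retrieval compares barcodes by Hamming distance."""
    return int(np.count_nonzero(a != b))
```

Because the tag is binary, direct search reduces to fast Hamming-distance comparisons over the image database.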

H. R. Tizhoosh, Shujin Zhu, Hanson Lo, Varun Chaudhari, Tahmid Mehdi

Semantic-Based Brain MRI Image Segmentation Using Convolutional Neural Network

Segmenting magnetic resonance images plays a critical role in radiotherapy, surgical planning, and image-guided interventions. Traditional differential-filter-based segmentation algorithms are predefined independently of image features and require extensive post-processing. Convolutional Neural Networks (CNNs) are regarded as a powerful visual model that yields hierarchies of features learned from image data; however, their usage is limited in the medical imaging field, as they require large-scale data for training. In this paper, we propose a simple binary detection algorithm that bridges CNNs and medical imaging for accurate medical image segmentation. It applies high-capacity CNNs to extract features from image data. When labeled training medical images are scarce, the proposed algorithm splits the data into small regions and labels them to increase the training data size automatically. Rather than replacing classic segmentation methods, this paper presents an alternative that is unique and provides more desirable segmentation results…

Yao Chou, Dah Jye Lee, Dong Zhang

SAHF: Unsupervised Texture-Based Multiscale with Multicolor Method for Retinal Vessel Delineation

Automatic vessel delineation has been challenging due to complexities in the acquisition of retinal images. Although great progress has been made in this field, it remains the subject of ongoing research, as there is a need to further improve the delineation of both larger and thinner retinal vessels as well as the computational speed. Texture and color are promising, as they are effective features for object detection in computer vision. This paper presents an investigatory study of the sum average Haralick feature (SAHF), using a multi-scale approach over two different color spaces, CIELab and RGB, for the delineation of retinal vessels. Experimental results show that the method presented in this paper is robust for the delineation of retinal vessels, achieving fast computational speed with a maximum average accuracy of 95.67% and a maximum average sensitivity of 81.12% on the DRIVE database. Compared with previous methods, the method investigated in this paper achieves higher average accuracy and sensitivity rates on DRIVE.

Temitope Mapayi, Jules-Raymond Tapamo

Unsupervised Caries Detection in Non-standardized Bitewing Dental X-Rays

In recent years, dental image processing has become a useful tool for aiding healthcare professionals in diagnosing patients. Despite advances in the field, accurate diagnoses are still problematic due to the non-uniform nature of dental X-rays. This is attributed to current systems utilizing a supervised learning model in their deterministic algorithms for identifying caries. This paper presents a method for the detection of caries across a variety of non-uniform X-ray images using an unsupervised learning model. The method aims to identify potential caries hallmarks within a tooth without comparing against a set of criteria learned from a database of images. The results show the viability of the unsupervised learning approach and the effectiveness of the method when compared to supervised approaches.

D. Osterloh, S. Viriri

Vessel Detection on Cerebral Angiograms Using Convolutional Neural Networks

Blood-vessel segmentation in cerebral angiograms is a valuable tool for medical diagnosis. However, manual blood-vessel segmentation is a time-consuming process that requires high levels of expertise. The automatic detection of blood vessels can not only improve efficiency but also allow for the development of automatic diagnosis systems. Vessel detection can be approached as a binary classification problem, identifying each pixel as vessel or non-vessel. In this paper, we use deep convolutional neural networks (CNNs) for vessel segmentation. The network is tested on a cerebral angiogram dataset. The results show the effectiveness of the deep learning approach, which achieves an accuracy of 95%.

Yang Fu, Jiawen Fang, Benjamin Quachtran, Natia Chachkhiani, Fabien Scalzo

False Positive Reduction in Breast Mass Detection Using the Fusion of Texture and Gradient Orientation Features

The presence of masses in mammograms is among the main indicators of breast cancer, and their diagnosis is a challenging task. One problem of computer-aided diagnosis (CAD) systems developed to assist radiologists in detecting masses is a high false-positive rate, i.e., normal breast tissue is detected as masses. This problem can be reduced if localised texture and gradient orientation patterns in suspicious Regions Of Interest (ROIs) are captured in a robust way. Discriminative Robust Local Binary Pattern (DRLBP) and Discriminative Robust Local Ternary Pattern (DRLTP) are among the best state-of-the-art texture descriptors, whereas the Histogram of Oriented Gradients (HOG) is one of the best descriptors for gradient orientation patterns. To capture the discriminative micro-patterns existing in ROIs, we propose localised DRLBP-HOG and DRLTP-HOG descriptors that fuse DRLBP, DRLTP and HOG for the description of ROIs; the localisation is achieved by dividing each ROI into a number of blocks (sub-images). A Support Vector Machine (SVM) is used to classify mass or normal ROIs. The evaluation on DDSM, a benchmark mammogram database, revealed that localised DRLBP-HOG with 9 (3×3) blocks forms the best representation and yields an accuracy of 99.80 ± 0.62 (ACC ± STD), outperforming the state-of-the-art methods.
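The block-wise fusion can be sketched with plain LBP and a HOG-like orientation histogram as simplified stand-ins for DRLBP/DRLTP and HOG; the bin counts and grid size are illustrative, and the SVM stage is omitted.

```python
import numpy as np

def lbp_hist(block, bins=16):
    """Plain LBP histogram of a block (a simplified stand-in for the
    DRLBP texture descriptor)."""
    c = block[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        n = block[1 + dy:block.shape[0] - 1 + dy,
                  1 + dx:block.shape[1] - 1 + dx]
        codes |= ((n >= c).astype(np.uint8) << bit)
    h, _ = np.histogram(codes, bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)

def grad_hist(block, bins=9):
    """Unsigned gradient-orientation histogram of a block (HOG-like)."""
    gy, gx = np.gradient(block.astype(float))
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    h, _ = np.histogram(ang, bins=bins, range=(0, np.pi),
                        weights=np.hypot(gx, gy))
    return h / max(h.sum(), 1e-12)

def fused_descriptor(roi, grid=3):
    """Divide the ROI into grid x grid blocks and concatenate per-block
    texture and orientation histograms, in the spirit of the localised
    DRLBP-HOG fusion; the result would be fed to an SVM."""
    ys = np.linspace(0, roi.shape[0], grid + 1).astype(int)
    xs = np.linspace(0, roi.shape[1], grid + 1).astype(int)
    parts = []
    for y0, y1 in zip(ys[:-1], ys[1:]):
        for x0, x1 in zip(xs[:-1], xs[1:]):
            b = roi[y0:y1, x0:x1]
            parts += [lbp_hist(b), grad_hist(b)]
    return np.concatenate(parts)
```

Concatenating per-block histograms is what makes the descriptor "localised": each block contributes its own texture and orientation statistics.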

Mariam Busaleh, Muhammad Hussain, Hatim A. Aboalsamh, Mansour Zuair, George Bebis
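The block-wise localisation step described above can be sketched independently of the specific DRLBP/DRLTP/HOG implementations. The simplified descriptor below (an illustrative assumption, not the authors' code) divides an ROI into blocks, computes a small gradient-orientation histogram per block, and concatenates the block histograms into one feature vector:

```python
import math

def orientation_histogram(block, n_bins=4):
    """Histogram of quantized unsigned gradient orientations (0..180 deg)."""
    h, w = len(block), len(block[0])
    hist = [0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = block[y][x + 1] - block[y][x - 1]
            gy = block[y + 1][x] - block[y - 1][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle / (180.0 / n_bins)), n_bins - 1)] += 1
    return hist

def fused_descriptor(roi, blocks=2):
    """Split the ROI into blocks x blocks sub-images and concatenate
    each block's orientation histogram into one descriptor."""
    h, w = len(roi), len(roi[0])
    bh, bw = h // blocks, w // blocks
    desc = []
    for by in range(blocks):
        for bx in range(blocks):
            sub = [row[bx * bw:(bx + 1) * bw]
                   for row in roi[by * bh:(by + 1) * bh]]
            desc += orientation_histogram(sub)
    return desc
```

In the paper the per-block texture histograms (DRLBP/DRLTP) would be concatenated alongside the HOG part and the result fed to an SVM.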

Virtual Reality


Enhancing the Communication Spectrum in Collaborative Virtual Environments

The importance of interpersonal and group communication has been studied and recognized for thousands of years. With recent technological advances, humans have enabled remote interaction through shared virtual spaces; however, research is still needed to develop methods for expressing many important non-verbal communication cues. Our work explores methods for enhancing the communication spectrum in collaborative virtual environments. Our primary contribution is a machine learning framework that maps human facial data to avatars in the virtual world. We developed a synthetic training process to create labeled data and thereby alleviate the burden of manual annotation. Additionally, we describe a collaborative virtual environment that can utilize both verbal and non-verbal cues for improved user communication and interaction. Finally, we present results demonstrating the success of our method in a sample collaborative scenario.

Edward Kim, Christopher Moritz

Narrative Approach to Assess Fear of Heights in Virtual Environments

This paper presents an approach to the detection and identification of measurable human responses to heights inside a virtual environment. Biometric data such as heart rate and movement reactions are acquired during the execution of a task within a specific framework, and signal-based analysis is performed to validate the behavioral responses and the interpretation of fear, giving a better understanding of the effect and extent of these stimuli on a person in a controlled virtual scene.

Angelo D. Moro, Christian Quintero, Wilson J. Sarmiento

Immersive Industrial Process Environment from a P&ID Diagram

This work presents the development of an interactive and intuitive three-dimensional Human Machine Interface, based on Virtual Reality, which emulates the operation of an industrial plant and contains a two-dimensional Human Machine Interface for controlling and monitoring a process of one or more variables, applying the concept of user immersion in the virtual environment. The application is built using Computer Aided Design software and a graphics engine. Furthermore, experimental results are presented and discussed to validate the proposed system applied to a real plant process.

Víctor H. Andaluz, Washington X. Quevedo, Fernando A. Chicaiza, Catherine Gálvez, Gabriel Corrales, Jorge S. Sánchez, Edwin P. Pruna, Oscar Arteaga, Fabián A. Álvarez, Galo Ávila

Automatic Environment Map Construction for Mixed Reality Robotic Applications

As Virtual Reality technologies proliferate to traditional multimedia application areas, there is a need to create systematic and automated processes to establish the main building blocks of the virtual environments supporting such applications. In this paper, we propose a unified framework for procedurally creating a virtual reality replica of a remotely situated robot's physical environment. The proposed approach utilizes only the robot's onboard camera and automatically generates the environment map for the final VR environment. The main contributions of this paper are the hierarchical generation of the equirectangular panorama, the efficient diffuse filling of missing pixel values, and the use of the developed virtual environment to improve telepresence for remote social robotics applications.

David McFadden, Brandon Wilson, Alireza Tavakkoli, Donald Loffredo

Foveated Path Tracing

A Literature Review and a Performance Gain Analysis

Virtual Reality (VR) places demanding requirements on the rendering pipeline: the rendering is stereoscopic, and the refresh rate should be as high as 95 Hz to make VR immersive. One promising technique for making the final push to meet these requirements is foveated rendering, where the rendering effort is concentrated on the area where the user's gaze lies. This requires rapid adjustment of the level of detail based on screen-space coordinates. Path tracing allows this kind of change without much extra work; however, real-time path tracing is a fairly new concept. This paper is a literature review of techniques related to optimizing path tracing with foveated rendering. In addition, we provide a theoretical estimation of the available performance gains and calculate that 94% of the paths could be omitted. For this reason, we predict that path tracing can soon meet the demanding rendering requirements of VR.

Matias Koskela, Timo Viitanen, Pekka Jääskeläinen, Jarmo Takala
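The kind of back-of-envelope gain estimate mentioned above can be reproduced as follows; the circular foveal region and the specific sampling rates are illustrative assumptions, not the paper's actual model:

```python
import math

def omitted_fraction(screen_w, screen_h, fovea_radius, peripheral_rate):
    """Fraction of primary paths saved when pixels inside the foveal
    circle are sampled fully and peripheral pixels at peripheral_rate."""
    total = screen_w * screen_h
    fovea = min(math.pi * fovea_radius ** 2, total)   # pixels traced fully
    peripheral = total - fovea
    traced = fovea + peripheral * peripheral_rate
    return 1.0 - traced / total
```

Shrinking the foveal radius or lowering the peripheral sampling rate increases the fraction of paths that can be omitted, which is the lever foveated path tracing exploits.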

ST: Computer Vision as a Service


OCR as a Service: An Experimental Evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym

Optical character recognition (OCR), a classic machine learning challenge, has been a longstanding topic in a variety of applications in the healthcare, education, insurance, and legal industries, converting different types of electronic documents, such as scanned documents, digital images, and PDF files, into fully editable and searchable text data. The rapid generation of digital images on a daily basis makes OCR an imperative and foundational tool for data analysis. With the help of OCR systems, we have been able to save a considerable amount of effort in creating, processing, and saving electronic documents and adapting them to different purposes. A set of different OCR platforms are now available which, aside from lending theoretical contributions to other practical fields, have demonstrated successful applications to real-world problems. In this work, several qualitative and quantitative experimental evaluations have been performed using four well-known OCR services: Google Docs OCR, Tesseract, ABBYY FineReader, and Transym. We analyze the accuracy and reliability of the OCR packages on a dataset of 1227 images from 15 different categories. Furthermore, we review the state-of-the-art OCR applications in healthcare informatics. The present evaluation is expected to advance OCR research, providing new insights and considerations for the research area, and to assist researchers in determining which service is ideal for optical character recognition in an accurate and efficient manner.

Ahmad P. Tafti, Ahmadreza Baghaie, Mehdi Assefi, Hamid R. Arabnia, Zeyun Yu, Peggy Peissig
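Accuracy comparisons of OCR services such as the one above are commonly based on character-level edit distance; the following sketch (one plausible metric, not necessarily the exact one used in the paper) scores an OCR output against its ground-truth transcription:

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_accuracy(ground_truth, ocr_output):
    """Character-level accuracy: 1 - normalized edit distance."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    dist = levenshtein(ground_truth, ocr_output)
    return max(0.0, 1.0 - dist / len(ground_truth))
```

Averaging this score over a labeled image set gives one per-service number that can be compared across Google Docs OCR, Tesseract, ABBYY FineReader, and Transym.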

Animal Identification in Low Quality Camera-Trap Images Using Very Deep Convolutional Neural Networks and Confidence Thresholds

Monitoring animals in the wild without disturbing them is possible using a camera-trapping framework. Automatically triggered cameras, which take bursts of images of animals in their habitat, produce great volumes of data but often result in low image quality, and this high-volume data must then be classified by a human expert. In this work, a two-step classification is proposed to get closer to an automatic and trustworthy camera-trap classification system for low-quality images. Very deep convolutional neural networks were used to distinguish images, first between birds and mammals, and second between sets of mammals. The method reached 97.5% and 90.35% accuracy on these two tasks, respectively. An alleviation mode using a confidence threshold on the automatic classification is proposed, allowing the system to reach 100% performance traded against human work.

Alexander Gomez, German Diez, Augusto Salazar, Angelica Diaz
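The alleviation mode described above amounts to routing low-confidence automatic classifications to a human expert. A minimal sketch, with a hypothetical prediction format of (item id, label, confidence) tuples:

```python
def route_predictions(predictions, threshold):
    """Split classifier outputs into automatically accepted labels and
    items deferred to a human expert, by confidence threshold."""
    accepted, deferred = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            accepted.append((item_id, label))
        else:
            deferred.append(item_id)
    return accepted, deferred

# Hypothetical network outputs for three camera-trap images.
predictions = [(1, "bird", 0.99), (2, "mammal", 0.60), (3, "bird", 0.95)]
auto_labels, needs_review = route_predictions(predictions, threshold=0.9)
```

Raising the threshold moves more items into the human queue, trading expert effort for overall accuracy, which is the trade-off the abstract quantifies.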

A Gaussian Mixture Model Feature for Wildlife Detection

This paper addresses the challenge of camouflage in wildlife detection. Protective coloring makes the color-space distance between the features of animal patterns and background patterns very small, so texture information should be considered in this situation. A reliable differential estimator for digital image data is employed, and the first- and second-order differentials of animal and background patterns are modelled using the Gaussian mixture model method. It is shown that animal and background patterns are separated by a larger distance in the Gaussian mixture model space than in the color space. The mathematical expectation and standard deviation of the Gaussian models are therefore used to build the features representing animal and background patterns. To demonstrate the performance of the proposed features, a neural network classifier is employed. Experimental results on wildlife scene images show that the proposed features have a high capacity for detecting camouflaged animals against the background environment.

Shengzhi Du, Chunling Du, Rishaad Abdoola, Barend Jacobus van Wyk
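The core observation above, that animal and background patterns separate better in the statistics of image differentials than in raw color, can be illustrated with a deliberately simplified single-Gaussian version of the feature (the paper fits full Gaussian mixtures to first- and second-order differentials):

```python
import math

def differential_stats(patch):
    """Mean and standard deviation of horizontal first differences."""
    diffs = [row[x + 1] - row[x]
             for row in patch for x in range(len(row) - 1)]
    mean = sum(diffs) / len(diffs)
    var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    return mean, math.sqrt(var)

def feature_distance(p, q):
    """Euclidean distance between two (mean, std) feature pairs."""
    (m1, s1), (m2, s2) = differential_stats(p), differential_stats(q)
    return math.hypot(m1 - m2, s1 - s2)

# Toy patches: a striped "animal" texture vs. a flat background whose
# average intensity (5) is identical, so raw color barely separates them.
animal = [[0, 10, 0, 10], [0, 10, 0, 10]]
background = [[5, 5, 5, 5], [5, 5, 5, 5]]
```

Here both patches have the same mean intensity, yet their differential statistics are far apart, which is the separation the proposed features exploit.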

Biometrics

Age Classification from Facial Images: Is Frontalization Necessary?

In the majority of the methods proposed for age classification from facial images, the preprocessing steps consist of alignment and illumination correction, followed by the extraction of features, which are forwarded to a classifier to estimate the age group of the person in the image. In this work, we argue that face frontalization, i.e., the correction of the pitch, yaw, and roll angles of the head pose in 3D space, should be an integral part of any such algorithm, as it unveils more discriminative features. Specifically, we propose a method for age classification which integrates a frontalization algorithm before feature extraction. Numerical experiments on the widely used FGnet Aging Database confirmed the importance of face frontalization, achieving an average increase in accuracy of 4.43%.

A. Báez-Suárez, C. Nikou, J. A. Nolazco-Flores, I. A. Kakadiaris

PH-BRINT: Pooled Homomorphic Binary Rotation Invariant and Noise Tolerant Representation for Face Recognition Under Illumination Variations

Face recognition under varying illumination conditions is a challenging problem. We propose a simple and effective multiresolution approach, Pooled Homomorphic Binary Rotation Invariant and Noise Tolerant (PH-BRINT), for face recognition under varying illumination conditions. First, to reduce the effect of illumination, a wavelet-transform-based homomorphic filter is applied. Then, Binary Rotation Invariant and Noise Tolerant (BRINT) operators at three different scales are employed to extract multiscale local rotation-invariant and illumination-insensitive texture features. Finally, the discriminative information from the three scales is pooled using a MAX pooling operator, and localized gradient information is computed by dividing the pooled image into blocks and calculating the gradient magnitude and direction of each block. The PH-BRINT technique has been tested on the challenging Extended Yale B face database, which was captured under varying illumination conditions. The system, using a minimum-distance classifier with the L1-norm, achieved an average accuracy of 86.91%, which is comparable with the best state-of-the-art illumination-invariant face recognition techniques.

Raqinah Alrabiah, Muhammad Hussain, Hatim A. Aboalsamh, Mansour Zuair, George Bebis
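The MAX pooling step across the three BRINT scales can be sketched as an element-wise maximum over equally sized feature maps; the tiny 2×2 maps below are illustrative stand-ins for real multi-scale responses:

```python
def max_pool_scales(maps):
    """Element-wise MAX pooling of equally sized feature maps
    computed at different scales."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[max(m[y][x] for m in maps) for x in range(w)]
            for y in range(h)]

# Hypothetical responses of the same image region at three scales.
scale1 = [[1, 5], [0, 2]]
scale2 = [[3, 1], [4, 0]]
scale3 = [[2, 2], [1, 6]]
pooled = max_pool_scales([scale1, scale2, scale3])
```

The pooled map keeps, per position, the strongest response among the scales; the paper then extracts block-wise gradient information from this pooled image.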

Multi-Kernel Fuzzy-Based Local Gabor Patterns for Gait Recognition

This paper proposes a novel multi-kernel fuzzy-based local Gabor binary patterns method (MFLGBP) for gait representation and recognition. First, we construct the gait energy image (GEI) from the mean motion cycle of a gait sequence. Then, we apply Gabor filters and encode the variations in the Gabor magnitude using a kernel-based fuzzy local binary pattern (KFLBP) operator. Finally, classification is performed using a support vector machine (SVM). Experiments are carried out using the benchmark CASIA B gait database. Our proposed feature extraction method shows promising performance in terms of correct recognition rate compared to other methods.

Amer G. Binsaadoon, El-Sayed M. El-Alfy
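The gait energy image used above is simply the per-pixel average of aligned binary silhouettes over a gait cycle; a minimal sketch:

```python
def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes over one gait cycle.
    Pixels near 1 are static body parts; intermediate values
    capture motion."""
    n = len(silhouettes)
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[y][x] for s in silhouettes) / n for x in range(w)]
            for y in range(h)]

# Two toy 2x2 silhouette frames from one cycle.
frames = [[[1, 0], [1, 1]],
          [[1, 1], [0, 1]]]
gei = gait_energy_image(frames)
```

In the pipeline above, this GEI is then filtered with Gabor kernels before the KFLBP encoding.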

A Comparative Analysis of Deep and Shallow Features for Multimodal Face Recognition in a Novel RGB-D-IR Dataset

With new trends like 3D and deep learning alternatives for face recognition becoming more popular, it becomes essential to establish a complete benchmark for the evaluation of such algorithms across a wide variety of data sources and non-ideal scenarios. We propose a new RGB-depth-infrared (RGB-D-IR) dataset, RealFace, acquired with the novel Intel® RealSense™ collection of sensors and characterized by multiple variations in pose, lighting, and disguise. As a baseline for future work, we assess the performance of multiple deep and “shallow” feature descriptors. We conclude that our dataset presents some relevant challenges and that deep feature descriptors offer both higher robustness in RGB images and an interesting margin for improvement in alternative sources, such as depth and IR.

Tiago Freitas, Pedro G. Alves, Cristiana Carpinteiro, Joana Rodrigues, Margarida Fernandes, Marina Castro, João C. Monteiro, Jaime S. Cardoso

ST: Visual Perception and Robotic Systems


Automated Rebar Detection for Ground-Penetrating Radar

Automated rebar detection in images from ground-penetrating radar (GPR) is a challenging problem and difficult to perform in real time because of the images' relatively low contrast and large size. This paper presents a rebar localization algorithm that can accurately locate the pixel positions of rebar within a GPR scan image. The proposed algorithm uses image classification and statistical methods to locate hyperbola signatures within the image. It takes advantage of adaptive histogram equalization to increase the visual signature of rebar despite the low contrast. A Naive Bayes classifier with histogram of oriented gradients feature vectors is used to approximately locate rebar within the image. In addition, a histogram-based method is applied to locate individual rebar more precisely, and the proposed methods are validated using existing GPR data as well as data collected during the course of this research.

Spencer Gibb, Hung Manh La
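The contrast-enhancement step described above builds on histogram equalization; the paper uses the adaptive (locally windowed) variant, but the global version below shows the core CDF-remapping idea:

```python
def equalize(img, levels=256):
    """Global histogram equalization of a 2D grayscale image."""
    hist = [0] * levels
    flat = [p for row in img for p in row]
    for p in flat:
        hist[p] += 1
    # Cumulative distribution, mapped back to the full intensity range.
    cdf, acc = [], 0
    for count in hist:
        acc += count
        cdf.append(acc)
    total = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    lut = [round((c - cdf_min) / max(total - cdf_min, 1) * (levels - 1))
           for c in cdf]
    return [[lut[p] for p in row] for row in img]

# A low-contrast patch (all values near 100) is stretched to full range.
enhanced = equalize([[100, 100], [101, 101]])
```

Adaptive histogram equalization applies the same remapping per local window, which is what makes faint hyperbola signatures stand out in low-contrast GPR scans.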

Improving Visual Feature Representations by Biasing Restricted Boltzmann Machines with Gaussian Filters

Advances in unsupervised learning have allowed the efficient learning of feature representations from large sets of unlabeled data. This paper evaluates visual features learned through unsupervised learning, specifically comparing biasing methods using Gaussian filters on a single-layer network. Using the restricted Boltzmann machine, features emerging through training on image data are compared by classification performance on standard datasets. When Gaussian filters are convolved with adjacent hidden-layer activations from a single example during training, topographies emerge in which adjacent features become tuned to slightly varying stimuli. When Gaussian filters are applied to the visible nodes, images become blurrier; training on these images leads to less localized features being learned. The networks are trained and tested on the CIFAR-10, STL-10, COIL-100, and MNIST datasets. It is found that the induction of topography or simple image blurring during training produces better features, as evidenced by the consistent and notable increase in classification results.

Arjun Yogeswaran, Pierre Payeur

Image Fusion Quality Measure Based on a Multi-scale Approach

In this paper, we present a general-purpose, non-reference, multi-scale structural similarity measure for the objective quality assessment of image fusion. We extend Piella's measure [1] within a multi-scale approach, evaluating Piella's measure at several image scales and fusing the results into a single evaluation. The main advantage of multi-scale methods lies in their ability to capture the relevant and useful image information at different resolutions. We validated our proposal in different imaging application scenarios, in particular on a 2015 multi-exposure image fusion database that provides human subjective evaluations. Experimental results show that our approach achieves high correlation with the subjective scores provided by the database and significantly improves over Piella's original single-scale fusion quality measure.

Jorge Martinez, Silvina Pistonesi, María Cristina Maciel, Ana Georgina Flesia
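The multi-scale construction described above can be sketched generically: evaluate a single-scale measure on a dyadic pyramid and combine the per-scale scores. The mean-absolute-difference similarity below is a simple stand-in for Piella's measure, and plain averaging is an illustrative combination rule (the paper's fusion of scales may differ):

```python
def downsample(img):
    """Halve resolution by averaging 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def mad_similarity(a, b):
    """Stand-in single-scale measure: 1 - mean absolute difference."""
    diffs = [abs(a[y][x] - b[y][x])
             for y in range(len(a)) for x in range(len(a[0]))]
    return 1.0 - sum(diffs) / len(diffs)

def multiscale_score(fused, reference, single_scale_measure, scales=3):
    """Average a single-scale quality measure over a dyadic pyramid."""
    scores = []
    for _ in range(scales):
        scores.append(single_scale_measure(fused, reference))
        fused, reference = downsample(fused), downsample(reference)
    return sum(scores) / scales
```

Coarse scales penalize structural distortions that survive downsampling, while fine scales catch local artifacts, which is the stated advantage of the multi-scale extension.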

Vision-Based Self-contained Target Following Robot Using Bayesian Data Fusion

Several visual following robots have been proposed in recent years. However, many require several expensive sensors, and often the majority of the image processing and other calculations are performed off-board. This paper proposes a simple, cost-effective, yet robust visual following robot capable of tracking a general object with limited restrictions on target characteristics. To detect objects, tracking-learning-detection (TLD) is used within a Bayesian framework to filter and fuse the measurements. A time-of-flight (ToF) depth camera is used to refine the distance estimates at short range. The algorithms are executed in real time (approximately 30 fps) on a Jetson TK1 embedded computer. Experiments were conducted with different target objects to validate the system in scenarios including occlusions and various illumination conditions, as well as to show how the data fusion between TLD and the ToF camera improves the distance estimation.

Andrés Echeverri Guevara, Anthony Hoak, Juan Tapiero Bernal, Henry Medeiros
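Fusing the TLD-based distance estimate with the ToF measurement can be illustrated with the standard Gaussian (inverse-variance) fusion rule; the specific filter used in the paper may differ:

```python
def fuse_estimates(mu1, var1, mu2, var2):
    """Bayesian fusion of two independent Gaussian measurements:
    the posterior mean is the inverse-variance weighted average,
    and the posterior variance is smaller than either input."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    mean = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    var = 1.0 / (w1 + w2)
    return mean, var
```

At short range the ToF camera's low-variance reading dominates the fused estimate, which is exactly how the depth sensor "refines" the vision-based distance.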

Dual Back-to-Back Kinects for 3-D Reconstruction

In this paper, we investigate the use of two Kinects for capturing the 3-D model of a large scene. Traditionally, a single Kinect is slid across the area to obtain a full 3-D model. However, this approach requires a scene with a significant number of prominent features, as well as careful handling of the device. To tackle the problem, we mounted two back-to-back Kinects on top of a robot for scanning the environment. This setup requires knowledge of the relative pose between the two Kinects; as they do not share a view, calibration using traditional methods is not possible. To solve this problem, we place a dual-face checkerboard (the front and back patterns are the same) on top of the back-to-back Kinects, and a planar mirror is employed to enable either Kinect to view the same checkerboard. This arrangement creates a shared calibration object between the two sensors, and a mirror-based pose estimation algorithm is applied to calibrate the Kinect cameras. Finally, we merge all local object models captured by the Kinects into a combined model with a larger viewing area. Experiments with real measurements of an indoor scene were conducted to show the feasibility of our work.

Ho Chuen Kam, Kin Hong Wong, Baiwu Zhang
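Mirror-based calibration of the kind used above rests on the reflection of points across the mirror plane; a minimal sketch of that geometric primitive (not the full pose estimation algorithm):

```python
def reflect(point, normal, d):
    """Reflect a 3-D point across the mirror plane n . x = d,
    where normal is assumed to be unit length."""
    dist = sum(p * n for p, n in zip(point, normal)) - d
    return tuple(p - 2 * dist * n for p, n in zip(point, normal))

# Example: reflecting across the plane z = 0 (normal along z).
mirrored = reflect((1, 2, 3), (0, 0, 1), 0)
```

A checkerboard corner seen in the mirror corresponds to the reflection of its true position, so applying this transform lets either Kinect relate mirrored observations back to the shared calibration object. Reflection is an involution: applying it twice recovers the original point.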

