
2015 | Book

Advances in Visual Computing

11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015, Proceedings, Part II

Edited by: George Bebis, Richard Boyle, Bahram Parvin, Darko Koracin, Ioannis Pavlidis, Rogerio Feris, Tim McGraw, Mark Elendt, Regis Kopper, Eric Ragan, Zhao Ye, Gunther Weber

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

The two volume set LNCS 9474 and LNCS 9475 constitutes the refereed proceedings of the 11th International Symposium on Visual Computing, ISVC 2015, held in Las Vegas, NV, USA in December 2015.

The 115 revised full papers and 35 poster papers presented in this book were carefully reviewed and selected from 260 submissions. The papers are organized in topical sections: Part I (LNCS 9474) comprises computational bioimaging; computer graphics; motion and tracking; segmentation; recognition; visualization; mapping; modeling and surface reconstruction; advancing autonomy for aerial robotics; medical imaging; virtual reality; observing humans; spectral imaging and processing; intelligent transportation systems; visual perception and robotic systems. Part II (LNCS 9475): applications; 3D computer vision; computer graphics; segmentation; biometrics; pattern recognition; recognition; and virtual reality.

Table of Contents

Frontmatter

Applications

Frontmatter
Hybrid Example-Based Single Image Super-Resolution

Image super-resolution aims to recover a visually pleasing high resolution image from one or multiple low resolution images. It plays an essential role in a variety of real-world applications. In this paper, we propose a novel hybrid example-based single image super-resolution approach which integrates learning from both external and internal exemplars. Given an input image, a proxy image with the same resolution as the target high-resolution image is first generated from a set of externally-learnt regression models. We then perform a coarse-to-fine gradient-level self-refinement on the proxy image guided by the input image. Finally, the refined high-resolution gradients are fed into a uniform energy function to recover the final output. Extensive experiments demonstrate that our framework outperforms the recent state-of-the-art single image super-resolution approaches both quantitatively and qualitatively.

Yang Xian, Xiaodong Yang, Yingli Tian
Automated Habit Detection System: A Feasibility Study

In this paper, we propose an automated habit detection system. We define a “habit” in this study as a motion that differs significantly from our common behaviors. The behaviors of two subjects during conversation are tracked by the Kinect sensor and their skeletal and facial conformations are detected. The proposed system detects the motions considered as habits by analyzing them using principal component analysis (PCA) and wavelet multi-resolution analysis (MRA). In our experiments, we prepare a total of 108 movies, each containing 5 min of conversation. Of these, 100 movies are used to build the average motion model (AMM), and the remainder are used for the evaluation. The habit detection accuracy of the proposed system is shown to have a precision of 84.0 % and a recall of 81.8 %.

Hiroki Misawa, Takashi Obara, Hitoshi Iyatomi
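The PCA stage described in the abstract can be illustrated with a minimal sketch: frames whose reconstruction error under the principal components of the common behavior is an outlier are flagged as habit candidates. This is not the authors' implementation; the wavelet-MRA stage and the Kinect input are omitted, and all names and thresholds here are assumptions.

```python
import numpy as np

def detect_habits(motion, n_components=2, thresh=3.0):
    """Flag frames whose PCA reconstruction error is an outlier.

    motion: (frames, joints) array of tracked coordinates. The top
    principal components stand in for the average motion model (AMM);
    the wavelet-MRA stage of the paper is omitted here.
    """
    X = motion - motion.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components]             # principal axes of common behavior
    residual = X - X @ P.T @ P        # part not explained by the AMM
    err = np.linalg.norm(residual, axis=1)
    z = (err - err.mean()) / (err.std() + 1e-9)
    return z > thresh                 # habit candidates
```

On synthetic motion lying along one direction plus a single off-axis frame, only that frame is flagged.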
Conductor Tutoring Using the Microsoft Kinect

In this paper we present a system that uses the Microsoft Kinect to provide beginner conducting students real time feedback about their performance. Using upper body joint coordinates we detect common mistakes such as swaying, rocking, excessive hinge movement, and mirroring. We compute instant velocities to determine tempo and classify articulation as legato or staccato. Our experiments show that the system performs perfectly when detecting erroneous movements, correctly classifies articulation type most of the time, and can correctly determine tempo by counting the number of beats per minute. The system was well received by conducting students and their instructor, as it allows them to practice by themselves, without an orchestra.

Andrea Salgian, Leighanne Hsu, Nathaniel Milkosky, David Vickerman
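The tempo-from-velocity idea in the abstract reduces to counting beats, i.e. local maxima of hand speed, and converting the count to beats per minute. The following is a hedged sketch under that reading; the threshold, the peak criterion, and the function name are assumptions, not the paper's method.

```python
def tempo_bpm(speeds, duration_s, thresh):
    """Estimate tempo from a sequence of instantaneous hand speeds.

    A "beat" is a local maximum of speed above `thresh`; the count
    over the observation window is converted to beats per minute.
    """
    beats = sum(
        1 for i in range(1, len(speeds) - 1)
        if speeds[i] > thresh
        and speeds[i] >= speeds[i - 1]
        and speeds[i] > speeds[i + 1]
    )
    return 60.0 * beats / duration_s
```

For example, four speed peaks over a two-second window correspond to 120 BPM.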
Lens Distortion Rectification Using Triangulation Based Interpolation

Nonlinear lens distortion rectification is a common first step in image processing applications where the assumption of a linear camera model is essential. For rectifying the lens distortion, the forward distortion model needs to be known. However, many self-calibration methods estimate the inverse distortion model. In the literature, the inverse of the estimated model is approximated for image rectification, which introduces additional error to the system. We propose a novel distortion rectification method that uses the inverse distortion model directly. The method starts by mapping the distorted pixels to the rectified image using the inverse distortion model. The resulting set of points with subpixel locations is triangulated. The pixel values of the rectified image are linearly interpolated based on this triangulation. The method is applicable to all camera calibration methods that estimate the inverse distortion model and performs well across a large range of parameters.

Burak Benligiray, Cihan Topal
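The pipeline in the abstract (map pixels through the inverse model, triangulate the scattered points, interpolate linearly) can be sketched with SciPy, whose `LinearNDInterpolator` performs the Delaunay triangulation internally. This is an illustrative reading, not the authors' code; `inverse_model` is an assumed callable.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator  # triangulates internally

def rectify(img, inverse_model):
    """Rectify `img` using an inverse distortion model directly.

    `inverse_model` maps distorted (x, y) pixel coordinates to their
    rectified positions; the forward model is never approximated.
    The scattered rectified points are Delaunay-triangulated and the
    rectified grid is filled by linear interpolation over that mesh.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    rectified_pts = inverse_model(grid)  # subpixel locations
    interp = LinearNDInterpolator(rectified_pts, img.ravel().astype(float))
    return interp(grid).reshape(h, w)
```

With an identity "distortion" the rectified image reproduces the input, which is a useful sanity check.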
A Computer Vision System for Automatic Classification of Most Consumed Brazilian Beans

In this work we propose a computer vision system (CVS) for automatic classification of beans. It is able to classify the beans most consumed in Brazil according to their skin colors, and is composed of three main steps: (i) image acquisition and pre-processing, (ii) segmentation of grains and (iii) classification of grains. In the conducted experiments, we used a PC-controlled apparatus that includes a conveyor belt, an image acquisition chamber and a camera to simulate an industrial production line. The results obtained in the experiments indicate that the proposed system could be used to support the visual quality inspection of Brazilian beans.

S. A. Araújo, W. A. L. Alves, P. A. Belan, K. P. Anselmo
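A skin-color classification step like the one described can be sketched as nearest-prototype matching on the mean color of a segmented grain. The prototype values and class names below are purely illustrative assumptions, not the paper's calibrated data.

```python
import numpy as np

# Illustrative mean RGB skin colours; NOT the paper's measured values.
PROTOTYPES = {
    "carioca": np.array([190.0, 160.0, 120.0]),
    "black":   np.array([40.0, 35.0, 30.0]),
    "mulatto": np.array([150.0, 75.0, 60.0]),
}

def classify_grain(pixels):
    """Assign a segmented grain to the nearest colour prototype.

    pixels: iterable of RGB triples belonging to one grain.
    """
    mean = np.asarray(pixels, dtype=float).mean(axis=0)
    return min(PROTOTYPES, key=lambda k: np.linalg.norm(mean - PROTOTYPES[k]))
```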

3D Computer Vision

Frontmatter
Stereo-Matching in the Context of Vision-Augmented Vehicles

Stereo matching accuracy is determined by comparing results with ground truth. However, it remains unspecified in which kinds of image regions a stereo matcher is more accurate. By identifying feature points, we identify regions where the data cost used can easily represent features. We suggest using feature matchers to identify sparse matches of high confidence, and using those to guide a belief-propagation mechanism. Extensive experiments, also including a semi-global stereo matcher, illustrate the achieved performance. We also test on data only recently made available for a developing country, which comes with particular challenges not seen before. Since KITTI ground truth is sparse, ground truth is actually missing for most of the identified feature points. Using our novel stereo matching method (called FlinBPM), we derive our own ground truth and compare it with results obtained by other matching approaches, including our novel stereo matching method (called WlinBPM). Based on this, we were able to identify circumstances in which a census transform fails to define an appropriate data cost measure. There is no single all-time winner in the set of considered stereo matchers, but there are specific benefits when applying one of the discussed stereo matching strategies. This might point towards a need for adaptive solutions for vision-augmented vehicles.

Waqar Khan, Reinhard Klette
A Real-Time Depth Estimation Approach for a Focused Plenoptic Camera

This paper presents an algorithm for real-time depth estimation with a focused plenoptic camera. The described algorithm is based on pixel-wise stereo observations in the raw image recorded by the plenoptic camera, which are combined in a probabilistic depth map. Additionally, we provide efficient methods for outlier removal based on a Naive Bayes classifier as well as depth refinement using a bilateral filter. We achieve real-time performance for our algorithm through an optimized parallel implementation.

Ross Vasko, Niclas Zeller, Franz Quint, Uwe Stilla
Range Image Processing for Real Time Hospital-Room Monitoring

In this paper we describe a robust and movable real-time system, based on range data and 2D image processing, to monitor hospital rooms and to provide useful information that can be used to give early warnings in case of dangerous situations. The system auto-configures itself in real time and no initial supervised setup is necessary, so it is easy to move it from room to room according to the actual hospital needs. Night-and-day operation is guaranteed even in the presence of severe occlusions, by exploiting the 3D data given by a Kinect© sensor. High performance is obtained by a hierarchical approach that first detects the rough geometry of the scene. Thereafter, the system detects the other entities, such as beds and people. The current implementation has been preliminarily tested at the “Le Scotte” polyclinic hospital in Siena, and allows 24 h coverage of up to three beds by a single Kinect© in a typical room.

Alessandro Mecocci, Francesco Micheli, Claudia Zoppetti
Real-Time 3-D Surface Reconstruction from Multiple Cameras

Thanks to cheap GPUs and appropriate parallel algorithms, real-time 3-D reconstruction has recently become possible. In this paper, a real-time 3-D surface reconstruction system is set up to achieve dense geometry reconstruction from multiple cameras. The poses of the cameras are accurately estimated with the help of a self-calibration system. The depth map of the recorded scene is computed by means of a dense multi-view stereo algorithm. Matching cost aggregation and a global optimization method are used to obtain accurate depth values. We integrate our work into MeshLab, where the depth information is used for generating the surface model. High-quality results are finally presented to prove the feasibility of our system and reconstruction algorithms.

Yongchun Liu, Huajun Gong, Zhaoxing Zhang
Stereo Correspondence Evaluation Methods: A Systematic Review

The stereo correspondence problem has received significant attention in the literature for approximately three decades. During that period, the development of stereo matching algorithms has been quite considerable. In contrast, there are not as many proposals on evaluation methods for stereo matching algorithms. This is not a trivial issue, since an objective assessment of algorithms is required not only to measure improvements in the area, but also to properly identify where the gaps really are and, consequently, to guide the research. In this paper, a systematic review of evaluation methods for stereo matching algorithms is presented. The contributions lie not only in the results found, but also in how they are explained and presented, aiming to be useful for the visual computing research community, in which such a systematic review process is not yet broadly adopted.

Camilo Vargas, Ivan Cabezas, John W. Branch
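A widely used evaluation measure in the stereo literature surveyed above is the bad-pixel rate: the fraction of pixels whose disparity error exceeds a tolerance, computed only where ground truth exists. The sketch below assumes missing ground truth is encoded as NaN; this is a generic illustration of the metric, not a method from the review.

```python
import numpy as np

def bad_pixel_rate(disp, gt, tau=3.0):
    """Fraction of pixels whose disparity error exceeds `tau`.

    Pixels without ground truth (encoded as NaN, as with sparse
    KITTI-style ground truth) are ignored.
    """
    valid = ~np.isnan(gt)
    return float((np.abs(disp - gt)[valid] > tau).mean())
```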

Computer Graphics

Frontmatter
Guided High-Quality Rendering

We present a system which allows for guiding the image quality in global illumination (GI) methods by user-specified regions of interest (ROIs). This is done with either a tracked interaction device or a mouse-based method, making it possible to create a visualization with varying convergence rates throughout one image towards a GI solution. To achieve this, we introduce a scheduling approach based on Sparse Matrix Compression (SMC) for efficient generation and distribution of rendering tasks on the GPU that allows for altering the sampling density over the image plane. Moreover, we present a prototypical approach for filtering the newly, possibly sparse samples to a final image. Finally, we show how large-scale display systems can benefit from rendering with ROIs.

Thorsten Roth, Martin Weier, Jens Maiero, André Hinkenjann, Yongmin Li
User-Assisted Inverse Procedural Facade Modeling and Compressed Image Rendering

We take advantage of human intuition by encoding facades into a procedural representation. Our user-assisted inverse procedural modeling approach allows users to exploit repetitions and symmetries of facades to create a split grammar representation of the input. Terminal symbols correspond to repeating elements such as windows, window panes, and doors and their distributions are encoded as the production rules. Our participants achieved a compression factor that averaged 57 % (min = 12 %, max = 99 %) while taking on average 7 min (min = 1, max = 25) to compress an image. The compressed facades do not suffer from occlusion problems present in the input, such as trees or cars. Our second contribution is a novel rendering algorithm that directly displays the compressed facades in their procedural form by interpreting the procedural rules during texture lookup. This algorithm provides considerable memory savings while achieving comparable rendering performance.

Huilong Zhuo, Shengchuan Zhou, Bedrich Benes, David Whittinghill
Facial Fattening and Slimming Simulation Based on Skull Structure

In this paper, we propose a novel facial fattening and slimming deformation method for 2D images that preserves the individuality of the input face by estimating the skull structure from a frontal face image, and prevents unnatural deformation (e.g. penetration into the skull). Our method is composed of skull estimation, optimization of fattening and slimming rules appropriate to the estimated skull, mesh deformation to generate the fattened or slimmed face, and generation of a background image adapted to the generated face contour. Finally, we verify our method by comparison with other rules, the precision of skull estimation, a subjective experiment, and execution time.

Masahiro Fujisaki, Shigeo Morishima
Many-Lights Real Time Global Illumination Using Sparse Voxel Octree

The many-lights real time Global Illumination (GI) algorithm is promising but requires many shadow maps to be generated for Virtual Point Light (VPL) visibility tests, which reduces its efficiency. Prior solutions restrict either the number or accuracy of shadow map updates, which may lower the accuracy of indirect illumination or prevent the rendering of fully dynamic scenes. In this paper, we propose a hybrid real-time GI algorithm that utilizes an efficient Sparse Voxel Octree (SVO) ray marching algorithm for visibility tests instead of the shadow map generation step of the many-lights algorithm. Our technique achieves high rendering fidelity at about 50 FPS, is highly scalable and can support thousands of VPLs generated on the fly.

Che Sun, Emmanuel Agu
WebPhysics: A Parallel Rigid Body Simulation Framework for Web Applications

Due to the ubiquity of web browser engines and the advent of modern web standards (like HTML5), the software industry tends to use web applications as an alternative to traditional native applications. Web app development commonly uses scripting languages (like JavaScript) and CSS, whose low performance significantly hinders real-time execution of physics simulation. We design a new framework to achieve a real-time physics simulation engine. The key novelty is that we choose native implementations for the computing-intensive functions in physics simulation, bind the native implementations to JavaScript APIs, and then expose only the JavaScript APIs to developers through the web browser engine while still calling the native implementations. Based on this model, we build WebPhysics: the first 2D simulation engine targeting real-time web applications, which is seamlessly compatible with both the de-facto standard simulation engine (Box2D) and a browser engine (WebKit). We also explore and implement a parallel rigid body simulation (Box2DOCL) in the context of the web app framework to obtain further performance improvement. Our experiments show significant performance improvement in simulation time.

Robert (Bo) Li, Tasneem Brutch, Guodong Rong, Yi Shen, Chang Shu

Segmentation

Frontmatter
A Markov Random Field and Active Contour Image Segmentation Model for Animal Spots Patterns

Non-intrusive biometrics of animals using images makes it possible to analyze phenotypic populations and individuals with patterns like stripes and spots without affecting the studied subjects. However, non-intrusive biometrics demands a well-trained subject or the development of computer vision algorithms that ease the identification task. In this work, an analysis of classic segmentation approaches that require supervised tuning of their parameters, such as threshold, adaptive threshold, histogram equalization, and saturation correction, is presented. In contrast, a general unsupervised algorithm using Markov Random Fields (MRF) for segmentation of spot patterns is proposed. Active contours are used to boost results, using the MRF output as seeds. The Diploglossus millepunctatus lizard is used as study subject. The proposed method achieved a maximum efficiency of 91.11 %.

Alexander Gómez, German Díez, Jhony Giraldo, Augusto Salazar, Juan M. Daza
Segmentation of Building Facade Towers

Architectural styles are phases of development that classify architecture in the sense of historic periods, regions and cultural influences. The article presents the first approach performing automatic segmentation of building facade towers within the framework of an image-based architectural style classification system. The observed buildings, featuring towers, belong to the Romanesque, Gothic and Baroque architectural styles. The method is a pipeline unifying bilateral symmetry detection, graph-based segmentation approaches, and image analysis and processing techniques. It employs the specific visual features of the outstanding architectural element, the tower: vertical bilateral symmetry, rising out of the main building, and solidity. The approach is robust to strong perspective distortions. It comprises two branches, targeting facades with single and double towers respectively. The performance evaluation on a large number of images reports very high segmentation precision.

Gayane Shalunts
Effective Information and Contrast Based Saliency Detection

Human attention tends to focus on the most prominent objects in a scene, which differ from the background. These are termed salient objects. The human brain perceives an object as salient based on its difference from the surroundings in terms of color and texture. There have been many color-based approaches to salient object detection in the past. In this paper, we augment information set features with color features and detect the final single salient object using a set of color, size and location based features. The information set features result from representing the uncertainty in the color and illumination components. To locate the salient parts of the image, we make use of entropy to find the uncertainties in the color and luminance components of the image. Extensive comparisons with state-of-the-art methods in terms of precision, recall and F-measure are made on two different publicly available datasets to prove the effectiveness of this approach.

Aditi Kapoor, K. K. Biswas, M. Hanmandlu
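The entropy cue mentioned in the abstract can be sketched as the Shannon entropy of a channel's intensity histogram; high-entropy regions carry more uncertainty. This generic helper is an assumption about one building block, not the authors' full saliency pipeline.

```python
import numpy as np

def channel_entropy(channel, bins=32):
    """Shannon entropy (bits) of one image channel's histogram.

    Serves as an uncertainty cue: a constant channel has entropy 0,
    while a channel spread over many intensity bins scores higher.
    """
    hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                       # avoid log2(0)
    return float(-(p * np.log2(p)).sum())
```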
Edge Based Segmentation of Left and Right Ventricles Using Two Distance Regularized Level Sets

In this paper, we present a new approach for segmentation of left and right ventricles from cardiac MR images. A two-level-set formulation is proposed which is the extension of distance regularized level set evolution (DRLSE) model in [1], with the 0-level set and k-level set representing the endocardium and epicardium, respectively. The extraction of endocardium and epicardium is obtained as a result of the interactive curve evolution of the 0 and k level sets derived from the proposed variational level set formulation. The initialization of the proposed two-level-set DRLSE model is generated by performing the original DRLSE from roughly located endocardium. Experimental results have demonstrated the effectiveness of the proposed two-level-set DRLSE model.

Yu Liu, Yue Zhao, Shuxu Guo, Shaoxiang Zhang, Chunming Li
Automatic Crater Detection Using Convex Grouping and Convolutional Neural Networks

Craters are some of the most important landmarks on the surface of many planets and can be used for autonomous safe landing and for spacecraft and rover navigation. Manual detection of craters is laborious and impractical, and many approaches have been proposed to automate this task. However, none of these methods has yet become a standard tool for crater detection due to the challenging nature of this problem. In this paper, we propose a new crater detection algorithm (CDA) which employs a multi-scale candidate region detection step based on convexity cues and a candidate region verification step based on machine learning. Using an extensive dataset, our method has achieved a 92 % detection rate with an 85 % precision rate.

Ebrahim Emami, George Bebis, Ara Nefian, Terry Fong

ST: Biometrics

Frontmatter
Segmentation of Saimaa Ringed Seals for Identification Purposes

Wildlife photo-identification is a commonly used technique to identify and track individuals of wild animal populations over time. It has various applications in behavior and population demography studies. Nowadays, mostly due to large and labor-intensive image data sets, automated photo-identification is an emerging research topic. In this paper, the first steps towards automatic individual identification of the critically endangered Saimaa ringed seal (Phoca hispida saimensis) are taken. Ringed seals have a distinctive permanent pelage pattern that is unique to each individual, making image-based identification possible. We propose a superpixel classification based method for the segmentation of ringed seals in images to eliminate the background and to simplify the identification. The proposed segmentation method is shown to achieve a high segmentation accuracy with challenging image data. Furthermore, we show that, using the segmented images, promising identification results can be obtained even with a simple texture feature based approach. The proposed method uses general texture classification techniques and can also be applied to other animal species with a unique fur or skin pattern.

Artem Zhelezniakov, Tuomas Eerola, Meeri Koivuniemi, Miina Auttila, Riikka Levänen, Marja Niemi, Mervi Kunnasranta, Heikki Kälviäinen
Fingerprint Matching with Optical Coherence Tomography

Fingerprint recognition is an important security technique with a steadily growing usage for the identification and verification of individuals. However, current fingerprint acquisition systems have certain disadvantages, which include the requirement of physical contact with the acquisition device and the presence of undesirable artefacts, such as scars, on the surface of the fingerprint. This paper evaluates the accuracy of a complete framework for capturing undamaged, undistorted fingerprints from below the skin's surface using optical coherence tomography hardware, extracting and converting the subsurface data into a usable fingerprint, and matching such fingerprints. The ability of the framework to integrate with existing fingerprint recognition systems and its ability to operate as an independent stand-alone system are both evaluated.

Yaseen Moolla, Ann Singh, Ebrahim Saith, Sharat Akhoury
Improve Non-graph Matching Feature-Based Face Recognition Performance by Using a Multi-stage Matching Strategy

In this paper, a multi-stage matching strategy that determines the recognition result step by step is employed to improve the recognition performance of a non-graph matching feature-based face recognition. As the gallery size increases, correct correspondence of feature points between the probe image and training images becomes more and more difficult so that the recognition accuracy degrades gradually. To deal with the recognition degradation problem, we propose a multi-stage matching strategy for the non-graph matching feature-based method. Instead of finding the best match, each step picks out one half of the best matching candidates and removes the other half. The behavior of picking and removing repeats until the number of the remaining candidates is small enough to decide the final result. The experimental result shows that with the multi-stage matching strategy, the recognition performance is remarkably improved. Moreover, the improvement level also increases with the gallery size.

Xianming Chen, Wenyin Zhang, Chaoyang Zhang, Zhaoxian Zhou
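The halving strategy described above can be sketched as repeatedly keeping the better-scoring half of the candidate pool until one remains. Note the simplification: with a fixed scoring function this reduces to an arg-max, whereas the paper re-evaluates correspondences on the reduced pool at each stage; names here are assumptions.

```python
def multistage_match(probe_score, candidates):
    """Multi-stage matching: keep the better half, drop the other half,
    repeat until a single candidate remains.

    probe_score: callable mapping a candidate to its match score with
    the probe. In the paper each stage would re-match on the smaller
    pool; here the score is fixed for brevity.
    """
    pool = list(candidates)
    while len(pool) > 1:
        pool.sort(key=probe_score, reverse=True)
        pool = pool[: max(1, len(pool) // 2)]
    return pool[0]
```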
Neighbors Based Discriminative Feature Difference Learning for Kinship Verification

In this paper, we present a discriminative feature difference learning method for facial-image-based kinship verification. To make the feature difference of an image pair discriminative for kinship verification, a linear transformation matrix for the feature difference between an image pair is inferred from training data. This transformation matrix is obtained by minimizing the difference of L2 norms between the feature difference of each kinship pair and those of its neighbors from non-kinship pairs. To find the neighbors, cosine similarity is applied. Our method works on feature differences rather than the commonly used feature concatenation, leading to low complexity. Furthermore, there is no positive semi-definite constraint on the transformation matrix, as there is in metric learning methods, leading to an easy solution for the transformation matrix. Experimental results on two public databases show that the proposed method, combined with an SVM classifier, outperforms or is comparable to state-of-the-art kinship verification methods.

Xiaodong Duan, Zheng-Hua Tan
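The cosine-similarity neighbor selection used above can be sketched as follows: for a kinship pair's feature-difference vector, find the most similar difference vectors among non-kinship pairs. This is one building block only, under assumed names; the transformation learning itself is not reproduced.

```python
import numpy as np

def cosine_neighbors(d, negatives, k):
    """Indices of the k non-kin difference vectors most similar to d.

    d: feature-difference vector of a kinship pair.
    negatives: (n, dim) array of non-kinship difference vectors.
    """
    d = d / np.linalg.norm(d)
    N = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    sims = N @ d                      # cosine similarity to each negative
    return np.argsort(-sims)[:k]      # best k, most similar first
```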
A Comparative Analysis of Two Approaches to Periocular Recognition in Mobile Scenarios

In recent years, periocular recognition has become a popular alternative to face and iris recognition in less ideal acquisition scenarios. An interesting example of such scenarios is the usage of mobile devices for recognition purposes. With the growing popularity and easy access to such devices, the development of robust biometric recognition algorithms to work under such conditions finds strong motivation. In the present work we assess the performance of extended versions of two state-of-the-art periocular recognition algorithms on the publicly available CSIP database, a recent dataset composed of images acquired under highly unconstrained and multi-sensor mobile scenarios. The achieved results show each algorithm is better fit to tackle different scenarios and applications of the biometric recognition problem.

João C. Monteiro, Rui Esteves, Gil Santos, Paulo Torrão Fiadeiro, Joana Lobo, Jaime S. Cardoso

Applications

Frontmatter
Visual Perception and Analysis as First Steps Toward Human–Robot Chess Playing

We propose in this paper a novel visual computing approach for the automatic perception of chess gaming states, where a standard chessboard with original chess pieces is used. Our image analysis algorithm uses only grayscale images captured from a single mobile camera for interactive gaming under natural environmental conditions. On the one hand, we apply computer vision techniques to detect and localize the grid corners, obtaining a 2D representation of the 8 × 8 chess grid based on the grayscale information of the input image. On the other hand, we exploit computer graphics techniques for the 3D modeling and rendering of the pieces together with the chessboard. Using 2D-3D correspondences, we are able to recover the 3D camera poses and determine game state transitions during the gaming process. An experimental study based on both simulated and real-world scenarios demonstrates the feasibility and effectiveness of our approach.

Andreas Schwenk, Chunrong Yuan
A Gaussian Mixture Representation of Gesture Kinematics for On-Line Sign Language Video Annotation

Sign languages (SLs) are visuo-gestural representations used by deaf communities. Recognition of SLs usually requires manual annotations, which are expert dependent, prone to errors and time consuming. This work introduces a method to support SL annotation based on a motion descriptor that characterizes dynamic gestures in videos. The proposed approach starts by computing local kinematic cues, represented as mixtures of Gaussians, which together correspond to gestures with a semantic equivalence in the sign language corpora. At each frame, a spatial pyramid partition allows a fine-to-coarse sub-regional description of the motion-cue distribution. Then, for each sub-region, a histogram of motion-cue occurrence is built, forming a frame-gesture descriptor which can be used for on-line annotation. The proposed approach is evaluated using a bag-of-features framework, in which every frame-level histogram is mapped to an SVM. Experimental results show competitive results in terms of accuracy and computation time on a signing dataset.

Fabio Martínez, Antoine Manzanera, Michèle Gouiffès, Annelies Braffort
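The spatial-pyramid histogram descriptor described above can be sketched as follows: per-frame motion-cue labels are histogrammed over the whole frame and then over progressively finer sub-regions, and the histograms are concatenated. Positions are assumed normalized to [0, 1); the cell layout and normalization below are illustrative assumptions.

```python
import numpy as np

def pyramid_histogram(labels, positions, n_labels, levels=2):
    """Frame descriptor: histograms of motion-cue labels over a
    spatial pyramid (whole frame, then 2x2 sub-regions, ...).

    labels: (n,) int array of motion-cue labels for one frame.
    positions: (n, 2) array of cue positions, normalized to [0, 1).
    """
    feats = []
    for lv in range(levels):
        n = 2 ** lv
        for i in range(n):
            for j in range(n):
                in_cell = (((positions[:, 0] * n).astype(int) == i)
                           & ((positions[:, 1] * n).astype(int) == j))
                h = np.bincount(labels[in_cell], minlength=n_labels).astype(float)
                feats.append(h / max(1, h.sum()))   # per-cell normalization
    return np.concatenate(feats)
```

With two levels and two cue labels the descriptor has (1 + 4) x 2 = 10 entries.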
Automatic Affect Analysis: From Children to Adults

This article presents a novel and robust framework for the automatic recognition of facial expressions of children. The proposed framework also achieved results better than state-of-the-art methods for stimuli containing adult faces. The proposed framework extracts features only from perceptually salient facial regions, taking its inspiration from the human visual system. In this study we propose a novel shape descriptor, the facial landmark points triangles ratio (LPTR). The framework was first tested on the “Dartmouth database of children's faces”, which contains photographs of children between 6 and 16 years of age, and achieved promising results. Later we tested the proposed framework on the Cohn-Kanade (CK+) posed facial expression database (adult faces) and obtained results that exceed the state of the art.

Rizwan Ahmed Khan, Alexandre Meyer, Saida Bouakaz
A Study of Hand Motion/Posture Recognition in Two-Camera Views

This paper presents a vision-based approach for hand gesture recognition which combines both trajectory recognition and hand posture recognition. With two calibrated cameras, the 3D hand motion trajectory can be reconstructed. The reconstructed trajectory is then modeled by dynamic movement primitives (DMP) and a support vector machine (SVM) is trained to recognize five classes of gestures trajectories. Scale-invariant feature transform (SIFT) is used to extract features on segmented hand postures taken from both camera views. Based on various hand appearances captured by the two cameras, the proposed hand posture recognition method has shown a very good success rate. A gesture vector is proposed to combine the recognition result from both trajectory and hand postures. For our experimental set-up, it was shown that it is possible to accomplish a good overall accuracy for gesture recognition.

Jingya Wang, Shahram Payandeh

Pattern Recognition

Frontmatter
Automatic Verification of Properly Signed Multi-page Document Images

In this paper we present an industrial application for the automatic screening of incoming multi-page documents in a banking workflow, aimed at determining whether these documents are properly signed or not. The proposed method is divided into three main steps. First, individual pages are classified in order to identify the pages that should contain a signature. In a second step, we segment within those key pages the locations where the signatures should appear. The last step checks whether the signatures are present or not. Our method is tested in a real large-scale environment and we report the results of checking two different types of real multi-page contracts, comprising in total more than 14,500 pages.

Marçal Rusiñol, Dimosthenis Karatzas, Josep Lladós
CRFs and HCRFs Based Recognition for Off-Line Arabic Handwriting

This paper investigates the application of probabilistic, discriminative Conditional Random Fields (CRFs) and their extension, hidden-state CRFs (HCRFs), to the problem of off-line Arabic handwriting recognition. CRF- and HCRF-based classifiers are built on top of an explicit word segmentation module using two different sets of shape description features. A simple yet effective taxonomization technique is used to reduce the number of class labels; 3000 letter samples from the IESK-arDB database are used for training and 300 words for evaluation. Experiments compare the performance of the CRFs to the HCRFs as well as to that of generative HMMs. Results indicate the superiority of the discriminative approaches, with HCRFs achieving the best performance followed by CRFs.

Moftah Elzobi, Ayoub Al-Hamadi, Laslo Dings, Sherif El-etriby
Classifying Frog Calls Using Gaussian Mixture Models

We focus on the automatic classification of frog calls using shape features of spectrogram images. Monitoring frog populations is a means of tracking the health of natural habitats. This monitoring is usually done by well-trained experts who listen to and classify frog calls, a task that is both time consuming and error prone. To automate this classification process, our method treats the sound signal of a frog call as a texture image, which is modeled as a Gaussian mixture model. The method is simple but has shown promising results. Tests performed on a dataset of frog calls from 15 different species produced an average classification rate of 80 %, which approximates human performance.

Dalwinderjeet Kular, Kathryn Hollowood, Olatide Ommojaro, Katrina Smart, Mark Bush, Eraldo Ribeiro
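The classification step described above can be illustrated with a minimal sketch (not the authors' implementation): one Gaussian mixture model is fit per species on its spectrogram-derived features, and a new call is assigned to the species whose model gives the highest average log-likelihood. The feature vectors and species names below are synthetic stand-ins.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-ins for shape features extracted from spectrogram
# images: each species yields feature vectors clustered around its own mean.
train = {
    "species_a": rng.normal(loc=0.0, scale=0.5, size=(200, 4)),
    "species_b": rng.normal(loc=3.0, scale=0.5, size=(200, 4)),
}

# Fit one Gaussian mixture model per species on its training features.
models = {
    name: GaussianMixture(n_components=2, random_state=0).fit(feats)
    for name, feats in train.items()
}

def classify(call_features):
    # Score the call under every species model and pick the model with
    # the highest average log-likelihood.
    return max(models, key=lambda m: models[m].score(call_features))

test_call = rng.normal(loc=3.0, scale=0.5, size=(20, 4))
predicted = classify(test_call)  # expected: "species_b"
```

A real pipeline would replace the synthetic vectors with features computed from the spectrogram texture of each recorded call.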
Ice Detection on Electrical Power Cables

In northern countries, ice storms can cause major power disruptions, such as the one in December 2013 that left more than 300,000 customers in Toronto without electricity. Detecting ice formation on power cables can help in taking action to remove the ice before a major problem occurs. A computer vision solution was developed to detect ice in difficult imaging scenarios: images taken under fog conditions that reduce contrast, passing cars within the field of view of the camera, and the varying illumination that occurs at different times of day. Based on a neural network classifier and six image features suited to these difficult images, we reduced the error on a set of 50 images from 20 misclassifications to just one.

Binglin Li, Gabriel Thomas, Dexter Williams
Facial Landmark Localization Using Robust Relationship Priors and Approximative Gibbs Sampling

We tackle the facial landmark localization problem as an inference problem over a Markov Random Field. Efficient inference is implemented using Gibbs sampling with approximated full conditional distributions in a latent variable model. This approximation allows us to improve the runtime performance 1000-fold over classical formulations with no perceptible loss in accuracy. The exceptional robustness of our method is achieved by utilizing an $$L_{1}$$-loss function and our new robust shape model based on pairwise topological constraints. Compared with competing methods, our algorithm does not require any prior knowledge or initial guess about the location, scale or pose of the face.

Karsten Vogt, Oliver Müller, Jörn Ostermann

Recognition

Frontmatter
Off-the-Shelf CNN Features for Fine-Grained Classification of Vessels in a Maritime Environment

Convolutional Neural Networks (CNNs) have recently achieved spectacular performance on standard image classification benchmarks. Moreover, CNNs trained on large datasets such as ImageNet have performed effectively even on other recognition tasks and have been used as generic feature extractors for off-the-shelf classifiers. This paper presents an experimental study investigating the ability of off-the-shelf CNN features to capture the discriminative details of maritime vessels for fine-grained classification. An off-the-shelf classification scheme utilizing a linear support vector machine is applied to the high-level convolutional features that precede the fully connected layers in popular deep learning architectures. Extensive experimental evaluation compared the OverFeat, GoogLeNet, VGG, and AlexNet architectures for feature extraction. Results showed that OverFeat features outperform the other architectures with mAP = 0.7021 on the nine-class fine-grained problem, almost 0.02 better than the closest competitor, GoogLeNet, which performed best on smaller vessel types.

Fouad Bousetouane, Brendan Morris
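The off-the-shelf scheme described above — a linear SVM trained on frozen high-level CNN activations — can be sketched as follows. This is an illustrative assumption-laden toy, not the paper's pipeline: the 256-D vectors are random stand-ins for real network activations, and the two classes are hypothetical vessel types.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)

# Stand-ins for high-level CNN activations (the layer before the fully
# connected layers); in the paper these come from OverFeat, GoogLeNet,
# VGG or AlexNet. Here: synthetic 256-D vectors for two classes.
X = np.vstack([rng.normal(0.0, 1.0, (100, 256)),
               rng.normal(0.5, 1.0, (100, 256))])
y = np.array([0] * 100 + [1] * 100)

# Off-the-shelf classification: a linear SVM on the frozen deep features,
# with no fine-tuning of the feature extractor itself.
clf = LinearSVC(C=1.0).fit(X, y)
train_acc = clf.score(X, y)
```

With 200 points in 256 dimensions the classes are linearly separable, so the linear SVM fits the training set essentially perfectly; the interesting comparison in the paper is which architecture's features generalize best.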
Joint Visual Phrase Detection to Boost Scene Parsing

Scene parsing is a very challenging problem which attracts increasing interest in fields such as computer vision and robotics. However, occluded or small objects, which are difficult to parse, are often ignored. To deal with these two problems, we integrate visual phrases, which have been shown to describe relationships between objects well, into our joint system. In this paper, we propose a joint model which integrates scene classification, object and visual phrase detection, and scene parsing. By encoding them into a Conditional Random Field model, all of these tasks can be solved jointly. We evaluate our method on the MSRC-21 dataset. The experimental results demonstrate that our method achieves comparable, and on some occasions even superior, performance with respect to state-of-the-art joint methods, especially in the presence of partially occluded or small objects.

Keke Tang, Zhe Zhao, Xiaoping Chen
If We Did Not Have ImageNet: Comparison of Fisher Encodings and Convolutional Neural Networks on Limited Training Data

This work compares two competing approaches for image classification, namely Bag-of-Visual-Words (BoVW) and Convolutional Neural Networks (CNNs). Recent works have shown that CNNs have surpassed hand-crafted feature extraction techniques in image classification problems. Their success is partly attributed to the fact that benchmarking initiatives such as ImageNet gathered, through a massive crowdsourcing effort, sufficient data to train deep neural networks with a very large number of model parameters. Obviously, manually annotated training datasets on a similar scale cannot be provided in every classification scenario due to the massive amount of required resources and time. In this paper, we therefore analyze and compare the performance of BoVW- and CNN-based approaches for image classification as a function of the available training data. We show that CNNs benefit from growing datasets, while BoVW-based classifiers outperform CNNs when only limited data is available. Evidence is given by experiments with gradually increasing training data and visualizations of the classification models.

Christian Hentschel, Timur Pratama Wiradarma, Harald Sack
Investigating Pill Recognition Methods for a New National Library of Medicine Image Dataset

With increasing access to pharmaceuticals, chances are that medication administration errors will occur more frequently. On average, individuals above age 65 take at least 14 prescriptions per year. Unfortunately, adverse drug reactions and noncompliance are responsible for 28 % of hospitalizations of the elderly. Correctly identifying pills has become a critical task in patient care and safety. Using the recently released National Library of Medicine (NLM) pill image database, this paper investigates descriptors for pill detection and characterization. We describe efforts to develop algorithms that automatically segment NLM pill images and extract several features to assemble pill groups, with priors based on FDA recommendations for pill physical attributes. Our contributions toward pill recognition automation are three-fold: we evaluate the 1,000 most common medications in the United States, provide masks and feature matrices for the NLM reference pill images to guarantee reproducibility of results, and discuss strategies to organize data for efficient content-based image retrieval.

Daniela Ushizima, Allan Carneiro, Marcelo Souza, Fatima Medeiros
Realtime Face Verification with Lightweight Convolutional Neural Networks

Face verification is a promising method for user authentication. While existing methods use deep convolutional neural networks to handle millions of people on powerful computing systems, we propose an alternative: a lightweight convolutional neural network (CNN) scheme for face verification in real time. Our goal is a simple yet efficient method for face verification that can be deployed on regular commodity computers by individuals or small-to-medium organizations without super-computing strength. The proposed scheme targets unconstrained face verification, a typical real-world scenario. Experimental results on the original data of the Labeled Faces in the Wild dataset show that our best CNN found through experiments, with 10 hidden layers, achieves an accuracy of $$(82.58 \pm 1.30)\,\%$$, and many other instances of the same scheme approximate this result. The current implementation runs at 60 fps and 235 fps on a regular computer in CPU-only and GPU configurations, respectively. This makes it suitable for deployment in various applications without special hardware requirements.

Nhan Dam, Vinh-Tiep Nguyen, Minh N. Do, Anh-Duc Duong, Minh-Triet Tran

Virtual Reality

Frontmatter
Relighting for an Arbitrary Shape Object Under Unknown Illumination Environment

Relighting techniques can achieve photometric consistency when synthesizing a composite image. Relighting generally needs the object's shape and the illumination environment. Recent research demonstrates the possibility of relighting an object of unknown shape, or relighting an object under an unknown illumination environment; however, achieving both at once remains challenging. In this paper, we propose a relighting method for an object of unknown shape captured under an unknown illumination environment, using an RGB-D camera. The relighted object can be rendered from the pixel intensities, the surface albedo and shape of the object, and the illumination environment. The pixel intensities and the shape of the object are obtained simultaneously from the RGB-D camera. The surface albedo and illumination environment are then iteratively estimated from the pixel intensities and the object's shape. We demonstrate that our method can relight an object of dynamic shape captured under unknown illumination using an RGB-D camera.

Yohei Ogura, Hideo Saito
Evaluation of Fatigue Measurement Using Human Motor Coordination for Gesture-Based Interaction in 3D Environments

The benefits of immersive three-dimensional (3D) applications are enhanced by effective 3D interaction techniques. While gesture-based interaction provides benefits in these environments, users commonly report higher fatigue than with other interaction solutions. Typically, fatigue is measured subjectively, which may lack precision, consistency and depth. Our research proposes a novel, more consistent and predictable measure of fatigue. This paper presents the details of our technique based on human motor coordination and the results of an experimental study on gesture-based interaction, identifies contributing causes of fatigue, and outlines design guidelines to reduce fatigue in gesture-based interaction techniques. These results have implications for gesture-based and mid-air interaction techniques in 3D environments, such as virtual environments and immersive visualizations.

Neera Pradhan, Angela Benavides, Qin Zhu, Amy Ulinski Banic
JackVR: A Virtual Reality Training System for Landing Oil Rigs

We propose JackVR, an interactive immersive simulation prototype aimed at training domain experts to land jackup oil rigs. Jackup rigs are among the most common offshore drilling units for extracting oil, but the process of landing the rigs is particularly challenging because of unpredictable sea and weather conditions, limited visibility, and the risk of damaging the ocean floor. We designed JackVR to support oil engineers and technicians by allowing them to practice landing the oil rig within a safe and semi-realistic training environment. Furthermore, the design explores various superimposed spatial indicators that provide visual warnings of unexpected task conditions. The implemented prototype supports two modes of training, and utilizes the ray-casting interaction technique to enable seamless and direct control of the rig. ...

Ahmed E. Mostafa, Kazuki Takashima, Mario Costa Sousa, Ehud Sharlin
DAcImPro: A Novel Database of Acquired Image Projections and Its Application to Object Recognition

Projector-camera systems are designed to improve projection quality by comparing original images with their captured projections, which is usually complicated by high photometric and geometric variation. Many research works address this problem using their own test data, which makes it extremely difficult to compare different proposals. This paper has two main contributions. Firstly, we introduce a new database of acquired image projections (DAcImPro) which covers various photometric and geometric conditions, provides data for ground-truth computation, and can serve to evaluate different algorithms for projector-camera systems. Secondly, a new object recognition scenario based on acquired projections is presented, which could be of great interest in domains such as home video projection and public presentations. We show that this task is more challenging than the classical recognition problem and thus requires additional pre-processing, such as color compensation or projection area selection.

Aleksandr Setkov, Fabio Martinez Carillo, Michèle Gouiffès, Christian Jacquemin, Maria Vanrell, Ramon Baldrich
Deformable Object Behavior Reconstruction Derived Through Simultaneous Geometric and Material Property Estimation

We present a methodology for accurately reconstructing, within a Finite Element Model (FEM) simulation, the deformation and surface characteristics of a 3D model scanned in real time. Based on a sequence of generated surface deformations defining a reference animation, we illustrate the ability to accurately replicate the deformation behavior of an object composed of an unknown homogeneous elastic material. We formulate the procedural generation of the internal geometric structure and material parameterization required to achieve the recorded deformation behavior as a non-linear optimization problem. In this formulation, the geometric distribution (quality) and density of the tetrahedral components are optimized simultaneously with the elastic material parameters (Young's modulus and Poisson's ratio) of a procedurally generated FEM model to best match the recorded surface deformation behavior.

Shane Transue, Min-Hyung Choi

Poster

Frontmatter
Accidental Fall Detection Based on Skeleton Joint Correlation and Activity Boundary

We propose a system to detect accidental falls from walking or sitting activity in a nursing home. Unlike trajectory-tracing techniques, which detect periodic movements, our algorithm explores secondary features (angle and distance), focusing on the correlation between joints and the boundary of this correlation. We generated skeleton joint data using the Kinect sensor because it is affordable and supports a sufficiently large capture space; however, other similar smart sensors can also be used. The angle feature denotes the correlation between the normal vector of the floor and the vector linking the knee and ankle (on the left and right legs separately). The distance feature denotes the correlation between the floor and each of several important joints. A fall is reported when the angle is greater than, and the distance less than, the respective threshold values. We created an activity database to evaluate our technique, in which the activities simulate elderly people walking, sitting and falling. Experimental results show that our algorithm is simple to implement, has low computational cost, and accurately detects 36 of 37 falling events and 57 of 57 walking and sitting activities.

Martha Magali Flores-Barranco, Mario-Alberto Ibarra-Mazano, Irene Cheng
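The angle/distance decision rule described above can be sketched as follows. This is an illustrative toy, not the authors' calibrated system: the floor normal, joint coordinates and both thresholds are assumed values chosen for the example.

```python
import numpy as np

def angle_to_floor_normal(knee, ankle, floor_normal=(0.0, 1.0, 0.0)):
    """Angle (degrees) between the floor normal and the ankle->knee vector."""
    v = np.asarray(knee, float) - np.asarray(ankle, float)
    n = np.asarray(floor_normal, float)
    cos = np.dot(v, n) / (np.linalg.norm(v) * np.linalg.norm(n))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def is_fall(knee, ankle, joint_heights, angle_thresh=60.0, height_thresh=0.4):
    # A fall is flagged when the lower-leg angle exceeds its threshold
    # AND the tracked joints are all close to the floor (both thresholds
    # here are illustrative, not the paper's values).
    angle = angle_to_floor_normal(knee, ankle)
    return angle > angle_thresh and max(joint_heights) < height_thresh

# Standing: the lower leg is roughly parallel to the floor normal
# and the upper-body joints are well above the floor.
standing = is_fall(knee=(0, 0.5, 0), ankle=(0, 0.1, 0),
                   joint_heights=[0.5, 0.9, 1.2])  # expected: False
# Lying down: the lower leg is nearly parallel to the floor and all
# joints are close to it.
fallen = is_fall(knee=(0, 0.1, 0), ankle=(0.5, 0.08, 0),
                 joint_heights=[0.1, 0.15, 0.2])  # expected: True
```

In the real system both features are computed per frame from Kinect skeleton joints, and the thresholds define the activity boundary between normal motion and a fall.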
Generalized Wishart Processes for Interpolation Over Diffusion Tensor Fields

Diffusion Magnetic Resonance Imaging (dMRI) is a non-invasive tool for observing the microstructure of fibrous nerve and muscle tissue. From dMRI, it is possible to estimate rank-2 diffusion tensor imaging (DTI) fields, which are widely used in clinical applications: tissue segmentation, fiber tractography, brain atlas construction, and brain conductivity models, among others. Due to hardware limitations of MRI scanners, DTI acquisition faces a difficult compromise between spatial resolution and signal-to-noise ratio (SNR). For this reason, the data are often acquired at very low resolution. To enhance the resolution of DTI data, interpolation provides an attractive software solution. The aim of this work is to develop a methodology for DTI interpolation that enhances the spatial resolution of DTI fields. We assume that a DTI field follows a recently introduced stochastic process known as a generalized Wishart process (GWP), which we use as a prior over the diffusion tensor field. For posterior inference, we use Markov Chain Monte Carlo methods. We perform experiments on toy and real data. The GWP outperforms other methods from the literature under different validation protocols.

Hernán Darío Vargas Cardona, Mauricio A. Álvarez, Álvaro A. Orozco
Spatio-Temporal Fusion for Learning of Regions of Interests Over Multiple Video Streams

Video surveillance systems must process and manage a growing amount of data captured over a network of cameras for various recognition tasks. In order to limit human labour and error, this paper presents a spatio-temporal fusion approach to accurately combine information from Region of Interest (RoI) batches captured in a multi-camera surveillance scenario. Feature-level and score-level approaches are proposed for spatio-temporal fusion of information over frames, in a framework based on ensembles of GMM-UBMs (Gaussian mixture model universal background models). At the feature level, the features in a batch of multiple frames are combined and fed to the ensemble, whereas at the score level the outcomes of the ensemble for individual frames are combined. Results indicate that feature-level fusion provides higher accuracy in a very efficient way.

Samaneh Khoshrou, Jaime S. Cardoso, Eric Granger, Luís F. Teixeira
Patch Selection for Single Image Deblurring Based on a Coalitional Game

Most single-image deblurring methods estimate the blur kernel from the whole image; however, this may lead to incorrect estimation and more computation. In this paper, we focus on accelerating the blind deconvolution algorithm and increasing the accuracy of kernel estimation by using only a small region of the image for kernel estimation. The problem then becomes finding the most suitable region. We first find informative pixels to locate useful patches. Inspired by game theory, we propose a coalitional-game-based patch selection method to choose a group of patches for kernel estimation. In this game, each patch represents a player, and our goal is to find the coalition with the maximal payoff. The Shapley value is applied to fairly distribute the utility to each player. We show the speed-up and the quality improvement of our method on both real-world and synthetic images.

Jung-Hsuan Lin, Rong-Sheng Wang, Jing-wei Wang
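The Shapley-value allocation used above can be sketched exactly for a small set of patches. The payoff function below is a hypothetical toy with diminishing returns, standing in for the paper's patch-informativeness payoff; with only three players, the exact average over all join orders is cheap to compute.

```python
from itertools import permutations
from math import factorial

# Hypothetical coalition payoffs: how useful each group of patches is
# for kernel estimation (toy values, not the paper's payoff function).
value = {
    frozenset(): 0.0,
    frozenset({"p1"}): 4.0,
    frozenset({"p2"}): 3.0,
    frozenset({"p3"}): 1.0,
    frozenset({"p1", "p2"}): 6.0,
    frozenset({"p1", "p3"}): 4.5,
    frozenset({"p2", "p3"}): 3.5,
    frozenset({"p1", "p2", "p3"}): 6.5,
}
players = ["p1", "p2", "p3"]

def shapley(player):
    # Average marginal contribution of `player` over all join orders.
    total = 0.0
    for order in permutations(players):
        before = frozenset(order[:order.index(player)])
        total += value[before | {player}] - value[before]
    return total / factorial(len(players))

shap = {p: shapley(p) for p in players}
# Efficiency property: the Shapley values sum to the grand coalition's payoff.
assert abs(sum(shap.values()) - value[frozenset(players)]) < 1e-9
best = max(shap, key=shap.get)  # the most valuable patch (here "p1")
```

For more than a handful of patches, the exact computation is exponential, so practical systems would sample permutations instead of enumerating them.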
A Robust Real-Time Road Detection Algorithm Using Color and Edge Information

A vision-based road detection technique is important for implementing a safe driving assistance system. A major problem in vision-based road detection is sensitivity to environmental change, especially illumination change. A novel framework is proposed for robust road detection using a color model with a separable brightness component. Road candidate areas are selected using an adaptive thresholding method, then fast region merging is performed based on a threshold value. Extracted road contours are filtered using edge information. Experimental results show the proposed algorithm is robust under illumination changes.

Jae-Hyun Nam, Seung-Hoon Yang, Woong Hu, Byung-Gyu Kim
SeLibCV: A Service Library for Computer Vision Researchers

Image feature detectors and descriptors have enabled major advances in several computer vision applications, including object recognition, image registration, remote sensing, panorama stitching, and 3D surface reconstruction. Most of these fundamental algorithms are complicated to implement, and their implementations are available for only a few platforms. This operational restriction makes them difficult to use and, moreover, makes it harder to establish novel experiments and develop new research ideas. SeLibCV is a Software as a Service (SaaS) library for computer vision researchers worldwide that facilitates Rapid Application Development (RAD) and provides application-to-application interaction through tiny services accessible over the Internet. Its functionality covers a wide range of computer vision algorithms, including image processing, feature extraction, motion detection, visualization, and 3D surface reconstruction. The present paper focuses on SeLibCV’s routines for local feature detection, extraction, and matching, which offer reusable and platform-independent components, leading to reproducible research for computer vision scientists. SeLibCV is freely available at http://selibcv.org for any academic, educational, and research purposes.

Ahmad P. Tafti, Hamid Hassannia, Dee Piziak, Zeyun Yu
Bicycle Detection Using HOG, HSC and MLBP

Due to the growing number of bicycles on roads, the safety of bicyclists is drawing increasing attention from transportation departments. Intelligent Transportation Systems (ITS) use automated tools for processing and analyzing traffic video data to plan and implement safety measures. One of the important factors that influence planning and safety countermeasures for bicyclists is the bicycle count. In this paper, we develop a bicycle detection method that can be used in a bicycle counting system. We strive to improve detection efficiency by looking for classification features that deliver more versatile information to automatic classifiers. We explore a combination of Histograms of Oriented Gradients (HOG), Histograms of Shearlet Coefficients (HSC) and Multi-scale Local Binary Patterns (MLBP) to improve the detection and counting of bicycles in video data. It is shown that the combination of these features yields higher detection accuracy.

Farideh Foroozandeh Shahraki, Ali Pour Yazdanpanah, Emma E. Regentova, Venkatesan Muthukumar
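Of the three feature families above, the local binary pattern is the simplest to illustrate. The sketch below computes a basic single-radius 8-neighbor LBP in NumPy (the multi-scale variant in the paper repeats this at several radii); the test image and the idea of histogramming the codes as a feature vector are illustrative, not the paper's exact configuration.

```python
import numpy as np

def lbp_radius1(img):
    """Basic 8-neighbor local binary pattern; border pixels are skipped."""
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=int)
    # Neighbor offsets in clockwise order; each contributes one code bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= center).astype(int) << bit
    return codes

# A vertical step edge; the histogram of codes would then be concatenated
# with HOG and HSC features and fed to a classifier.
img = np.zeros((6, 6), dtype=np.uint8)
img[:, 3:] = 255
codes = lbp_radius1(img)
hist = np.bincount(codes.ravel(), minlength=256)
```

Flat regions produce the all-ones code 255 (every neighbor is >= the center), while pixels on the bright side of the edge lose the three bits pointing into the dark side, giving a distinct code that makes the edge visible in the histogram.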
On Calibration and Alignment of Point Clouds in a Network of RGB-D Sensors for Tracking

This paper investigates the integration of multiple time-of-flight (ToF) depth sensors for the purposes of general 3D tracking, and specifically tracking of the hands. The advantage of using a network of multiple sensors is the increased viewing coverage, as well as the ability to capture a more complete 3D point-cloud representation of the object. Given an ideal point-cloud representation, tracking can be accomplished without first reconstructing a mesh representation of the object. In utilizing a network of depth sensors, calibration between the sensors and the subsequent alignment of the point clouds pose key challenges. While there has been research on merging and aligning scenes containing larger objects such as the human body, little research focuses on smaller and more complicated objects such as the human hand. This paper presents a study of ways to merge and align the point clouds from a network of sensors for object and feature tracking from the combined point clouds.

George Xu, Shahram Payandeh
Semantic Web Technologies for Object Tracking and Video Analytics

As demonstrated in several research contexts, some of the best-performing state-of-the-art algorithms for object tracking integrate a traditional bottom-up approach with some knowledge of the scene and the aims of the algorithm. In this paper, we propose the use of Semantic Web technology for representing high-level knowledge describing the elements of the scene to be analysed. In particular, we demonstrate how to use the OWL ontology language to describe scene elements and their relationships, together with a SPARQL-based rule language to make inferences on this knowledge. The implemented proof-of-concept prototype is able to track people even when occlusions between persons and/or objects occur, using only the bounding box dimensions, positions and directions. We also demonstrate how Semantic Web technology enables powerful video analytics functions for video surveillance applications.

Benoit Gaüzère, Claudia Greco, Pierluigi Ritrovato, Alessia Saggese, Mario Vento
Home Oriented Virtual e-Rehabilitation

We present a collaborative framework providing a simple solution for home-oriented rehabilitation of post-stroke patients. Our final aim is to build a system that acts like a therapist, giving sound advice to patients and improving their confidence to perform daily routine activities independently. In this study, we discuss our rehab system, with strong emphasis on the techniques implemented to integrate the rehab robot with the virtual reality games. Experimental observations prove the feasibility of our system.

Yogendra Patil, Iara Brandão, Guilherme Siqueira, Fei Hu
WHAT2PRINT: Learning Image Evaluation

The popularity of digital photography has changed the way images are taken, processed, and stored. This has created a demand for systems that can evaluate the aesthetic quality of images. Applications that automatically assess image aesthetic quality, or modify images to raise it, are widely available, but applications that automatically select aesthetic images from a given image collection are limited. The goal of this project is to create a portable application that can recommend images from a given collection using criteria learned from user preferences. We train a Support Vector Machine on seven extracted image features. The system achieves a correct prediction rate of 70 % on a public image dataset. The use of additional or improved features should yield increased prediction rates.

Bohao She, Clark F. Olson
Use of a Large Image Repository to Enhance Domain Dataset for Flyer Classification

This paper describes our exploratory work on supplementing a dataset of images extracted from real estate flyers with images from a large general image repository, to broaden the samples and create a classification model that performs well on entirely unseen instances. We selected images from the Scene UNderstanding (SUN) database annotated with scene categories that match our flyer images, and added them to our flyer dataset. We ran a series of experiments with various configurations of the flyer vs. SUN data mix. The results showed that classification models trained on a mixture of SUN and flyer images produced accuracy comparable to models trained solely on flyer images. This suggests that we were able to create a model which scales to unseen, new data without sacrificing accuracy on the data at hand.

Payam Pourashraf, Noriko Tomuro
Illumination Invariant Robust Likelihood Estimator for Particle Filtering Based Target Tracking

Tracking visual targets under illumination changes is a challenging problem, especially when the illumination varies across different regions of the target. In this paper, we address illumination-invariant tracking during likelihood estimation within the particle filter. Existing particle filter based tracking frameworks mainly deal with changes in illumination through the choice of color space or features. This paper presents an alternative likelihood estimation algorithm that handles illumination changes using a homomorphic-filtering-based weighted illumination model. That is, a homomorphic filter is first used to separate the illumination and reflectance components of the image, and by associating an appropriate weight with the illumination, the target image is reconstructed for accurately measuring the likelihood. The proposed algorithm is implemented in a simple particle filter tracking framework and compared against other tracking algorithms on scenarios with large illumination variations.

Buti Al Delail, Harish Bhaskar, M. Jamal Zemerly, Mohammed Al-Mualla
Adaptive Flocking Control of Multiple Unmanned Ground Vehicles by Using a UAV

In this paper we discuss adaptive flocking control of multiple Unmanned Ground Vehicles (UGVs) using an Unmanned Aerial Vehicle (UAV). We utilize a quadrotor to provide the positions of all agents and to manage the shrinking or expanding of the flock in response to environmental changes. The proposed method adaptively changes the sensing range of the ground robots as the quadrotor's attitude changes. Simulation results show the effectiveness of the proposed method.

Mohammad Jafari, Shamik Sengupta, Hung Manh La
Basic Study of Automated Diagnosis of Viral Plant Diseases Using Convolutional Neural Networks

Detecting plant diseases is usually difficult without expert knowledge. Therefore, fast and accurate automated diagnostic methods are highly desired in agricultural fields. Several studies on automated plant disease diagnosis have been conducted using machine learning methods. However, with these methods it can be difficult to detect regions of interest (ROIs) and to design and implement efficient parameters. In this study, we present a novel plant disease detection system based on convolutional neural networks (CNNs). Using only training images, a CNN can automatically acquire the features required for classification and achieve high classification performance. We used a total of 800 cucumber leaf images to train the CNN with our techniques. Under a 4-fold cross-validation strategy, the proposed CNN-based system (which also extends the training dataset by generating additional images) achieves an average accuracy of 94.9 % in classifying cucumber leaves into two typical disease classes and a non-diseased class.

Yusuke Kawasaki, Hiroyuki Uga, Satoshi Kagiwada, Hitoshi Iyatomi
Efficient Training of Evolution-Constructed Features

Evolution-Constructed (ECO) features have been shown to be effective for general object recognition. ECO features use evolution strategies to build series of transforms and can thus be generated automatically without involving a human expert. We improve on our ECO features algorithm by reducing the dimensionality of the features before feeding them to the classifier, producing more effective ECO features. This efficient training allows the features to represent images more robustly.

Meng Zhang, Dah-Jye Lee
Ground Extraction from Terrestrial LiDAR Scans Using 2D-3D Neighborhood Graphs

We introduce a new method for filtering terrestrial LiDAR data into two categories: ground points and object points. Our method consists of four steps. First, we propose a graph-based feature obtained by combining 2D and 3D neighborhood graphs: each point is assigned the count of its common neighbors in the 2D and 3D graphs. This feature discriminates between terrain and object points, as terrain points tend to have the same neighbors in both graphs, while off-terrain points tend to have fewer neighbors in common. In the second step, we use the c-means algorithm to quantize the feature space into two clusters, terrain points and object points. The third step repeats the first two steps with different neighborhood sizes for constructing the k-nearest-neighbor (KNN) graphs. In the final step, we propose a decision-level fusion scheme that combines the results obtained in the third step to achieve higher accuracy. Experiments show the effectiveness of our method.

Yassine Belkhouche, Prakash Duraisamy, Bill Buckles
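The common-neighbor feature described above can be sketched directly: build a KNN graph once on the full 3D cloud and once on its 2D (x, y) projection, then count the overlap per point. This is an illustrative brute-force version, not the authors' implementation:

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force k-nearest-neighbor indices for each point (excluding itself)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def common_neighbor_feature(xyz, k=4):
    """Per point, count how many KNN neighbors agree between the 2D (x, y)
    projection and the full 3D cloud; terrain points tend to score high."""
    nn3 = knn_indices(xyz, k)
    nn2 = knn_indices(xyz[:, :2], k)
    return np.array([len(set(a) & set(b)) for a, b in zip(nn3, nn2)])

# Flat ground patch plus one elevated point: elevation changes the 3D
# neighborhoods of nearby points while the 2D projection stays the same.
pts = np.array([[x, y, 0.0] for x in range(3) for y in range(3)], dtype=float)
pts[4, 2] = 5.0                            # raise the center point
print(common_neighbor_feature(pts, k=4))
```

On a perfectly flat cloud the 2D and 3D graphs coincide, so every point scores the full k; raising a point perturbs the 3D neighborhoods around it and lowers the scores, which is the discrimination signal the abstract describes.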
Mass Segmentation in Mammograms Based on the Combination of the Spiking Cortical Model (SCM) and the Improved CV Model

In this paper, a novel method based on the CV model is proposed for mass segmentation. First, largest-connected-region selection, seeded region growing, and singular value decomposition (SVD) are used for pre-processing. The Spiking Cortical Model (SCM) is then applied to the pre-processed image to locate the lesion. Finally, the mass boundary is accurately segmented by the improved CV model. The validity of the proposed method is evaluated on two well-known digitized datasets (DDSM and MIAS), with performance measured by detection rate and area overlap. The results indicate that the proposed scheme obtains better performance than several existing schemes.

Xiaoli Gao, Keju Wang, Yanan Guo, Zhen Yang, Yide Ma
High Performance and Efficient Facial Recognition Using Norm of ICA/Multiwavelet Features

In this paper, a supervised facial recognition system is proposed. For feature extraction, a Two-Dimensional Discrete Multiwavelet Transform (2D DMWT) is applied to the training databases to compress the data and extract useful information from the face images. Then, a Two-Dimensional Fast Independent Component Analysis (2D FastICA) is applied to different combinations of poses corresponding to the subimages of the low-low frequency subband of the MWT, and the ℓ2-norm of the resulting features is computed to obtain discriminating and independent features while achieving significant dimensionality reduction. The compact features are fed to a Neural Network (NNT) based classifier to identify the unknown images. The proposed techniques are evaluated using three different databases, namely ORL, YALE, and FERET. The recognition rates are measured using K-fold cross-validation. The proposed approach is shown to yield significant improvements in storage requirements, computational complexity, and recognition rates over existing approaches.

Ahmed Aldhahab, George Atia, Wasfy B. Mikhael
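The dimensionality-reduction step above collapses each independent component to a single scalar via its ℓ2-norm. A toy illustration with assumed shapes (8 components of 256 coefficients each; the paper's actual dimensions differ):

```python
import numpy as np

# Suppose each face pose yields a matrix of independent components
# (rows = components, columns = coefficients of a low-low subband subimage).
# Replacing each component by its l2-norm collapses the matrix to a short
# descriptor -- the kind of dimensionality reduction the abstract describes.
rng = np.random.default_rng(0)
components = rng.standard_normal((8, 256))   # 8 ICA components, 256 coefficients each

descriptor = np.linalg.norm(components, axis=1)  # one scalar per component
print(descriptor.shape)  # (8,)
```

The resulting 8-value descriptor, rather than the full 8x256 matrix, is what would be fed to the neural network classifier.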
Dynamic Hand Gesture Recognition Using Generalized Time Warping and Deep Belief Networks

Body gestures play an important role in human communication; hand gestures in particular are the most distinctive features in sign languages. Several works have been proposed to recognize hand gestures using static and dynamic approaches. Nevertheless, due to the high variety of signs and the dynamic changes exhibited in different hand motions, a strategy for modeling these dynamic changes in hand signs is required. In this work we propose a framework for dynamic hand gesture recognition using Generalized Time Warping (GTW), a well-known method for aligning time series. Several features based on texture descriptors are extracted from the aligned sequences of hand gestures. A methodology for hand motion recognition is then carried out based on Convolutional Neural Networks. The obtained results show that the proposed methodology allows accurate recognition of several hand gestures from the RVL-SLLL American Sign Language Database.

Cristian A. Torres-Valencia, Hernán F. García, Germán A. Holguín, Mauricio A. Álvarez, Álvaro Orozco
Gaussian Processes for Slice-Based Super-Resolution MR Images

Magnetic resonance imaging (MRI) is a medical technique used in radiology to obtain anatomical images of healthy and pathological tissues. Due to hardware limitations and clinical protocols, MRI data are often acquired at low resolution. For this reason, the scientific community has been developing super-resolution (SR) methodologies to enhance spatial resolution through post-processing of 2D multi-slice images. Enhanced spatial resolution in magnetic resonance (MR) images improves clinical procedures such as tissue segmentation, registration, and disease diagnosis. Several methods for SR-MR imaging have been proposed; however, they suffer from different drawbacks: sensitivity to noise, high computational cost, and complex optimization algorithms. In this paper, we develop a supervised learning methodology that performs SR-MR imaging using patch-based Gaussian process regression (GPR). We compare our approach with nearest-neighbor interpolation, B-splines, and an SR-GPR scheme based on nearest neighbors. We test our SR-GPR algorithm on MRI-T1 and MRI-T2 studies, evaluating performance through error metrics and morphological validation (tissue segmentation). The results obtained with our methodology outperform the other alternatives for all validation protocols.

Hernán Darío Vargas Cardona, Andrés F. López-Lopera, Álvaro A. Orozco, Mauricio A. Álvarez, Juan Antonio Hernández Tamames, Norberto Malpica
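Patch-based GPR of the kind described above learns a mapping from flattened low-resolution patches to high-resolution targets via the standard GP posterior mean. A minimal sketch with a toy target (the true method regresses high-resolution patch intensities; here the target is a stand-in function of the patch):

```python
import numpy as np

def rbf(a, b, ell=1.0, s2=1.0):
    """Squared-exponential kernel between two sets of flattened patches."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return s2 * np.exp(-0.5 * d2 / ell**2)

def gpr_predict(x_train, y_train, x_test, noise=1e-4):
    """Standard GP regression posterior mean: K*^T (K + noise*I)^-1 y."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    return Ks.T @ np.linalg.solve(K, y_train)

# Toy mapping: flattened 2x2 low-res patches -> a scalar high-res target
# (their mean intensity), standing in for the real patch regression.
rng = np.random.default_rng(1)
lo = rng.uniform(size=(50, 4))        # 50 flattened low-res patches
hi = lo.mean(axis=1)                  # stand-in for the high-res target
pred = gpr_predict(lo, hi, lo[:5])
print(np.round(pred - hi[:5], 3))     # near-zero training residuals
```

With a small noise term the posterior mean nearly interpolates the training patches, which is the behavior a patch-based SR regressor relies on.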
Congestion-Aware Warehouse Flow Analysis and Optimization

Generating realistic configurations of urban models is a vital part of the modeling process, especially if these models are used for evaluation and analysis. In this work, we address the problem of assigning objects to their storage locations inside a warehouse, which has a great impact on the quality of operations within it. Existing storage policies aim to improve efficiency by minimizing travel time or by classifying items based on some features. We go beyond existing methods by analyzing the warehouse layout network to understand the factors that affect traffic within the warehouse. We use simulated-annealing-based sampling to assign items to their storage locations while reducing traffic congestion and speeding up order-picking processes. The proposed method enables a range of applications including efficient storage assignment, warehouse reliability evaluation, and traffic congestion estimation.

Sawsan AlHalawani, Niloy J. Mitra
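Simulated-annealing-based assignment as mentioned above can be sketched generically: propose a swap of two items' slots, always accept improvements, and accept worsening moves with probability exp(-delta/T) under a cooling schedule. The cost function below is a hypothetical congestion proxy, not the paper's objective:

```python
import math, random

def anneal(cost, assignment, t0=1.0, cooling=0.995, steps=5000, seed=0):
    """Generic simulated-annealing search over item->slot assignments."""
    rng = random.Random(seed)
    cur = list(assignment)
    cur_cost = cost(cur)
    best, best_cost = list(cur), cur_cost
    t = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(cur)), 2)
        cur[i], cur[j] = cur[j], cur[i]        # propose: swap two storage slots
        new_cost = cost(cur)
        if new_cost <= cur_cost or rng.random() < math.exp((cur_cost - new_cost) / t):
            cur_cost = new_cost                 # accept the move
            if new_cost < best_cost:
                best, best_cost = list(cur), new_cost
        else:
            cur[i], cur[j] = cur[j], cur[i]    # reject: undo the swap
        t *= cooling                            # cool down
    return best

# Toy congestion proxy: frequently picked items (high freq) should sit in
# slots close to the depot (low slot number).
freq = [9, 7, 5, 3, 1]
cost = lambda slots: sum(f * s for f, s in zip(freq, slots))
print(anneal(cost, [4, 3, 2, 1, 0]))
```

Starting from the worst ordering, the search drifts toward the frequency-ordered layout, trading exploration (early, hot) for exploitation (late, cold).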
Building of Readable Decision Trees for Automated Melanoma Discrimination

Even expert dermatologists cannot easily diagnose a melanoma, because its appearance is often similar to that of a nevus, in particular in its early stage. For this reason, studies of automated melanoma discrimination using image analysis have been conducted. However, no systematic studies exist that offer grounds for the discrimination result in a readable form. In this paper, we propose an automated melanoma discrimination system that is capable of providing not only the discrimination results but also their grounds, by means of a Random Forest (RF) technique. Our system was constructed from a total of 1,148 dermoscopy images (168 melanomas and 980 nevi) and uses only their color features in order to ensure the readability of the grounds for the discrimination results. By virtue of our efficient feature selection procedure, our system provides accurate discrimination results (a sensitivity of 79.8 % and a specificity of 80.7 % with 10-fold cross-validation) under human-oriented limitations and presents the grounds for the results in an intelligible format.

Keiichi Ohki, M. Emre Celebi, Gerald Schaefer, Hitoshi Iyatomi
A Novel Infrastructure for Supporting Display Ecologies

We introduce a novel approach for display ecologies that aims to support users in presentation and discussion scenarios by applying assistance from Smart Meeting Rooms (SMR). We present an infrastructure that allows multiple users to easily integrate their mobile devices into the device ensemble of a SMR and to utilize its large displays to show contents like slides, pictures and other data visualizations. With a tailored editor, multiple users can easily share contents and interactively coordinate the display of information. The content is automatically distributed to the displays of the SMR based on user-defined spatial and temporal links between contents as well as on semantic networks. Further, intention recognition is used to automatically adapt the representation of contents with regard to the current situation. In this way we provide a user-driven smart steering that supports users by automatically reducing their effort to configure and to work with display ecologies.

Christian Eichner, Martin Nyolt, Heidrun Schumann
Visualizing Software Metrics in a Software System Hierarchy

Various software metrics can be derived from a software system to measure its inherent quantitative properties. The general problem with these metrics is that many of them may exist with varying values, making exploration of the raw metric data a challenging task. As another data dimension, we have to deal with the hierarchical organization of the software system, since we are also interested in software metric correlations or anomalies on different hierarchy levels. In this paper we introduce a visualization concept which shows the hierarchical organization of the software system on the one hand and the list of software metrics attached to each hierarchy level on the other. This interactive technique exploits the fast pattern recognition of the human visual system to derive similar or differing metric patterns in the software hierarchy. The provided visualization technique targets the rapid discovery of insights in the typically vast amounts of multivariate and hierarchical software metric data. We illustrate the usefulness of our approach in a case study investigating more than 70 software metrics in the Eclipse open source software project.

Michael Burch
Region Growing Selection Technique for Dense Volume Visualization

Selection is a fundamental task in volume visualization as it is often the first step for manipulation and analysis tasks. The presented work describes and investigates a novel 3-Dimensional (3D) selection technique for dense clouds of points. This technique solves issues with current selection techniques employed in such applications by allowing users to select similar regions of datasets without requiring prior knowledge about the structures within the data, thus bypassing occlusion and high density. We designed a prototype and experimented on large dense volumetric datasets. The preliminary results of our performance evaluation and the user-simulated test show encouraging results and indicate in which environments this technique could have high potential.

Lionel B. Sakou, Daniel Wilches, Amy Banic
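Region-growing selection as described above expands from a seed to connected, similar data. A minimal flood-fill interpretation on a tiny voxel grid (the paper's similarity criterion and data representation may differ):

```python
from collections import deque

def region_grow(volume, seed, tol):
    """Select all voxels 6-connected to `seed` whose value stays within
    `tol` of the seed value -- a minimal sketch of region-growing selection."""
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    base = volume[seed[0]][seed[1]][seed[2]]
    selected, queue = {seed}, deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
            n = (z + dz, y + dy, x + dx)
            if (0 <= n[0] < depth and 0 <= n[1] < rows and 0 <= n[2] < cols
                    and n not in selected
                    and abs(volume[n[0]][n[1]][n[2]] - base) <= tol):
                selected.add(n)
                queue.append(n)
    return selected

# 2x2x2 volume: one dense corner voxel (value 9) in sparse space (value 1).
vol = [[[9, 1], [1, 1]], [[1, 1], [1, 1]]]
print(len(region_grow(vol, (0, 0, 0), tol=0)))  # 1: only the seed matches
```

The user supplies only a seed and a tolerance; the growth then delimits the similar region automatically, which is how such a technique bypasses occlusion and density without prior knowledge of the structures.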
Computing Voronoi Diagrams of Line Segments in ℝᵏ in O(n log n) Time

The theoretical bounds on the time required to compute a Voronoi diagram of line segments in 3D are the lower bound of Ω(n²) and the upper bound of O(n^(3+ε)). We present a method here for computing Voronoi diagrams of line segments in O(2^a·k·n·log²n + 2^b·k·n·log²n + 14n + 12c₁n) time for k-dimensional space. We also present a modification to the Bowyer-Watson method that brings its runtime down to a tight O(n log n).

Jeffrey W. Holcomb, Jorge A. Cobb
Visualizing Aldo Giorgini’s Ideal Flow

This paper offers a more detailed analysis of the mathematical methods that Aldo Giorgini used and aims to serve as a supplement to the biographical sketch of Giorgini's life, Cybernethisms. Giorgini's career as a civil engineer led him to become a pioneer of computer art, as he repurposed frameworks for turbulence visualization into works of art. Specifically, we studied Giorgini's original materials to gain insight into the mathematical methods he used in his visualizations of fluid dynamics. Given today's computer memory and specialized frameworks, the limitations that Giorgini worked within now seem foreign. We wanted to further understand Giorgini's use of technology and make it more accessible by creating an interactive web tool named Ideal Flow that allows users to see fluid dynamics variations in real time. Using WebGL, we were able to implement Giorgini's frameworks with a modern tool and thus fully enable new viewers to celebrate Giorgini's contributions. Future generations now have the opportunity to familiarize themselves with Giorgini's unique approach to art and science, as well as computer art history in general.

Esteban Garcia Bravo, Tim McGraw
Restoration of Blurred-Noisy Images Through the Concept of Bilevel Programming

Finding a compromise between regularization to remove noise and preserving image fidelity in natural images is unarguably a non-trivial problem. This paper proposes a new image restoration algorithm that makes an optimal tradeoff between sharpness and noise, based on bilevel programming, to guarantee an acceptable restoration result. The lower-level problem denoises the degraded image using a curvelet-based method, while the upper-level problem obtains the restored image by deblurring the denoised image with an improved Wiener filter. Experiments were conducted on synthetically blurred and noisy images. The experimental results show that the algorithm successfully restores image detail. Numerical measurements of image quality reveal that the algorithm is comparable with other state-of-the-art methods and has an advantage in image contrast and edge preservation.

Jessica Soo Mee Wong, Chee Seng Chan
Free-Form Tetrahedron Deformation

Deformation mechanics in combination with artistic control allows the creation of remarkably fluid and life-like 3-dimensional models. Slightly deforming and distorting a graphical mesh injects vibrant harmonious characteristics that would otherwise be lacking. Having said that, the deformation of complex high-poly shapes remains a challenging and important problem (i.e., finding a solution that is computationally fast, exploits parallel architectures such as the graphics processing unit, is controllable, and produces aesthetically pleasing results). We present a solution that addresses these problems by combining a tetrahedron interpolation method with an automated tetrahedralization partitioning algorithm. In this paper we focus on 3-dimensional tetrahedron meshes, although our technique is applicable to both 3-dimensional (tetrahedron) and 2-dimensional (triangulated planar) meshes. With this in mind, we compare and review free-form deformation techniques from the past few years. We also show experimental results demonstrating our algorithm's advantages and simplicity compared to other, more esoteric approaches.

Ben Kenwright
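Tetrahedron interpolation of the kind named above moves an embedded point with its enclosing tetrahedron: compute the point's barycentric coordinates in the rest-pose cage, then apply the same weights to the deformed vertices. A self-contained sketch (the paper's partitioning and GPU details are omitted):

```python
import numpy as np

def barycentric(p, tet):
    """Barycentric coordinates of point p w.r.t. tetrahedron `tet` (4x3 vertices)."""
    T = np.column_stack([tet[i] - tet[3] for i in range(3)])
    w = np.linalg.solve(T, p - tet[3])     # weights of the first three vertices
    return np.append(w, 1.0 - w.sum())     # fourth weight closes the partition of unity

def deform(p, tet, tet_deformed):
    """Move p with its enclosing tetrahedron: same barycentric weights,
    applied to the deformed vertex positions."""
    return barycentric(p, tet) @ tet_deformed

tet = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
stretched = tet * 2.0                      # uniformly scale the cage
p = np.array([0.25, 0.25, 0.25])           # centroid of the tetrahedron
print(deform(p, tet, stretched))           # [0.5 0.5 0.5]
```

Because the weights sum to one and are linear in the vertices, the interpolation is affine-invariant, which is why per-tetrahedron interpolation of a partitioned mesh produces smooth free-form deformation.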
Innovative Virtual Reality Application for Road Safety Education of Children in Urban Areas

In order to make children develop good safety habits on the streets, it is very important to educate them on this subject at an early age. Technological advancements allow the creation of applications for training assistance and support, and Virtual Reality and Augmented Reality are among the scientific domains best suited to successful training applications. In this paper, we present an innovative application for child risk prevention and education in urban areas. It is based on a collaborative Research & Technologies platform built around the dynamic simulation of a city, with artificial intelligence and behavior modeling for pedestrians, crowds, vehicles, and traffic in a 3D visual and audio environment. We propose an interactive scenario for child risk education and prevention and evaluate it in an autonomous city (Paris) represented in a virtual environment that includes artificial intelligence. The scenario takes into account the social implications and the relation between real and virtual actors.

Taha Ridene, Laure Leroy, Safwan Chendeb
Vision-Based Vehicle Counting with High Accuracy for Highways with Perspective View

Vehicle detection by motion is still a common method used in vision-based tracking systems due to vehicles' continuous motion on highways. However, counting accuracy is affected for highways with perspective view due to long-lasting merging events (i.e., blob merging or occlusion). In this work, a new way of counting vehicles with high accuracy using two appearance-based classifiers is proposed to detect merging situations and correct vehicle counts. Experimental results on three Las Vegas highways with differing perspective views and congestion levels show improvement in counting and the general applicability of the proposed method. Moreover, tracking and counting results on a highly cluttered highway indicate a greater counting improvement (89 % to 94 %) for highly congested situations.

Mohammad Shokrolah Shirazi, Brendan Morris
Automatic Motion Classification for Advanced Driver Assistance Systems

Many computer vision applications require motion detection and analysis. In this research, a newly developed feature descriptor is used to find sparse motion vectors. Based on the resulting sparse motion field, the camera motion is detected and analyzed. Statistical analysis is performed on the polar representation of the motion vectors, and the direction of motion is classified based on the results. The motion field is further used for depth analysis. The proposed method is evaluated on two video sequences under image deformations: illumination change, blurring, and camera movement (i.e., viewpoint change). These video sequences were captured from a moving camera (a driving car) with moving objects.

Alok Desai, Dah-Jye Lee, Shreeya Mody
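Classifying motion direction from a polar representation, as described above, amounts to converting each motion vector (dx, dy) to an angle and binning it. The four-way binning below is an illustrative choice, not the paper's exact scheme:

```python
import math

def classify_direction(dx, dy):
    """Map a motion vector to a coarse direction class via its polar angle.

    Bins are 90 degrees wide, centered on 0 (right), 90 (up), 180 (left),
    and 270 (down) -- an assumed discretization for illustration.
    """
    angle = math.degrees(math.atan2(dy, dx)) % 360   # polar angle in [0, 360)
    labels = ("right", "up", "left", "down")
    return labels[int(((angle + 45.0) % 360) // 90)] # shift by half a bin, then bin

print(classify_direction(5, 1))   # right
print(classify_direction(-3, 0))  # left
```

Aggregating these labels over a sparse motion field (e.g., a histogram of bin counts) gives the dominant motion direction, which is the kind of statistic the abstract's classification relies on.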
Shared Autonomy Perception and Manipulation of Physical Device Controls

As robots begin to enter our homes and workplaces, they will have to deal with the devices and appliances that are already there. These devices are invariably designed with human perception and manipulation abilities in mind. Unfortunately, this often makes them hard for robots to interact with autonomously. Control elements, such as buttons, switches, and knobs, are often hard to identify with current sensors. Even when they are found, it is often not clear how they should be manipulated to achieve a specific goal without extensive background knowledge of the specific device and task. In this paper, we describe a shared-autonomy approach to the identification and operation of these device controls. A human operator provides assistance with perception and high-level planning, while the robot takes care of the low-level actions that depend on closed-loop sensor feedback. We demonstrate our approach by controlling a consumer electronics device, despite not being able to autonomously sense some of its control elements, and give the results of initial evaluations suggesting that the shared-autonomy interface is both easier to use and more efficient than a direct teleoperation interface.

Matthew Rueben, William D. Smart
Condition Monitoring for Image-Based Visual Servoing Using Kalman Filter

In image-based visual servoing (IBVS), the control law is based on the error between the current and desired features on the image plane. The visual servoing system works well only when all the designed features are correctly extracted. To monitor the quality of feature extraction, a condition monitoring scheme is developed in this paper. First, the failure scenarios of the visual servoing system caused by incorrect feature extraction are reviewed. Second, we propose a Kalman-filter-based residual generator that can be used to detect when a failure occurs. Finally, simulation results verify the effectiveness of the proposed method.

Mien Van, Denglu Wu, Shuzi Sam Ge, Hongliang Ren
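A Kalman-filter residual generator of the kind described above flags a fault when the innovation (measurement minus prediction) is implausibly large. A minimal scalar sketch, assuming a constant-position motion model and an assumed threshold (the paper's state model and decision rule may differ):

```python
def residual_monitor(measurements, q=1e-4, r=0.05, threshold=0.5):
    """Scalar Kalman filter tracking one feature coordinate; flags a fault
    whenever the innovation exceeds `threshold`. q is process noise,
    r is measurement noise -- both illustrative values."""
    x, p = measurements[0], 1.0
    flags = []
    for z in measurements[1:]:
        p += q                          # predict (constant-position model)
        residual = z - x                # innovation: measurement minus prediction
        if abs(residual) > threshold:
            flags.append(True)          # fault: do not fold the bad measurement in
            continue
        flags.append(False)
        k = p / (p + r)                 # Kalman gain
        x += k * residual               # measurement update
        p *= 1.0 - k
    return flags

# Smooth feature track with one wild outlier (a feature-extraction failure).
track = [0.0, 0.02, 0.05, 3.0, 0.08, 0.1]
print(residual_monitor(track))  # only the jump to 3.0 is flagged
```

Skipping the update on a flagged sample keeps the outlier from corrupting the state estimate, so subsequent good measurements are not falsely flagged.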
Backmatter
Metadata
Title
Advances in Visual Computing
Edited by
George Bebis
Richard Boyle
Bahram Parvin
Darko Koracin
Ioannis Pavlidis
Rogerio Feris
Tim McGraw
Mark Elendt
Regis Kopper
Eric Ragan
Zhao Ye
Gunther Weber
Copyright Year
2015
Electronic ISBN
978-3-319-27863-6
Print ISBN
978-3-319-27862-9
DOI
https://doi.org/10.1007/978-3-319-27863-6
