
2016 | Book

Advances in Visual Computing

12th International Symposium, ISVC 2016, Las Vegas, NV, USA, December 12-14, 2016, Proceedings, Part II

Edited by: George Bebis, Richard Boyle, Bahram Parvin, Darko Koracin, Fatih Porikli, Sandra Skaff, Alireza Entezari, Jianyuan Min, Daisuke Iwai, Amela Sadagic, Carlos Scheidegger, Tobias Isenberg

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

The two volume set LNCS 10072 and LNCS 10073 constitutes the refereed proceedings of the 12th International Symposium on Visual Computing, ISVC 2016, held in Las Vegas, NV, USA in December 2016.

The 102 revised full papers and 34 poster papers presented in this book were carefully reviewed and selected from 220 submissions. The papers are organized in topical sections: Part I (LNCS 10072) comprises computational bioimaging; computer graphics; motion and tracking; segmentation; pattern recognition; visualization; 3D mapping; modeling and surface reconstruction; advancing autonomy for aerial robotics; medical imaging; virtual reality; computer vision as a service; visual perception and robotic systems; and biometrics. Part II (LNCS 10073) comprises applications; visual surveillance; computer graphics; and virtual reality.

Table of Contents

Frontmatter

Applications

Frontmatter
A Sparse Representation Based Classification Algorithm for Chinese Food Recognition

Obesity is becoming a health problem of wide concern in most parts of the world. A computer vision based recognition system has great potential as an efficient tool to monitor food intake and cope with the growing problem of obesity. This paper proposes a food recognition algorithm based on sparse representation. The proposed algorithm learns overcomplete dictionaries from local descriptors, including texture and color features, that are extracted from food image patches. With the two learned overcomplete dictionaries, a feature vector for a food image can be generated from the sparsely encoded local descriptors. An SVM is used for classification. This research creates a Chinese food image dataset for experiments. Classifying Chinese food is more challenging because the dishes are not as visually distinguishable as Western food. The proposed algorithm achieves an average classification accuracy of 97.91% on a dataset of 5309 images comprising 18 classes. The proposed method can easily be applied to datasets with more classes. Our results demonstrate the feasibility of using the proposed algorithm for food recognition.

Haixiang Yang, Dong Zhang, Dah-Jye Lee, Minjie Huang
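The encode-and-pool pipeline the abstract describes can be sketched roughly as follows. The thresholded top-k code below is a simplified stand-in for true sparse coding over a learned dictionary, and the random dictionary, descriptor sizes, and max-pooling choice are purely illustrative assumptions:

```python
import numpy as np

def sparse_encode(descriptor, dictionary, k=3):
    """Toy sparse code: keep the k dictionary atoms most correlated
    with the descriptor and zero out the rest."""
    corr = dictionary.T @ descriptor           # correlation with each atom
    code = np.zeros_like(corr)
    top = np.argsort(np.abs(corr))[-k:]        # indices of the k strongest atoms
    code[top] = corr[top]
    return code

def image_feature(patch_descriptors, dictionary, k=3):
    """Max-pool the sparse codes of all patches into one image-level vector."""
    codes = np.stack([sparse_encode(d, dictionary, k) for d in patch_descriptors])
    return codes.max(axis=0)

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))              # 64 atoms for 16-dim descriptors
D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
patches = rng.standard_normal((10, 16))        # 10 local descriptors per image
feat = image_feature(patches, D)
print(feat.shape)                              # one fixed-length vector for the SVM
```

The resulting fixed-length vector would then be fed to the SVM classifier mentioned in the abstract.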
Guided Text Spotting for Assistive Blind Navigation in Unfamiliar Indoor Environments

Scene text in indoor environments usually preserves and communicates important contextual information which can significantly enhance the independent travel of blind and visually impaired people. In this paper, we present an assistive text spotting navigation system based on an RGB-D mobile device for blind or severely visually impaired people. Specifically, a novel spatial-temporal text localization algorithm is proposed to localize and prune text regions, by integrating stroke-specific features with a subsequent text tracking process. The density of extracted text-specific feature points serves as an efficient text indicator to guide the user closer to text-likely regions for better recognition performance. Next, detected text regions are binarized and recognized by off-the-shelf optical character recognition methods. Significant non-text indicator signage can also be matched to provide additional environment information. Both recognized results are then transferred to speech feedback for user interaction. Our proposed video text localization approach is quantitatively evaluated on the ICDAR 2013 dataset, and the experimental results demonstrate the effectiveness of our proposed method.

Xuejian Rong, Bing Li, J. Pablo Muñoz, Jizhong Xiao, Aries Arditi, Yingli Tian
Automatic Oil Reserve Analysis Through the Shadows of Exterior Floating Crest Oil Tanks in Highlight Optical Satellite Images

Oil reserve strategies have been implemented in many countries. Although automatic oil reserve analysis could help estimate the relationship between supply and demand, it is a challenging task and few studies have been done. As the crests of exterior floating crest oil tanks float up and down according to internal storage, their shadow information can be utilized. Here we propose a two-step framework to automatically analyze the reserve status of exterior floating crest oil tanks: first, detect the oil tanks with ELSD (for candidate extraction) and a CNN (for classification); second, analyze the reserve status through the shadows formed under good illumination. The framework is validated against a manual calculation method utilizing the view angle. The experimental results show that this method can analyze the reserve status with outstanding performance.

Qingquan Wang, Jinfang Zhang, Xiaohui Hu
Performance Evaluation of Video Summaries Using Efficient Image Euclidean Distance

Video summarization aims to manage video data by providing a succinct representation of videos; however, its evaluation is somewhat challenging. IMage Euclidean Distance (IMED) has been proposed for measuring the similarity of two images. Though it is effective and can tolerate distortion and/or small movements of objects, its computational complexity is high, on the order of $O(n^2)$. This paper proposes an efficient method for evaluating video summaries. It retrieves a set of matched frames between the automatic summary and the ground-truth summary through a two-way search, in which the similarity between two frames is measured using the Efficient IMED (EIMED), which considers only neighboring pixels rather than all pixels in the frames. Experimental results based on a publicly accessible dataset show that the proposed method is effective in finding precise matches and usually discards false ones, leading to a more objective measurement of the performance of various techniques.

Sivapriyaa Kannappan, Yonghuai Liu, Bernard Paul Tiddeman
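The idea of restricting IMED's pixel-coupling matrix to neighboring pixels can be sketched as below. IMED computes $d^T G\, d$ for the difference image $d$, with $G_{ij}$ a Gaussian in the spatial distance between pixels; truncating $G$ to a 3x3 neighborhood turns the quadratic form into a small convolution. The neighborhood size and $\sigma$ here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def eimed(a, b, sigma=1.0):
    """Neighborhood-truncated IMED: d^T G d where the coupling matrix G
    is limited to each pixel's 3x3 neighborhood."""
    d = a.astype(float) - b.astype(float)
    # Gaussian coupling weights g_ij = exp(-|Pi - Pj|^2 / (2 sigma^2))
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    pad = np.pad(d, 1)
    acc = np.zeros_like(d)
    for dy, dx in offsets:                     # acc_i = sum_j g_ij * d_j
        w = np.exp(-(dy * dy + dx * dx) / (2 * sigma ** 2))
        acc += w * pad[1 + dy:1 + dy + d.shape[0], 1 + dx:1 + dx + d.shape[1]]
    return float((d * acc).sum())              # quadratic form with truncated G

x = np.zeros((8, 8))
y = np.zeros((8, 8)); y[4, 4] = 1.0            # one-pixel difference
print(eimed(x, y))
```

Because only 9 coupling terms survive per pixel, the cost drops from $O(n^2)$ to $O(n)$ in the number of pixels.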
RDEPS: A Combined Reaction-Diffusion Equation and Photometric Similarity Filter for Optical Image Restoration

Restoration of optical images degraded by atmospheric turbulence and various types of noise is still an open problem. In this paper, we propose an optical image restoration method based on a Reaction-Diffusion Equation and Photometric Similarity (RDEPS). We exploit the photometric similarity and geometric closeness of the optical image by combining a photometric similarity function and an appropriately defined reaction-diffusion equation. The resulting RDEPS filter is used to restore images degraded by atmospheric turbulence and noise, including Gaussian noise and impulse noise. Extensive experimental results show that our method outperforms other recently developed methods in terms of PSNR and SSIM. Moreover, a computational efficiency analysis shows that RDEPS provides efficient restoration of optical images.

Xueqing Zhao, Pavlos Mavridis, Tobias Schreck, Arjan Kuijper
Leveraging Multi-modal Analyses and Online Knowledge Base for Video Aboutness Generation

The Internet has a huge volume of unlabeled videos from diverse sources, making it difficult for video providers to organize and for viewers to consume the content. This paper defines the problem of video aboutness generation (i.e., the automatic generation of a concise natural-language description of a video) and characterizes its differences from closely related problems such as video summarization and video captioning. We then make an attempt to provide a solution to this problem. Our proposed system exploits multi-modal analyses of the audio, text, and visual content of the video and leverages the Internet to identify a top-matched aboutness description. Through an exploratory study involving human judges evaluating a variety of test videos, we found support for the proposed approach.

Raj Kumar Gupta, Yang Yinping
A Flood Detection and Warning System Based on Video Content Analysis

Floods are becoming more frequent and extreme due to climate change. Early detection is critical for providing a timely response to prevent damage to property and life. Previous methods for flood detection make use of specialized sensors or satellite imagery. In this paper, we propose a method for event detection based on video content analysis of feeds from surveillance cameras, which have become more common and readily available. Since these cameras are static, we can use image masks to identify regions of interest in the video where a flood would likely occur. We then perform background subtraction followed by image segmentation on the foreground region. The main features of a segment that we use to identify whether it is a flooded region are color, size, and edge density. We use a probabilistic model of flood color based on our set of collected flood images. We use the size of the segment relative to the frame size as another indicator, since flooded regions tend to occupy a large portion of the frame. Finally, we perform a form of ripple detection by applying edge detection and using the edge density as a possible indicator of ripples and, consequently, flood. We then broadcast an SMS message after detecting a flood event consistently across multiple frames for a specified time period. Our results show that this simple technique can adequately detect floods in real time.

Martin Joshua P. San Miguel, Conrado R. Ruiz Jr.
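The three cues described above (color match, relative size, edge density) can be combined in a per-frame decision roughly like this. The flood color model, thresholds, and gradient-based edge proxy below are all illustrative placeholders, not the paper's trained values:

```python
import numpy as np

FLOOD_MEAN = np.array([0.45, 0.40, 0.35])    # hypothetical flood color model

def flood_score(frame, background, roi_mask):
    """Decide whether the foreground inside the ROI looks like a flood,
    using color distance, relative size, and edge density (ripple proxy)."""
    # background subtraction restricted to the region of interest
    fg = (np.abs(frame - background).mean(axis=2) > 0.1) & roi_mask
    if not fg.any():
        return False
    color_dist = np.linalg.norm(frame[fg].mean(axis=0) - FLOOD_MEAN)
    rel_size = fg.mean()                      # floods cover a large area
    gy, gx = np.gradient(frame.mean(axis=2))  # cheap edge detector
    edge_density = (np.hypot(gx, gy) > 0.1)[fg].mean()
    return bool(color_dist < 0.2 and rel_size > 0.3 and edge_density > 0.05)

bg = np.zeros((10, 10, 3))
frame = bg.copy()
frame[3:10, :] = FLOOD_MEAN                   # large flood-colored region
# add vertical stripes to mimic ripple texture (edges)
frame[3:10, :] += 0.15 * np.where((np.arange(10) // 2) % 2 == 0, 1, -1)[None, :, None]
detected = flood_score(frame, bg, np.ones((10, 10), bool))
print(detected)
```

In the paper's setting, the alert is only broadcast after this per-frame decision holds consistently over a time window.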
Efficient CU Splitting Method for HEVC Intra Coding Based on Visual Saliency

Intra coding with a quadtree partition structure is a critical feature of the new High Efficiency Video Coding standard, and it also causes a dramatic increase in computational complexity. In this paper, an efficient CU splitting method based on visual saliency detection is proposed for HEVC intra coding to reduce the computational complexity. Experimental results show that the proposed method can reduce the coding complexity of the current HM by about 46.15% with only a 0.1791% increase in BD-rate and a 0.0542 dB PSNR loss.

Xin Zhou, Guangming Shi, Wei Zhou
Video Anomaly Detection Based on Adaptive Multiple Auto-Encoders

Anomaly detection in surveillance videos is a challenging problem in the computer vision community. In this paper, a novel unsupervised learning framework is proposed to detect and localize abnormal events in real time. Typical methods mainly rely on extracting complex handcrafted features and learning a single fitting model for prediction. In contrast, in our method normal events are represented using simple spatio-temporal volumes (STV), and adaptive multiple auto-encoders (AMAE) are constructed to handle the inter-class variation in normal events. When testing on an unknown frame, the reconstruction errors of the multiple auto-encoders are utilized for prediction. Experiments are performed on the UCSD Ped2 and UMN datasets. Experimental results show that our method is effective at detecting and localizing abnormal events at a speed of 70 fps.

Tianlong Bao, Chunhui Ding, Saleem Karmoshi, Ming Zhu
Comprehensive Parameter Sweep for Learning-Based Detector on Traffic Lights

Determining the optimal parameters for a given detection algorithm is not straightforward, and the final values are mostly based on experience and heuristics. In this paper we investigate the influence of three basic parameters in the widely used Aggregate Channel Features (ACF) object detector applied to traffic light detection. Additionally, we perform an exhaustive search for the optimal parameters for the night-time data from the LISA Traffic Light Dataset. The optimized detector reaches an Area-Under-Curve of 66.63% on the calculated precision-recall curve.

Morten B. Jensen, Mark P. Philipsen, Thomas B. Moeslund, Mohan Trivedi
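An exhaustive sweep of the kind described reduces to iterating over the Cartesian product of candidate values and keeping the configuration with the best score. The parameter names, ranges, and toy scoring function below are placeholders, not the paper's actual ACF settings:

```python
import itertools

def evaluate(params):
    """Stand-in for training a detector with these parameters and
    scoring it by precision-recall AUC; the formula is a toy surrogate."""
    n_octaves, n_stages, shrink = params
    return 0.5 + 0.01 * n_octaves + 0.02 * n_stages - 0.03 * shrink

# exhaustive search over all combinations of three illustrative parameters
grid = itertools.product([6, 8], [3, 4], [1, 2])
best = max(grid, key=evaluate)
print(best)
```

In practice `evaluate` would train and test the detector, which is why exhaustive sweeps are usually restricted to a small number of coarse parameter values.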
An Efficient Pedestrian Detector Based on Saliency and HOG Features Modeling

Most existing pedestrian detection approaches rely on applying descriptors to the entire image or use a sliding window that resizes the matching window at different scales and scans the image. However, these methods suffer from low computational efficiency and are time-consuming. In this paper we propose the use of saliency detection based on the contourlet transform to generate a region of interest (ROI). The resulting saliency map is then used for feature extraction with the HOG descriptor. Next, the distribution of the generated features is estimated by a two-parameter Weibull model. The resulting feature vector is then trained using a support vector regression (SVR) classifier. Thus, the proposed approach provides two contributions. (1) By designing a saliency detection step, we aim to remove noisy and busy backgrounds and focus on the area where the object exists, which enhances the accuracy of the classification process. (2) By modeling the generated features, we intend to reduce the training dimension and make the system computationally efficient in real time, or soft real time. The results of an experimental study on the challenging INRIA dataset prove the effectiveness of the proposed approach.

Mounir Errami, Mohammed Rziza

Visual Surveillance

Frontmatter
Preventing Drowning Accidents Using Thermal Cameras

Every year approximately 372,000 people die from unintentional drowning, making it a top-3 cause of unintentional injury death [1]. In Denmark, 25% of drownings happen in harbor areas [2]. To address this problem, thermal cameras have been placed strategically at a harbor. Using computer vision techniques, an automatic surveillance system for predicting and detecting drowning accidents has been implemented. First, a person detector has been implemented using simple human characteristics. The person is tracked using a Kalman filter. Using the tracker as a prior, a fall prediction is determined. A fall detector is implemented using a virtual trip-wire in combination with an optical flow algorithm, making the system able to detect 100% of all falls while yielding only 0.08 false positives per hour. The entire system has been developed using 155 h of real-life thermal video, of which 56 h are manually annotated.

Soren Bonderup, Jonas Olsson, Morten Bonderup, Thomas B. Moeslund
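The Kalman-filter tracking stage can be sketched with a standard constant-velocity model over image coordinates. The state layout and noise covariances below are generic textbook choices, not the paper's tuned values:

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)          # constant-velocity transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float)          # only (x, y) position is observed
Q = 0.01 * np.eye(4)                         # process noise (illustrative)
R = 0.25 * np.eye(2)                         # measurement noise (illustrative)

def kalman_step(x, P, z):
    """One predict/update cycle on state x with covariance P, measurement z."""
    x, P = F @ x, F @ P @ F.T + Q                       # predict
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)        # Kalman gain
    x = x + K @ (z - H @ x)                             # correct with measurement
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
for t in range(1, 20):                       # detections moving 1 px/frame in x
    x, P = kalman_step(x, P, np.array([float(t), 0.0]))
print(x[:2])                                 # tracked position near (19, 0)
```

The velocity component of the converged state is what makes the tracker useful as a prior for fall prediction: a large downward velocity toward the water line raises the alarm.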
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity Recognition Using Wearable Sensors

Due to its central role in ubiquitous health monitoring, physical activity recognition with wearable body sensors has been in the limelight in both the research and industrial communities. Physical activity recognition is difficult due to the inherent complexity of different walking styles and human body movements. We therefore present a correntropy-induced dictionary pair learning framework to achieve this recognition. Our algorithm jointly learns a synthesis dictionary and an analysis dictionary in order to simultaneously perform signal representation and classification once the time-domain features have been extracted. In particular, the dictionary pair learning algorithm is developed based on the maximum correntropy criterion, which is much less sensitive to outliers. In order to develop a more tractable and practical approach, we employ a combination of the alternating direction method of multipliers and an iteratively reweighted method to approximately minimize the objective function. We validate the effectiveness of our proposed model by employing it on an activity recognition problem and an intensity estimation problem, both of which include a large number of physical activities from the recently released PAMAP2 dataset. Experimental results indicate that classifiers built using this correntropy-induced dictionary learning framework achieve high accuracy using simple features, and that this approach gives results competitive with classical systems built upon features with prior knowledge.

Sherin M. Mathews, Chandra Kambhamettu, Kenneth E. Barner
3D Human Activity Recognition Using Skeletal Data from RGBD Sensors

In this paper, a new effective method is proposed to recognize human actions based on RGBD data sensed by a depth camera, namely the Microsoft Kinect. Skeleton data extracted from depth images is utilized to generate 10 direction features, which represent specific body parts, and 11 position features, which represent specific human joints. The fusion features composed of both are used to represent a human posture. An algorithm based on the difference level of adjacent postures is presented to select the key postures from an action. Finally, the action features, composed of the key postures' features, are classified and recognized by a multiclass Support Vector Machine. Our major contributions are a new framework to recognize users' actions and a simple and effective method to select the key postures. The recognition results on the KARD dataset and the Florence 3D Action dataset show that our approach significantly outperforms the compared methods.

Jiaxu Ling, Lihua Tian, Chen Li
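The two feature families described above can be sketched as follows: direction features as unit vectors along selected body parts, and position features as joint coordinates relative to a reference joint. The 4-joint toy skeleton and bone pairs below are made up for illustration and do not match the paper's exact 10 directions and 11 positions:

```python
import numpy as np

def direction_features(joints, bone_pairs):
    """Unit vectors along selected body parts (e.g. elbow -> wrist)."""
    feats = []
    for a, b in bone_pairs:
        v = joints[b] - joints[a]
        feats.append(v / (np.linalg.norm(v) + 1e-8))   # normalize for scale invariance
    return np.concatenate(feats)

def position_features(joints, ref=0):
    """Joint positions expressed relative to a reference joint (e.g. torso)."""
    return (joints - joints[ref]).ravel()

# toy 4-joint skeleton: hip, shoulder, elbow, wrist
joints = np.array([[0, 0, 0], [0, 1, 0], [1, 1, 0], [2, 1, 0]], float)
posture = np.concatenate([direction_features(joints, [(1, 2), (2, 3)]),
                          position_features(joints)])
print(posture.shape)
```

Concatenating both families gives one fixed-length posture vector per frame, the representation that the key-posture selection and the multiclass SVM then operate on.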
Unsupervised Deep Networks for Temporal Localization of Human Actions in Streaming Videos

We propose a deep neural network which captures latent temporal features suitable for temporally localizing actions in streaming videos. This network uses unsupervised generative models containing autoencoders and conditional restricted Boltzmann machines to model the temporal structure present in an action. Human motions are non-linear in nature and thus require a continuous temporal model representation of motion, which is crucial for streaming videos. The generative ability helps predict features at future time steps, which can give an indication of the completion of an action at any instant. To handle M classes of action, we train an autoencoder to separate out action spaces, and learn generative models per action space. The final layer accumulates statistics from each model and estimates the action class and percentage of completion in a segment of frames. Experimental results show that this network provides the good predictive and recognition capability required for action localization in streaming videos.

Binu M. Nair
A New Method for Fall Detection of Elderly Based on Human Shape and Motion Variation

Fall detection for the elderly and patients has been an active research topic due to the great demand for fall-detection products and technology in the healthcare industry. Computer vision provides a promising solution to analyze personal behavior and detect certain unusual events such as falls. In this paper, we present a new method for fall detection based on variations of shape and motion. First, we use the CodeBook method to extract the person's silhouette from the video. Then, information from the rectangle, ellipse, and histogram projection is used to provide features to analyze the person's shape. In addition, we represent the person's shape by three blocks extracted from the rectangle. We then use optical flow to analyze the person's motion within each block. Finally, falls are distinguished from normal activities using a thresholding-based method. All experiments show that our fall detection system achieves very good performance in accuracy and error rate.

Abderrazak Iazzi, Mohammed Rziza, Rachid Oulad Haj Thami, Driss Aboutajdine
Motion of Oriented Magnitudes Patterns for Human Action Recognition

In this paper, we present a novel descriptor for human action recognition, called Motion of Oriented Magnitudes Patterns (MOMP), which considers the relationships between the local gradient distributions of neighboring patches from successive frames of a video. The proposed descriptor also characterizes information changing across different orientations and is therefore very discriminative and robust. The major advantages of MOMP are its very fast computation time and simple implementation. Subsequently, our features are combined with an effective coding scheme, VLAD (Vector of Locally Aggregated Descriptors), in the feature representation step, and an SVM (Support Vector Machine) classifier, in order to better represent and classify the actions. Experimenting on several common benchmarks, we obtain state-of-the-art results on the KTH dataset as well as performance comparable to the literature on the UCF Sport dataset.

Hai-Hong Phan, Ngoc-Son Vu, Vu-Lam Nguyen, Mathias Quoy

Computer Graphics

Frontmatter
Adaptive Video Transition Detection Based on Multiscale Structural Dissimilarity

The fast growth in the acquisition and dissemination of videos has driven the development of diverse multimedia applications, such as interactive broadcasting, entertainment, surveillance, and telemedicine, among others. Due to the massive amount of generated data, a challenging task is to store, browse, and retrieve video content efficiently. This work describes and analyzes a novel automatic video transition detection method based on multiscale inter-frame dissimilarity vectors. The shot frames are identified by means of an adaptive local threshold mechanism. Experimental results demonstrate that the proposed approach is capable of achieving high accuracy rates when applied to several video sequences.

Anderson Carlos Sousa e Santos, Helio Pedrini
Fast and Accurate 3D Reconstruction of Dental Models

There are three main processes in 3D reconstruction: point cloud generation, point cloud registration, and point cloud merging. A merging algorithm is necessary to fuse range images obtained from multiple directions in order to achieve a complete model of a single surface. In the merging phase, most low-cost RGB-D sensors use Volumetric Range Image Processing (VRIP) to fuse 3D data in real time. However, VRIP is not suitable for 3D measurement of dental models because the quality of 3D data from its low-resolution depth images cannot satisfy the high precision required by dental CAD systems. To achieve greater detail, we introduce a new idea, so-called angle truncation, into VRIP to fuse 3D data quickly while simultaneously retaining fine details. We also discuss various distance metrics and blending functions for scanning dental impressions. Finally, a dental impression model is scanned to compare the accuracy and speed of our method, the original VRIP, and Poisson surface reconstruction, which is often preferred in 3D reconstruction. The results show that our method's accuracy is improved over the original VRIP and its time efficiency is enhanced compared to Poisson surface reconstruction.

Seongje Jang, Yonghee Hahm, Kunwoo Lee
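The VRIP-style merging the paper builds on is, at its core, a per-voxel weighted running average of truncated signed distances from each scan. The sketch below shows only that baseline on a 1-D strip of voxels (the paper's proposed angle truncation is not modeled here, and the weights and truncation band are illustrative):

```python
import numpy as np

def vrip_fuse(tsdf, weights, new_dist, new_weight, trunc=0.05):
    """Fold one scan's signed distances into the running weighted average.
    Distances are clamped to the +/- trunc band around the surface."""
    d = np.clip(new_dist, -trunc, trunc)
    fused = (tsdf * weights + d * new_weight) / (weights + new_weight)
    return fused, weights + new_weight

tsdf, w = np.zeros(5), np.zeros(5)           # empty volume, zero weight
for scan in ([0.02, 0.01, 0.0, -0.01, -0.02],
             [0.03, 0.01, 0.0, -0.01, -0.03]):
    tsdf, w = vrip_fuse(tsdf, w, np.array(scan), 1.0)
print(tsdf)                                  # zero-crossing marks the surface
```

The merged surface is then extracted at the zero-crossing of the fused field; the paper's contribution is an extra truncation criterion based on viewing angle to keep fine detail during this averaging.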
A Portable and Unified CPU/GPU Parallel Implementation of Surface Normal Generation Algorithm from 3D Terrain Data

The Multi-mission Instrument Processing Lab (MIPL) is responsible for developing much of the ground support software for the Mars missions. The MIPL pipeline is used to generate several products from a one-mega-pixel image within a 30-min time constraint. In future missions, this time constraint will be decreased to five minutes for 20-mega-pixel images, requiring a minimum 120-times speed-up over current operational hardware and software. Moreover, any changes to the current software must preserve the source code's maintainability and portability for future missions. Therefore, the surface normal generation software has been implemented on a Graphics Processing Unit (GPU) through the use of the NVidia CUDA Toolkit and Hemi Library to allow for minimum code complexity. Several changes have been made to the Hemi Library to allow for additional optimizations of the GPU code. In addition, several challenges in developing a parallelized GPU implementation of the surface normal generation algorithm are explored, and both tested and prospective solutions to these problems are described.

Brandon Wilson, Robert Deen, Alireza Tavakkoli
Character Animation: An Automated Gait Cycle for 3D Characters Using Mathematical Equations

With the increasing importance of 3D graphics, many types of animation have evolved to simulate motion realistically. In movies, games, and similar media, almost all characters undergo gait cycles. The aim of this paper is to auto-generate realistic gait cycles, thus saving time and effort. This paper derives mathematical equations for describing the gait cycle and tests these equations on several 3D characters in the Maya program to prove their validity.

Mary Guindy, Rimon Elias
Realistic 3D Modeling of the Liver from MRI Images

With our fast-paced lifestyles, it is increasingly difficult to take care of our health. If people were more aware of their health conditions, we believe change would come more easily. The Haptic Elasticity Simulator (HES) was designed so hospitals and clinics could show a patient his or her organ and poke it with a haptic device to feel its elasticity, in the hope that the patient will change their lifestyle choices. This paper builds upon HES and improves the visual aspect. We discuss an end-to-end pipeline with minimal human interaction to create and render a realistic model of a patient's liver. The pipeline uses a patient's MRI images to create an initial mesh, which is then processed to make it look realistic using ITK, VTK, and our own implementations.

Andrew Conegliano, Jürgen P. Schulze

Virtual Reality

Frontmatter
An Integrated Cyber-Physical Immersive Virtual Reality Framework with Applications to Telerobotics

This paper presents an architecture to integrate a number of robotic platforms into interactive immersive virtual environments. The architecture, termed ArVETO (Aria Virtual Environment for Tele-Operation), is a client-server framework that communicates directly with a state-of-the-art game engine to utilize a virtual environment in support of tele-robotics and tele-presence. The strength of the proposed architecture is that it allows for the integration of heterogeneous robotic systems in an intelligent immersive environment for intuitive interaction between the robot and its operators. By utilizing an immersive virtual reality medium, an operator can interact with the robot more naturally, as buttons and joysticks can be replaced with hand gestures and interactions with the virtual environment. This provides a higher degree of immersion and interactivity for the operator compared to more traditional control schemes.

Matthew Bounds, Brandon Wilson, Alireza Tavakkoli, Donald Loffredo
Teacher-Student VR Telepresence with Networked Depth Camera Mesh and Heterogeneous Displays

We present a novel interface for a teacher guiding students immersed in virtual environments. Our approach uses heterogeneous displays, with a teacher using a large 2D monitor while multiple students use immersive head-mounted displays. The teacher is sensed by a depth camera (Kinect) to capture depth and color imagery, which are streamed to student stations to inject a realistic 3D mesh of the teacher into the environment. To support communication needed for an educational application, we introduce visual aids to help teachers point and to help them establish correct eye gaze for guiding students. The result allowed an expert guide in one city to guide users located in another city through a shared educational environment. We include substantial technical details on mesh streaming, rendering, and the interface, to help other researchers.

Sam Ekong, Christoph W. Borst, Jason Woodworth, Terrence L. Chambers
Virtual Reality Integration with Force Feedback in Upper Limb Rehabilitation

This article presents an alternative rehabilitation system for upper-extremity fine motor skills that uses haptic devices implemented in a virtual reality interface. The proposed rehabilitation system provides 3D shapes and textures observed in virtual reality environments; interaction with these environments generates rehabilitation exercises specific to patients' upper-extremity conditions. The system presents different rehabilitation environments focused on the use of virtual reality. The system is implemented through bilateral interaction between the Unity3D software and the Novint Falcon device; in addition, an Oculus Rift and a Leap Motion are used for total immersion of the patient in the virtual reality. The patient follows a path, as part of an entertaining rehabilitation exercise, via displacement and force-feedback trajectories based on a physiotherapist's exercises. Experimental results show the efficiency of the system, which realizes human-machine interaction oriented toward developing the patient's abilities.

Víctor H. Andaluz, Pablo J. Salazar, Miguel Escudero V., Carlos Bustamante D., Marcelo Silva S., Washington Quevedo, Jorge S. Sánchez, Edison G. Espinosa, David Rivas
Joint Keystone Correction and Shake Removal for a Hand Held Projector

Images projected onto a planar surface undergo keystone distortion when the projector is not perpendicular to the projection surface. Furthermore, in the case of a handheld projector, the projected image does not remain steady on the surface due to shaky hand movements. This paper introduces a simple approach to stabilizing such shaky images using an additional inertial measurement unit (IMU), consisting of gyroscope and accelerometer sensors, attached to the handheld projector. The approach explicitly estimates the transformation between the projector plane and the projection surface throughout the perturbation of the projector, which is used to calculate the prewarped image. The attached IMU gives the rotation angles about all three axes; these angles are used in estimating the prewarping transformation. A novel approach is presented for solving the stabilization problem for a shaking projector in both calibrated and uncalibrated settings. We demonstrate the effectiveness of this approach in continuously producing a stabilized, keystone-corrected image on a planar surface with good accuracy in real time compared to existing methods.

Manevarthe Bhargava, Kalpati Ramakrishnan
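For the calibrated, rotation-only case, the prewarp can be computed from the IMU angles via the rotation-induced homography $H = K R K^{-1}$: prewarping the image with $H^{-1}$ cancels the keystone the rotation would introduce. The intrinsics below are assumed values, and translation relative to the surface plane is ignored in this sketch:

```python
import numpy as np

def rot(ax, ay, az):
    """Rotation from IMU angles (radians) about the x, y, z axes."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])                 # assumed projector intrinsics

def prewarp(ax, ay, az):
    """Inverse of the rotation-induced homography K R K^-1; warping the
    input image with this cancels the keystone caused by the rotation."""
    return np.linalg.inv(K @ rot(ax, ay, az) @ np.linalg.inv(K))

# prewarping and then projecting through the same rotation yields identity
check = (K @ rot(0.1, -0.05, 0.0) @ np.linalg.inv(K)) @ prewarp(0.1, -0.05, 0.0)
print(np.allclose(check, np.eye(3)))
```

Handling translation of a handheld projector as well requires the full plane-induced homography, which is where the paper's explicit estimation of the projector-to-surface transformation comes in.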

Poster Session

Frontmatter
Global Evolution-Constructed Feature for Date Maturity Evaluation

Evolution-Constructed (ECO) Feature, a method to learn image features, has achieved very good results on a variety of object recognition and classification applications. Compared with hand-crafted features, ECO-Feature is capable of constructing non-intuitive features that could be overlooked by human experts. Although ECO features are easy to compute, they are sensitive to small variations in object location and orientation in the images. This paper presents an improved ECO-Feature that addresses these limitations of the original. The proposed method constructs a global representation of the object and also achieves invariance to small deformations. Two major changes are made in the proposed method to achieve good performance. A non-linear down-sampling technique is employed to reduce the dimensionality of the generated global features and hence improve the training efficiency of ECO-Feature. We apply the global ECO-Feature to a dataset of date fruit to demonstrate the improvement over the original ECO-Feature, and the experimental results show the global ECO-Feature's ability to generate better features for date maturity evaluation.

Meng Zhang, Dah-Jye Lee
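Non-linear down-sampling of a feature map is typically realized as max pooling, which also buys some invariance to small shifts. A minimal sketch, assuming simple 2x2 blocks (the abstract does not specify the paper's pooling size):

```python
import numpy as np

def max_pool(feature_map, k=2):
    """Non-linear down-sampling: take the maximum of each k x k block,
    shrinking the map by k in each dimension."""
    h, w = feature_map.shape
    h, w = h - h % k, w - w % k                  # drop any ragged border
    f = feature_map[:h, :w].reshape(h // k, k, w // k, k)
    return f.max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)    # toy 4x4 feature map
print(max_pool(fm))                              # 2x2 map of block maxima
```

Because only the block maximum survives, a feature that moves by a pixel or two usually lands in the same pooled cell, which is the deformation invariance the abstract refers to.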
An Image Dataset of Text Patches in Everyday Scenes

This paper describes a dataset containing small images of text from everyday scenes. The purpose of the dataset is to support the development of new automated systems that can detect and analyze text. Although much research has been devoted to text detection and recognition in scanned documents, relatively little attention has been given to text detection in other types of images, such as photographs posted on social-media sites. This new dataset, known as COCO-Text-Patch, contains approximately 354,000 small images that are each labeled as “text” or “non-text”. It particularly addresses the problem of text verification, which is an essential stage in the end-to-end text detection and recognition pipeline. In order to evaluate the utility of this dataset, it has been used to train two deep convolutional neural networks to distinguish text from non-text. One network is inspired by the GoogLeNet architecture, and the second is based on CaffeNet. Accuracy levels of 90.2% and 90.9% were obtained using the two networks, respectively. All of the images, source code, and trained deep-learning models described in this paper will be publicly available (https://aicentral.github.io/coco-text-patch/).

Ahmed Ibrahim, A. Lynn Abbott, Mohamed E. Hussein
Pre-processing of Video Streams for Extracting Queryable Representation of Its Contents

Automating video stream processing for inferring situations of interest has been an ongoing challenge. This problem is currently exacerbated by the volume of surveillance and monitoring videos generated. At present, manual or context-based customized techniques are used for this purpose. On the other hand, non-procedural query specification and processing (e.g., the Structured Query Language, SQL) is well established, effective, scalable, and widely used, and stream processing has extended this approach to sensor data. The focus of this work is to extend and apply well-established non-procedural query processing techniques for inferring situations from video streams. This entails extracting appropriate information from video frames and choosing a suitable representation for expressing situations using queries. In this paper, we elaborate on what to extract, how to extract it, and the data model proposed for representing the extracted data for situation analysis using queries. We focus on moving object extraction, an object's location in the frame, relevant features of an object, and the identification of objects across frames, along with algorithms and experimental results. Our long-term goal is to establish a framework for adapting stream and event processing techniques for real-time analysis of video streams.

Manish Annappa, Sharma Chakravarthy, Vassilis Athitsos
Physiological Features of the Internal Jugular Vein from B-Mode Ultrasound Imagery

Traditional methods of capturing vital signs by monitoring electrical impulses are quite effective; however, this data could potentially be extracted using alternative technology. Non-invasive monitoring using low-cost ultrasound imaging of arterial and venous vasculature has the potential to detect standard vital signs such as heart and respiratory rate, as well as additional parameters such as relative changes in circulating blood volume. This paper explores the feasibility of using ultrasound to monitor these signals by detecting spatial and temporal changes in the internal jugular vein (IJV). Ultrasound videos of the jugular in the transverse plane were collected from a group of healthy subjects. Frame-by-frame segmentation of the IJV demonstrates frequency characteristics similar to those of certain physiological systems. Heart and respiratory rate appear to be present in IJV cross-sectional area variations in select ultrasound clips and may provide information regarding the severity of a patient's illness.

Jordan P. Smith, Mohamed Shehata, Ramsey G. Powell, Peter F. McGuire, Andrew J. Smith
Manifold Interpolation for an Efficient Hand Shape Recognition in the Irish Sign Language

This paper presents interpolation using two-stage PCA for hand shape recognition. In the first stage, PCA is performed on the entire training dataset of real human hand images. In the second stage, PCA is performed on separate sub-sets of the projected points in the first-stage eigenspace. The training set contains only a few pose angles. The output is a set of new interpolated manifolds representing the missing data. The goal of this approach is to create a more robust dataset, able to recognise a hand image from an unknown rotation. We report accuracy values for recognising unknown hand shapes.

Marlon Oliveira, Alistair Sutherland, Mohamed Farouk
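The two-stage projection can be illustrated with a minimal sketch; assuming flattened hand images as rows of a matrix (the dimensions, the `pca_project` name, and the random data below are illustrative, not the paper's implementation):

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X onto the top-k principal components.
    Returns (projected points, mean, component matrix)."""
    mu = X.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k].T
    return (X - mu) @ W, mu, W

# Stage 1: PCA on the whole training set of flattened hand images.
X = np.random.default_rng(1).standard_normal((40, 64))
Y, mu, W = pca_project(X, k=10)

# Stage 2: PCA on a subset of the stage-1 projections (e.g. one hand
# shape across its few known pose angles), yielding a low-dimensional
# manifold that can then be interpolated between poses.
Y2, _, _ = pca_project(Y[:8], k=2)
print(Y.shape, Y2.shape)  # (40, 10) (8, 2)
```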
Leaf Classification Using Convexity Moments of Polygons

Research has shown that shape features can be used in the process of object recognition with promising results. However, due to the wide variety of shape descriptors, selecting the right one remains a difficult task. This paper presents a new shape recognition feature, the Convexity Moments of Polygons, derived from the convexity measure of polygons. A series of experiments based on the FLAVIA image dataset was performed to demonstrate the accuracy of the proposed feature compared to the convexity measure of polygons in the field of leaf classification. A classification rate of 92% was obtained with the Convexity Moments of Polygons, versus 80% with the convexity measure of polygons, using a Radial Basis Function (RBF) neural network classifier.

J. R. Kala, S. Viriri, D. Moodley
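The abstract does not define the underlying convexity measure, but a standard area-based formulation, the polygon's area divided by the area of its convex hull, can be sketched as follows (function names and example polygons are illustrative):

```python
def polygon_area(pts):
    """Shoelace formula for the area of a simple polygon given as ordered vertices."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def convex_hull(pts):
    """Andrew's monotone-chain convex hull, returned in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def convexity(pts):
    """Area-based convexity: area(P) / area(convex hull of P); equals 1 for convex shapes."""
    return polygon_area(pts) / polygon_area(convex_hull(pts))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
notch = [(0, 0), (2, 0), (2, 2), (1, 1), (0, 2)]  # concave notch in the top edge
print(convexity(square), convexity(notch))  # 1.0 0.75
```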
Semi-automated Extraction of Retinal Blood Vessel Network with Bifurcation and Crossover Points

Among different retinal analysis tasks, blood vessel extraction plays an important role as it is often the first essential step before any measurement can be made for various applications such as biometric authentication or diagnosis of retinal vascular diseases. In this paper, we present a new method for extraction of blood vessel network with its nodes (bifurcation and crossover points) from retinal images. The first step is to identify pixels with homogeneous vessel elements with a set of four directional filters. Then another step is applied to extract local linear components assuming that a vessel is a set of short linear segments. Through an optimization process this information is combined to extract the vessel network and its nodes. The proposed algorithm was tested on the publicly available DRIVE retinal fundus image database. The experimental results show good precision, recall and F-measure compared to ground truth and a state-of-the-art algorithm for the same dataset.

Z. Nougrara, N. Kihal, J. Meunier
SINN: Shepard Interpolation Neural Networks

A novel feed-forward neural network architecture is proposed based on Shepard Interpolation, a method for approximating multi-dimensional functions from known coordinate-value pairs [4]. In a Shepard Interpolation Neural Network (SINN), weights and biases are deterministically initialized to non-zero values. Furthermore, Shepard networks maintain accuracy similar to traditional neural networks with a reduced memory footprint and fewer hyperparameters such as the number of layers, layer sizes, and activation functions. Shepard Interpolation Networks greatly reduce the complexity of neural networks, improving performance while maintaining accuracy. The accuracy of Shepard networks is evaluated on the MNIST digit recognition task. The proposed architecture is compared to LeCun et al.'s original work on neural networks [9].

Phillip Williams
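Shepard interpolation itself is the classical inverse-distance-weighted scheme over known coordinate-value pairs; a minimal sketch (the function name, power parameter, and sample points are illustrative, not the SINN architecture itself):

```python
def shepard_interpolate(known, query, power=2):
    """Inverse-distance-weighted (Shepard) interpolation.

    known: list of (coordinate_tuple, value) pairs.
    query: coordinate tuple at which to approximate the function.
    """
    num, den = 0.0, 0.0
    for coord, value in known:
        d2 = sum((q - c) ** 2 for q, c in zip(query, coord))
        if d2 == 0.0:
            return value  # exact hit on a known point: return its value
        w = 1.0 / d2 ** (power / 2)  # weight = 1 / distance^power
        num += w * value
        den += w
    return num / den

# Interpolating f(x, y) = x + y from four known corner values:
pts = [((0, 0), 0.0), ((1, 0), 1.0), ((0, 1), 1.0), ((1, 1), 2.0)]
print(shepard_interpolate(pts, (0.5, 0.5)))  # 1.0 (equidistant, so the mean of the values)
```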
View-Based 3D Objects Recognition with Expectation Propagation Learning

In this paper, we develop an expectation propagation learning framework for the inverted Dirichlet (ID) and Dirichlet mixture models. The main goal is to implement an algorithm to recognize 3D objects. Those objects are in our case from a view-based 3D models database that we have assembled. Following specific rules determined by analyzing the results of our tests, we have been able to get promising recognition rates. Experimental results are presented with different object classes by comparing recognition rates and confidence levels according to different tuning parameters.

Adrien Bertrand, Faisal R. Al-Osaimi, Nizar Bouguila
Age Estimation by LS-SVM Regression on Facial Images

Determining the age of a person from an image of his or her face is a research topic in Computer Vision that is being extensively worked on. In contrast to, say, expression analysis, age determination depends on a number of factors, and construing the real age of a person is a difficult task: the changes that appear on a face are due not only to aging but also to factors such as stress and inadequate rest. In this paper an approach is developed to determine the true age of a person by combining several existing algorithms for maximum efficiency. The image is represented using an Active Appearance Model (AAM), which uses geometrical ratios of the local face features along with wrinkle analysis. Next, to enhance the feature selection, Principal Component Analysis (PCA) is applied. For the learning process a Support Vector Machine (SVM) is used. Relationships in the image are obtained using binarized statistical image features (BSIF), and the patterns are stored in Local Binary Pattern Histograms (LBPH). This histogram acts as input for the learning unit, and the SVM learns the patterns by studying the LBPH. Finally, after the learning phase, when a new image is taken, a Least-Squares Support Vector Machine (LS-SVM) regression model is used to predict the final age of the person in the image.

Shreyank N. Gowda
Video Cut Detector via Adaptive Features using the Frobenius Norm

One of the first and most important steps in content-based video retrieval is cut detection. Its effectiveness has a major impact on subsequent high-level applications such as video summarization. In this paper, a robust video cut detector (VCD) based on several theorems related to the singular value decomposition (SVD) is proposed. In our contribution, the Frobenius norm is used to estimate the appropriate reduced features from the SVD of concatenated block-based histograms (CBBH). Then, within each segment, each frame is mapped to a $$\tilde{k}$$-dimensional vector in the singular space. The classification of continuity values is achieved using an adjusted thresholding technique. Experimental results show the efficiency of our detector, which outperforms recent related methods in detecting hard cut transitions.

Youssef Bendraou, Fedwa Essannouni, Ahmed Salam, Driss Aboutajdine
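The abstract does not spell out how the Frobenius norm yields the reduced dimension; one common recipe, sketched below under that assumption, exploits the fact that the squared Frobenius norm equals the sum of the squared singular values, and keeps the smallest rank capturing a target fraction of that energy (the threshold, function name, and matrix are illustrative):

```python
import numpy as np

def reduced_rank(A, energy=0.95):
    """Smallest rank k whose top-k singular values capture the requested
    fraction of ||A||_F^2; the squared Frobenius norm equals the sum of
    the squared singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    cum = np.cumsum(s ** 2)               # cumulative captured energy
    return int(np.searchsorted(cum, energy * cum[-1]) + 1)

# A matrix with singular values 3 and 2: the largest holds only 9/13 of
# the energy, so a 95% threshold keeps both, while 50% keeps just one.
A = np.array([[3.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
print(reduced_rank(A))        # 2
print(reduced_rank(A, 0.5))   # 1
```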
Practical Hand Skeleton Estimation Method Based on Monocular Camera

In this paper, we propose a practical hand skeleton reconstruction method using a monocular camera. The proposed method is a fundamental technology that can be applicable to future products such as wearable or mobile devices and smart TVs requiring natural hand interactions. To heighten its practicability, we designed our own hand parameters composed of global hand and local finger configurations. Based on the parameter states, a kinematic hand and its contour can be reconstructed. By adopting palm detection and tracking, global parameters can be easily estimated, which can reduce the search space required for whole parameter estimations. We can then fine-tune the coarse estimated parameters through the use of a Gauss-Newton optimization stage. Experimental results indicate that our method provides a sufficient level of accuracy to be utilized in gesture-interactive applications. The proposed method is light in terms of algorithm complexity and can be applied in real time.

Sujung Bae, Jaehyeon Yoo, Moonsik Jeong, Vladimir Savin
A Nonparametric Hierarchical Bayesian Model and Its Application on Multimodal Person Identity Verification

In this paper, we propose a hierarchical Dirichlet process (HDP) mixture model of inverted Dirichlet (ID) distributions. The proposed model is learned within a principled variational Bayesian framework that we have developed by selecting appropriate priors for the parameters and calculating good approximations to the exact posteriors. The proposed statistical framework is validated via a challenging application namely multimodal person identity verification.

Wentao Fan, Nizar Bouguila
Performance Evaluation of 3D Keypoints and Descriptors

This paper presents a comprehensive evaluation of the performance of common 3D keypoint detectors and descriptors currently available in the Point Cloud Library (PCL) at recovering the transformations of 300 real objects. Current research on keypoint detectors and descriptors considers their performance individually, in terms of repeatability or descriptiveness, rather than their overall performance at multi-sensor alignment or recovery. We present data on the performance of each pair under each transformation independently: translations along and rotations around the x-, y- and z-axes respectively. We provide insight into aspects of the PCL implementations of the detectors and descriptors that lead to abnormal or unexpected performance. The obtained results show that the ISS/SHOT and ISS/SHOTColor detector/descriptor pairs work best at 3D recovery under various transformations.

Zizui Chen, Stephen Czarnuch, Andrew Smith, Mohamed Shehata
Features of Internal Jugular Vein Contours for Classification

Portable ultrasound is commonly used to image blood vessels such as the Inferior Vena Cava or Internal Jugular Vein (IJV) in the attempt to estimate patient intravascular volume status. A large number of features can be extracted from a vessel’s cross section. This paper examines the role of shape factors and statistical moment descriptors to classify healthy subjects enrolled in a simulation modeling relative changes in volume status. Features were evaluated using a range of selection methods and tested with a variety of classifiers. It was determined that a subset of features derived from moments are the most appropriate for this task.

Jordan P. Smith, Mohamed Shehata, Peter F. McGuire, Andrew J. Smith
Gathering Event Detection by Stereo Vision

This paper proposes a method for pedestrian gathering detection in real cluttered scenarios using stereo vision. First, the foreground is converted into 3D cloud points and extracted by spatial confinement, which is less sensitive to illumination change. Instead of detecting stationary people in the camera view, they are localized in plan-view maps, which are more resistant to inter-person occlusion, and the number of people is directly estimated by regression in multiple plan-view statistical maps based on more physically inspired features. In addition, the method exhibits superior extensibility to multiple binocular camera systems for wider surveillance coverage and higher detection accuracy through fusion. Finally, we contribute the first abnormal-event dataset with depth information, and experimental results on it validate the method's effectiveness.

Qian Wang, Wei Jin, Gang Wang
Abnormal Detection by Iterative Reconstruction

We propose an automatic abnormality detection method using a subspace and iterative reconstruction for visual inspection. In visual inspection, many normal images but few abnormal images are available. Thus, we use a subspace method trained from only normal images. We reconstruct a test image from the subspace and detect abnormal regions by robust statistics of the difference between the test and reconstructed images. However, this method sometimes gives many false positives when black artificial abnormal regions are added to white regions, because the white regions neighbouring a black abnormality become dark in order to represent it. To overcome this, we iterate the reconstruction, replacing the abnormal region detected by robust statistics with an intensity value derived from normal images. In experiments, we evaluate our method on four machine parts and confirm that the proposed method detects abnormal regions with high accuracy.

Kenta Toyoda, Kazuhiro Hotta
An Integrated Octree-RANSAC Technique for Automated LiDAR Building Data Segmentation for Decorative Buildings

This paper introduces a new method for the automated segmentation of laser scanning data for decorative urban buildings. The method combines octree indexing and RANSAC, two previously established but heretofore unintegrated techniques. The approach was successfully applied to terrestrial point clouds of the facades of five highly decorative urban structures for which existing approaches could not provide an automated pipeline. The segmentation technique was relatively efficient and wholly scalable, requiring only 1 s per 1,000 points regardless of the façade's level of ornamentation or non-rectilinearity. While the technique struggled with shallow protrusions, its ability to process a wide range of building types and opening shapes at data densities as low as 400 pts/m2 demonstrates its inherent potential as part of a larger, more sophisticated processing approach.

Fatemeh Hamid-Lakzaeian, Debra F. Laefer
Optimization-Based Multi-view Head Pose Estimation for Driver Behavior Analysis

An optimization-based multi-view head pose estimation method is presented that exploits the constraint relationship formed by the relative positions of the cameras and the driver's head to fuse multiple estimation results into an optimized solution. The proposed method is novel in the following ways: (1) it introduces ideal pose constraint conditions for the self-adjustment of each view's pose; (2) it sets the optimization goal of minimizing the average 3D projection error in the 2D plane to guide the adjustment of the estimated pose values; and (3) it determines the adjustment through an iterative process over each view's pose. The proposed method improves the accuracy and confidence of the system's estimates, which has been verified by simulation and real measurements.

Huaixin Xiong
Reduction of Missing Wedge Artifact in Oblique-View Computed Tomography

The manufacturing need for high-speed interconnections in three-dimensional integrated circuits (3D ICs) warrants inspection of through-silicon via (TSV) structures. Because the use of a flat component in tomographic reconstruction restricts the range of viewing angles, the computed tomography (CT) system produces limited-view projection images, which cause missing-angle artifacts in the reconstructed 3D data. In this paper, we propose a total variation (TV) approach for tomographic image reconstruction. The proposed approach improves image quality when the sinogram images have equal quality at all viewing angles and the accessible tilt range is restricted only by the physical limits of the oblique-view CT system. The method employs a bowtie TV (b-TV) penalty, which establishes a desirable balance between smooth and piecewise-constant solutions in the missing wedge region. The images resulting from the proposed method are shown to be smooth with sharp edges and fewer visible artifacts, and the overall image quality is higher than that of images obtained by existing TV methods.

Kyung-Chan Jin, Jung-Seok Yoon, Yoon-Ho Song
Using Dense 3D Reconstruction for Visual Odometry Based on Structure from Motion Techniques

A focus of intense research in computational vision, dense 3D reconstruction has reached an important landmark, with the first methods running in real time at millimetric precision using RGBD cameras and GPUs. However, these methods are not suitable for low computational resources. The goal of this work is to present a visual odometry method that uses regular cameras without a GPU. The proposed method is based on sparse Structure from Motion (SfM) techniques, using data provided by dense 3D reconstruction. Visual odometry is the process of estimating the position and orientation of an agent (a robot, for instance) from images. This paper compares the proposed method with the odometry calculated by Kinect Fusion. The odometry provided by this work can be used to recover camera position and orientation from dense 3D reconstruction.

Marcelo de Mattos Nascimento, Manuel Eduardo Loaiza Fernandez, Alberto Barbosa Raposo
Towards Estimating Heart Rates from Video Under Low Light

The ability to read the physiological state of a person using conventional cameras opens the door to many potential applications such as medical monitoring, human emotion recognition, and even human-robot interaction. The estimation of heart rate from video is particularly useful and well suited to conventional cameras, as evidenced by a body of recent literature. However, existing work has only been demonstrated under relatively good lighting, which limits the range of applications. In this paper, we propose a new approach to estimating heart rate from video that is robust to low-light conditions in addition to motion and changing illuminants. The approach is simple and fast, and we show that it captures the heart rate effectively.

Antony Lam, Yoshinori Kuno
Video Tracking with Probabilistic Cooccurrence Feature Extraction

Video analysis is a rich research topic for a wide spectrum of applications such as surveillance, activity recognition, security, and event detection. Many challenges affect the efficiency of a tracking algorithm, such as scene illumination change, occlusions, scaling, and the search window for the tracked objects. We present an integrated probabilistic model for object tracking that combines implicit dynamic shape representations and probabilistic object modeling. Furthermore, this paper describes a novel implementation of the algorithm that runs on a general-purpose graphics processing unit (GPGPU) and is suitable for video analysis in a real-time vision system. We demonstrate the utility of the proposed tracking algorithm on a benchmark video tracking dataset while achieving state-of-the-art results in both overlap accuracy and speed.

Kaleb Smith, Anthony O. Smith
3-D Shape Recovery from Image Focus Using Rank Transform

Obtaining an accurate and precise depth map is the ultimate goal of 3-D shape recovery. This article proposes a new robust algorithm based on the Rank Transform (RT) for recovering the 3-D shape of an object. The rank transform encodes, for each pixel, the position of its grey value in the ranking of all the grey values in its neighborhood. Due to its low computational complexity and robustness against noise, it is a superior alternative to most other shape-from-focus (SFF) approaches. The proposed method is evaluated on real and synthetic image sequences. The evaluation is gauged on the basis of the unimodality and monotonicity of the focus curve. Finally, by means of two global statistical metrics, root mean square error (RMSE) and correlation, we show that our method produces, in spite of its simplicity, results of competitive quality.

Fahad Mahmood, Jawad Mahmood, Waqar Shahid Qureshi, Umar Shahbaz Khan
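The rank transform is simple enough to sketch directly; assuming a small grey-level image stored as nested lists and a 3x3 window (the window size, function name, and sample values are illustrative):

```python
def rank_transform(img, radius=1):
    """For each pixel, count the neighbours in a (2r+1)x(2r+1) window
    whose grey value is strictly less than the centre pixel's."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rank = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    # skip the centre itself and out-of-bounds neighbours
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        rank += img[ny][nx] < img[y][x]
            out[y][x] = rank
    return out

img = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
print(rank_transform(img)[1][1])  # 4: the centre value 50 exceeds 4 of its 8 neighbours
```

Because only the ordering of grey values matters, the transform is unaffected by monotonic illumination changes, which is the source of its robustness.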
Combinatorial Optimization for Human Body Tracking

We present a method of improving the accuracy of a 3D human motion tracker. Beginning with confidence-weighted estimates for the positions of body parts, we solve the shortest path problem to identify combinations of positions that fit the rigid lengths of the body. We choose from multiple sets of these combinations by predicting current positions with kinematics. We also refine this choice by using the geometry of the optional positions. Our method was tested on a data set from an existing motion tracking system, resulting in an overall increase in the sensitivity and precision of tracking. Notably, the average sensitivity of the feet rose from 52.6% to 84.8%. When implemented on a 2.9 GHz processor, the system required an average of 3.5 milliseconds per video frame.

Andrew Hynes, Stephen Czarnuch
Automatic Detection of Deviations in Human Movements Using HMM: Discrete vs Continuous

Automatic detection of the correct performance of movements in humans is at the core of coaching and rehabilitation applications. Human movement can be studied as sequential data by using different sensor technologies. This representation makes it possible to use models of sequential data to determine whether executions of a certain activity are close enough to the specification or must be considered erroneous. One of the most widely used approaches for characterizing sequential data is the Hidden Markov Model (HMM), which has the advantage of being able to model processes based on data from noisy sources. In this work we explore the use of both discrete and continuous HMMs to label movement sequences as either according to a specification or deviating from it. The results show that the majority of sequences are correctly labeled by the technique, with an advantage for continuous HMMs.

Carlos Palma, Augusto Salazar, Francisco Vargas
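The likelihood-based labelling with a discrete HMM rests on the forward algorithm; a minimal sketch with a toy two-state, two-symbol model (all parameter values are illustrative) computes the probability of a movement sequence under the reference model, which can then be thresholded to flag deviations:

```python
def forward(obs, pi, A, B):
    """Forward algorithm: P(observation sequence) under a discrete HMM
    with initial probabilities pi, transition matrix A, emission matrix B."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]   # initialisation
    for o in obs[1:]:                                  # induction
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
                 for j in range(n)]
    return sum(alpha)                                  # termination

# Toy reference model for a "correct" movement pattern.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
p = forward([0, 1, 0], pi, A, B)
print(round(p, 5))  # 0.10893; a sequence is flagged as deviated if p falls below a threshold
```

For realistic sequence lengths one would work in log space or rescale alpha at every step to avoid numerical underflow.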
Quantitative Performance Optimisation for Corner and Edge Based Robotic Vision Systems: A Monte-Carlo Simulation

Corner and edge based robotic vision systems have achieved enormous success in various applications. To quantify and thereby improve the system performance, the standard method is to conduct cross comparisons using benchmark datasets. Such datasets, however, are usually generated for validating specific vision algorithms (e.g. monocular SLAM [1] and stereo odometry [2]). In addition, they are not capable of evaluating robotic systems which require visual feedback signals for motion control (e.g. visual servoing [3]). To develop a more generalised framework to evaluate ordinary corner and edge based robotic vision systems, we propose a novel Monte-Carlo simulation which contains various real-world geometric uncertainty sources. An edge-based global localisation algorithm is evaluated and optimised using the proposed simulation via a large scale Monte-Carlo analysis. During a long-term optimisation, the system performance is improved by around 230 times, while preserving high robustness towards all the simulated uncertainty sources.

Jingduo Tian, Neil Thacker, Alexandru Stancu
Evaluating the Change of Directional Patterns for Fingerprints with Missing Singular Points Under Rotation

Overcoming small inter-class variation when fingerprints have missing singular points (SPs) is one of the current challenges in fingerprint classification, since class information is scarce. Grouping the orientation fields to form Directional Patterns (DPs) shows potential for classifying these fingerprints. However, DPs change under rotation. This paper evaluates the change of DPs for fingerprints with missing SPs to determine a method of rotation that produces unique DPs for a Whorl (W) with a single loop and a single delta; a Right Loop (RL), Left Loop (LL), Tented Arch (TA), and a W with a single loop; an RL and LL with a single delta; and lastly a Plain Arch (PA) and a Partial Fingerprint (PF) with no SPs. The proposed method of rotation is based on the remaining SPs and achieves a manual classification accuracy of 91.72% on the FVC 2002 and 2004 DB1, and FVC 2004 DB2.

Kribashnee Dorasamy, Leandra Webb-Ray, Jules-Raymond Tapamo
Particle Detection in Crowd Regions Using Cumulative Score of CNN

In recent years, convolutional neural networks (CNNs) have achieved state-of-the-art performance on various image recognition benchmarks. Although a CNN requires a large number of training images covering various locations and sizes of a target, we cannot prepare many supervised intracellular images. In addition, the properties of intracellular images differ from those of the standard images used in computer vision research. Overlap between particles often occurs in dense regions, and in overlapping areas there are ambiguous edges at the peripheral regions of particles. This induces detection errors in the conventional method. However, not all edges of overlapping particles are ambiguous, and the obvious peripheral edges should be used. Thus, we predict the center of a particle from its peripheral regions with a CNN, and the prediction results are accumulated by voting. Since the particle center is predicted from peripheral views, many training samples can be prepared from one particle. Higher accuracy is obtained in comparison with a conventional detector that uses a CNN as a binary classifier.

Kenshiro Nishida, Kazuhiro Hotta
Preliminary Studies on Personalized Preference Prediction from Gaze in Comparing Visualizations

This paper presents a pilot study on the recognition of user preference, manifested as the choice between items, using eye movements. Recently, empirical studies have demonstrated the decoding of user tasks from eye movements. Such studies promote the eye movement signal as a carrier of user cognitive state rather than a simple interaction utility, supporting the use of eye movements in demanding cognitive tasks as an implicit cue obtained unobtrusively. Even though eye movements have already been employed in human-computer interaction (HCI) for a variety of tasks, to the best of our knowledge they have not been evaluated for personalized preference recognition during visualization comparison. To summarize the contribution, we investigate: "How well do eye movements disclose the user's preference?" To this end, we design a pilot experiment imposing high cognitive load on the users and record their eye movements and their preference choices, asserted explicitly. We then employ Gaussian processes along with other classifiers to predict the users' choices from their eye movements. Our study supports further investigation of observer preference prediction from eye movements.

Hamed R.-Tavakoli, Hanieh Poostchi, Jaakko Peltonen, Jorma Laaksonen, Samuel Kaski
Simulating a Predator Fish Attacking a School of Prey Fish in 3D Graphics

Schooling behavior is one of the most salient social and group activities among fishes. Previous work in 3D computer graphics focuses primarily on simulating interactions between fishes within the group in normal circumstances, such as maintaining distance between neighbors. Little work has been done on simulating the interactions between the schools of fish and attacking predators. How does a predator pick its target? How do a school of fish react to such attacks? In this paper, we introduce a method to model and simulate interactions between prey fishes and predator fishes in 3D graphics. We model a school of fish as a complex network with information flow, information breakage, and different structural properties. Using this model, we can simulate a predator fish targeting isolated peripheral fish, the primitive escape behavior of prey fishes, and some of the defensive maneuvers exhibited by fish schools.

Sahithi Podila, Ying Zhu
Direct Visual-Inertial Odometry and Mapping for Unmanned Vehicle

We present a direct visual-inertial system that can track camera motion and map the environment. This method aligns input images directly based on pixel intensities and minimizes the photometric error, instead of using key features detected in the images. IMU measurements provide additional constraints to suppress the scale drift induced by the visual odometry. The depth information for each pixel can be computed either from inverse depth estimation or from stereo images. Experiments on an existing dataset show that the performance of our method is comparable to that of a recently reported method.

Wenju Xu, Dongkyu Choi
Real-Time Automated Aerial Refueling Using Stereo Vision

Automated Aerial Refueling (AAR) of Unmanned Aerial Vehicles (UAVs) is vital to the United States Air Force's (USAF) continued air superiority. Inspired by the stereo vision system organic to the new KC-46A Pegasus tanker, we present a novel solution for computing a real-time relative navigation (rel-nav) vector between a refueling UAV and a KC-46. Our approach relies on a real-time 3D virtual simulation environment that models a realistic refueling scenario. Within this virtual scenario, a stereo camera system mounted beneath the KC-46 consumes synthetic imagery of a receiver conducting an aerial refueling approach. This synthetic imagery is processed by computer vision algorithms that calculate the sensed rel-nav position and orientation. The sensed solution is compared against the virtual environment's truth data to quantify error and evaluate the stereo vision performance in a deterministic, real-time manner. Our approach yields sub-meter precision at approximately 30 Hz.

Christopher Parsons, Scott Nykl
Signature Embedding: Writer Independent Offline Signature Verification with Deep Metric Learning

The handwritten signature is widely employed and accepted as a proof of a person’s identity. In our everyday life, it is often verified manually, yet only casually. As a result, the need for automatic signature verification arises. In this paper, we propose a new approach to the writer independent verification of offline signatures. Our approach, named Signature Embedding, is based on deep metric learning. Comparing triplets of two genuine and one forged signature, our system learns to embed signatures into a high-dimensional space, in which the Euclidean distance functions as a metric of their similarity. Our system ranks best in nearly all evaluation metrics from the ICDAR SigWiComp 2013 challenge. The evaluation shows a high generality of our system: being trained exclusively on Latin script signatures, it outperforms the other systems even for signatures in Japanese script.

Hannes Rantzsch, Haojin Yang, Christoph Meinel
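The triplet comparison described above can be illustrated with the standard hinge triplet loss on toy 2-D points (the real system embeds signatures into a high-dimensional space via a deep network; the function name, margin value, and coordinates here are illustrative):

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedded signatures: pull the two
    genuine signatures together and push the forgery at least `margin`
    further away, with Euclidean distance as the similarity metric."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# A well-separated triplet incurs zero loss; a confused one is penalised.
print(triplet_loss((0, 0), (0, 1), (3, 4)))  # 0.0  (genuine pair close, forgery far)
print(triplet_loss((0, 0), (3, 4), (0, 1)))  # 5.0  (forgery closer than the genuine pair)
```

Training minimises this loss over many sampled triplets, shaping the embedding space so that distance itself becomes the verification score.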
Backmatter
Metadata
Title
Advances in Visual Computing
Edited by
George Bebis
Richard Boyle
Bahram Parvin
Darko Koracin
Fatih Porikli
Sandra Skaff
Alireza Entezari
Jianyuan Min
Daisuke Iwai
Amela Sadagic
Carlos Scheidegger
Tobias Isenberg
Copyright year
2016
Electronic ISBN
978-3-319-50832-0
Print ISBN
978-3-319-50831-3
DOI
https://doi.org/10.1007/978-3-319-50832-0