
2007 | Book

Advanced Concepts for Intelligent Vision Systems

9th International Conference, ACIVS 2007, Delft, The Netherlands, August 28-31, 2007. Proceedings

Edited by: Jacques Blanc-Talon, Wilfried Philips, Dan Popescu, Paul Scheunders

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


Table of Contents

Frontmatter

Computer Vision

A Framework for Scalable Vision-Only Navigation

This paper presents a monocular vision framework enabling feature-oriented appearance-based navigation in large outdoor environments containing other moving objects. The framework is based on a hybrid topological-geometrical environment representation, constructed from a learning sequence acquired during robot motion under human control. The framework achieves the desired navigation functionality without requiring global geometrical consistency of the underlying environment representation. The main advantages with respect to conventional alternatives are unlimited scalability, real-time mapping and effortless handling of interconnected environments once the loops have been properly detected. The framework has been validated in demanding, cluttered and interconnected environments, under different imaging conditions. The experiments have been performed on many long sequences acquired from moving cars, as well as in real-time large-scale navigation trials relying exclusively on a single perspective camera. The obtained results imply that a globally consistent geometric environment model is not mandatory for successful vision-based outdoor navigation.

Siniša Šegvić, Anthony Remazeilles, Albert Diosi, François Chaumette
Visual Tracking by Hypothesis Testing

A new approach for tracking a non-rigid target is presented. Tracking is formulated as a Maximum A Posteriori (MAP) segmentation problem where each pixel is assigned a binary label indicating whether it belongs to the target or not. The label field is modeled as a Markov Random Field whose Gibbs energy comprises three terms. The first term quantifies the error in matching the object model with the object’s appearance as given by the current segmentation. Coping with the deformations of the target while avoiding optical flow computation is achieved by marginalizing this likelihood over all possible motions per pixel. The second term penalizes the lack of continuity in the labels of the neighbor pixels, thereby encouraging the formation of a smoothly shaped object mask, without holes. Finally, for the sake of increasing robustness, the third term constrains the object mask to assume an elliptic shape model with unknown parameters. MAP optimization is performed iteratively, alternating between estimating the shape parameters and recomputing the segmentation using updated parameters. The latter is accomplished by discriminating each pixel via a simple hypothesis test. We demonstrate the efficiency of our approach on synthetic and real video sequences.

Valentin Enescu, Ilse Ravyse, Hichem Sahli
A New Approach to the Automatic Planning of Inspection of 3D Industrial Parts

This article describes a novel planning algorithm for automatically carrying out the high-precision dimensional inspection of three-dimensional features of manufactured parts. The method generalizes over part complexity and the points at which measurements must be taken, and the range of application of the system is not limited. To this end, the analysis discretizes the configuration space of the part positioning system and the surface of the part itself. All the techniques presented here have been tested and validated on a real inspection system based on stereoscopic cameras equipped with a laser light source.

J. M. Sebastián, D. García, A. Traslosheros, F. M. Sánchez, S. Domínguez, L. Pari
Low Latency 2D Position Estimation with a Line Scan Camera for Visual Servoing

This paper describes the implementation of a visual position estimation algorithm, using a line-scan sensor positioned at an angle over a 2D repetitive pattern. An FFT is used with direct interpretation of the phase information at the fundamental frequencies of the pattern. The algorithm is implemented in an FPGA. The goal is to provide fast position estimation on visual data, to be used as feedback information in a dynamic control system. Traditional implementations of these systems are often hampered by low update rates (<100 Hz) and/or large latencies (>10 msec). These limit the obtainable bandwidths of the control system. Presented here is an implementation of an algorithm with a high update rate (30 kHz) and low latency (100 μsec). This system can be used for a range of repetitive structures and has a high robustness. Resolutions of less than 0.1 μm have been demonstrated on real products with 210×70 μm feature size.

Peter Briër, Maarten Steinbuch, Pieter Jonker
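To make the phase-based idea concrete, here is a minimal numpy sketch, not the authors' FPGA implementation: for a pattern of known period, the displacement can be read off as the phase of the FFT coefficient at the fundamental frequency. The signal, period, and window choice below are illustrative assumptions.

```python
import numpy as np

def phase_shift_position(signal, period_px):
    """Estimate the sub-pixel displacement of a periodic pattern from
    the FFT phase at its fundamental-frequency bin (sketch only)."""
    n = len(signal)
    spectrum = np.fft.rfft(signal * np.hanning(n))  # window against leakage
    k = int(round(n / period_px))                   # fundamental-frequency bin
    phase = np.angle(spectrum[k])
    return -phase / (2.0 * np.pi) * period_px       # displacement in pixels

# Synthetic line-scan of a sinusoidal pattern shifted by 3.25 pixels.
period = 32.0
x = np.arange(1024)
line = np.cos(2.0 * np.pi * (x - 3.25) / period)
print(phase_shift_position(line, period))           # approx. 3.25
```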
Optimization of Quadtree Triangulation for Terrain Models

The aim of the study is to increase the accuracy of a terrain triangulation while maintaining or reducing the number of triangles. To this end, a non-trivial algorithm for quadtree triangulation is proposed. The proposed algorithm includes: i) a resolution parameters calculation technique and ii) three error metric calculation techniques. Simulation software is also devised to apply the proposed algorithm. Initially, a data file is read to obtain the elevation data of a terrain. After that, a 3D mesh is generated by using the original quadtree triangulation algorithm and the proposed algorithm. For each of the algorithms, two situations are analyzed: i) the situation with fixed resolution parameters and ii) the situation with dynamically changing resolution parameters. For all of the cases, terrain accuracy value and number of triangles of 3D meshes are calculated and evaluated. Finally, it is shown that dynamically changing resolution parameters improve the algorithms’ performance.

Refik Samet, Emrah Ozsavas
Analyzing DGI-BS: Properties and Performance Under Occlusion and Noise

This paper analyzes a new 3D recognition method for occluded objects in complex scenes. The technique uses the Depth Gradient Image Based on Silhouette representation (DGI-BS) and addresses the identification-pose problem under occlusion and noise. DGI-BS synthesizes both surface and contour information, avoiding restrictions concerning the layout and visibility of the objects in the scene. First, the paper presents the main properties of this method compared with a set of known techniques and briefly explains the key concepts of the DGI-BS representation. Second, the performance of this strategy in real scenes under occlusion and noise is presented in detail.

Pilar Merchán, Antonio Adán
Real-Time Free Viewpoint from Multiple Moving Cameras

In recent years, some Video-Based Rendering methods have advanced from off-line rendering to on-line rendering. However, very few of them can handle moving cameras while recording. Moving cameras make it possible to follow an actor in a scene, move closer to capture more detail, or simply adjust the framing. In this paper, we propose a new Video-Based Rendering method that creates new views of the scene live from four moving webcams. These cameras are calibrated in real-time using multiple markers. Our method fully uses both CPU and GPU and hence requires only one consumer grade computer.

Vincent Nozick, Hideo Saito
A Cognitive Modeling Approach for the Semantic Aggregation of Object Prototypes from Geometric Primitives: Toward Understanding Implicit Object Topology

Object recognition has developed into the most common approach for detecting arbitrary objects based on their appearance, where viewpoint dependency, occlusions, algorithmic constraints and noise often hinder successful detection. Statistical pattern analysis methods, which extract features from images and enable the classification of the image content, have reached a certain maturity and achieve excellent recognition on rather complex problems.

However, these systems do not seem directly scalable to human performance in a cognitive sense and appearance does not contribute to understanding the structure of objects. Syntactical pattern recognition methods are able to deal with structured objects, which may be constructed from primitives that were generated from extracted image features. Here, an eminent problem is how to aggregate image primitives in order to (re-) construct objects from such primitives.

In this paper, we propose a new approach to the aggregation of object prototypes by using geometric primitives derived from features in image sequences acquired from changing viewpoints. We apply syntactical rules for forming representations of the implicit object topology of object prototypes by a set of fuzzy graphs. Finally, we find a superposition of a prototype graph set, which can be used for updating and learning new object recipes in a hippocampal-like episodic memory that paves the way to cognitive understanding of natural scenes. The proposed implementation is exemplified with an object similar to the Necker cube.

Peter Michael Goebel, Markus Vincze
A Multi-touch Surface Using Multiple Cameras

In this paper we present a framework for a multi-touch surface using multiple cameras. With an overhead camera and a side-mounted camera we determine the fingertip positions. After determining the fundamental matrix that relates the two cameras, we calculate the three dimensional coordinates of the fingertips. The intersection of the epipolar lines from the overhead camera with the fingertips detected in the side camera image provides the fingertip height. Touches are detected when the measured fingertip height from the touch surface is zero. We interpret touch events as hand gestures which can be generalized into commands for manipulating applications. We offer an example application of a multi-touch finger painting program.

Itai Katz, Kevin Gabayan, Hamid Aghajan
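The touch-detection step above rests on a standard epipolar relation: a point x detected in the overhead view maps to the line l' = Fx in the side view. A small numpy sketch of that geometric step follows; the fundamental matrix and fingertip coordinates are made-up placeholders, not values from the paper.

```python
import numpy as np

def epipolar_line(F, x):
    """Epipolar line l' = F @ x in the side view for a homogeneous pixel
    x from the overhead view, normalized so the point-line product is a
    distance in pixels."""
    l = F @ x
    return l / np.hypot(l[0], l[1])

def distance_to_line(l, x):
    return abs(l @ (x / x[2]))

# Hypothetical fundamental matrix and fingertip detections (placeholders).
F = np.array([[0.0,   -1e-4,  0.02],
              [1e-4,   0.0,  -0.03],
              [-0.02,  0.03,  1.0]])
overhead_tip = np.array([320.0, 240.0, 1.0])
side_tips = [np.array([100.0, 80.0, 1.0]), np.array([410.0, 200.0, 1.0])]

l = epipolar_line(F, overhead_tip)
# The side-view fingertip matching the overhead detection is the
# candidate closest to the epipolar line.
match = min(side_tips, key=lambda x: distance_to_line(l, x))
```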

Fusion, Detection and Classification

Fusion of Bayesian Maximum Entropy Spectral Estimation and Variational Analysis Methods for Enhanced Radar Imaging

A new fused Bayesian maximum entropy–variational analysis (BMEVA) method for enhanced radar/synthetic aperture radar (SAR) imaging is addressed as required for high-resolution remote sensing (RS) imagery. The variational analysis (VA) paradigm is adapted via incorporating the image gradient flow norm preservation into the overall reconstruction problem to control the geometrical properties of the desired solution. The metrics structure in the corresponding image representation and solution spaces is adjusted to incorporate the VA image formalism and RS model-level considerations; in particular, system calibration data and total image gradient flow power constraints. The BMEVA method aggregates the image model and system-level considerations into the fused SSP reconstruction strategy providing a regularized balance between the noise suppression and gained spatial resolution with the VA-controlled geometrical properties of the resulting solution. The efficiency of the developed enhanced radar imaging approach is illustrated through the numerical simulations with the real-world SAR imagery.

Yuriy Shkvarko, Rene Vazquez-Bautista, Ivan Villalon-Turrubiates
A PDE-Based Approach for Image Fusion

In this paper, we present a new general method for image fusion based on Partial Differential Equations (PDE). We propose to combine pixel-level fusion and diffusion processes through one single powerful equation. The relevant information contained in the sources is inserted into the fused image by reversing the diffusion process. To solve the well-known instability problem of an inverse diffusion process, a regularization term is added. One of the advantages of such an original approach is to improve the quality of the results in the case of noisy input images. Finally, a few examples and comparisons with classical fusion models demonstrate the efficiency of our method on both blurred and noisy images.

Sorin Pop, Olivier Lavialle, Romulus Terebes, Monica Borda
Improvement of Classification Using a Joint Spectral Dimensionality Reduction and Lower Rank Spatial Approximation for Hyperspectral Images

Hyperspectral images (HSI) are multidimensional and multicomponent data with a huge number of spectral bands providing spectral redundancy. To improve the efficiency of classifiers, the principal component analysis (PCA), referred to as PCA_dr, the maximum noise fraction (MNF) and, more recently, the independent component analysis (ICA), referred to as ICA_dr, are the most commonly used techniques for dimensionality reduction (DR). But in HSI, and in general when dealing with multi-way data, these techniques are applied to the vectorized images, yielding two-way data. The spatial representation is lost and the spectral components are selected using only spectral information. As an alternative, in this paper we propose to consider HSI as array data, or tensors, instead of matrices, which offers multiple ways to decompose the data orthogonally. We develop two new DR methods based on multilinear algebra tools, which perform the DR using PCA_dr for the first one and ICA_dr for the second one. We show that the result of spectral angle mapper (SAM) classification is improved by taking advantage of jointly spatial and spectral information and by performing simultaneously a dimensionality reduction on the spectral way and a projection onto a lower dimensional subspace of the two spatial ways.

N. Renard, S. Bourennane, J. Blanc-Talon
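As a rough illustration of treating an HSI cube as a tensor rather than a matrix, the sketch below projects each mode onto its leading singular subspace, a HOSVD-style lower-rank projection. It is a generic stand-in under assumed dimensions, not the PCA_dr/ICA_dr methods developed in the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: the chosen mode becomes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold for a tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

def multilinear_projection(hsi, ranks):
    """Project every mode of a (rows, cols, bands) cube onto its
    leading singular subspace -- a HOSVD-style low-rank projection."""
    core = hsi.astype(float)
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(core, mode), full_matrices=False)
        P = U[:, :r] @ U[:, :r].T              # rank-r projector, this mode
        core = fold(P @ unfold(core, mode), mode, core.shape)
    return core

# Hypothetical cube: 64 x 64 pixels, 100 bands; reduce spatial ranks to
# 32 and the spectral rank to 10 (illustrative numbers only).
hsi = np.random.rand(64, 64, 100)
approx = multilinear_projection(hsi, ranks=(32, 32, 10))
```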
Learning-Based Object Tracking Using Boosted Features and Appearance-Adaptive Models

This paper presents a learning-based algorithm for object tracking. During on-line learning we employ the most informative and hard-to-classify examples, features maximizing individually the mutual information, stable object features within all past observations, and features from the initial object template. The object undergoing tracking is discriminated by a boosted classifier built on regression stumps. We seek the mode in the confidence map calculated by the strong classifier to sample new features. In a supplementing tracker based upon a particle filter we use a recursively updated mixture appearance model, which depicts stable structures in images seen so far, the initial object appearance, as well as two-frame variations. The update of the slowly varying component is done using only pixels that are classified by the strong classifier as belonging to the foreground. The estimates calculated by the particle filter allow us to sample supplementary features for learning of the classifier. The performance of the algorithm is demonstrated on freely available test sequences. The resulting algorithm runs in real-time.

Bogdan Kwolek
Spatiotemporal Fusion Framework for Multi-camera Face Orientation Analysis

In this paper, we propose a collaborative technique for face orientation estimation in smart camera networks. The proposed spatiotemporal feature fusion analysis is based on active collaboration between the cameras in data fusion and decision making using features extracted by each camera. First, a head strip mapping method is proposed based on a Markov model and a Viterbi-like algorithm to estimate the relative angular differences to the face between the cameras. Then, given synchronized face sequences from several camera nodes, the proposed technique determines the orientation and the angular motion of the face using two features, namely the hair-face ratio and the head optical flow. These features yield an estimate of the face orientation and the angular velocity through simple analysis such as Discrete Fourier Transform (DFT) and Least Squares (LS), respectively. Spatiotemporal feature fusion is implemented via key frame detection in each camera, a forward-backward probabilistic model, and a spatiotemporal validation scheme. The key frames are obtained when a camera node detects a frontal face view and are exchanged between the cameras so that local face orientation estimates can be adjusted to maintain a high confidence level. The forward-backward probabilistic model aims to mitigate error propagation in time. Finally, a spatiotemporal validation scheme is applied for spatial outlier removal and temporal smoothing. A face view is interpolated from the mapped head strips, from which snapshots at the desired view angles can be generated. The proposed technique does not require camera locations to be known a priori, and hence is applicable to vision networks deployed casually without localization.

Chung-Ching Chang, Hamid Aghajan
Independent Component Analysis-Based Estimation of Anomaly Abundances in Hyperspectral Images

Independent Component Analysis (ICA) is a blind source separation method which is exploited for various applications in signal processing. In hyperspectral imagery, ICA is commonly employed for detection and segmentation purposes, but it is often thought to be unable to quantify abundances. In this paper, we propose an ICA-based method to estimate anomaly abundances from the independent components. The first experiments on synthetic and real-world hyperspectral images are very promising in terms of estimation accuracy and robustness.

Alexis Huck, Mireille Guillaume
Unsupervised Multiple Object Segmentation of Multiview Images

In this paper we propose an unsupervised multiview image segmentation algorithm, combining multiple image cues including color, depth, and motion. First, the objects of interest are extracted by computing a saliency map based on the visual attention model. By analyzing the saliency map, we automatically obtain the number of foreground objects and their bounding boxes, which are used to initialize the segmentation algorithm. Then the optimal segmentation is calculated by energy minimization under the min-cut/max-flow theory. There are two major contributions in this paper. First, we show that the performance of graph cut segmentation depends on the user's interactive initialization, whereas our proposed method provides robust initialization instead of random user input. In addition, we propose a novel energy function with a locally adaptive smoothness term when constructing the graphs. Experimental results demonstrate that subjectively good segmentation results are obtained.

Wenxian Yang, King Ngi Ngan

Image Processing and Filtering

Noise Removal from Images by Projecting onto Bases of Principal Components

In this paper, we develop a new wavelet domain statistical model for the removal of stationary noise in images. The new model is a combination of local linear projections onto bases of Principal Components, which perform a dimension reduction of the spatial neighbourhood while avoiding the "curse of dimensionality". The models obtained after projection consist of low-dimensional Gaussian Scale Mixtures with a reduced number of parameters. The results show that this technique yields a significant improvement in denoising performance when using larger spatial windows, especially on images with highly structured patterns, like textures.

Bart Goossens, Aleksandra Pižurica, Wilfried Philips
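The core projection step can be sketched in the pixel domain: gather each spatial neighbourhood as a vector, project onto the leading principal components, and average the overlapping reconstructions. This is only a crude stand-in; the paper works in the wavelet domain and places Gaussian Scale Mixture models on the projected coefficients.

```python
import numpy as np

def pca_patch_denoise(img, patch=7, keep=8):
    """Crude sketch: project every overlapping patch onto the leading
    principal components and average the reconstructions (use on
    small images; no attempt at efficiency)."""
    h, w = img.shape
    idx = [(i, j) for i in range(h - patch + 1) for j in range(w - patch + 1)]
    X = np.stack([img[i:i + patch, j:j + patch].ravel() for i, j in idx])
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    B = Vt[:keep]                            # leading principal directions
    Xd = (X - mean) @ B.T @ B + mean         # dimension-reduced patches
    out = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for v, (i, j) in zip(Xd, idx):           # average overlapping patches
        out[i:i + patch, j:j + patch] += v.reshape(patch, patch)
        cnt[i:i + patch, j:j + patch] += 1
    return out / cnt
```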
A Multispectral Data Model for Higher-Order Active Contours and Its Application to Tree Crown Extraction

Forestry management makes great use of statistics concerning the individual trees making up a forest, but the acquisition of this information is expensive. Image processing can potentially both reduce this cost and improve the statistics. The key problem is the delineation of tree crowns in aerial images. The automatic solution of this problem requires considerable prior information to be built into the image and region models. Our previous work has focused on including shape information in the region model; in this paper we examine the image model. The aerial images involved have three bands. We study the statistics of these bands, and construct both multispectral and single band image models. We combine these with a higher-order active contour model of a ‘gas of circles’ in order to include prior shape information about the region occupied by the tree crowns in the image domain. We compare the results produced by these models on real aerial images and conclude that multiple bands improve the quality of the segmentation. The model has many other potential applications, e.g. to nano-technology, microbiology, physics, and medical imaging.

Péter Horváth
A Crossing Detector Based on the Structure Tensor

A new crossing detector is presented which also permits orientation estimation of the underlying structures. The method relies on well established tools such as the structure tensor, the double angle mapping and descriptors for second order variations. The performance of our joint crossing detector and multi-orientation estimator is relatively independent of the angular separation of the underlying unimodal structures.

Frank G. A. Faas, Lucas J. van Vliet
Polyphase Filter and Polynomial Reproduction Conditions for the Construction of Smooth Bidimensional Multiwavelets

To construct a very smooth nonseparable multiscaling function, we impose polynomial approximation order 2 and add new conditions on the polyphase highpass filters. We work with a dilation matrix generating quincunx lattices, and fix the index set. Other imposed conditions are orthogonal filter bank and balancing. We construct a smooth, compactly supported multiscaling function and multiwavelet, and test the system on a noisy image with good results.

Ana Ruedin
Multidimensional Noise Removal Method Based on Best Flattening Directions

This paper presents a new multi-way filtering method for multi-way images impaired by additive white noise. Instead of matrices or vectors, multidimensional images are considered as multi-way arrays, also called tensors. Some noise removal techniques consist in vectorizing or matricizing multi-way data, which can lead to the loss of inter-band relations. The presented filtering method considers multidimensional data as whole entities. Such a method is based on multilinear algebra. We adapt multi-way Wiener filtering to multidimensional images and therefore introduce specific directions for tensor flattening. To this end, we extend the SLIDE algorithm to retrieve the main directions of tensors, which are modeled as straight lines. To keep the local characteristics of images, we propose to adapt quadtree decomposition to tensors. Experiments on color images and on HYDICE hyperspectral images are presented to show the importance of flattening directions for noise removal in color images and hyperspectral images.

Damien Letexier, Salah Bourennane, Jacques Blanc-Talon
Low-Rank Approximation for Fast Image Acquisition

We propose a scanning procedure for fast image acquisition, based on low-rank image representations. An initial image is predicted from a low resolution scan and a smooth interpolation of the singular triplets. This is followed by an adaptive cross correlation scan, following the maximum error in the difference image. Our approach aims at reducing the scanning time for image acquisition devices that are in the single-pixel camera category. We exemplify with results from our experimental microwave, mm-wave and terahertz imaging systems.

Dan C. Popescu, Greg Hislop, Andrew Hellicar
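A minimal sketch of the low-rank representation underlying the scan: reconstruct the image from its leading singular triplets and revisit the location of largest residual error. The rank and the error criterion below are illustrative choices, not the paper's parameters.

```python
import numpy as np

def low_rank_approx(img, rank):
    """Reconstruction from the leading `rank` singular triplets."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# An adaptive scan could then revisit the pixel with the largest
# residual in the difference image (illustrative criterion only).
img = np.random.rand(64, 64)              # stand-in for a coarse scan
residual = np.abs(img - low_rank_approx(img, rank=5))
next_pixel = np.unravel_index(residual.argmax(), residual.shape)
```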
A Soft-Switching Approach to Improve Visual Quality of Colour Image Smoothing Filters

Many filtering methods for Gaussian noise smoothing in colour images have been proposed. The common objective of these methods is to smooth out the noise while preserving the edges and details of the image. However, it can be observed that these methods, in their effort to preserve the image structures, also generate artefacts in homogeneous regions that are actually due to noise. So, these methods can perform well on image edges and details but sometimes do not achieve the desired smoothing in homogeneous regions. In this paper we propose a method to overcome this problem. We use fuzzy concepts to build a soft-switching technique between two Gaussian noise filters: (i) a filter able to smooth out the noise near edges and fine features while properly preserving those details and (ii) a filter able to achieve the desired smoothing in homogeneous regions. Experimental results are provided to show the performance achieved by the proposed solution.

Samuel Morillas, Stefan Schulte, Tom Mélange, Etienne E. Kerre, Valentín Gregori
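A toy version of the soft-switching idea, assuming stand-in filters (a Gaussian for homogeneous regions, a median for detail) and a fuzzy membership driven by local gradient magnitude; the paper's actual pair of Gaussian-noise filters and its fuzzy rules differ.

```python
import numpy as np
from scipy import ndimage

def soft_switch_smooth(img, lo=5.0, hi=20.0):
    """Sketch of fuzzy soft-switching between two smoothers (stand-ins:
    a Gaussian for flat regions, a median for edges and details)."""
    img = img.astype(float)
    flat = ndimage.gaussian_filter(img, sigma=1.5)   # homogeneous-region filter
    edge = ndimage.median_filter(img, size=3)        # detail-preserving filter
    grad = ndimage.gaussian_gradient_magnitude(img, sigma=1.0)
    # Fuzzy membership "pixel lies on an edge": 0 below lo, 1 above hi,
    # linear ramp in between.
    w = np.clip((grad - lo) / (hi - lo), 0.0, 1.0)
    return w * edge + (1.0 - w) * flat
```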
Comparison of Image Conversions Between Square Structure and Hexagonal Structure

Hexagonal image structure is a relatively new and powerful approach to intelligent vision systems. The geometrical arrangement of pixels in this structure can be described as a collection of hexagonal pixels. However, all existing hardware for capturing and displaying images is based on a rectangular architecture. It therefore becomes important to find a proper software approach to mimic the hexagonal structure, so that images represented on the traditional square structure can be smoothly converted from or to images on the hexagonal structure. For accurate image processing, it is critical to best maintain the image resolution after conversion. In this paper, we present various algorithms for image conversion between the two image structures. The performance of these algorithms is compared through experimental results.

Xiangjian He, Jianmin Li, Tom Hintz
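One simple software route to mimicking a hexagonal structure, sketched under assumptions: place hexagonal pixel centres over the square lattice (odd rows shifted by half a pixel, rows √3/2 apart) and sample by bilinear interpolation. The paper compares several conversion algorithms; this is merely one plausible baseline, not any of them specifically.

```python
import numpy as np
from scipy import ndimage

def square_to_hex(img, spacing=1.0):
    """Resample a square-lattice image onto a hexagonal lattice by
    bilinear interpolation (one simple way to mimic hex pixels)."""
    h, w = img.shape
    dy = spacing * np.sqrt(3.0) / 2.0            # hexagonal row spacing
    ys, xs = [], []
    for r in range(int(h / dy)):
        offset = 0.5 * spacing if r % 2 else 0.0  # shift odd rows
        cols = np.arange(offset, w - 1, spacing)
        xs.append(cols)
        ys.append(np.full_like(cols, r * dy))
    ys, xs = np.concatenate(ys), np.concatenate(xs)
    vals = ndimage.map_coordinates(img.astype(float), [ys, xs], order=1)
    return ys, xs, vals                           # hex-pixel centres and values
```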

Biometrics and Security

Action Recognition with Semi-global Characteristics and Hidden Markov Models

In this article, a new approach is presented for action recognition with only one non-calibrated camera. Invariance to viewpoint is obtained with several acquisitions of the same action. The originality of the presented approach consists in characterizing sequences by a temporal succession of semi-global features, which are extracted from “space-time micro-volumes”. The advantages of the proposed approach are the use of robust features (estimated over several frames), the ability to manage actions of variable duration, and easy segmentation of the sequences with algorithms specific to time-varying data. For the recognition, each view of each action is modeled by a Hidden Markov Model system. Results presented on 1614 sequences of everyday-life actions like “walking”, “sitting down” and “bending down”, performed by several persons, validate the proposed approach.

Catherine Achard, Xingtai Qu, Arash Mokhber, Maurice Milgram
Patch-Based Experiments with Object Classification in Video Surveillance

We present a patch-based algorithm for the purpose of object classification in video surveillance. Within detected regions-of-interest (ROIs) of moving objects in the scene, a feature vector is calculated based on template matching of a large set of image patches. Instead of matching direct image pixels, we use Gabor-filtered versions of the input image at several scales. This approach has been adopted from recent experiments in generic object-recognition tasks. We present results for a new typical video surveillance dataset containing over 9,000 object images. Furthermore, we compare our system performance with another existing smaller surveillance dataset. We have found that with 50 or more training samples, our detection rate is on average above 95%. Because of the inherent scalability of the algorithm, an embedded system implementation is well within reach.

Rob Wijnhoven, Peter H. N. de With
Neural Network Based Face Detection from Pre-scanned and Row-Column Decomposed Average Face Image

This paper introduces a methodology for detecting human faces with minimum constraints on the properties of the photograph and the appearance of faces. The proposed method uses an average face model to save the computation time required for the training process. The average face is decomposed into row and column sub-matrices and then presented to the neural network. To reduce the time required for scanning images at places where the probability of a face is very low, a pre-scan algorithm is applied. The algorithm searches for faces in the image at different scales in order to detect faces of different sizes. Arbitration between multiple scales and heuristics improves the accuracy of the algorithm. Experimental results are presented to illustrate the performance of the algorithm, including accuracy and speed in detecting faces.

Ziya Telatar, Murat H. Sazlı, Irfan Muhammad
Model-Based Image Segmentation for Multi-view Human Gesture Analysis

Multi-camera networks offer potential for a variety of vision-based applications through the provisioning of rich visual information. In this paper a method of image segmentation for human gesture analysis in multi-camera networks is presented. Aiming to employ the manifold sources of visual information provided by the network, an opportunistic fusion framework is described and incorporated in the proposed method for gesture analysis. A 3D human body model is employed as the converging point of spatiotemporal and feature fusion. It maintains both the geometric parameters of the human posture and the adaptively learned appearance attributes, all of which are updated from the three dimensions of space, time and features of the opportunistic fusion. At sufficient confidence levels, the parameters of the 3D human body model are in turn used as feedback to aid subsequent vision analysis. The 3D human body model also serves as an intermediate level for gesture interpretation in different applications.

The image segmentation method described in this paper is part of the gesture analysis problem. It aims to reduce raw visual data in a single camera to concise descriptions for more efficient communication between cameras. Color distribution registered in the model is used to initialize segmentation. Perceptually Organized Expectation Maximization (POEM) is then applied to refine color segments with observations from a single camera. Finally ellipse fitting is used to parameterize segments. Experimental results for segmentation are illustrated. Some examples for skeleton fitting based on the elliptical segments will also be shown to demonstrate motivation and capability of the model-based segmentation approach for multi-view human gesture analysis.

Chen Wu, Hamid Aghajan
A New Partially Occluded Face Pose Recognition

A video-based face pose recognition framework for partially occluded faces is presented. Each pose of a person's face is approximated using connected low-dimensional appearance manifolds, and the face pose is estimated by computing the minimal probabilistic distance from the partially occluded face to the sub-pose manifold using a weighted mask. To deal with partially occluded faces, we detect the occluded pixels in the current frame and then put lower weights on these occluded pixels when computing the minimal probabilistic distance between the given occluded face pose and the face appearance manifold. The proposed method was evaluated under several situations and promising results were obtained.

Myung-Ho Ju, Hang-Bong Kang
Large Head Movement Tracking Using Scale Invariant View-Based Appearance Model

In this paper we propose a novel method for head tracking over a large range using a scale-invariant view-based appearance model. The proposed model is populated online, and it can select key frames while the head undergoes different motions in the camera-near field. We propose a robust head detection algorithm to obtain an accurate head region, used as the view of the head, in each intensity image. When the head moves far from the camera, the view of the head is first obtained through the proposed algorithm, and then a key frame whose view of the head is most similar to that of the current frame is selected to recover the head pose of the current frame by coordinate adjustment. In order to improve the efficiency of the tracking method, a searching algorithm is also proposed to select key frames. The proposed method was evaluated with a stereo camera, and robust pose recovery was observed when the head underwent large motion, even when the movement along the Z axis was about 150 cm.

Gangqiang Zhao, Ling Chen, Gencai Chen
Robust Shape-Based Head Tracking

This work presents a new method to automatically locate frontal facial feature points under large scene variations (illumination, pose and facial expressions). First, we use a kernel-based tracker to detect and track the facial region in an image sequence. Then the results of the face tracking, i.e. face region and face pose, are used to constrain prominent facial feature detection and tracking. In our case, eyes and mouth corners are considered as prominent facial features. In a final step, we propose an improvement to the Bayesian Tangent Shape Model for the detection and tracking of the full shape model. A constrained regularization algorithm is proposed using the head pose and the accurately aligned prominent features to constrain the deformation parameters of the shape model. Extensive experiments demonstrate the accuracy and effectiveness of our proposed method.

Yunshu Hou, Hichem Sahli, Ravyse Ilse, Yanning Zhang, Rongchun Zhao
Evaluating Descriptors Performances for Object Tracking on Natural Video Data

In this paper, a new framework is presented for the quantitative evaluation of the performance of appearance models composed of an object descriptor and a similarity measure in the context of object tracking. The evaluation is based on natural videos, and takes advantage of existing ground-truths from object tracking benchmarks. The proposed metrics evaluate the ability of an appearance model to discriminate an object from the clutter. This allows comparing models which may use diverse kinds of descriptors or similarity measures in a principled manner. The performance measures can be global, but time-oriented performance evaluation is also presented. The insights that the proposed framework can bring on appearance model properties with respect to tracking are illustrated on natural video data.

Mounia Mikram, Rémi Mégret, Yannick Berthoumieu
A Simple and Efficient Eigenfaces Method

This paper first presents a review of eigenface methods for face recognition and then introduces a new algorithm in this class. The main difference with previous approaches is the definition of the database. Classically, an image is exploited as a single vector, by concatenating its rows, while here we simply use all the rows as vectors during the training and the recognition stages. The new algorithm reduces the computational complexity of the classical eigenface method and also reaches a higher percentage of recognition. It is compared with other algorithms based on wavelets, aiming at reducing the computational burden. The most efficient wavelet families and other relevant parameters are discussed.

Carlos Gómez, Béatrice Pesquet-Popescu
A New Approach to Face Localization in the HSV Space Using the Gaussian Model

We propose a model-based approach for the problem of face localization. Traditionally, images are represented in the RGB color space, a 3-dimensional space that includes the illumination factor. However, the skin color of different ethnic groups has been shown to change with brightness. We therefore propose to transform RGB images into the HSV color space. We then exclude the V component and use the HS domain to represent skin pixels with a Gaussian probability model. The model is used to obtain a skin likelihood image, which is further transformed into a binary image using the fuzzy C-means clustering (FCM) technique. The candidate skin regions are checked for facial properties, and finally a template face matching approach is used to localize the face. The developed algorithm is found to be robust and reliable under various imaging conditions, even in the presence of structural objects like hair, spectacles, etc.

Mohamed Deriche, Imran Naseem
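A compact sketch of the Gaussian skin model in the HS plane, assuming labelled skin samples are already converted to (H, S) pairs; the FCM thresholding and face-template stages of the paper are omitted.

```python
import numpy as np

def fit_skin_model(hs_samples):
    """Fit a 2D Gaussian to (H, S) pairs of labelled skin pixels
    (hs_samples: N x 2 array; the V component is deliberately excluded)."""
    mean = hs_samples.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(hs_samples, rowvar=False))
    return mean, inv_cov

def skin_likelihood(hs, mean, inv_cov):
    """Unnormalized Gaussian skin likelihood for an array of (H, S)
    pairs; thresholding this map (the paper uses FCM) would give the
    binary skin image."""
    d = hs - mean
    return np.exp(-0.5 * np.einsum('...i,ij,...j', d, inv_cov, d))
```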
Gait Recognition Using Active Shape Models

Gait recognition is presented for human identification from a sequence of noisy silhouettes segmented from video. The proposed gait recognition algorithm gives better performance than the baseline algorithm thanks to segmentation of the object using the active shape model (ASM) algorithm. For the experiments, we used the HumanID Gait Challenge data set, the largest gait benchmarking data set, with 122 subjects. For realistic simulation we use various values for the following parameters: i) viewpoint, ii) shoe, iii) surface, iv) carrying condition, and v) time.

Woon Cho, Taekyung Kim, Joonki Paik
Statistical Classification of Skin Color Pixels from MPEG Videos

Detection and classification of skin regions plays an important role in many image processing and vision applications. In this paper, we present a statistical approach for fast skin detection in MPEG-compressed videos. Firstly, the conditional probabilities of skin and non-skin pixels are extracted from manually marked training images. Then, candidate skin pixels are identified using the Bayesian maximum a posteriori decision rule. An optimal threshold is then obtained by analyzing the probability of error on the basis of the likelihood ratio histogram of skin and non-skin pixels. Experiments on sequences with varying illumination have demonstrated the effectiveness of our approach.

Jinchang Ren, Jianmin Jiang
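The decision rule reduces to a likelihood-ratio test per colour bin. A hedged numpy sketch follows, with the bin count and threshold as assumed parameters rather than the paper's values.

```python
import numpy as np

def fit_histograms(skin_pix, nonskin_pix, bins=32):
    """Per-colour-bin likelihoods from manually marked training pixels
    (each input: N x 3 array of 8-bit RGB values)."""
    rng = [(0, 256)] * 3
    h_s, _ = np.histogramdd(skin_pix, bins=bins, range=rng)
    h_n, _ = np.histogramdd(nonskin_pix, bins=bins, range=rng)
    return h_s / h_s.sum(), h_n / h_n.sum()

def classify(pixels, p_skin, p_nonskin, theta=1.0, bins=32):
    """MAP-style decision via the likelihood ratio P(c|skin)/P(c|nonskin)
    against a threshold theta (chosen here arbitrarily)."""
    idx = (pixels // (256 // bins)).astype(int)
    ls = p_skin[idx[:, 0], idx[:, 1], idx[:, 2]]
    ln = p_nonskin[idx[:, 0], idx[:, 1], idx[:, 2]]
    return ls > theta * ln
```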
A Double Layer Background Model to Detect Unusual Events

A double layer background representation for detecting novelty in image sequences is presented. The model is capable of handling non-stationary scenarios, such as vehicle intersections. In the first layer, an adaptive pixel appearance background model is computed; its subtraction from the current image results in a blob description of moving objects. In the second layer, motion direction analysis is performed by a Mixture of Gaussians on the blobs. We use both layers to represent the usual space of activities and to detect unusual activity. Our experiments clearly show that the proposed scheme is able to detect activities such as vehicles running a red light or making forbidden turns.

Joaquin Salas, Hugo Jimenez-Hernandez, Jose-Joel Gonzalez-Barbosa, Juan B. Hurtado-Ramos, Sandra Canchola
Realistic Facial Modeling and Animation Based on High Resolution Capture

Real-time facial expression capture is an essential part for on-line performance animation. For efficiency and robustness, special devices such as head-mounted cameras and face-attached markers have been used. However, these devices can possibly cause some discomfort that may hinder a face puppeteer from performing natural facial expressions. In this paper, we propose a comprehensive solution for real-time facial expression capture without any of such devices. Our basic idea is first to capture the 2D facial features and 3D head motion exploiting anthropometric knowledge and then to capture their time-varying 3D positions only due to facial expression. We adopt a Kalman filter to track the 3D features guided by their captured 2D positions while correcting their drift due to 3D head motion as well as removing noises.

Hae Won Byun

Image Processing and Restoration

Descriptor-Free Smooth Feature-Point Matching for Images Separated by Small/Mid Baselines

Most existing feature-point matching algorithms rely on photometric region descriptors to distinguish and match feature points in two images. In this paper, we propose an efficient feature-point matching algorithm for finding point correspondences between two uncalibrated images separated by small or mid camera baselines. The proposed algorithm does not rely on photometric descriptors for matching. Instead, only the motion smoothness constraint is used, which states that the correspondence vectors within a small neighborhood usually have similar directions and magnitudes. The correspondences of feature points in a neighborhood are collectively determined in such a way that the smoothness of the local correspondence field is maximized. The smoothness constraint is self-contained in the correspondence field and is robust to the camera motion, scene structure, illumination, etc. This makes the entire point-matching process texture-independent, descriptor-free and robust. The experimental results show that the proposed method performs much better than the intensity-based block-matching technique, even when the image contrast varies noticeably across images.

Ping Li, Dirk Farin, Rene Klein Gunnewiek, Peter H. N. de With
A New Supervised Evaluation Criterion for Region Based Segmentation Methods

We present in this article a new supervised evaluation criterion that enables the quantification of the quality of region segmentation algorithms. This criterion is compared with seven well-known criteria available in this context. To that end, we test the different methods on natural images by using a subjective evaluation involving different experts from the French community in image processing. Experimental results show the benefit of this new criterion.

Adel Hafiane, Sébastien Chabrier, Christophe Rosenberger, Hélène Laurent
A Multi-agent Approach for Range Image Segmentation with Bayesian Edge Regularization

We present in this paper a multi-agent approach for range image segmentation. The approach consists in using autonomous agents for the segmentation of a range image into its different planar regions. Agents move on the image and perform local actions on the pixels, allowing robust region extraction and accurate edge detection. In order to improve the segmentation quality, a Bayesian edge regularization is applied to the resulting edges. A new Markov Random Field (MRF) model is introduced to model the edge smoothness, used as a prior in the edge regularization. The experimental results obtained with real images from the ABW database show the good potential of the proposed approach for range image analysis, regarding both segmentation efficiency and detection accuracy.

Smaine Mazouzi, Zahia Guessoum, Fabien Michel, Mohamed Batouche
Adaptive Image Restoration Based on Local Robust Blur Estimation

This paper presents a novel non-iterative method to restore the out-of-focus part of an image. The proposed method first applies a robust local blur estimation to obtain a blur map of the image. The estimation uses the maximum of difference ratio between the original image and its two digitally re-blurred versions to estimate the local blur radius. Then adaptive least mean square filters based on the local blur radius and the image structure are applied to restore the image and to eliminate the sensor noise. Experimental results have shown that despite its low complexity the proposed method has a good performance at reducing spatially varying blur.

Hao Hu, Gerard de Haan
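The re-blurring cue can be sketched as follows: blur the image twice, take windowed maxima of the difference magnitudes, and use their ratio as a monotone indicator of local sharpness. The exact mapping from this ratio to a blur radius, and the subsequent adaptive LMS restoration, follow the paper and are not reproduced here; parameters below are illustrative.

```python
import numpy as np
from scipy import ndimage

def blur_ratio_map(img, s1=1.0, s2=2.0, win=9):
    """Local difference ratio between the image and two re-blurred
    versions; larger values indicate sharper (less blurred) regions.
    Converting this cue into an actual blur radius would need an
    empirical or analytical calibration."""
    img = img.astype(float)
    b1 = ndimage.gaussian_filter(img, s1)
    b2 = ndimage.gaussian_filter(img, s2)
    d1 = ndimage.maximum_filter(np.abs(img - b1), size=win)
    d2 = ndimage.maximum_filter(np.abs(b1 - b2), size=win)
    return d1 / np.maximum(d2, 1e-6)
```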
Image Upscaling Using Global Multimodal Priors

This paper introduces a Bayesian restoration method for low-resolution images combined with a geometry-driven smoothness prior and a new global multimodal prior. The multimodal prior is proposed for images that normally have just a few dominant colours. In spite of this, most images contain many more colours due to noise and edge pixels that are part of two or more connected smooth regions. The Maximum A Posteriori estimator is worked out to solve the problem. Experimental results confirm the effectiveness of the proposed global multimodal prior for images with a strong multimodal colour distribution such as cartoons. We also show the visual superiority of our reconstruction scheme over traditional interpolation and reconstruction methods: noise and compression artifacts are removed very well, and our method produces less blur and fewer other annoying artifacts.

Hiêp Luong, Bart Goossens, Wilfried Philips
A Type-2 Fuzzy Logic Filter for Detail-Preserving Restoration of Digital Images Corrupted by Impulse Noise

A novel filtering operator based on type-2 fuzzy logic is proposed for detail preserving restoration of images corrupted by impulse noise. The performance of the proposed operator is evaluated for different test images corrupted at various noise densities and also compared with representative impulse noise removal operators from the literature. Results of the filtering experiments show that the presented operator offers superior performance over the competing operators by efficiently suppressing the noise in the image while at the same time effectively preserving the useful information in the image.

M. Tülin Yildirim, M. Emin Yüksel
Contrast Enhancement of Images Using Partitioned Iterated Function Systems

A new algorithm for the contrast enhancement of images, based on the theory of Partitioned Iterated Function System (PIFS), is presented. A PIFS consists of contractive transformations, such that the original image is the fixed point of the union of these transformations. Each transformation involves the contractive affine spatial transform of a square block, as well as the linear transform of the gray levels of its pixels. The PIFS is used in order to create a lowpass version of the original image. The contrast-enhanced image is obtained by adding the difference of the original image with its lowpass version, to the original image itself. Quantitative and qualitative results stress the superior performance of the proposed contrast enhancement algorithm against two other widely used contrast enhancement methods.

Theodore Economopoulos, Pantelis Asvestas, George Matsopoulos
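The enhancement rule itself is a one-liner: add the difference between the original and its lowpass version back to the original. In the sketch below a Gaussian filter stands in for the paper's PIFS-generated lowpass image, and the gain parameter is an added illustrative knob.

```python
import numpy as np
from scipy import ndimage

def enhance(img, sigma=3.0, gain=1.0):
    """Enhanced image = original + gain * (original - lowpass).
    A Gaussian stands in here for the PIFS-based lowpass version."""
    low = ndimage.gaussian_filter(img.astype(float), sigma)
    return np.clip(img + gain * (img - low), 0, 255)
```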
A Spatiotemporal Algorithm for Detection and Restoration of Defects in Old Color Films

A spatiotemporal method is presented for the detection and concealment of local defects such as blotches in old color films. Initially, the non-local means (NL-means) method, which does not require motion estimation, is used for noise removal in the image sequences. Next, the motion vectors that are incorrectly estimated within defect regions are repaired by taking account of the temporal continuity of the motion trajectory. The defects in films are detected by the spike detection index (SDI) method, which is easily adapted to color image sequences. Finally, the proposed inpainting algorithm fills in the detected defect regions without requiring true motion estimation, unlike other approaches. The method is demonstrated on synthetic and real image sequences, and efficient concealment results are obtained.

Bekir Dizdaroglu, Ali Gangal

Medical Image Processing

Categorizing Laryngeal Images for Decision Support

This paper is concerned with an approach to automated analysis of vocal fold images aiming to categorize laryngeal diseases. Colour, texture, and geometrical features are used to extract relevant information. A committee of support vector machines is then employed for performing the categorization of vocal fold images into healthy, diffuse, and nodular classes. The discrimination power of both the original feature space and the space obtained through kernel principal component analysis is investigated. A correct classification rate of over 92% was obtained when testing the system on 785 vocal fold images. Bearing in mind the high similarity of the decision classes, the correct classification rate obtained is rather encouraging.

Adas Gelzinis, Antanas Verikas, Marija Bacauskiene
Segmentation of the Human Trachea Using Deformable Statistical Models of Tubular Shapes

In this work, we present two active shape models for the segmentation of tubular objects. The first model is built using cylindrical parameterization and minimum description length to achieve correct correspondences. The other model is a multidimensional point distribution model built from the centre line and related information of the training shapes. The models are used to segment the human trachea in low-dose CT scans of the thorax and are compared in terms of compactness of representation and segmentation effectiveness and efficiency. Leave-one-out tests were carried out on real CT data.

Romulo Pinho, Jan Sijbers, Toon Huysmans
Adaptive Image Content-Based Exposure Control for Scanning Applications in Radiography

I-ImaS (Intelligent Imaging Sensors) is a European project which has designed and developed a new adaptive X-ray imaging system using on-line exposure control to create locally optimized images. The I-ImaS system allows for real-time image analysis during acquisition, thus enabling real-time exposure adjustment. This adaptive imaging system has the potential to create images with optimal information within a given dose constraint and to acquire optimally exposed images of objects with variable density during one scan. In this paper we present the control system and results from initial tests on mammographic and encephalographic images. Furthermore, algorithms for visualization of the resulting images, consisting of unevenly exposed image regions, are developed and tested. The preliminary results show that the same image quality can be achieved at 30-70% lower dose using the I-ImaS system compared to conventional mammography systems.

Helene Schulerud, Jens Thielemann, Trine Kirkhus, Kristin Kaspersen, Joar M. Østby, Marinos G. Metaxas, Gary J. Royle, Jennifer Griffiths, Emily Cook, Colin Esbrand, Silvia Pani, Cristian Venanzi, Paul F. van der Stelt, Gang Li, Renato Turchetta, Andrea Fant, Sergios Theodoridis, Harris Georgiou, Geoff Hall, Matthew Noy, John Jones, James Leaver, Frixos Triantis, Asimakis Asimidis, Nikos Manthos, Renata Longo, Anna Bergamaschi, Robert D. Speller
Shape Extraction Via Heat Flow Analogy

In this paper, we introduce a novel evolution-based segmentation algorithm by using the heat flow analogy, to gain practical advantage. The proposed algorithm consists of two parts. In the first part, we represent a particular heat conduction problem in the image domain to roughly segment the region of interest. Then we use geometric heat flow to complete the segmentation, by smoothing extracted boundaries and removing possible noise inside the prior segmented region. The proposed algorithm is compared with active contour models and is tested on synthetic and medical images. Experimental results indicate that our approach works well in noisy conditions without pre-processing. It can detect multiple objects simultaneously. It is also computationally more efficient and easier to control and implement in comparison to active contour models.

Cem Direkoğlu, Mark S. Nixon
Adaptive Vision System for Segmentation of Echographic Medical Images Based on a Modified Mumford-Shah Functional

This paper presents a novel adaptive vision system for accurate segmentation of tissue structures in echographic medical images. The proposed vision system incorporates a level-set deformable model based on a modified Mumford-Shah functional, which is estimated over sparse foreground and background regions in the image. This functional is designed so that it copes with the intensity inhomogeneity that characterizes echographic medical images. Moreover, a parameter tuning mechanism has been considered for the adaptation of the deformable model parameters. Experiments were conducted over a range of echographic images displaying abnormal structures of the breast and of the thyroid gland. The results show that the proposed adaptive vision system stands as an efficient, effective and nearly objective tool for segmentation of echographic images.

Dimitris K. Iakovidis, Michalis A. Savelonas, Dimitris Maroulis
Detection of Individual Specimens in Populations Using Contour Energies

In this paper we study how shape information encoded in the values of contour energy components can be used for detecting microscopic organisms in population images. We propose features based on shape and geometrical statistics obtained from samples of optimized contour lines, integrated in a Bayesian inference framework for the recognition of individual specimens. Compared with common geometric features, the results show that patterns present in the image allow better detection of a considerable number of individuals, even in cluttered regions, when sufficient shape information is retained. This provides an alternative to building a specific shape model or imposing specific constraints on the interaction of overlapping objects.

Daniel Ochoa, Sidharta Gautama, Boris Vintimilla
Logarithmic Model-Based Dynamic Range Enhancement of Hip X-Ray Images

Digital capture of radiographic film with a consumer digital still camera significantly decreases the dynamic range and, hence, the visibility of details. We propose a method that boosts the dynamic range of the processed X-ray image based on the fusion of a set of digital images acquired under different exposure values. The fusion is controlled by fuzzy-like confidence information, and the luminance range is over-sampled by using logarithmic image processing operators.

Corneliu Florea, Constantin Vertan, Laura Florea
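For reference, the classical logarithmic image processing (LIP) operators that such a fusion can build on, with gray-tone range M; how the paper weights each exposure by its fuzzy confidence is not reproduced here, and the weights below are illustrative assumptions.

```python
import numpy as np

M = 256.0   # gray-tone range of the LIP model

def lip_add(a, b):
    """LIP addition: a (+) b = a + b - a*b/M; the result stays in [0, M)."""
    return a + b - a * b / M

def lip_scale(lam, a):
    """LIP scalar multiplication: lam (*) a = M - M * (1 - a/M)**lam."""
    return M - M * (1.0 - np.asarray(a, dtype=float) / M) ** lam

# Illustrative confidence-weighted fusion of two exposures e1, e2 with
# weights summing to one (an assumption, not the paper's exact rule):
e1, e2 = np.full((2, 2), 60.0), np.full((2, 2), 180.0)
fused = lip_add(lip_scale(0.4, e1), lip_scale(0.6, e2))
```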
A New Color Representation for Intensity Independent Pixel Classification in Confocal Microscopy Images

We address the problem of pixel classification in fluorescence microscopy images by using only wavelength information. To achieve this, we use Support Vector Machines as supervised classifiers and pixel components as feature vectors. We propose a representation derived from the HSV color space that allows separation between color and intensity information. An extension of this transformation is also presented that allows performing an a priori object/background segmentation. We show that these transformations not only allow intensity-independent classification but also make the classification problem simpler. As an illustration, we perform intensity-independent pixel classification first on synthetic and then on real biological images.

Boris Lenseigne, Thierry Dorval, Arnaud Ogier, Auguste Genovesio
Colon Visualization Using Cylindrical Parameterization

Using cylindrical parameterization, the 3D mesh surface extracted from colon CT scan images is parameterized onto a cylinder and afterwards visualized with a modified Chamfer distance transformation of the original CT images with regard to the colon centerline/boundary distance. The cylinder, with information from the distance transformation, is then unfolded by numerical integration along its circumferential direction and mapped to a plane, which approximates the view of a colon cut open along its length.

Zhenhua Mai, Toon Huysmans, Jan Sijbers
Particle Filter Based Automatic Reconstruction of a Patient-Specific Surface Model of a Proximal Femur from Calibrated X-Ray Images for Surgical Navigation

In this paper, we present a particle filter based 2D/3D reconstruction scheme combining a parameterized multiple-component geometrical model and a point distribution model, and show its application to automatically reconstruct a surface model of a proximal femur from a limited number of calibrated X-ray images with no user intervention at all. The parameterized multiple-component geometrical model is regarded as a simplified description capturing the geometrical features of a proximal femur. Its parameters are optimally and automatically estimated from the input images using a particle filter based algorithm. The estimated geometrical parameters are then used to initialize a point distribution model based 2D/3D reconstruction scheme for an accurate reconstruction of a surface model of the proximal femur. We designed and conducted in vitro and in vivo experiments to compare the present automatic reconstruction scheme to a manually initialized one. An average mean reconstruction error of 1.2 mm was found when the manually initialized reconstruction scheme was used. It increased to 1.3 mm when the automatic one was used. However, the automatic reconstruction scheme has the advantage of eliminating user intervention, which holds the potential to facilitate the application of 2D/3D reconstruction in surgical navigation.

Guoyan Zheng, Xiao Dong

Video Coding and Processing

Joint Tracking and Segmentation of Objects Using Graph Cuts

This paper presents a new method to both track and segment objects in videos. It includes predictions and observations inside an energy function that is minimized with graph cuts. The min-cut/max-flow algorithm provides a segmentation as the global minimum of the energy function, at a modest computational cost. Simultaneously, our algorithm associates the tracked objects to the observations during the tracking. It thus combines “detect-before-track” tracking algorithms and segmentation methods based on color/motion distributions and/or temporal consistency. Results on real sequences are presented in which the robustness to partial occlusions and to missing observations is shown.

Aurélie Bugeau, Patrick Pérez
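
A toy illustration of the s-t min-cut machinery such methods build on, using networkx rather than the dedicated min-cut/max-flow implementation the authors presumably use: per-pixel unary costs become terminal-link capacities and a Potts smoothness term becomes neighbor-link capacities. All inputs are synthetic assumptions.

    import networkx as nx
    import numpy as np

    def graphcut_segment(unary_fg, unary_bg, smoothness=0.3):
        """Binary MRF segmentation by s-t minimum cut on a 4-connected grid."""
        h, w = unary_fg.shape
        G = nx.DiGraph()
        for y in range(h):
            for x in range(w):
                n = (y, x)
                G.add_edge('s', n, capacity=float(unary_bg[y, x]))  # paid if n -> bg
                G.add_edge(n, 't', capacity=float(unary_fg[y, x]))  # paid if n -> fg
                for dy, dx in ((0, 1), (1, 0)):
                    if y + dy < h and x + dx < w:
                        m = (y + dy, x + dx)
                        G.add_edge(n, m, capacity=smoothness)
                        G.add_edge(m, n, capacity=smoothness)
        _, (source_side, _) = nx.minimum_cut(G, 's', 't')
        mask = np.zeros((h, w), dtype=bool)
        for n in source_side - {'s'}:
            mask[n] = True              # source side = foreground label
        return mask

    fg_cost = np.random.rand(20, 20)
    mask = graphcut_segment(fg_cost, 1 - fg_cost)
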
A New Fuzzy Motion and Detail Adaptive Video Filter

In this paper a new low-complexity algorithm for the denoising of video sequences is presented. The proposed fuzzy-rule based algorithm is first explained in the pixel domain and later extended to the wavelet domain. The method can be seen as a fuzzy variant of a recent multiple class video denoising method that automatically adapts to detail and motion. Experimental results show that the proposed algorithm efficiently removes Gaussian noise from digital greyscale image sequences. These results also show that our method outperforms other state-of-the-art filters of comparable complexity for different video sequences.

Tom Mélange, Vladimir Zlokolica, Stefan Schulte, Valérie De Witte, Mike Nachtegael, Aleksandra Pižurica, Etienne E. Kerre, Wilfried Philips
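
The fuzzy rules themselves are not given in the abstract; the sketch below only illustrates the general principle of a fuzzy motion-adaptive temporal filter, with invented membership thresholds: static areas are averaged strongly, while moving areas fall back to the current frame to avoid ghosting.

    import numpy as np

    def fuzzy_temporal_filter(prev, cur, t1=5.0, t2=25.0):
        """Fuzzy-weighted temporal average: the membership degree 'still'
        falls linearly from 1 to 0 as the absolute frame difference moves
        between the two thresholds t1 and t2 (hypothetical values)."""
        diff = np.abs(cur - prev)
        still = np.clip((t2 - diff) / (t2 - t1), 0.0, 1.0)
        w = 0.5 * still                    # blend weight for the previous frame
        return w * prev + (1.0 - w) * cur

    prev = np.random.rand(64, 64) * 255
    cur = prev + np.random.normal(0, 5, (64, 64))
    denoised = fuzzy_temporal_filter(prev, cur)
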
Bridging the Gap: Transcoding from Single-Layer H.264/AVC to Scalable SVC Video Streams

Video scalability plays an increasingly important role in the distribution of digital video content. Currently, the scalable extension of the H.264/AVC video coding standard (SVC) is being finalized, which provides scalability layers on top of state-of-the-art H.264/AVC video streams. Existing video content that is coded as single-layer H.264/AVC, however, cannot benefit from the newly developed scalability features. Here, we discuss our architecture for H.264/AVC-to-SVC transcoding, which is able to derive SNR scalability layers from existing H.264/AVC bitstreams. Results show that the rate-distortion performance of our architecture approaches the optimal decoder-encoder cascade to within 1 to 2 dB. Timing results indicate that intelligent conversion techniques are required, and that transcoding can significantly reduce the required computation time.

Jan De Cock, Stijn Notebaert, Peter Lambert, Rik Van de Walle
Improved Pixel-Based Rate Allocation for Pixel-Domain Distributed Video Coders Without Feedback Channel

In some video coding applications, it is desirable to reduce the complexity of the video encoder at the expense of a more complex decoder. Distributed Video (DV) Coding is a new paradigm that aims at achieving this. To allocate a proper number of bits to each frame, most DV coding algorithms use a feedback channel (FBC). However, in some cases, a FBC does not exist. In this paper, we therefore propose a rate allocation (RA) algorithm for pixel-domain distributed video (PDDV) coders without FBC. Our algorithm estimates at the encoder the number of bits for every frame without significantly increasing the encoder complexity. For this calculation we consider each pixel of the frame individually, in contrast to our earlier work where the whole frame is treated jointly. Experimental results show that this pixel-based approach delivers better estimates of the adequate encoding rate than the frame-based approach. Compared to the PDDV coder with FBC, the PDDV coder without FBC has only a small loss in RD performance, especially at low rates.

Marleen Morbée, Josep Prades-Nebot, Antoni Roca, Aleksandra Pižurica, Wilfried Philips
Multiview Depth-Image Compression Using an Extended H.264 Encoder

This paper presents a predictive-coding algorithm for the compression of multiple depth sequences obtained from a multi-camera acquisition setup. The proposed depth-prediction algorithm works by synthesizing a virtual depth image that matches the depth image of the camera being predicted. To generate this virtual depth image, we use an image-rendering algorithm known as 3D image-warping. This newly proposed prediction technique is employed in a 3D coding system in order to compress multiview depth sequences. For this purpose, we introduce an extended H.264 encoder that employs two prediction techniques: block-based motion prediction and the aforementioned 3D image-warping prediction. This extended H.264 encoder adaptively selects the most efficient prediction scheme for each image block using a rate-distortion criterion. We present experimental results for several multiview depth sequences, which show a quality improvement of about 2.5 dB compared to H.264 inter-coded depth images.

Yannick Morvan, Dirk Farin, Peter H. N. de With
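
A compact sketch of the 3D image-warping step used for depth prediction, under the usual pinhole camera model; the intrinsics, relative pose and splatting policy are assumptions, and colliding points are resolved with a simple z-buffer.

    import numpy as np

    def warp_depth(depth, K_src, K_dst, R, t):
        """Forward-warp a depth map: back-project every source pixel to 3D,
        transform it into the destination camera frame, and splat its new
        depth, keeping the nearest surface where projections collide."""
        h, w = depth.shape
        ys, xs = np.mgrid[0:h, 0:w]
        pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
        pts = np.linalg.inv(K_src) @ pix * depth.ravel()   # 3D in source frame
        pts = R @ pts + t[:, None]                         # destination frame
        proj = K_dst @ pts
        u = np.round(proj[0] / proj[2]).astype(int)
        v = np.round(proj[1] / proj[2]).astype(int)
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (proj[2] > 0)
        out = np.full((h, w), np.inf)
        np.minimum.at(out, (v[ok], u[ok]), proj[2][ok])    # z-buffer splat
        return out

    depth = np.full((48, 64), 2.0)                         # synthetic flat scene
    K = np.array([[60.0, 0, 32], [0, 60.0, 24], [0, 0, 1]])
    virtual = warp_depth(depth, K, K, np.eye(3), np.array([0.05, 0.0, 0.0]))
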
Grass Detection for Picture Quality Enhancement of TV Video

Current image enhancement in televisions can be improved if the image is analyzed, objects of interest are segmented, and each segment is processed with specifically optimized algorithms. In this paper we present an algorithm and feature model for segmenting grass areas in video sequences. The system employs adaptive color and position models for creating a coherent grass segmentation map. Compared with previously reported algorithms, our system shows significant improvements in spatial and temporal consistency of the results. This property makes the proposed system suitable for TV video applications.

Bahman Zafarifar, Peter H. N. de With
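
The adaptive color and position models are specific to the paper; as a stand-in, a minimal per-pixel color likelihood for grass could look like the sketch below, with purely illustrative parameters (a Gaussian on hue centered on green, gated by a minimum saturation).

    import numpy as np

    def grass_probability(hsv, h_mean=0.25, h_std=0.05, s_min=0.2):
        """Soft grass likelihood from a fixed color model; the paper instead
        adapts the model parameters per image and adds a position model."""
        h, s = hsv[..., 0], hsv[..., 1]
        p_hue = np.exp(-0.5 * ((h - h_mean) / h_std) ** 2)
        return p_hue * (s > s_min)

    hsv = np.random.rand(48, 48, 3)          # hypothetical HSV frame in [0, 1]
    grass_map = grass_probability(hsv) > 0.5
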
Exploitation of Combined Scalability in Scalable H.264/AVC Bitstreams by Using an MPEG-21 XML-Driven Framework

The heterogeneity of contemporary multimedia environments requires a format-agnostic adaptation framework for the consumption of digital video content. Preferably, scalable bitstreams are used in order to satisfy as many usage circumstances as possible. In this paper, the scalable extension of the H.264/AVC specification is used to obtain the parent bitstreams. The adaptation along the combined scalability axis of the bitstreams must occur in a format-independent manner; therefore, an abstraction layer of the bitstream is needed. In this paper, XML descriptions are used to represent the high-level structure of the bitstreams, relying on the MPEG-21 Bitstream Syntax Description Language standard. The adaptation process is executed in the XML domain by transforming the XML descriptions according to the usage environment. Such an adaptation engine is discussed in this paper, in which all communication is based on XML descriptions without knowledge of the underlying coding format. From the performance measurements, one can conclude that the transformations in the XML domain and the generation of the corresponding adapted bitstream can be realized in real time.

Davy De Schrijver, Wesley De Neve, Koen De Wolf, Davy Van Deursen, Rik Van de Walle
Moving Object Extraction by Watershed Algorithm Considering Energy Minimization

MPEG-4, which is a video coding standard, supports object-based functionalities for high-efficiency coding. MPEG-7, a multimedia content description interface, handles the object data in, for example, retrieval and/or editing systems. Therefore, the extraction of semantic video objects is an indispensable tool that benefits these newly developed schemes. In the present paper, we propose a technique that extracts the shape of moving objects by combining snakes and the watershed algorithm. The proposed method comprises two steps. In the first step, snakes extract the contours of moving objects by minimizing an energy function. In the second step, the conditional watershed algorithm extracts contours from a topographical surface that includes a new function term. This term is introduced to improve the estimated contours by taking into account the boundaries of moving objects obtained by the snakes. The efficiency of the proposed approach to moving object extraction is demonstrated through computer simulations.

Kousuke Imamura, Masaki Hiraoka, Hideo Hashimoto
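
A runnable sketch of the second step, marker-based (conditional) watershed flooding of a gradient surface with scikit-image; here the markers are placed by hand, whereas in the paper they would come from the snake contours of the first step, and the additional function term is omitted.

    import numpy as np
    from scipy import ndimage
    from skimage.segmentation import watershed

    img = np.random.rand(64, 64)                 # hypothetical frame
    gradient = ndimage.gaussian_gradient_magnitude(img, sigma=2)

    markers = np.zeros((64, 64), dtype=int)
    markers[32, 32] = 1      # seed inside the moving object (from the snakes)
    markers[0, 0] = 2        # seed in the background

    labels = watershed(gradient, markers)        # flood the topographic surface
    object_mask = labels == 1
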
Constrained Inter Prediction: Removing Dependencies Between Different Data Partitions

With the growing demand for low-delay video streaming in error-prone environments, error resilience tools, such as the data partitioning tool in the H.264/AVC specification, are becoming more and more important. In this paper, the introduction of constrained inter prediction into the H.264/AVC specification is proposed. Constrained inter prediction helps the data partitioning tool by removing the dependencies between partitions B and C, thereby making it possible to process partition C if partition B is lost. From the experimental results it is observed that the cost of introducing this technique is negligible. Furthermore, when constrained inter prediction is used in combination with constrained intra prediction, the resulting bitstreams have a peak signal-to-noise ratio that is up to 1.8 dB higher in error-prone environments than when only constrained intra prediction is used.

Yves Dhondt, Stefaan Mys, Kenneth Vermeirsch, Rik Van de Walle
Performance Improvement of H.264/AVC Deblocking Filter by Using Variable Block Sizes

Compared with existing compression technologies, H.264/AVC supports variable block-size motion compensation, multiple reference images, 1/4-pixel motion vector accuracy, and an in-loop deblocking filter. While these coding tools are major contributors to the improved compression rate, they also lead to high complexity. For the H.264 video coding technology to be applied more extensively on low-end, low-bit-rate terminals, it is essential to improve the coding speed. Currently, the deblocking filter, which can noticeably improve the subjective quality of moving pictures, is used on low-end terminals only to a limited extent due to its computational complexity. In this paper, a performance improvement method is proposed for the deblocking filter that efficiently reduces the blocking artifacts occurring during the compression of low-bit-rate digital motion pictures. Blocking artifacts are grid-like patterns that appear at block boundaries due to DCT and quantization. In the proposed method, the image's spatial correlation characteristics are extracted using the variable block information of motion compensation; the filtering is divided into 4 modes according to these characteristics, and adaptive filtering is executed in the divided regions. The proposed deblocking method reduces blocking artifacts, prevents excessive blurring, and improves performance by about 40% compared with the existing method.

Seung-Ho Shin, Duk-Won Oh, Young-Joon Chai, Tae-Yong Kim
Real-Time Detection of the Triangular and Rectangular Shape Road Signs

Road sign recognition systems are developed to assist drivers and to help increase traffic safety. Shape detectors constitute the front-end in the majority of such systems. In this paper we propose a method for robust detection of triangular, rectangular and rhombus-shaped road signs in real traffic scenes. It starts with segmentation of colour images; for this purpose, histograms were created from hundreds of real warning and information signs. Then characteristic points are detected by means of the developed symmetrical detector of local binary features. The points are further clustered and used to select shapes from the input images. Finally, the shapes are verified to fulfil the geometrical properties defined for road signs. The proposed detector shows high accuracy and very fast operation, as verified experimentally.

Bogusław Cyganek
High-Resolution Multi-sprite Generation for Background Sprite Coding

In this paper, we consider high-resolution multi-sprite generation and its application to background sprite coding. Firstly, we propose an approach to partitioning a video sequence into multiple background sprites and selecting an optimal reference frame for each sprite range. This approach groups images that cover a similar scene into the same sprite range. We then propose an iterative regularized technique for constructing a high-resolution sprite in each sprite range. This technique determines the regularization parameter automatically and produces sprite images with high visual quality. Due to the advantages of high-resolution multi-sprites, a high-resolution sprite coding method is also presented and it achieves high coding efficiency.

Getian Ye
Motion Information Exploitation in H.264 Frame Skipping Transcoding

This paper proposes an adaptive motion mode selection method for H.264 frame skipping transcoding. In order to reduce the high complexity arising from the variable block sizes in H.264, the proposed method exploits the original motion information from the incoming bitstreams. In addition, the paper adopts the Forward Dominant Vector Selection approach for motion vector composition in H.264 transcoding, comparing it with the Bilinear Interpolation method. The simulation results show that the proposed method achieves a good trade-off between computational complexity and video quality.

Qiang Li, Xiaodong Liu, Qionghai Dai
Joint Domain-Range Modeling of Dynamic Scenes with Adaptive Kernel Bandwidth

The first step in various computer vision applications is the detection of moving objects. The prevalent pixel-wise models regard image pixels as independent random processes and do not take into account the correlation between neighboring pixels. By using a nonparametric density estimation method over a joint domain-range representation of image pixels, this correlation can be exploited to achieve high levels of detection accuracy in the presence of dynamic backgrounds. This work improves a recently proposed joint domain-range model for background subtraction, which assumes a constant kernel bandwidth. The improvement is obtained by adapting the kernel bandwidth to the local image structure. This approach suppresses the structural artifacts present in detection results when kernel density estimation with a constant bandwidth is used; consequently, a more accurate detection of moving objects can be achieved.

Borislav Antić, Vladimir Crnojević
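
A single-pixel sketch of the underlying test: a kernel density estimate of the background intensity with a per-sample bandwidth, which is the quantity the paper adapts to the local image structure. The history, bandwidths and threshold are invented for illustration; the full model works on the joint domain-range (position plus intensity) representation.

    import numpy as np

    def kde_is_foreground(samples, pixel, bandwidths, thresh=1e-3):
        """Declare the pixel foreground when its estimated background
        density under a Gaussian KDE falls below the threshold."""
        d = (pixel - samples) / bandwidths
        density = np.mean(np.exp(-0.5 * d ** 2) / (bandwidths * np.sqrt(2 * np.pi)))
        return density < thresh

    history = np.random.normal(100, 3, 50)    # hypothetical pixel history
    bw = np.full(50, 3.0)
    bw[:10] = 6.0                             # locally adapted bandwidths
    is_fg = kde_is_foreground(history, pixel=140.0, bandwidths=bw)
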
Competition Based Prediction for Skip Mode Motion Vector Using Macroblock Classification for the H.264 JM KTA Software

H.264/MPEG-4 AVC achieves a higher compression gain than its predecessors H.263 and MPEG-4 Part 2. This gain partly results from improved motion compensation tools, especially the variable block size, the 1/4-pel motion accuracy and the access to multiple reference frames. A particular mode among all Inter modes is the Skip mode, for which no information is transmitted except the signaling of the mode itself. In our previous work we proposed a competing framework for better motion vector prediction and coding, also including the Skip mode. This proposal has recently been adopted by the Video Coding Experts Group (VCEG) in the Key Technical Area (KTA) software of H.264, which is the starting point for future ITU standardization activities. In this paper we propose an extension of this method based on the adaptation of two families of predictors for the Skip mode according to the video content and to statistical criteria. A systematic gain over the previous method is reported, with an average of 8.2% of bits saved compared to the H.264 standard.

Guillaume Laroche, Joel Jung, Béatrice Pesquet-Popescu
Efficiency of Closed and Open-Loop Scalable Wavelet Based Video Coding

Video compression techniques can be classified into scalable and non-scalable. Scalable coding is more suitable in variable-bandwidth scenarios because it improves the quality of the reconstructed video. On the other hand, scalability has a cost in terms of coding efficiency and complexity. This paper describes a JPEG2000-and-MCTF-based fully scalable video codec (FSVC) and analyzes a set of experiments that measure the cost of scalability by comparing two different FSVC encoders: open-loop FSVC and closed-loop FSVC. In the open-loop version of FSVC, the encoder uses the original images to make the predictions. The closed-loop scheme generates the predictions with reference images identical to those obtained by the decoder at a given bitrate. Numerical and visual results show only a small loss of coding efficiency for the open-loop scheme. Moreover, the closed loop increases the complexity of the encoder and performs poorly at high bitrates.

Manuel F. López, Vicente Gonzalez Ruiz, Inmaculada García
Spatio-temporal Information-Based Simple Deinterlacing Algorithm

In this paper, we propose a new computationally efficient fuzzy rule-based line doubling algorithm that provides effective visual performance. In the proposed scheme, a spatio-temporal mode selector and fuzzy rule-based, correlation-dependent interpolation techniques are applied to the 2-D input signal. The basic idea is to dynamically classify the field into background and foreground areas. The proposed method interpolates missing pixels using temporal information in the background area, and then interpolates the remaining pixels in the foreground area using spatial information and a fuzzy rule.

Gwanggil Jeon, Fang Yong, Joohyun Lee, Rokkyu Lee, Jechang Jeong
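
A heavily simplified sketch of the switching idea, without the fuzzy-rule interpolation weights: for every missing line, pixels classified as static are filled temporally from the previous frame, moving pixels spatially from the lines above and below. All inputs and the motion threshold are assumptions.

    import numpy as np

    def deinterlace(frame_prev, field, motion_thresh=10.0):
        """Toy spatio-temporal line doubler; 'field' holds the even lines,
        'frame_prev' is the previous full frame (hypothetical inputs)."""
        h, w = field.shape
        out = np.zeros((2 * h, w))
        out[0::2] = field
        for i in range(h - 1):                   # fill the missing odd lines
            spatial = 0.5 * (field[i] + field[i + 1])
            temporal = frame_prev[2 * i + 1]
            static = np.abs(field[i] - frame_prev[2 * i]) < motion_thresh
            out[2 * i + 1] = np.where(static, temporal, spatial)
        out[-1] = field[-1]                      # repeat the last line
        return out

    frame_prev = np.random.rand(64, 48) * 255
    field = frame_prev[0::2] + np.random.normal(0, 2, (32, 48))
    frame = deinterlace(frame_prev, field)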

Image Interpretation

Fast Adaptive Graph-Cuts Based Stereo Matching

Stereo vision is one of the central research problems in computer vision. The most difficult and important issue in this area is the stereo matching process. One technique that performs this process is the Graph-Cuts based algorithm, which provides accurate results. Nevertheless, this approach is too slow for practical use due to the redundant computations it involves. In this work, an Adaptive Graph-Cuts based algorithm is implemented. The key idea is to subdivide the image into several regions using quadtrees and then define a global energy function that adapts itself to each of these subregions. Results show that the proposed algorithm is 3 times faster than the original Graph-Cuts algorithm while keeping the same quality of results.

Michel Sarkis, Nikolas Dörfler, Klaus Diepold
A Fast Level-Set Method for Accurate Tracking of Articulated Objects with an Edge-Based Binary Speed Term

This paper presents a novel binary speed term for tracking objects with the help of active contours. The speed, which can be 0 or 1, is determined by local nonlinear filters, and not by the strength of the gradient as is common for active contours. The speed has been designed to match the nature of a recent fast level-set evolution algorithm. The resulting active contour method is used to track objects for which probability distributions of pixel intensities for the background and for the object cannot be reliably estimated.

Cristina Darolti, Alfred Mertins, Ulrich G. Hofmann
Real-Time Vanishing Point Estimation in Road Sequences Using Adaptive Steerable Filter Banks

This paper presents an innovative road modeling strategy for video-based driver assistance systems. It is based on the real-time estimation of the vanishing point of sequences captured with forward-looking cameras located near the rear-view mirror of a vehicle. The vanishing point is used for many purposes in video-based driver assistance systems, such as computing linear models of the road, extracting calibration parameters of the camera, stabilizing sequences, etc. In this work, a novel strategy for vanishing point estimation is presented. It is based on the use of an adaptive steerable filter bank which enhances lane markings according to their expected orientations. Very accurate results are obtained in the computation of the vanishing point for several types of sequences, including overtaking traffic, changing illumination conditions, paintings on the road, etc.

Marcos Nieto, Luis Salgado
Self-Eigenroughness Selection for Texture Recognition Using Genetic Algorithms

To test the effectiveness for texture recognition of Self-Eigenroughness, which is derived by performing principal component analysis (PCA) on each texture roughness individually, with respect to Eigenroughness, which is derived by performing PCA on all texture roughnesses, we present a novel fitness function with an adaptive threshold to evaluate the performance of each subset of genetically selected eigenvectors. Comparative studies suggest that the former is superior to the latter in terms of recognition accuracy and computational efficiency.

Jing-Wein Wang
Analysis of Image Sequences for Defect Detection in Composite Materials

The problem of inspecting composite materials to detect internal defects arises in many industrial contexts, both for quality control on production lines and for maintenance operations during in-service inspections. The analysis of internal defects (not detectable by visual inspection) is a difficult task unless invasive techniques are applied. For this reason, in recent years there has been increasing interest in the development of low-cost, non-destructive inspection techniques that can be applied during normal routine tests without damaging the materials, supported by automatic analysis tools. In this paper we address the problem of developing an automatic signal processing system that analyzes the time/space variations in a sequence of thermographic images and allows the identification of internal defects in composite materials that otherwise could not be detected. First, a preprocessing technique is applied to the time/space signals to extract significant information; then an unsupervised classifier is used to extract uniform classes that characterize a range of internal defects. The experimental results demonstrate the ability of the method to recognize different regions containing several types of defects.

T. D’Orazio, M. Leo, C. Guaragnella, A. Distante
Remote Sensing Imagery and Signature Fields Reconstruction Via Aggregation of Robust Regularization with Neural Computing

A robust numerical technique for high-resolution reconstructive imaging and scene analysis is developed, as required for enhanced remote sensing with large-scale sensor array radar/synthetic aperture radar. First, a problem-oriented modification of the previously proposed fused Bayesian-regularization (FBR) enhanced radar imaging method is performed to enable it to reconstruct remote sensing signatures (RSS) of interest while alleviating the ill-posedness of the problem due to system-level and model-level uncertainties. Second, a modification of the Hopfield-type maximum entropy neural network (NN) is proposed that enables such a NN to perform the robust adaptive FBR technique numerically via efficient NN computing. Finally, we report simulation results of hydrological RSS reconstruction from enhanced real-world environmental images, indicative of the efficiency of the developed method.

Yuriy Shkvarko, Ivan Villalon-Turrubiates
A New Technique for Global and Local Skew Correction in Binary Documents

A new technique for global and local skew correction in binary documents is proposed. The technique performs a connected component analysis, and for each connected component the document's local skew angle is estimated by detecting a sequence of other consecutive connected components, at certain directions, within a specified neighborhood. A histogram of all local skew angles is constructed. If the histogram has one peak, global skew correction is performed; otherwise the document has more than one skew. For local skew correction, a page layout analysis is performed based on a boundary growth algorithm at different directions. The exact global or local skew is finally estimated with a least squares line fitting procedure. The accuracy of the technique has been tested on many documents with different skews, and it is compared with two other similar techniques.

Michael Makridis, Nikos Nikolaou, Nikos Papamarkos
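
The final refinement step, least-squares line fitting to obtain the skew angle, can be sketched as follows; the centroids stand in for the detected sequence of consecutive connected components.

    import numpy as np

    def skew_from_centroids(centroids):
        """Fit a line through component centroids (N x 2 array of x, y) and
        return the skew angle, in degrees, from the fitted slope."""
        slope, _ = np.polyfit(centroids[:, 0], centroids[:, 1], deg=1)
        return np.degrees(np.arctan(slope))

    # Hypothetical centroids along a text line skewed by about 3 degrees.
    x = np.arange(20, dtype=float)
    y = np.tan(np.radians(3.0)) * x + np.random.normal(0, 0.05, 20)
    angle = skew_from_centroids(np.stack([x, y], axis=1))   # close to 3.0
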
System for Estimation of Pin Bone Positions in Pre-rigor Salmon

Current systems for automatic processing of salmon are not able to remove all bones from freshly slaughtered salmon. This is because some of the bones are attached to the flesh by tendons, and the fillet is damaged or the bones broken if the bones are pulled out. This paper describes a camera-based system for determining the tendon positions in the tissue, so that the tendon can be cut with a knife and the bones removed. The location of the tendons deep in the tissue is estimated based on the position of a texture pattern on the fillet surface. Algorithms for locating this line-like pattern, in the presence of several other similar-looking lines and significant other texture, are described. The algorithm uses a model of the pattern's location to achieve precision and speed, followed by a RANSAC/MLESAC-inspired line fitting procedure. Close to the neck the pattern is barely visible; this is handled through a greedy search algorithm. We achieve a precision better than 3 mm for 78% of the fish using at most 2 seconds of processing time.

Jens T Thielemann, Trine Kirkhus, Tom Kavli, Henrik Schumann-Olsen, Oddmund Haugland, Harry Westavik
Vertebral Mobility Analysis Using Anterior Faces Detection

In this article, we are interested in X-ray images of the spinal column in various positions. The purpose of this work is to extract parameters determining vertebral mobility and its variation during flexion-extension movements. A modified Discrete Dynamic Contour Model (DDCM) using the Canny edge detector was the starting point for our segmentation algorithm. To address the lack of convergence due to open contours, we have elaborated a heuristic method appropriate to our application area. Results on real images corresponding to the cervical spinal column, and their comparison with manual measurements, are presented to demonstrate and validate the proposed technique.

M. Benjelloun, G. Rico, S. Mahmoudi, R. Prévot
Image Processing Algorithms for an Auto Focus System for Slit Lamp Microscopy

The slit lamp microscope is the most popular ophthalmologic instrument, comprising a microscope with a light source attached to it. The coupling of microscope and light source distinguishes it from other optical devices. In this paper an Auto Focus system is proposed that takes this mechanical coupling into account and compensates for movements of the patient. It tracks the patient's eye during the focusing process and applies a robust contrast-measurement algorithm to an area relative to it. The proposed method proved to be very accurate, reliable and stable, even when starting from very defocused positions.

Christian Gierl, T. Kondo, H. Voos, W. Kongprawechon, S. Phoojaruenchanachai
Applying Image Analysis and Probabilistic Techniques for Counting Olive Trees in High-Resolution Satellite Images

This paper proposes a method that integrates image analysis and probabilistic techniques for counting olive trees in high-resolution satellite images. Counting trees is significant for surveying and inventorying forests, and in certain cases is relevant for estimating the production of plantations, as is the case for olive tree fields. The method presented in this paper exploits the particular characteristics of parcels, i.e. a certain reticular layout and a similar appearance of the trees, to yield a probabilistic measure that captures the confidence of each spot in the image being an olive tree. Promising experimental results have been obtained on satellite images taken from QuickBird.

J. González, C. Galindo, V. Arevalo, G. Ambrosio
An Efficient Closed-Form Solution to Probabilistic 6D Visual Odometry for a Stereo Camera

Estimating the ego-motion of a mobile robot has traditionally been achieved by means of encoder-based odometry. However, this method presents several drawbacks, such as accumulative drift, sensitivity to slippage, and its limitation to planar environments. In this work we present an alternative method for estimating the incremental change in the robot pose from images taken by a stereo camera. In contrast to most previous approaches to 6D visual odometry, which are based on iterative, approximate methods, we propose to employ an optimal closed-form formulation which is more accurate and efficient and does not exhibit convergence problems. We also derive the expression for the covariance associated with this estimate, which enables the integration of our approach into vision-based SLAM frameworks. Additionally, our proposal combines highly distinctive SIFT descriptors with the fast KLT feature tracker, thus achieving robust and efficient execution in real time. To validate our research we provide experimental results for a real robot.

F. A. Moreno, J. L. Blanco, J. González
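
The abstract does not name the estimator, but a classical closed-form solution for this kind of problem is the SVD-based least-squares alignment of matched 3D point sets (the Arun/Horn method); the sketch below shows that formulation under the assumption of known correspondences, e.g. points triangulated from tracked stereo features. Whether the paper uses exactly this variant is not stated.

    import numpy as np

    def rigid_transform_3d(P, Q):
        """Closed-form least-squares (R, t) with Q ~ R @ P + t, from the SVD
        of the cross-covariance of the centered 3xN point sets P and Q."""
        cp, cq = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
        U, _, Vt = np.linalg.svd((P - cp) @ (Q - cq).T)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
        R = Vt.T @ D @ U.T
        return R, cq - R @ cp

    P = np.random.rand(3, 30)
    R_true = np.array([[0.0, -1.0, 0], [1.0, 0, 0], [0, 0, 1.0]])
    Q = R_true @ P + np.array([[0.1], [0.2], [0.3]])
    R, t = rigid_transform_3d(P, Q)   # recovers the true rotation/translation
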
Color Image Segmentation Based on Type-2 Fuzzy Sets and Region Merging

This paper focuses on the application of type-2 fuzzy sets (FS2) to color image segmentation. The proposed approach is based on the application of FS2 entropy and on region merging. Both local and global information of the image are employed, and FS2 makes it possible to take into account the total uncertainty inherent in the segmentation operation. In a first stage, fuzzy entropy is utilized as a tool for histogram analysis to find all major homogeneous regions. Then a basic and fast region merging process, based on color similarity and the reduction of small clusters, is carried out to avoid oversegmentation. The experimental results demonstrate that this method is suitable for finding homogeneous regions in natural images, even noisy ones.

Samy Tehami, André Bigand, Olivier Colot

Image Interpretation

ENMIM: Energetic Normalized Mutual Information Model for Online Multiple Object Tracking with Unlearned Motions

In multiple-object tracking, the lack of prior information limits association performance. Furthermore, to improve tracking, dynamic models are needed to determine the settings of the estimation algorithm. In the case of complex motions, the dynamics cannot be learned and the tracking task becomes difficult; that is why online spatio-temporal motion estimation is of crucial importance. In this paper, we propose a new model for online multiple-target tracking: the Energetic Normalized Mutual Information Model (ENMIM). ENMIM combines two algorithms: (i) Quadtree Normalized Mutual Information (QNMI), a recursive partitioning methodology involving region motion extraction; and (ii) an energy minimization approach for data association, adapted to the lack of prior information about motion and based on geometric properties. ENMIM is able to handle typical problems such as large inter-frame displacements, unlearned motions and noisy images with low contrast. Its main advantages are that it is parameterless and that it can handle noisy multi-modal images without any pre-processing step.

Abir El Abed, Séverine Dubuisson, Dominique Béréziat
Geometrical Scene Analysis Using Co-motion Statistics

Deriving the geometrical features of an observed scene is pivotal for better understanding and detection of events in recorded videos. In this paper, methods are presented for the estimation of various geometrical scene characteristics: point correspondences in stereo views, the mirror pole, the light source and the horizon line. The estimation is based on the analysis of dynamic scene properties using co-motion statistics. Various experiments prove the feasibility of our approach.

Zoltán Szlávik, László Havasi, Tamás Szirányi
Cascade of Classifiers for Vehicle Detection

Being aware of other vehicles on the road ahead is key information for driver assistance systems aiming to increase driver safety. This paper addresses this problem, proposing a system to detect vehicles from the images provided by a single camera mounted on a mobile platform. A classifier-based approach is presented, based on the evaluation of a cascade of classifiers (COC) at different scanned image regions. The AdaBoost algorithm is used to learn the COC from training sets. Two proposals are made to reduce the computation needed by the detection scheme: a lazy evaluation of the COC, and the customization of the COC by a wrapping process. The benefits of these two proposals are quantified in terms of the average number of image features required to classify an image region, achieving a 58% reduction on this measure while scarcely penalizing the detection accuracy of the system.

Daniel Ponsa, Antonio López
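
A schematic of cascade evaluation with early rejection, the property that the lazy-evaluation proposal exploits: a window is discarded as soon as one stage score falls below its threshold, so most non-vehicle windows cost only a handful of features. The stage structure and all numbers are invented for illustration.

    import numpy as np

    def evaluate_cascade(stages, features):
        """Each stage is (weak_learners, threshold); a weak learner is a
        (feature_index, split, alpha) decision stump. Reject early when a
        stage score falls below its threshold."""
        for weak_learners, threshold in stages:
            score = sum(alpha * (features[f] > split)
                        for f, split, alpha in weak_learners)
            if score < threshold:
                return False         # rejected: later stages never evaluated
        return True                  # survived all stages: vehicle candidate

    stages = [([(0, 0.5, 1.0), (3, 0.2, 0.7)], 0.8),
              ([(1, 0.4, 1.0), (7, 0.6, 0.5), (2, 0.3, 0.9)], 1.2)]
    is_vehicle = evaluate_cascade(stages, np.random.rand(10))
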
Aerial Moving Target Detection Based on Motion Vector Field Analysis

An efficient automatic detection strategy for aerial moving targets in airborne forward-looking infrared (FLIR) imagery is presented in this paper. Airborne cameras induce a global motion over all objects in the image, which invalidates motion-based segmentation techniques designed for static cameras. To overcome this drawback, previous works compensate for the camera ego-motion. However, this approach depends heavily on the quality of the ego-motion compensation and tends towards over-detection. In this work, the proposed strategy estimates a robust motion vector field, free of erroneous vectors. Motion vectors are classified into different independent moving objects, corresponding to background objects and aerial targets. The aerial targets are directly segmented using their associated motion vectors. This detection strategy has a low computational cost, since no compensation process or motion-based technique needs to be applied. Excellent results have been obtained on real FLIR sequences.

Carlos R. del-Blanco, Fernando Jaureguizar, Luis Salgado, Narciso García

Image Coding

Embedding Linear Transformations in Fractal Image Coding

Many desirable properties make fractals a powerful mathematical model applied in several image processing and pattern recognition tasks: image coding, segmentation, feature extraction and indexing, to cite just some of them. Unfortunately, they are based on a strongly asymmetric scheme and thus suffer from very high coding times. On the other side, linear transforms are quite well time-balanced, allowing them to be usefully integrated into real-time applications, but they do not provide comparable performance with respect to image quality at high bit rates. Owing to their potential for concentrating the original image energy in a few coefficients in the frequency domain, linear transforms have also seen widespread use in side applications such as selecting representative features or defining new image quality measures. In this paper, we investigate different levels of embedding linear transforms in a fractal based coding scheme. The experimental results are organized so as to point out the contribution of each embedding step to the objective quality of the decoded image.

Michele Nappi, Daniel Riccio
Digital Watermarking with PCA Based Reference Images

Principal Component Analysis (PCA) is a valuable technique for reducing the dimensionality of huge datasets. Principal components are linear combinations of the original variables, and the projection of the data onto this linear subspace preserves most of the original characteristics. This helps to find robust characteristics for watermarking applications. Most PCA-based watermarking methods operate in the projection space, i.e. on the eigenimages. In this study, differently from other methods, PCA is used to obtain a reference of the cover image by exploiting the compression property of PCA. PCA and block-PCA based methods are proposed that use some of the principal vectors in the reconstruction. The watermarking is done according to the difference between the original and its reference image. The method is compared with a Discrete Wavelet Transform (DWT) based approach, and its performance against several attacks is discussed.

Erkan Yavuz, Ziya Telatar
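
A minimal sketch of the reference-image construction only: reconstructing the cover image from a few principal vectors yields a low-rank reference, and the difference with the original is the channel in which the watermark would be embedded. The embedding rule itself is not reproduced here; the image and the rank k are assumptions.

    import numpy as np

    def pca_reference(img, k):
        """Low-rank reference of the cover image: project its rows onto the
        top-k principal directions (via SVD of the centered data) and
        reconstruct, exploiting the compression property of PCA."""
        mean = img.mean(axis=0)
        X = img - mean
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        return X @ Vt[:k].T @ Vt[:k] + mean

    img = np.random.rand(64, 64)          # hypothetical cover image
    ref = pca_reference(img, k=8)
    residual = img - ref                  # carrier for the watermark bits
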
JPEG2000 Coding Techniques Addressed to Images Containing No-Data Regions

This work introduces techniques addressed to enhance the coding performance obtained when compressing images that contain areas with irrelevant information, here called no-data regions. No-data regions can be produced by several factors, such as geometric corrections, the overlapping of successive layers of information, a malfunction of the sensor used to capture the image, etc. Most coding systems are not devised to consider such regions separately from the rest of the image, sometimes causing an important loss in coding efficiency. Within the framework of JPEG2000, we propose five techniques that address this issue. Experimental results suggest that the application of the proposed techniques can achieve, in some cases, a compression gain of 130 over a compression without the proposed techniques.

Jorge González-Conejero, Francesc Aulí-Llinàs, Joan Bartrina-Rapesta, Joan Serra-Sagristà
A New Optimum-Word-Length-Assignment (OWLA) Multiplierless Integer DCT for Lossless/Lossy Image Coding and Its Performance Evaluation

Recently, we proposed a multiplierless 1D Int-DCT, improved from our previously proposed Int-DCT by approximating floating-point multiplications with bit-shift and addition operations. The multiplierless 1D Int-DCT performs well in both lossless and lossy coding. However, it did not address how to assign the shortest possible word length to the floating-multiplier approximation in order to reduce hardware complexity. In this paper, we propose a new Optimum-Word-Length-Assignment (OWLA) multiplierless Int-DCT. Apart from its inexpensive hardware complexity, the new OWLA multiplierless 1D Int-DCT achieves high coding performance in both lossless and lossy coding. A lossless/lossy coding criterion is applied to evaluate the coding performance of the proposed Int-DCT in comparison with other Int-DCTs.

Somchart Chokchaitam, Masahiro Iwahashi
On Hybrid Directional Transform-Based Intra-band Image Coding

In this paper, we propose a generic hybrid oriented-transform and wavelet-based image representation for intra-band image coding. We instantiate it for three popular directional transforms having similar powers of approximation but different redundancy factors. For each transform type, we design a compression scheme that exploits intra-band coefficient dependencies. We show that our schemes outperform alternative approaches reported in the literature. Moreover, on some images, two of the proposed codec schemes outperform JPEG2000 by over 1 dB. Finally, we investigate the trade-off between oversampling and sparsity and show that, at low rates, hybrid coding schemes with transform redundancy factors as high as 1.25 to 5.8 are in fact capable of outperforming JPEG2000 and its critically-sampled wavelets.

Alin Alecu, Adrian Munteanu, Aleksandra Pižurica, Jan Cornelis, Peter Schelkens
Analysis of the Statistical Dependencies in the Curvelet Domain and Applications in Image Compression

This paper reports an information-theoretic analysis of the dependencies that exist between curvelet coefficients. We show that strong dependencies exist in local intra-band micro-neighborhoods, and that the shape of these neighborhoods is highly anisotropic. In this respect, it is found that the two immediately adjacent neighbors lying in the direction orthogonal to the orientation of the subband convey the most information about a coefficient; taking into account a larger local neighborhood brings only mild gains in the intra-band mutual information estimates. Furthermore, we point out that linear predictors do not represent sufficient statistics if applied to the entire intra-band neighborhood of a coefficient. We conclude that intra-band dependencies are clearly the strongest, followed by their inter-orientation and inter-scale counterparts; accordingly, the more complex intra-band/inter-scale or intra-band/inter-orientation models bring only mild improvements over intra-band models. Finally, we exploit the coefficient dependencies in a curvelet-based image coding application and show that the scheme is comparable to, and in some cases even outperforms, JPEG2000.

Alin Alecu, Adrian Munteanu, Aleksandra Pižurica, Jan Cornelis, Peter Schelkens
A Novel Image Compression Method Using Watermarking Technique in JPEG Coding Process

Watermarking is a technique used to embed copyright information in an image. In this paper, we propose a novel image compression method which embeds a part of the coding parameters, instead of copyright information, into the image itself. The proposed method is adapted to the JPEG coding process: the DC coefficients of the DCT are embedded into low-to-middle frequency AC coefficients. Therefore, the DC coefficients need not be transmitted separately, so less data is needed for encoding. On the decoder side, the DC coefficient data embedded in the AC coefficients is first extracted; the DC and AC coefficients together then allow the reconstruction of the image. Experiments on the relation between data compression ratio and PSNR, using the quantization scale factor as a parameter, are carried out. The experimental results show that the proposed method achieves a 3.65% reduction in the quantity of image data compared with the standard JPEG method, while maintaining nearly the same image quality.

Hideo Kuroda, Shinichi Miyata, Makoto Fujimura, Hiroki Imamura
Improved Algorithm of Error-Resilient Entropy Coding Using State Information

This paper proposes an improved algorithm for error-resilient entropy coding (EREC) to limit error propagation (EP) in variable-length-coded (VLC) bit streams. The main novelties are twofold. First, after each stage of the EREC encoding process, the resulting states of all slots and blocks are conveyed as side information and used at the decoder to remove the EP caused by erroneous blocks/slots that have been fully placed/filled. Second, an alternate placement (AP) technique is proposed to alleviate the EP caused by erroneous blocks/slots that are still only partially placed/filled. An in-depth analysis shows that fewer than three bits per block are required to convey the state information. Experiments are conducted, and the results show that the proposed method improves recovery quality significantly.

Yong Fang, Gwanggil Jeon, Jechang Jeong, Chengke Wu, Yangli Wang
Backmatter
Metadata
Title
Advanced Concepts for Intelligent Vision Systems
edited by
Jacques Blanc-Talon
Wilfried Philips
Dan Popescu
Paul Scheunders
Copyright year
2007
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-74607-2
Print ISBN
978-3-540-74606-5
DOI
https://doi.org/10.1007/978-3-540-74607-2