
2006 | Book

Computer Vision, Graphics and Image Processing

5th Indian Conference, ICVGIP 2006, Madurai, India, December 13-16, 2006. Proceedings

Edited by: Prem K. Kalra, Shmuel Peleg

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

About this book

The Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP) is a forum bringing together researchers and practitioners in these related areas, from national and international academic institutes, government research and development laboratories, and industry. ICVGIP has been held biennially since its inception in 1998, attracting more participants every edition, including international participants. The proceedings of ICVGIP 2006, published in Springer's series Lecture Notes in Computer Science, comprise 85 papers selected for presentation from the 284 papers submitted from all over the world. Twenty-nine papers were oral presentations, and 56 papers were presented as posters. For the first time in ICVGIP, the review process was double-blind, as is common in the major international conferences. Each submitted paper was assigned to at least three reviewers with expertise in the relevant area. Selecting so few papers was difficult, as many other deserving submissions could not be accommodated.

Table of contents

Frontmatter

Image Restoration and Super-Resolution

Edge Model Based High Resolution Image Generation

The present paper proposes a new method for high-resolution image generation from a single image. Generation of high-resolution (HR) images from lower-resolution image(s) is achieved either by reconstruction-based methods or by learning-based methods. Reconstruction-based methods use multiple images of the same scene to gather the extra information needed for the HR image. Learning-based methods rely on learning the characteristics of a specific image set to inject the extra information for HR generation. The proposed method is a variation of the latter strategy. It uses a generative model for sharp edges in images as well as descriptive models for edge representation. This prior information is injected using the Symmetric Residue Pyramid scheme. The advantages of this scheme are that it generates sharp edges with no ringing artefacts in the HR image and that the models are universal enough to allow use on a wide variety of images without requiring training and/or adaptation. Results have been generated and compared to actual high-resolution images.

Index terms: Super-Resolution, edge modelling, Laplacian pyramids.

Malay Kumar Nema, Subrata Rakshit, Subhasis Chaudhuri
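
The pyramid machinery the abstract refers to can be illustrated with a minimal Laplacian pyramid in Python. This is the standard construction only, not the paper's Symmetric Residue Pyramid; the `blur` filter and level count are illustrative choices:

```python
import numpy as np

def blur(img):
    # Separable 1-2-1 binomial smoothing with edge replication.
    k = np.array([0.25, 0.5, 0.25])
    p = np.pad(img, 1, mode="edge")
    img = k[0] * p[:-2, 1:-1] + k[1] * p[1:-1, 1:-1] + k[2] * p[2:, 1:-1]
    p = np.pad(img, 1, mode="edge")
    return k[0] * p[1:-1, :-2] + k[1] * p[1:-1, 1:-1] + k[2] * p[1:-1, 2:]

def laplacian_pyramid(img, levels=3):
    # Each Laplacian level stores the detail lost by blur + downsample;
    # a super-resolution scheme synthesizes a missing fine level on top.
    pyr = []
    for _ in range(levels):
        low = blur(img)[::2, ::2]
        up = np.repeat(np.repeat(low, 2, 0), 2, 1)[:img.shape[0], :img.shape[1]]
        pyr.append(img - blur(up))   # band-pass residue at this scale
        img = low
    pyr.append(img)                  # coarsest low-pass level
    return pyr

img = np.arange(64, dtype=float).reshape(8, 8)
pyr = laplacian_pyramid(img, levels=2)
```

The edge models in the paper would then predict the contents of the finest residue level rather than measure it.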
Greyscale Photograph Geometry Informed by Dodging and Burning

Photographs are often used as input to image processing and computer vision tasks. Prints from the same negative may vary in intensity values due, in part, to the liberal use of dodging and burning in photography. Measurements which are invariant to these transformations can be used to extract information from photographs which is not sensitive to certain alterations in the development process. These measurements are explored through the construction of a differential geometry which is itself invariant to linear dodging and burning.

Carlos Phillips, Kaleem Siddiqi
A Discontinuity Adaptive Method for Super-Resolution of License Plates

In this paper, a super-resolution algorithm tailored to enhance license plate numbers of moving vehicles in real traffic videos is proposed. The algorithm uses the information available from multiple, sub-pixel shifted, and noisy low-resolution observations to reconstruct a high-resolution image of the number plate. The image to be super-resolved is modeled as a Markov random field and is estimated from the low-resolution observations by a graduated non-convexity optimization procedure. To preserve edges in the reconstructed number plate for better readability, a discontinuity adaptive regularizer is proposed. Experimental results are given on several real traffic sequences to demonstrate the edge preserving capability of the proposed method and its robustness to potential errors in motion and blur estimates. The method is computationally efficient as all operations are implemented locally in the image domain.

K. V. Suresh, A. N. Rajagopalan
Explicit Nonflat Time Evolution for PDE-Based Image Restoration

This article is concerned with new strategies by which explicit time-stepping procedures for PDE-based restoration models converge with an efficiency similar to implicit algorithms. Conventional explicit algorithms often require hundreds of iterations to converge. To overcome this difficulty and to further improve image quality, the article introduces a spatially variable constraint term and timestep size, as a method of nonflat time evolution (MONTE). It has been verified that the explicit time-stepping scheme incorporating MONTE converges in only 4-15 iterations for all restoration examples we have tested. It has proved more effective than the additive operator splitting (AOS) method in both computation time and image quality (measured in PSNR) for most cases. Since the explicit MONTE procedure is memory-efficient, requiring only about twice the image size, it can be applied to huge data sets with great efficiency.

Seongjai Kim, Song-Hwa Kwon
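
For reference, a minimal explicit diffusion step of the kind such schemes accelerate can be sketched as below. This is a textbook Perona-Malik update with a scalar timestep, not the MONTE method itself; the paper's contribution is precisely to make the timestep and constraint term spatially variable:

```python
import numpy as np

def diffusion_step(u, dt=0.2, kappa=10.0):
    # One explicit Perona-Malik update: neighbour differences, an
    # edge-stopping conductance g = 1/(1 + (d/kappa)^2), and a scalar
    # timestep dt (MONTE-style spatial variation is omitted here).
    gn = np.roll(u, -1, axis=0) - u
    gs = np.roll(u, 1, axis=0) - u
    ge = np.roll(u, -1, axis=1) - u
    gw = np.roll(u, 1, axis=1) - u
    g = lambda d: 1.0 / (1.0 + (d / kappa) ** 2)
    return u + dt * (g(gn) * gn + g(gs) * gs + g(ge) * ge + g(gw) * gw)

noisy = np.random.default_rng(0).normal(0.0, 1.0, (32, 32)) + 10.0
u = noisy.copy()
for _ in range(10):        # a handful of explicit iterations
    u = diffusion_step(u)
```

Each iteration smooths the image while the conductance suppresses smoothing across strong gradients; hundreds of such steps are typically needed without acceleration, which motivates the paper's nonflat time evolution.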
Decimation Estimation and Super-Resolution Using Zoomed Observations

We propose a technique for super-resolving an image from several observations taken at different camera zooms. From this set of images, a super-resolved image of the entire scene (least zoomed) is obtained at the resolution of the most zoomed one. We model the super-resolution image as a Markov Random Field (MRF). The cost function is derived using a maximum a posteriori (MAP) estimation method and is optimized using a gradient descent technique. The novelty of our approach is that the decimation (aliasing) matrix is obtained from the given observations themselves. Results are illustrated with real data captured using a zoom camera. An application of our technique to multiresolution fusion in remotely sensed images is also shown.

Prakash P. Gajjar, Manjunath V. Joshi, Asim Banerjee, Suman Mitra

Segmentation and Classification

Description of Interest Regions with Center-Symmetric Local Binary Patterns

Local feature detection and description have gained a lot of interest in recent years since photometric descriptors computed for interest regions have proven to be very successful in many applications. In this paper, we propose a novel interest region descriptor which combines the strengths of the well-known SIFT descriptor and the LBP texture operator. It is called the center-symmetric local binary pattern (CS-LBP) descriptor. This new descriptor has several advantages such as tolerance to illumination changes, robustness on flat image areas, and computational efficiency. We evaluate our descriptor using a recently presented test protocol. Experimental results show that the CS-LBP descriptor outperforms the SIFT descriptor for most of the test cases, especially for images with severe illumination variations.

Marko Heikkilä, Matti Pietikäinen, Cordelia Schmid
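
The core CS-LBP idea (comparing opposing neighbour pairs instead of each neighbour against the centre) can be sketched in a few lines. This is a minimal illustration on the plain 8-neighbourhood, with one plausible ordering of the four pairs; the actual descriptor samples on a circle and accumulates region histograms:

```python
import numpy as np

def cs_lbp(img, threshold=0.01):
    # Center-symmetric LBP: compare the four opposing pixel pairs of
    # the 8-neighbourhood, giving a 4-bit code (16 bins) instead of
    # classic LBP's 256. The centre pixel itself is never compared,
    # which contributes to robustness on flat areas.
    pairs = [
        (img[:-2, 1:-1], img[2:, 1:-1]),   # vertical pair
        (img[:-2, 2:],   img[2:, :-2]),    # one diagonal pair
        (img[1:-1, 2:],  img[1:-1, :-2]),  # horizontal pair
        (img[2:, 2:],    img[:-2, :-2]),   # other diagonal pair
    ]
    code = np.zeros((img.shape[0] - 2, img.shape[1] - 2), dtype=np.uint8)
    for bit, (a, b) in enumerate(pairs):
        code |= ((a - b) > threshold).astype(np.uint8) << bit
    return code

img = np.random.default_rng(1).random((16, 16))
codes = cs_lbp(img)
hist = np.bincount(codes.ravel(), minlength=16)  # 16-bin local histogram
```

The compactness (16 bins versus 256) is what makes the descriptor cheap to histogram over SIFT-style spatial grids.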
An Automatic Image Segmentation Technique Based on Pseudo-convex Hull

This paper describes a novel method for segmenting images that contain a dominant object. The method is applicable to a large class of images, including noisy and poor-quality images. It is fully automatic and has low computational cost. The proposed segmentation technique may not produce optimal results in some cases, but it gives reasonably good results for almost all images of a large class. Hence, the method is very useful for applications where segmentation accuracy is not critical, e.g., global shape feature extraction, second-generation coding, etc.

Sanjoy Kumar Saha, Amit Kumar Das, Bhabatosh Chanda
Single-Histogram Class Models for Image Segmentation

Histograms of visual words (or textons) have proved effective in tasks such as image classification and object class recognition. A common approach is to represent an object class by a set of histograms, each one corresponding to a training exemplar. Classification is then achieved by k-nearest neighbour search over the exemplars.

In this paper we introduce two novelties on this approach: (i) we show that new compact single-histogram models estimated optimally from the entire training set achieve an equal or superior classification accuracy. The benefit of the single histograms is that they are much more efficient both in terms of memory and computational resources; and (ii) we show that bag-of-visual-words histograms can provide an accurate pixel-wise segmentation of an image into object class regions. In this manner the compact models of visual object classes give simultaneous segmentation and recognition of image regions.

The approach is evaluated on the MSRC database [5] and it is shown that performance equals or is superior to previous publications on this database.

F. Schroff, A. Criminisi, A. Zisserman
Learning Class-Specific Edges for Object Detection and Segmentation

Recent research into recognizing object classes (such as humans, cows and hands) has made use of edge features to hypothesize and localize class instances. However, for the most part, these edge-based methods operate solely on the geometric shape of edges, treating them equally and ignoring the fact that for certain object classes, the appearance of the object on the “inside” of the edge may provide valuable recognition cues.

We show how, for such object classes, small regions around edges can be used to classify the edge into object or non-object. This classifier may then be used to prune edges which are not relevant to the object class, and thereby improve the performance of subsequent processing. We demonstrate learning class specific edges for a number of object classes — oranges, bananas and bottles — under challenging scale and illumination variation.

Because class-specific edge classification provides a low-level analysis of the image it may be integrated into any edge-based recognition strategy without significant change in the high-level algorithms. We illustrate its application to two algorithms: (i) chamfer matching for object detection, and (ii) modulating contrast terms in MRF based object-specific segmentation. We show that performance of both algorithms (matching and segmentation) is considerably improved by the class-specific edge labelling.

Mukta Prasad, Andrew Zisserman, Andrew Fitzgibbon, M. Pawan Kumar, P. H. S. Torr
Nonparametric Neural Network Model Based on Rough-Fuzzy Membership Function for Classification of Remotely Sensed Images

A nonparametric neural network model based on a rough-fuzzy membership function, a multilayer perceptron, and the back-propagation algorithm is described. The model is capable of dealing with both rough uncertainty and fuzzy uncertainty associated with the classification of remotely sensed multi-spectral images. The input vector consists of membership values for linguistic properties, while the output vector is defined in terms of rough-fuzzy class membership values. This allows efficient modeling of indiscernibility and fuzziness between patterns, with appropriate weights assigned to the back-propagated errors depending on the rough-fuzzy membership values at the corresponding outputs. The effectiveness of the model is demonstrated on a classification problem of IRS-P6 LISS IV images of the Allahabad area. The results are compared with statistical (minimum distance), conventional MLP, and FMLP models.

Niraj Kumar, Anupam Agrawal
Aggregation Pheromone Density Based Image Segmentation

Ants, bees and other social insects deposit pheromone (a type of chemical) in order to communicate between the members of their community. Pheromone that causes clumping or clustering behavior in a species and brings individuals into closer proximity is called aggregation pheromone. This paper presents a novel method for image segmentation based on the aggregation behavior of ants. Image segmentation is viewed as a clustering problem which aims to partition a given set of pixels into a number of homogeneous clusters/segments. An ant is placed at each data point representing a pixel, and the ants are allowed to move in the search space to find points with higher pheromone density. The movement of an ant is governed by the amount of pheromone deposited at different points of the search space. The more pheromone deposited, the stronger the aggregation of ants. This leads to the formation of homogeneous groups of data. The proposed algorithm is evaluated on a number of images using different cluster validity measures. Results are compared with those obtained using the average linkage and k-means clustering algorithms and are found to be better.

Susmita Ghosh, Megha Kothari, Ashish Ghosh
Remote Sensing Image Classification: A Neuro-fuzzy MCS Approach

The present article proposes a new neuro-fuzzy-fusion (NFF) method for combining the outputs of a set of fuzzy classifiers in a multiple classifier system (MCS) framework. In the proposed method the outputs of a set of classifiers (i.e., fuzzy class labels) are fed as input to a neural network, which performs the fusion task. The proposed fusion technique is tested on a set of remote sensing images and compared with existing techniques. Experimental study revealed the improved classification capability of the NFF-based MCS, as it yielded consistently better results.

B. Uma Shankar, Saroj K. Meher, Ashish Ghosh, Lorenzo Bruzzone
A Hierarchical Approach to Landform Classification of Satellite Images Using a Fusion Strategy

There is an increasing need for effective delineation of meaningfully different landforms due to the decreasing availability of experienced landform interpreters. Any procedure for automating the process of landform segmentation from satellite images offers the promise of improved consistency and reliability. We propose a hierarchical method for classifying a wide variety of landforms. At stage 1, an image is classified as one of three broad categories of terrain in terms of its geomorphology: desertic/rann of kutch, coastal, or fluvial. At stage 2, all the different landforms within the desertic/rann of kutch, coastal, or fluvial areas are identified using suitable processing. At the final stage, all outputs are fused together to obtain the final segmented output. The proposed technique is evaluated on a large number of optical-band satellite images belonging to the aforementioned terrain types.

Aakanksha Gagrani, Lalit Gupta, B. Ravindran, Sukhendu Das, Pinaki Roychowdhury, V. K. Panchal

Image Filtering/Processing

An Improved ‘Gas of Circles’ Higher-Order Active Contour Model and Its Application to Tree Crown Extraction

A central task in image processing is to find the region in the image corresponding to an entity. In a number of problems, the region takes the form of a collection of circles, e.g. tree crowns in remote sensing imagery; cells in biological and medical imagery. In [1], a model of such regions, the ‘gas of circles’ model, was developed based on higher-order active contours, a recently developed framework for the inclusion of prior knowledge in active contour energies. However, the model suffers from a defect. In [1], the model parameters were adjusted so that the circles were local energy minima. Gradient descent can become stuck in these minima, producing phantom circles even with no supporting data. We solve this problem by calculating, via a Taylor expansion of the energy, parameter values that make circles into energy inflection points rather than minima. As a bonus, the constraint halves the number of model parameters, and severely constrains one of the two that remain, a major advantage for an energy-based model. We use the model for tree crown extraction from aerial images. Experiments show that despite the lack of parametric freedom, the new model performs better than the old, and much better than a classical active contour.

Péter Horváth, Ian H. Jermyn, Zoltan Kato, Josiane Zerubia
A New Extension of Kalman Filter to Non-Gaussian Priors

In the Kalman filter, the state dynamics are specified by the state equation while the measurement equation characterizes the likelihood. In this paper, we propose a generalized methodology for specifying state dynamics using the conditional density of a state given its neighbors, without explicitly defining the state equation. In other words, the typically strict linear constraint on the state dynamics imposed by the state equation is relaxed by specifying the conditional density function and using it as the prior in predicting the state. Based on this idea, we propose a sampling-based Kalman filter (KF) for the image estimation problem. The novelty of our approach lies in the fact that we compute the mean and covariance of the (possibly non-Gaussian) prior by importance sampling. This a priori mean and covariance are fed to the update equations of the KF to obtain the a posteriori estimates of the state. We show that the estimates obtained by the proposed strategy are superior to those obtained by the traditional Kalman filter that uses the auto-regressive state model.

G. R. K. S. Subrahmanyam, A. N. Rajagopalan, R. Aravind
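
The two-stage idea (importance sampling for the prior moments, then a standard Kalman update) can be sketched on a toy scalar state. The mixture prior, proposal, and measurement values below are illustrative, not the paper's image model:

```python
import numpy as np

rng = np.random.default_rng(2)

def prior_pdf(x):
    # Hypothetical non-Gaussian prior: an equal two-component mixture.
    g = lambda m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    return 0.5 * g(-1.0, 0.5) + 0.5 * g(1.0, 0.5)

# Importance sampling with a wide Gaussian proposal q = N(0, 3^2).
samples = rng.normal(0.0, 3.0, 5000)
q = np.exp(-0.5 * (samples / 3.0) ** 2) / (3.0 * np.sqrt(2 * np.pi))
w = prior_pdf(samples) / q            # importance weights
w /= w.sum()

m_prior = np.sum(w * samples)                   # a priori mean
P_prior = np.sum(w * (samples - m_prior) ** 2)  # a priori variance

# Standard Kalman update for measurement z = x + v, v ~ N(0, R).
z, R = 0.8, 0.25
K = P_prior / (P_prior + R)                     # Kalman gain
m_post = m_prior + K * (z - m_prior)            # a posteriori mean
P_post = (1.0 - K) * P_prior                    # a posteriori variance
```

Only the prior moments come from sampling; the update itself is the ordinary KF correction, which is the structure the abstract describes.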
A Computational Model for Boundary Detection

Boundary detection in natural images is a fundamental problem in many computer vision tasks. In this paper, we argue that early stages in primary visual cortex provide ample information to address the boundary detection problem. In other words, global visual primitives such as object and region boundaries can be extracted using local features captured by the receptive fields. The anatomy of the visual cortex and psychological evidence are studied to identify some of the important underlying computational principles for the boundary detection task. A scheme for boundary detection based on these principles is developed and presented. Results of testing the scheme on a benchmark set of natural images, with associated human-marked boundaries, show the performance to be quantitatively competitive with existing computer vision approaches.

Gopal Datt Joshi, Jayanthi Sivaswamy
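
As a loose stand-in for the receptive-field features the abstract invokes, a centre-surround (difference-of-Gaussians) filter responds strongly near a luminance edge. The 1-D signal and filter scales below are illustrative only; they are not the paper's actual model:

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

def dog_response(signal, s_center=1.0, s_surround=2.0, radius=6):
    # 1-D centre-surround (difference-of-Gaussians) filter, a crude
    # stand-in for early-vision receptive fields.
    k = gaussian_kernel(s_center, radius) - gaussian_kernel(s_surround, radius)
    return np.convolve(signal, k, mode="same")

step = np.concatenate([np.zeros(32), np.ones(32)])   # a luminance edge at 32
resp = np.abs(dog_response(step))
edge = int(resp[8:56].argmax()) + 8   # peak location, ignoring array borders
```

The response peaks next to the step and vanishes on flat regions, which is the local-to-global cue such boundary schemes aggregate.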
Speckle Reduction in Images with WEAD and WECD

In this paper we discuss speckle reduction in images with the recently proposed Wavelet Embedded Anisotropic Diffusion (WEAD) and Wavelet Embedded Complex Diffusion (WECD). Both methods improve on anisotropic and complex diffusion by adding wavelet-based BayesShrink as a second stage. Both WEAD and WECD produce excellent results when compared with existing speckle reduction filters. The comparative analysis with other methods was mainly done on the basis of the Structural Similarity (SSIM) index and Peak Signal-to-Noise Ratio (PSNR). The visual appearance of the image is also considered.

Jeny Rajan, M. R. Kaimal
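
The BayesShrink stage mentioned above has a simple closed form: the subband threshold is T = σ_n²/σ_x, with the signal deviation σ_x estimated from the noisy coefficients. A minimal sketch, using a synthetic Laplacian-distributed stand-in for a wavelet detail subband rather than an actual wavelet transform:

```python
import numpy as np

def bayes_shrink_threshold(coeffs, noise_sigma):
    # BayesShrink rule: T = sigma_n^2 / sigma_x, where sigma_x is
    # estimated from the subband by moment matching.
    sigma_x2 = max(np.mean(coeffs ** 2) - noise_sigma ** 2, 0.0)
    if sigma_x2 == 0.0:
        return np.abs(coeffs).max()   # subband judged to be all noise
    return noise_sigma ** 2 / np.sqrt(sigma_x2)

def soft_threshold(coeffs, t):
    # Shrink every coefficient toward zero by t.
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

rng = np.random.default_rng(3)
clean = rng.laplace(0.0, 2.0, 10000)           # stand-in wavelet subband
noisy = clean + rng.normal(0.0, 0.5, 10000)    # additive noise, sigma = 0.5
t = bayes_shrink_threshold(noisy, 0.5)
denoised = soft_threshold(noisy, t)
```

In WEAD/WECD this shrinkage is applied to the wavelet coefficients of the diffusion output rather than to raw data.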
Image Filtering in the Compressed Domain

Linear filtering of images is usually performed in the spatial domain using the linear convolution operation. In the case of images stored in the block DCT space, the linear filtering is usually performed on the sub-image obtained by applying an inverse DCT to the block DCT data. However, this results in severe blocking artifacts caused by the boundary conditions of individual blocks as pixel values outside the boundaries of the blocks are assumed to be zeros. To get around this problem, we propose to use the symmetric convolution operation in such a way that the operation becomes equivalent to the linear convolution operation in the spatial domain. This is achieved by operating on larger block sizes in the transform domain. We demonstrate its applications in image sharpening and removal of blocking artifacts directly in the compressed domain.

Jayanta Mukherjee, Sanjit K. Mitra
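
The DCT-domain machinery of the paper is not reproduced here, but the boundary problem it solves is easy to demonstrate in the spatial domain: zero extension at block edges (the naive blockwise assumption) distorts the result, while symmetric extension, the spatial-domain counterpart of symmetric convolution, does not. A minimal 1-D sketch:

```python
import numpy as np

def filter_block(block, kernel, mode):
    # Smooth one block row with either zero or symmetric extension
    # at the block boundary.
    r = len(kernel) // 2
    if mode == "zero":
        p = np.pad(block, r, mode="constant")    # naive blockwise boundary
    else:
        p = np.pad(block, r, mode="symmetric")   # symmetric extension
    return np.convolve(p, kernel, mode="valid")

kernel = np.array([0.25, 0.5, 0.25])   # simple low-pass filter
block = np.full(8, 100.0)              # a flat signal inside one block

zero_out = filter_block(block, kernel, "zero")
sym_out = filter_block(block, kernel, "symmetric")
```

With zero extension the boundary samples of a flat block are darkened (the source of blocking artifacts); symmetric extension leaves them untouched, which is exactly the behavior symmetric convolution achieves directly on the DCT coefficients.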
Significant Pixel Watermarking Using Human Visual System Model in Wavelet Domain

In this paper, we propose a novel algorithm for robust image watermarking by inserting a single copy of the watermark. Usually, robustness is achieved by embedding multiple copies of the watermark. The proposed method locates and watermarks ‘significant pixels’ of the image in the wavelet domain. The amount of distortion at every pixel is kept within the threshold of perception by adopting ideas from the Human Visual System (HVS) model. The robustness of the proposed method was verified under six different attacks. To verify the advantage of selecting the significant pixels over the highest absolute coefficients, simulations were performed for both cases with quantization of pixels as per the HVS model. Simulation results show the advantage of selecting the ‘significant pixels’ for watermarking grayscale as well as color images.

M. Jayalakshmi, S. N. Merchant, U. B. Desai
Early Vision and Image Processing: Evidences Favouring a Dynamic Receptive Field Model

Evidence favouring a dynamic receptive field model of retinal ganglion cells and the cells of the Lateral Geniculate Nucleus (LGN) is presented, based on the perception of some brightness-contrast illusions. Of the different kinds of such stimuli, four, namely the Simultaneous Brightness-Contrast, the White effect, the DeValois and DeValois checkerboard illusion and the Howe stimulus, have been chosen to establish this model. The present approach attempts to carry forward work that views visual perception as a step-by-step information processing task rather than a rule-based Gestalt approach, and provides a new biologically inspired tool for simultaneous smoothing and edge enhancement in image processing.

Kuntal Ghosh, Sandip Sarkar, Kamales Bhaumik
An Alternative Curvature Measure for Topographic Feature Detection

The notion of topographic features like ridges, trenches, hills, etc. is formed by visualising the 2D image function as a surface in 3D space. Hence, properties of such a surface can be used to detect features from images. One such property, the curvature of the image surface, can be used to detect features characterised by a sharp bend in the surface. Curvature based feature detection requires an efficient technique to estimate/calculate the surface curvature. In this paper, we present an alternative measure for curvature and provide an analysis of the same to determine its scope. Feature detection algorithms using this measure are formulated and two applications are chosen to demonstrate their performance. The results show good potential of the proposed measure in terms of efficiency and scope.

Jayanthi Sivaswamy, Gopal Datt Joshi, Siva Chandra
Nonlinear Enhancement of Extremely High Contrast Images for Visibility Improvement

This paper presents a novel image enhancement algorithm using a multilevel windowed inverse sigmoid (MWIS) function for rendering images captured under extremely non-uniform lighting conditions. MWIS-based image enhancement is a combination of three processes, viz. adaptive intensity enhancement, contrast enhancement, and color restoration. Adaptive intensity enhancement uses a nonlinear transfer function to pull up the intensity of underexposed pixels and pull down the intensity of overexposed pixels of the input image. Contrast enhancement tunes the magnitude of each pixel's intensity with respect to its surrounding pixels. A color restoration process based on the relationship between the spectral bands and the luminance of the original image is applied to convert the enhanced intensity image back to a color image.

K. Vijayan Asari, Ender Oguslu, Saibabu Arigela
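
The intensity-enhancement stage can be illustrated with a single inverse-sigmoid transfer curve. This is a one-window sketch with an illustrative gain, not the paper's multilevel windowed (MWIS) formulation, and it omits the contrast and colour restoration stages:

```python
import numpy as np

def inverse_sigmoid(v, gain=10.0, eps=1e-6):
    # Inverse-sigmoid (logit) transfer curve on intensities in [0, 1]:
    # under-exposed values are pulled up and over-exposed values pulled
    # down toward the mid-range; gain controls the compression.
    v = np.clip(v, eps, 1.0 - eps)
    y = np.log(v / (1.0 - v)) / gain + 0.5   # logit = inverse of sigmoid
    return np.clip(y, 0.0, 1.0)

dark, mid, bright = inverse_sigmoid(np.array([0.05, 0.5, 0.95]))
```

Mid-grey is a fixed point of the curve, so well-exposed regions are left largely unchanged while the shadow and highlight ends are remapped.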

Graphics and Visualization

Culling an Object Hierarchy to a Frustum Hierarchy

Visibility culling of a scene is a crucial stage for interactive graphics applications, particularly for scenes with thousands of objects. The culling time must be small for it to be effective. A hierarchical representation of the scene is used for efficient culling tests. However, when there are multiple view frustums (as in a tiled display wall), visibility culling time becomes substantial and cannot be hidden by pipelining it with other stages of rendering. In this paper, we address the problem of culling an object to a hierarchically organized set of frustums, such as those found in tiled displays and shadow volume computation. We present an adaptive algorithm to unfold the twin hierarchies at every stage in the culling procedure. Our algorithm computes from-point visibility and is conservative. The precomputation required is minimal, allowing our approach to be applied for dynamic scenes as well. We show performance of our technique over different variants of culling a scene to multiple frustums. We also show results for dynamic scenes.

Nirnimesh, Pawan Harish, P. J. Narayanan
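
The primitive underlying any such frustum-culling hierarchy is the box-versus-plane test: a frustum is six inward-facing planes, and a bounding box is culled as soon as it lies entirely outside one of them. A minimal sketch (the cube-shaped "frustum" below is an illustrative stand-in for a real perspective frustum):

```python
import numpy as np

def aabb_outside_plane(lo, hi, plane):
    # Plane is (a, b, c, d) with a*x + b*y + c*z + d >= 0 meaning inside.
    # Test only the box corner farthest along the plane normal (p-vertex):
    # if even that corner is outside, the whole box is outside.
    a, b, c, d = plane
    p = np.where(np.array([a, b, c]) >= 0, hi, lo)
    return a * p[0] + b * p[1] + c * p[2] + d < 0

def cull_aabb(lo, hi, planes):
    # Conservative: a box is culled only if fully outside some plane.
    return any(aabb_outside_plane(lo, hi, pl) for pl in planes)

# Axis-aligned "frustum": the unit cube [0,1]^3 as six inward planes.
planes = [(1, 0, 0, 0), (-1, 0, 0, 1), (0, 1, 0, 0),
          (0, -1, 0, 1), (0, 0, 1, 0), (0, 0, -1, 1)]
kept = not cull_aabb(np.array([0.2, 0.2, 0.2]), np.array([0.4, 0.4, 0.4]), planes)
culled = cull_aabb(np.array([2.0, 2.0, 2.0]), np.array([3.0, 3.0, 3.0]), planes)
```

Hierarchical culling applies this test top-down over the scene's bounding-volume tree, and the paper extends it to a hierarchy of frustums as well.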
Secondary and Tertiary Structural Fold Elucidation from 3D EM Maps of Macromolecules

Recent advances in three dimensional Electron Microscopy (3D EM) have given an opportunity to look at the structural building blocks of proteins (and nucleic acids) at varying resolutions. In this paper, we provide algorithms to detect the secondary structural motifs (α-helices and β-sheets) from proteins for which the volumetric maps are reconstructed at 5−10 Å resolution. Additionally, we show that when the resolution is coarser than 10 Å, some of the tertiary structural motifs can be detected from 3D EM. For both these algorithms, we employ tools from computational geometry and differential topology, specifically the computation of stable/unstable manifolds of certain critical points of the distance function induced by the molecular surface. With the results in this paper, we thus draw a connection between these mathematically well-defined concepts and the bio-chemical structural folds of proteins.

Chandrajit Bajaj, Samrat Goswami
Real-Time Streaming and Rendering of Terrains

Terrains and other geometric models have traditionally been stored locally. Their remote access exhibits characteristics that are a combination of file serving and real-time streaming, like audio-visual media. This paper presents a terrain streaming system based on a client-server architecture that handles heterogeneous clients over low-bandwidth networks. We present an efficient representation for terrain streaming and design a client-server system that utilizes this representation to efficiently stream virtual environments containing terrains and overlaid geometry. We handle dynamic entities in the environment and their synchronization between multiple clients. We also present a method of sharing and storing terrain annotations for collaboration between multiple users. We conclude by presenting preliminary performance data for the streaming system.

Soumyajit Deb, Shiben Bhattacharjee, Suryakant Patidar, P. J. Narayanan
Ad-Hoc Multi-planar Projector Displays

High-resolution portable projectors have become commodity items now to own – but not to use. It is not always possible to find a display area where the projector can be properly aligned so that an undistorted image can be seen. We present a method to project an undistorted image using a digital projector on a piecewise-planar display area.

We use uncalibrated structured light ranging to segment the unknown projection area and further compute the homographies that map the projector space to the camera space through each of the planes. The edge detection and point-correspondences are subpixel precise. Finally, we use these computed homographies to pre-warp the display image so that a distortion-free image is visible. Our results show a seamless and correct rectification with accurate segmentation of the planes.

Kashyap Paidimarri, Sharat Chandran
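
The pre-warp step rests on applying plane-induced homographies to image points. A minimal sketch of the homogeneous mapping and its inverse (the 3x3 matrix below is an illustrative slanted-plane homography, not one estimated by the paper's structured-light procedure):

```python
import numpy as np

def apply_homography(H, pts):
    # Map N 2-D points through a 3x3 homography: lift to homogeneous
    # coordinates, multiply, then divide by the third coordinate.
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]

# Toy homography for a plane seen at a slant (values are illustrative).
H = np.array([[1.0, 0.1,   5.0],
              [0.0, 1.2,   3.0],
              [0.0, 0.001, 1.0]])

corners = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
warped = apply_homography(H, corners)
# Pre-warping the display image with the inverse undoes the distortion.
recovered = apply_homography(np.linalg.inv(H), warped)
```

In the system described above, one such homography is estimated per display plane, and the image is pre-warped with the inverses so the projection appears undistorted to the viewer.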
PACE: Polygonal Approximation of Thick Digital Curves Using Cellular Envelope

A novel algorithm to derive an approximate cellular envelope of an arbitrarily thick digital curve on a 2D grid is proposed in this paper. The concept of “cellular envelope” is newly introduced in this paper, which is defined as the smallest set of cells containing the given curve, and hence bounded by two tightest (inner and outer) isothetic polygons on the grid. Contrary to the existing algorithms that use thinning as preprocessing for a digital curve with changing thickness, in our work, an optimal cellular envelope (smallest in the number of constituent cells) that entirely contains the given curve is constructed based on a combinatorial technique. The envelope, in turn, is further analyzed to determine polygonal approximation of the curve as a sequence of cells using certain attributes of digital straightness. Since a real-world curve/curve-shaped object with varying thickness and unexpected disconnectedness is unsuitable for the existing algorithms on polygonal approximation, the curve is encapsulated by the cellular envelope to enable the polygonal approximation. Owing to the implicit Euclidean-free metrics and combinatorial properties prevailing in the cellular plane, implementation of the proposed algorithm involves primitive integer operations only, leading to fast execution of the algorithm. Experimental results including CPU time reinforce the elegance and efficacy of the proposed algorithm.

Partha Bhowmick, Arindam Biswas, Bhargab B. Bhattacharya
Texture Guided Realtime Painterly Rendering of Geometric Models

We present a real-time painterly rendering technique for geometric models. The painterly appearance and the impression of geometric detail are created by effectively rendering several brush strokes. Unlike existing techniques, we use the textures of the models to derive the features and positions of strokes in 3D object space. The strokes have fixed locations on the surfaces of the models during animation, which enables frame-to-frame coherence. We use vertex and fragment shaders to render strokes in real time. The strokes are rendered as sprites in two dimensions, analogous to the way artists paint on canvas. While animating, strokes may become cluttered since they are closely located on screen. Existing techniques ignore this issue; we address it by developing a level-of-detail scheme that maintains a uniform stroke density in screen space. We achieve painterly rendering in real time with a combination of object-space positioning and image-space rendering of strokes. We also maintain consistency of rendering between frames. We illustrate our method with images and performance results.

Shiben Bhattacharjee, Neeharika Adabala
Real-Time Camera Walks Using Light Fields

An interesting alternative to traditional geometry based rendering is Light Field Rendering [1,2]. A camera gantry is used to acquire authentic imagery and detailed novel views are synthetically generated from unknown viewpoints. The drawback is the significant data on disk.

Moving from static images, a walkthrough or a camera walk through the implied virtual world is often desirable, but the repeated access of the large data makes the task increasingly difficult. We note that although potentially infinite walkthroughs are possible, for any given path, only a subset of the previously stored light field is required. Our prior work [3] exploited this and reduced the main memory requirement. However, considerable computational burden is encountered in processing even this reduced subset. This negatively impacts real-time rendering.

In this paper, we subdivide the image projection plane into “cells,” each of which gets all its radiance information from the cached portions of the light field at select “nodal points.” Once these cells are defined, the cache is visited systematically to find the radiance efficiently. The net result is real-time camera walks.

Biswarup Choudhury, Deepali Singla, Sharat Chandran
Massive Autonomous Characters: Animation and Interaction

This article reports the results of an experiment which integrates GPU-accelerated skinning, sprite animation, and character behavior control. The experiment shows that the existing techniques can be neatly integrated: thousands of characters are animated in real time, and the overall motion is natural and fluid. The result is attractive for games, especially where a huge number of non-player characters, such as animals or monsters, must be animated.

Ingu Kang, JungHyun Han
Clickstream Visualization Based on Usage Patterns

Most clickstream visualization techniques display web users’ clicks by highlighting paths in a graph of the underlying web site structure. These techniques do not scale to handle high-volume web usage data. Further, historical usage data is not considered. The work described in this paper differs from other work in the following aspect. Fuzzy clustering is applied to historical usage data, and the result is imaged in the form of a point cloud. Web navigation data from active users are shown as animated paths in this point cloud. It is clear that when many paths get attracted to one of the clusters, that particular cluster is currently “hot.” Further, as sessions terminate, new sessions are incrementally incorporated into the point cloud. The complete process is closely coupled to the fuzzy clustering technique and makes effective use of clustering results. The method is demonstrated on a very large set of web log records consisting of over half a million page clicks.
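
The fuzzy clustering step at the heart of this visualization can be sketched with a plain fuzzy c-means loop; the session feature vectors, the number of clusters k, and the fuzzifier m = 2 below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def fuzzy_cmeans(points, k, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means: returns cluster centers and the membership
    matrix U (n x k) whose rows sum to 1."""
    rng = np.random.default_rng(seed)
    n = len(points)
    u = rng.random((n, k))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(iters):
        w = u ** m
        # Centers are membership-weighted means of the points.
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # Standard FCM membership update: inverse-distance weighting.
        inv = d ** (-2.0 / (m - 1))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u
```

The membership matrix U is what a point-cloud view can render directly: each session's position is interpolated between cluster centers according to its memberships.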

Srinidhi Kannappady, Sudhir P. Mudur, Nematollaah Shiri
GPU Objects

Points, lines, and polygons have been the fundamental primitives in graphics. Graphics hardware is optimized to handle them in a pipeline. Other objects are converted to these primitives before rendering. Programmable GPUs have made it possible to introduce a wide class of computations on each vertex and on each fragment. In this paper, we outline a procedure to accurately and efficiently draw high-level procedural elements using the GPU. The CPU and the vertex shader set up the drawing area on screen and pass the required parameters. The pixel shader uses ray-casting to compute the 3D point that projects to it and shades it using a general shading model. We demonstrate the fast rendering of 2D and 3D primitives like circle, conic, triangle, sphere, quadric, box, etc., with a combination of specularity, refraction, and environment mapping. We also show that combinations of objects, such as Constructive Solid Geometry (CSG) objects, can be rendered fast on the GPU. We believe customized GPU programs for a new set of high-level primitives – which we call GPU Objects – are a way to exploit the power of GPUs and to provide interactive rendering of scenes otherwise considered too complex.
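
The per-pixel ray-casting that the pixel shader performs can be illustrated on the CPU for the simplest GPU Object, a sphere; this is a hedged sketch of the intersection test only, not the authors' shader code:

```python
import numpy as np

def ray_sphere_hit(origin, direction, center, radius):
    """Ray-sphere intersection as a fragment shader would evaluate it:
    returns the nearest positive hit distance t along the ray, or None.
    The direction is assumed to be unit length (so a = 1 in the quadratic)."""
    oc = origin - center
    b = 2.0 * np.dot(direction, oc)
    c = np.dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None          # ray misses the sphere: fragment is discarded
    t = (-b - np.sqrt(disc)) / 2.0
    return t if t > 0 else None
```

In the GPU version, the hit point `origin + t * direction` feeds the shading model (specularity, refraction, environment lookup) instead of being returned.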

Sunil Mohan Ranta, Jag Mohan Singh, P. J. Narayanan
Progressive Decomposition of Point Clouds Without Local Planes

We present a reordering-based procedure for the multiresolution decomposition of a point cloud in this paper. The points are first reordered recursively based on an optimal pairing. Each level of reordering induces a division of the points into approximation and detail values. A balanced quantization at each level results in further compression. The original point cloud can be reconstructed without loss from the decomposition. Our scheme does not require local reference planes for encoding or decoding and is progressive. The points also lie on the original manifold at all levels of decomposition. The scheme can be used to generate different discrete LODs of the point set with fewer points in each at low BPP numbers. We also present a scheme for the progressive representation of the point set by adding the detail values selectively. This results in the progressive approximation of the original shape with dense points even at low BPP numbers. The shape gets refined as more details are added and can reproduce the original point set. This scheme uses a wavelet decomposition of the detail coefficients of the multiresolution decomposition. Progressiveness is achieved by including different levels of the DWT decomposition at all multiresolution representation levels. We show that this scheme can generate much better approximations at equivalent BPP numbers for the point set.

Jag Mohan Singh, P. J. Narayanan

Video Analysis

Task Specific Factors for Video Characterization

Factorization methods are used extensively in computer vision for a wide variety of tasks. Existing factorization techniques extract factors that meet requirements such as compact representation, interpretability, efficiency, dimensionality reduction, etc. However, when the extracted factors lack interpretability and are large in number, identification of factors that cause the data to exhibit certain properties of interest is useful in solving a variety of problems. Identification of such factors, or factor selection, has interesting applications in data synthesis and recognition. In this paper, simple and efficient methods are proposed for identification of factors of interest from a pool of factors obtained by decomposing videos, represented as tensors, into their constituent low-rank factors. The method is used to select factors that enable appearance-based facial expression transfer and facial expression recognition. Experimental results demonstrate that the factor selection facilitates efficient solutions to these problems with promising results.

Ranjeeth Kumar, S. Manikandan, C. V. Jawahar
Video Shot Boundary Detection Algorithm

We present a newly developed algorithm for automatically segmenting videos into basic shot units. A basic shot unit can be understood as an unbroken sequence of frames taken from one camera. We first calculate the frame difference using local histogram comparison, and then dynamically scale the frame difference with a log-formula to compress and enhance it. Finally, we detect the shot boundaries with the newly proposed shot boundary detection algorithm, which is more robust to camera or object motion and to flashlight events. The proposed algorithm is tested on various video types, and experimental results show that it is effective and reliably detects shot boundaries.
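
A minimal sketch of the described pipeline, assuming a 4x4 block grid, 16-bin histograms, and `log1p` as the log-formula (the paper's exact constants are not given here):

```python
import numpy as np

def local_histogram_difference(frame_a, frame_b, blocks=4, bins=16):
    """Sum of per-block histogram differences between two grayscale frames.
    Splitting the frame into blocks x blocks regions makes the measure more
    sensitive to localized content changes than a single global histogram."""
    h, w = frame_a.shape
    bh, bw = h // blocks, w // blocks
    diff = 0.0
    for i in range(blocks):
        for j in range(blocks):
            ra = frame_a[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            rb = frame_b[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            ha, _ = np.histogram(ra, bins=bins, range=(0, 256))
            hb, _ = np.histogram(rb, bins=bins, range=(0, 256))
            diff += np.abs(ha - hb).sum()
    return diff

def log_scaled(diff, scale=1.0):
    """Compress the dynamic range of the raw difference, in the spirit of
    the paper's log-formula, so small and large changes become comparable."""
    return np.log1p(scale * diff)
```

Shot boundaries would then be declared where the log-scaled difference exceeds an adaptive threshold over the sequence.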

Kyong-Cheol Ko, Young Min Cheon, Gye-Young Kim, Hyung-Il Choi, Seong-Yoon Shin, Yang-Won Rhee
Modeling of Echocardiogram Video Based on Views and States

In this work we propose a hierarchical state-based model for representing an echocardiogram video using the objects present and their dynamic behavior. The modeling is done on the basis of the different types of views, like short axis view, long axis view, apical view, etc. For view classification, an artificial neural network is trained with the histogram of a ‘region of interest’ of each video frame. A state transition diagram is used to represent the states of objects in different views and the corresponding transitions from one state to another. States are detected with the help of synthetic M-mode images. In contrast to the traditional single M-mode approach, we propose a new approach named ‘Sweep M-mode’ for the detection of states.

Aditi Roy, Shamik Sural, J. Mukherjee, A. K. Majumdar
Video Completion for Indoor Scenes

In this paper, we present a new approach for object removal and video completion of indoor scenes. In indoor images, the frames are not affine related. The region near the object to be removed can have multiple planes with sharply different motions. Dense motion estimation may fail for such scenes due to missing pixels. We use feature tracking to find dominant motion between two frames. The geometry of the motion of multiple planes is used to segment the motion layers into component planes. The homography corresponding to each hole pixel is used to warp a frame in the future or past for filling it. We show the application of our technique on some typical indoor videos.

Vardhman Jain, P. J. Narayanan
Reducing False Positives in Video Shot Detection Using Learning Techniques

Video has become an interactive medium of daily use today. However, the sheer volume of the data makes it extremely difficult to browse and find required information. Organizing the video and locating required information effectively and efficiently presents a great challenge to the video retrieval community. This demands a tool which would break the video down into smaller, manageable units called shots.

Traditional shot detection methods use pixel difference, histograms, or temporal slice analysis to detect hard-cuts and gradual transitions. However, systems need to be robust to sequences that contain dramatic illumination changes, shaky camera effects, and special effects such as fire, explosion, and synthetic screen split manipulations. Traditional systems produce false positives for these cases; i.e., they claim a shot break when there is none.

We propose a shot detection system which reduces false positives even if all the above effects are cumulatively present in one sequence. Similarities between successive frames are computed by finding the correlation, and are further analyzed using a wavelet transformation. A final filtering step uses a trained Support Vector Machine (SVM). As a result, we achieve better accuracy (while retaining speed) in detecting shot breaks when compared with other techniques.
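
The frame-similarity measure can be sketched as a normalised cross-correlation; the exact correlation variant used by the authors is not specified, so treat this as an assumption:

```python
import numpy as np

def frame_correlation(a, b):
    """Normalised cross-correlation between two grayscale frames.
    Values near 1 suggest the same shot; a sharp dip suggests a candidate
    shot break, to be confirmed by the wavelet and SVM stages."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 1.0
```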

Nithya Manickam, Aman Parnami, Sharat Chandran
Text Driven Temporal Segmentation of Cricket Videos

In this paper we address the problem of temporal segmentation of videos. We present a multi-modal approach where clues from different information sources are merged to perform the segmentation. Specifically, we segment videos based on textual descriptions or commentaries of the action in the video. Such parallel information is available for cricket videos, a class of videos where visual-feature-based (bottom-up) scene segmentation algorithms generally fail due to lack of visual dissimilarity across space and time. With additional top-down information from the textual domain, these ambiguities can be resolved to a large extent. The video is segmented into meaningful entities or scenes, using the scene-level descriptions provided by the commentary. These segments can then be automatically annotated with the respective descriptions. This allows for semantic access and retrieval of video segments, which is difficult to obtain from existing visual-feature-based approaches. We also present techniques for automatic highlight generation using our scheme.

K. Pramod Sankar, Saurabh Pandey, C. V. Jawahar

Tracking and Surveillance

Learning Efficient Linear Predictors for Motion Estimation

A novel object representation for tracking is proposed. The tracked object is represented as a constellation of spatially localised linear predictors which are learned on a single training image. In the learning stage, sets of pixels whose intensities allow for optimal least square predictions of the transformations are selected as a support of the linear predictor.

The approach comprises three contributions: learning object-specific linear predictors, explicitly dealing with the trade-off between predictor precision and computational complexity, and selecting a view-specific set of predictors suitable for global object motion estimation. Robustness to occlusion is achieved by a RANSAC procedure.

The learned tracker is very efficient, achieving frame rates generally higher than 30 frames per second despite being implemented in Matlab.

Jiří Matas, Karel Zimmermann, Tomáš Svoboda, Adrian Hilton
Object Localization by Subspace Clustering of Local Descriptors

This paper presents a probabilistic approach for object localization which combines subspace clustering with the selection of discriminative clusters. Clustering is often a key step in object recognition and is penalized by the high dimensionality of the descriptors. Indeed, local descriptors, such as SIFT, which have shown excellent results in recognition, are high-dimensional and live in different low-dimensional subspaces. We therefore use a subspace clustering method called High-Dimensional Data Clustering (HDDC) which overcomes the curse of dimensionality. Furthermore, in many cases only a few of the clusters are useful to discriminate the object. We, thus, evaluate the discriminative capacity of clusters and use it to compute the probability that a local descriptor belongs to the object. Experimental results demonstrate the effectiveness of our probabilistic approach for object localization and show that subspace clustering gives better results compared to standard clustering methods. Furthermore, our approach outperforms existing results for the Pascal 2005 dataset.

C. Bouveyron, J. Kannala, C. Schmid, S. Girard
Integrated Tracking and Recognition of Human Activities in Shape Space

Activity recognition consists of two fundamental tasks: tracking the features/objects of interest, and recognizing the activities. In this paper, we show that these two tasks can be integrated within the framework of a dynamical feedback system. In our proposed method, the recognized activity is continuously adapted based on the output of the tracking algorithm, which in turn is driven by the identity of the recognized activity. A non-linear, non-stationary stochastic dynamical model on the “shape” of the objects participating in the activities is used to represent their motion, and forms the basis of the tracking algorithm. The tracked observations are used to recognize the activities by comparing against a prior database. Measures designed to evaluate the performance of the tracking algorithm serve as a feedback signal. The method is able to automatically detect changes and switch between activities happening one after another, which is akin to segmenting a long sequence into homogeneous parts. The entire process of tracking, recognition, change detection and model switching happens recursively as new video frames become available. We demonstrate the effectiveness of the method on real-life video and analyze its performance based on such metrics as detection delay and false alarm.

Bi Song, Amit K. Roy-Chowdhury, N. Vaswani
Inverse Composition for Multi-kernel Tracking

Existing multi-kernel tracking methods are based on a forwards additive motion model formulation. However, this approach suffers from the need to estimate an update matrix at each iteration. This paper presents a general framework that extends the existing approach and allows the introduction of a new inverse compositional formulation, which shifts the computation of the update matrix to a one-time initialisation step. The proposed approach thus reduces the computational complexity of each iteration compared to the existing forwards approach. The approaches are compared both in terms of algorithmic complexity and quality of the estimation.

Rémi Megret, Mounia Mikram, Yannick Berthoumieu
Tracking Facial Features Using Mixture of Point Distribution Models

We present a generic framework to track shapes across large variations by learning a non-linear shape manifold as overlapping, piecewise-linear subspaces. We use landmark-based shape analysis to train a Gaussian mixture model over the aligned shapes and learn a Point Distribution Model (PDM) for each of the mixture components. The target shape is searched by first maximizing the mixture probability density for the local feature intensity profiles along the normal, followed by constraining the global shape using the most probable PDM cluster. The feature shapes are robustly tracked across multiple frames by dynamically switching between the PDMs. Our contribution is to apply ASM to the task of tracking shapes involving wide aspect changes and generic movements. This is achieved by incorporating shape priors that are learned over a non-linear shape space and using them to learn the plausible shape space. We demonstrate the results on tracking facial features and provide several empirical results to validate our approach. Our framework runs close to real time at 25 frames per second and can be extended to predict pose angles using a Mixture of Experts.

Atul Kanaujia, Yuchi Huang, Dimitris Metaxas
Improved Kernel-Based Object Tracking Under Occluded Scenarios

A successful approach for object tracking has been kernel-based object tracking [1] by Comaniciu et al. The method provides an effective solution to the problems of representation and localization in tracking. The method involves representation of an object by a feature histogram with an isotropic kernel and performing a gradient-based mean shift optimization for localizing the kernel. Though robust, this technique fails under cases of occlusion. We improve kernel-based object tracking by performing the localization using a generalized (bidirectional) mean shift based optimization. This makes the method resilient to occlusions. Another aspect related to the localization step is the handling of scale changes by varying the bandwidth of the kernel. Here, we suggest a technique based on SIFT features [2] by Lowe to enable a change of bandwidth of the kernel even in the presence of occlusion. We demonstrate the effectiveness of the proposed techniques through extensive experimentation on a number of challenging data sets.
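
The classical localization step that this work generalizes can be sketched as follows; the circular window, bin count, and sqrt(q/p) weighting are the standard Comaniciu-style choices, not the authors' bidirectional variant:

```python
import numpy as np

def mean_shift_step(frame, center, radius, target_hist, bins=8):
    """One localization step of kernel-based tracking in the spirit of
    Comaniciu et al.: pixels inside a circular window vote for the new
    centre with weights sqrt(q_u / p_u), where q is the target histogram
    and p the candidate histogram at the current position."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = (ys - center[0]) ** 2 + (xs - center[1]) ** 2 <= radius ** 2
    vals = (frame[mask].astype(int) * bins) // 256   # quantize to bins
    p = np.bincount(vals, minlength=bins).astype(float)
    p /= p.sum()
    q = np.asarray(target_hist, dtype=float)
    weights = np.sqrt(np.divide(q, p, out=np.zeros(bins), where=p > 0))
    wpix = weights[vals]
    if wpix.sum() == 0:   # no target-coloured pixels in the window
        return center
    return (float((ys[mask] * wpix).sum() / wpix.sum()),
            float((xs[mask] * wpix).sum() / wpix.sum()))
```

Iterating this step until the centre stops moving is the basic localization loop that the bidirectional variant makes occlusion-resilient.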

Vinay P. Namboodiri, Amit Ghorawat, Subhasis Chaudhuri
Spatio-temporal Discovery: Appearance + Behavior = Agent

Experiments in infant category formation indicate a strong role for temporal continuity and change in perceptual categorization. Computational approaches to model discovery in vision have traditionally focused on static images, with appearance features such as shape playing an important role. In this work, we consider integrating agent behaviors with shape for the purpose of agent discovery. Improved algorithms for video segmentation and tracking under occlusion enable us to construct models that characterize agents in terms of motion and interaction with other objects. We present a preliminary approach for discovering agents based on a combination of appearance and motion histories. Using uncalibrated camera images, we characterize objects discovered in the scene by their shape and motion attributes, and cluster these using agglomerative hierarchical clustering. Even with very simple feature sets, initial results suggest that the approach forms reasonable clusters for diverse categories such as people, and for very distinct clusters (animals), and performs above average on other classes.

Prithwijit Guha, Amitabha Mukerjee, K. S. Venkatesh
Fusion of Thermal Infrared and Visible Spectrum Video for Robust Surveillance

This paper presents an approach for fusing the information provided by visible spectrum video with that of thermal infrared video to tackle video processing challenges such as object detection and tracking, increasing the performance and robustness of the surveillance system. An enhanced object detection strategy using gradient information along with background subtraction is implemented, with an efficient fusion-based approach to handle typical problems in both domains. An intelligent fusion approach using fuzzy logic and Kalman filtering is proposed to track objects and obtain a fused estimate according to the reliability of the sensors. Appropriate measurement parameters are identified to determine the measurement accuracy of each sensor. Experimental results are shown for some typical scenarios of detection and tracking of pedestrians.

Praveen Kumar, Ankush Mittal, Padam Kumar
Dynamic Events as Mixtures of Spatial and Temporal Features

Dynamic events comprise spatiotemporal atomic units. In this paper we model them using a mixture model. Events are represented using a framework based on the Mixture of Factor Analyzers (MFA) model. It is to be noted that our framework is generic and is applicable to any mixture modelling scheme. The MFA, used to demonstrate the novelty of our approach, clusters events into spatially coherent mixtures in a low-dimensional space. Based on the observations that (i) events comprise varying degrees of spatial and temporal characteristics, and (ii) the number of mixtures determines the composition of these features, a method that incorporates models with varying numbers of mixtures is proposed. For a given event, the relative importance of each model component is estimated, thereby choosing the appropriate feature composition. The capabilities of the proposed framework are demonstrated with an application: recognition of events such as hand gestures and activities.

Karteek Alahari, C. V. Jawahar
Discriminative Actions for Recognising Events

This paper presents an approach to identify the importance of different parts of a video sequence from the recognition point of view. It builds on the observations that: (1) events consist of more fundamental (or atomic) units, and (2) a discriminant-based approach is more appropriate for the recognition task when compared to standard modelling techniques such as PCA, HMM, etc. We introduce discriminative actions, which describe the usefulness of the fundamental units in distinguishing between events. We first extract actions to capture the fine characteristics of individual parts of the events. These actions are modelled and their usefulness in discriminating between events is estimated as a score. The score highlights the important parts (or actions) of the event from the recognition aspect. Applicability of the approach to different classes of events is demonstrated along with a statistical analysis.

Karteek Alahari, C. V. Jawahar

Recognition (Face/Gesture/Object)

Continuous Hand Gesture Segmentation and Co-articulation Detection

Gesture segmentation is an extremely difficult task due to both the multitude of possible gesture variations in spatio-temporal space and the co-articulation of successive gestures. In this paper, a robust framework for this problem is proposed, which has been used to segment out component gestures from a continuous stream of gestures using a finite state machine and motion features in a vision-based platform.

M. K. Bhuyan, D. Ghosh, P. K. Bora
OBJCUT for Face Detection

This paper proposes a novel, simple and efficient method for face segmentation which works by coupling face detection and segmentation in a single framework. We use the OBJCUT [1] formulation that allows for a smooth combination of object detection and Markov Random Field for segmentation, to produce a real-time face segmentation. It should be noted that our algorithm is extremely efficient and runs in real time.

Jonathan Rihan, Pushmeet Kohli, Philip H. S. Torr
Selection of Wavelet Subbands Using Genetic Algorithm for Face Recognition

In this paper, a novel representation called the subband face is proposed for face recognition. The subband face is generated from selected subbands obtained using wavelet decomposition of the original face image. It is surmised that certain subbands contain information that is more significant for discriminating faces than other subbands. The problem of subband selection is cast as a combinatorial optimization problem, and a genetic algorithm (GA) is used to find the optimum subband combination by maximizing the Fisher ratio of the training features. The performance of the GA-selected subband face is evaluated using three face databases and compared with other wavelet-based representations.
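
The GA fitness described here, the Fisher ratio of features from the selected subbands, can be sketched as below; the feature layout (one feature matrix per subband, a binary mask as the GA chromosome) and the trace-form scatter are illustrative assumptions:

```python
import numpy as np

def fisher_ratio(features, labels):
    """Ratio of between-class to within-class scatter (trace form) for a
    set of feature vectors; this is the quantity the GA maximizes."""
    classes = np.unique(labels)
    overall = features.mean(axis=0)
    sb = sw = 0.0
    for c in classes:
        fc = features[labels == c]
        mc = fc.mean(axis=0)
        sb += len(fc) * np.sum((mc - overall) ** 2)   # between-class scatter
        sw += np.sum((fc - mc) ** 2)                  # within-class scatter
    return sb / sw if sw > 0 else np.inf

def subband_fitness(subband_feats, labels, mask):
    """Fitness of a binary subband-selection mask (a GA chromosome):
    Fisher ratio of the selected subbands' features concatenated together."""
    sel = [f for f, m in zip(subband_feats, mask) if m]
    if not sel:
        return 0.0
    return fisher_ratio(np.concatenate(sel, axis=1), labels)
```

A standard GA (selection, crossover, bit-flip mutation over the masks) would then evolve the population toward the mask with the highest fitness.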

Vinod Pathangay, Sukhendu Das
Object Recognition Using Reflex Fuzzy Min-Max Neural Network with Floating Neurons

This paper proposes an object recognition system that is invariant to rotation, translation and scale and can be trained under partial supervision. The system is divided into two sections, namely feature extraction and recognition. The feature extraction section uses the proposed rotation, translation and scale invariant features. The recognition section consists of a novel Reflex Fuzzy Min-Max Neural Network (RFMN) architecture with “Floating Neurons”. RFMN is capable of learning a mixture of labeled and unlabeled data, which enables training under partial supervision. Learning under partial supervision is of high importance for the practical implementation of pattern recognition systems, as it may not always be feasible to get a fully labeled dataset for training, or the cost of labeling all samples may not be affordable. The proposed system is tested on a shape database available online, and on Marathi and Bengali digits. Results are compared with the “General Fuzzy Min-Max Neural Network” proposed by Gabrys and Bargiela.

A. V. Nandedkar, P. K. Biswas
Extended Fitting Methods of Active Shape Model for the Location of Facial Feature Points

In this study, we propose three extended fitting methods for the standard ASM (active shape model). First, profiles are extended from 1D to 2D; second, profiles of different landmarks are constructed individually; third, the length of the profiles is determined adaptively with the change of level during searching, and the displacements in the last level are constrained. Each method and the combination of the three methods are tested on the SJTU (Shanghai Jiaotong University) face database. In all cases, compared to the standard ASM, each method improves accuracy or speed in some way, and the combination of the three methods improves both accuracy and speed greatly.

Chunhua Du, Jie Yang, Qiang Wu, Tianhao Zhang, Huahua Wang, Lu Chen, Zheng Wu
Pose Invariant Generic Object Recognition with Orthogonal Axis Manifolds in Linear Subspace

This paper addresses the problem of pose invariant Generic Object Recognition by modeling the perceptual capability of human beings. We propose a novel framework using a combination of appearance and shape cues to recognize the object class and viewpoint (axis of rotation) as well as determine its pose (angle of view). The appearance model of the object from multiple viewpoints is captured using Linear Subspace Analysis techniques and is used to reduce the search space to a few rank-ordered candidates. We have used a decision-fusion based combination of 2D PCA and ICA to integrate the complementary information of classifiers and improve recognition accuracy. For matching based on shape features, we propose the use of distance transform based correlation. A decision fusion using ‘Sum Rule’ of 2D PCA and ICA subspace classifiers, and distance transform based correlation is then used to verify the correct object class and determine its viewpoint and pose. Experiments were conducted on COIL-100 and IGOIL (IITM Generic Object Image Library) databases which contain objects with complex appearance and shape characteristics. IGOIL database was captured to analyze the appearance manifolds along two orthogonal axes of rotation.

Manisha Kalra, P. Deepti, R. Abhilash, Sukhendu Das
A Profilometric Approach to 3D Face Reconstruction and Its Application to Face Recognition

3D face recognition has been an active area of research for the past several years. For a 3D face recognition system, one would like to have an accurate as well as low-cost setup for constructing the 3D face model. In this paper, we use a profilometry approach to obtain a 3D face model. This method gives a low-cost solution to the problem of acquiring 3D data, and the 3D face models generated by this method are sufficiently accurate. We also develop an algorithm that can use the 3D face model generated by the above method for recognition.

Surath Raj Mitra, K. R. Ramakrishnan
Face Recognition Technique Using Symbolic Linear Discriminant Analysis Method

Techniques that can introduce low-dimensional feature representations with enhanced discriminatory power are important in face recognition systems. This paper presents a symbolic factor analysis method, namely symbolic Linear Discriminant Analysis (symbolic LDA), for face representation and recognition. Classical factor analysis methods extract features which are single-valued in nature to represent face images. These single-valued variables may not be able to capture the variation of each feature in all the images of the same subject; this leads to loss of information. The symbolic Linear Discriminant Analysis algorithm extracts the most discriminating interval-type features; they optimally discriminate among the classes represented in the training set. The proposed method has been successfully tested for face recognition using two databases, the ORL and Yale face databases. The effectiveness of the proposed method is shown in terms of comparative performance against popular classical factor analysis methods such as the eigenface method and Linear Discriminant Analysis. Experimental results show that symbolic LDA outperforms the classical factor analysis methods.

P. S. Hiremath, C. J. Prabhakar
Two-Dimensional Optimal Transform for Appearance Based Object Recognition

This paper proposes a new method of feature extraction called the two-dimensional optimal transform (2D-OPT), useful for appearance-based object recognition. The 2D-OPT method provides better discrimination power between classes by maximizing the distance between class centers. We first show that the proposed 2D-OPT method works in the row direction of images, and subsequently propose an alternate 2D-OPT which works in the column direction of images. To address the massive memory requirements of the 2D-OPT method, as well as the alternate 2D-OPT method, we introduce bi-projection 2D-OPT. The introduced bi-projection 2D-OPT method has the advantages of a higher recognition rate, lower memory requirements and better computing performance than the standard PCA/2D-PCA/Generalized 2D-PCA methods, as revealed through extensive experiments conducted on the COIL-20 dataset and the AT&T face dataset.

B. H. Shekar, D. S. Guru, P. Nagabhushan
Computing Eigen Space from Limited Number of Views for Recognition

This paper presents a novel approach to construct an eigen space representation from limited number of views, which is equivalent to the one obtained from large number of images captured from multiple view points. This procedure implicitly incorporates a novel view synthesis algorithm in the eigen space construction process. Inherent information in an appearance representation is enhanced using geometric computations. We experimentally verify the performance for orthographic, affine and projective camera models. Recognition results on the COIL and SOIL image database are promising.

Paresh K. Jain, P. Kartik Rao, C. V. Jawahar
Face Recognition from Images with High Pose Variations by Transform Vector Quantization

Pose and illumination variations are the most dominating and persistent challenges in face recognition, leading to various highly complex 2D and 3D model based solutions. We present a novel transform vector quantization (TVQ) method which is fast and accurate, and yet significantly less complex than conventional methods. TVQ offers a flexible and customizable way to capture pose variations. Use of a transform such as the DCT helps compress the image data to a small feature vector, and judicious use of vector quantization helps capture the various poses in compact codebooks. A confidence-measure-based sequence analysis allows the proposed TVQ method to accurately recognize a person in only 3-9 frames (less than half a second) from a video sequence of images with wide pose variations.
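
The codebook-matching core of TVQ can be sketched as nearest-codeword classification; the per-person codebooks and 2D feature vectors below are placeholders for the paper's actual DCT features and trained codebooks:

```python
import numpy as np

def tvq_classify(feature, codebooks):
    """Assign a transform-domain feature vector to the person whose VQ
    codebook contains the nearest codeword (minimum Euclidean distortion).
    `codebooks` maps person id -> array of codewords (k x d)."""
    best, best_d = None, np.inf
    for person, cb in codebooks.items():
        d = np.min(np.linalg.norm(cb - feature, axis=1))
        if d < best_d:
            best, best_d = person, d
    return best
```

The confidence-based sequence analysis would accumulate these per-frame decisions (and their distortions) over 3-9 frames before committing to an identity.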

Amitava Das, Manoj Balwani, Rahul Thota, Prasanta Ghosh

Compression

An Integrated Approach for Downscaling MPEG Video

Digital video databases are widely available in compressed format. In many applications, such as video browsing, picture-in-picture, video conferencing, etc., data transfer at a lower bit rate is required. This requires downscaling of the video before transmission. The conventional spatial domain approach for downscaling video is computationally very expensive. The computation can be greatly reduced if downscaling and inverse motion compensation (IMC) are performed in the Discrete Cosine Transform (DCT) domain. There are many algorithms in the literature to perform IMC in the DCT domain. In this paper, we propose an efficient integrated technique to perform IMC and downscaling in the DCT domain. This new approach results in a significant reduction in computational complexity.
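
One standard way to downscale by two directly in the DCT domain is to retain the 4x4 low-frequency corner of each 8x8 block; the sketch below illustrates that general idea, not the paper's integrated IMC-plus-downscaling algorithm:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2)
    return c

def downscale_block_dct(block8):
    """Downscale one 8x8 spatial block to 4x4 in the DCT domain:
    keep only the 4x4 low-frequency corner of the 8x8 DCT, rescale so
    mean intensity is preserved, and apply a 4x4 inverse DCT."""
    C8, C4 = dct_matrix(8), dct_matrix(4)
    coeffs = C8 @ block8 @ C8.T   # forward 8x8 DCT
    low = coeffs[:4, :4] / 2.0    # low-pass + energy normalisation
    return C4.T @ low @ C4        # inverse 4x4 DCT
```

The saving comes from never returning to a full-resolution spatial image: only 16 of 64 coefficients per block are touched.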

Sudhir Porwal, Jayanta Mukherjee
DCT Domain Transcoding of H.264/AVC Video into MPEG-2 Video

As the number of different video compression standards increase, there is a growing need for conversion between video formats coded in different standards. H.264/AVC is a newly emerging video coding standard which achieves better video quality at reduced bit rate than other standards. The standalone media players that are available in the market do not support H.264 video playback. In this paper, we present novel techniques that can achieve conversion of pre-coded video in H.264/AVC standard to MPEG-2 standard directly in the compressed domain. Experimental results show that the proposed approach can produce transcoded video with quality comparable to the pixel-domain approach at significantly reduced cost.

Vasant Patil, Tummala Kalyani, Atul Bhartia, Rajeev Kumar, Jayanta Mukherjee
Adaptive Scalable Wavelet Difference Reduction Method for Efficient Image Transmission

This paper presents a scalable image transmission scheme based on a wavelet-based coding technique supporting region of interest properties. The proposed scheme, scalable WDR (SWDR), is based on the wavelet difference reduction scheme; it progresses adaptively to produce images at different resolutions and at any required bit rate, and supports both spatial and SNR scalability. The method is developed for limited bandwidth networks where image quality and data compression are most important. Simulations are performed on medical images, satellite images, fingerprint images and standard test images such as Barbara. The simulation results show that the proposed scheme is up to 20-40% better than well-known scalable schemes such as scalable SPIHT coding in terms of signal to noise ratio (dB), and reduces execution time by around 40% at various resolutions, making the proposed scalable coding scheme attractive for such networks.

T. S. Bindulal, M. R. Kaimal
GAP-RBF Based NR Image Quality Measurement for JPEG Coded Images

In this paper, we present a growing and pruning radial basis function (GAP-RBF) based no-reference (NR) image quality model for JPEG-coded images. The quality of the images is estimated without reference to the original images. The features for predicting perceived image quality are extracted by considering key human visual sensitivity (HVS) factors such as edge amplitude, edge length, background activity and background luminance. Image quality estimation involves computing the functional relationship between HVS features and subjective test scores. Here, the problem of quality estimation is transformed into a function approximation problem and solved using a GAP-RBF network, which uses a sequential learning algorithm to approximate the functional relationship. The computational complexity and memory requirement of the GAP-RBF algorithm are lower than those of batch learning algorithms. Also, the GAP-RBF algorithm finds a compact image quality model and does not require retraining when new image samples are presented. Experimental results show that the GAP-RBF image quality model emulates the mean opinion score (MOS). The subjective test results of the proposed metric are compared with the JPEG no-reference image quality index as well as the full-reference structural similarity image quality index, and it is observed to outperform both.
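To give a concrete flavor of one such HVS-inspired feature, here is a minimal sketch (our own simplification, not the paper's exact feature set) of edge amplitude at JPEG block boundaries, i.e. "blockiness", measured as the mean luminance jump across vertical 8-pixel boundaries:

```python
import numpy as np

def blockiness(img, block=8):
    """Mean absolute luminance jump across vertical block boundaries.

    A smooth image scores 0; heavily quantized JPEG images show
    large jumps exactly at multiples of the block size."""
    img = np.asarray(img, dtype=float)
    cols = np.arange(block, img.shape[1], block)  # boundary columns
    return float(np.mean(np.abs(img[:, cols] - img[:, cols - 1])))
```

Features like this, together with background activity and luminance measures, would form the input vector that the GAP-RBF network maps to a quality score.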

R. Venkatesh Babu, S. Suresh
A Novel Error Resilient Temporal Adjacency Based Adaptive Multiple State Video Coding over Error Prone Channels

Video streaming applications have been rapidly gaining interest in areas ranging from entertainment to e-learning. In practice, these applications suffer from inevitable loss in the transmission channels, so improving the quality of video streaming over error prone channels is a challenging task. Multiple Description Coding (MDC) is a promising error resilient coding scheme which sends two or more descriptions of the source to the receiver; the reconstruction distortion at the receiver decreases with the number of descriptions received. Multiple State Video Coding (MSVC) is an MDC scheme based on frame-wise splitting of the video sequence into two or more sub-sequences, each of which is encoded separately to generate descriptions that can be decoded independently on reception. Basic MSVC separates the frames of a video into odd and even frames and sends each part over a different path. The drawbacks of basic MSVC, such as the lack of a meaningful basis behind the frame-wise splitting, its inability to support adaptive streaming effectively, and its limited error resiliency, are brought out and discussed. To overcome them and to improve the quality of video streaming, this paper proposes a novel MSVC scheme based on the temporal adjacency between video frames. This temporal adjacency based splitting of the video stream into N sub-sequences also enables the proposed scheme to adapt effectively to varying bandwidths in heterogeneous environments. The simulation results show that the proposed scheme outperforms Single State Video Coding (SSVC) in terms of the perceptual quality of the reconstructed video sequence under various loss scenarios.

M. Ragunathan, C. Mala
Adaptive Data Hiding in Compressed Video Domain

In this paper we propose a new adaptive block based compressed domain data hiding scheme which can embed a relatively large number of secret bits into video without significant perceptual distortion. Macroblocks are selected for embedding on the basis of low inter-frame velocity; from this subset, the blocks with high prediction error are selected. The embedding is done by modifying the quantized DCT AC coefficients in the compressed domain. The number of coefficients (both zero and non-zero) used in embedding is adaptively determined from the relative strength of the prediction error block. Experimental results show that this blind scheme can embed a relatively large number of bits without significant degradation of video quality with respect to the Human Visual System (HVS).
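The basic mechanics of embedding bits into quantized DCT AC coefficients can be sketched as simple LSB substitution. This is a hedged illustration of the general technique only: the function names and the plain LSB rule are ours, while the paper's scheme adaptively chooses how many and which coefficients to modify per block:

```python
import numpy as np

def embed_bits(qcoeffs, bits, positions):
    """Hide bits in the LSBs of selected quantized DCT coefficients.

    qcoeffs: int array for one block in zig-zag order (index 0 is the
    DC coefficient and is left untouched here).
    positions: AC indices chosen elsewhere (e.g. adaptively by
    prediction-error strength, as in the paper)."""
    out = np.array(qcoeffs, dtype=int)
    for bit, pos in zip(bits, positions):
        out[pos] = (out[pos] & ~1) | bit  # overwrite least significant bit
    return out

def extract_bits(qcoeffs, positions):
    """Blind extraction: read back the LSBs at the agreed positions."""
    return [int(qcoeffs[p]) & 1 for p in positions]
```

Extraction is blind in the sense that only the position-selection rule, not the original video, is needed at the decoder.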

Arijit Sur, Jayanta Mukherjee

Document Processing/OCR

Learning Segmentation of Documents with Complex Scripts

Most of the state-of-the-art segmentation algorithms are designed to handle complex document layouts and backgrounds, while assuming a simple script structure such as that of the Roman script. They perform poorly when used with Indian languages, where the components are not strictly collinear. In this paper, we propose a document segmentation algorithm that can handle the complexity of Indian scripts in large document image collections. Segmentation is posed as a graph cut problem that incorporates a priori information from the script structure in the objective function of the cut. We show that this information can be learned automatically and adapted within a collection of documents (a book) and across collections to achieve accurate segmentation. We show results on Indian language documents in the Telugu script. The approach is also applicable to other languages with complex scripts such as Bangla, Kannada, Malayalam, and Urdu.

K. S. Sesh Kumar, Anoop M. Namboodiri, C. V. Jawahar
Machine Learning for Signature Verification

Signature verification is a common task in forensic document analysis: determining whether a questioned signature matches known signature samples. From the viewpoint of automation, it can be viewed as a task involving machine learning from a population of signatures. There are two types of learning to be accomplished. In the first, the training set consists of genuines and forgeries from a general population; in the second, there are genuine signatures in a given case. The two learning tasks are called person-independent (or general) learning and person-dependent (or special) learning. General learning is from a population of genuine and forged signatures of several individuals, where the differences between genuines and forgeries across all individuals are learnt; the general learning model allows a questioned signature to be compared to a single genuine signature. In special learning, a person's signature is learnt from multiple samples of only that person's signature, so within-person similarities are learnt. When a sufficient number of samples is available, special learning performs better than general learning (5% higher accuracy), and verification accuracy increases with the number of samples.

Harish Srinivasan, Sargur N. Srihari, Matthew J. Beal
Text Localization and Extraction from Complex Gray Images

We propose two texture-based approaches, one involving Gabor filters and the other employing log-polar wavelets, for separating text from non-text elements in a document image. Both proposed algorithms compute local energy at information-rich points marked by Harris' corner detector. The advantage of this approach is that the local energy is calculated only at selected points rather than throughout the image, saving considerable computational time. The algorithms have been tested on a large set of scanned text pages, and the results are better than those of existing algorithms. Among the proposed schemes, the Gabor filter based scheme marginally outperforms the wavelet based scheme.

Farshad Nourbakhsh, Peeta Basa Pati, A. G. Ramakrishnan
OCR of Printed Telugu Text with High Recognition Accuracies

Telugu is one of the oldest and most widely spoken languages of India, with more than 66 million speakers, especially in South India. Development of Optical Character Recognition (OCR) systems for Telugu text is an area of current research.

OCR of Indian scripts is much more complicated than OCR of the Roman script because of the huge number of combinations of characters and modifiers. Basic symbols are identified as the unit of recognition in the Telugu script, and edge histograms are used in a feature based recognition scheme for these basic symbols. During recognition, it is observed that, in many cases, the recognizer incorrectly outputs a very similar looking symbol. Special logic and algorithms using simple structural features are developed to improve recognition accuracy considerably without much additional computational effort. It is shown that recognition accuracies of 98.5% can be achieved on laser quality prints with such a procedure.

C. Vasantha Lakshmi, Ritu Jain, C. Patvardhan
A MLP Classifier for Both Printed and Handwritten Bangla Numeral Recognition

This paper concerns automatic recognition of both printed and handwritten Bangla numerals. Such mixed numerals may appear in documents like application forms, postal mail, bank checks, etc. Some pixel-based and shape-based features are chosen for the purpose of recognition. The pixel-based features are normalized pixel densities over the 4 × 4 blocks into which the numeral bounding box is partitioned. The shape-based features are the normalized positions of holes, end-points and intersections, and the radius of curvature of strokes found in each block. A multi-layer neural network architecture was chosen as the classifier for the mixed class of handwritten and printed numerals. For a mixture of twenty-three different fonts of printed numerals of various sizes and 10,500 handwritten numerals, an overall recognition accuracy of 97.2% has been achieved.
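The pixel-density feature over a 4 × 4 partition of the bounding box is straightforward to sketch. A minimal version (our own helper name, assuming a binary bitmap already cropped to the bounding box):

```python
import numpy as np

def density_features(bitmap, grid=4):
    """Normalized pixel densities over a grid x grid partition of a
    binary character bitmap (its bounding box), flattened to a vector."""
    img = np.asarray(bitmap, dtype=float)
    h, w = img.shape
    rows = np.array_split(np.arange(h), grid)  # row index groups per block
    cols = np.array_split(np.arange(w), grid)  # column index groups per block
    return np.array([[img[np.ix_(r, c)].mean() for c in cols]
                     for r in rows]).ravel()
```

The resulting 16-dimensional vector would be concatenated with the shape-based features before being fed to the neural network classifier.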

A. Majumdar, B. B. Chaudhuri
Recognition of Off-Line Handwritten Devnagari Characters Using Quadratic Classifier

Recognition of handwritten characters is a challenging task because of the variability in the writing styles of different individuals. In this paper we propose a quadratic classifier based scheme for the recognition of off-line handwritten Devnagari characters. The features used in the classifier are obtained from the directional chain code information of the contour points of the characters. The bounding box of a character is segmented into blocks and a chain code histogram is computed in each block, yielding 64-dimensional features that are fed to the quadratic classifier for recognition. With the proposed scheme we obtained 98.86% and 80.36% recognition accuracy on Devnagari numerals and characters, respectively, using five-fold cross-validation.
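The chain code histogram underlying these features can be sketched directly. This is a minimal illustration of the standard Freeman 8-direction coding (the per-block partitioning and the quadratic classifier itself are omitted):

```python
import numpy as np

# Freeman 8-direction codes for steps between successive contour pixels
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code_histogram(contour):
    """Normalized 8-bin histogram of chain codes along a closed contour
    given as a list of (x, y) pixel coordinates in traversal order."""
    hist = np.zeros(8)
    n = len(contour)
    for i in range(n):
        (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
        hist[DIRS[(x1 - x0, y1 - y0)]] += 1
    return hist / hist.sum()
```

Computing one such histogram per block of the bounding box and concatenating them gives a directional feature vector of the kind the paper uses.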

N. Sharma, U. Pal, F. Kimura, S. Pal
On Recognition of Handwritten Bangla Characters

Recently, a few works on recognition of handwritten Bangla characters have been reported in the literature, but there is scope for further research in this area. In the present article, we report results of our recent study on recognition of handwritten Bangla basic characters. This is a 50 class problem, since the Bangla alphabet has 50 basic characters. In this study, features are obtained by computing local chain code histograms of the input character shape. Comparative recognition results are obtained for this feature computed on the contour and on one-pixel skeletal representations of the input character image. Classification results are also obtained after downsampling the histogram feature with a Gaussian filter in both cases. Multilayer perceptrons (MLPs) trained by the backpropagation (BP) algorithm are used as classifiers, and near-exhaustive studies are done for the selection of the hidden layer size. An analysis of the misclassified samples shows an interesting error pattern, which has been used for further improvement of the recognition results. Final recognition accuracies on the training and test sets are 94.65% and 92.14%, respectively.

U. Bhattacharya, M. Shridhar, S. K. Parui
Evaluation Framework for Video OCR

In this work, we present a recently developed evaluation framework for video OCR, specifically for English text but readily generalizable to other languages. Earlier work includes the development of an evaluation strategy for text detection and tracking in video; this work is a natural extension. We successfully port the ASR metrics used in the speech community to the video domain. Further, we show results on a small pilot corpus of 25 clips. The results obtained are promising, and we believe that this is a good baseline that will encourage future participation in such evaluations.

Padmanabhan Soundararajan, Matthew Boonstra, Vasant Manohar, Valentina Korzhova, Dmitry Goldgof, Rangachar Kasturi, Shubha Prasad, Harish Raju, Rachel Bowers, John Garofolo
Enabling Search over Large Collections of Telugu Document Images – An Automatic Annotation Based Approach

For the first time, search is enabled over a massive collection of 21 million word images from digitized document images. This work advances the state-of-the-art on multiple fronts: i) Indian language document images are made searchable by textual queries, ii) interactive content-level access is provided to document images for search and retrieval, iii) a novel recognition-free approach, which does not require an OCR, is adapted and validated, iv) a suite of image processing and pattern classification algorithms is proposed to efficiently automate the process, and v) the scalability of the solution is demonstrated over a large collection of 500 digitised books consisting of 75,000 pages.

Character recognition based approaches yield poor results when developing search engines for Indian language document images, due to the complexity of the script and the poor quality of the documents. Recognition free approaches, based on word spotting, are not directly scalable to large collections, due to the computational complexity of matching images in the feature space. For example, if it takes 1 ms to match two images, retrieving documents for a single query from a collection as large as ours would require close to a day. In this paper we propose a novel automatic annotation based approach to provide textual descriptions of document images. With a one-time, offline computational effort, we are able to build a text-based retrieval system over the annotated images, with an interactive response time of about 0.01 seconds. However, we pay the price in the form of massive offline computation, performed on a cluster of 35 computers for about a month. Our procedure is highly automatic, requiring minimal human intervention.

K. Pramod Sankar, C. V. Jawahar

Content Based Image Retrieval

Retrieving Images for Remote Sensing Applications

A unique way in which content based image retrieval (CBIR) for remote sensing differs from traditional CBIR is the widespread occurrence of weak textures. The task of representing weak textures becomes even more challenging when image properties like scale, illumination or the viewing geometry are not known.

In this work, we propose a new feature, the 'texton histogram', to capture the weak-textured nature of remote sensing images. Combined with an automatic classifier, our texton histograms are robust to variations in scale, orientation and illumination conditions, as illustrated experimentally. The classification accuracy is further improved using additional image driven features obtained by applying a feature selection procedure.

Neela Sawant, Sharat Chandran, B. Krishna Mohan
Content-Based Image Retrieval Using Wavelet Packets and Fuzzy Spatial Relations

This paper proposes a region based approach for image retrieval. We develop an algorithm to segment an image into fuzzy regions based on the coefficients of a multiscale wavelet packet transform. The wavelet based features are clustered using the fuzzy C-means algorithm. The final cluster centroids, which are the representative points, signify the color and texture properties of the preassigned number of classes. Fuzzy topological relationships are computed from the final fuzzy partition matrix. The color and texture properties indicated by the centroids, together with the spatial relations between the segmented regions, provide an overall characterization of an image, from which the closeness between two images is estimated. The performance of the system is demonstrated using different sets of examples from a general purpose image database, showing that our algorithm can generate meaningful descriptions of image content.
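The fuzzy C-means step at the heart of the segmentation can be sketched in a few lines. This is the textbook algorithm only (the wavelet packet features and fuzzy topological relations are the paper's contribution and are not shown); the function name and defaults are ours:

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means. Returns (centroids, membership matrix U).

    X: (n, d) feature vectors; c: number of clusters; m > 1: fuzzifier."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # rows are fuzzy memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))       # standard membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U
```

The rows of the final partition matrix U are exactly what the paper's fuzzy topological relationships are computed from.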

Minakshi Banerjee, Malay K. Kundu
Content Based Image Retrieval Using Region Labelling

This paper proposes a content based image retrieval system that uses semantic labels for determining image similarity. Thus, it aims to bridge the semantic gap between human perception and low-level features. Our approach works in two stages. Image segments, obtained from a subset of images in the database by an adaptive k-means clustering algorithm, are labelled manually during the training stage. The training information is used to label all the images in the database during the second stage. When a query is given, it is also segmented and each segment is labelled using the information available from the training stage. Similarity score between the query and a database image is based on the labels associated with the two images. Our results on two test databases show that region labelling helps in increasing the retrieval precision when compared to feature-based matching.

J. Naveen Kumar Reddy, Chakravarthy Bhagvati, S. Bapi Raju, Arun K. Pujari, B. L. Deekshatulu

Stereo/Camera Calibration

Using Strong Shape Priors for Stereo

This paper addresses the problem of obtaining an accurate 3D reconstruction from multiple views. Taking inspiration from the recent successes of using strong prior knowledge for image segmentation, we propose a framework for 3D reconstruction which uses such priors to overcome the ambiguity inherent in this problem. Our framework is based on an object-specific Markov Random Field (MRF) [10]. It uses a volumetric scene representation and integrates conventional reconstruction measures such as photo-consistency, surface smoothness and visual hull membership with a strong object-specific prior. Simple parametric models of objects will be used as strong priors in our framework. We will show how parameters of these models can be efficiently estimated by performing inference on the MRF using dynamic graph cuts [7]. This procedure not only gives an accurate object reconstruction, but also provides us with information regarding the pose or state of the object being reconstructed. We will show the results of our method in reconstructing deformable and articulated objects.

Yunda Sun, Pushmeet Kohli, Matthieu Bray, Philip H. S. Torr
An Efficient Adaptive Window Based Disparity Map Computation Algorithm by Dense Two Frame Stereo Correspondence

This paper presents an efficient algorithm for disparity map computation with an adaptive window, by establishing two frame stereo correspondence. Adaptive window based approaches have the clear advantage of producing dense depth maps from stereo images, yet in recent years there has been little research on them due to their high complexity and large computation time. An adaptive window based method selects an appropriate rectangular window by evaluating the local variation of the intensity and the disparity. Ideally the window need not be rectangular, but a rectangular window is used to reduce algorithmic complexity and hence computation time. This introduces errors that existing algorithms do not address; to reduce them, we propose a method that both improves the disparity maps and has lower computational complexity. To demonstrate the effectiveness of the algorithm, experimental results are presented for synthetic and real image pairs (provided by the Middlebury research group), including ones with ground-truth values for quantitative comparison with other methods. The proposed algorithm outperforms most of the existing algorithms evaluated in the taxonomy of dense two frame stereo algorithms. The implementation has been done in C++, and the algorithm has been tested on the standard stereo pairs used as benchmarks in the taxonomy implementation.
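For reference, the fixed-window SSD baseline that adaptive-window methods refine can be sketched as follows. This is our own naive winner-take-all illustration (the paper's contribution, adapting the window per pixel and correcting the rectangular-window errors, is not shown):

```python
import numpy as np

def ssd_disparity(left, right, max_disp, half=2):
    """Winner-take-all disparity by SSD block matching with a fixed
    (2*half+1)^2 window on a rectified grayscale pair."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(half, h - half):
        for x in range(half, w - half):
            L = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            # only disparities whose right-image window stays in bounds
            for d in range(min(max_disp + 1, x - half + 1)):
                R = right[y - half:y + half + 1,
                          x - d - half:x - d + half + 1]
                cost = np.sum((L - R) ** 2)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

An adaptive-window method replaces the fixed `half` with a per-pixel window chosen from the local intensity and disparity variation.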

Narendra Kumar Shukla, Vivek Rathi, Vijaykumar Chakka
Robust Homography-Based Control for Camera Positioning in Piecewise Planar Environments

This paper presents a vision-based control for positioning a camera with respect to an unknown piecewise planar object. We introduce a novel homography-based approach that integrates information from multiple homographies to reliably estimate the relative displacement of the camera. This approach is robust to image measurement errors and provides a stable estimate of the camera motion that is free from degeneracies in the task space. We also develop a new control formulation that meets the contradictory requirements of producing a decoupled camera trajectory and ensuring object visibility by only utilizing the homography relating the two views. Experimental results validate the efficiency and robustness of our approach and demonstrate its applicability.

D. Santosh Kumar, C. V. Jawahar
Direct Estimation of Homogeneous Vectors: An Ill-Solved Problem in Computer Vision

Computer vision theory is firmly rooted in projective geometry, whereby geometric objects can be effectively modeled by homogeneous vectors. We begin from Gauss's 200-year-old theorem of least squares to derive a generic algorithm for the direct estimation of homogeneous vectors. We uncover the common link between previous methods, showing that direct estimation is not an ill-conditioned problem, as is the popular belief, but has merely been an ill-solved problem. Results show improvements in goodness-of-fit and numerical stability, and demonstrate that "data normalization" is unnecessary for a well-founded algorithm.
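The classic "direct" estimate that this line of work analyzes takes the homogeneous vector as the singular vector of the design matrix with the smallest singular value. A minimal sketch of that baseline (our own illustration, fitting a 2D line; the paper's refinements beyond this are not shown):

```python
import numpy as np

def fit_homogeneous(A):
    """Least-squares estimate of the homogeneous vector v minimizing
    ||A v|| subject to ||v|| = 1: the right singular vector of A
    associated with the smallest singular value."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]
```

For points on the line a*x + b*y + c = 0, each row of A is (x, y, 1) and the recovered v is proportional to (a, b, c).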

Matthew Harker, Paul O’Leary

Biometrics

Fingerprint Matching Based on Octantal Nearest-Neighbor Structure and Core Points

In this paper, we propose a novel fingerprint matching scheme based on an octantal nearest-neighbor structure (ONNS) and core points. The ONNS is a newly defined fingerprint feature. Based on the ONNS, a minutiae pairing algorithm finds the corresponding minutiae pairs, and a novel algorithm evaluates the translational and rotational parameters between the input and template fingerprints. Core point based orientation pairing is performed thereafter, and a matching score is calculated. Experimental results on the FVC2004 fingerprint databases show the good performance of the proposed algorithm.

Li-min Yang, Jie Yang, Hong-tao Wu
Dempster-Shafer Theory Based Classifier Fusion for Improved Fingerprint Verification Performance

This paper presents a Dempster-Shafer theory based classifier fusion algorithm to improve the performance of fingerprint verification. The proposed fusion algorithm combines decision induced match scores of minutiae, ridge, fingercode and pore based fingerprint verification algorithms, and provides an improvement of at least 8.1% in verification accuracy over the individual algorithms. Further, the proposed fusion algorithm outperforms existing fusion algorithms by at least 2.52%. We also found that the use of Dempster's rule of conditioning reduces the training time by approximately 191 seconds.
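The Dempster-Shafer combination underlying such fusion schemes can be sketched in a few lines. This is the standard rule of combination only (the paper's mapping from match scores to masses, and its use of the rule of conditioning, are not shown); the labels in the example are hypothetical:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic belief assignments.

    m1, m2: dicts mapping frozenset focal elements to masses summing
    to 1. Returns the normalized combined assignment."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    # normalize by the non-conflicting mass
    return {k: v / (1.0 - conflict) for k, v in combined.items()}
```

In a verification setting each matcher would contribute a mass function over hypotheses such as {genuine}, {impostor} and the full frame of discernment (uncertainty), and the combined belief drives the accept/reject decision.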

Richa Singh, Mayank Vatsa, Afzel Noore, Sanjay K. Singh
Fingerprint Image Enhancement Using Decimation Free Directional Adaptive Mean Filtering

In this paper we propose a new enhancement technique based on the integration of the decimation free directional responses of Decimation Free Directional Filter Banks (DDFB), adaptive mean filtering, and the eigen decomposition of the Hessian matrix. By decomposing the input fingerprint image into decimation free directional images, noise can be removed directionally by adaptive mean filtering; the eigen decomposition of the Hessian matrix is then used for segmentation. As the input fingerprint image is not uniformly illuminated, we use a bandpass filter to eliminate the non-uniform illumination and create a frequency ridge image before passing it to the DDFB. The final enhanced result is constructed block by block by comparing the energy of all the directional images and picking the one that provides maximum energy.

Muhammad Talal Ibrahim, Imtiaz A. Taj, M. Khalid Khan, M. Aurangzeb Khan
Backmatter
Metadata
Title
Computer Vision, Graphics and Image Processing
edited by
Prem K. Kalra
Shmuel Peleg
Copyright year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-68302-5
Print ISBN
978-3-540-68301-8
DOI
https://doi.org/10.1007/11949619
