2017 | Book

Proceedings of International Conference on Computer Vision and Image Processing

CVIP 2016, Volume 2

Editors: Balasubramanian Raman, Sanjeev Kumar, Partha Pratim Roy, Debashis Sen

Publisher: Springer Singapore

Book Series: Advances in Intelligent Systems and Computing

About this book

This edited volume contains technical contributions in the field of computer vision and image processing presented at the First International Conference on Computer Vision and Image Processing (CVIP 2016). The contributions are thematically divided based on their relation to operations at the lower, middle and higher levels of vision systems, and their applications. The technical contributions in the areas of sensors, acquisition, visualization and enhancement are classified as related to low-level operations. They discuss various modern topics – reconfigurable image system architecture, Scheimpflug camera calibration, real-time autofocusing, climate visualization, tone mapping, super-resolution and image resizing.
The technical contributions in the areas of segmentation and retrieval are classified as related to mid-level operations. They discuss some state-of-the-art techniques – non-rigid image registration, iterative image partitioning, egocentric object detection and video shot boundary detection.
The technical contributions in the areas of classification and retrieval are categorized as related to high-level operations. They discuss some state-of-the-art approaches – extreme learning machines, and target, gesture and action recognition. A non-regularized state preserving extreme learning machine is presented for natural scene classification. An algorithm for human action recognition through dynamic frame warping based on depth cues is given. Target recognition in night vision through convolutional neural network is also presented. Use of convolutional neural network in detecting static hand gesture is also discussed.
Finally, the technical contributions in the areas of surveillance, coding and data security, and biometrics and document processing are considered as applications of computer vision and image processing. They discuss some contemporary applications. A few of them are a system for tackling blind curves, a quick reaction target acquisition and tracking system, an algorithm to detect copy-move forgery based on circle blocks, a novel visual secret sharing scheme using affine cipher and image interleaving, a finger knuckle print recognition system based on wavelet and Gabor filtering, and a palmprint recognition system based on minutiae quadruplets.

Table of Contents

Frontmatter
Fingerprint Image Segmentation Using Textural Features

An Automatic Fingerprint Identification System (AFIS) uses fingerprint segmentation as its pre-processing step. A fingerprint segmentation step divides the fingerprint image into foreground and background. An AFIS that uses a feature extraction algorithm for person identification will tend to fail if it extracts spurious features from the noisy background area. So fingerprint image segmentation plays a crucial role in reliably separating the ridge-like part (foreground) from its background. In this paper, an algorithm for fingerprint image segmentation using GLCM textural features is presented. Four block-level GLCM features (Contrast, Correlation, Energy and Homogeneity) are used for fingerprint segmentation. A linear classifier is trained to classify each block of the fingerprint image. The algorithm is tested on the standard FVC2002 dataset. Experimental results show that the proposed segmentation method works well on noisy fingerprint images.

Reji C. Joy, M. Azath
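
The four block-level GLCM features named in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the pixel offset (right-hand neighbour), the number of gray levels, and the exact feature formulas (Energy as the sum of squared GLCM entries, Homogeneity with a `1 + |i - j|` denominator) are assumptions, and the function name `glcm_features` is hypothetical.

```python
import math


def glcm_features(block, levels, dx=1, dy=0):
    """Return (contrast, correlation, energy, homogeneity) for one image block.

    block  : list of rows holding integer gray levels in range(levels)
    dx, dy : offset of the co-occurring pixel (assumed: right-hand neighbour)
    """
    rows, cols = len(block), len(block[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[block[r][c]][block[r2][c2]] += 1
                total += 1
    # Normalise the co-occurrence counts into joint probabilities.
    p = [[n / total for n in row] for row in counts]

    pairs = [(i, j) for i in range(levels) for j in range(levels)]
    mu_i = sum(i * p[i][j] for i, j in pairs)
    mu_j = sum(j * p[i][j] for i, j in pairs)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p[i][j] for i, j in pairs))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p[i][j] for i, j in pairs))

    contrast = sum((i - j) ** 2 * p[i][j] for i, j in pairs)
    energy = sum(p[i][j] ** 2 for i, j in pairs)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in pairs)
    if sd_i == 0 or sd_j == 0:
        correlation = float("nan")  # undefined for a constant block
    else:
        covariance = sum((i - mu_i) * (j - mu_j) * p[i][j] for i, j in pairs)
        correlation = covariance / (sd_i * sd_j)
    return contrast, correlation, energy, homogeneity
```

A feature vector of this kind, computed per block, is what a linear classifier could take as input to label the block as foreground or background.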
Improved Feature Selection for Neighbor Embedding Super-Resolution Using Zernike Moments

This paper presents a new feature selection method for learning based single image super-resolution (SR). The performance of learning based SR strongly depends on the quality of the feature. Better features produce a better co-occurrence relationship between low-resolution (LR) and high-resolution (HR) patches, which share the same local geometry in the manifold. In this paper, Zernike moments are used for feature selection. To generate a better feature vector, the luminance norm is considered together with three Zernike moments, which preserves the global structure. Additionally, a global neighborhood selection method is used to overcome the blurring effect caused by over-fitting and under-fitting during K-nearest neighbor (KNN) search. Experimental analysis shows that the proposed scheme yields better recovery quality during HR reconstruction.

Deepasikha Mishra, Banshidhar Majhi, Pankaj Kumar Sa
Target Recognition in Infrared Imagery Using Convolutional Neural Network

In this paper, a deep learning based approach is advocated for automatic recognition of civilian targets in thermal infrared images. High variability of target signature and low contrast ratio of targets to background make the task of target recognition in infrared images challenging, demanding robust adaptable methods capable of capturing these variations. As opposed to the traditional shallow learning approaches which rely on hand engineered feature extraction, deep learning based approaches use environmental knowledge to learn and extract the features automatically. We present a convolutional neural network (CNN) based deep learning framework for automatic recognition of civilian targets in infrared images. The performance evaluation is carried out on infrared target clips obtained from the ‘CSIR-CSIO moving object thermal infrared imagery dataset’. The task involves classification into four categories: one category representing the background and three categories of targets (ambassador, auto and pedestrians). The proposed CNN framework provides classification accuracy of 88.15 % with all four categories and 98.24 % with only the three target categories.

Aparna Akula, Arshdeep Singh, Ripul Ghosh, Satish Kumar, H. K. Sardana
Selected Context Dependent Prediction for Reversible Watermarking with Optimal Embedding

This paper presents a novel prediction error expansion (PEE) based reversible watermarking scheme using the $$3\times 3$$ neighborhood of a pixel. Use of a good predictor is important in this kind of watermarking scheme. In the proposed predictor, the original pixel value is predicted based on a selected set out of the eight neighbors of a pixel. Moreover, the value of the prediction error expansion (PEE) is optimally divided between the current pixel and its top-diagonal neighbor such that distortion remains minimum. Experimental results show that the proposed predictor with optimal embedding outperforms several other existing methods.

Ravi Uyyala, Munaga V. N. K. Prasad, Rajarshi Pal
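
As background to the abstract above, the basic prediction error expansion step can be sketched as follows. This shows only textbook PEE for a single pixel; the paper's neighbor selection and its split of the expansion between the current pixel and the top-diagonal neighbor are not reproduced. The function names are hypothetical, the predicted value is assumed to be computable identically by embedder and extractor, and overflow handling is left out.

```python
def pee_embed(pixel, predicted, bit):
    """Embed one bit by expanding the prediction error e to 2*e + bit."""
    error = pixel - predicted
    return predicted + 2 * error + bit


def pee_extract(marked, predicted):
    """Recover (original_pixel, bit) from a marked pixel and the same prediction."""
    expanded = marked - predicted
    bit = expanded & 1      # least significant bit carries the payload
    error = expanded >> 1   # floor division by 2, also correct for negatives
    return predicted + error, bit
```

For example, `pee_extract(pee_embed(100, 103, 1), 103)` returns the original pixel value and the embedded bit, which is what makes the scheme reversible.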
Cancelable Biometrics Using Hadamard Transform and Friendly Random Projections

Biometrics based authentication increases the robustness and security of a system, but at the same time the biometric data of a user is subject to various security and privacy issues. Biometric data is permanently associated with a user and, unlike conventional PINs/passwords, cannot be revoked or changed in case of theft. Cancelable biometrics is a recent approach which aims to provide high security and privacy to biometric templates as well as giving them the ability to be canceled like passwords. The work proposes a novel cancelable biometric template protection algorithm based on the Hadamard transform and friendly random projections using Achlioptas matrices, followed by a one-way modulus hashing. The approach is tested on face and palmprint biometric modalities. A thorough analysis is performed to study performance, non-invertibility, and distinctiveness of the proposed approach, which reveals that the generated templates are non-invertible, easy to revoke, and also deliver good performance.

Harkeerat Kaur, Pritee Khanna
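
Two of the building blocks named in the abstract above, the Hadamard transform and an Achlioptas random projection matrix, can be sketched in a few lines. This is a generic illustration, not the paper's pipeline: the ordering of the steps, the seed acting as a user key, the omission of any 1/sqrt(rows) scaling, and all function names are assumptions, and the one-way modulus hashing stage is not shown.

```python
import math
import random


def fwht(vec):
    """Unnormalised fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for k in range(start, start + h):
                a, b = out[k], out[k + h]
                out[k], out[k + h] = a + b, a - b
        h *= 2
    return out


def achlioptas_matrix(rows, cols, seed):
    """Sparse projection matrix: sqrt(3) * {+1, 0, -1} with probabilities 1/6, 2/3, 1/6."""
    rng = random.Random(seed)  # the seed plays the role of a revocable key (assumption)
    scale = math.sqrt(3.0)
    return [[scale * rng.choices((1, 0, -1), weights=(1, 4, 1))[0]
             for _ in range(cols)] for _ in range(rows)]


def project(matrix, vec):
    """Multiply the projection matrix by a feature vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]
```

Changing the seed yields a different matrix and therefore a different projected template, which is the property a cancelable scheme relies on.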
A Semi-automated Method for Object Segmentation in Infant’s Egocentric Videos to Study Object Perception

Object segmentation in infant’s egocentric videos is a fundamental step in studying how children perceive objects in early stages of development. From the computer vision perspective, object segmentation in such videos poses quite a few challenges because the child’s view is unfocused, often with large head movements, resulting in sudden changes in the child’s point of view which lead to frequent changes in object properties such as size, shape and illumination. In this paper, we develop a semi-automated, domain specific method to address these concerns and facilitate the object annotation process for cognitive scientists, allowing them to select and monitor the object under segmentation. The method starts with an annotation of the desired object by the user and employs graph cut segmentation and optical flow computation to predict the object mask for subsequent video frames automatically. To maintain accurate segmentation of objects, we use domain specific heuristic rules to re-initialize the program with new user input whenever object properties change dramatically. The evaluations demonstrate the high speed and accuracy of the presented method for object segmentation in voluminous egocentric videos. We apply the proposed method to investigate potential patterns in object distribution in the child’s view at progressive ages.

Qazaleh Mirsharif, Sidharth Sadani, Shishir Shah, Hanako Yoshida, Joseph Burling
A Novel Visual Secret Sharing Scheme Using Affine Cipher and Image Interleaving

Recently an interesting image sharing method for gray level images using Hill Cipher and RG-method has been introduced by Chen [1]. The method does not involve pixel expansion and image recovery is lossless. However, use of Hill Cipher requires a $$2\times 2$$ integer matrix whose inverse should also be an integer matrix. Further, to extend the method for multi-secret sharing, one requires higher order integer matrices. This needs heavy computation and the choice of matrices is also very restricted, due to integer entry constraints. In the present paper we introduce an RG-based Visual Secret Sharing (VSS) scheme using image interleaving and affine cipher. The combined effect of image interleaving and affine transformation helps in improving the security of the secret images. Parameters of the affine cipher serve as keys, and the random grid and encrypted image form the shares. No one can reveal the secret unless the keys and both the shares are known. Further, as opposed to the method in [1], the present scheme does not require an invertible matrix with integer inverse. The scheme is also extended for multi-secret sharing.

Harkeerat Kaur, Aparajita Ojha
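
To make the role of the affine cipher keys in the abstract above concrete, here is a minimal sketch of an affine cipher applied to 8-bit gray values. It is a generic illustration only: the image interleaving and random grid share construction of the paper are not reproduced, and the function names and the modulus of 256 are assumptions.

```python
import math


def affine_encrypt(pixels, a, b, modulus=256):
    """Map each gray value p to (a * p + b) mod modulus; a must be coprime to modulus."""
    if math.gcd(a, modulus) != 1:
        raise ValueError("a must be coprime to the modulus")
    return [(a * p + b) % modulus for p in pixels]


def affine_decrypt(cipher, a, b, modulus=256):
    """Invert the affine map using the modular inverse of a (Python 3.8+)."""
    a_inv = pow(a, -1, modulus)
    return [(a_inv * (c - b)) % modulus for c in cipher]
```

Only a scalar `a` that is coprime to the modulus is needed here, which contrasts with the integer matrix with integer inverse required by a Hill cipher.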
Comprehensive Representation and Efficient Extraction of Spatial Information for Human Activity Recognition from Video Data

Of late, human activity recognition (HAR) in video has generated much interest. A fundamental step is to develop a computational representation of interactions. The human body is often abstracted using minimum bounding rectangles (MBRs) and approximated as a set of MBRs corresponding to different body parts. Such approximations assume each MBR to be an independent entity. This defeats the idea that these are parts of the whole body. A representation schema for interaction between entities, each of which is considered as a set of related rectangles, or what is referred to as extended objects, holds promise. We propose an efficient representation schema for extended objects together with a simple recursive algorithm to extract spatial information. We evaluate our approach and demonstrate that, for HAR, the spatial information thus extracted leads to better models compared to CORE9 [1], a compact and comprehensive representation schema for video understanding.

Shobhanjana Kalita, Arindam Karmakar, Shyamanta M. Hazarika
Robust Pose Recognition Using Deep Learning

Current pose estimation methods make unrealistic assumptions regarding the body postures. Here, we seek to propose a general scheme which does not make assumptions regarding the relative position of body parts. Practitioners of Indian classical dances such as Bharatnatyam often enact several dramatic postures called Karanas. However, several challenges such as long flowing dresses of dancers, occlusions, change of camera viewpoint, poor lighting etc. affect the performance of state-of-the-art pose estimation algorithms [1, 2] adversely. Body postures enacted by practitioners performing Yoga also violate the assumptions used in current techniques for estimating pose. In this work, we adopt an image recognition approach to tackle this problem. We propose a dataset consisting of 864 images of 12 Karanas captured under controlled laboratory conditions and 1260 real-world images of 14 Karanas obtained from Youtube videos for Bharatnatyam. We also created a new dataset consisting of 400 real-world images of 8 Yoga postures. We use two deep learning methodologies, namely, convolutional neural network (CNN) and stacked auto encoder (SAE) and demonstrate that both these techniques achieve high recognition rates on the proposed datasets.

Aparna Mohanty, Alfaz Ahmed, Trishita Goswami, Arpita Das, Pratik Vaishnavi, Rajiv Ranjan Sahay
A Robust Scheme for Extraction of Text Lines from Handwritten Documents

Considering the vast collection of handwritten documents in various archives, research studies on their automatic processing have a major impact on society. Line segmentation from images of such documents is a crucial step. The problem is more difficult for documents of major Indian scripts such as Bangla because a large number of its characters have an ascender, a descender or both, and the majority of its writers are accustomed to extremely cursive handwriting. In this article, we describe a novel strip based text line segmentation method for handwritten documents of Bangla. Moreover, the proposed method has been found to perform efficiently on English and Devanagari handwritten documents. We conducted extensive experiments, and the results show the robustness of the proposed approach on multiple scripts.

Barun Biswas, Ujjwal Bhattacharya, Bidyut B. Chaudhuri
Palmprint Recognition Based on Minutiae Quadruplets

Palmprint recognition is a variant of fingerprint matching as both the systems share almost similar matching criteria and the minutiae feature extraction methods. However, there is a performance degradation with palmprint biometrics because of the failure of extracting genuine minutia points from the region of highly distorted ridge information with huge data. In this paper, we propose an efficient palmprint matching algorithm using nearest neighbor minutiae quadruplets. The representation of minutia points in the form of quadruplets improves the matching accuracy at nearest neighbors by discarding scope of the global matching on false minutia points. The proposed algorithm is evaluated on publicly available high resolution palmprint standard databases, namely, palmprint benchmark data sets (FVC ongoing) and Tsinghua palmprint database (THUPALMLAB). The experimental results demonstrate that the proposed palmprint matching algorithm achieves the state-of-the-art performance.

A. Tirupathi Rao, N. Pattabhi Ramaiah, C. Krishna Mohan
Human Action Recognition for Depth Cameras via Dynamic Frame Warping

Human action recognition using depth cameras is an important and challenging task which can involve highly similar motions in different actions. Another factor which makes the problem difficult is the large amount of intra-class variation within the same action class. In this paper, we explore a Dynamic Frame Warping framework as an extension to the Dynamic Time Warping framework from the RGB domain, to address action recognition with depth cameras. We employ intuitively relevant skeleton joint based features from the depth stream data generated using Microsoft Kinect. We show that the proposed approach is able to generate better accuracy for cross-subject evaluation compared to state-of-the-art works, both on complex actions and on simpler actions which are similar to each other.

Kartik Gupta, Arnav Bhavsar
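
The abstract above builds on Dynamic Time Warping. As a reference point, a minimal sketch of classic DTW between two sequences is given below; it is not the paper's Dynamic Frame Warping, and the function name and the default absolute-difference cost are assumptions (for skeleton features, `cost` could be any distance between per-frame feature vectors).

```python
def dtw_distance(seq_a, seq_b, cost=lambda x, y: abs(x - y)):
    """Accumulated cost of the best monotonic alignment between two sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # acc[i][j] = cheapest alignment of the first i items of seq_a with the first j of seq_b
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = step + min(acc[i - 1][j],      # seq_a advances alone
                                   acc[i][j - 1],      # seq_b advances alone
                                   acc[i - 1][j - 1])  # both advance
    return acc[n][m]
```

Because one frame may align with several frames of the other sequence, the same action performed at different speeds can still obtain a small distance.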
Reference Based Image Encoding

This paper describes a scheme to encode an image using another reference image in such a way that an end user can retrieve the encoded image only with the reference image. Such encoding schemes could have a potential application in secure image sharing. The proposed scheme is simple and similar to fractal encoding, and a key feature is that it simultaneously performs compression and encryption. The encoding process is further sped up using PatchMatch. The performance in terms of encoding time and PSNR is examined for the different encoding methods.

S. D. Yamini Devi, Raja Santhanakumar, K. R. Ramakrishnan
Improving Face Detection in Blurred Videos for Surveillance Applications

The performance of a face detection system drops drastically when blur is present in the surveillance video. Motivated by this problem, the proposed method deblurs facial images to detect and improve faces degraded by blur in scenarios such as banks and ATMs, where a sparse crowd is present. The prevalent Viola-Jones technique detects faces, but fails in the presence of blur. Hence, to overcome this, the target frame is first decomposed using the Discrete Wavelet Transform (DWT) into LL, LH, HL and HH bands. The LL band is processed using the Lucy-Richardson algorithm, which removes blur using the Point Spread Function (PSF). Then the enhanced, de-blurred frame without ripples is given to the Viola-Jones algorithm. It has been observed and validated experimentally that the detection rate of the Viola-Jones algorithm is improved by 47 %. Experimental results illustrate the effectiveness of the proposed algorithm.

K. Menaka, B. Yogameena, C. Nagananthini
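
The decomposition into LL, LH, HL and HH bands mentioned in the abstract above can be illustrated with a one-level 2-D Haar transform. The abstract does not say which wavelet is used, so the Haar filter is an assumption, as are the function name and the sub-band naming convention (first letter for the horizontal filter, second for the vertical one); the Lucy-Richardson deblurring step is not shown.

```python
def haar_dwt2(image):
    """One-level orthonormal 2-D Haar transform of an image with even height and width.

    Returns four sub-bands (ll, lh, hl, hh), each half the size of the input.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2)  # local average (approximation)
            lh_row.append((a + b - d - e) / 2)  # difference between the two rows
            hl_row.append((a - b + d - e) / 2)  # difference between the two columns
            hh_row.append((a - b - d + e) / 2)  # diagonal detail
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh
```

The LL band is a half-resolution approximation of the frame, which is the band the abstract passes on to deblurring.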
Support Vector Machine Based Extraction of Crime Information in Human Brain Using ERP Image

Event related potential (ERP) is a non-invasive way to measure a person’s cognitive ability or any neuro-cognitive disorder. Familiarity with any stimulus can be indicated by the brain’s instantaneous response to that particular stimulus. In this research work, an ERP based eye witness identification system is proposed. The electroencephalogram (EEG) signal was passed through a Butterworth band-pass filter and segmented based on markers. The EEG segments were averaged and the ERP was extracted from the EEG signal. Grey incidence degree based wavelet denoising was performed. The ERP was converted to image form and the structural similarity index feature was extracted. A radial basis function kernel based support vector machine classifier was used to classify a person as with or without crime information. The observed accuracy of the proposed approach was 87.50 %.

Maheshkumar H. Kolekar, Deba Prasad Dash, Priti N. Patil
View Invariant Motorcycle Detection for Helmet Wear Analysis in Intelligent Traffic Surveillance

An important issue for intelligent traffic surveillance is automatic vehicle classification in traffic scene videos, which has great potential for all kinds of security applications. As the number of vehicles in operation grows, the occurrence of accidents is increasing. Hence, vehicle classification is an important building block of surveillance systems that significantly impacts the reliability of their applications. It helps in classifying the motorcycles that use public transportation. This has been identified as an important task for conducting surveys that estimate the number of people wearing helmets and of accidents with and without helmets, and for vehicle tracking. The inability of police forces in many countries to enforce helmet laws results in reduced usage of motorcycle helmets, which becomes the reason for head injuries in case of accidents. This paper presents a view-invariant system using Histogram of Oriented Gradients which automatically detects motorcycle riders and determines whether they are wearing helmets or not.

M. Ashvini, G. Revathi, B. Yogameena, S. Saravanaperumaal
Morphological Geodesic Active Contour Based Automatic Aorta Segmentation in Thoracic CT Images

Automatic aorta segmentation and quantification in thoracic computed tomography (CT) images is important for detection and prevention of aortic diseases. This paper proposes an automatic aorta segmentation algorithm in both contrast and non-contrast CT images of thorax. The proposed algorithm first detects the slice containing the carina region. Circular Hough Transform (CHT) is applied on the detected slice to localize ascending and descending aorta (circles with lowest variances) followed by a morphological geodesic active contour to segment the aorta from CT stack. The dice similarity coefficients (DSC) between the ground truth and the segmented output were found to be $$0.8845\pm 0.0584$$ on LIDC-IDRI dataset.

Avijit Dasgupta, Sudipta Mukhopadhyay, Shrikant A. Mehre, Parthasarathi Bhattacharyya
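
The Dice similarity coefficient used as the evaluation metric in the abstract above is 2|A ∩ B| / (|A| + |B|) for a segmented mask A and a ground-truth mask B. A minimal sketch for binary masks stored as nested lists follows; the function name and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity of two equally sized binary masks (nested lists of 0/1)."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same size")
    intersection = sum(x and y for x, y in zip(a, b))
    size = sum(a) + sum(b)
    # Two empty masks agree perfectly (convention chosen for this sketch).
    return 1.0 if size == 0 else 2.0 * intersection / size
```

A value of 1 means the segmentation matches the ground truth exactly, and 0 means the two masks do not overlap at all.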
Surveillance Video Synopsis While Preserving Object Motion Structure and Interaction

With the rapid growth of surveillance cameras and sensors, the need for smart video analysis and monitoring systems for browsing and storing large amounts of data is gradually increasing. Traditional video analysis methods generate a summary of day-long videos, but maintaining the motion structure and interaction between objects is of great concern to researchers. This paper presents an approach to produce video synopsis while preserving motion structure and object interactions. While condensing video, object appearance over the spatial domain is maintained by considering its weight, which preserves important activity portions and condenses data related to regular events. The approach is tested in the context of condensation ratio while maintaining the interaction between objects. Experimental results over three video sequences show a high condensation rate of up to 11 %.

Tapas Badal, Neeta Nain, Mushtaq Ahmed
Face Expression Recognition Using Histograms of Oriented Gradients with Reduced Features

Facial expression recognition has been an emerging research area in the last two decades. This paper proposes a new hybrid system for automatic facial expression recognition. The proposed method utilizes the histograms of oriented gradients (HOG) descriptor to extract features from expressive facial images. Feature reduction techniques, namely principal component analysis (PCA) and linear discriminant analysis (LDA), are applied to obtain the most important discriminant features. Finally, the discriminant features are fed to the back-propagation neural network (BPNN) classifier to determine the underlying emotions from expressive facial images. The Extended Cohn-Kanade dataset (CK+) is used to validate the proposed method. Experimental results indicate that the proposed system provides better results than state-of-the-art methods in terms of accuracy, with a substantially smaller number of features.

Nikunja Bihari Kar, Korra Sathya Babu, Sanjay Kumar Jena
Dicentric Chromosome Image Classification Using Fourier Domain Based Shape Descriptors and Support Vector Machine

Dicentric chromosomes can form in cells because of exposure to radioactivity. They differ from the regular chromosomes in that they have an extra centromere where the sister chromatids fuse. In this paper we work on chromosome classification into normal and dicentric classes. Segmentation followed by shape boundary extraction and shape based Fourier feature computation was performed. Fourier shape descriptor feature extraction was carried out to arrive at robust shape descriptors that have desirable properties of compactness and invariance to certain shape transformations. Support Vector Machine algorithm was used for the subsequent two-class image classification.

Sachin Prakash, Nabo Kumar Chaudhury
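
The invariance properties of Fourier shape descriptors mentioned in the abstract above can be shown with a short sketch. It uses one common normalisation, which is an assumption and not necessarily the paper's: boundary points are treated as complex numbers, the DC coefficient is dropped (translation), magnitudes are divided by the magnitude of the first harmonic (scale), and phases are discarded (rotation and starting point). The function name is hypothetical.

```python
import cmath


def fourier_descriptors(boundary, count):
    """Return `count` normalised Fourier descriptor magnitudes of a closed boundary.

    boundary : list of (x, y) points sampled in order along the contour
    """
    z = [complex(x, y) for x, y in boundary]
    n = len(z)
    # Plain O(n^2) discrete Fourier transform of the complex boundary signal.
    coeffs = [sum(z[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
              for u in range(n)]
    scale = abs(coeffs[1])
    if scale == 0:
        raise ValueError("first harmonic is zero; cannot normalise")
    # Skip coeffs[0] (position) and coeffs[1] (used for scale normalisation).
    return [abs(c) / scale for c in coeffs[2:2 + count]]
```

Descriptors of this kind form a compact, fixed-length feature vector that could then be passed to a two-class classifier such as an SVM.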
An Automated Ear Localization Technique Based on Modified Hausdorff Distance

Localization of the ear in side face images is a fundamental step in the development of ear recognition based biometric systems. In this paper, a well-known distance measure termed the modified Hausdorff distance (MHD) is proposed for automatic ear localization. We introduce the MHD to decrease the effect of outliers, making it more suitable for detection of the ear in side face images. The MHD uses coordinate pairs of edge pixels derived from the ear template and skin regions of the side face image to locate the ear portion. To detect ears of various shapes, the ear template is created by considering different ear structures and is resized automatically for the probe image to find the exact location of the ear. The CVL and UND-E databases, which contain side face images with different poses, inconsistent backgrounds and poor illumination, are used to analyse the effectiveness of the proposed algorithm. Experimental results reveal that the proposed technique is invariant to various poses, shapes, occlusion, and noise.

Partha Pratim Sarangi, Madhumita Panda, B. S. P Mishra, Sachidananda Dehuri
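
The modified Hausdorff distance referred to in the abstract above replaces the maximum in the classical directed Hausdorff distance with a mean over nearest-neighbour distances, which is what reduces the influence of outliers. A minimal sketch on two sets of edge-pixel coordinates follows; the function names are hypothetical, and the template matching and resizing steps of the paper are not shown.

```python
import math


def directed_distance(points_a, points_b):
    """Mean distance from each point in points_a to its nearest point in points_b."""
    return sum(min(math.dist(a, b) for b in points_b) for a in points_a) / len(points_a)


def modified_hausdorff(points_a, points_b):
    """Modified Hausdorff distance: the larger of the two directed mean distances."""
    return max(directed_distance(points_a, points_b),
               directed_distance(points_b, points_a))
```

For the sets `[(0, 0), (1, 0)]` and `[(0, 0), (1, 0), (10, 0)]`, the single outlier contributes only its averaged share (a distance of 3), whereas the classical Hausdorff distance would be 9.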
Sclera Vessel Pattern Synthesis Based on a Non-parametric Texture Synthesis Technique

This work proposes a sclera vessel texture pattern synthesis technique. Sclera texture was synthesized by a non-parametric texture regeneration technique. A small number of classes from the UBIRIS version 1 dataset was employed as primitive images. An appreciable result was achieved, which indicates the successful synthesis of sclera texture patterns. It is difficult to obtain a large collection of real sclera data, and hence such synthetic data will be useful to researchers.

Abhijit Das, Prabir Mondal, Umapada Pal, Michael Blumenstein, Miguel A. Ferrer
Virtual 3-D Walkthrough for Intelligent Emergency Response

After various cases of terrorist attacks and other emergency situations across the globe, the need for the development of virtual 3D walkthroughs of important premises is progressively on the rise (Lee, Zlatanova, A 3D data model and topological analyses for emergency response in urban areas, [1]). In contrast to the conventional 2D layout of the premises, 3D modeling adds another dimension that enables a quick and intelligent emergency response to such situations. Modern 3D modeling and game development tools have made possible the rapid development of such applications with near real-time rendering capacity. In this paper, we examine the potential of using virtual 3D walkthroughs of important installations that aim at helping security personnel and decision-makers to effectively carry out their training, strategic and operational tasks in case of emergency or otherwise.

Nikhil Saxena, Vikas Diwan
Spontaneous Versus Posed Smiles—Can We Tell the Difference?

Smile is an irrefutable expression that shows the physical state of the mind in both true and deceptive ways. Generally, it shows happy state of the mind, however, ‘smiles’ can be deceptive, for example people can give a smile when they feel happy and sometimes they might also give a smile (in a different way) when they feel pity for others. This work aims to distinguish spontaneous (felt) smile expressions from posed (deliberate) smiles by extracting and analyzing both global (macro) motion of the face and subtle (micro) changes in the facial expression features through both tracking a series of facial fiducial markers as well as using dense optical flow. Specifically the eyes and lips features are captured and used for analysis. It aims to automatically classify all smiles into either ‘spontaneous’ or ‘posed’ categories, by using support vector machines (SVM). Experimental results on large UvA-NEMO smile database show promising results as compared to other relevant methods.

Bappaditya Mandal, Nizar Ouarti
Handling Illumination Variation: A Challenge for Face Recognition

Though impressive recognition rates have been achieved with various techniques under a controlled face image capturing environment, making recognition more reliable under an uncontrolled environment is still a great challenge. Security and surveillance images, captured in open uncontrolled environments, are likely to be subjected to extreme lighting conditions, such as underexposed and overexposed areas, that reduce the amount of useful detail available in the collected face images. This paper explores two different preprocessing methods and compares the effect of enhancement on recognition results using Orthogonal Neighbourhood Preserving Projection (ONPP) and Modified ONPP (MONPP), which are subspace based methods. Note that subspace based face recognition techniques are highly sought after in recent times. Experimental results on preprocessing techniques followed by face recognition using ONPP and MONPP are presented.

Purvi A. Koringa, Suman K. Mitra, Vijayan K. Asari
Bin Picking Using Manifold Learning

Bin picking using vision based sensors requires accurate estimation of location and pose of the object for positioning the end effector of the robotic arm. The computational burden and complexity depends upon the parametric model adopted for the task. Learning based techniques to implement the scheme using low dimensional manifolds offer computationally more efficient alternatives. In this paper we have employed Locally Linear Embedding (LLE) and Deep Learning (with auto encoders) for manifold learning in the visual domain as well as for the parameters of robotic manipulator for visual servoing. Images of clusters of cylindrical pellets were used as the training data set in the visual domain. Corresponding parameters of the six degrees of freedom robot for picking designated cylindrical pellet formed the training dataset in the robotic configuration space. The correspondence between the weight coefficients of LLE manifold in the visual domain and robotic domain is established through regression. Autoencoders in conjunction with feed forward neural networks were used for learning of correspondence between the high dimensional visual space and low dimensional configuration space. We have compared the results of the two implementations for the same dataset and found that manifold learning using auto encoders resulted in better performance. The eye-in-hand configuration used with KUKA KR5 robotic arm and Basler camera offers a potentially effective and efficient solution to the bin picking problem through learning based visual servoing.

Ashutosh Kumar, Santanu Chaudhury, J. B. Srivastava
Motion Estimation from Image Sequences: A Fractional Order Total Variation Model

In this paper, a fractional order total variation model is introduced for the estimation of the motion field. In particular, the proposed model generalizes the integer order total variation models. The motion estimation is carried out in terms of optical flow. The presented model is built from a quadratic term and a total variation term. This mathematical formulation makes the model robust against outliers and preserves discontinuities. However, it is difficult to solve the presented model due to the non-differentiable nature of the total variation term. For this purpose, the Grünwald-Letnikov derivative is used as a discretization scheme to discretize the fractional order derivative. The resulting formulation is solved by using a more efficient algorithm. Experimental results on various datasets verify the validity of the proposed model.

Pushpendra Kumar, Balasubramanian Raman
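
The Grünwald-Letnikov discretization mentioned in the abstract above approximates a derivative of order α by a weighted sum of past samples, with weights w_k = (-1)^k C(α, k). A minimal 1-D sketch follows; it is a generic illustration, not the paper's optical flow model, and the function names, the unit step size and the treatment of samples before the start of the signal as zero are assumptions.

```python
def gl_weights(alpha, count):
    """First `count` Grünwald-Letnikov weights w_k = (-1)^k * binomial(alpha, k)."""
    weights = [1.0]
    for k in range(1, count):
        # Recurrence: w_k = w_{k-1} * (1 - (alpha + 1) / k)
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def gl_derivative(samples, alpha, step=1.0):
    """Order-alpha GL derivative of a uniformly sampled 1-D signal."""
    w = gl_weights(alpha, len(samples))
    return [sum(w[k] * samples[i - k] for k in range(i + 1)) / step ** alpha
            for i in range(len(samples))]
```

For α = 1 the weights reduce to (1, -1, 0, ...), the usual backward difference, and for α = 2 to (1, -2, 1, 0, ...), which shows how the fractional order scheme generalizes the integer order ones.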
Script Identification in Natural Scene Images: A Dataset and Texture-Feature Based Performance Evaluation

Recognizing text with occlusion and perspective distortion in natural scenes is a challenging problem. In this work, we present a dataset of multi-lingual scripts and a performance evaluation of script identification on this dataset using texture features. A ‘Station Signboard’ database that contains railway sign-boards written in 5 different Indic scripts is presented in this work. The images contain challenges like occlusion, perspective distortion, illumination effects, etc. We have collected a total of 500 images, and the corresponding ground truths are made in a semi-automatic way. Next, a script identification technique is proposed for multi-lingual scene text recognition. Considering the inherent problems in scene images, local texture features are used for feature extraction and an SVM classifier is employed for script identification. In the preliminary experiment, the performance of script identification is found to be 84 % using the LBP feature with an SVM classifier.

Manisha Verma, Nitakshi Sood, Partha Pratim Roy, Balasubramanian Raman
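
The LBP texture feature named in the abstract above can be sketched as follows. This is the basic 8-neighbour variant on a gray-level image stored as a list of lists; the neighbour ordering and the 256-bin histogram are assumptions for illustration, and the variant used by the authors is not stated in this listing.

```python
def lbp_code(image, row, col):
    """8-neighbour LBP code for an interior pixel of a 2-D list of gray levels."""
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the center sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin normalized histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

A histogram of this kind would then serve as the feature vector passed to a classifier such as an SVM.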
Posture Recognition in HINE Exercises

Automatic and semi-automatic methodologies based on pattern recognition and image and video processing are widely used in healthcare services. In particular, image and video guided systems have successfully replaced various medical processes, including physical examination of patients and analysis of physiological and bio-mechanical parameters. Such systems are becoming popular because of their robustness and acceptability amongst the healthcare community. In this paper, we present an efficient way of recognizing an infant’s posture in a given video sequence of Hammersmith Infant Neurological Examinations (HINE). Our proposed methodology can be considered a step forward in automating HINE tests through computer assisted tools. We have tested our methodology on a large set of HINE videos recorded at the neuro-development clinic of a hospital. The proposed methodology classifies the postures of infants with an accuracy of 78.26 %.

Abdul Fatir Ansari, Partha Pratim Roy, Debi Prosad Dogra
Multi-oriented Text Detection from Video Using Sub-pixel Mapping

In this paper, we propose a robust multi-oriented text detection approach for video images. Text detection and text segmentation in video data and images are difficult tasks due to low contrast and background noise. Our methodology uses not only the spatial information of pixels but also the optical flow of the image data to detect moving and static text. The paper provides an iterative algorithm with super resolution that reduces the information to its fundamental units, which in our case are letters and digits. The proposed method performs image enhancement and sub-pixel mapping, Jiang Hao and Gao (Applied Mechanics and Materials. 262, 2013) [1], to localize the text region, and the Stroke Width Transform (SWT) algorithm, Epshtein et al. (CVPR, 2010) [2], is used for further noise removal. Since SWT may include some non-text regions, an SVM using HOM, Khare et al. (A new Histogram Oriented Moments descriptor for multi-oriented moving text detection in video, 42(21):7627–7640, 2015) [3], as a descriptor is also used in the final text selection, and components that pass this step are treated as text regions. Low image resolution can leave text clusters; to separate such a cluster, it is super-resolved using sub-pixel mapping and passed through the process again for further segmentation, giving an overall accuracy of around 80 %. The proposed approach is tested on the ICDAR2013 dataset in terms of recall, precision and F-measure.

Anshul Mittal, Partha Pratim Roy, Balasubramanian Raman
Efficient Framework for Action Recognition Using Reduced Fisher Vector Encoding

This paper presents a novel and efficient approach to improve the performance of human action recognition from video by using an unorthodox combination of stage-level approaches. Feature descriptors obtained from dense trajectories, i.e. HOG, HOF and MBH, are known to be successful in representing videos. In this work, a Fisher Vector encoding with reduced dimensions is obtained separately for each of these descriptors, and the encodings are concatenated to form one super vector representing each video. To limit the dimension of this super vector, we include only first order statistics, computed by the Gaussian Mixture Model, in the individual Fisher Vectors. Finally, we use the elements of this super vector as inputs to a Deep Belief Network (DBN) classifier. The performance of this setup is evaluated on the KTH and Weizmann datasets. Experimental results show a significant improvement on these datasets: accuracies of 98.92 % and 100 % have been obtained on KTH and Weizmann, respectively.

Prithviraj Dhar, Jose M. Alvarez, Partha Pratim Roy
Detection Algorithm for Copy-Move Forgery Based on Circle Block

Many software tools are now available that make it easy to manipulate images and alter their originality. A technique commonly used to tamper with an image without leaving any microscopic evidence is copy-move forgery. There are many existing techniques to detect image tampering, but their computational complexity is high. Here we present a robust and effective technique to find the tampered region. Initially the given image is divided into fixed-size blocks and the DCT is applied to each block for feature extraction. A circle is used to represent each transformed block with two feature vectors. In this way we reduce the dimension of the blocks when extracting the feature vectors. A lexicographical sort is then applied to the extracted feature vectors, and a matching algorithm is applied to detect the tampered regions. Results show that our algorithm is robust and has lower computational complexity than the existing ones.

Choudhary Shyam Prakash, Sushila Maheshkar
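
The sort-and-match stage described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: quadrant means of each block stand in for the DCT and circle-based features, and the names `block_features`, `find_duplicate_blocks`, `min_shift` and `tol` are hypothetical.

```python
def block_features(image, block):
    """Yield ((row, col), feature) for every overlapping block x block window."""
    half = block // 2
    rows, cols = len(image), len(image[0])
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            feature = []
            # Stand-in feature: mean intensity of each of the four quadrants.
            for qr in (0, half):
                for qc in (0, half):
                    cells = [image[r + qr + i][c + qc + j]
                             for i in range(half) for j in range(half)]
                    feature.append(sum(cells) / len(cells))
            yield (r, c), tuple(feature)


def find_duplicate_blocks(image, block=4, min_shift=4, tol=1e-9):
    """Return pairs of block positions whose features match after sorting."""
    # Lexicographic sort brings blocks with similar features next to each other.
    entries = sorted(block_features(image, block), key=lambda e: e[1])
    matches = []
    for (pos_a, feat_a), (pos_b, feat_b) in zip(entries, entries[1:]):
        close = all(abs(a - b) <= tol for a, b in zip(feat_a, feat_b))
        # Ignore neighbouring blocks, which are similar for trivial reasons.
        shift = max(abs(pos_a[0] - pos_b[0]), abs(pos_a[1] - pos_b[1]))
        if close and shift >= min_shift:
            matches.append((pos_a, pos_b))
    return matches
```

Sorting reduces the comparison to adjacent entries, so the matching cost is dominated by the sort rather than by comparing every block against every other block.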
FPGA Implementation of GMM Algorithm for Background Subtractions in Video Sequences

Moving object detection is an important feature for video surveillance applications. Many background subtraction methods are available for object detection. Gaussian mixture modeling (GMM) is one of the best methods for background subtraction, which is the first and foremost step in video processing. The main objective is to implement the GMM algorithm on a Field-Programmable Gate Array (FPGA). In the proposed GMM algorithm, three Gaussian parameters are used, and these parameters are updated with a learning rate over the neighborhood parameters. The background pixels are classified from the updated parameters, and background subtraction is performed on consecutive frames using them. The hardware architecture for Gaussian mixture modeling has been designed. The algorithm has been run offline on the collected data set. It can process frames of up to 240 × 240 pixels.

S. Arivazhagan, K. Kiruthika
Site Suitability Evaluation for Urban Development Using Remote Sensing, GIS and Analytic Hierarchy Process (AHP)

Accurate and authentic data are a prerequisite for proper planning and management. Proper identification and mapping of urban development sites for any city requires accurate and authentic data on geomorphology, transport network, land use/land cover and ground water. Satellite remote sensing and geographic information system (GIS) techniques have proved their potential for obtaining such data in time. The importance of these techniques, coupled with the Analytic Hierarchy Process (AHP), in site suitability analysis for urban development site selection is established and accepted worldwide, as is their use in assessing the present status of environmental impact in the surroundings of an urban development site. Remote sensing, GIS, GPS and the AHP method together form a vital tool for identification, comparison and multi-criteria decision making in the planning and management of urban development sites. Given the availability of high resolution data, IKONOS, Cartosat and IRS 1C/1D LISS-III satellite data have been used to prepare various thematic layers for Lucknow city and its environs. The study describes the site suitability analysis for urban development site selection in detail. The final maps of the study area, prepared using GIS software and the AHP method, can be widely applied to compile and analyze data on site selection for proper planning and management. Local bodies and development authorities need digital data on site suitability for urban development in a GIS and AHP environment in which the data are reliable and comparable.

Anugya, Virendra Kumar, Kamal Jain
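
To illustrate how AHP turns pairwise judgements into criterion weights for a study like the one above, the following sketch uses the row geometric mean method, a common approximation to the principal eigenvector. The criteria, the pairwise values and the function names are hypothetical and are not taken from the study.

```python
import math


def ahp_weights(pairwise):
    """Priority weights from a reciprocal pairwise comparison matrix.

    Uses the row geometric mean method (an approximation to the
    principal eigenvector used in classical AHP).
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def suitability_score(weights, layer_scores):
    """Weighted sum of per-criterion scores for one map cell."""
    return sum(w * s for w, s in zip(weights, layer_scores))


# Hypothetical judgements for three thematic layers, for example
# land use vs. transport network vs. ground water.
pairwise = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
weights = ahp_weights(pairwise)
```

For a perfectly consistent matrix such as this one, the geometric mean method and the eigenvector method give the same weights; for inconsistent judgements a consistency check would normally be added.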
A Hierarchical Shot Boundary Detection Algorithm Using Global and Local Features

A video is high dimensional data that is tedious to process. Shot detection and key frame selection reduce redundant data in a video and make it presentable in a few images. Researchers have worked in this area diligently. Basic shot detection schemes provide shot boundaries in a video sequence, and key frames are selected from each shot. In video clips, shots often recur, in which case a basic shot detection scheme yields redundant key frames from the same video. In this work, we propose a hierarchical shot detection and key frame selection scheme that removes a considerable number of redundant key frames. A color histogram is used for temporal analysis and abrupt transition detection. After shot detection, spatial analysis is performed using local features, with local binary patterns used for local feature extraction. The proposed scheme is applied to three video sequences: a news video, a movie clip and a TV advertisement.

Manisha Verma, Balasubramanian Raman
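
The first, temporal stage of the scheme above (abrupt transition detection from color histograms) might look like the sketch below. Frames are assumed to be lists of `(r, g, b)` tuples with 8-bit channels; the bin count, the distance measure and the threshold are illustrative choices, not values from the paper.

```python
def color_histogram(frame, bins_per_channel=4):
    """Normalized joint RGB histogram of a frame given as (r, g, b) pixels."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in frame:
        index = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[index] += 1.0
    return [h / len(frame) for h in hist]


def detect_cuts(frames, threshold=0.5):
    """Indices of frames that start a new shot (abrupt transitions only)."""
    cuts = []
    previous = color_histogram(frames[0])
    for i in range(1, len(frames)):
        current = color_histogram(frames[i])
        # Half the L1 distance lies in [0, 1] for normalized histograms.
        distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(i)
        previous = current
    return cuts
```

Because histograms discard spatial layout, two different shots with similar colors can be missed, which is one motivation for a second, local-feature stage such as the one the abstract describes.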
Analysis of Comparators for Binary Watermarks

The comparator is one of the key components of a watermarking system and determines its performance. However, the analysis and development of comparators is a neglected objective in the field of watermarking. The core contribution of this paper is a theoretical and experimental analysis of five comparators for binary watermarks. The analysis shows that a binary watermark and its negative provide the same information. The receiver operating characteristic curve is used for the experimental analysis. It is observed that comparators based on the similarity measures symmetric normalized Hamming similarity (SNHS) and absolute mean subtracted normalized correlation coefficient (AMSNCC) have outstanding performance. Further, the range of thresholds of the SNHS based comparator that maximizes the decision accuracy of a watermarking system is found by theoretical analysis. This range is verified by experiments.

Himanshu Agarwal, Balasubramanian Raman, Pradeep K. Atrey, Mohan Kankanhalli
On Sphering the High Resolution Satellite Image Using Fixed Point Based ICA Approach

Many authors have obtained classified images by sphering satellite data and have tried to reduce the mixing effect between image classes with the help of different Independent Component Analysis (ICA) based approaches. In these cases, multispectral images are limited by small spectral variation in heterogeneous classes. For better classification, the data should exhibit high spectral variance among different classes and low spectral variance within a particular class. To address this issue, a Fixed Point (FP) based ICA method is used to obtain better classification accuracy for existing mixed classes that have similar spectral behavior. This FP-ICA method identifies objects from mixed classes having similar spectral characteristics after sphering high resolution satellite images (HRSI). It also helps to reduce the effect of similar spectral behavior between different image classes. The independent components of the non-Gaussian image data are estimated, and the performance of the approach is optimized with the help of a nonlinearity that exploits the low variance between similar spectral classes. The method is quite robust, computationally simple and has a high convergence rate, even though the spectral distributions of satellite images are hard to classify. Hence, this FP-ICA approach plays a key role in classifying image regions such as buildings, grassland, roads and vegetation.

Pankaj Pratap Singh, R. D. Garg
A Novel Fuzzy Based Satellite Image Enhancement

A new approach is presented for the enhancement of color satellite images using fuzzy logic. The hue, saturation and gray level intensity (HSV) color space is used for color satellite image enhancement. The hue and saturation components of the color satellite image are kept intact to preserve the original color information of the image. Modified sigmoid and modified Gaussian membership functions are used to enhance the gray level intensity of underexposed and overexposed satellite images. Performance measures such as luminance, entropy, average contrast and contrast enhancement function are evaluated for the proposed approach and compared with those of histogram equalization and the discrete cosine transform (DCT) method. In this comparison, the approach is found to be better than recently used approaches.

Nitin Sharma, Om Prakash Verma
Differentiating Photographic and PRCG Images Using Tampering Localization Features

A large number of sophisticated yet easily accessible computer graphics software packages (STUDIO MAX, 3D MAYA, etc.) have been developed in the recent past. The images generated with this software appear realistic and cannot be distinguished visually from natural images. As a result, distinguishing between photographic images (PIM) and photo-realistic computer generated (PRCG) images of real world objects has become an active area of research. In this paper, we propose that “a computer generated image” would have the features corresponding to a “completely tampered image”, whereas a camera generated picture would not. The differentiation is therefore done on the basis of tampering localization features, viz. block measure factors based on JPEG compression and re-sampling. It has been observed experimentally that these measure factors differ between a PIM and a PRCG image. The experimental results show that the proposed simple and robust classifier is able to differentiate between PIM and PRCG images with an accuracy of 96 %.

Roshan Sai Ayyalasomayajula, Vinod Pankajakshan
A Novel Chaos Based Robust Watermarking Framework

In this paper, a novel logo watermarking framework is proposed using a non-linear chaotic map. The essence of the proposed technique is to use the chaotic map to generate the keys used in the embedding process. Therefore, a method for generating keys is proposed first, followed by the embedding process. A robust extraction process is then proposed to verify the presence of the watermark in the possibly attacked watermarked image. Experimental results and attack analysis demonstrate the efficiency and robustness of the proposed framework.

Satendra Pal Singh, Gaurav Bhatnagar
Deep Gesture: Static Hand Gesture Recognition Using CNN

Hand gestures are an integral part of communication. In several scenarios hand gestures play a vital role because they are the only means of communication. Examples include hand signals by a traffic policeman, a news reader on TV gesturing news for the deaf, signalling at airports to guide aircraft, and playing games. There is therefore a need for robust hand pose recognition (HPR) that can be used in such applications. The existing state-of-the-art methods are challenged by clutter in the background. We propose a deep learning framework to recognise hand gestures robustly. Specifically, we propose a convolutional neural network (CNN) to identify hand postures despite variation in hand size, spatial location in the image and clutter in the background. The advantage of our method is that there is no need for hand-crafted feature extraction. Without explicitly segmenting the foreground, the proposed CNN learns to recognise the hand pose even in the presence of complex, varying background or illumination. We provide experimental results demonstrating superior performance of the proposed algorithm on state-of-the-art datasets.

Aparna Mohanty, Sai Saketh Rambhatla, Rajiv Ranjan Sahay
A Redefined Codebook Model for Dynamic Backgrounds

Dynamic background updating is one of the major challenges in moving object detection, where no fixed reference background model is available. The background model needs to be updated as moving objects join and leave the background. This paper proposes a redefined codebook model that aims to eliminate the ghost regions left behind when a non-permanent background object starts to move. The background codewords that are routinely deleted from the set of codewords in the codebook model are retained in this method, while the foreground codewords are deleted, leading to ghost elimination. The method also reduces memory requirements significantly without affecting object detection, as only the foreground codewords are deleted and not the background ones. The method has been tested for robust detection on various videos with multiple and different kinds of moving backgrounds. Compared to existing multimode modeling techniques, our algorithm eliminates the ghost regions left behind when non-permanent background objects start to move. For performance evaluation, we have used a similarity measure on video sequences with dynamic backgrounds and compared the results with three widely used background subtraction algorithms.

Vishakha Sharma, Neeta Nain, Tapas Badal
Reassigned Time Frequency Distribution Based Face Recognition

In this work, we have designed a local descriptor based on the reassigned Stankovic time frequency distribution. The Stankovic distribution is one of the improved extensions of the well known Wigner-Ville distribution. Reassigning the Stankovic distribution yields a better resolved distribution, which is therefore used to describe the region of interest more accurately. The suitability of the Stankovic distribution for describing regions of interest is studied by considering the face recognition problem. For a given face image, we obtain key points using a box filter response scale space, and scale dependent regions around these key points are represented using the reassigned Stankovic time frequency distribution. Our experiments on the ORL, UMIST and YALE-B face image datasets show the suitability of the proposed descriptor for the face recognition problem.

B. H. Shekar, D. S. Rajesh
Image Registration of Medical Images Using Ripplet Transform

For image fusion of geometrically distorted images, registration is the prerequisite step. Intensity-based image registration methods are preferred because they are more accurate than feature-based methods. However, obtaining a perfectly registered image with an intensity-based method increases the computational complexity. Registration based on a conventional transform such as the wavelet transform reduces the computational complexity, but handles discontinuities such as the curved edges in medical images poorly. In this paper, a new registration algorithm is proposed that uses the approximate-level coefficients of the ripplet transform, which allows arbitrary support and degree as compared to the curvelet transform. An entropy-based objective function is developed for registration using the ripplet coefficients of the images. Computations are carried out with 6 sets of CT and MRI brain images to validate the performance of the proposed registration technique. Quantitative measures such as standard deviation, mutual information, peak signal to noise ratio and root mean square error are used as performance measures.

Smita Pradhan, Dipti Patra, Ajay Singh
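
Three of the performance measures listed in the abstract above have standard definitions that can be sketched briefly. Images are assumed to be flattened to equal-length lists of integer gray levels, and the 8-bit peak value of 255 is an assumption.

```python
import math
from collections import Counter


def rmse(a, b):
    """Root mean square error between two equal-length pixel lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def psnr(a, b, peak=255.0):
    """Peak signal to noise ratio in dB; infinite for identical images."""
    err = rmse(a, b)
    return math.inf if err == 0 else 20.0 * math.log10(peak / err)


def mutual_information(a, b):
    """Mutual information in bits, estimated from the joint gray-level histogram."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    # p(x, y) * log2(p(x, y) / (p(x) * p(y))) with counts substituted in.
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in joint.items())
```

Mutual information is the measure most often used for multimodal pairs such as CT and MRI, because it rewards a consistent statistical relationship between intensities rather than equal intensities.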
3D Local Transform Patterns: A New Feature Descriptor for Image Retrieval

In this paper, the authors propose a novel approach for image retrieval in the transform domain using the 3D local transform pattern (3D-LTraP). Existing spatial domain techniques such as the local binary pattern (LBP), local ternary pattern (LTP), local derivative pattern (LDP) and local tetra pattern (LTrP) encode the spatial relationship between a center pixel and its neighbors in the image plane. The first attempt in 3D was made using the spherical symmetric three dimensional local ternary pattern (SS-3D-LTP). However, the performance of SS-3D-LTP depends on the proper selection of the threshold value for the ternary pattern calculation. Multiscale and color information are also missing in the SS-3D-LTP method. In the proposed method, 3D-LTraP, the first problem is overcome by using a binary approach. Similarly, the other gaps are addressed by using the wavelet transform, which provides directional as well as multiscale information, and color features are embedded in the feature generation process itself. Two different databases, one natural and one biomedical (the Corel 10K and OASIS databases), are used for the experiments. The experimental results demonstrate a noteworthy improvement in precision and recall compared to SS-3D-LTP and recent methods.

Anil Balaji Gonde, Subrahmanyam Murala, Santosh Kumar Vipparthi, Rudraprakash Maheshwari, R. Balasubramanian
Quaternion Circularly Semi-orthogonal Moments for Invariant Image Recognition

We propose new Quaternion Circularly Semi-Orthogonal Moments for color images that are invariant to rotation, translation and scale changes. To derive these moments, we employ the recently proposed expression for Circularly Semi-Orthogonal Moments. The invariance properties are verified with simulation results, which are found to match the theoretical proof.

P. Ananth Raj
Study of Zone-Based Feature for Online Handwritten Signature Recognition and Verification in Devanagari Script

This paper presents a zone-based feature extraction approach for online handwritten signature recognition and verification in Devanagari, one of the major Indic scripts. To the best of our knowledge, no work is available on signature recognition and verification in Indic scripts. Here, the entire online image is divided into a number of local zones. In this approach, named Zone wise Slopes of Dominant Points (ZSDP), the dominant points are first detected in each stroke, the slope angles between consecutive dominant points are then calculated, and features are extracted in these local zones. Next, these features are supplied to two different classifiers, a Hidden Markov Model (HMM) and a Support Vector Machine (SVM), for recognition and verification of signatures. An exhaustive experiment on a large dataset is performed using this zone-based feature on original and forged signatures in Devanagari script, and encouraging results are found.

Rajib Ghosh, Partha Pratim Roy
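
The slope-angle step of ZSDP described above can be sketched as follows, taking the dominant points as already detected. The grid layout, the rule of assigning each segment to the zone of its starting point, and all function names are assumptions for illustration; the abstract does not specify how features are aggregated per zone.

```python
import math


def slope_angles(points):
    """Angle in degrees of each segment between consecutive dominant points."""
    return [math.degrees(math.atan2(y1 - y0, x1 - x0))
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def zone_of(point, width, height, rows, cols):
    """Index of the grid zone containing a point (row-major order)."""
    x, y = point
    col = min(int(x * cols / width), cols - 1)
    row = min(int(y * rows / height), rows - 1)
    return row * cols + col


def zone_slopes(points, width, height, rows=2, cols=2):
    """Group segment angles by the zone of each segment's starting point."""
    zones = {}
    for start, angle in zip(points, slope_angles(points)):
        zones.setdefault(zone_of(start, width, height, rows, cols), []).append(angle)
    return zones
```

A fixed-length feature vector could then be built by summarizing the angles in each zone, for example with a mean or a small angle histogram.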
Leaf Identification Using Shape and Texture Features

Identifying plant species based on a leaf image is a challenging task. This paper presents a leaf recognition system using orthogonal moments as shape descriptors and Histogram of Oriented Gradients (HOG) and Gabor features as texture descriptors. The shape descriptors capture the global shape of the leaf image. The internal vein structure is captured by the texture features. The binarized leaf image is pre-processed to make it scale, rotation and translation invariant. The Krawtchouk moments are computed from the scale and rotation normalized shape image. The HOG feature is computed on the rotation normalized gray image. The combined shape and texture features are classified with a support vector machine (SVM) classifier.

Thallapally Pradeep Kumar, M. Veera Prasad Reddy, Prabin Kumar Bora
Depth Image Super-Resolution: A Review and Wavelet Perspective

We propose an algorithm that uses the Discrete Wavelet Transform (DWT) to super-resolve a low-resolution (LR) depth image to a high-resolution (HR) depth image. Commercially available depth cameras capture depth images at a very low resolution compared to optical cameras. A high-resolution depth camera is expensive because of the manufacturing cost of the depth sensor element. In many applications where depth images are used, such as robot navigation, human-machine interaction (HMI), surveillance and 3D viewing, the LR images from depth cameras are a limiting factor, so there is a need for a method that produces HR depth images from the available LR depth images. This paper addresses this issue using a DWT method. It also compiles the existing methods for depth image super-resolution with their advantages and disadvantages. The Haar basis is used for the DWT because it has an intrinsic relationship with super-resolution (SR) for retaining edges. The proposed method has been tested on the Middlebury and Tsukuba datasets and compared with conventional interpolation methods using the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) performance metrics.

Chandra Shaker Balure, M. Ramesh Kini
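
As background for the Haar-based method above, here is a minimal sketch of a single-level orthonormal Haar DWT on a 1-D signal of even length, together with its inverse. It only illustrates the transform pair; the way the paper estimates the missing detail coefficients is not described in this listing, and the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward; output has twice as many samples as approx."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return signal
```

One simple baseline, stated here as an assumption rather than the paper's method, treats the LR row as the approximation band (scaled by √2), sets the detail band to zero and applies `haar_inverse`, which doubles the resolution by repeating each sample; better detail estimates are what sharpen the edges.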
On-line Gesture Based User Authentication System Robust to Shoulder Surfing

People often keep a lot of confidential information on electronic devices such as laptops, desktops and tablets. Access to these personalized devices is managed through well known and robust user authentication techniques. The design of authentication methodologies using various input modalities has therefore received much attention from researchers in this domain. Since we access such personalized devices everywhere, including crowded places such as offices, public places and meeting halls, the risk of an imposter obtaining one’s identification information is high. One of the oldest but still effective forms of identity theft by observation is known as shoulder surfing. Patterns drawn by the authentic user on tablet surfaces or keys typed on a keyboard can easily be recognized through shoulder surfing. Contact-less user interface devices such as the Leap Motion controller can be used to mitigate some of the limitations of existing contact-based input methodologies. In this paper, we propose a robust user authentication technique designed to reduce the chances of one’s identity being stolen by shoulder surfing. Our results show that the proposed methodology can be quite effective for designing robust user authentication systems, especially for personalized electronic devices.

Suman Bhoi, Debi Prosad Dogra, Partha Pratim Roy
Backmatter
Metadata
Title
Proceedings of International Conference on Computer Vision and Image Processing
Editors
Balasubramanian Raman
Sanjeev Kumar
Partha Pratim Roy
Debashis Sen
Copyright Year
2017
Publisher
Springer Singapore
Electronic ISBN
978-981-10-2107-7
Print ISBN
978-981-10-2106-0
DOI
https://doi.org/10.1007/978-981-10-2107-7