
2017 | Book

Computer Vision, Graphics, and Image Processing

ICVGIP 2016 Satellite Workshops, WCVA, DAR, and MedImage, Guwahati, India, December 19, 2016, Revised Selected Papers

Editors: Snehasis Mukherjee, Suvadip Mukherjee, Dipti Prasad Mukherjee, Prof. Jayanthi Sivaswamy, Suyash Awate, Srirangaraj Setlur, Anoop M. Namboodiri, Prof. Dr. Santanu Chaudhury

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the ICVGIP 2016 Satellite Workshops, WCVA, DAR, and MedImage, held in Guwahati, India, in December 2016. The papers presented are extended versions of papers from three of the four workshops: Computer Vision Applications, Document Analysis and Recognition, and Medical Image Processing. The Computer Vision Applications track received 52 submissions, and after a rigorous review process 18 papers were presented; its focus is mainly on industrial applications of computer vision and related technologies. The Document Analysis and Recognition track received 10 submissions, from which 7 papers were selected. The MedImage workshop focuses on problems in medical image computing; it received 14 papers, from which 9 were accepted for presentation in this book.

Table of Contents

Frontmatter

WCVA

Frontmatter
A Novel Intelligent Multiple Watermarking Schemes for the Protection of the Information Content of a Document Image

Most past document image watermarking schemes focus on providing the same level of integrity and copyright protection for all information present in the source document image. However, the information content of a document image possesses various levels of sensitivity; each level of sensitivity needs a different type of protection, and this demands multiple watermarking techniques. In this paper, novel intelligent multiple watermarking techniques are proposed. The sensitivity of the information content of a block is determined from homogeneity and relative energy contribution parameters, and an appropriate watermarking scheme is applied based on the sensitivity classification of the block. Exhaustive experiments are conducted on documents. Experimental results reveal accurate identification of the sensitivity of the information content in each block, and show that multiple watermarking schemes reduce the amount of data to be embedded, consequently improving the perceptual quality of the watermarked image.

K. R. Chetan, S. Nirmala
Experimental Evaluation of 3D Kinect Face Database

3D face recognition has gained paramount importance over 2D due to its potential to address the limitations of 2D face recognition against variations in facial pose, angle, occlusion, etc. Research in 3D face recognition has accelerated in recent years due to the development of the low-cost 3D Kinect camera sensor, which has led to the development of a few RGB-D databases across the world. In this paper we introduce baseline results for our 3D facial database (GU-RGBD database), captured using Kinect and comprising variations in pose (0°, 45°, 90°, −45°, −90°), expression (smile, eyes closed), occlusion (half the face covered with paper) and illumination. We propose a non-linear interpolation filter for removing the noisy patches present in the depth images. Results were obtained with three face recognition algorithms, and with fusion at the matching score level, for recognition and verification rates. The obtained results indicate that our proposed filter improves performance across poses with score-level fusion using the sum rule.

A. A. Gaonkar, M. D. Gad, N. T. Vetrekar, Vithal Shet Tilve, R. S. Gad
Photometric Normalization Techniques for Extended Multi-spectral Face Recognition: A Comparative Analysis

Biometric authentication based on face recognition has acquired enormous attention due to the non-intrusive nature of its image capture. Recently, with advances in sensor technology, face recognition based on multi-spectral imaging has gained popularity due to its potential for capturing discrete spatio-spectral images across the electromagnetic spectrum. Our contribution here is an extensive empirical comparison of 22 photometric illumination normalization techniques for robust multi-spectral face recognition. For this study, we developed a multi-spectral imaging sensor that captures facial images across nine spectral bands in the wavelength range from 530 nm to 1000 nm. With the developed sensor we captured a multi-spectral facial database of 231 individuals, which will be made publicly available to the research community. Further, a quantitative performance analysis, in the form of the identification rate at rank 1, was conducted on the 22 photometric normalization techniques using four state-of-the-art face recognition algorithms. The analysis indicates outstanding results with almost all of the photometric normalization techniques for six spectral bands: 650 nm, 710 nm, 770 nm, 830 nm, 890 nm, and 950 nm.

N. T. Vetrekar, R. Raghavendra, R. S. Gad, G. M. Naik
Dictionary Based Approach for Facial Expression Recognition from Static Images

We present a simple approach for facial expression recognition from images using the principle of sparse representation over a learned dictionary. Visual appearance based feature descriptors such as histograms of oriented gradients (HOG), local binary patterns (LBP) and eigenfaces are used. We use a Fisher discrimination dictionary, which has discrimination capability in addition to being reconstructive. The classification is based on the fact that each expression class within the dictionary spans a subspace, and these subspaces have non-overlapping directions so that they are widely separated. Each test feature point has a sparse representation in the union of subspaces of the dictionary formed by labeled training points. To check the recognition performance of the proposed approach, extensive experiments are carried out on the JAFFE and CK databases. Results show that the proposed approach has better classification accuracy than state-of-the-art techniques.

Krishan Sharma, Renu Rameshan
Vision Based Pose Estimation of Multiple Peg-in-Hole for Robotic Assembly

Vision sensors are used to estimate the pose (position and orientation) of mating components in a vision-assisted robotic peg-in-hole assembly, a crucial step in aligning the mating hole component with the corresponding moving peg component. The accuracy of this estimation, with an appropriate mapping between the image and the task environment using a fixed overhead camera or a camera on the robot arm, decides the performance of the peg-in-hole robotic assembly. The wheel and hub assembly in automobiles has multiple holes and pegs in its mating parts, which leads to a more complex pose estimation procedure. The success rate of the assembly process (without jamming) is affected by inaccurate pose estimation, which leads to lateral and/or axial misalignment between the mating components during the insertion phase. On this consideration, this work proposes a pose estimation algorithm for a multiple peg-in-hole assembly using a genetic algorithm based two-stage camera calibration procedure. The proposed algorithm has been tested for its performance in estimating the pose of the multiple pegs in the wheel hub of a car. The results reveal that the proposed method estimates the pose of the pegs accurately with minimal re-projection error.

Pitchandi Nagarajan, S. Saravana Perumaal, B. Yogameena
A Spatio Temporal Texture Saliency Approach for Object Detection in Videos

Detecting what attracts human attention is one of the vital tasks in visual processing. Saliency detection finds the foci of attention on a salient object in image/video sequences. Temporal information in videos plays a major role in human visual perception when locating salient objects. This paper presents a novel approach to detect salient objects in a video using spatio-temporal textural saliency, which incorporates temporal information, an important aspect of videos. In this work, context-driven static saliency extracted from the Lab color space in the XY plane is combined with dynamic saliency driven by local phase quantization on three orthogonal planes (LPQ-TOP) to detect spatio-temporal saliency in videos. The dynamic saliency is obtained by fusing two temporal saliencies extracted from the XT and YT planes using the LPQ texture feature, which extracts the temporally salient region. The approach is evaluated on a benchmark dataset, and the results show that the proposed saliency approach yields promising performance.

A. Sasithradevi, S. Mohamed Mansoor Roomi, I. Sanofer
Simultaneous Reconstruction of Multiple Hand Shredded Content-Less Pages Using Graph-Based Global Reassembly

Reassembly of hand-shredded content-less pages is a challenging task, with applications in forensics and puzzle games. The process is even more tedious when the number of pages from which the fragments are obtained is unknown. An iterative framework to solve the jigsaw puzzles of multiple hand-shredded content-less pages is proposed in this paper. The framework uses shape-based information alone to solve the puzzle. All pairs of fragments are matched using normalized shape-based features; incorrect matches between fragments are then pruned using three scores that measure the goodness of the alignment. Finally, a graph-based technique is used to densely arrange the fragments for the global reassembly of the page(s). Experimental evaluation of the proposed framework on an annotated dataset of shredded documents shows its efficiency in reconstructing multiple content-less pages from arbitrarily torn fragments, and performance metrics are proposed to evaluate the reassembly numerically.

K. S. Lalitha, Sukhendu Das, Arun Menon, Koshy Varghese
Super Resolution Mapping of Trees for Urban Forest Monitoring in Madurai City Using Remote Sensing

This paper proposes a pixel-swapping method for super resolution mapping of trees in Madurai city. Identifying and mapping vegetation, specifically trees, is a significant issue in remote sensing applications, where the lack of height information makes monocular recognition hard. The density and shape of trees are affected by other man-made objects, which gives rise to erroneous recognition; the quality of recognition may also be affected by factors such as resolution, visibility, size and scale, and predicting trees when they are partially blocked from view is a further challenge. A common problem in the application of satellite images is the frequent occurrence of mixed pixels. The motivation of this work is to extract trees using the pixel-swapping algorithm, a simple and efficient technique for super resolution mapping that changes the spatial arrangement of sub-pixels so as to maximize the spatial correlation between neighboring sub-pixels. Soft classification techniques are introduced to avoid loss of information, by assigning a pixel to multiple land-use/land-cover classes according to the area each class represents within the pixel; this generates a number of fractional images equal to the number of classes. Super resolution mapping is then used to determine where each class is located within the pixel, in order to obtain detailed spatial patterns. The aim of super resolution mapping is to determine a fine-resolution map of the trees from the soft classification result. The experiment is conducted on images of Madurai city obtained from the WorldView-2 satellite. The accuracy of the pixel-swapping algorithm was 98.74%.

D. Synthiya Vinothini, B. Sathyabama, S. Karthikeyan
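The pixel-swapping step described in the abstract above can be illustrated with a toy sketch (a minimal pure-Python illustration of the principle, not the authors' implementation; the grid size, neighbourhood weights and stopping rule are our own assumptions): each sub-pixel's attractiveness is the distance-weighted average of its neighbours, and the least attractive "tree" sub-pixel is repeatedly swapped with the most attractive "non-tree" sub-pixel, increasing the spatial correlation between neighbouring sub-pixels.

```python
import random

def attractiveness(grid, r, c):
    """Distance-weighted average of the neighbouring sub-pixel values."""
    n, total, weight = len(grid), 0.0, 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n:
                w = 1.0 / (dr * dr + dc * dc) ** 0.5
                total += w * grid[rr][cc]
                weight += w
    return total / weight

def pixel_swap(grid, iterations=50):
    """Swap the least attractive '1' with the most attractive '0'
    until no swap can further increase the spatial correlation."""
    n = len(grid)
    for _ in range(iterations):
        ones = [(attractiveness(grid, r, c), r, c)
                for r in range(n) for c in range(n) if grid[r][c] == 1]
        zeros = [(attractiveness(grid, r, c), r, c)
                 for r in range(n) for c in range(n) if grid[r][c] == 0]
        worst_one, best_zero = min(ones), max(zeros)
        if best_zero[0] <= worst_one[0]:
            break  # no improving swap remains
        grid[worst_one[1]][worst_one[2]] = 0
        grid[best_zero[1]][best_zero[2]] = 1
    return grid

# a 4x4 sub-pixel grid with a 50% tree fraction, scattered at random
random.seed(0)
cells = [1] * 8 + [0] * 8
random.shuffle(cells)
grid = pixel_swap([cells[i * 4:(i + 1) * 4] for i in range(4)])
```

Because each swap is one-for-one, the tree fraction assigned to the coarse pixel by the soft classification is preserved exactly.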
Scale-Invariant Image Inpainting Using Gradient-Based Image Composition

In this paper, we propose a novel scale-invariant image inpainting algorithm that combines several inpainted images obtained from multiple pyramids with different coarsest scales. To achieve this, we first build the pyramids and then run an image inpainting algorithm individually on each pyramid to obtain different inpainted images. Finally, we combine those inpainted images by a gradient-based approach to obtain the final inpainted image. The motivation of this approach is to address the artifacts that appear in the traditional single-pyramid approach, whose results depend on the starting scale of the pyramid. We assume that most of the inpainted images produced by the pyramids are of good quality; some may have artifacts, and these artifacts are eliminated by the gradient-based image composition. We test the proposed algorithm on a large number of natural images and compare the results with some existing methods to demonstrate the efficacy and superiority of the proposed method.

Mrinmoy Ghorai, Soumitra Samanta, Bhabatosh Chanda
Recursive Structure from Motion

In this paper we present a technique that estimates Structure from Motion (SFM) in a recursive fashion. Traditionally, successful SFM algorithms take a set of images, estimate the scene geometry and camera positions using either incremental or global algorithms, and then apply a refinement process [2] to reduce the reprojection error. In this work, unlike most traditional approaches in the literature, we assume that the complete image set is not available at the start of the reconstruction process. Instead, sets of images are assumed to arrive at regular intervals; we recursively perform SFM on each incoming set and update the previously reconstructed structure with the structure estimated from the current set. The proposed system has been tested on two datasets, consisting of 12 and 60 images respectively, and the reconstructions obtained show the validity of the proposed technique.

M. Chebiyyam, S. Chaudhury, I. N. Kar
A Hybrid Deep Architecture for Face Recognition in Real-Life Scenario

This article describes our recent study of a real-life face recognition problem using a hybrid architecture consisting of a very deep convolutional neural network (CNN) and a support vector machine (SVM). The novel aspects of this study include (i) implementation of a deep CNN architecture consisting of 11 layers to study the effect of increasing depth on the recognition performance of a subsequent SVM, and (ii) verification of the recognition performance of this hybrid classifier, trained on samples of a standard size, on test face images of smaller sizes reminiscent of various real-life scenarios. Results of the present study show that features computed at various shallow levels of the deep architecture have identical or at least comparable performance, and are more robust than the deepest feature computed at the innermost sub-sampling layer. We have also studied a simple strategy for recognizing face images of very small sizes using this hybrid architecture trained on standard-size face images, and the recognition performance is reported. Simulation results on cropped images of the standard Extended Yale Face Database show an interesting characteristic of the proposed architecture with respect to face images captured under very low-intensity lighting conditions.

A. Sanyal, U. Bhattacharya, S. K. Parui
Brain Tumor Segmentation from Multimodal MR Images Using Rough Sets

Automatic segmentation of brain tumors from Magnetic Resonance images is a challenging task due to the wide variation in the intensity, size, and location of tumors. Defining a precise boundary for a tumor is essential for the diagnosis and treatment of patients. Rough set theory, an extension of classical set theory, deals with the vagueness of data by determining the boundary region of a set. The aim of this work is to explore the possibility and effectiveness of using a rough set model to represent tumor regions in MR images accurately, with Quadtree partitioning and simple K-means as precursors to indicate and limit the possible relevant regions. The advantage of using rough sets lies in their ability to represent the impreciseness of set boundaries, which is one of the major challenges in tumor segmentation. Experiments are carried out on the BRATS 2013 and 2015 databases, and the results are comparable to those reported by recent works.

Rupsa Saha, Ashish Phophalia, Suman K. Mitra
A Text Recognition Augmented Deep Learning Approach for Logo Identification

Logo/brand name detection and recognition in unstructured and highly unpredictable natural images has always been a challenging problem. We notice that in most natural images logos are accompanied by associated text. Therefore, we address the problem of logo recognition by first detecting and isolating text of varying color, font size and orientation in the input image using affine-invariant maximally stable extremal regions (MSERs). Using an off-the-shelf OCR, we identify the text associated with the logo image. An effective grouping technique is then employed to combine the remaining stable regions based on the spatial proximity of the MSERs. Deep learning has the advantage that optimal features can be learned automatically from image pixel data; this motivates us to feed the clustered logo candidate image regions to a pre-trained deep convolutional neural network (DCNN) to generate a set of complex features, which are in turn input to a multiclass support vector machine (SVM) for classification. We tested our proposed logo recognition system on 32 logo classes and a non-logo class, obtained by combining the FlickrLogos-32 and MICC logo databases, amounting to a total of 23582 training and testing images. Our method yields robust recognition performance, outperforming state-of-the-art techniques with 97.8% precision, 95.7% recall and 95.7% average accuracy on the combined MICC and FlickrLogos-32 datasets, and a precision of 98.6%, recall of 97.9% and average accuracy of 99.6% on the FlickrLogos-32 dataset alone.

Moushumi Medhi, Shubham Sinha, Rajiv Ranjan Sahay
High Frame Rate Real-Time Scene Change Detection System

Scene change detection, one of the fundamental and most important problems of computer vision, plays a very important role in realizing a complete industrial vision system as well as an automated video surveillance system, enabling automatic scene analysis, monitoring, and the generation of alerts based on relevant changes in a video stream. In addition to being accurate and robust, a successful scene change detection system must therefore operate at a very high frame rate, in order to detect scene changes that occur within the blink of an eye and often go unnoticed by conventional frame-rate cameras. Keeping high frame rate processing as the main focus, a very high frame rate real-time scene change detection system is developed by leveraging VLSI design to achieve high performance. This is accomplished by proposing, designing, and implementing an area-efficient scene change detection VLSI architecture on the FPGA-based IDP Express platform. The developed prototype of the complete real-time scene change detection system is capable of processing 2000 frames per second at 512 × 512 video resolution and has been tested on live incoming video streams from a high-speed camera. The implemented system architecture is adaptable and scalable to different video resolutions and frame rates.

Sanjay Singh, Ravi Saini, Sumeet Saurav, Pramod Tanwar, Kota S. Raju, Anil K. Saini, Santanu Chaudhury, Idaku Ishii
Painting Classification Using a Pre-trained Convolutional Neural Network

The problem of classifying images into different predefined categories is an important high-level vision problem. In recent years, convolutional neural networks (CNNs) have been the most popular tool for image classification tasks. CNNs are multi-layered neural networks that can handle complex classification tasks if trained properly. However, training a CNN requires a huge number of labeled images, which are not always available for all problem domains, and a CNN pre-trained on a different image dataset may not be effective for classification across domains. In this paper, we explore the use of a pre-trained CNN not as a classification tool but as a feature extraction tool for painting classification. We run an extensive array of experiments to identify the layers that work best for the problems of artist and style classification, and also discuss several novel representation and classification techniques using these features.

Sugata Banerji, Atreyee Sinha
Eigen Domain Transformation for Soft-Margin Multiple Feature-Kernel Learning for Surveillance Face Recognition

Face Recognition (FR) is the most accepted method of biometric authentication due to its inherently passive nature, which has attracted many researchers over the past few decades and yielded moderately high accuracy under controlled environments. Achieving such accuracy for FR under surveillance scenarios, however, has proved to be a major hurdle in this area of research, mainly due to differences in resolution, contrast, illumination and camera parameters between the training and testing samples. In this paper, we propose a novel technique that finds the optimal feature-kernel combination by SML_MFKC (Soft-margin Learning for Multi-Feature-Kernel Combination) to solve the problem of FR in surveillance, followed by an Eigen Domain Transformation (EDT) to bridge the gap between the distributions of the gallery and probe samples. Rigorous experiments have been performed on three real-world surveillance face datasets: FR_SURV [24], SCface [17] and ChokePoint [35]. Results are reported using Rank-1 recognition rates, ROC and CMC measures. Our proposed method outperforms all other recent state-of-the-art techniques by a considerable margin. Experiments also show that recent state-of-the-art Deep Learning techniques fail to perform appreciably compared to our proposed method on the aforementioned datasets.

Samik Banerjee, Sukhendu Das
A Beta Distribution Based Novel Scheme for Detection of Changes in Crowd Motion

An automated system for crowd behaviour analysis has gained significance in the context of surveillance and public management. Detecting changes in crowd behaviour demarcates one activity or event from another; change detection is thus a fundamental step that enables the subsequent characterisation of activities and analysis of the transition from one state to another. The proposed work deals with high density crowds, for which global motion is an important cue for studying behaviour. In this work, crowd motion is modelled using a beta distribution, and a change in the distribution parameters is an indicator of a change in the crowd motion pattern. The proposed methodology has been tested on a number of synthetic and natural video sequences, and its performance is satisfactory.

Soumyajit Pal, Sounak Mondal, Sanjoy Kumar Saha, Bhabatosh Chanda
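The modelling step in the abstract above can be sketched as a toy example (our own illustration, not the authors' method: here the beta parameters are fitted by the method of moments on windows of normalised motion magnitudes, and a change is flagged when an estimate shifts sharply between windows):

```python
from statistics import mean, variance

def beta_mom(samples):
    """Method-of-moments estimates of the beta parameters (alpha, beta)
    for samples assumed to lie in the open interval (0, 1)."""
    m, v = mean(samples), variance(samples)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

def detect_change(windows, rel_threshold=0.5):
    """Flag window i when alpha or beta moves by more than rel_threshold
    relative to the estimate from the previous window."""
    changes = []
    prev = beta_mom(windows[0])
    for i, w in enumerate(windows[1:], start=1):
        cur = beta_mom(w)
        if any(abs(c - p) / p > rel_threshold for c, p in zip(cur, prev)):
            changes.append(i)
        prev = cur
    return changes

# two windows of slow motion (magnitudes near 0.2) followed by a fast one
slow = [0.18, 0.22, 0.20, 0.25, 0.15, 0.21, 0.19, 0.23]
fast = [m + 0.60 for m in slow]
changes = detect_change([slow, slow, fast])  # only the fast window is flagged
```

The shift in the mean of the magnitudes moves the (alpha, beta) estimates sharply, which is the parameter-change cue the abstract describes.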
Reconstruction of Sparse-View Tomography via Banded Matrices

Computed Tomography (CT) is one of the significant research areas in medical image analysis. One of the main aspects of CT on which researchers remain focused is reducing the dosage, as X-rays are generally harmful to human bodies. In order to reduce the radiation dosage, compressed sensing (CS) based methodologies appear promising. The basic premise is that medical images have inherent sparsity in some transform domain; as a result, CS offers the possibility of recovering a high-quality image from fewer projection data. In general, the sensing matrix in CT is generated from Radon projections by appropriately sampling the radial and angular parameters. In our work, by restricting the number of such parameters, we generate an under-determined linear system involving projection (Radon) data and a sparse sensing matrix, thereby bringing the problem into the CS framework. Among various recent solvers, the Split-Bregman iterative scheme has of late become popular due to its suitability for a wide variety of optimization problems. Intending to exploit the underlying structure of the sensing matrix, the present work analyzes its properties and finds a banded structure in an associated intermediate matrix. Using this observation, we simplify the Split-Bregman solver, thereby proposing a CT-specific solver of low complexity. We also demonstrate the efficacy of the proposed method empirically.

T. Prasad, P. U. Praveen Kumar, C. S. Sastry, P. V. Jampana
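Two ingredients of such a low-complexity solver can be sketched in a few lines (a toy pure-Python illustration under our own assumptions, not the paper's actual solver): the shrink (soft-thresholding) operator that solves the L1 subproblem in each Split-Bregman sweep, and a banded solve, here the tridiagonal Thomas algorithm, which runs in O(n) rather than the O(n³) of a dense solve; this is the kind of saving a banded intermediate matrix makes possible.

```python
def shrink(x, lam):
    """Soft-thresholding: the closed-form minimiser of
    lam*|d| + 0.5*(d - x)^2, applied element-wise in each
    Split-Bregman sweep to solve the L1 subproblem."""
    return [max(abs(v) - lam, 0.0) * (1 if v > 0 else -1) for v in x]

def thomas(a, b, c, d):
    """Thomas algorithm: solve a tridiagonal system in O(n) operations.
    b is the main diagonal, a the sub-diagonal (a[0] unused),
    c the super-diagonal (c[-1] unused), d the right-hand side."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# solve [[2,1,0],[1,2,1],[0,1,2]] x = [3,4,3]; the exact solution is [1,1,1]
x = thomas([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [3.0, 4.0, 3.0])
```

The forward-elimination/back-substitution pattern generalizes to any fixed bandwidth, which is why a banded intermediate matrix keeps the per-iteration cost low.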

DAR

Frontmatter
SPODS: A Dataset of Color-Official Documents and Detection of Logo, Stamp, and Signature

Office automation, which involves the archival and retrieval of official documents, is an active area of research. Developing a system for this purpose requires an extensive benchmark dataset consisting of various types of official documents. However, it is hard to make real-world official documents available, as they are mostly confidential, and in the absence of such benchmark datasets it is difficult to evaluate newly developed algorithms. Hence, efforts have been made to build a dataset of different categories of documents that resemble real-world official documents. In this work, we present a dataset called the scanned pseudo-official dataset (SPODS), which we created and have made available online. Official documents are usually distinguished by the presence of a logo, stamp, signature, date, etc. The paper also presents a new approach for the detection of logos, stamps, and signatures using spectral filtering and part-based features. A comparative study of the performance of the proposed method and existing algorithms on the SPODS dataset demonstrates the effectiveness of the proposed technique.

Amit Vijay Nandedkar, Jayanta Mukherjee, Shamik Sural
Text and Non-text Separation in Scanned Color-Official Documents

Official documents consist of text and non-textual elements such as logos, stamps, and signatures. Separating these elements in a scanned document plays a significant role in document image retrieval, recognition, and verification. This paper presents a novel scheme to separate the text and non-text elements of official documents using part-based features. In this work, we exploit the fact that the intensity distributions of text and non-text elements in HSV color space are of a distinctive nature, and propose a new approach to compute part-based features using the S and V channels. The classification of text and non-text components is performed with a majority voting scheme and K-approximate nearest neighbors, with the knowledge base acquired during training indexed using a kD-tree. The method is subsequently extended to the detection of logos, stamps, and signatures. Experimental results show the effectiveness of the proposed approach.

Amit Vijay Nandedkar, Jayanta Mukherjee, Shamik Sural
Multi-font Telugu Text Recognition Using Hidden Markov Models and Akshara Bi-grams

Recent advances in information technology have made it possible to introduce many Unicode Telugu fonts for the documentation needs of present-day society. However, the recognition of documents printed in a variety of fonts poses new challenges in building Telugu OCR systems. In this paper, we demonstrate multi-font Telugu printed word recognition using an implicit segmentation approach that provides segmentation as a by-product of recognition. Our word recognition approach relies on Hidden Markov Models and an akshara bi-gram language model to recognize word images in terms of aksharas (characters). The training set of word images is prepared from document images of popular books and from synthetic document images generated using 8 different Unicode fonts. Testing involves matching the feature vector sequence against sequences of akshara HMMs based on the bi-grams. The CER and WER of this system are 21% and 37% respectively. The performance of our system is very encouraging.

Koteswara Rao Devarapalli, Atul Negi
Anveshak - A Groundtruth Generation Tool for Foreground Regions of Document Images

We propose a graphical user interface based groundtruth generation tool in this paper. Annotation of an input document image is done based on the foreground pixels, which are grouped together with user interaction to form labeling units. These units are then labeled by the user with user-defined labels. The output produced by the tool is an image with an XML file containing its metadata. This annotated data can be further used in different applications of document image analysis.

Soumyadeep Dey, Jayanta Mukherjee, Shamik Sural, Amit Vijay Nandedkar
Writer Identification for Handwritten Words

In this work we present a framework for recognizing the writer of a handwritten word. We make use of allographic features at the sub-word level. Our work is motivated by previous techniques that make use of a codebook; however, instead of encoding the features using the codewords, we exploit the discriminative properties of features that belong to the same cluster, in a supervised approach. We achieve writer identification rates close to 63% on handwritten words drawn from a dataset of 10 writers. Our work has applications in scenarios where multiple writers write or annotate on the same page.

Shilpa Pandey, Gaurav Harit
Kalanjiyam: Unconstrained Offline Tamil Handwritten Database

In this paper, we present Kalanjiyam, a new offline Tamil handwritten database. The database serves a dual purpose by providing isolated handwritten characters of varying writing styles (Phase-1) as well as handwritten text paragraphs (Phase-2). For both phases, we collect samples from 500 people of varying age groups, professions, handedness, etc. Each data collection form allows the volunteer to write 80 characters and two paragraphs of at least four sentences. Phase-1 of this work, involving data acquisition, processing and other protocols, is reported in this paper for quick and timely dissemination of the ongoing work. The database is supplemented with essential ground-truth (GT) information and is made publicly available. To the best of our knowledge, the Kalanjiyam database, once full-fledged (with Phase-2 included), will be the first of its kind to host a comprehensive collection of offline handwritten characters as well as text lines.

Faizal Hajamohideen, S. Noushath
Info-Graphics Retrieval: A Multi-kernel Distance Based Hashing Scheme

Information retrieval research has shown significant improvement and has provided techniques that retrieve documents in image or text form. However, the retrieval of multi-modal documents has received much less attention. We aim to build a system for retrieving documents with embedded information graphics (info-graphics): images of bar charts and line graphs appearing alongside textual components in magazines, newspapers, and journals. In this paper, we present a multi-modal document image retrieval framework that learns an optimal fusion of information from text and info-graphics regions. The proposed approach is evaluated on documents collected from various sources such as magazines and journals.

Ritu Garg, Santanu Chaudhury

MedImage

Frontmatter
Neovascularization Detection on Retinal Images

Proliferative Diabetic Retinopathy (PDR) is characterized by the growth of new abnormal, thin blood vessels, called neovascularization, that spread along the retinal surface. An automated computer-aided diagnosis system needs to identify neovascularization for PDR screening. Retinal images are often noisy and poorly illuminated; the thin vessels mostly appear disconnected and are inseparable from the background. This paper proposes a new method for neovascularization detection in retinal images. Blood vessels are extracted as thick, medium and thin types using multilevel thresholding on the matched filter response. The total mutual information between the vessel density and the tortuosity of the thin vessel class is maximized to obtain the optimal thresholds for classifying normal and abnormal vessels. Simulation results demonstrate that the proposed method outperforms existing ones for neovascularization detection, with an average accuracy of 97.54%.

Sudeshna Sil Kar, Santi P. Maity, Seba Maity
Distribution Based EEG Baseline Classification

Electrical signals generated in the brain, known as electroencephalographic (EEG) signals, form a non-invasive measure of brain functioning. The baseline states of EEG are the Eyes Open (EO) and Eyes Closed (EC) relaxed states. The choice of baseline used in an experiment is of critical importance, since it forms the reference against which other states are measured. In Brain Machine Interfaces, it is imperative that the system be able to distinguish between these states, hence the need for automated classification of EEG baselines. The proposed method utilizes statistical moments. The Moment Generating Functions (MGFs) obtained from these moments are given as features to SVM and k-NN classifiers, resulting in mean accuracies of 86.71% and 86.54%, respectively. The fact that the MGF is able to differentiate between these states indicates that the two states have different source distribution parameters. A Smirnov test verified that the data of the two classes indeed come from different distributions.
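The moments-plus-classifier pipeline can be sketched in a few lines of numpy. This is a simplified illustration, not the paper's method: raw central moments stand in for the MGF-derived features, and a plain k-NN vote replaces the tuned SVM/k-NN classifiers:

```python
import numpy as np

def moment_features(signal, n_moments=4):
    """First n central moments of an EEG epoch, used as a feature vector.
    (The paper derives MGF-based features; raw moments are a stand-in.)"""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                             # centre the epoch
    return np.array([np.mean(x ** k) for k in range(1, n_moments + 1)])

def knn_predict(train_X, train_y, x, k=3):
    """Plain k-nearest-neighbour vote on Euclidean distance."""
    d = np.linalg.norm(np.asarray(train_X, float) - x, axis=1)
    votes = np.asarray(train_y)[np.argsort(d)[:k]]
    return np.bincount(votes).argmax()
```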

Gopika Gopan K., Neelam Sinha, Dinesh Babu J.
Texture Based Person Identification Using Dental Radiographs and Photographs in Forensic Odontology

Forensic odontology is the evaluation of dental information, including ante-mortem (AM) and post-mortem (PM) radiographs, for the purpose of identifying persons in grave situations such as mass fatalities, natural disasters, and terrorist attacks. One of the key issues in using dental images is that, even when the AM and PM radiographs belong to the same person, a mismatch can arise between them due to a tooth missing in either radiograph. In such cases, the missing tooth has to be identified prior to matching in order to achieve accurate identification of an individual. An automatic algorithm for person identification from dental radiographs and photographs thus remains a challenging problem. In this paper, a texture-based shape extraction algorithm is analyzed. Distance measures and classifier-based approaches serve as the shape matching methods used to match AM and PM images for exact person identification. A novel approach is introduced for detecting the missing tooth, and subsequently each tooth in the radiograph is classified using a k-NN classifier with Hu's moment invariants as features. Each tooth is then separated into pulp, enamel, and dentine regions, from which GLCM texture features are extracted. The resulting framework improves identification performance, achieving an overall accuracy of 98%, higher than the existing approaches.
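The Hu moment invariants used as k-NN features can be illustrated with the first invariant, computed from normalized central moments. This is a generic sketch (the paper uses all seven invariants and its own pipeline), shown here on a binary mask:

```python
import numpy as np

def hu_first_invariant(img):
    """First Hu moment invariant, eta20 + eta02, of a binary mask.
    Translation-invariant by construction (central moments) and
    normalised by m00 so it is comparable across segmentations."""
    im = np.asarray(img, dtype=float)
    y, x = np.mgrid[:im.shape[0], :im.shape[1]]
    m00 = im.sum()                                # zeroth-order moment
    xc = (x * im).sum() / m00                     # centroid
    yc = (y * im).sum() / m00
    mu20 = ((x - xc) ** 2 * im).sum()             # central moments
    mu02 = ((y - yc) ** 2 * im).sum()
    # normalised central moments: eta_pq = mu_pq / m00^((p+q)/2 + 1)
    return (mu20 + mu02) / m00 ** 2
```

The same shape placed at two different positions yields the same value, which is what makes these moments usable as position-independent shape features.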

G. Jaffino, A. Banumathi, Ulaganathan Gurunathan, J. Prabin Jose
Shearlet Based Medical Image Fusion Using Pulse-Coupled Neural Network with Fuzzy Memberships

In this article, we propose a novel multimodal Medical Image Fusion (MIF) method based on a neuro-fuzzy technique in the Non-Subsampled Shearlet Transform (NSST) domain for spatially registered, multi-modal medical images. The source medical images are first decomposed by the NSST. The low-frequency subbands (LFSs) are fused using the max-selection rule. Fuzzy triangular memberships are derived from a specific neighborhood region of each high-frequency coefficient, and the high-frequency subbands (HFSs) are then fused using a biologically inspired Pulse Coupled Neural Network (PCNN) according to our newly proposed rule. Finally, the inverse NSST (INSST) is applied to the fused coefficients to obtain the fused image. Visual and quantitative analysis, along with comparisons against state-of-the-art MIF techniques, shows the effectiveness of the proposed scheme in fusing multimodality medical images.
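Two of the building blocks above have simple numerical forms and can be sketched as follows. This omits the NSST decomposition and the PCNN entirely (in the paper the memberships drive the PCNN-based HFS fusion); it only illustrates the max-selection rule and a triangular fuzzy membership, with hypothetical parameter names:

```python
import numpy as np

def fuse_lfs(lfs_a, lfs_b):
    """Max-selection rule: keep the coefficient of larger magnitude."""
    a = np.asarray(lfs_a, dtype=float)
    b = np.asarray(lfs_b, dtype=float)
    return np.where(np.abs(a) >= np.abs(b), a, b)

def triangular_membership(c, lo, peak, hi):
    """Triangular fuzzy membership of coefficient c over [lo, hi],
    rising linearly to 1 at `peak` and falling back to 0 at `hi`."""
    c = np.asarray(c, dtype=float)
    left = (c - lo) / max(peak - lo, 1e-12)
    right = (hi - c) / max(hi - peak, 1e-12)
    return np.clip(np.minimum(left, right), 0.0, 1.0)
```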

Niladri Shekhar Mishra, Sudeb Das, Amlan Chakrabarti
MR Imaging via Reduced Generalized Autocalibrating Partially Parallel Acquisition Compressed Sensing

Magnetic Resonance Imaging (MRI) systems increasingly demand a high rate of acceleration in data acquisition to reduce scanning time. The acquisition rate can be accelerated to a significant degree through the parallel MRI (pMRI) approach. A further reduction in sensing time can be achieved using Compressed Sensing (CS), or compressive sampling, which enables reconstruction of a sparse signal from sub-sampled (incomplete) measurements. This paper proposes an efficient pMRI scheme that combines CS with Generalized Autocalibrating Partially Parallel Acquisitions (GRAPPA) to produce MR images at a high data acquisition rate. A kernel of reduced size is used within GRAPPA to estimate the unobserved encoded samples. Instead of all the unobserved samples, only a randomly chosen subset is estimated. An $$l_{1}$$-minimization based CS reconstruction technique then takes the observed and the estimated unobserved samples as measurements to reconstruct the final MR images. Extensive simulation results show a significant reduction in artifacts, and a consequent visual improvement in the reconstructed images, even at high acceleration factors. The simulations also demonstrate that the proposed method outperforms state-of-the-art pMRI methods in terms of both subjective and objective quality assessment of the reconstructed images.
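The $$l_{1}$$-minimization step can be illustrated with ISTA (iterative soft-thresholding), a standard solver for this objective. This is a generic sketch, not the paper's reconstruction: in practice the matrix `A` would encode the undersampled Fourier measurements together with a sparsifying transform, and `lam` is a hypothetical regularization weight:

```python
import numpy as np

def ista(A, y, lam=0.1, step=None, iters=200):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - step * A.T @ (A @ x - y)                          # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # shrinkage
    return x
```

With `A` the identity, the solver converges to the soft-thresholded measurements, which is a convenient sanity check of the shrinkage operator.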

Sheikh Rafiul Islam, Seba Maity, Santi P. Maity, Ajoy Kumar Ray
Tracking of Retinal Microsurgery Tools Using Late Fusion of Responses from Convolutional Neural Network over Pyramidally Decomposed Frames

Computer vision and robotic assistance are increasingly being used to improve the quality of surgical interventions. Tool tracking becomes critical in interventions such as endoscopy, laparoscopy, and retinal microsurgery (RM), where, unlike in open surgery, surgeons do not have direct visual and physical access to the surgical site. RM is performed using miniaturized tools and requires careful observation through a surgical microscope. Tracking of surgical tools primarily provides robotic assistance during surgery and also serves as a means to assess surgical quality, which is extremely useful during surgical training. In this paper we propose deep-learning-based visual tracking of surgical tools using late fusion of responses from a convolutional neural network (CNN), comprising three steps: (i) training the CNN to localize the tool tip in a frame, (ii) coarsely estimating the tool tip region using the trained CNN, and (iii) performing a finer search around the estimated region to accurately localize the tool tip. Scale-invariant tracking is ensured by multi-scale late fusion, where the CNN responses are obtained at each level of a Gaussian scale decomposition pyramid. The performance of the proposed method is experimentally validated on the publicly available retinal microscopy instrument tracking (RMIT) dataset (https://sites.google.com/site/sznitr/code-and-datasets). Our method tracks tools with a maximum accuracy of $$99.13\%$$, which substantiates its efficacy in comparison to existing approaches.
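The coarse-to-fine localization over a pyramid can be sketched without the CNN itself. In this illustration the response map is assumed given (in the paper it comes from the trained CNN), a 2x2 mean-pool stands in for Gaussian blur plus decimation, and the window radius is a hypothetical parameter:

```python
import numpy as np

def downsample(img):
    """2x2 mean-pool: a crude stand-in for Gaussian blur + decimation."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    v = img[:h, :w]
    return (v[0::2, 0::2] + v[1::2, 0::2] + v[0::2, 1::2] + v[1::2, 1::2]) / 4.0

def coarse_to_fine_peak(response, levels=2, radius=1):
    """Locate the maximum of a response map coarse-to-fine: find the
    peak at the coarsest pyramid level, then refine within a small
    window while moving up each finer level."""
    pyr = [np.asarray(response, dtype=float)]
    for _ in range(levels):
        pyr.append(downsample(pyr[-1]))
    r, c = np.unravel_index(np.argmax(pyr[-1]), pyr[-1].shape)
    for lvl in reversed(pyr[:-1]):
        r, c = 2 * r, 2 * c                       # map up one level
        r0, r1 = max(r - radius, 0), min(r + radius + 2, lvl.shape[0])
        c0, c1 = max(c - radius, 0), min(c + radius + 2, lvl.shape[1])
        win = lvl[r0:r1, c0:c1]                   # local refinement window
        dr, dc = np.unravel_index(np.argmax(win), win.shape)
        r, c = r0 + dr, c0 + dc
    return r, c
```

Restricting the fine search to a window around the coarse estimate is what keeps the per-frame cost low while remaining robust to scale changes.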

Kaustuv Mishra, Rachana Sathish, Debdoot Sheet
Cardiac Ultrasound Image Enhancement Using Tissue Selective Total Variation Regularization

Speckle reduction is desired to improve the quality of ultrasound images. However, uniform speckle reduction over the entire image results in a loss of important information, especially in cardiac ultrasound images. In this paper, a tissue selective total variation regularization approach is proposed for the enhancement of cardiac ultrasound images. It measures the probability of each pixel belonging to a blood region and uses it within the total variation framework. As a result, the unwanted speckle in the blood chamber regions is removed while the useful speckle in the tissue regions is preserved. This improves the visible contrast of the images and enhances structural details. The proposed approach is evaluated on synthetic as well as real images and shows better performance than state-of-the-art filters in terms of the speckle region's signal-to-noise ratio, structural similarity index, figure of merit, and mean square error.
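The idea of weighting the TV term by a blood-probability map can be sketched with a simple gradient-descent scheme. This is an illustrative toy, not the paper's regularizer or solver: the probability map is assumed given, and the step size, weight, and smoothing parameter are hypothetical:

```python
import numpy as np

def selective_tv_denoise(img, blood_prob, lam=0.3, tau=0.05, iters=100):
    """Gradient descent on 0.5*||u - f||^2 + lam * sum of w * |grad u|,
    where w is the per-pixel probability of belonging to blood.
    Strong smoothing inside blood chambers (w near 1), little smoothing
    in tissue (w near 0), so diagnostic tissue speckle is preserved."""
    f = np.asarray(img, dtype=float)
    w = np.asarray(blood_prob, dtype=float)
    u = f.copy()
    eps = 1e-2                                    # avoids division by zero
    for _ in range(iters):
        ux = np.gradient(u, axis=1)
        uy = np.gradient(u, axis=0)
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)
        # weighted curvature term: div( w * grad u / |grad u| )
        div = (np.gradient(w * ux / mag, axis=1)
               + np.gradient(w * uy / mag, axis=0))
        u = u + tau * (lam * div - (u - f))       # TV flow + data fidelity
    return u
```

On a toy image where the left half is marked as blood (w = 1) and the right half as tissue (w = 0), the filter reduces noise variance on the left while leaving the tissue half essentially untouched.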

Deepak Mishra, Santanu Chaudhury, Mukul Sarkar, Arvinder Singh Soin
Methods and System for Segmentation of Isolated Nuclei in Microscopic Breast Fine Needle Aspiration Cytology Images

Computer vision systems for automated breast cancer diagnosis using Fine Needle Aspiration Cytology (FNAC) images have been under development for some time. Accurate segmentation of the nuclei in microscopic images is crucial for these systems, as most quantify and analyze nuclear features for diagnosis. This paper presents a nucleus segmentation system (NSS) involving pre-processing, pre-segmentation, and refined segmentation stages. The NSS includes a novel pixel transformation step that creates a high-contrast grayscale representation of the input color image. This grayscale image enables the NSS to disregard elements that mimic nuclear morphological and luminescence characteristics, and to minimize the effects of non-specific staining of the cytoplasm by hematoxylin. Experimental results illustrate the generalizability of the NSS across multiple refined segmentation techniques, with particularly accurate nucleus segmentation achieved using active contours without edges (F-score > 0.92). The paper also presents experiments studying the impact of the image pre-processing steps on NSS performance; these steps are observed to improve accuracy and consistency across the tested refined segmentation techniques.

Hrushikesh Garud, Sri Phani Krishna Karri, Debdoot Sheet, Ashok Kumar Maity, Jyotirmoy Chatterjee, Manjunatha Mahadevappa, Ajoy Kumar Ray
Segmentation of Lumen and External Elastic Laminae in Intravascular Ultrasound Images Using Ultrasonic Backscattering Physics Initialized Multiscale Random Walks

Coronary artery disease accounts for a large number of deaths across the world, and clinicians generally prefer x-ray computed tomography or magnetic resonance imaging for localizing vascular pathologies. Interventional imaging modalities like intravascular ultrasound (IVUS) are used to adjunct the diagnosis of atherosclerotic plaques in vessels and help assess the morphological state of the vessel and plaque, which plays a significant role in treatment planning. Since speckle intensity in IVUS images is inherently stochastic and hampers accurate visibility of the vessel wall boundaries, automated segmentation is desirable. In this paper we present a method for segmenting the lumen and the external elastic laminae of the artery wall in IVUS images using random walks over a multiscale pyramid of Gaussian-decomposed frames. The seeds for the random walker are initialized by supervised learning of ultrasonic backscattering and attenuation statistical mechanics from labelled training samples. We experimentally evaluated the performance on 77 IVUS images acquired at 40 MHz, available in the IVUS segmentation challenge dataset (http://www.cvc.uab.es/IVUSchallenge2011/dataset.html), obtaining a Jaccard score of $$0.89 \pm 0.14$$ for lumen and $$0.85 \pm 0.12$$ for external elastic laminae segmentation over a 10-fold cross-validation study.
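The seed-initialization idea can be sketched independently of the random walker itself. In this toy version the per-pixel class posteriors are assumed to come from the supervised backscattering/attenuation model described in the abstract, and the confidence threshold `hi` is a hypothetical parameter:

```python
import numpy as np

def seed_labels(prob_lumen, prob_eel, hi=0.9):
    """Initialise random-walker seeds from per-pixel class posteriors.

    Pixels whose learned posterior for a class exceeds `hi` become
    seeds for that class (1 = lumen, 2 = external elastic laminae);
    everything else stays 0 (unlabelled) for the walker to resolve.
    """
    pl = np.asarray(prob_lumen, dtype=float)
    pe = np.asarray(prob_eel, dtype=float)
    seeds = np.zeros(pl.shape, dtype=int)
    seeds[pl > hi] = 1
    seeds[pe > hi] = 2
    return seeds
```

A label map of this form is the standard input expected by random-walker solvers such as scikit-image's `segmentation.random_walker`, which then propagates the seed labels to the unlabelled pixels.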

Debarghya China, Pabitra Mitra, Debdoot Sheet
Backmatter
Metadata
Title
Computer Vision, Graphics, and Image Processing
Editors
Snehasis Mukherjee
Suvadip Mukherjee
Dipti Prasad Mukherjee
Prof. Jayanthi Sivaswamy
Suyash Awate
Srirangaraj Setlur
Anoop M. Namboodiri
Prof. Dr. Santanu Chaudhury
Copyright Year
2017
Electronic ISBN
978-3-319-68124-5
Print ISBN
978-3-319-68123-8
DOI
https://doi.org/10.1007/978-3-319-68124-5
