
2018 | Book

Proceedings of 2nd International Conference on Computer Vision & Image Processing

CVIP 2017, Volume 1

Edited by: Prof. Bidyut B. Chaudhuri, Dr. Mohan S. Kankanhalli, Prof. Balasubramanian Raman

Publisher: Springer Singapore

Book Series: Advances in Intelligent Systems and Computing


About this book

The book provides insights into the Second International Conference on Computer Vision & Image Processing (CVIP-2017), organized by the Department of Computer Science and Engineering of the Indian Institute of Technology Roorkee. The book presents technological progress and research outcomes in the areas of image processing and computer vision. The topics covered in this book are image/video processing and analysis; image/video formation and display; image/video filtering, restoration, enhancement and super-resolution; image/video coding and transmission; image/video storage, retrieval and authentication; image/video quality; transform-based and multi-resolution image/video analysis; biological and perceptual models for image/video processing; machine learning in image/video analysis; probability and uncertainty handling for image/video processing; motion and tracking; segmentation and recognition; shape, structure and stereo.

Table of Contents

Frontmatter
Moving Target Detection Under Turbulence Degraded Visible and Infrared Image Sequences

The presence of atmospheric turbulence over horizontal imaging paths introduces time-varying perturbations and blur in the scene that severely degrade the performance of moving object detection and tracking systems in vision applications. This paper proposes a simple and efficient algorithm for moving target detection under turbulent media, based on an adaptive background subtraction approach with different types of background models followed by adaptive global thresholding to detect the foreground. The proposed method is implemented in MATLAB and tested on turbulence-degraded video sequences. It is also compared with a state-of-the-art method published in the literature. The results show that the detection performance of the proposed algorithm is better. Further, the proposed method can be easily implemented in FPGA-based hardware.

Chaudhary Veenu, Kumar Ajay, Sharma Anurekha
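
A minimal sketch of the adaptive background subtraction idea summarized above, using a running-average background model and Otsu's global threshold as a stand-in for the paper's adaptive threshold; the specific background models and update rule used by the authors are not given here, so these choices are assumptions.

```python
# Minimal sketch: running-average background model + global (Otsu) threshold.
# The exact background models and adaptive threshold of the paper are assumptions.
import cv2
import numpy as np

def detect_moving_targets(video_path, alpha=0.02):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return
    background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        diff = cv2.absdiff(gray, background)
        # Adaptive global threshold (Otsu) on the difference image gives the foreground mask
        _, mask = cv2.threshold(diff.astype(np.uint8), 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        # Update the background only where no motion was detected
        cv2.accumulateWeighted(gray, background, alpha, mask=cv2.bitwise_not(mask))
        yield mask
    cap.release()
```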
Effective Denoising with Non-local Means Filter for Reliable Unwrapping of Digital Holographic Interferometric Fringes

Estimation of phase from the complex interference field has been an emerging area of research over the last few decades. The phase values obtained by using the arctan function are limited to the interval $$(-\pi , \pi ]$$. Such a phase map is known as a wrapped phase. The unwrapping process, which produces a continuous phase map from the wrapped phase, becomes tedious in the presence of noise. In this paper, we propose a preprocessing technique that removes the noise from the interference field, thereby improving the performance of naive unwrapping algorithms. For denoising of the complex field, the real and imaginary parts of the field are processed separately. The real-valued images (real and imaginary parts) are processed using a non-local means filter with a non-Euclidean distance measure. The denoised real and imaginary parts are then combined to form a clean interference field. MATLAB's unwrap function is used as the unwrapping algorithm to obtain the continuous phase from the cleaned interference field. Comparison with the Frost filter validates the applicability of the proposed approach for processing noisy interference fields.

P. L. Aparna, Rahul G. Waghmare, Deepak Mishra, R. K. Sai Subrahmanyam Gorthi
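
A simplified sketch of the preprocessing pipeline described above, assuming scikit-image's non-local means filter with its default Euclidean patch distance (the paper's non-Euclidean measure is not reproduced) and numpy's column-wise unwrap as a stand-in for MATLAB's unwrap.

```python
# Sketch: denoise real and imaginary parts separately, then unwrap the phase.
# Euclidean-distance NL-means is used here; the paper's non-Euclidean variant is not.
import numpy as np
from skimage.restoration import denoise_nl_means

def clean_and_unwrap(field, h=0.1):
    """field: complex-valued interference field (2D array)."""
    real_d = denoise_nl_means(field.real, h=h, patch_size=5, patch_distance=6)
    imag_d = denoise_nl_means(field.imag, h=h, patch_size=5, patch_distance=6)
    clean = real_d + 1j * imag_d
    wrapped = np.angle(clean)            # wrapped phase in (-pi, pi]
    return np.unwrap(wrapped, axis=0)    # column-wise unwrapping, like MATLAB's unwrap
```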
Iris Recognition Through Score-Level Fusion

Although there are many iris recognition approaches available in the literature, it remains unclear which approach gives the most reliable authentication. In this paper, score-level fusion of two different approaches, XOR-SUM Code and BLPOC, is used to achieve better performance than either approach individually. Different fusion strategies are employed to investigate the effect of fusion on the genuine acceptance rate (GAR). It is observed that fusion through the sum and product schemes provides better results than fusion through the minimum and maximum schemes. For further improvement, the sum and product schemes are explored through a weighted sum with different weights. The best GAR and equal error rate (EER) values are 98.83% and 0.95%, respectively. The performance of the proposed score-level fusion is also compared with existing approaches.

Ritesh Vyas, Tirupathiraju Kanumuri, Gyanendra Sheoran, Pawan Dubey
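
A small numpy illustration of the score-level fusion rules compared above (sum, product, minimum, maximum, weighted sum); the min-max normalization step and the example weight are assumptions, not the paper's settings.

```python
# Illustration of common score-level fusion rules; normalization and weight are assumptions.
import numpy as np

def fuse_scores(s_xor_sum, s_blpoc, rule="weighted_sum", w=0.6):
    s1, s2 = np.asarray(s_xor_sum, float), np.asarray(s_blpoc, float)
    # Min-max normalize each matcher's scores to [0, 1] before fusion
    s1 = (s1 - s1.min()) / (s1.max() - s1.min() + 1e-12)
    s2 = (s2 - s2.min()) / (s2.max() - s2.min() + 1e-12)
    if rule == "sum":
        return s1 + s2
    if rule == "product":
        return s1 * s2
    if rule == "min":
        return np.minimum(s1, s2)
    if rule == "max":
        return np.maximum(s1, s2)
    return w * s1 + (1 - w) * s2  # weighted sum
```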
A Novel Pattern Matching Approach on the Use of Multi-variant Local Descriptor

The objective of the pattern matching problem is to find the most similar image pattern in a scene image by matching it to an instance of the given pattern. For pattern matching, the most distinctive features are computed from the pattern that is to be searched in the scene image. The scene image is logically divided into sliding windows of pattern size, and all the sliding windows are checked against the pattern for matching. Because the pattern is matched repeatedly against sliding windows, the matching process must be efficient in terms of space and time, and the effects of orientation, illumination and occlusion must be minimized to obtain better matching accuracy. This paper presents a novel local feature descriptor called Multi-variant Local Binary Pattern (MVLBP) for the pattern matching process, with LBP as the baseline technique. The efficacy of the proposed pattern matching algorithm is tested on two databases and shown to be computationally efficient.

Deep Suman Dev, Dakshina Ranjan Kisku
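
A baseline sketch of sliding-window pattern matching with plain LBP histograms; the MVLBP descriptor itself is not specified in this summary, so uniform LBP from scikit-image stands in as the baseline the paper compares against.

```python
# Baseline LBP sliding-window matching (the MVLBP descriptor is not reproduced here).
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hist(img, P=8, R=1):
    codes = local_binary_pattern(img, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def best_match(scene, pattern, step=4):
    ph, pw = pattern.shape
    target = lbp_hist(pattern)
    best, best_pos = np.inf, None
    for y in range(0, scene.shape[0] - ph + 1, step):
        for x in range(0, scene.shape[1] - pw + 1, step):
            h = lbp_hist(scene[y:y + ph, x:x + pw])
            d = np.sum((h - target) ** 2)   # simple L2 distance between histograms
            if d < best:
                best, best_pos = d, (y, x)
    return best_pos, best
```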
GUESS: Genetic Uses in Video Encryption with Secret Sharing

Nowadays, video security systems are essential for supervision everywhere, for example in video conferencing, WhatsApp, ATMs, airports, railway stations, and other crowded places. In multi-view video systems, various cameras produce a huge amount of video content, which makes fast browsing and securing the information difficult. With advances in networking, digital cameras, media, and interactive sites, the importance of privacy and security is rapidly increasing. Hence, the security of digital videos has become an emerging research area in the multimedia domain, especially when the communication happens over the Internet. Cryptography is an essential practice to protect information in this digital world. Standard encryption techniques like AES/DES are not optimal and efficient in the case of videos. Therefore, a technique that can secure video content is urgently required. In this paper, we address video security-related issues and their solutions. An optimized version of the genetic algorithm is employed to solve the aforementioned issues by modeling a simplified version of genetic processes. It is used to generate a frame sequence such that the correlation between any two frames is minimized. The frame sequence determines the randomization of the order of frames in a video. The proposed method is not only fast but also more accurate, enhancing the efficiency of the encryption process.

Shikhar Sharma, Krishan Kumar
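
A toy genetic-algorithm sketch of the idea of searching for a frame ordering that minimizes correlation between consecutive frames; the fitness function, operators, and parameters here are illustrative assumptions, not the authors' optimized GA.

```python
# Toy GA: evolve a frame permutation that minimizes adjacent-frame correlation.
# Fitness, operators, and parameters are illustrative assumptions.
import numpy as np

def fitness(order, frames):
    # Sum of absolute Pearson correlations between consecutive frames (lower is better)
    flat = [frames[i].ravel() for i in order]
    return sum(abs(np.corrcoef(flat[k], flat[k + 1])[0, 1])
               for k in range(len(order) - 1))

def evolve_order(frames, pop=30, gens=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(frames)
    population = [rng.permutation(n) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda o: fitness(o, frames))
        survivors = population[:pop // 2]
        children = []
        for parent in survivors:
            child = parent.copy()
            i, j = rng.integers(0, n, size=2)
            child[i], child[j] = child[j], child[i]   # swap mutation
            children.append(child)
        population = survivors + children
    return min(population, key=lambda o: fitness(o, frames))
```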
Learning-Based Fuzzy Fusion of Multiple Classifiers for Object-Oriented Classification of High Resolution Images

In remote sensing, multi-classifier systems (MCS) have found use for efficient pixel-level image classification. A current challenge faced by the RS community is the classification of very high resolution (VHR) satellite/aerial images. Despite the abundance of data, certain inherent difficulties affect the performance of existing pixel-based models. Hence, the trend for classification of VHR imagery has shifted to object-oriented image analysis (OOIA), which works at the object level. We propose a shift of paradigm to object-oriented MCS (OOMCS) for efficient classification of VHR imagery. Our system uses the modern computer vision concept of superpixels for the segmentation stage in OOIA. To this end, we construct a learning-based decision fusion method for integrating the decisions from the MCS at the superpixel level for the classification task. Through detailed experimentation, we show that our method outperforms a variety of traditional OOIA decision systems. Our method also empirically outperforms them under two typical artefacts, namely unbalanced samples and high intra-class variance.

Rajeswari Balasubramaniam, Gorthi R. K. Sai Subrahmanyam, Rama Rao Nidamanuri
Image Retrieval Using Random Forest-Based Semantic Similarity Measures and SURF-Based Visual Words

In this paper, we propose a novel image retrieval scheme using random forest-based semantic similarity measures and a SURF-based bag of visual words. A patch-based representation of the images is carried out with the SURF-based bag of visual words. A random forest, which is an ensemble of randomized decision trees, is then applied to a set of training images. As a result, the training images accumulate in different leaf nodes of each decision tree of the random forest. During retrieval, a query image, represented using the SURF-based bag of visual words, is passed through each decision tree. We define a query path and a semantic neighbor set for such query images in all the decision trees. Different measures of semantic image similarity are derived by exploring the characteristics of query paths and semantic neighbor sets. Experimental results on the publicly available COIL-100 image database clearly demonstrate the superior performance of the proposed content-based image retrieval (CBIR) method with these new measures over some similar existing approaches.

Anindita Mukherjee, Jaya Sil, Ananda S. Chowdhury
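
A small scikit-learn sketch of one natural way to derive a semantic similarity from the forest: two images are similar in proportion to the number of trees in which they land in the same leaf. This leaf-co-occurrence proximity is a simplification; the paper's query-path and semantic-neighbor-set measures are more elaborate.

```python
# Leaf-co-occurrence similarity from a random forest over bag-of-visual-words vectors.
# A simple proximity measure; the paper's query-path/neighbor-set measures differ.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_forest(bovw_train, labels, n_trees=100):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(bovw_train, labels)
    return rf

def leaf_similarity(rf, bovw_query, bovw_database):
    q_leaves = rf.apply(np.asarray(bovw_query).reshape(1, -1))   # (1, n_trees) leaf ids
    db_leaves = rf.apply(bovw_database)                          # (n_images, n_trees)
    return (db_leaves == q_leaves).mean(axis=1)                  # fraction of shared leaves
```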
Rotation Invariant Digit Recognition Using Convolutional Neural Network

Deep learning architectures use a set of layers to learn hierarchical features from the input. The learnt features are discriminative and can thus be used for classification tasks. Convolutional neural networks (CNNs) are one of the widely used deep learning architectures. A CNN extracts prominent features from the input by passing it through layers of convolution and nonlinear activation. These features are invariant to scaling and small amounts of distortion in the input image, but they offer rotation invariance only for small degrees of rotation. We propose the idea of using multiple instances of a CNN to enhance the overall rotation-invariant capabilities of the architecture, even for higher degrees of rotation in the input image. The architecture is then applied to handwritten digit classification and captcha recognition. The proposed method requires fewer images for training and therefore reduces the training time. Moreover, our method offers the additional advantage of finding the approximate orientation of the object in an image without any additional computational complexity.

Ayushi Jain, Gorthi R. K. Sai Subrahmanyam, Deepak Mishra
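
A sketch of the multiple-instance idea: the test image is rotated to a set of canonical angles, each copy is scored by a trained digit CNN, and the most confident prediction wins, which also yields an approximate orientation. The `model.predict` interface is a placeholder for whatever framework the network is trained in, and the angle set is an assumption.

```python
# Sketch: classify rotated copies of the input and keep the most confident one.
# `model.predict(batch) -> class probabilities` is a hypothetical interface.
import numpy as np
from scipy.ndimage import rotate

def rotation_invariant_predict(model, image, angles=range(0, 360, 30)):
    best = (None, -1.0, 0)                 # (class, confidence, angle)
    for angle in angles:
        rotated = rotate(image, angle, reshape=False, order=1)
        probs = model.predict(rotated[np.newaxis, ...])[0]
        cls, conf = int(np.argmax(probs)), float(np.max(probs))
        if conf > best[1]:
            best = (cls, conf, angle)
    # Predicted digit, its confidence, and the approximate orientation of the input
    return best
```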
Stochastic Assimilation Technique for Cloud Motion Analysis

Cloud motion analysis plays a key role in analyzing climatic changes. Recent works show that the Classic-NL approach outperforms many other conventional motion analysis techniques. This paper presents an efficient approach for assimilation of satellite images using a recursive stochastic filter, the Weighted Ensemble Transform Kalman Filter (WETKF), with an appropriate dynamical model and an image warping-based non-linear measurement model. Here, cloud motion is analyzed against occlusions, missing information, and unexpected merging and splitting of clouds. This paves the way for automatic analysis of motion fields and for drawing inferences about their local and global motion over several years. This paper also demonstrates the efficacy and robustness of WETKF over the Classic-NL-based approach (Bibin Johnson J et al., International Conference on Computer Vision and Image Processing, 2016) [1].

Kalamraju Mounika, J. Sheeba Rani, Gorthi Sai Subrahmanyam
Image Contrast Enhancement Using Hybrid Elitist Ant System, Elitism-Based Immigrants Genetic Algorithm and Simulated Annealing

Contrast enhancement is a technique used to expand the range of intensities within an image to make its features more distinct and easily perceptible to the human eye. It has found many applications ranging from medical to satellite imagery, where the primary aim is to find hidden or minute details within an image. Through a literature survey, the authors have found that existing approaches lag in enhancing the contrast of an image. Hence, in the present paper, an improved contrast enhancement technique is proposed based on a hybrid combination of nature-inspired metaheuristics: Elitist Ant System (EAS), Elitism-based Immigrants Genetic Algorithm (EIGA) and Simulated Annealing (SA). EAS and EIGA work together to search globally for the optimum solution, which is then refined locally by SA. Through experiments, it is observed that the proposed algorithm efficiently improves the contrast of an image when compared with existing algorithms.

Rajeev Kumar, Anand Gupta, Apoorv Gupta, Aman Bansal
A Novel Robust Reversible Watermarking Technique Based on Prediction Error Expansion for Medical Images

Degradation of the host image by noise due to errors during data transmission is a major concern in telemedicine, especially with respect to reversible watermarking. This paper presents the effect of salt-and-pepper noise on prototypical prediction error expansion-based reversible watermarking and proposes a prediction error expansion scheme using border embedding for gray-scale medical images. In prototypical prediction error expansion, the accretion of the predicted error values is used for data insertion, while in the proposed scheme, prediction error expansion using border embedding is used, and the effect of noise is demonstrated for both. A performance assessment based on peak signal-to-noise ratio (PSNR), total payload capacity, and noise effect is conducted. The results show additional capacity and less mutilation of the host image in the presence of noise compared with the pristine method.

Vishakha Kelkar, Jinal H. Mehta, Kushal Tuckley
Integrated Feature Exploration for Handwritten Devanagari Numeral Recognition

In this paper, statistical feature extraction techniques are explored, incrementally combined using different methods, and analyzed for the recognition of isolated offline handwritten Devanagari numerals. The techniques selected are zoning, directional distance distribution, Zernike moments, discrete cosine transform, and Gabor filters, which encapsulate the mutually exclusive statistical features of average pixel densities, directional distribution, orthogonal invariant moments, elementary frequency components, and space-frequency components, respectively. The standard benchmark handwritten Devanagari numeral database provided by ISI, Kolkata, is used for the experimentation, with 1-nearest neighbor and support vector machine classifiers for classification. The accuracy achieved with individual feature extraction techniques ranges from 86.87% to 98.96%. Further, the features are integrated with methods such as feature concatenation, majority voting, and a newly proposed methodology named winners pooling. The maximum recognition obtained through feature integration is 99.14%.

Shraddha Arya, Indu Chhabra, G. S. Lehal
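
A compact sketch of two of the integration strategies mentioned above, feature concatenation and majority voting over per-feature classifiers; the winners-pooling scheme is specific to the paper and not reproduced, and the classifier settings here are assumptions.

```python
# Two simple integration strategies: concatenation and majority voting.
# The paper's "winners pooling" scheme is not reproduced here; integer labels assumed.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def concat_and_classify(feature_sets_train, y_train, feature_sets_test):
    X_train = np.hstack(feature_sets_train)     # e.g. [zoning, DDD, Zernike, DCT, Gabor]
    X_test = np.hstack(feature_sets_test)
    clf = SVC(kernel="rbf").fit(X_train, y_train)
    return clf.predict(X_test)

def majority_vote(feature_sets_train, y_train, feature_sets_test):
    votes = []
    for Xtr, Xte in zip(feature_sets_train, feature_sets_test):
        clf = KNeighborsClassifier(n_neighbors=1).fit(Xtr, y_train)
        votes.append(clf.predict(Xte))
    votes = np.stack(votes)                      # (n_feature_types, n_samples)
    # Per-sample majority over the individual classifiers' predictions (labels 0-9)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```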
Privacy Preserving for Annular Distribution Density Structure Descriptor in CBIR Using Bit-plane Randomization Encryption

With the rapid increase in multimedia services and Internet users over the network, it is crucial to have effective and accurate retrieval while preserving data confidentiality. We propose a simple and effective content-based image retrieval algorithm using an annular distribution density structure descriptor (ADDSD) to retrieve relevant images using encrypted features, thereby preserving the privacy of image content. It exploits the HSV color space of the image to generate a quantized image. The structure element is obtained using the same or similar edge orientations in the uniform HSV color space and is detected using a grid over the quantized structure image so formed. Finally, an annular histogram is generated from the quantized structure image, which is encrypted by a bit-plane randomization technique. Experimental analysis illustrates that the proposed method retrieves relevant images effectively and efficiently without revealing image content information.

Mukul Majhi, Sushila Maheshkar
Near-Duplicate Video Retrieval Based on Spatiotemporal Pattern Tree

The rapid advancement of multimedia devices and the exponential increase in Internet user activities such as video editing, previewing, and streaming have accumulated an enormous number of near-duplicate videos, which cannot be detected or retrieved effectively by conventional video retrieval techniques. In this paper, we propose a simple but effective hierarchical spatiotemporal approach for high-quality near-duplicate video retrieval. Patterns of encoded key frames are generated using angular distribution density, which is translation and rotation invariant. A queue pool provides temporal matching and consistency for the retrieval. Experimental result analysis demonstrates the effectiveness of the proposed method.

Ajay Kumar Mallick, Sushila Maheshkar
Fingerprint Liveness Detection Using Wavelet-Based Completed LBP Descriptor

Fingerprint-based authentication systems need to be secured against spoof attacks. In this paper, we propose the completed local binary pattern (CLBP) texture descriptor with the wavelet transform (WT) for fingerprint liveness detection. The fundamental basis of the proposed method is that live and spoof finger images differ in textural characteristics due to gray-level variations. These textural characteristics occur at various scales and orientations. CLBP has high discriminatory power as it takes into account local sign and magnitude differences along with the average gray level of an image. CLBP extended to the 2-D discrete WT (DWT) and 2-D real oriented dual-tree WT (RODTWT) domains captures texture features at multiple scales and orientations. Each image is decomposed up to four levels, and the CLBP features computed at each level are classified using linear and RBF kernel support vector machine (SVM) classifiers. Extensive comparisons are made to evaluate the influence of the wavelet decomposition level, wavelet type, number of wavelet orientations, and feature normalization method on fingerprint classification performance. CLBP in the WT domain proves to offer effective classification performance with computational simplicity. While texture features at each scale contribute to performance, higher performance is achieved at lower decomposition levels of high resolution with the db2 and db1 wavelets, the RBF SVM, and mean-normalized features.

Jayshree Kundargi, R. G. Karandikar
Silhouette-Based Real-Time Object Detection and Tracking

Object detection and tracking in video sequences is a challenging and time-consuming task. Intrinsic factors like pose, appearance, and variation in scale, and extrinsic factors like variation in illumination, occlusion, and clutter are the major challenges in object detection and tracking. The main objectives of a tracking algorithm are accuracy and speed in each frame. We propose a combination of detection and tracking algorithms that performs efficiently in real time. In the proposed algorithm, the object detection task is performed from a given sketch using Fast Directional Chamfer Matching (FDCM), which is capable of handling some amount of deformation in edges. To deal with articulation, a part decomposition algorithm is used in the proposed method. The combination of these two parts is capable of handling deformation in shape automatically. The time taken by this algorithm depends on the size and edge segments in the input frame. For object tracking, the Speeded Up Robust Features (SURF) algorithm is used because of its rotation invariance and fast performance. The proposed algorithm works in all situations without prior knowledge of the number of frames.

Bhaumik Vaidya, Harendra Panchal, Chirag Paunwala
Visual Object Detection for an Autonomous Indoor Robotic System

This paper discusses an indoor robotic system that integrates a state-of-the-art object detection algorithm, trained with data augmented for an indoor scenario, and enabled with mechanisms to localize and position objects in 3D and display them interactively to a user. Size, weight, and power constraints in a mobile robot constrain the type of computing hardware that can be integrated with the robotic platform. On the other hand, the robot's mobility, if leveraged properly, can provide enough opportunity to detect objects from different distances and viewpoints as the robot approaches them, giving more robust results. This work adapts a CNN-based algorithm, YOLO, to run on a GPU-enabled board, the Jetson TX1. An innovative method to calculate the object position in the 3D environment map is discussed along with the problems therein, such as duplicate detections that need to be suppressed. Since multiple objects of different or the same class may be detected, the user can be overloaded with information, and management of the visualization through human–machine interaction gains an important role. A scheme for informative display of objects is implemented which lets the user interactively view object images as well as their positions in the scene. The complete robotic system, including the interactive visualization tool, can be put to various uses such as search and rescue, indoor assistance, patrolling, and surveillance.

Anima M. Sharma, Imran A. Syed, Bishwajit Sharma, Arshad Jamal, Dipti Deodhare
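
A sketch of the geometric step implied above: back-projecting the center of a detected bounding box into the 3D map frame using the depth at that pixel and the pinhole camera intrinsics. The intrinsic parameters and the `camera_to_map` transform are placeholders for the real calibration, not values from the paper.

```python
# Back-project a detection's bounding-box center to 3D using depth + pinhole intrinsics.
# fx, fy, cx, cy and the camera-to-map transform are placeholders for real calibration.
import numpy as np

def detection_to_3d(bbox, depth_image, fx, fy, cx, cy, camera_to_map=np.eye(4)):
    x1, y1, x2, y2 = bbox
    u, v = int((x1 + x2) // 2), int((y1 + y2) // 2)   # bounding-box center (pixels)
    Z = float(depth_image[v, u])                      # depth (metres) at that pixel
    X = (u - cx) * Z / fx                             # pinhole back-projection
    Y = (v - cy) * Z / fy
    p_cam = np.array([X, Y, Z, 1.0])
    p_map = camera_to_map @ p_cam                     # into the 3D environment map frame
    return p_map[:3]
```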
Engineering the Perception of Recognition Through Interactive Raw Primal Sketch by HNFGS and CNN-MRF

The impression of a scene on the human brain, specifically the primary visual cortex, is still a distant goal for the computer vision research community. This work proposes a novel system to engineer the human perception of recognizing a subject of interest. This end-to-end solution implements all the stages from an entropy-based unbiased cognitive interview to the final reconstruction of human perception as a machine sketch, in the framework of forensic sketching of suspects. The lower mid-level vision, as designed behaviorally in the primary visual cortex and honoring the scale-space concept of object identification, has been modeled by hierarchical 2D filters, namely hierarchical neuro-visually inspired figure-ground segregation (HNFGS), for interactive sketch rendering. The aforementioned human–machine interaction is twofold: in the gross structural design layer and in finer/granular modification of the pre-realized digital perception. Pre-realized sketches are formed by learning the characteristics of human artists while sketching an object, through an integrated framework of a deep convolutional neural network (D-CNN) and a Markov random field (MRF). After a few iterations of interactive fine-tuning of the sketch, a psycho-visual experiment has been designed and performed to evaluate the feasibility and effectiveness of the proposed algorithm.

Apurba Das, Nitin Ajithkumar
An Efficient Algorithm for Medical Image Fusion Using Nonsubsampled Shearlet Transform

Multimodal medical image fusion techniques are utilized to fuse two images obtained from dissimilar sensors in order to obtain additional information. These methods are used to fuse computed tomography (CT) images with magnetic resonance (MR) images, MR-T1 images with MR-T2 images, and MR images with single photon emission computed tomography (SPECT) images. In the proposed method, the nonsubsampled shearlet transform (NSST) is used to decompose the source images into low-frequency and high-frequency bands. The low-frequency bands are fused using a weighted saliency-based fusion criterion, and the high-frequency bands are fused with the help of phase stretch transform (PST) features. The fused image is obtained by applying the inverse NSST operation. The results show that the proposed method produces better results compared to state-of-the-art methods.

Amit Vishwakarma, M. K. Bhuyan, Yuji Iwahori
A Novel Text Localization Scheme for Camera Captured Document Images

In this paper, a hybrid model for detecting text regions from scene images as well as document images is presented. At first, the background is suppressed to isolate foreground regions. Then, morphological operations are applied on the isolated foreground regions to ensure appropriate region boundaries for such objects. Statistical features are extracted from these objects to classify them as text or non-text using a multi-layer perceptron. Classified text components are localized, and non-text ones are ignored. Experimenting on a data set of 227 camera-captured images, it is found that the object isolation accuracy is 0.8638 and the text/non-text classification accuracy is 0.9648. It may be stated that for images with a near-homogeneous background, the present method yields reasonably satisfactory accuracy for practical applications.

Tauseef Khan, Ayatullah Faruk Mollah
Video Inpainting Based on Re-weighted Tensor Decomposition

Video inpainting is the process of improving the information content of a video by removing irrelevant video objects and restoring lost or deteriorated parts, utilizing the spatiotemporal features available from adjacent frames. This paper proposes an effective video inpainting technique based on multi-dimensional data decomposition. In Tensor Robust Principal Component Analysis (TRPCA), multi-dimensional data corrupted by gross errors is decomposed into a low multi-rank component and a sparse component. The proposed method employs an improved version of TRPCA called Re-weighted low-rank Tensor Decomposition (RWTD) to separate the true information and the irrelevant sparse components in a video. Through this, manual identification of the components to be removed is avoided. A subsequent inpainting algorithm fills the region with appropriate and visually plausible data. The capabilities of the proposed method are validated by applying it to videos containing moving sparse outliers. The experimental results reveal that the proposed method performs well compared with other techniques.

Anjali Ravindran, M. Baburaj, Sudhish N. George
Deep Convolutional Neural Network for Person Re-identification: A Comprehensive Review

In video surveillance, person re-identification (re-id) is a popular technique for automatically finding whether a person has already been seen by a group of cameras. In recent years, with the availability of large-scale datasets, deep learning-based approaches have made significant improvements in accuracy compared to hand-crafted approaches. In this paper, we distinguish person re-id approaches into two categories, i.e., image-based and video-based approaches, and deep learning approaches are reviewed in both categories. This paper contains a brief survey of deep learning approaches on both image and video person re-id datasets. We also present the current ongoing works, issues, and future directions for large-scale datasets.

Harendra Chahar, Neeta Nain
Flexible Threshold Visual Odometry Algorithm Using Fuzzy Logics

Visual odometry is a widely known technique in the field of computer vision used for estimating the rotation and translation between two consecutive time instants. The RANSAC scheme used for outlier rejection incorporates a constant threshold for selecting inliers. The selection of an optimum number of inliers dispersed over the entire image is very important for accurate pose estimation and is decided on the basis of the inlier threshold. In this paper, the threshold for inlier classification is adapted with the help of a fuzzy logic scheme and varies with the data dynamics. The fuzzy logic is designed with an assumption about the maximum possible camera rotation that can be observed between consecutive frames. The proposed methodology has been applied on the KITTI dataset, and a comparison is presented between adaptive RANSAC with and without fuzzy logic, with the aim of imparting flexibility to the visual odometry algorithm.

Rahul Mahajan, P. Vivekananda Shanmuganathan, Vinod Karar, Shashi Poddar
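
An illustrative sketch of how a fuzzy rule could map the observed inter-frame rotation to a RANSAC inlier threshold; the membership functions, rule base, and threshold range below are assumptions, not the paper's tuned design.

```python
# Illustrative fuzzy mapping from estimated inter-frame rotation to a RANSAC threshold.
# Membership functions, rules, and the threshold range are assumptions.

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fuzzy_inlier_threshold(rotation_deg, t_min=0.5, t_max=3.0, max_rot=10.0):
    r = min(abs(rotation_deg), max_rot)
    small = tri(r, -max_rot, 0.0, max_rot)          # "rotation is small" membership
    large = tri(r, 0.0, max_rot, 2.0 * max_rot)     # "rotation is large" membership
    # Weighted-average defuzzification: small rotation -> tight threshold, large -> loose
    return (small * t_min + large * t_max) / (small + large)
```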
Fast Single Image Learning-Based Super Resolution of Medical Images Using a New Analytical Solution for Reconstruction Problem

The process of retrieving a high-resolution image from its low-resolution version is referred to as super resolution. This paper proposes a fast and efficient algorithm that performs resolution enhancement and denoising of medical images. Using patch pairs of high- and low-resolution images as a database, the super-resolved image is recovered from its decimated, blurred, and noise-corrupted version. In this paper, the high-resolution patch to be estimated is expressed as a sparse linear combination of HR patches from the database. Such a linear combination of patches can be modelled as a nonnegative quadratic problem. The computational cost of the proposed method is reduced by finding a closed-form solution to the associated image reconstruction problem. Instead of the traditional strategy of splitting the decimation and convolution processes, we use the frequency-domain properties of the decimation and blurring operators simultaneously. Simulation results on several images with various noise levels show the potency of our SR approach compared with existing super-resolution techniques.

K. Mariyambi, E. Saritha, M. Baburaj
Analyzing ConvNets Depth for Deep Face Recognition

Deep convolutional neural networks are becoming increasingly popular in large-scale image recognition, classification, localization, and detection. In this paper, the performance of state-of-the-art convolutional neural network (ConvNet) models from the ImageNet challenge (ILSVRC), namely VGG16, VGG19, OverFeat, ResNet50, and Inception-v3, which achieved top-5 error rates as low as 4.2%, is analyzed in the context of face recognition. Instead of using handcrafted feature extraction techniques, which require domain-level understanding, ConvNets have the advantage of automatically learning complex features, at the cost of more training time but with less evaluation time. These models are benchmarked on the AR and Extended Yale B face datasets with five performance metrics, namely Precision, Recall, F1-score, Rank-1 accuracy, and Rank-5 accuracy. It is found that the GoogleNet ConvNet model with the Inception-v3 architecture outperforms the other four architectures, with a Rank-1 accuracy of 98.46% on the AR face dataset and 97.94% on the Extended Yale B face dataset. This confirms that deep CNN architectures are suitable for real-time face recognition in the future.

Mohan Raj, I. Gogul, M. Deepan Raj, V. Sathiesh Kumar, V. Vaidehi, S. Sibi Chakkaravarthy
Use of High-Speed Photography to Track and Analyze Melt Pool Quality in Selective Laser Sintering

The manufacturing industry is moving toward a process model involving rapid and frequent product deliveries to increase its consumer base. Employing laser sintering methods in product fabrication provides a superior-quality, low-cost, and high-fidelity solution to support this. Effective monitoring and diagnostics of the laser sintering process therefore becomes a critical task. This paper focuses on analyzing the spatters and plume generated during continuous laser sintering for the fabrication of circular rings. Analysis using high-speed photography and subsequent image processing was undertaken. By varying the laser parameters, the generated spatter and plume were tracked, and features such as spatter count, spatter size, and plume area were examined. Results show that spatter count and plume size are related to variations in laser power intensity. Optimal power settings are shown to produce the best quality product. The proposed analysis method could be used to monitor the stability of the laser sintering process.

Sourin Ghosh, Priti P. Rege, Manoj J. Rathod
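
A minimal sketch of the frame-level measurements described above (spatter count, spatter size, plume area) using simple thresholding and connected-component analysis; the threshold values are assumptions, and real melt-pool footage would require calibration.

```python
# Per-frame spatter/plume measurement on an 8-bit grayscale high-speed frame:
# threshold + connected components. Threshold values are assumptions.
import cv2
import numpy as np

def analyze_frame(gray_frame, spatter_thresh=200, plume_thresh=80, min_spatter_px=3):
    # Bright, small blobs are treated as spatter particles
    _, spatter_mask = cv2.threshold(gray_frame, spatter_thresh, 255, cv2.THRESH_BINARY)
    n, _, stats, _ = cv2.connectedComponentsWithStats(spatter_mask, connectivity=8)
    sizes = stats[1:, cv2.CC_STAT_AREA]                 # skip background label 0
    sizes = sizes[sizes >= min_spatter_px]
    # The dimmer, larger connected region is treated as the plume
    _, plume_mask = cv2.threshold(gray_frame, plume_thresh, 255, cv2.THRESH_BINARY)
    plume_area = int(np.count_nonzero(plume_mask))
    return {"spatter_count": int(len(sizes)),
            "mean_spatter_size": float(sizes.mean()) if len(sizes) else 0.0,
            "plume_area": plume_area}
```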
Multi-Scale Directional Mask Pattern for Medical Image Classification and Retrieval

This paper presents a classification scheme for interstitial lung disease (ILD) patterns using a patch-based approach and an artificial neural network (ANN) classifier. A new feature descriptor, Multi-Scale Directional Mask Pattern (MSDMP), is proposed for feature extraction. The proposed MSDMP extracts microstructure information from (31 × 31) patches of the region of interest (ROI) marked by radiologists. A two-layer feed-forward neural network is used for classification of ILD patterns. The proposed MSDMP feature descriptor has also been tested on a medical image retrieval system to check its robustness. Two benchmark medical datasets are used to evaluate the proposed descriptor. Performance analysis shows that the proposed feature descriptor outperforms other existing state-of-the-art methods in terms of average recognition rate (ARR) and F-score.

Akshay A. Dudhane, Sanjay N. Talbar
Enhanced Characterness for Text Detection in the Wild

Text spotting is an interesting research problem as text may appear at any random place and may occur in various forms. Moreover, the ability to detect text opens the horizons for improving many advanced computer vision problems. In this paper, we propose a novel language-agnostic text detection method utilizing edge-enhanced maximally stable extremal regions (MSERs) in natural scenes by defining strong characterness measures. We show that a simple combination of characterness cues helps in rejecting the non-text regions. These regions are further fine-tuned to reject non-textual neighbor regions. Comprehensive evaluation of the proposed scheme shows that it provides comparable or better generalization performance than traditional methods for this task.

Aarushi Agrawal, Prerana Mukherjee, Siddharth Srivastava, Brejesh Lall
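
A starting-point sketch using OpenCV's MSER detector on an edge-enhanced image; the characterness cues used by the paper to reject non-text regions are replaced here by a crude size/aspect-ratio filter, so both the enhancement and the filter parameters are assumptions.

```python
# MSER candidate extraction on an edge-enhanced image, with a crude non-text filter.
# The paper's characterness cues are replaced by simple size/aspect-ratio checks.
import cv2

def candidate_text_regions(gray):
    # Edge-enhance by subtracting a blurred copy (unsharp masking)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    enhanced = cv2.addWeighted(gray, 1.5, blurred, -0.5, 0)
    mser = cv2.MSER_create()
    regions, boxes = mser.detectRegions(enhanced)
    keep = []
    for (x, y, w, h) in boxes:
        aspect = w / float(h)
        if 10 <= h <= 300 and 0.1 <= aspect <= 10:      # reject implausible shapes
            keep.append((x, y, w, h))
    return keep
```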
Denoising of Volumetric MR Image Using Low-Rank Approximation on Tensor SVD Framework

In this paper, we focus on denoising of additively corrupted volumetric magnetic resonance (MR) images for improved clinical diagnosis and further processing. We consider three-dimensional MR images as third-order tensors. MR image denoising is solved as a low-rank tensor approximation problem, where the non-local similarity and correlation existing in volumetric MR images are exploited. The corrupted images are divided into 3D patches, and similar patches form a group matrix. The group matrices exhibit a low-rank property and are decomposed with the tensor singular value decomposition (t-SVD) technique, and reweighted iterative thresholding is performed on the core coefficients to remove the noise. The proposed method is compared with state-of-the-art methods and shows improved performance.

Hawazin S. Khaleel, Sameera V. Mohd Sagheer, M. Baburaj, Sudhish N. George
V-LESS: A Video from Linear Event Summaries

In this paper, we propose a novel V-LESS technique for generating event summaries from monocular videos. We employ Linear Discriminant Analysis (LDA) as the machine learning approach. First, the video is broken into frames and the features of the frames are analyzed. These frames are then used as input to the model, which classifies them into active and inactive frames using LDA. Clusters are formed from the remaining active frames. Finally, the events are obtained using the key-frames, with the assumption that a key-frame is either the centroid or the frame nearest to the centroid of an event. Users can easily choose the number of key-frames without incurring additional computational overhead. Experimental results on two benchmark datasets show that our model outperforms the state-of-the-art models on Precision and F-measure. It also successfully condenses the video content while retaining the interesting information as events. The computational complexity indicates that the V-LESS model meets the requirements of real-time applications.

Krishan Kumar, Deepti D. Shrimankar, Navjot Singh
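
A sketch of the pipeline as summarized above: LDA classifies frames as active or inactive, the active frames are clustered, and the frame nearest each cluster centroid becomes a key-frame. Frame feature extraction and the LDA training data are placeholders, and the cluster count is an assumption.

```python
# Sketch: LDA filter -> cluster active frames -> key-frames nearest cluster centroids.
# `frame_features` and the LDA training data are placeholders for the real features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cluster import KMeans

def summarize(frame_features, lda_train_X, lda_train_y, n_events=5):
    lda = LinearDiscriminantAnalysis().fit(lda_train_X, lda_train_y)
    active_mask = lda.predict(frame_features) == 1          # assume label 1 = active frame
    active_idx = np.flatnonzero(active_mask)
    active = frame_features[active_idx]
    km = KMeans(n_clusters=n_events, n_init=10, random_state=0).fit(active)
    key_frames = []
    for c in range(n_events):
        members = np.flatnonzero(km.labels_ == c)
        d = np.linalg.norm(active[members] - km.cluster_centers_[c], axis=1)
        key_frames.append(int(active_idx[members[d.argmin()]]))  # frame nearest centroid
    return sorted(key_frames)
```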
Action Recognition from Optical Flow Visualizations

Optical flow is an important computer vision technique used for motion estimation, object tracking and activity recognition. In this paper, we study the effectiveness of the optical flow feature in recognizing simple actions by using only their RGB visualizations as input to a deep neural network. Feeding only the optical flow visualizations, instead of the raw multimedia content, ensures that only a single motion feature is used as a classification criterion. Here, we deal with human action recognition as a multi-class classification problem. In order to categorize an action, we train an AlexNet-like Convolutional Neural Network (CNN) on Farneback optical flow visualization features of the action videos. We have chosen the KTH data set, which contains six types of action videos, namely walking, running, boxing, jogging, hand-clapping and hand-waving. The accuracy obtained on the test set is 84.72%, and it is naturally less than the state of the art since only a single motion feature is used for classification, but it is high enough to show the effectiveness of optical flow visualization as a good distinguishing criterion for action recognition. The AlexNet-like CNN was trained in Caffe on two NVIDIA Quadro K4200 GPU cards, while the Farneback optical flow features were calculated using OpenCV library.

Arpan Gupta, M. Sakthi Balan
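
A sketch of the Farneback optical-flow RGB visualization described above, following the standard OpenCV recipe (hue encodes flow direction, value encodes magnitude); the Farneback parameters shown are common defaults, not necessarily those used by the authors.

```python
# Farneback optical flow between consecutive frames, visualized as an RGB image
# (hue = flow direction, value = magnitude), as used for the CNN input above.
import cv2
import numpy as np

def flow_visualization(prev_bgr, next_bgr):
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros_like(prev_bgr)
    hsv[..., 0] = ang * 180 / np.pi / 2                              # hue: direction
    hsv[..., 1] = 255                                                # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # value: magnitude
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```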
Human Activity Recognition by Fusion of RGB, Depth, and Skeletal Data

A significant increase in research on human activity recognition can be seen in recent years due to the availability of low-cost RGB-D sensors and the advancement of deep learning algorithms. In this paper, we augment our previous work on human activity recognition (Imran et al., IEEE International Conference on Advances in Computing, Communications, and Informatics (ICACCI), 2016) [1] by incorporating skeletal data for fusion. Three main approaches are used to fuse skeletal data with RGB and depth data, and the results are compared with each other. A challenging UTD-MHAD activity recognition dataset with intra-class variations, comprising twenty-seven activities, is used for testing and experimentation. The proposed fusion results in an accuracy of 95.38% (nearly 4% improvement over the previous work), and it also justifies the fact that recognition improves with an increase in the number of supporting evidences.

Pushpajit Khaire, Javed Imran, Praveen Kumar
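
A minimal late-fusion sketch for combining per-modality classifier scores (RGB, depth, skeleton); the equal weighting is an assumption, and the three fusion approaches actually compared in the paper are not reproduced here.

```python
# Minimal late (score-level) fusion of the three modality streams.
# Equal weights are an assumption; the paper compares several fusion variants.
import numpy as np

def late_fusion(rgb_probs, depth_probs, skel_probs, weights=(1/3, 1/3, 1/3)):
    """Each *_probs is an (n_samples, n_classes) array of softmax scores."""
    fused = (weights[0] * np.asarray(rgb_probs)
             + weights[1] * np.asarray(depth_probs)
             + weights[2] * np.asarray(skel_probs))
    return fused.argmax(axis=1)            # predicted activity per sample
```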
Backmatter
Metadata
Title
Proceedings of 2nd International Conference on Computer Vision & Image Processing
Edited by
Prof. Bidyut B. Chaudhuri
Dr. Mohan S. Kankanhalli
Prof. Balasubramanian Raman
Copyright Year
2018
Publisher
Springer Singapore
Electronic ISBN
978-981-10-7895-8
Print ISBN
978-981-10-7894-1
DOI
https://doi.org/10.1007/978-981-10-7895-8