
About this Book

This book constitutes the proceedings of the 7th International Conference on Pattern Recognition and Machine Intelligence, PReMI 2017, held in Kolkata, India, in December 2017.
The 86 full papers presented in this volume were carefully reviewed and selected from 293 submissions. They were organized in topical sections named: pattern recognition and machine learning; signal and image processing; computer vision and video processing; soft and natural computing; speech and natural language processing; bioinformatics and computational biology; data mining and big data analytics; deep learning; spatial data science and engineering; and applications of pattern recognition and machine intelligence.



Invited Talks


An Incremental Fast Policy Search Using a Single Sample Path

In this paper, we consider the control problem in a reinforcement learning setting with large state and action spaces. The control problem most commonly addressed in the contemporary literature is to find an optimal policy which optimizes the long-run $$\gamma $$-discounted transition costs, where $$\gamma \in [0,1)$$. Such formulations also assume access to a generative model/simulator of the underlying MDP, with the hidden premise that realizations of the system dynamics of the MDP for arbitrary policies, in the form of sample paths, can be obtained with ease from the model. In this paper, we consider a cost function which is the expectation of an approximate value function w.r.t. the steady-state distribution of the Markov chain induced by the policy, without having access to the generative model. We assume that a single sample path generated using an a priori chosen behaviour policy is made available. In this information-restricted setting, we solve the generalized control problem using the incremental cross-entropy method. The proposed algorithm is shown to converge to the solution which is globally optimal relative to the behaviour policy.

Ajin George Joseph, Shalabh Bhatnagar

Biometric Counter-Spoofing for Mobile Devices Using Gaze Information

With the rise in the use of biometric authentication on mobile devices, it is important to address the security vulnerability of spoofing attacks where an attacker using an artefact representing the biometric features of a genuine user attempts to subvert the system. In this paper, techniques for presentation attack detection are presented using gaze information with a focus on their applicability for use on mobile devices. Novel features that rely on directing the gaze of the user and establishing its behaviour are explored for detecting spoofing attempts. The attack scenarios considered in this work include the use of projected photos, 2D and 3D masks. The proposed features and the systems based on them were extensively evaluated using data captured from volunteers performing genuine and spoofing attempts. The results of the evaluations indicate that gaze-based features have the potential for discriminating between genuine attempts and imposter attacks on mobile devices.

Asad Ali, Nawal Alsufyani, Sanaul Hoque, Farzin Deravi

Pattern Recognition and Machine Learning


kNN Classification with an Outlier Informative Distance Measure

Classification accuracy of the kNN algorithm is found to be adversely affected by the presence of outliers in the experimental datasets. An outlier score based on rank difference can be assigned to the points in these datasets by taking into consideration the distance and density of their local neighborhood points. In the present work, we introduce a generalized outlier informative distance measure where a factor based on the above score is used to modulate any potential distance function. Properties of the new outlier informative distance measure are presented. Experiments on several numeric datasets in the UCI machine learning repository clearly reveal the effectiveness of the proposed formulation.

Gautam Bhattacharya, Koushik Ghosh, Ananda S. Chowdhury
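The modulation idea described above can be sketched as follows. The specific modulation factor (1 + 0.5·(score_a + score_b)) and the majority-vote rule are illustrative assumptions, not the paper's exact formulation; the point is that a precomputed outlier score inflates any base distance so that outliers are pushed down the neighbour ranking:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def outlier_informative_distance(a, b, score_a, score_b, base=euclidean):
    # Inflate the base distance when either endpoint has a high outlier
    # score, so outliers lose influence in kNN voting.
    factor = 1.0 + 0.5 * (score_a + score_b)
    return factor * base(a, b)

def knn_predict(query, data, labels, scores, k=3):
    # scores: precomputed outlier scores in [0, 1]; the query point is
    # assumed non-outlying (score 0).
    ranked = sorted(
        range(len(data)),
        key=lambda i: outlier_informative_distance(query, data[i], 0.0, scores[i]),
    )
    votes = [labels[i] for i in ranked[:k]]
    return max(set(votes), key=votes.count)
```

With score 0 everywhere the measure reduces to the plain base distance, so the modulation is a strict generalization.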

Tree-Based Structural Twin Support Tensor Clustering with Square Loss Function

Most real-life applications involving images, videos, etc. deal with matrix data (second-order tensor space). Tensor-based clustering models can be utilized for identifying patterns in matrix data, as they take advantage of structural information in a multi-dimensional framework and reduce computational overheads as well. Despite such advantages, tensor clustering has remained a relatively unexplored research area. In this paper, we propose a novel clustering technique, termed Tree-based Structural Least Squares Twin Support Tensor Clustering (Tree-SLSTWSTC), that builds a cluster model as a binary tree, where each node comprises the proposed Structural Least Squares Twin Support Tensor Machine (S-LSTWSTM) classifier, which considers the structural risk minimization of data alongside a symmetrical L2-norm loss function. The proposed approach results in time-efficient learning. An initialization framework based on tensor $$k$$-means has been proposed and implemented in order to overcome the instability introduced by random initialization. To validate the efficacy of the proposed framework, computational experiments have been performed with relevant tensor-based models on face recognition and optical digit recognition datasets.

Reshma Rastogi, Sweta Sharma

Kernel Entropy Discriminant Analysis for Dimension Reduction

The unsupervised techniques for dimension reduction, such as principal component analysis (PCA), kernel PCA and kernel entropy component analysis, do not take information about class labels into consideration. The reduced-dimension representation obtained using unsupervised techniques may not capture the discriminatory information. The supervised techniques, such as multiple discriminant analysis and generalized discriminant analysis, can capture discriminatory information. However, the reduced dimension is limited by the number of classes. We propose a supervised technique, kernel entropy discriminant analysis (kernel EDA), that uses the Euclidean divergence as its criterion function. The Parzen window method for density estimation is used to find an estimate of the Euclidean divergence. The Euclidean divergence estimate is expressed in terms of the eigenvectors and eigenvalues of the kernel Gram matrix. The eigenvalues and eigenvectors that contribute significantly to the Euclidean divergence estimate are used for determining the directions for projection. The effectiveness of the kernel EDA method is demonstrated through improved classification accuracy on benchmark datasets.

Aditya Mehta, C. Chandra Sekhar
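A one-dimensional sketch of the two ingredients named in the abstract above — Parzen-window density estimation with Gaussian windows and the Euclidean divergence D(p, q) = ∫(p − q)² dx — using a simple grid approximation of the integral (the paper instead expresses the estimate through the kernel Gram matrix; the bandwidth and grid here are illustrative assumptions):

```python
import math

def parzen_density(x, samples, h=1.0):
    """Parzen-window density estimate at x: average of Gaussian windows
    of bandwidth h centred on each sample (1-D case)."""
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - xi) / h) ** 2)
               for xi in samples) / len(samples)

def euclidean_divergence(samples_p, samples_q, grid, h=1.0):
    # D(p, q) = integral of (p - q)^2, approximated on a uniform grid.
    dx = grid[1] - grid[0]
    return sum((parzen_density(x, samples_p, h) -
                parzen_density(x, samples_q, h)) ** 2 for x in grid) * dx
```

The divergence is zero for identical sample sets and grows as the two estimated densities separate, which is why it can serve as a class-discrimination criterion.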

A New Method to Address Singularity Problem in Multimodal Data Analysis

In general, the ‘small sample (n)-large feature (p)’ problem of bioinformatics, image analysis, high-throughput molecular screening, astronomy, and other high-dimensional applications makes the features highly collinear. In this context, the paper presents a new feature extraction algorithm to address this ‘large p, small n’ issue associated with multimodal data sets. The proposed algorithm judiciously integrates the concepts of both regularization and shrinkage with canonical correlation analysis to extract important features. To deal with the singularity problem, the proposed method increases the diagonal elements of the covariance matrices using regularization parameters, while the off-diagonal elements are decreased by shrinkage coefficients. The concept of the hypercuboid equivalence partition matrix of the rough hypercuboid approach is used to compute both the significance and relevance measures of a feature. The importance of the proposed algorithm over other existing methods is established extensively on real-life multimodal omics data sets.

Ankita Mandal, Pradipta Maji

Label Correlation Propagation for Semi-supervised Multi-label Learning

Many real-world machine learning tasks suffer from the problem of scarce labeled data. In multi-label learning, each instance is associated with more than one label, as in semantic scene understanding, text categorization and bioinformatics. Semi-supervised multi-label learning has attracted recent interest, as gathering labeled data is expensive and requires manual effort. Further, many of the labels have semantic correlations which manifest as co-occurrence, and this information can be used to build effective classifiers in the multi-label scenario. In this paper, we propose two different graph-based transductive methods, namely, label correlation propagation and k-nearest neighbors based label correlation propagation. Extensive experimentation on real-world datasets demonstrates the efficacy of the proposed methods and the importance of using label correlation information in semi-supervised multi-label learning.

Aritra Ghosh, C. Chandra Sekhar
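A minimal transductive propagation loop in the spirit of the graph-based methods above. The clamping rule and mixing weight alpha are illustrative; the paper's label-correlation variants are more elaborate than this plain label-propagation sketch:

```python
def propagate_labels(W, Y, labeled, alpha=0.2, iters=100):
    """Iterative label propagation on a graph.

    W: symmetric affinity matrix (list of lists); Y: initial label scores
    (one row per node, one column per label); labeled: indices of nodes
    whose scores are clamped back to Y after every step.
    """
    n, m = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    # Row-normalize W so each node averages its neighbours' scores.
    P = [[w / (sum(row) or 1.0) for w in row] for row in W]
    for _ in range(iters):
        F = [[(1 - alpha) * sum(P[i][j] * F[j][k] for j in range(n))
              + alpha * Y[i][k] for k in range(m)] for i in range(n)]
        for i in labeled:          # clamp the known labels
            F[i] = Y[i][:]
    return F
```

On a three-node chain with the two end nodes labeled with different classes, the middle node ends up with equal scores for both labels, which is the transductive behaviour the method relies on.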

Formulation of Two Stage Multiple Kernel Learning Using Regression Framework

Multiple kernel learning (MKL) is an approach to find the optimal kernel for kernel methods. We formulate MKL as a regression problem, so that the data modeling problem involves the computation of two functions, namely, the optimal kernel function, which is related to MKL, and the optimal regression function, which generates the data. As such a formulation demands more space, a supervised pre-clustering technique is used for selecting the vital data points. We use two-stage optimization for finding the models, in which the optimal kernel function is found in the first stage and the optimal regression function in the second stage. Using kernel ridge regression, the proposed method is applied to real-world problems, and the experimental results are found to be promising.

S. S. Shiju, Asif Salim, S. Sumitra
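The two-function structure can be sketched as follows: a fixed convex combination of base RBF kernels stands in for the stage-one optimal kernel (the weights here are assumed given, not learned as in the paper), and closed-form kernel ridge regression in the dual gives the stage-two regression function:

```python
import math

def rbf(x, y, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def combined_kernel(x, y, weights, gammas):
    # MKL-style convex combination of base RBF kernels.
    return sum(w * rbf(x, y, g) for w, g in zip(weights, gammas))

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting (small systems only).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def kernel_ridge_fit(X, y, weights, gammas, lam=1e-3):
    # Dual coefficients alpha = (K + lam*I)^{-1} y.
    n = len(X)
    K = [[combined_kernel(X[i], X[j], weights, gammas)
          + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(K, y)

def kernel_ridge_predict(x, X, alpha, weights, gammas):
    return sum(a * combined_kernel(x, xi, weights, gammas)
               for a, xi in zip(alpha, X))
```

Replacing the fixed weights with an outer optimization loop over them recovers the two-stage scheme the abstract describes.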

A Two-Stage Conditional Random Field Model Based Framework for Multi-Label Classification

Multi-label classification (MLC) deals with the task of assigning an instance to all its relevant classes. This task becomes challenging in the presence of the label dependencies. The MLC methods that assume label independence do not use the dependencies among labels. We present a two-stage framework which improves the performance of MLC by using label dependencies. In the first stage, a standard MLC method is used to get the confidence scores for different labels. A conditional random field (CRF) is used in the second stage that improves the performance of the first-stage MLC by using the label dependencies among labels. An optimization-based framework is used to learn the structure and parameters of the CRF. Experiments show that the proposed model performs better than the state-of-the-art methods for MLC.

Abhiram Kumar Singh, C. Chandra Sekhar

A Matrix Factorization & Clustering Based Approach for Transfer Learning

Recommender systems that make use of collaborative filtering tend to suffer from data sparsity, as the number of items rated by the users is very small compared to the very large item space. To alleviate this, transfer learning (TL) methods have recently seen growing interest, wherein data is considered from multiple domains so that ratings from the first (source) domain can be used to improve prediction accuracy in the second (target) domain. In this paper, we propose a model for transfer learning in collaborative filtering wherein the latent factor model for the source domain is obtained through Matrix Factorization (MF). The user and item matrices are combined in a novel way to generate a cluster-level rating pattern, and Code Book Transfer (CBT) is used to transfer information from the source to the target domain. Results from experiments using benchmark datasets show that our model approximates the target matrix well.

V. Sowmini Devi, Vineet Padmanabhan, Arun K. Pujari
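A minimal latent factor model for the source domain, via SGD matrix factorization over observed entries (zeros treated as missing, as is common in collaborative-filtering sketches). The clustering and codebook-transfer steps of the paper are omitted; the learning rate, regularization and rank are illustrative:

```python
import random

def factorize(R, k=2, steps=3000, lr=0.01, reg=0.02, seed=0):
    """Factorize a rating matrix R ~ U V^T by SGD on observed entries."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.uniform(0.1, 0.9) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(0.1, 0.9) for _ in range(k)] for _ in range(m)]
    obs = [(i, j) for i in range(n) for j in range(m) if R[i][j] != 0]
    for _ in range(steps):
        for i, j in obs:
            # Error on this observed rating, then a regularized SGD step.
            e = R[i][j] - sum(U[i][f] * V[j][f] for f in range(k))
            for f in range(k):
                u, v = U[i][f], V[j][f]
                U[i][f] += lr * (e * v - reg * u)
                V[j][f] += lr * (e * u - reg * v)
    return U, V
```

The resulting U and V are the user and item latent matrices that the abstract's cluster-level rating pattern is built from.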

Signal and Image Processing


Feature Selection and Fuzzy Rule Mining for Epileptic Patients from Clinical EEG Data

In this paper, we create EEG-derived signatures for differentiating epileptic patients from normal individuals. Epilepsy is a neurological condition in humans, mostly treated based on a patient’s seizure symptoms. Clinicians face immense difficulty in detecting epileptic patients. Here we define brain region-connection based signatures from EEG data with the help of various machine learning techniques. These signatures will help clinicians detect epileptic patients in general. Moreover, we define separate signatures by taking into account a few demographic features like gender and age. Such signatures may aid clinicians, along with the generalized epileptic signature, in complex decisions.

Abhijit Dasgupta, Losiana Nayak, Ritankar Das, Debasis Basu, Preetam Chandra, Rajat K. De

Selection of Relevant Electrodes Based on Temporal Similarity for Classification of Motor Imagery Tasks

Selection of relevant electrodes is of prime importance for developing efficient motor imagery Brain Computer Interface devices. In this paper, we propose a novel spectral clustering method based on the temporal similarity of electrodes to select a reduced set of relevant electrodes for the classification of motor imagery tasks. Further, the stationary common spatial pattern method in conjunction with a composite kernel Support Vector Machine is utilized to develop a decision model. Experimental results demonstrate an improvement in classification accuracy in comparison to variants of the common spatial pattern method on publicly available datasets. The Friedman statistical test shows that the proposed method significantly outperforms the variants of the common spatial pattern method.

Jyoti Singh Kirar, Ayesha Choudhary, R. K. Agrawal

Automated Measurement of Translational Margins and Rotational Shifts in Pelvic Structures Using CBCT Images of Rectal Cancer Patients

Clinical radiotherapy procedures aim to achieve high accuracy, which is inhibited by various error sources. As a result, a safety margin is needed to ensure that the planned dosage is delivered to the target. In this work, 3D image coordinates of the pubic symphysis (pb) and coccyx are evaluated from Cone Beam CT images of colo-rectal cancer patients. Using those coordinates, we propose an automated method to obtain systematic and random error components. The standard deviations of the systematic and random errors are used to evaluate the 3D PTV margin. We have also measured rotational variations in the positioning of patients using those locations. We have validated the automated measurements and found that they show a very good match with those made manually by oncologists.

Sai Phani Kumar Malladi, Bijju Kranthi Veduruparthi, Jayanta Mukherjee, Partha Pratim Das, Saswat Chakrabarti, Indranil Mallick

Exploring the Scope of HSV Color Channels Towards Simple Shadow Contour Detection

This paper presents a comparatively simple approach towards shadow contour detection in images. Shadow detection methods of the last decade are based on the chromaticity, physics, geometry and texture of the image, but most of the reported techniques are time consuming and computationally expensive. The presented method is based on color space and color channel selection in terms of gray-level co-occurrence matrix (GLCM) features. The original images are converted from their native RGB color space to HSV color space to examine the separation of shadow and non-shadow regions in different color channels. The study reveals that the value (V) and saturation (S) components are visibly influenced by the shadow, and the same is reflected in their GLCM feature values. Thus, further segmentation and morphological operations on those channels can result in comparatively easier detection of shadows for simple as well as complex cases. The pictorial presentation of the results shows the considerable potential of the presented technique.

Jayeeta Saha, Arpitam Chatterjee

Linear Curve Fitting-Based Headline Estimation in Handwritten Words for Indian Scripts

Most segmentation algorithms for Indian scripts require some prior knowledge about the structure of a handwritten word to efficiently fragment the word into constituent characters. Zone detection is a considerably used strategy for this purpose. Headline estimation is a salient part of zone detection. In the present work, we propose a method that uses simple linear regression for estimating headlines present in handwritten words. This method efficiently detects headline in three Indian scripts, namely Bangla, Devanagari, and Gurmukhi. The proposed method is able to detect headlines in skewed word images and provides accurate result even when the headline is discontinuous or mostly absent. We have compared our method with a recent work to show the efficacy of our proposed methodology.

Rahul Pramanik, Soumen Bag
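The regression step above reduces to an ordinary least-squares line fit, y = m·x + c, over candidate headline pixels — for example the topmost ink pixel in each column of the word image (that candidate-selection rule is an assumption for illustration; the paper's selection may differ). A closed-form fit handles skewed words and tolerates gaps where the headline is discontinuous:

```python
def fit_headline(points):
    """Least-squares line y = m*x + c through candidate headline pixels.

    points: (x, y) coordinates, e.g. the topmost ink pixel per column.
    Works even when columns are missing (discontinuous headline).
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx          # assumes >= 2 distinct x values
    m = (n * sxy - sx * sy) / denom
    c = (sy - m * sx) / n
    return m, c
```

A nonzero slope m directly gives the skew of the word, which is what makes the method robust to rotated word images.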

Object Segmentation in Texture Images Using Texture Gradient Based Active Contours

Active contour models are among the most popular and effective models for object segmentation. These models usually depend on the intensity gradient of the image. However, with such a model it is not possible to segment texture objects, due to the local convergence problem. So, in our proposed active contour model for texture segmentation we use the texture gradient instead of the intensity gradient, computed using the non-decimated complex wavelet transform. Experimental results show that the proposed active contour model can effectively segment texture objects from their complex backgrounds in synthetic as well as natural texture images.

Priyambada Subudhi, Susanta Mukhopadhyay

A Variance Based Image Binarization Scheme and Its Application in Text Segmentation

This paper presents a novel variance-based image binarization scheme for automatic segmentation of text from low-resolution images. First, the variance-based binarization scheme is carried out separately on the three color planes of the image. Then, we merge these planes to obtain the final binarized image. This creates several connected components (CCs). These CCs are then studied in order to segment possible text CCs, and a number of features that discriminate between text and non-text components are considered. Further, KNN and SVM classifiers are applied to the present two-class classification problem. For the training of KNN and SVM, ground-truth information of text CCs and our laboratory-made non-text CCs are considered. We conduct extensive experiments on the publicly available ICDAR 2011 Born Digital data set. For comparison, we consider a number of previously reported methods. Our binarization scheme significantly outperforms the existing methods, and the segmentation results are also satisfactory.

Ranjit Ghoshal, Aditya Saha, Sayan Das
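A global variance criterion per colour plane, followed by an OR-merge of the three binarized planes, gives the flavour of the scheme above (the paper's actual variance rule and merge strategy may differ; the deviation threshold k is an assumed parameter):

```python
import statistics

def binarize_channel(channel, k=1.0):
    """Binarize one colour plane: mark pixels deviating from the plane
    mean by more than k standard deviations (a global variance criterion)."""
    flat = [v for row in channel for v in row]
    mu = statistics.fmean(flat)
    sigma = statistics.pstdev(flat)
    return [[1 if abs(v - mu) > k * sigma else 0 for v in row]
            for row in channel]

def merge_planes(r, g, b):
    # OR-merge the three binarized planes into the final mask,
    # whose connected components become text candidates.
    return [[1 if (rv or gv or bv) else 0
             for rv, gv, bv in zip(rr, gr, br)]
            for rr, gr, br in zip(r, g, b)]
```

In a low-resolution patch that is mostly background, text pixels are exactly the high-deviation pixels, so they survive the thresholding in at least one plane.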

Computer Vision and Video Processing


Variants of Locality Preserving Projection for Modular Face and Facial Expression Recognition

Locality Preserving Projection (LPP) is one of the widely used approaches for finding the intrinsic dimensionality of high-dimensional data by preserving the local structure. Data points which are neighbors but belong to different classes are thereby projected as neighbors in the projection space, causing a problem of discrimination. Various extensions of LPP have been proposed to enhance the discrimination power and achieve better between-class separation. In the case of face recognition using full face images, if any portion of the face image is distorted, it may affect the recognition performance. Humans have the capability to recognize faces even by looking at some parts of the face. This article is an attempt to replicate the same on machines by considering only some of the informative regions of the face. Instead of the entire image, variants of LPP are applied to parts of face images, and recognition is performed by combining the results of their reduced-dimensional representations. Face and facial expression recognition experiments have been performed on some of the benchmark face databases.

Gitam Shikkenawis, Suman K. Mitra

A Robust Color Video Watermarking Technique Using DWT, SVD and Frame Difference

In the last decade there has been a steep rise in the amount of digital media. This has made entertainment comfortable for the consumer but insecure for the producer. With affordable broadband and innumerable methods, protecting proprietary digital media assets is difficult. Digital watermarking is one way to protect these assets. In this paper, we propose an efficient watermarking method for a colour video sequence. This method improves on the execution time of existing techniques by choosing fewer frames in which to embed the watermark, while still maintaining robustness against various attacks. The robustness is measured using PSNR values and the correlation coefficient.

Sai Shyam Sharma, Sanik Thapa, Chaitanya Pavan Tanay

Aggregated Channel Features with Optimum Parameters for Pedestrian Detection

Aggregated Channel Features (ACF) proposed by Dollar [3] provide a strong framework for pedestrian detection. In this paper we show that fine-tuning the parameters of the baseline ACF detector can achieve competitive performance without additional channels and filtering actions. We experimentally determined the optimized values of four parameters of the ACF detector: (1) size of the training dataset, (2) sliding window stride, (3) sliding window size and (4) number of bootstrapping stages. Accordingly, our optimized detector using pre-learned eigen filters achieved state-of-the-art performance compared with other variants of the ACF detector on the Caltech pedestrian dataset.

Blossom Treesa Bastian, C. Victor Jiji

Object Tracking with Classification Score Weighted Histogram of Sparse Codes

Object tracking involves target localization in dynamic scenes using either generative models, discriminative classifiers or their combination. We propose a combined approach consisting of generative models (learned in a sparse representation framework) and discriminative classifiers (SVM). Sparse codes are initially computed from two different dictionaries constructed from foreground and background patches using K-SVD. An SVM learned on these sparse codes provides classifier scores for patches. These scores for sparse codes of patches drawn from a region are used to form a weighted histogram. This weighted histogram of sparse codes forms the object and candidate models. The learned dictionaries provide distinct representations for object and background patches. This discrimination is further enhanced by the classifier scores. The object is localized by maximizing the Bhattacharyya coefficient between the target and candidate models in a particle filter framework. Performance of the proposed tracker is benchmarked on videos from the VOT2014 dataset against existing generative and discriminative approaches. Our proposal was able to handle different challenging situations involving background clutter, in-plane rotations, and scale and illumination changes.

Mathew Francis, Prithwijit Guha

A Machine Learning Inspired Approach for Detection, Recognition and Tracking of Moving Objects from Real-Time Video

In this paper, we address the problem of recognizing moving objects in video images using a Visual Vocabulary model and Bag of Words. Initially, shadow-free images are obtained by background modelling, followed by object segmentation from the video frame to extract the blobs of our objects of interest. Subsequently, we train a Visual Vocabulary model with human body datasets in accordance with our domain of interest for recognition. In training, we use the Bag of Words principle to extract the features necessary for classification in the given domain, and match them with the extracted object blobs obtained by subtracting the shadow-free background from the foreground. We track the detected objects via a Kalman Filter. We evaluate our algorithm on benchmark datasets. A comparative analysis of our algorithm against existing state-of-the-art methods shows very satisfactory results.

Anit Chakrabory, Sayandip Dutta

Does Rotation Influence the Estimated Contour Length of a Digital Object?

In this work, we study the variation of estimated contour length of a digital object when it is rotated with respect to its centroid. This analysis also helps to ascertain the unknown angle of rotation of an object relative to a reference position. Additionally, we propose a new technique for estimating the length of a digital contour based on stitched digital cover. The proposed study on rotational variation of contour length finds applications to various image-registration problems such as the detection of positioning-errors that are often encountered during X-ray imaging of patients. Experimental results are presented for some regular curves, natural objects, and X-ray images.

Sabyasachi Mukherjee, Oishila Bandyopadhyay, Arindam Biswas, Bhargab B. Bhattacharya

Abnormal Crowd Behavior Detection Based on Combined Approach of Energy Model and Threshold

The world population continues to grow, and the size of gatherings at various venues under different circumstances is increasing tremendously. Any mass assembly carries the potential risk of a lethal crowd disaster. Various computer vision algorithms are being developed as part of proactive systems for crowd management. This paper presents a novel approach to identify abnormal crowd behavior. The algorithm employs the optical flow method to estimate displacement vectors of the moving crowd and to compute crowd motion energy. The crowd motion energy is further modified by the crowd motion intensity (CMI). Peaks in the CMI characteristics are indicators of abnormal activity and are detected more accurately by applying a threshold. The algorithm has been tested on the standard UMN dataset with an average accuracy of 91.66%. The accuracy improves with the application of the threshold, and an adaptive threshold may further improve the performance of the algorithm.

Madhura Halbe, Vibha Vyas, Yogita M. Vaidya
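The energy-plus-threshold pipeline above can be sketched as: squared magnitudes of the optical-flow displacement vectors give a per-frame motion energy, and frames whose energy exceeds mean + k·std are flagged as abnormal (the CMI modification is omitted, and the mean + k·std rule is an illustrative stand-in for the paper's threshold):

```python
def motion_energy(flow):
    # flow: displacement vectors (dx, dy) of tracked points in one frame.
    return sum(dx * dx + dy * dy for dx, dy in flow)

def detect_abnormal_frames(energies, k=2.0):
    """Flag frames whose motion energy exceeds mean + k*std."""
    n = len(energies)
    mu = sum(energies) / n
    var = sum((e - mu) ** 2 for e in energies) / n
    thr = mu + k * var ** 0.5
    return [i for i, e in enumerate(energies) if e > thr]
```

An adaptive variant, as the abstract suggests, would recompute the threshold over a sliding window of recent frames instead of the whole sequence.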

Unsupervised Feature Descriptors Based Facial Tracking over Distributed Geospatial Subspaces

Object Tracking has primarily been characterized as the study of object motion trajectory over constraint subspaces under attempts to mimic human efficiency. However, the trend of monotonically increasing applicability and integrated relevance over distributed commercial frontiers necessitates that scalability be addressed. The present work proposes a system for fast large scale facial tracking over distributed systems beyond individual human capabilities leveraging the computational prowess of large scale processing engines such as Apache Spark. The system is pivoted on an interval based approach for receiving the input feed streams, which is followed by a deep encoder-decoder network for generation of robust environment invariant feature encoding. The system performance is analyzed while functionally varying various pipeline components, to highlight the robustness of the vector representations and near real-time processing performance.

Shubham Dokania, Ayush Chopra, Feroz Ahmad, S. Indu, Santanu Chaudhury

Face Detection Based on Frequency Domain Features

In this paper we develop a novel face detection method using Stockwell and log dyadic wavelet transform features, following the cascaded face detector framework. The Stockwell transform (ST) time-frequency distribution of an image region is known for its excellent feature representational capabilities (due to the high resolution of the distribution). The log dyadic wavelet transform (LDWT) is capable of representing image patches with high accuracy. We use the Stockwell transform and the log dyadic wavelet transform to represent facial features effectively. Our face detection method consists of two stages. The first stage consists of a cascade of 4 face detectors constructed using discriminative facial ST features selected by the AdaBoost feature selection method. The second stage consists of a cascade of 4 more face detectors, each of which is an SVM classifier trained with face/non-face LDWT features. We have conducted our face detection experiments on the well-known CMU-MIT and FDDB face detection datasets to verify the efficacy of our method.

B. H. Shekar, D. S. Rajesh

A Study on the Properties of 3D Digital Straight Line Segments

Digital representation of three-dimensional straight line segments is considered here. A 3D straight line, given its direction vector, when projected on any two planes gives two 2D straight lines. The chain codes of these two 2D straight lines combined together give the chain code representation of the corresponding 3D straight line. The properties of 3D digital straight lines are studied and analyzed.

Mousumi Dutt, Somrita Saha, Arindam Biswas

Unlocking the Mechanism of Devanagari Letter Identification Using Eye Tracking

Present-day computers can outperform humans in many complicated tasks very precisely and efficiently. However, in many scenarios like pattern recognition and, more importantly, character recognition, a school-going child can outperform the sophisticated machines available today. Modern machines find handwritten, calligraphic text difficult to recognize because such texts hardly contain rationalized straight lines or perfect loops or circles. Therefore, most optical character recognition systems fail to recognize characters beyond certain levels of distortion and noise. On the other hand, the human brain has a remarkable ability to recognize visual patterns or characters under various distortion conditions at high speed. The present work tries to understand how humans perceive, process and recognize Devanagari characters under various distortion levels. To achieve this objective, an eye tracking experiment was performed on 20 graduate participants by presenting stimuli in decreasing levels of distortion (from highly distorted to more normal). The eye fixation patterns, along with the time course of recognition, reveal the moment-to-moment processing involved in letter identification. By understanding the level of distortion acceptable for correct letter recognition and the processes involved in the identification of the letters, OCR can be made more robust and the gap between human reading and machine reading can be narrowed.

Chetan Ralekar, Tapan K. Gandhi, Santanu Chaudhury

Video Stabilization Using Sliding Frame Window

Shaky videos are visually unappealing to viewers. Digital video stabilization is a technique to compensate for unwanted camera motion and produce a video that looks relatively stable. In this paper, an approach for video stabilization is proposed which works by estimating a trajectory built by calculating motion between continuous frames using the Shi-Tomasi Corner Detection and Optical Flow algorithms for the entire length of the video. The trajectory is then smoothed using a moving average to give a stabilized output. A smoothing radius is defined, which determines the smoothness of the resulting video. Automatically deciding this parameter’s value is also discussed. The results of stabilization of the proposed approach are observed to be comparable with the state of the art YouTube stabilization.

Keerthan S. Shagrithaya, Eeshwar Gurushankar, Deepak Srikanth, Pravin Bhaskar Ramteke, Shashidhar G. Koolagudi
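The moving-average step with a smoothing radius can be sketched for one trajectory component (say, cumulative x-translation between frames); the per-frame stabilizing correction is the difference between the smoothed and raw trajectories. The window clipping at the sequence ends is an implementation assumption:

```python
def smooth_trajectory(traj, radius=2):
    """Moving-average smoothing of a 1-D camera trajectory.

    radius is the smoothing radius: each value is replaced by the mean
    over the window [i - radius, i + radius], clipped at the ends.
    """
    out = []
    for i in range(len(traj)):
        lo, hi = max(0, i - radius), min(len(traj), i + radius + 1)
        out.append(sum(traj[lo:hi]) / (hi - lo))
    return out

def stabilizing_offsets(traj, radius=2):
    # Per-frame correction = smoothed trajectory minus raw trajectory;
    # applying these offsets warps each frame towards the smooth path.
    return [s - t for s, t in zip(smooth_trajectory(traj, radius), traj)]
```

A larger radius yields a smoother output video at the cost of larger corrective warps, which is the trade-off behind choosing the parameter automatically.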

Palmprint and Finger Knuckle Based Person Authentication with Random Forest via Kernel-2DPCA

This paper presents a hand biometric system that fuses information from the palmprint and finger knuckle to check the loopholes present in the transfer of payments through various levels of bureaucratic financial inclusion projects. Initially, novel fixed-size ROIs of the palm and finger knuckle are extracted. The poor-contrast ROI images are enhanced using a modified CLAHE algorithm. To minimize pose and illumination effects, a Line Ordinal Pattern (LOP) based transformation scheme is applied. The generation of a dense feature representation using the dual tree complex wavelet transform can increase the discrimination power of independent local features. Then, the original feature space is mapped into high-dimensional sub-feature sets, where K2DPCA is performed on each subset to extract high-order statistics. To address the matching problem, a high-performance Random Forest method is employed. Finally, the two modalities are combined using a weighted-sum score-level fusion rule, which shows the increased performance (CRR 100%, EER 0.68%, computation time 2130 ms) of the combined approach. The proposed method is evaluated using a virtual combination of the publicly available PolyU palmprint and PolyU FKP databases.

Gaurav Jaswal, Amit Kaul, Ravinder Nath

Soft and Natural Computing


A Fuzzy-LP Approach in Time Series Forecasting

In this study, a novel model is presented to forecast time series data sets based on the fuzzy time series (FTS) concept. To remove various drawbacks associated with the FTS modeling approach, this study incorporates significant changes in existing FTS models. These changes are: (a) applying a linear programming (LP) model within the FTS modeling approach to select an appropriate length of intervals, (b) fuzzifying the historical time series value (TSV) based on its involvement in the universe of discourse, (c) using high-order fuzzy logical relations (FLRs) in decision making, and (d) using the degree of membership (DM) along with the corresponding mid-value of the interval in the defuzzification operation. Together, these changes yield effective results in time series forecasting, which are verified and validated with real-world time series data sets.

Pritpal Singh, Gaurav Dhiman

Third Order Backward Elimination Approach for Fuzzy-Rough Set Based Feature Selection

Two important control strategies for Rough Set based reduct computation are Sequential Forward Selection (SFS) and Sequential Backward Elimination (SBE). SBE methods have the inherent advantage of producing a reduct, whereas SFS approaches usually result in a superset of a reduct. Fuzzy rough sets are an extension of rough sets used for reduct computation in hybrid decision systems. SBE based fuzzy rough reduct computation has not been attempted to date because the fuzzy similarity relation of a set of attributes does not typically carry over to a fuzzy similarity relation on a subset of those attributes. This paper proposes a novel SBE approach to Gaussian kernel-based fuzzy rough set reduct computation. The complexity of the proposed approach is third order, while existing approaches are fourth order. Empirical experiments conducted on standard benchmark datasets establish the relevance of the proposed approach.

Soumen Ghosh, P. S. V. S. Sai Prasad, C. Raghavendra Rao

A Novel OCR System Based on Rough Set Semi-reduct

Most well-known OCR engines, such as Google Tesseract, resort to supervised classification, causing the system to slow down as diversity in font style increases. Hence, with the aim of resolving the tediousness and pitfalls of training an OCR system without compromising its efficiency, we introduce here a novel rough-set-theoretic model. It is designed to effectuate an unsupervised classification of optical characters with a suboptimal attribute set, called the semi-reduct. The semi-reduct attributes are mostly geometric and topological in nature, each having a small range of discrete values estimated from different combinatorial characteristics of rough-set approximations. This eventually leads to quick and easy discernibility of almost all characters irrespective of their font style. For a few indiscernible characters, Tesseract features are used, but very sparingly, in the final stages of the OCR pipeline, so as to ensure an attractive run time for the overall process. Preliminary experimental results demonstrate its scope and promise.

Ushasi Chaudhuri, Partha Bhowmick, Jayanta Mukherjee

Rough Set Rules Determine Disease Progressions in Different Groups of Parkinson’s Patients

Parkinson’s disease (PD) is the second most common neurodegenerative disease (ND) after Alzheimer’s. There is no cure for either ND. Therefore, the purpose of our study was to predict the results of different PD patients’ treatments in order to find an optimal one. We have used rough sets (RS) and machine learning (ML) rules to describe and predict disease progression (UPDRS - Unified Parkinson’s Disease Rating Scale) in three groups of Parkinson’s patients: 23 BMT patients on medication; 24 DBS patients on medication and on DBS therapy (deep brain stimulation) after surgery performed during our study; and 15 POP patients who had surgery earlier (before the beginning of our study). Every PD patient had three visits, approximately every 6 months. The first visit for DBS patients was before surgery. On the basis of the following condition attributes: disease duration, saccadic eye movement parameters, and the neuropsychological PDQ39 and Epworth tests, we have estimated UPDRS changes (as the decision attribute). By means of ML and RS rules obtained from the first visit of BMT/DBS/POP patients, we have predicted UPDRS values in the following year (two visits) with a global accuracy of 70% for both BMT visits, 56% for DBS, and 67% and 79% for the POP second and third visits. We have used rules obtained from BMT patients to predict the UPDRS of DBS patients; for the first session (DBSW1) the global accuracy was 64%, for the second (DBSW2) 85%, and for the third (DBSW3) 74%, but only for DBS patients during stimulation-ON. These rules could not predict UPDRS in DBS patients during stimulation-OFF visits, nor in any condition of POP patients.

Andrzej W. Przybyszewski, Stanislaw Szlufik, Piotr Habela, Dariusz M. Koziorowski

Adversarial Optimization of Indoor Positioning System Using Differential Evolution

This paper presents an adversarial approach to improve the accuracy of an indoor positioning system. In the present work, we propose a system composed of two components which act as adversaries to each other while determining accurate parameters for the equations governing the distance evaluations. Differential Evolution is employed to update the parameters in the continuous domain, in real time, by exploiting the adversarial relation between the two components. Time-of-Arrival (TOA) and Received Signal Strength (RSS) are the two strategies used to evaluate distances independently.

Feroz Ahmad, Sreedevi Indu
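For readers unfamiliar with the optimizer used above, a minimal DE/rand/1/bin loop looks roughly as follows. This is a generic textbook sketch, not the paper's adversarial two-component system; the function name, hyper-parameter values, and the toy objective are all assumptions.

```python
import random

def differential_evolution(fitness, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=42):
    """Minimise `fitness` over the box `bounds` using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than i for the mutation.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    trial.append(min(max(v, lo), hi))  # clamp to the box
                else:
                    trial.append(pop[i][j])
            f_trial = fitness(trial)
            if f_trial <= fit[i]:  # greedy selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]

# Toy check: minimise the 2-D sphere function.
sol, val = differential_evolution(lambda x: sum(v * v for v in x),
                                  bounds=[(-5.0, 5.0)] * 2)
```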

Fast Convergence to Near Optimal Solution for Job Shop Scheduling Using Cat Swarm Optimization

The Job Shop Scheduling problem has a wide range of applications. However, since it is an NP-hard optimization problem, finding an optimal solution in a polynomial amount of time is not always possible. In this paper we propose a heuristic approach to find a near-optimal solution for the Job Shop Scheduling Problem in a predetermined amount of time using Cat Swarm Optimization. The novelty of our approach is a non-conventional way of representing the position of a cat in the search space, which ensures that advantage is taken of spatial locality. Further, while exploring the search space using randomization, we never explore an infeasible solution, which reduces search time. Our proposed approach outperforms some conventional algorithms and achieves nearly 86% accuracy while restricting processing time to one second.

Vivek Dani, Aparna Sarswat, Vishnu Swaroop, Shridhar Domanal, Ram Mohana Reddy Guddeti

Music-Induced Emotion Classification from the Prefrontal Hemodynamics

Most traditional works on emotion recognition utilize the manifestation of emotion in the face, voice, gesture/posture and bio-potential signals of the subjects. However, these modalities of emotion recognition cannot fully justify their significance because of wide variations in these parameters due to habitat and culture. The paper aims at recognizing people's emotions directly from the brain's response to infrared signals, using music as the stimulus. A type-2 fuzzy classifier is used to eliminate the effect of intra- and inter-personal variations in the feature space extracted from the infrared response of the brain. A comparative analysis reveals that the proposed interval type-2 fuzzy classifier outperforms its competitors with classification accuracy as the metric.

Pallabi Samanta, Diptendu Bhattacharya, Amiyangshu De, Lidia Ghosh, Amit Konar

Speech and Natural Language Processing


Analysis of Features and Metrics for Alignment in Text-Dependent Voice Conversion

Voice Conversion (VC) is a technique that converts the perceived speaker identity from a source speaker to a target speaker. Given a parallel training speech database of source and target speakers in text-dependent VC, the first task is to align the source and target speakers' spectral features at frame level before learning the mapping function. The accuracy of alignment affects the learning of the mapping function and hence the voice quality of the converted voice. The impact of alignment is not much explored in the VC literature. Most alignment techniques try to align acoustical features (namely, spectral features such as Mel Cepstral Coefficients (MCC)). However, spectral features represent both speaker- and speech-specific information. In this paper, we analyze the use of different speaker-independent features for the alignment task (namely, unsupervised posterior features, such as Gaussian Mixture Model (GMM)-based posterior features and posterior features Maximum A Posteriori (MAP) adapted from a Universal Background Model (UBM), i.e., GMM-UBM-based posterior features). In addition, we propose to use different metrics, such as the symmetric Kullback-Leibler (KL) and cosine distances, instead of the Euclidean distance for alignment. Our analysis based on % Phone Accuracy (PA) correlates with the subjective scores of the developed VC systems with a Pearson correlation coefficient of 0.98.

Nirmesh J. Shah, Hemant A. Patil
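The two alignment metrics proposed above in place of the Euclidean distance can be written down directly. A minimal sketch over discrete posterior vectors; the epsilon smoothing is an assumption added to avoid taking the log of zero, not part of the paper.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler divergence between two discrete
    probability distributions: KL(p||q) + KL(q||p)."""
    def kl(x, y):
        return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(x, y))
    return kl(p, q) + kl(q, p)
```

In a frame-alignment loop, either function would replace the Euclidean distance when comparing a source frame's posterior vector against each candidate target frame.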

Effectiveness of Mel Scale-Based ESA-IFCC Features for Classification of Natural vs. Spoofed Speech

The performance of biometric systems based on Automatic Speaker Verification (ASV) degrades under spoofing attacks generated using different speech synthesis (SS) and voice conversion (VC) techniques. Results of the recent ASVspoof 2015 challenge indicate that spoof-aware features are a possible solution, rather than focusing on a powerful classifier. In this paper, we investigate the effect of various frequency scales (such as ERB, Mel and linear) applied to a Gabor filterbank. The output of the filterbank is used to exploit the contribution of instantaneous frequency (IF) in each subband energy via the Teager Energy Operator-based Energy Separation Algorithm (TEO-ESA), to capture possible changes in the spectral envelope of spoofed speech. The IF is computed from narrowband components of the speech signal, and the Discrete Cosine Transform (DCT) is applied to deviations in IF, which are referred to as Instantaneous Frequency Cosine Coefficients (IFCC). The classification results on static features show an EER of 1.32% with the Mel frequency scale and 1.87% with the linear scale. With delta features, the EER of the linear frequency scale reduces further to 1.39%, whereas with the Mel scale it increases by 0.64%, on the development set of the ASVspoof 2015 challenge database.

Madhu R. Kamble, Hemant A. Patil

Novel Phase Encoded Mel Filterbank Energies for Environmental Sound Classification

In the Environmental Sound Classification (ESC) task, typically only the magnitude spectrum is processed and the phase spectrum is ignored, which leads to degradation in performance. In this paper, we propose to use phase encoded filterbank energies (PEFBEs) for the ESC task. In the proposed feature set, we use a Mel filterbank, since it represents characteristics of human auditory processing. A Convolutional Neural Network (CNN) is used as the pattern classifier. The experiments were performed on the ESC-50 database. We found that our proposed PEFBEs feature set gives better results than the state-of-the-art Filterbank Energies (FBEs). In addition, score-level fusion of FBEs and the proposed PEFBEs was carried out, which leads to further performance improvement over the individual feature sets. Hence, the proposed PEFBEs capture information complementary to FBEs alone.

Rishabh N. Tak, Dharmesh M. Agrawal, Hemant A. Patil

An Adaptive i-Vector Extraction for Speaker Verification with Short Utterance

A prime challenge in automatic speaker verification (ASV) is to improve performance with short speech segments. The variability and uncertainty of the intermediate model parameters associated with state-of-the-art i-vector based ASV systems increase considerably for short durations. To compensate for this increased variability, we propose an adaptive approach for the estimation of model parameters. The pre-estimated universal background model (UBM) parameters are used for adaptation. The speaker models, i.e., i-vectors, are generated with the proposed adapted parameters. The ASV performance with the proposed approach considerably outperforms the conventional i-vector based system on the publicly available NIST SRE 2010 speech corpus, especially for short durations, as required in real-world applications.

Arnab Poddar, Md Sahidullah, Goutam Saha

Spoken Keyword Retrieval Using Source and System Features

In this paper, a novel excitation source-related feature set, viz., Teager Energy-based Mel Frequency Cepstral Coefficients (T-MFCC), is proposed for the task of spoken keyword detection. Experiments are carried out on the TIMIT database. Furthermore, the state-of-the-art feature set, viz., MFCC, is used as the baseline spectral feature set to implicitly represent vocal tract (i.e., system) information. The idea is to exploit the vocal-source information (and its nonlinear coupling with formants) together with the system-related information embedded in the spoken query. Experimental results show an % EER of 17.23 and 22.58 for the MFCC and proposed T-MFCC features, respectively. However, a significant reduction in % EER, i.e., by 1.8% (as compared to MFCC), is observed when evidence from T-MFCC and MFCC is combined using score-level fusion, indicating that the proposed feature set captures linguistic information (in the spoken keyword) complementary to MFCC alone.

Maulik C. Madhavi, Hemant A. Patil, Nikhil Bhendawade

Novel Gammatone Filterbank Based Spectro-Temporal Features for Robust Phoneme Recognition

Recently, Automatic Speech Recognition (ASR) technology has come into use in practical scenarios and hence the robustness of ASR is becoming increasingly important. State-of-the-art Mel Frequency Cepstral Coefficients (MFCC) features are known to be affected by acoustic noise, whereas physiologically motivated features such as spectro-temporal Gabor filterbank (GBFB) features tend to perform better under signal degradation. Spectro-temporal GBFB feature extraction incorporates a mel filterbank to mimic the frequency mapping of the Basilar Membrane (BM) in the inner ear. In this paper, a Gammatone filterbank is used instead, and a comparison is made between GBFB with mel filterbank (GBFBmel) features and GBFB with Gammatone filterbank (GBFBGamm) features. MFCC features and Gammatone Frequency Cepstral Coefficients (GFCC) features are concatenated with GBFBmel and GBFBGamm features, respectively, to improve recognition performance. Experiments are carried out to calculate phoneme recognition accuracy (PRA) on the TIMIT database (without ‘sa’ sentences), with additive white, volvo and high-frequency noises at various SNR levels from −5 dB to 20 dB. Results show that, with acoustic modeling only, the proposed feature set (GBFBGamm+GFCC) performs better (in terms of PRA %) than GBFBmel+MFCC features by an average of 1%, 0.2% and 0.8% for white, volvo and high-frequency noises, respectively.

Ankit Nagpal, Hemant A. Patil

Neural Networks Compression for Language Modeling

In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). It is known that conventional RNNs, e.g., LSTM-based networks in language modeling, are characterized by either high space complexity or substantial inference time. This problem is especially crucial for mobile applications, in which constant interaction with a remote server is inappropriate. Using the Penn Treebank (PTB) dataset, we compare pruning, quantization, low-rank factorization, and tensor train decomposition for LSTM networks in terms of model size and suitability for fast inference.

Artem M. Grachev, Dmitry I. Ignatov, Andrey V. Savchenko
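Of the compression techniques compared above, magnitude pruning is the simplest to illustrate. A generic sketch over a flat weight list, not the paper's exact procedure; the tie-breaking rule at the threshold is an assumption.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude,
    keeping the large-magnitude weights that carry most of the signal."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # The n_prune-th smallest absolute value becomes the pruning threshold.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

The zeroed weights can then be stored in a sparse format, which is where the model-size reduction comes from.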

A Metaphor Detection Approach Using Cosine Similarity

Metaphor is a prominent figure of speech. Given their prevalence in text and speech, the detection and analysis of metaphors is required for complete natural language understanding. This paper describes a novel method for the identification of metaphors using word vectors. Our method relies on the semantic distance between a word and the corresponding object or action it is applied to. It does not target any particular kind of metaphor but tries to identify metaphors in general. Experimental results on the VU Amsterdam Metaphor Corpus show that our method gives state-of-the-art results compared to previously reported works.

Malay Pramanick, Pabitra Mitra

Named Entity Identification Based Translation Disambiguation Model

Machine Translation (MT) systems are still maturing for Indian languages, where either a translation or a transliteration mechanism is used for a word or phrase. Identifying whether a word needs the translation or the transliteration mechanism is still a challenge. Since Named Entity (NE) terms have the property of similar pronunciation across languages, Named Entity Identification (NEI) is very useful for disambiguating a word in favor of either translation or transliteration. A Term Frequency Model (TFM), i.e., a Cross-Lingual Information Retrieval (CLIR) model, is used to evaluate the NEI based translation disambiguation model.

Vijay Kumar Sharma, Namita Mittal

LEXER: LEXicon Based Emotion AnalyzeR

The huge population of India poses a challenge to government, security and law enforcement. What if we could know the consequences of an event beforehand? Social spaces, such as Twitter, Facebook, and personal blogs, enable people to share their thoughts regarding public issues and topics. Public emotion regarding future and past events, like public gatherings and governmental policies, reflects public beliefs and can be used to gauge the level of support, disorder, or disruption in such situations. Therefore, emotion analysis of Internet content may be beneficial for various organizations, particularly in the government, law enforcement, and security sectors. This paper presents an extension to a state-of-the-art lexicon-based sentiment analysis algorithm for the analysis of human emotions.

Shikhar Sharma, Piyush Kumar, Krishan Kumar

Lexical TF-IDF: An n-gram Feature Space for Cross-Domain Classification of Sentiment Reviews

Feature extraction and selection is a vital step in sentiment classification using machine learning approaches. Existing methods use only the TF-IDF rating to represent either unigram or n-gram feature vectors. Some approaches leverage existing sentiment dictionaries, using the score of a unigram sentiment word as the feature vector and ignoring the TF-IDF rating. In this work, we construct n-gram sentiment features by extracting sentiment words and their intensifiers or negations from a review. The score of an n-gram, constructed from the lexicon score of the sentiment unigram and its intensifier or negation, is then multiplied by the TF-IDF rating to determine the feature score. We experiment with two benchmark data sets for sentiment classification using Support Vector Machine and Maximum Entropy methods with cross-domain validation, taking training and testing data from two different sets, and obtain a substantial improvement in terms of various performance measures compared to existing methods. Cross-domain validation ensures that the proposed method can be applied to the sentiment classification of data sets where example patterns are not available, which is typically the case with commercial data sets.

Atanu Dey, Mamata Jenamani, Jitesh J. Thakkar
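The feature-scoring rule described above (lexicon polarity of the sentiment unigram, adjusted by its intensifier or negation, multiplied by TF-IDF) can be sketched as follows. The IDF smoothing, the modifier handling, and the `intensifier_boost` value are assumptions for illustration, not the paper's exact formulation.

```python
import math

def tf_idf(term, doc_terms, corpus):
    """Raw term frequency times a smoothed inverse document frequency."""
    tf = doc_terms.count(term)
    df = sum(1 for d in corpus if term in d)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0  # smoothed IDF (assumed)
    return tf * idf

def ngram_score(unigram, modifier, doc_terms, corpus, lexicon,
                intensifier_boost=2.0):
    """Feature score of an n-gram: lexicon polarity of the sentiment word,
    flipped by a negation or boosted by an intensifier, times TF-IDF."""
    polarity = lexicon.get(unigram, 0.0)
    if modifier == "not":        # negation flips polarity (assumed rule)
        polarity = -polarity
    elif modifier == "very":     # intensifier amplifies it (assumed rule)
        polarity *= intensifier_boost
    return polarity * tf_idf(unigram, doc_terms, corpus)
```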

A Method for Semantic Relatedness Based Query Focused Text Summarization

In this paper, a semantic relatedness based query focused text summarization technique is introduced to find relevant information in a single text document. The semantic relatedness measure extracts sentences related to the query. The approach can work on short queries, when the query does not contain enough information. This method produces better summaries by including more query-related sentences. Experiments and evaluation are done on the DUC 2005 and 2006 datasets, and the results show significant performance gains.

Nazreena Rahman, Bhogeswar Borah

Bioinformatics and Computational Biology


Efficient and Effective Multiple Protein Sequence Alignment Model Using Dynamic Progressive Approach with Novel Look Back Ahead Scoring System

Multiple protein sequence alignment is the elementary hurdle in addressing further challenges such as the prediction of protein structure and function, protein sub-cellular localization, drug discovery, etc. For the last three decades, numerous models have been proposed to address this challenge; however, these models are either computationally complex or ineffective with respect to the aligned results. In this paper, a computationally efficient and effective model is proposed for multiple protein sequence alignment. Our model follows a dynamic progressive global alignment approach in which sequence pairs are merged dynamically based on a novel scoring system, named Look Back Ahead (LBA). The model's results were validated against aligned reference results on benchmark datasets (PREFAB4refm and SABrem) using four metrics: Sum-of-Pairs (SP), Total Gap Penalty (TGP), Column Score (CS) and Total Mutation Count Pair-wise (TMCP). Experimental results demonstrate that the proposed method outperforms the benchmark reference results in at least three of the evaluation metrics by 77.46% and 68.65% for the PREFAB4refm and SABrem datasets, respectively.

Sanjay Bankapur, Nagamma Patil

Classification of Vector-Borne Virus Through Totally Ordered Set of Dinucleotide Interval Patterns

In genome analysis, a common approach across word-based methods is the use of long words to improve precision in biological findings. However, an arbitrary increase in word length is not always fruitful and causes an increase in space-time complexity. We observe that, instead of merely increasing the length, integrating word intervals along with the order and frequency of their occurrence has great impact on extracting sequence information with a much smaller word length. We devise a method, Dinucleotide Interval Patterns (DIP), for entropy retrieval from ordered sets of dinucleotide intervals. Experiments on natural sequences of Flaviviridae viruses with lengths of 9 to 12 kbp establish that a word size of only 2 bp is capable of deriving a precise taxonomic classification of the virus. This is in sharp contrast to standard word-based methods, which require a minimum word size of 6 bp to achieve nearly 30% Topological Similarity, compared to a 60% score achieved by DIP with only 2 bp.

Uddalak Mitra, Balaram Bhattacharyya
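The core primitive behind DIP, gathering the intervals between successive occurrences of each 2 bp word, can be sketched like this. The downstream entropy computation over the ordered interval sets is omitted; the function name is hypothetical.

```python
def dinucleotide_intervals(seq):
    """For every overlapping 2-mer in `seq`, record the gaps between
    its successive occurrence positions along the sequence."""
    positions = {}
    for i in range(len(seq) - 1):
        positions.setdefault(seq[i:i + 2], []).append(i)
    # Intervals are the differences between consecutive positions per word.
    return {word: [b - a for a, b in zip(pos, pos[1:])]
            for word, pos in positions.items()}
```

For example, in `"ATATCG"` the word `AT` occurs at positions 0 and 2, so its interval list is `[2]`, while words seen only once get an empty list.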

A Quasi-Clique Mining Algorithm for Analysis of the Human Protein-Protein Interaction Network

Protein-protein interactions (PPI) are fundamental to the complete interaction system of every living cell. A protein-protein interaction network (PPIN) can be viewed as an intricate system of proteins linked by the interactions between them. In this work, we developed a new algorithm to find the largest quasi-cliques in the human PPIN. We also identify significant clusters of proteins for subsequent pathway analysis. In the current experimental setup, we have mined 49 quasi-cliques from the human PPIN, with the largest quasi-clique having size 29. Each of these protein clusters is analysed with KEGG pathway analysis. The algorithm has been compared with the state-of-the-art available in this field. We observe that our method is better than other methods in this domain and finds larger quasi-cliques.

Brijesh Kumar Sriwastava, Subhadip Basu, Ujjwal Maulik

Prediction of Thyroid Cancer Genes Using an Ensemble of Post Translational Modification, Semantic and Structural Similarity Based Clustering Results

Thyroid cancer is one of the most prevalent cancers, affecting a large population all over the world. To find effective therapeutic measures against thyroid cancer, it is necessary to identify the potential genes which lead to this disease. In this paper, we consider an ensemble of structural, semantic and post-translational modification (PTM) similarity based clusterings of human genes, using known thyroid cancer genes as seeds. Our purpose is to identify, from the clusters, potential genes which may be responsible for thyroid cancer.

Anup Kumar Halder, Pritha Dutta, Mahantapas Kundu, Mita Nasipuri, Subhadip Basu

mRMR+: An Effective Feature Selection Algorithm for Classification

This paper presents an empirical study using three entropy measures, namely Shannon's entropy, Renyi's entropy, and Tsallis entropy, for calculating the mutual information used to select top-ranked features. We evaluate the selected features using three established classifiers, namely naive Bayes, IBk and Random Forest, in terms of classification accuracy on five gene expression datasets. We observe that none gives consistent performance in ordering the features by rank. To address this issue, we propose a variant of mRMR using an ensemble approach based on our own weight function. The results establish that our method is significantly superior to its counterparts in terms of feature selection and classification accuracy on most of the datasets.

Hussain A. Chowdhury, Dhruba K. Bhattacharyya
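The three entropy measures compared in the study above have simple closed forms over a discrete distribution. A minimal sketch; the mutual-information and ensemble-weighting machinery of mRMR+ itself is omitted, and the default `alpha`/`q` values are assumptions.

```python
import math
from collections import Counter

def probabilities(xs):
    """Empirical probability of each distinct value in the sample."""
    counts = Counter(xs)
    n = len(xs)
    return [c / n for c in counts.values()]

def shannon(xs):
    """Shannon entropy in bits: -sum(p * log2 p)."""
    return -sum(p * math.log2(p) for p in probabilities(xs))

def renyi(xs, alpha=2.0):
    """Renyi entropy of order alpha (alpha != 1): log2(sum p^alpha) / (1 - alpha)."""
    return math.log2(sum(p ** alpha for p in probabilities(xs))) / (1.0 - alpha)

def tsallis(xs, q=2.0):
    """Tsallis entropy of order q (q != 1): (1 - sum p^q) / (q - 1)."""
    return (1.0 - sum(p ** q for p in probabilities(xs))) / (q - 1.0)
```

All three agree that a uniform sample is maximally uncertain and a constant sample has zero entropy, but they weight rare values differently, which is exactly why the ranking of features can differ between them.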

Topological Inquisition into the PPI Networks Associated with Human Diseases Through Graphlet Frequency Distribution

In this article, we propose a new framework to compare the topological structure of protein-protein interaction (PPI) networks constructed from disease associated proteins. Similarity of local topological structure between networks is discovered through the analysis of frequent sub-patterns occurring in them, using a novel similarity measure based on graphlet frequency distribution. Graphlets are small connected non-isomorphic induced subgraphs of a network which provide detailed topological statistics about it. We have analyzed the pairwise similarity of 22 disease associated PPI networks and compared their topological and biological characteristics. It has been observed that the PPI networks associated with the disease classes ‘metabolic’ and ‘neurological’ have the highest similarity scores. Higher similarity has also been observed for the networks of the disease classes ‘bone’ and ‘skeletal’; ‘endocrine’ and ‘multiple’; and ‘gastrointestinal’ and ‘respiratory’. Topological analysis of the networks also reveals that the degree and betweenness centrality of proteins are strongly correlated for network pairs with high similarity scores. We have also performed gene ontology and pathway based analyses of the proteins involved in the disease associated networks.

Debjani Bhattacharjee, Sk Md Mosaddek Hossain, Raziya Sultana, Sumanta Ray

Machine Learning Approach for Identification of miRNA-mRNA Regulatory Modules in Ovarian Cancer

Ovarian cancer is a fatal gynecologic cancer. Altered expression of biomarkers leads to this deadly cancer. Therefore, understanding the underlying biological mechanisms may help in developing robust diagnostic as well as prognostic tools. Various studies have demonstrated that the pathways associated with ovarian cancer have dysregulated miRNA as well as mRNA expression. Identification of miRNA-mRNA regulatory modules may help in understanding the mechanisms of the altered ovarian cancer pathways. In this regard, an existing robust mutual information based Maximum-Relevance Maximum-Significance algorithm has been used for the identification of miRNA-mRNA regulatory modules in ovarian cancer. A set of miRNA-mRNA modules is identified first, and then their association with ovarian cancer is studied exhaustively. The effectiveness of the proposed approach is compared with existing methods. The proposed approach is found to generate more robust integrated miRNA-mRNA networks in ovarian cancer.

Sushmita Paul, Shubham Talbar

Data Mining and Big Data Analytics


K-Means Algorithm to Identify $$k_1$$-Most Demanding Products

This paper attempts to identify $$k_1$$-most demanding products using the K-Means clustering algorithm. A comparison of the proposed algorithm with existing algorithms has been made. Experiments performed on synthetic and real datasets show the effectiveness of the proposed algorithm.

Ritesh Kumar, Partha Sarathi Bishnu, Vandana Bhattacherjee
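For reference, the clustering primitive underlying the approach above is Lloyd's iteration, shown here on 1-D toy demand values. A generic sketch, not the paper's algorithm; the initialization rule and the data are assumptions.

```python
def k_means(points, k, iters=50):
    """Lloyd's algorithm on 1-D data. Initial centroids are the k smallest
    distinct values (a simple deterministic choice for illustration)."""
    centroids = sorted(set(points))[:k]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid moves to its cluster's mean.
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids

# Two clearly separated groups of demand values.
centers = k_means([1.0, 1.2, 0.8, 10.0, 10.5, 9.5], k=2)
```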

Detection of Atypical Elements by Transforming Task to Supervised Form

The problem of identifying atypical elements in a data set presents many difficulties at every stage of analysis. For instance, it is not clear which traits should distinguish such elements; moreover, their natural pattern cannot be known in advance and, even if it did exist, it would by its nature be significantly limited. The subject of the presented research is a procedure for transforming the problem of detecting atypical elements from an unsupervised task into a supervised one with equal-sized patterns. This permits a suitable analysis, in particular the use of diverse, well-developed classification methods. Elements are considered atypical by their rare occurrence, which, coupled with the application of nonparametric methodology, enables their detection not only on the peripheries of the distribution but also, in the multimodal case, potentially located inside it.

Piotr Kulczycki, Damian Kruszewski

Mining Rare Patterns Using Hyper-Linked Data Structure

Rare pattern mining has emerged as a compelling field of research over the years. Experimental results from the literature illustrate that tree-based approaches are the most efficient among rare pattern mining techniques. Despite their significance, tree-based approaches become inefficient when dealing with sparse data and data with short patterns, and also suffer from memory limitations. In this study, an efficient rare pattern mining technique is proposed that employs a hyper-linked data structure to overcome the shortcomings of tree based approaches. The hyper-linked data structure enables dynamic adjustment of links during the mining process, which reduces the space overhead and performs better on sparse datasets.

Anindita Borah, Bhabesh Nath

Random Binary Search Trees for Approximate Nearest Neighbour Search in Binary Space

Approximate nearest neighbour (ANN) search is one of the most important problems in computer science fields such as data mining and computer vision. In this paper, we focus on ANN search for high-dimensional binary vectors and propose a simple yet powerful search method that uses Random Binary Search Trees (RBST). We apply our method to a dataset of 1.25M binary local feature descriptors obtained from a real-life image-based localisation system provided by Google as part of Project Tango [7]. An extensive evaluation of our method against state-of-the-art variations of Locality Sensitive Hashing (LSH), namely Uniform LSH and Multi-probe LSH, shows the superiority of our method in terms of retrieval precision, with a performance boost of over 20%.

Michał Komorowski, Tomasz Trzciński
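The exact search that an ANN index such as RBST approximates is a linear Hamming-distance scan; for binary descriptors packed into integers it reduces to an XOR and a popcount. A baseline sketch only, not the RBST structure itself; the function names are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary vectors
    packed as integers: count the 1-bits of their XOR."""
    return bin(a ^ b).count("1")

def nearest_neighbour(query, database):
    """Exact linear-scan nearest neighbour in Hamming space; an ANN
    index trades a little precision for a fraction of this cost."""
    return min(database, key=lambda d: hamming(query, d))
```

On 1.25M descriptors this scan costs 1.25M distance evaluations per query, which is the cost the tree-based index is designed to avoid.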

A Graphical Model for Football Story Snippet Synthesis from Large Scale Commentary

Sports commentaries offer sparse and redundant information in a lengthy format, whereas patterns can be observed in news articles written by sports journalists. In this paper, we propose a graphical method to synthesise story snippets from football match commentaries. Our model effectively extracts important information from lengthy text documents. Experimental study reveals that our model's output closely matches human expectations. Both qualitative and quantitative analyses demonstrate the effectiveness of the proposed method.

Anirudh Vyas, Sangram Gaikwad, Chiranjoy Chattopadhyay

An Efficient Approach for Mining Frequent Subgraphs

Graph-based data mining techniques, known as graph mining, are capable of modeling several real-life complex structures, such as roads, maps, computer or social networks, and chemical structures, as graphs. Useful information can be mined by discovering the frequent subgraphs. However, existing approaches to mining frequent subgraphs have significant drawbacks in terms of efficiency. In this paper, we focus on real-time frequent subgraph mining and propose an efficient customized data structure and technique to reduce subgraph isomorphism checking, as well as a supergraph based optimized descendant generation algorithm. Extensive performance analyses prove the efficiency of our algorithm over existing methods.

Tahira Alam, Sabit Anwar Zahin, Md. Samiullah, Chowdhury Farhan Ahmed

Image Annotation Using Latent Components and Transmedia Association

During the last decade, image collections have grown considerably. Searching these voluminous image databases over the web requires either visual features or text available in the form of captions or tags. Textual search, which is easier for representing information and more reliable for accessing it, is therefore generally used to explore huge image databases. An issue with such search is the ambiguity in tagging, i.e., the same content being tagged with different words by different users. To address these issues, we propose a simple and effective image annotation model based on probabilistic latent component analysis (PLCA). In our framework, the probabilistic model serves a twofold purpose: labelling textual words against images, and identifying the visual features used for tagging. In this paper, we resolve the multi-tag problem for each image. The approach has been rigorously tested on the LabelMe dataset and the results are encouraging, facilitating multiple relevant tags for a given input image.

Anurag Tripathi, Abhinav Gupta, Santanu Chaudhary, Brejesh Lall

Incremental Learning of Non-stationary Temporal Causal Networks for Telecommunication Domain

In today’s competitive telecommunication industry, understanding the causes that influence revenue is important. In a continuously evolving business environment, these causes keep changing. To understand and quantify the effect of different factors, we model them as a non-stationary temporal causal network. To handle the massive volume of data, we propose a novel framework in which we define rules to identify concept drift, and we propose an incremental algorithm for learning a non-stationary temporal causal structure from streaming data. We apply the framework to a telecommunication operator’s data: the framework detects the concept drift related to changes in revenue associated with data usage, and the incremental causal network learning algorithm updates the knowledge accordingly.

Ram Mohan, Santanu Chaudhury, Brejesh Lall

Effectiveness of Representation and Length Variation of Shortest Paths in Graph Classification

Kernel methods are widely used for the classification of graphs. Different graph kernels have been proposed based on certain properties of graphs, such as shortest paths, random walks, subtree patterns, and subgraphs. Since shortest paths have been used in several graph kernel designs, we make a detailed analysis of the effectiveness of their representation, and of the effect of varying shortest-path length, in the classification of node-labeled graphs. We identified that certain modifications to their conventional representation, and the resultant feature extraction, give better results and/or more efficient feature representations than using them in their trivial definition. The effectiveness of the resulting representations and length variations is analyzed through their ability to classify labeled graphs with an appropriate graph kernel design using support vector machines.

Asif Salim, S. S. Shiju, S. Sumitra

An Efficient Encoding Scheme for Dynamic Multidimensional Datasets

Big Data involves datasets of composite structure, undefined volume, and unspecified rate [1]. The index array lags behind conventional approaches in maintaining data velocity, as it allows only subjective expansion at the boundary of an array dimension. The major concern of large-volume applications like Big Data is to handle data volume and high velocity for further operations. In this paper we offer a scalable encoding scheme that replaces data block allocation with segment allocation and reorganizes the n dimensions of an array into 2 dimensions only. Hence it requires only 2 indices for data encoding and offers low indexing cost.

Mehnuma Tabassum Omar, K. M. Azharul Hasan
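The idea of reorganizing an n-dimensional index into a smaller number of indices can be illustrated with ordinary row-major linearization; this is only a generic sketch of index encoding, not the segment-allocation scheme proposed in the paper:

```python
def encode(index, shape):
    """Map an n-dimensional index to a single row-major offset."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset

def decode(offset, shape):
    """Invert encode(): recover the n-dimensional index from the offset."""
    index = []
    for n in reversed(shape):
        index.append(offset % n)
        offset //= n
    return tuple(reversed(index))

shape = (4, 5, 6)                      # an illustrative 3-D array extent
assert decode(encode((2, 3, 4), shape), shape) == (2, 3, 4)
```

A scheme like the paper's uses two indices rather than one so that one dimension can grow without relocating existing data, which a single flat offset cannot accommodate.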

Deep Learning


Stacked Features Based CNN for Rotation Invariant Digit Classification

Convolutional neural networks extract deep features from an input image. These features are invariant to small distortions in the input, but are sensitive to rotations, which makes them inefficient for classifying rotated images. We propose an architecture that requires training with images having digits at one orientation, but is able to classify rotated digits oriented at any angle. Our network uses any simple CNN unit, training it with single-orientation images and applying it multiple times at test time to accomplish rotation-invariant classification. By using CNNs trained on prominent features of images, we create a stacked architecture which gives satisfactory classification accuracy. We demonstrate the architecture on handwritten digit classification and on the benchmark mnist-rot-12k. The introduced method is also capable of roughly identifying the orientation of the digit in an image.

Ayushi Jain, Gorthi R. K. Sai Subrahmanyam, Deepak Mishra

Improving the Performance of Deep Learning Based Speech Enhancement System Using Fuzzy Restricted Boltzmann Machine

Supervised speech enhancement based on machine learning is a new paradigm for segregating clean speech from background noise. The current work presents a supervised speech enhancement system based on a robust deep learning method in which the pre-training phase of the deep belief network (DBN) is conducted by employing fuzzy restricted Boltzmann machines (FRBM) instead of regular RBMs. The FRBM model is observed to perform better than the RBM model, particularly when the training data is noisy. Our experimental results on various noise scenarios show that the proposed approach outperforms conventional DNN-based speech enhancement methods which use regular RBMs for unsupervised pre-training.

Suman Samui, Indrajit Chakrabarti, Soumya K. Ghosh

A Study on Deep Convolutional Neural Network Based Approaches for Person Re-identification

Person re-identification is the process of identifying the same person across cameras with disjoint fields of view. It is a challenging problem due to the visual ambiguity in a person’s appearance across different camera views. These difficulties are often compounded by low-resolution surveillance images, occlusion, background clutter and varying lighting conditions. In recent years, the person re-identification community has obtained large annotated datasets, and deep learning based approaches have achieved significant improvements in accuracy over hand-crafted approaches. In this survey paper, we classify deep learning based approaches into two categories, i.e., image-based and video-based person re-identification. We also present currently ongoing work, open issues and future directions for person re-identification.

Harendra Chahar, Neeta Nain

Two-Stream Convolutional Network with Multi-level Feature Fusion for Categorization of Human Action from Videos

This paper presents the results of exploring a two-stream Convolutional Neural Network (2S-CNN) architecture, with a novel feature fusion technique at multiple levels, to categorize events in videos. The two streams combine dense optical flow features with: (a) RGB frames; and (b) salient object regions detected using a fast space-time saliency method. The main contribution is the design of a classifier-moderated method to fuse information from the two streams at multiple stages of the network, which enables capturing the most discriminative and complementary features for localizing the spatio-temporal attention for the action being performed. This mutual auto-exchange of information in local and global contexts produces an optimal combination of appearance and dynamism for enhanced discrimination, yielding the best categorization performance. The network is trained end-to-end and evaluated on two challenging human action recognition benchmark datasets, viz. UCF-101 and HMDB-51, where the proposed 2S-CNN method outperforms current state-of-the-art ConvNets by a significant margin.

Prateep Bhattacharjee, Sukhendu Das

Learning Deep Representation for Place Recognition in SLAM

Closing loops for pose graph optimization, by recognising previously mapped places, is an essential step in performing Simultaneous Localisation and Mapping (SLAM). Traditional approaches to recognising known places follow a feature-based bag-of-words model while discarding certain geometric and structural information. In order to improve real-time query performance, we take a slightly different approach by learning low-dimensional global representation vectors using a deconvolution net. The proposed 12-layer deconvolution net encodes and decodes an image to itself and in the process learns a representation of the image in a reduced feature space, which is then used to compare one image with another to identify loop closures. Sequences from the KITTI Visual Odometry dataset are used for evaluation and performance is compared with state-of-the-art techniques. Perceptual aliasing, common in most place recognition approaches, is considerably reduced in ours.

Aritra Mukherjee, Satyaki Chakraborty, Sanjoy Kumar Saha

Performance of Deep Learning Algorithms vs. Shallow Models, in Extreme Conditions - Some Empirical Studies

Deep convolutional neural networks (DCNN) exhibit exceptionally good classification performance despite their massive size. The effect of a large noise term, as the irreducible error in Expected Prediction Error (EPE), is first discussed. Through extensive systematic experiments, we show how in extreme conditions traditional approaches fare on par with large neural networks, which generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks trained for classification barely fit a random labeling of the training data, taken as an extreme condition to learn. This phenomenon is quantitatively unaffected even if we train the CNNs with completely inseparable data, which can result from a large degree of corruption of the entire data by random noise, or from random labels associated with the data due to observation error. We corroborate these experimental findings by showing that a depth-six CNN (VGG-6) fails to overcome large noise in image signals.

Samik Banerjee, Prateep Bhattacharjee, Sukhendu Das

Deep Learning in the Domain of Multi-Document Text Summarization

Text summarization is the process of generating a shorter version of an input text that captures its most important information. This paper addresses the problem of extractive text summarization, which works by selecting a subset of phrases or sentences from the original document(s) to form a summary. Selection of such sentences is done based on certain criteria which formulate a feature set. A Multilayer ELM (Extreme Learning Machine), based on a deep network architecture, is trained over this feature set to classify sentences as important or unimportant. The approach is unique and highlights the effectiveness and stability of Multilayer ELM in the domain of text summarization. Its effectiveness is justified by experimental results on the DUC and TAC datasets, where it significantly outperforms other well-known classifiers.

Rajendra Kumar Roul, Jajati Keshari Sahoo, Rohan Goel

Space-Time Super-Resolution Using Deep Learning Based Framework

This paper introduces a novel end-to-end deep learning framework to learn the space-time super-resolution (SR) process. We propose a coupled deep convolutional auto-encoder (CDCA) which learns the non-linear mapping between convolutional features of up-sampled low-resolution (LR) video sequence patches and convolutional features of high-resolution (HR) video sequence patches. Up-sampling of the LR video refers to tri-cubic interpolation in both space and time. We also propose an H.264/AVC compatible video space-time SR framework using the learned CDCA, which enables super-resolving compressed LR video with less computational complexity. Experimental results show that the proposed H.264/AVC compatible framework performs better than state-of-the-art space-time SR techniques in terms of quality and time complexity.

Manoj Sharma, Santanu Chaudhury, Brejesh Lall

A Spatio-temporal Feature Learning Approach for Dynamic Scene Recognition

A dynamic scene in a video comprises a specific spatio-temporal pattern. A mask can learn features more efficiently than the sliding-kernel approach of a convolutional neural network, which already shrinks the number of parameters with respect to non-sliding, fully connected neural networks. In this paper, 3DPyraNet-F, a discriminative approach to spatio-temporal feature learning, is proposed for dynamic scene recognition. It performs transfer learning by taking the highest layer of the learned network structure and combining it with a linear-SVM classifier, in a way that enhances recognition of dynamic scenes in videos. Encouraging results are achieved despite the lower computational cost, fewer parameters, and camera-induced motion. The method outperforms the state of the art on the Maryland-in-the-wild dataset and shows comparable results on the YUPENN dataset.

Ihsan Ullah, Alfredo Petrosino

Spatial Data Science and Engineering


Spatial Distribution Based Provisional Disease Diagnosis in Remote Healthcare

Patients in rural India are often unable to enquire about their health using appropriate disease-related keywords submitted as a query. Lack of domain knowledge prevents the patients from refining the query using well-known feedback mechanisms. Moreover, due to the scarcity of doctors in rural India, the health assistants who run the health centers do not have enough knowledge to treat patients based on such imprecise queries. In this paper, we propose an autonomous provisional disease diagnosis system that classifies a query expanded using the semantics of the domain knowledge. First, we apply a spatial distribution based nearest neighbor spacing distribution (NNSD) on a disease-related medical document corpus (MDC) to find relevant terms, mostly symptoms, with respect to different diseases. We frame a symptom vocabulary (SV) from the unique terms present in the different diseases, known a priori. Each query is expanded into a bag of symptoms (BoS) using a 5-gram collocation model and the log likelihood ratio (LLR) to measure the association between the query and the terms in the MDC. The terms in the BoS may not exactly match the symptoms in the SV but have contextual similarity, so we propose a novel approach to determine which symptoms in the SV are nearest in context to the corresponding terms in the BoS. The feature vector, which is sparse in nature, is obtained by encoding the SV with respect to (w.r.t.) each BoS. We apply a sparse representation based classifier (SRC) to classify the query into a particular disease. The proposed nearest neighbor spacing distribution based sparse representation classifier (NNSD-SRC) shows promising performance on the MDC dataset, and we validate the results with doctors, showing negligible error.

Indrani Bhattacharya, Jaya Sil
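The log likelihood ratio (LLR) used here to score term association can be sketched with Dunning's G² statistic over a 2×2 contingency table of co-occurrence counts; the cell labels and counts below are illustrative assumptions, not values from the paper:

```python
import math

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table:
    k11 = both events co-occur, k12 = first without second,
    k21 = second without first, k22 = neither occurs."""
    def h(*ks):
        # unnormalised entropy term: sum k*ln(k/total), skipping empty cells
        total = sum(ks)
        return sum(k * math.log(k / total) for k in ks if k > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)
                - h(k11 + k21, k12 + k22))

# Independent events score ~0; strong co-occurrence scores high.
weak = llr(10, 10, 10, 10)     # ~0.0
strong = llr(20, 0, 0, 20)     # large positive value
```

High LLR between a query term and a corpus term signals a genuine collocation rather than chance co-occurrence, which is what drives the BoS expansion.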

Extraction of Phenotypic Traits for Drought Stress Study Using Hyperspectral Images

High-throughput identification of digital traits that encapsulate the changes in a plant’s internal structure under drought stress, based on hyperspectral imaging (HSI), is a challenging task due to the high spectral and spatial resolution of HSI data and the lack of labelled data. This work therefore proposes a novel framework for phenotype discovery based on autoencoders, trained using Simple Linear Iterative Clustering (SLIC) superpixels. The distinctive archetypes among the learnt digital traits are selected using simplex volume maximisation (SiVM). Their accumulation maps are employed to reveal differential drought responses of wheat cultivars based on t-distributed stochastic neighbour embedding (t-SNE), and the separability is quantified using the cluster silhouette index. Unlike prior methods using raw pixels, or feature vectors computed by fusing predefined indices, as phenotypic traits, the proposed framework separates the plant responses into three classes at a finer granularity, showing its potential for the discovery of data-driven phenotypes to quantify drought stress responses.

Swati Bhugra, Nitish Agarwal, Shubham Yadav, Soham Banerjee, Santanu Chaudhury, Brejesh Lall

Spatio-Temporal Prediction of Meteorological Time Series Data: An Approach Based on Spatial Bayesian Network (SpaBN)

This paper proposes a space-time model for prediction of meteorological time series data. The proposed prediction model is based on a spatially extended Bayesian network (SpaBN), which helps to efficiently model the complex spatio-temporal dependency among a large number of spatially distributed variables. Validation has been carried out on the prediction of daily temperature, humidity, and precipitation rate around the spatial region of Kolkata, India. A comparative study with benchmark and state-of-the-art prediction techniques demonstrates the superiority of the proposed spatio-temporal prediction model.

Monidipa Das, Soumya K. Ghosh

Adaptive TerraSAR-X Image Registration (AIR) Using Spatial Fisher Kernel Framework

TerraSAR-X image registration is a forerunner for remote sensing applications like target detection, which need an accurate spatial transformation between the real-time sensed image and the off-line reference image. It is observed that the outcome of registering two TerraSAR images, even when acquired from the same sensor, is unpredictable even with all the parameters of the feature extraction, matching and transformation algorithms fixed. Hence, we approach the problem by trying to predict whether two given TerraSAR-X images can be registered, without actually registering them. The proposed adaptive image registration (AIR) approach incorporates a classifier into the standard pipeline of feature-based image registration. The attributes for the classifier model are derived by fusing the spatial parameters of the feature detector with the descriptor vector in a Fisher kernel framework. We demonstrate that the proposed AIR approach saves the time of feature matching and transformation estimation for SAR images which cannot be registered.

B. Sirisha, Chandra Sekhar Paidimarry, A. S. Chandrasekhara Sastry, B. Sandhya

Applications of Pattern Recognition and Machine Intelligence


Hierarchical Ranking of Cricket Teams Incorporating Player Composition

We analyze the performance of international ODI cricket teams through a hierarchical ranking scheme. Players are represented as the nodes of a graph, with batsmen as authorities and bowlers as hubs. Low-level player ratings, determined using a weighted HITS algorithm, are fed into a higher-level Elo rating system to rank teams. Strike rate, economy rate, number of boundaries, etc. determine the edge weights of the player graph, while match characteristics like margin of victory, winning style and player rankings determine the K-factor in the Elo ratings. We show that player composition, along with match characteristics, is an important aspect of team ranking which has not been explored previously. We report significant improvements in predicting match outcomes over other ranking schemes.

Abhinav Agarwalla, Madhav Mantri, Vishal Singh
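The Elo layer of such a scheme applies the standard expected-score update; a minimal version with a fixed K-factor (whereas the paper derives K from match characteristics such as margin of victory and winning style) might look like:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One Elo update for a match between teams rated r_a and r_b.
    score_a is 1 for a win by team A, 0.5 for a draw/tie, 0 for a loss;
    k is the K-factor controlling rating volatility."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new

# Equal-rated teams, team A wins: A gains k/2 points, B loses k/2.
a, b = elo_update(1500, 1500, 1)
```

Because the update is zero-sum, total rating is conserved; making K depend on match characteristics, as the paper does, lets decisive wins move ratings further than narrow ones.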

Smart Water Management: An Ontology-Driven Context-Aware IoT Application

This paper presents a context-aware, ontology-driven approach to water resource management in smart cities for providing adequate water supply to citizens. Appropriate management of water requires an efficient action plan to review the prevailing causes of water shortage in a geospatial environment. This involves analysis of historical and real-time water-specific information captured through heterogeneous sensors. Since the gathered contextual data is available in different formats, interoperability across diverse data requires converting it into a common perceivable RDF format. As the perceptual model of the Smart Water domain comprises observable media properties of its concepts, we employ multimedia ontology based semantic mapping to achieve context-aware data fusion. The multimedia ontology, encoded in the Multimedia Web Ontology Language (MOWL), forms the core of our IoT based smart water application. It supports Dynamic Bayesian Network based probabilistic reasoning to predict changing situations under irregular real-time environmental patterns. Finally, the paper presents a context-aware approach to deal with uncertainties in water resources in the face of environmental variability, and to offer timely notice to water authorities by circulating warnings via text messages or emails. To illustrate the usability of the presented approach, we have utilized publicly available sample water datasets.

Deepti Goel, Santanu Chaudhury, Hiranmay Ghosh

Structured Prediction of Music Mood with Twin Gaussian Processes

Music mood is one of the most frequently used descriptors when people search for music, but due to its subjective nature, mood is difficult to estimate accurately. In this work, we propose a structured prediction framework to model the valence and arousal dimensions of mood jointly, without requiring multiple regressors. A confidence-interval based estimated consensus from crowdsourced annotations is first learned, along with the reliabilities of the various annotators, to serve as the ground truth; it is shown to perform better than using average annotation values. A variational Bayesian approach is used to learn the Gaussian mixture model representation of the acoustic features. Using an efficient implementation of Twin Gaussian processes for structured regression, the proposed work achieves an improvement in $$R^2$$ of $$9.3\%$$ for arousal and $$18.2\%$$ for valence relative to state-of-the-art techniques.

Santosh Chapaneri, Deepak Jayaswal
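The $$R^2$$ figure reported above is the standard coefficient of determination for a regressor; a minimal reference computation (inputs here are illustrative):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Equals 1 for perfect predictions, 0 for predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

perfect = r_squared([1, 2, 3], [1, 2, 3])   # 1.0
naive = r_squared([1, 2, 3], [2, 2, 2])     # 0.0 (mean predictor)
```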

Differentiating Pen Inks in Handwritten Bank Cheques Using Multi-layer Perceptron

In handwritten bank cheques, the addition of new words using a similar-colored pen can cause huge losses. Hence, it is important to differentiate the pen inks used in these types of documents. In this work, we propose a non-destructive pen ink differentiation method using statistical features of ink and a multi-layer perceptron (MLP) classifier. A large sample of blue and black pen ink was acquired from 112 bank cheque leaves, written by nine different volunteers using fourteen different blue and black pens. Handwritten words are extracted from the scanned cheque images manually, and pen ink pixels are identified using K-means binarization. Fifteen statistical features are extracted from the handwritten words of each color, and the task is formulated as a binary classification problem. An MLP classifier is trained to differentiate pen inks in handwritten bank cheques. The proposed method performs efficiently on both known and unknown pen samples, with average accuracies of 94.6% and 93.5% respectively, and a comparison with an existing method shows its efficiency.

Prabhat Dansena, Soumen Bag, Rajarshi Pal
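The K-means binarization step can be sketched as one-dimensional two-means clustering of grayscale intensities, with ink pixels falling below the resulting threshold; the intensity values here are illustrative, not the paper's exact procedure:

```python
def two_means_threshold(values, iters=20):
    """1-D K-means with k=2 on pixel intensities; returns the midpoint
    between the two cluster centres as a binarization threshold."""
    c0, c1 = min(values), max(values)          # initial centres at the extremes
    for _ in range(iters):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        if not g0 or not g1:
            break
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return (c0 + c1) / 2

# Dark ink pixels vs. light paper background (toy grayscale values).
pixels = [20, 25, 30, 200, 210, 220, 215, 22]
t = two_means_threshold(pixels)
ink = [p for p in pixels if p < t]   # pixels classified as ink
```

The statistical ink features would then be computed only over the pixels below the threshold, discarding the paper background.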

Analysis of Causal Interactions and Predictive Modelling of Financial Markets Using Econometric Methods, Maximal Overlap Discrete Wavelet Transformation and Machine Learning: A Study in Asian Context

A proper understanding of the long-run and short-run dynamics of equity markets is extremely critical for investors, speculators and arbitrageurs. It is essential to delve into the causal interrelationships among different financial markets in order to assess the impact of ongoing inter-country trade and forecast future movements. In this paper, effort is first made to comprehend the nature of temporal movements and interactions among four Asian stock indices, namely the Bombay Stock Exchange (BSE), Taiwan Stock Exchange (TWSE), Jakarta Stock Exchange (JSX) and Korea Composite Stock Price Index (KOSPI), through conventional econometric and statistical methods. Subsequently, a granular forecasting model comprising Maximal Overlap Discrete Wavelet Transformation (MODWT) and Support Vector Regression (SVR) is utilized to predict the future prices of the respective indices in a univariate framework.

Indranil Ghosh, Manas K. Sanyal, R. K. Jana

Opinion Mining Using Support Vector Machine with Web Based Diverse Data

The opinions of other people have always been a very important source of information, with a major impact on the decision-making process. With the growing availability and popularity of online reviews, opinions, feedback and suggestions, people now actively employ these views for better decision making. Opinion mining is a natural language processing and information extraction task that aims to examine people’s opinions, sentiments, emotions and attitudes about a product. This paper presents an opinion classifier based on the Support Vector Machine (SVM) algorithm that can be used to classify opinions in data. We design a classifier to determine opinion from Bangla text data, evaluate its performance, and analyze comparative results.

Mir Shahriar Sabuj, Zakia Afrin, K. M. Azharul Hasan

Harnessing Online News for Sarcasm Detection in Hindi Tweets

Detecting sarcasm in Indian languages is one of the most challenging tasks in Natural Language Processing (NLP), because Indian languages are ambiguous in nature and rich in morphology. Though Hindi is the fourth most popular language in the world, sarcasm detection in it remains unexplored, one reason being the lack of annotated resources. In the absence of sufficient resources, NLP tasks such as POS tagging, sentiment analysis, text mining and sarcasm detection become tough for researchers. Here, we propose a framework for sarcasm detection in Hindi tweets using online news, where the online news is considered as the context of a given tweet during the detection of sarcasm. The proposed framework attains an accuracy of 79.4%.

Santosh Kumar Bharti, Korra Sathya Babu, Sanjay Kumar Jena

Concept-Based Approach for Research Paper Recommendation

Research paper recommender systems are developed to deal with the increasing amount of published information on the web and to recommend research articles based on user preferences. Researchers invest a huge amount of time in literature search to carry out their work. To ease building a literature base and finding useful research articles in less time, a novel concept-based recommendation approach is proposed that represents a research article in terms of its concepts, or semantics, which are used to recommend conceptually related papers (based on the higher relevance of concepts) to researchers. This paper provides a brief overview of popular algorithms and previous systems developed to address the problem of information explosion, then discusses the proposed approach with implementation details, and presents a comparative analysis between the proposed approach and a baseline method.

Ritu Sharma, Dinesh Gopalani, Yogesh Meena

