
2012 | Book

Perception and Machine Intelligence

First Indo-Japan Conference, PerMIn 2012, Kolkata, India, January 12-13, 2012. Proceedings

Edited by: Malay K. Kundu, Sushmita Mitra, Debasis Mazumdar, Sankar K. Pal

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the proceedings of the First Indo-Japanese conference on Perception and Machine Intelligence, PerMIn 2012, held in Kolkata, India, in January 2012. The 41 papers, presented together with 1 keynote paper and 3 plenary papers, were carefully reviewed and selected for inclusion in the book. The papers are organized in topical sections named perception; human-computer interaction; e-nose and e-tongue; machine intelligence and application; image and video processing; and speech and signal processing.

Table of contents

Frontmatter

Keynote Paper

The Rules of Guidance in Visual Search

It is impossible to identify all objects in the visual world at the same time. Accordingly, we must direct attention to specific objects in order to fully recognize them. The deployment of attention is far from random. Attention is guided toward likely targets by a limited set of stimulus attributes such as color and size (“classic guidance”). Attention is also guided by a number of scene-based properties. Thus, if we were looking for sheep, we would expect them on surfaces that could support sheep, not in mid-air. We use information about the 3D layout of a space to determine which objects could plausibly be sheep-sized in that space. This paper briefly reviews the diverse set of guiding properties and the rules that govern their use.

Jeremy M. Wolfe

Plenary Papers

Smart Sensing System for Human Emotion and Behaviour Recognition

In this study, we report a smart sensing system for human emotion and behaviour recognition. The inhabitant's emotions are sensed from physiological sensor data and classified as happiness, sadness, stress, or neutral. We also define two new wellness functions to determine the regularity of household activities and to foresee changes in domestic activity behaviour. The developed intelligent program was tested in the houses of elderly people living alone, and the results are encouraging. The developed system is low-cost, reliable, and robust in assessing the functional condition of the inhabitant, both emotionally and physically.

N. K. Suryadevara, T. Quazi, Subhas C. Mukhopadhyay
Fuzzy Rule-Based Approaches to Dimensionality Reduction

In this talk we deal with the problem of dimensionality reduction in a fuzzy rule-based framework. We consider dimensionality reduction through feature extraction as well as through feature selection. For the former approach, we use Sammon’s stress function as a criterion for structure-preserving dimensionality reduction. For feature selection we propose an integrated framework, which embeds the feature selection task into the classifier design task. This method uses a novel concept of feature modulating gate and it can exploit the subtle nonlinear interaction between the tool (here a fuzzy rule based system), the features and the task at hand. This method is then extended to Takagi-Sugeno (TS) model for function approximation/prediction problem. The effectiveness of these methods is demonstrated using several data sets.

Nikhil Ranjan Pal
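
Sammon's stress, named in the abstract above as the criterion for structure-preserving feature extraction, compares inter-point distances before and after projection. The following is a minimal Python sketch of that stress value only, not of the fuzzy rule-based system itself; the function names and the toy data are illustrative assumptions.

```python
import math


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def sammon_stress(original, projected):
    """Sammon's stress: sum((d*_ij - d_ij)^2 / d*_ij) / sum(d*_ij) over i < j.

    d*_ij are distances in the original space, d_ij in the projected space.
    """
    d_orig = pairwise_distances(original)
    d_proj = pairwise_distances(projected)
    n = len(original)
    total = 0.0
    scale = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if d_orig[i][j] == 0.0:
                raise ValueError("coincident points in the original space")
            scale += d_orig[i][j]
            total += (d_orig[i][j] - d_proj[i][j]) ** 2 / d_orig[i][j]
    return total / scale


# Toy data (assumed): three 3-D points that lie in a plane, so dropping
# the last coordinate preserves every pairwise distance.
original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
projected = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(sammon_stress(original, projected))
```

A projection that preserves all pairwise distances gives a stress of zero; larger values indicate more distortion of the original structure.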
Face Recognition Technology and Its Real-World Application

Facial image processing is a promising tool for consumer electronics and social infrastructure systems. In recent years, digital processing of facial images has become easy to perform, owing to the spread of digital imaging devices such as digital cameras and mobile phones and to improved computer throughput. The performance of face detection, which is the basis of facial image processing, has improved drastically, and its computational cost has decreased. This is why its application has expanded to various appliances. This paper introduces our group’s facial image processing algorithm as an example, along with trends in applications of facial image processing in consumer electronics and social infrastructure systems.

Osamu Yamaguchi

Contributory Papers

Perception

Contextual Effects in the Visual Cortex Area 1 (V1) and Camouflage Perception

The cells in the visual cortex area 1 (V1) and area 2 (V2) show context dependent modulation in their responses. Suppressive as well as modulatory effects from stimuli well outside the classical receptive field are observed. This is attributed to the long-range horizontal connections in the visual cortex. In our work we have carried out contextual effect experiments with our corticocortical connections model. Our simulation results confirm the suppressive as well as the modulatory effects. We are proposing that the surround effect phenomenon can be used for camouflage perception.

Atanendu Sekhar Mandal
Affective Information Processing and Representations

Affective information processing is analysed considering the emotion circuits within the brain substrates of emotionality. Based on Gärdenfors’ [8] conceptual spaces model we try to examine an emotion episode from its elicitation to the differentiation into affective processes. An affective-conceptual spaces model is developed taking into consideration Panksepp’s [20] nested BrainMind hierarchies.

Dana Sugu, Amita Chatterjee
Face Image Retrieval Based on Probe Sketch Using SIFT Feature Descriptors

This paper presents a feature-based method for matching facial sketch images to face photographs. Earlier approaches computed descriptors over the whole image, applied a transformation, and matched them using classifiers. We present an approach in which descriptors are calculated at selected discrete points (eyes, nose, ears…). This allows us to compare only prominent features. We use SIFT (Scale Invariant Feature Transform) to extract feature descriptors at the annotated points in the sketches and experiment with various methods to retrieve photos. Experimental results demonstrate appreciable matching performance using the presented feature-based methods at a low computational cost.

Rakesh S., Kailash Atal, Ashish Arora, Pulak Purkait, Bhabatosh Chanda
Effect of Perceptual Anchorage Points on Recognition of Bangla Characters

Character recognition systems (printed and handwritten) have become an extremely useful tool in human-computer interaction. Handwriting is a complex perceptual-motor task that generates linguistic information. Characters reflect the shape distinctions needed to perceive the different phonetic information of words. We have explored the ‘perceptual processes’ of character recognition in order to develop a cognitive model. In particular, we tried to extract the perceptual anchorage points in a character. An experiment was performed to identify perceptual anchorage points in both handwritten and printed characters in different Bangla script fonts, including fonts used in Bangladesh. A set of important points and shapes was found in the experiment. Deleting these regions from characters greatly impairs human recognition of the characters.

Asok Bandyopadhyay, Bhaswati Mukherjee, Bidyut Baran Chaudhuri
Scaling Properties of Mach Bands and Perceptual Models

Mach bands are the pronounced light and dark bands visible where a luminance plateau meets a ramp, as in a penumbra. A great deal of effort has been devoted to studying them in order to understand the underlying neural circuitry. A number of theoretical models, linear and non-linear, have consequently been proposed, starting from the seminal studies of Ernst Mach himself. In this work we demonstrate why no linear model of visual perception can explain the Mach band illusion, although many such attempts have been made, from that of Mach to some recent ones. Using the same approach, we also systematically demonstrate why Mach bands are weak or nonexistent at step changes of intensity. A new aspect, viz. the scaling properties of the widths of Mach bands, has been studied to provide a unified approach to solving both these problems in vision.

Ashish Bakshi, Kuntal Ghosh
Psycho-Visual Evaluation of Contrast Enhancement Algorithms by Adaptive Neuro-Fuzzy Inference System

Image information maximization is an alternative method of contrast enhancement of images. There are plenty of algorithms for contrast enhancement of poorly illuminated images. In the present paper we propose a novel method for psycho-visual evaluation of contrast enhancement algorithms. An Adaptive Neuro-Fuzzy Inference System (ANFIS) is used to classify well-known contrast enhancement algorithms. The contrast enhancement metric/feature is modeled using image statistics in both the spatial and frequency domains. The perception-inspired model is then used for automatic classification of algorithms according to the strength of contrast enhancement.

Apurba Das, Suparna Parua

Human Computer Interaction

Theory of Mind in Man-Machine Interactions

We report questionnaire data investigating people’s reactions to the directions provided by a hypothetical QA system, in order to understand how they interpret a given query. The discussion is couched within theory of mind, which provides a metaphor for research to improve man-machine communication by requiring a representation for the user’s representation of the world to be introduced.

Fumito Hamada, Edson T. Miyamoto
Face Recognition System Invariant to Light-Camera Setup

This paper proposes an efficient face recognition system where images are acquired under different camera positions and lighting conditions. An Active Appearance Model is used to obtain shape and appearance information from faces in the form of feature vectors. A bilinear model then operates on these vectors to obtain style-specific basis matrices in the training phase. In the test phase, the bilinear model uses elastic net regularization to determine stable content vectors using the style-specific basis matrix. The Euclidean distance between the content vectors of two images is used to decide on a match. The proposed system has been tested on 1255 images of 108 subjects. Experimental results reveal that the system achieves an accuracy of 95% when the top five matches are considered in a closed-set identification setup.

Naman Dauthal, Surya Prakash, Phalguni Gupta
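
The matching step described in the abstract above (Euclidean distance between content vectors, with a hit counted when the true subject is among the top five matches) can be sketched as follows. This is only an illustration of rank-k closed-set identification; the content vectors here are made-up two-dimensional tuples, and the function names and subject labels are assumptions, not the authors' implementation.

```python
import math


def top_k_matches(probe, gallery, k=5):
    """Return the k gallery subject ids closest to the probe vector."""
    ranked = sorted(gallery, key=lambda subject: math.dist(probe, gallery[subject]))
    return ranked[:k]


def rank_k_accuracy(probes, gallery, k=5):
    """Fraction of probes whose true subject appears among the top k matches."""
    hits = sum(
        1 for true_id, vector in probes if true_id in top_k_matches(vector, gallery, k)
    )
    return hits / len(probes)


# Hypothetical gallery: one content vector per enrolled subject.
gallery = {"s1": (0.0, 0.0), "s2": (1.0, 0.0), "s3": (5.0, 5.0)}
# Hypothetical probes: (true subject id, content vector).
probes = [("s1", (0.1, 0.0)), ("s3", (0.9, 0.1))]

print(rank_k_accuracy(probes, gallery, k=1))
print(rank_k_accuracy(probes, gallery, k=3))
```

Raising k can only keep or increase the reported accuracy, which is why a figure quoted for the top five matches should not be compared directly with a rank-1 figure.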
Representing Feature Quantization Approach Using Spatial-Temporal Relation for Action Recognition

In this paper we propose an efficient and intuitive algorithm for designing feature vector quantization using space-time interest points in video surveillance. The performance of activity recognition generally depends on the quantity of significant features, but with proper feature quantization one can deliver the same performance with fewer features. The basic characteristics of the algorithm are discussed and demonstrated by experiment. It is scalable and works efficiently under varying conditions. In the experimental section, we show that our novel feature quantization approach needs fewer features than standard quantization while delivering the same performance.

Sarvesh Vishwakarma, Anupam Agrawal
Human Computer Interaction with Hand Gestures in Virtual Environment

With the ever-increasing growth of computer-based virtual environments, demand for new kinds of interaction devices has emerged. Currently used devices such as the keyboard, mouse, and pen are cumbersome in these promising applications. Developments in user interfaces influence changes in Human-Computer Interaction (HCI). This paper focuses on designing an application that uses computer vision and gesture recognition techniques to provide a relatively economical input device for interacting with virtual games using hand gestures. The architecture of the gesture recognition system comprises different image processing techniques, such as CamShift and the Lucas-Kanade technique, for tracking hands and their gestures. Haar-like features locate the position of the hand and recognize the gesture being made in the located hand image. Gestures are modeled for recognition by matching the defect features present in the hand with the assigned gestures. The virtual game is created using the OpenGL library. The application uses seven gestures for manipulating the virtual game. The main purpose of this hand gesture recognition system is to provide a substitute for input devices during interaction with virtual games. Hence, instead of developing a new vocabulary of hand gestures, we have matched the control instruction set of the mouse to a subset of the most discriminating hand gestures, so that we get a robust interface.

Siddharth S. Rautaray, Anand Kumar, Anupam Agrawal
Interval Type-2 Fuzzy Model for Emotion Recognition from Facial Expression

The paper proposes a new approach to emotion recognition from facial expression of a subject by constructing an Interval type-2 fuzzy model. An interval type-2 fuzzy face-space is first constructed with the background knowledge of facial features of different subjects for different emotions. The fuzzy face-space thus created comprises primary membership distributions for m facial features, obtained from n subjects, each having $l$ instances of facial expression for a given emotion. Second, the emotion of an unknown facial expression is determined based on the consensus of the measured facial features with the fuzzy face-space. The classification accuracy of the proposed method is as high as 88.66%.

Amit Konar, Aruna Chakraborty, Anisha Halder, Rajshree Mandal, Ramadoss Janarthanan
Augmenting Language Tools for Effective Communication

Widespread use of the Internet and the tools available on it helps users communicate well. Good vocabulary is a prerequisite for effective communication, whether written or spoken. Sometimes, while composing text, the user gets stuck for an appropriate word, even though the concept to be communicated is clear in the mind. In this paper, we propose a language tool called ‘WordCoin’ that suggests appropriate word options based on the concepts expressed for the intended word. We study the utility of existing dictionaries and networks for such an application. Importantly, we also study how far a user’s interpretation of a word matches the meanings quoted in the dictionary. Our study shows that the two differ considerably. We need to augment the tools to supply the appropriate word based on these observations.

Nandini Bondale, Gaurav Gupta
Design and Evaluation of a Cognition Aware File Browser for Users in Rural India

Among GUI users throughout the world, file browsers are important programs for daily computer use. Making the interface “cognition-aware” significantly reduces cognitive load, especially for less technology-aware users, while learning and using the interface for file browsing tasks. In this paper we propose SahajBrowser, a cognition-aware navigational file browser for neo-computer-literate users in rural India who are inexperienced with digital technology. The novel design of its tree view and its ability to open multiple folders at once provide more assistance for human cognition, and for certain file browsing tasks it performs better than other widely used spatial and navigational file browsers. Using a KLM-GOMS model analysis, we compare four browsers of different types and conclude that the design of SahajBrowser is the most time-efficient of them all.

Debmalya Sinha, Anupam Basu
Shiksha: A Novel Architecture for Tele-teaching Using Handwriting as a Perceptually Significant Temporal Media

In this paper we present Shiksha, an integrated architecture that incorporates handwritten illustrations captured and rendered in a temporal fashion, synchronized with audio and video data. The architecture of Shiksha permits non-linear growth in the form of multiple hierarchically organized play streams. We have developed an asynchronous multimedia conferencing application in which users are provided with an authoring and rendering environment to record and view lectures. It also allows users to raise and respond to doubts about previously stored lectures, making it a fully interactive but asynchronous system.

Amit Singhal, Santanu Chaudhury, Sumantra Dutta Roy
A New Motion Based Fully Automatic Facial Expression Recognition System

In this paper, a new motion-based, person-independent, fully automatic facial expression recognition system is introduced. The system uses gradient-based optical flow to estimate muscle movement from video. A rule base generated by a decision tree is used for recognition. The performance of the system is validated by human psycho-visual judgment.

Chandrani Saha, Washef Ahmed, Soma Mitra

E-Nose and E-Tongue

Profiling Scotch Malt Whisky Spirits from Different Distilleries Using an Electronic Nose and an Expert Sensory Panel

Spirits from different Scotch malt whisky distilleries exhibit distinct sensory characteristics. To ensure the future diversity of this spirit category and sustainability of individual distilleries, it is vital that such differences can be maintained. In this research the characteristics of spirits from six distilleries were profiled using an electronic nose (e-nose) and by an expert sensory panel. The instrumental method used a flash GC-based e-nose, the HERACLES. The e-nose produced compositional data that could clearly discriminate between the spirits according to distillery of origin. This discrimination was based on levels of a range of volatile compounds that could potentially influence flavor. The sensory panel provided quantitative data on the levels of sixteen aroma attributes in the spirits. This showed clear differences in flavor among the distilleries. Although the separation obtained using the two approaches was not directly comparable, correlations were observed between peaks in the e-nose chromatograms and certain aroma attributes, indicating that the two techniques are complementary.

Koichi Yoshida, Emiko Ishikawa, Maltesh Joshi, Hervé Lechat, Fatma Ayouni, Marion Bonnefille
Quality Control and Rancidity Tendency of Nut Mix Using an Electronic Nose

Nuts are rich in polyunsaturated fats, which are particularly sensitive to lipid oxidation. A Flash GC based electronic nose was used to identify the causes of rancidity in nut mixes and to monitor global sensory quality. Five appetizers of different sensory qualities, all composed of the same nut types, were considered. It was shown that peanuts were the most critical ingredients in the development of rancidity off-flavors. Pecans and cashew nuts also presented a relatively high concentration of off-odors, but their relatively low proportion in the final mix made them less critical towards rancidity. As for Brazil nuts, almonds and hazelnuts, containing lower amounts of volatile compounds, they proved to have a low impact on the overall mix aroma.

Koichi Yoshida, Emiko Ishikawa, Maltesh Joshi, Hervé Lechat, Fatma Ayouni, Marion Bonnefille
Estimation of Aroma Determining Compounds of Kangra Valley Tea by Electronic Nose System

Aroma is a major factor in the quality evaluation of finished tea. Professional tea tasters assess the aroma of finished tea and decide its valuation. Since tea tasters are human, their evaluations can be subjective. Instruments such as the spectrophotometer, high-performance liquid chromatography (HPLC), and gas-liquid chromatography (GLC) measure the chemical/volatile compounds, polyphenols, catechins and flavour profile of tea [b]. However, these instruments are costly and time-consuming, require lengthy sample preparation, and need expert manpower to operate. The Electronic Nose (E-Nose) is also used by various tea factories to supplement the work of a tea taster by predicting a tea-taster-like score for finished tea. The E-Nose can give a tea-taster-like score within one and a half minutes and is easy to operate. This paper describes the estimation by E-Nose of the tea compounds responsible for tea aroma.

Devdulal Ghosh, Ashu Gulati, Robin Joshi, Nabarun Bhattacharyya, Rajib Bandyopadhyay
Taste Attributes Profiling in Carrot Juice Using an Electronic Tongue

The taste of five brands of carrot juice was analyzed both by a sensory panel and an electronic tongue. The panelists found significant differences between the carrot juice samples in some appearance and odor attributes and in the relevant taste attributes such as sour taste, sweet taste and taste persistence. Principal component analysis plot calculated from the electronic tongue results showed a clear separation between the sample groups, with a ranking on sourness similar to the one from the panel.

Zoltán Kovács, Dániel Szöllősi, András Fekete, Koichi Yoshida, Emiko Ishikawa, Sandrine Isz, Marion Bonnefille
Improvement of Quality Perception for Black CTC Tea by Means of an Electronic Tongue

Electronic tongues have already been used to estimate the quality of tea in terms of tea tasters' scores, which are subjective and limited by human sensory organs. It is known that chemical constituents play a significant role in determining the quality of tea. Thus the perception of quality as understood from an electronic tongue can be improved if it can be trained to estimate the amounts of the major chemicals responsible for tea quality. An alternative method of rapid quality evaluation of tea is proposed, using a voltammetric electronic tongue to determine two major taste descriptors in black tea. A correlation model between the electronic tongue signatures and the theaflavin/thearubigin contents of tea is developed using multi-layer perceptrons. The perception of taste is further improved using scaled conjugate gradient as the weight optimization algorithm.

Arunangshu Ghosh, Bipan Tudu, Pradip Tamuly, Nabarun Bhattacharyya, Rajib Bandyopadhyay

Machine-Intelligence and Application

A New Cross-Validation Technique to Evaluate Quality of Recommender Systems

The topic of recommender systems is rapidly gaining interest in the user-behaviour modeling research domain. Over the years, various recommender algorithms based on different mathematical models have been introduced in the literature. Researchers interested in proposing a new recommender model or modifying an existing algorithm should take into account a variety of key performance indicators, such as execution time, recall and precision. To date, and to the best of our knowledge, no general cross-validation scheme to evaluate the performance of recommender algorithms has been developed. To fill this gap we propose an extension of conventional cross-validation. Besides splitting the initial data into training and test subsets, we also split the attribute description of the dataset into a hidden and a visible part. We then discuss how such a splitting scheme can be applied in practice. Empirical validation is performed on traditional user-based and item-based recommender algorithms which were applied to the MovieLens dataset.

Dmitry I. Ignatov, Jonas Poelmans, Guido Dedene, Stijn Viaene
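
The two-level split described in the abstract above (training versus test users, then a visible versus hidden part of each test user's description) can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the authors' protocol: each user's "attribute description" is taken to be a set of rated items, and all function names, fractions, and the toy data are hypothetical.

```python
import random


def split_users(user_items, test_fraction, rng):
    """Split user ids into (train, test) lists."""
    users = sorted(user_items)
    rng.shuffle(users)
    n_test = max(1, round(len(users) * test_fraction))
    return users[n_test:], users[:n_test]


def split_items(items, hidden_fraction, rng):
    """Split one test user's items into (visible, hidden) sets."""
    ordered = sorted(items)
    rng.shuffle(ordered)
    n_hidden = max(1, round(len(ordered) * hidden_fraction))
    return set(ordered[n_hidden:]), set(ordered[:n_hidden])


def precision_recall(recommended, hidden):
    """Precision and recall of a recommendation list against hidden items."""
    hits = len(set(recommended) & hidden)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(hidden) if hidden else 0.0
    return precision, recall


def evaluate(user_items, recommend, test_fraction=0.2, hidden_fraction=0.5, seed=0):
    """Average precision and recall of `recommend` over the test users.

    `recommend(train_data, visible)` must return a list of item ids.
    """
    rng = random.Random(seed)
    train, test = split_users(user_items, test_fraction, rng)
    train_data = {user: user_items[user] for user in train}
    scores = []
    for user in test:
        visible, hidden = split_items(user_items[user], hidden_fraction, rng)
        scores.append(precision_recall(recommend(train_data, visible), hidden))
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

Any recommender can be plugged in through the `recommend` callable; it sees the full training users but only the visible part of each test user, and is scored on how well it recovers the hidden part.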
Rough-Fuzzy C-Means for Clustering Microarray Gene Expression Data

Clustering is a useful tool for elucidating similar patterns across a large number of transcripts and for identifying likely co-regulated genes. It attempts to partition the genes into groups exhibiting similar patterns of variation in expression level. An application of the rough-fuzzy c-means (RFCM) algorithm is presented in this paper to discover co-expressed gene clusters. Selection of the initial prototypes of different clusters is one of the major issues of RFCM-based microarray data clustering. A Pearson correlation based initialization method is used to address this limitation. It enables the RFCM algorithm to discover co-expressed gene clusters. The effectiveness of the RFCM algorithm and the initialization method, along with a comparison with other related methods, is demonstrated on five yeast gene expression data sets using standard cluster validity indices and gene ontology based analysis.

Pradipta Maji, Sushmita Paul
Topographic Map Object Classification Using Real-Value Grammar Classifier System

Learning Classifier Systems (LCS) have become a large branch of machine learning that has received a lot of attention recently. Our LCS model, rGCS (real-value Grammar Classifier System), uses grammar inference to classify real-value vectors, which may describe a wide variety of problems. In this paper we use the rGCS core in an object recognition task. Our application searches for certain graphic symbols on a topographic map scan.

Lukasz Cielecki
Distance Measures in Training Set Selection for Debt Value Prediction

A comparative study of six learning scenarios in debt pattern recognition is presented in the paper. New approaches to defining distance measures for training set selection are proposed. With these measures used for training set selection, the inference models are trained using a distinct reference. All proposed approaches are examined in dataset selection during prediction of debt portfolio value. Finally, a basic evaluation of prediction performance is conducted.

Tomasz Kajdanowicz, Slawomir Plamowski, Przemyslaw Kazienko
A Second-Order Learning Algorithm for Computing Optimal Regulatory Pathways

Gene regulatory pathways play an important role in the functional understanding and interpretation of gene function. Many different approaches have been developed to model and simulate gene regulatory networks. In this paper we present the results of a new iterative second-order learning algorithm based on the multilayer perceptron (MLP) for generating optimal gene regulatory pathways using ordinary differential equations. The algorithm, based on Newton’s method, is independent of the learning parameter and overcomes the drawbacks of the standard backpropagation (BP) algorithm. The methodology generates flow vectors that indicate the flow of mRNA, and thereby of the protein produced, from one gene to another. A set of weighting coefficients representing the concentrations of various transcription factors is incorporated. The gene regulatory pathways are obtained by optimizing an objective function with respect to these weighting coefficients. Two gene regulatory networks are used to demonstrate the efficiency of the proposed learning algorithm. A comparative study with the existing extreme pathway analysis (EPA) also forms part of this study. The results reported in the paper are corroborated by those reported in the literature.

Mouli Das, C. A. Murthy, Subhasis Mukhopadhyay, Rajat K. De
Aggregation of Correlation Measures for the Reverse Engineering of Gene Regulatory Sub-networks

This paper presents a simple and novel approach involving the aggregation of some correlation-based techniques for deciphering simple gene interaction sub-networks from biclusters in microarray time series gene expression data. Preprocessing has been used for discarding the weakly interacting gene pairs, i.e., those that are poorly correlated. The proposed technique was successfully applied to public-domain data sets of Yeast and the experimental results were biologically validated based on benchmark databases and information from literature.

Ranajit Das, Sushmita Mitra
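
The preprocessing idea in the abstract above, discarding poorly correlated gene pairs, can be illustrated with a single correlation measure. The sketch below uses Pearson correlation and a fixed cut-off; both are assumptions made for illustration, since the abstract does not say which measures are aggregated or what threshold is used, and the gene names and expression values are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (sd_x * sd_y)


def strongly_correlated_pairs(expression, threshold=0.8):
    """Keep gene pairs whose absolute correlation reaches the threshold."""
    kept = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            kept[(gene_a, gene_b)] = r
    return kept


# Invented time-series expression profiles for three hypothetical genes.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, -1.0, 1.0, -1.0],
}
print(strongly_correlated_pairs(expression))
```

Using the absolute value keeps strongly negative correlations as well, which matters if repressive interactions are of interest.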

Image and Video Processing

Interactive Content Based Image Retrieval Using Ripplet Transform and Fuzzy Relevance Feedback

In this article, a novel content based image retrieval (CBIR) system based on a new Multiscale Geometric Analysis (MGA) tool, called Ripplet Transform Type-I (RT), is presented. To improve the retrieval result, a fuzzy relevance feedback mechanism (F-RFM) is also implemented. A fuzzy entropy based feature evaluation mechanism is used for automatic computation of the revised feature importance and similarity distance at the end of each iteration. Experimental results on a large image database demonstrate the efficiency and effectiveness of the proposed CBIR system in the image retrieval paradigm.

Manish Chowdhury, Sudeb Das, Malay Kumar Kundu
Fast Computation of Edge Model Representation for Image Sequence Super-Resolution

Edge model based representation of Laplacian subbands has been demonstrated to be useful in single frame high resolution image generation. A reconstruction based multiframe super-resolution algorithm yields a better super-resolved image if high resolution estimate of individual frame is given as input, instead of original low resolution frames. Fast computation of edge-model based representation enables fast single frame high resolution image generation for multiple frames and in turn helps in speeding up reconstruction based super resolution. In the present work, efficient multiframe edge model computation is achieved by computing edge model for the reference frame and then computing successive models by adapting it on the remaining frames.

Malay K. Nema, Subrata Rakshit, Subhasis Chaudhuri
Detection of Structural Concavities in Character Images—A Writer-Independent Approach

In this paper, we present a novel technique for detecting concave regions as structural information of character images. The difficulty of the problem lies in reporting all concavities irrespective of the viewing direction on the 2D plane. In our approach, we detect concave regions by analyzing the sequence of discrete turns taken to describe the character stroke; hence, it becomes view-invariant. The proposed method has the added advantage of detecting the same concave regions of a particular character written by different individuals. We have tested our method on printed and handwritten Bangla and Hindi isolated character images. Initial results demonstrate the efficacy of our approach.

Soumen Bag, Partha Bhowmick, Gaurav Harit
Semi-supervised Fuzzy Clustering Algorithms for Change Detection in Remote Sensing Images

For the problem of change detection it is difficult to have a sufficient amount of the ground truth information needed in supervised learning. By contrast, it is easy for experts to identify a few labeled patterns. In this situation, to avoid wasting available information, semi-supervision is advisable for enhancing the performance of unsupervised methods. Here we present a fuzzy clustering based semi-supervised technique to detect changes in remote sensing images that takes care of the spatial correlation between neighboring pixels of the difference image produced by comparing two images acquired over the same geographical area at different times. To do so, two classical fuzzy clustering algorithms, namely fuzzy c-means (FCM) and Gustafson-Kessel clustering (GKC), are used in a semi-supervised way. For clustering, various image features are extracted using the neighborhood information of pixels. To show the effectiveness of the proposed technique, experiments are conducted on two multispectral and multitemporal remote sensing images. The results are compared with those of an existing unsupervised fuzzy clustering based technique and of Markov random field (MRF) and neural network based algorithms, and are found to be superior.

Niladri Shekhar Mishra, Susmita Ghosh, Ashish Ghosh
SLAR (Simultaneous Localization And Recognition) Framework for Smart CBIR

In traditional content-based image retrieval (CBIR) methods, features are extracted from the entire image for computing similarity with query. It is necessary to design a smart object-centric CBIR to retrieve images from the gallery, having objects similar to that present in the foreground of the query image. We propose a model for a novel SLAR (Simultaneous Localization And Recognition) framework for solving this problem of smart CBIR, to simultaneously: (i) detect the location and (ii) recognize the type (ID or class) of the foreground object in a scene. The framework integrates both unsupervised and supervised methods of foreground segmentation and object classification. This model is motivated by the cognitive models of human visual perception, which generalizes from examples to simultaneously locate and categorize objects. Experimentation has been done on six categories of objects and the results have been compared with a contemporary work on CBIR.

Gyanesh Dwivedi, Sukhendu Das, Subrata Rakshit, Megha Vora, Suranjana Samanta
A Novel Statistical Model to Evaluate the Performance of EBGM Based Face Recognition

Pose, illumination, expression and other transitive and demographic variates present in facial images have significant effects on the performance of a face recognition system. A Gibbs sampler based statistical simulation algorithm is presented to evaluate the performance of an EBGM based face recognition system. A new set of microscopic and stochastic image features is proposed, which plays a key role in determining the quality of facial images. The effects of these features on the performance of the EBGM based face recognition system are evaluated using an algorithm based on a random effects model and the Gibbs sampler.

Munmun Chakraborty, Kunal Chanda, Debasis Mazumdar
A New Proposal for Locality Preserving Projection

Locality Preserving Projection (LPP) is a recently proposed approach for dimensionality reduction that preserves neighbourhood information. It is widely used for finding the intrinsic dimensionality of data. As LPP preserves the information about the nearest neighbours of data points, it may lead to misclassification in the overlapping regions of two or more classes. The conventional method works on a graph based technique in which the weights given to the edges are used to emphasize the local information. In this paper, we propose a new weighting scheme for the neighbourhood preserving graph which also gives importance to the data points that are at a moderate distance, in addition to the nearest points. This helps in resolving the ambiguity occurring in the overlapping regions. The proposal is tested on varying datasets.

Gitam Shikkenawis, Suman K. Mitra
Self-similarity and Points of Interest in Textured Images

We propose the application of symmetry for texture classification. First we propose a feature vector based on the distribution of local bilateral symmetry in textured images. This feature is more effective in classifying a uniform texture versus a non-uniform texture. The feature when used with a texton-based feature improves the classification rate and is tested on 4 texture datasets. Secondly, we also present a global clustering of texture based on symmetry.

Shripad Kondra, Alfredo Petrosino

Speech and Signal Processing

Ordinal Incremental Data in Collaborative Filtering

In modern collaborative filtering applications the initial data are typically very large (holding millions of users and items) and arrive in real time. In this case only incremental algorithms are practically efficient. An additional complication in using standard methods for matrix decomposition appears when the initial data are ratings, i.e. they are represented on an ordinal scale, whereas standard methods are designed for quantitative data. In this paper a new incremental gradient method based on the Generalized Hebbian Algorithm (GHA) is proposed. It makes it possible to find matrix decompositions for bulks of ordinal data, using a functional formulated for ordinal data. The algorithm does not require storing the initial data matrix and efficiently updates user/item profiles when a new user or a new item appears or a matrix cell is modified. The results of experiments show a better RMSE when applying an algorithm adjusted to ordinal data.

Elena Polezhaeva
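To make the idea of an incremental update concrete, here is a minimal sketch of one stochastic gradient step on a single observed rating. It is not the GHA-based method of the abstract above and it does not use an ordinal functional: it minimizes an ordinary squared error, and the function name `incremental_update`, the learning rate `lr`, and the regularization weight `reg` are assumptions for illustration. What it shares with the abstract is only the property that the full rating matrix is never stored.

```python
def incremental_update(user_vec, item_vec, rating, lr=0.05, reg=0.0):
    """One gradient step on a single (user, item, rating) observation.

    Assumption: squared-error loss on a dot-product prediction.
    Returns the updated (user_vec, item_vec) as new lists.
    """
    pred = sum(p * q for p, q in zip(user_vec, item_vec))
    err = rating - pred
    # Both updates use the old vectors, so the step is symmetric.
    new_user = [p + lr * (err * q - reg * p) for p, q in zip(user_vec, item_vec)]
    new_item = [q + lr * (err * p - reg * q) for p, q in zip(user_vec, item_vec)]
    return new_user, new_item


if __name__ == "__main__":
    user, item = [0.1, 0.1], [0.1, 0.1]
    for _ in range(500):  # replay the same rating to show convergence
        user, item = incremental_update(user, item, rating=4.0)
    print(sum(p * q for p, q in zip(user, item)))
```

When a new user or item appears, only its own profile vector needs to be created and updated in this way; the profiles of unrelated users and items are untouched.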
Combining Evidence from Temporal and Spectral Features for Person Recognition Using Humming

In this paper, the hum of a person is used to identify the speaker by machine. In addition, novel temporal features (such as zero-crossing rate and short-time energy) and spectral features (such as spectral centroid and spectral flux) are proposed for the person recognition task. Feature-level fusion of each of these features with the state-of-the-art spectral feature set, viz., Mel Frequency Cepstral Coefficients (MFCC), is found to give better recognition performance than MFCC alone. In addition, it is shown that the person identification rate is competitive with baseline MFCC. Furthermore, a reduction in equal error rate (EER) of 1.46% is obtained when a feature-level fusion system is employed by combining evidence from MFCC, temporal and proposed spectral features.

Hemant A. Patil, Maulik C. Madhavi, Rahul Jain, Alok K. Jain
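The temporal and spectral features named in the abstract above have common textbook definitions, sketched below for a single frame of samples. The exact formulations used in the paper may differ; the function names, the sign convention for zero crossings, and the use of a naive DFT over the non-negative frequency bins are assumptions made for this example.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) over bins 0 .. N/2 of a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(abs(coeff))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum((k * sample_rate / n) * mag for k, mag in enumerate(mags)) / total


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 2 * i / 16) for i in range(16)]
    print(zero_crossing_rate(tone), short_time_energy(tone), spectral_centroid(tone, 16))
```

In a feature-level fusion setup, values like these would be appended to the MFCC vector of the same frame before modeling.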
Novel Interleaving Schemes for Speaker Recognition over Lossy Networks

Cases of cybercrime and terrorism on IP networks are increasing day by day. In addition, there is a tendency to defraud phone-banking systems and to gain access to secure premises or accounts that may be protected by a voice-based biometric system. To minimize these problems, we need a voice/speaker recognition system with the utmost accuracy. The number of users of internet applications is also increasing, causing heavy traffic over IP channels almost round the clock. In this paper, the effect of packet loss on the performance of a speaker recognition system is demonstrated, and novel interleaving schemes are proposed to alleviate this degradation. The proposed interleaving schemes help to spread the risk of burst loss in the network, which is expected to improve speech quality and hence the performance of the speaker recognition system.

Hemant A. Patil, Parth A. Goswami, Tapan Kumar Basu
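The abstract above does not specify its interleaving schemes, so the following is only a sketch of a classic block interleaver, shown to illustrate how interleaving spreads a burst loss. The function names and the `depth` parameter are assumptions for this example.

```python
def interleave(frames, depth):
    """Write frames row by row into rows of length `depth`, read column by column."""
    if len(frames) % depth != 0:
        raise ValueError("number of frames must be a multiple of depth")
    rows = [frames[i:i + depth] for i in range(0, len(frames), depth)]
    return [row[c] for c in range(depth) for row in rows]


def deinterleave(frames, depth):
    """Invert `interleave` for the same depth."""
    if len(frames) % depth != 0:
        raise ValueError("number of frames must be a multiple of depth")
    n_rows = len(frames) // depth
    cols = [frames[i:i + n_rows] for i in range(0, len(frames), n_rows)]
    return [col[r] for r in range(n_rows) for col in cols]


if __name__ == "__main__":
    original = list(range(12))       # frame indices in playback order
    sent = interleave(original, 4)   # order on the wire
    burst = sent[3:6]                # three consecutive packets lost
    print(sent)
    print(sorted(burst))             # lost frames are not adjacent in playback order
```

With a depth of 4 and 12 frames, a burst of three consecutive lost packets removes frames that are four positions apart in playback order, so each gap is a single frame, which is typically easier to conceal than one long gap.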
Class Dependent 2D Correlation Filter for Illumination Tolerant Face Recognition

This paper proposes a class dependent 2D correlation filtering technique in the frequency domain for illumination tolerant face recognition. The technique is based on the frequency domain correlation between the phase spectrum of the reconstructed image and the phase spectrum of the optimum correlation filter. The optimization is achieved by minimizing the energy at the correlation plane due to the reconstructed image and maximizing the correlation peak. The synthesis of the optimum filter is developed by using the projecting image. Peak to side lobe ratio (PSR) is taken as the metric for recognition and classification. The performance of this technique is validated by comparison with other unconstrained filtering techniques on benchmark databases (Yale B and PIE), and better results are obtained.

Pradipta K. Banerjee, Jayanta K. Chandra, Asit K. Datta
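The peak to side lobe ratio mentioned in the abstract above is commonly computed as the peak height minus the sidelobe mean, divided by the sidelobe standard deviation. The sketch below follows that common definition; the function name, the use of the whole plane as the sidelobe region, and the size of the excluded window around the peak (`exclude`) are assumptions, and the paper may define the regions differently.

```python
import math


def peak_to_sidelobe_ratio(plane, exclude=1):
    """PSR of a 2D correlation plane given as a list of rows.

    Cells within `exclude` rows and columns of the peak are left out of the
    sidelobe statistics (assumed mask size).
    """
    peak, pr, pc = max(
        (v, r, c) for r, row in enumerate(plane) for c, v in enumerate(row)
    )
    side = [
        v
        for r, row in enumerate(plane)
        for c, v in enumerate(row)
        if abs(r - pr) > exclude or abs(c - pc) > exclude
    ]
    mean = sum(side) / len(side)
    std = math.sqrt(sum((v - mean) ** 2 for v in side) / len(side))
    if std == 0:
        return math.inf  # perfectly flat sidelobe region
    return (peak - mean) / std


if __name__ == "__main__":
    plane = [
        [1, 3, 1, 3, 1],
        [3, 0, 0, 0, 3],
        [1, 0, 10, 0, 1],
        [3, 0, 0, 0, 3],
        [1, 3, 1, 3, 1],
    ]
    print(peak_to_sidelobe_ratio(plane))
```

A sharp, isolated peak gives a high PSR, which is why the value can be thresholded to accept or reject a claimed class.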
Removing EEG Artifacts Using Spatially Constrained Independent Component Analysis and Daubechies Wavelet Based Denoising with Otsu’s Thresholding Technique

ElectroEncephaloGram (EEG) records contain data regarding abnormalities or responses to stimuli in the human brain. Such rhythms are examined by physicians for the purpose of detecting neural disorders and cerebral pathologies. Because of the occurrence of artifacts, it is complicated to examine the EEG, for they introduce spikes which can be confused with neurological rhythms. Therefore, noise and undesirable signals must be removed from the EEG to guarantee a correct examination and diagnosis. This paper presents a novel technique for removing artifacts from EEG signals. It uses Spatially-Constrained Independent Component Analysis (SCICA) to separate exactly the artifact Independent Components (ICs) from the initial EEG signal. Then, wavelet denoising is applied to eliminate the brain activity from the extracted artifacts, and finally the artifacts are projected back and subtracted from the EEG signals to obtain clean EEG data. The Daubechies wavelet transform is used for wavelet denoising. Here, thresholding plays an important role in deciding the artifacts. Therefore, a better thresholding technique called Otsu’s thresholding is applied. Experimental results show that the proposed technique results in better removal of artifacts.

G. Geetha, S. N. Geethalakshmi
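Otsu's thresholding, named in the abstract above, picks the threshold that maximizes the between-class variance of a histogram. The sketch below shows that criterion on integer levels; how the paper maps wavelet coefficients to such levels is not stated, so the quantization to `n_levels` integer bins and the function name are assumptions.

```python
def otsu_threshold(values, n_levels=256):
    """Return the level t maximizing between-class variance.

    `values` must be integers in [0, n_levels); class 0 is values <= t.
    """
    hist = [0] * n_levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of samples at or below t
    sum0 = 0  # sum of their levels
    for t in range(n_levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


if __name__ == "__main__":
    levels = [10] * 50 + [200] * 50  # two well separated groups
    print(otsu_threshold(levels))
```

For two well separated groups, any threshold between them gives the same variance; this sketch returns the lowest such level because it only replaces the best value on a strict improvement.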
Performance Evaluation of PBDP Based Real-Time Speaker Identification System with Normal MFCC vs MFCC of LP Residual Features

The present study compares Mel Frequency Cepstral Coefficients (MFCC) of Linear Predictive (LP) residuals with normal MFCC features, using both VQ and GMM based speaker modeling approaches, for performance evaluation of real-time Automatic Speaker Identification systems in both co-operative and non co-operative speaking scenarios. The Pitch Based Dynamic Pruning (PBDP) technique is applied to optimize the speaker identification process. The system is trained and tested with voice samples of 62 speakers across different age groups. The residual of a signal contains information mostly about the source, which is speaker specific. The results show that, in co-operative speaking, MFCC of LP residuals outperform normal MFCC features for both VQ and GMM based speaker modeling, with an improvement of 7.6% and 6.8% in average accuracy respectively. However, combined modeling of both features (source and vocal tract) is required for non co-operative speaking in real time, as it raises the highest identification accuracy from 67.7% to 83%.

Soma Khan, Joyanta Basu, Milton Samirakshma Bepari
Investigation of Speech Coding Effects on Different Speech Sounds in Automatic Speech Recognition

Automatic Speech Recognition (ASR) systems are increasingly used for voice centric applications in mobile handheld and Voice over Internet Protocol (VoIP) based devices. There is also a growing need to determine ASR performance under different network impediments when the recognition is actually performed on remote servers. Among the major impediments, speech coding is one that affects ASR performance greatly when used with different sampling rates and bit rates in practical systems. Speech codecs that use different algorithms for generating different bit rates affect the speech sounds, i.e. vowels and consonants, differently and cause the critical sounds in words to be changed, which in turn affects the overall word recognition performance of ASR systems. In this paper, the influence of sampling rate and bit rate changes with different narrowband and wideband codecs on the speech sounds is analyzed. An investigation is carried out to see how the speech sounds change when using different codecs operating at different bit rates.

A. V. Ramana, P. Laxminarayana, P. Mythilisharan
Backmatter
Metadata
Title
Perception and Machine Intelligence
Edited by
Malay K. Kundu
Sushmita Mitra
Debasis Mazumdar
Sankar K. Pal
Copyright year
2012
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-27387-2
Print ISBN
978-3-642-27386-5
DOI
https://doi.org/10.1007/978-3-642-27387-2
