
2015 | Book

Pattern Recognition and Machine Intelligence

6th International Conference, PReMI 2015, Warsaw, Poland, June 30 - July 3, 2015, Proceedings

Edited by: Marzena Kryszkiewicz, Sanghamitra Bandyopadhyay, Henryk Rybinski, Sankar K. Pal

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the proceedings of the 6th International Conference on Pattern Recognition and Machine Intelligence, PReMI 2015, held in Warsaw, Poland, in June/July 2015. A total of 53 full papers and 1 short paper presented in this volume were carefully reviewed and selected from 90 submissions. They are organized in topical sections named: foundations of machine learning; image processing; image retrieval; image tracking; pattern recognition; data mining techniques for large scale data; fuzzy computing; rough sets; bioinformatics; and applications of artificial intelligence.

Table of Contents

Frontmatter

Invited Paper

Frontmatter
Recent Advances in Recommender Systems and Future Directions

This article presents an overview of recent methodological advances in developing nearest-neighbor-based recommender systems that have substantially improved their performance. The key components in these methods are: (i) the use of statistical learning to estimate from the data the desired user-user and item-item similarity matrices, (ii) the use of lower-dimensional representations to handle issues associated with data sparsity, (iii) the combination of neighborhood and latent space models, and (iv) the direct incorporation of auxiliary information during model estimation. The article will also provide illustrative examples for these methods in the context of item-item nearest-neighbor methods for rating prediction and Top-N recommendation. In addition, the article will present an overview of exciting new application areas of recommender systems along with the challenges and opportunities associated with them.

Xia Ning, George Karypis

Foundations of Machine Learning

Frontmatter
On the Number of Rules and Conditions in Mining Data with Attribute-Concept Values and “Do Not Care” Conditions

In this paper we discuss two interpretations of missing attribute values: attribute-concept values and “do not care” conditions. Experiments were conducted on eight kinds of data sets, using three types of probabilistic approximations: singleton, subset and concept. Rules were induced by the MLEM2 rule induction system. Our main objective was to test which interpretation of missing attribute values provides simpler rule sets in terms of the number of rules and the total number of conditions. Our main result is experimental evidence that rule sets induced from data sets with attribute-concept values are simpler than those induced with “do not care” conditions.

Patrick G. Clark, Jerzy W. Grzymala-Busse
Simplifying Contextual Structures

We present a method to simplify a formal context while retaining much of its information content. Although simple, our ICRA approach offers an effective way to reduce the complexity of a concept lattice and/or a knowledge space while changing only a little information, in comparison to a competing model which uses fuzzy K-Means clustering.

Ivo Düntsch, Günther Gediga
Towards a Robust Scale Invariant Feature Correspondence

In this paper, we introduce an improved scale invariant feature correspondence algorithm which builds on the Similarity-Topology Matching algorithm. It pays attention not only to the similarity between features but also to the spatial layout of every matched feature and its neighbours. The features are represented as an undirected graph where every node represents a local feature and every edge represents adjacency between features. The topology of the resulting graph can be considered a robust global feature of the represented object. The matching process is modeled as a graph matching problem, which in turn is formulated as a variation of the quadratic assignment problem. The Similarity-Topology Matching algorithm achieves superior performance in almost all experiments except when the image has been exposed to scaling deformations. The algorithm has been amended to cope with this limitation: we rely not only on the distance between two interest points but also on the scale at which the interest points are detected to decide the neighbourhood relations between every pair of features. A set of challenging experiments conducted on 50 images (containing repeated structure) representing 5 objects from the COIL-100 data-set with extra synthetic deformations reveals that the modified version of the Similarity-Topology Matching algorithm performs better and is more robust, especially under scale deformations.

Shady Y. El-Mashad, Amin Shoukry
A Comparison of Two Approaches to Discretization: Multiple Scanning and C4.5

In the Multiple Scanning discretization technique the entire attribute set is scanned many times. During every scan, the best cutpoint is selected for all attributes. The main objective of this paper is to compare the quality of two setups: the Multiple Scanning discretization technique combined with the C4.5 classification system, and the internal discretization technique of C4.5. Our results show that the Multiple Scanning discretization technique is significantly better than the internal discretization used in C4.5 in terms of the error rate computed by ten-fold cross validation (two-tailed test, 5 % level of significance). Additionally, the Multiple Scanning discretization technique is significantly better than a variant of discretization based on conditional entropy introduced by Fayyad and Irani, called Dominant Attribute. At the same time, decision trees generated from data discretized by Multiple Scanning are significantly simpler than decision trees generated directly by C4.5 from the same data sets.
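As an illustration of the scanning step, the sketch below shows how one scan might pick an entropy-minimizing cutpoint for every attribute. It is a minimal Python reading of the technique, not the paper's MLEM2-based implementation; the helper names entropy, best_cutpoint and scan are ours.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_cutpoint(values, labels):
    """Cutpoint on one numeric attribute minimizing the class-conditional
    entropy of the induced binary split (Fayyad-Irani style criterion)."""
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    best_cut, best_h = None, np.inf
    # candidate cutpoints: midpoints between consecutive distinct values
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            continue
        cut = (values[i] + values[i - 1]) / 2
        left, right = labels[:i], labels[i:]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if h < best_h:
            best_cut, best_h = cut, h
    return best_cut

def scan(X, y):
    """One scan of Multiple Scanning: best cutpoint for every attribute."""
    return [best_cutpoint(X[:, j], y) for j in range(X.shape[1])]
```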

Jerzy W. Grzymala-Busse, Teresa Mroczek
Hierarchical Agglomerative Method for Improving NPS

The paper proposes a new strategy called HAMIS to improve the NPS (Net Promoter Score) of certain companies involved in heavy equipment repair in the US and Canada, which we call clients. HAMIS is based on a semantic dendrogram built using an agglomerative clustering strategy and a semantic distance between clients: the more similar the knowledge extracted from two clients, the closer these clients are semantically. Each company is represented by a dataset built from answers to a questionnaire sent to a number of randomly chosen customers using services offered by this company. Before knowledge is extracted from these datasets, each one is extended by merging it with datasets which are close to it in the semantic dendrogram, have a higher NPS, and whose extracted classifiers have a higher FS-score. Action rules are extracted from these extended datasets and used to provide recommendations to clients on how to improve their businesses.

Jieyan Kuang, Zbigniew W. Raś, Albert Daniel
A New Linear Discriminant Analysis Method to Address the Over-Reducing Problem

Linear discriminant analysis (LDA) is an effective and efficient linear dimensionality reduction and feature extraction method. It has been used in a broad range of pattern recognition tasks including face recognition, document recognition and image retrieval. When applied to fewer-class classification tasks (such as binary classification), however, LDA suffers from the over-reducing problem: an insufficient number of features is extracted for describing the class boundaries. This is due to the fact that LDA produces a fixed number of reduced features, which is one less than the number of classes. As a result, the classification performance will suffer, especially when the classification data space has high dimensionality. To cope with the problem we propose a new LDA variant, orLDA (i.e., LDA for the over-reducing problem), which promotes the use of individual data instances instead of summary data alone in generating the transformation matrix. As a result orLDA obtains a number of features that is independent of the number of classes. Extensive experiments show that orLDA has better performance than the original LDA and two LDA variants: uncorrelated LDA and orthogonal LDA.
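The over-reducing effect itself is easy to reproduce. The sketch below (ours, not the authors' code) uses scikit-learn's classical LDA to show that a binary problem is always reduced to a single feature, regardless of the input dimensionality:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                 # 50-dimensional data
y = np.concatenate([np.zeros(100), np.ones(100)])

lda = LinearDiscriminantAnalysis()
Z = lda.fit_transform(X, y)
# classical LDA extracts at most (n_classes - 1) features,
# so two classes always collapse to one dimension:
print(Z.shape)   # (200, 1)
```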

Huan Wan, Gongde Guo, Hui Wang, Xin Wei

Image Processing

Frontmatter
Procedural Generation of Adjustable Terrain for Application in Computer Games Using 2D Maps

This paper describes a method for generating 3D terrain for use in computer games by processing a set of 2D maps and employing user-specified parameters. Most existing solutions do not allow for modifications during the generation process; introducing any changes usually requires complex operations or is limited to adjusting the input maps. We present a solution that allows not only easy editing of the created terrain, but also verification of its quality at each step of generation.

Izabella Antoniuk, Przemysław Rokita
Fixed Point Learning Based 3D Conversion of 2D Videos

Depth cues from a single still image, also called monocular cues, are more versatile, while depth cues from multiple images give more accurate depth extraction. Machine learning is a new and promising research direction for this type of conversion. In this paper, a fast automatic 2D-to-3D conversion technique is proposed which utilizes a fixed point learning framework for the accurate estimation of depth maps of query images, using a model trained on a database of 2D color and depth images. The depth maps obtained from monocular and motion depth cues of the input images/video, together with ground truth depths, are used in the training database for the fixed point iteration. The results produced with the fixed point model are more accurate and reliable than MRF fusion of both types of depth cues. Stereo pairs are then generated using the input video frames and their corresponding depth maps obtained from the fixed point learning framework. These stereo pairs are put together to obtain the final 3D video, which can be displayed on any 3DTV and viewed using 3D glasses.

Nidhi Chahal, Santanu Chaudhury
Fast and Accurate Foreground Background Separation for Video Surveillance

Fast and accurate algorithms for background-foreground separation are an essential part of any video surveillance system. GMM (Gaussian Mixture Model) based object segmentation methods give accurate results for background-foreground separation problems, but are computationally expensive. In contrast, modeling with only a single Gaussian improves the time complexity, at a cost in accuracy due to variations in illumination and the dynamic nature of the background. It is observed that these variations affect only a few pixels in an image; most of the background pixels are unimodal. We propose a method to account for the dynamic nature of the background and low lighting conditions. It is an adaptive approach where each pixel is modeled as either a unimodal Gaussian or multimodal Gaussians. The flexibility in the number of Gaussians used to model each pixel, along with a learning-when-it-is-required approach, reduces the time complexity of the algorithm significantly. To resolve problems of false negatives due to homogeneity of color and texture in foreground and background, spatial smoothing is carried out by K-means, which improves the overall accuracy of the proposed algorithm.
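A minimal sketch of the unimodal branch of such an approach is given below, assuming a grayscale frame stream; the promotion of volatile pixels to multimodal Gaussians and the K-means smoothing described above are omitted:

```python
import numpy as np

class UnimodalBackground:
    """Per-pixel running Gaussian background model (the unimodal case;
    pixels that keep failing the test would be promoted to a mixture)."""
    def __init__(self, first_frame, alpha=0.02, k=2.5):
        self.mu = first_frame.astype(np.float64)      # per-pixel mean
        self.var = np.full(first_frame.shape, 100.0)  # per-pixel variance
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(np.float64)
        d2 = (frame - self.mu) ** 2
        # a pixel is foreground if it deviates more than k sigmas
        foreground = d2 > (self.k ** 2) * self.var
        # update statistics only where the pixel matches the background
        bg = ~foreground
        self.mu[bg] += self.alpha * (frame - self.mu)[bg]
        self.var[bg] += self.alpha * (d2 - self.var)[bg]
        return foreground
```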

Prashant Domadiya, Pratik Shah, Suman K. Mitra
Enumeration of Shortest Isothetic Paths Inside a Digital Object

The computation of a shortest isothetic path (SIP) between two points in an object is important in various applications such as robot navigation and VLSI design. However, a SIP between two grid points in a digital object laid on a uniform 2D isothetic square lattice may not be unique. We assume that each discrete path consists of a sequence of consecutive grid edges that starts from the source point and ends at the sink point. In this paper, we present a novel algorithm to calculate the number of such distinct shortest isothetic paths between two given grid points inside a digital object, with time complexity $O(S/g^2)$, where $S$ is the total number of pixels in the digital object, and $g$ is the grid size. The number of available SIPs also serves as a metric for shape registration.
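The counting principle can be illustrated with a generic BFS sketch (this is not the authors' $O(S/g^2)$ algorithm, which exploits the isothetic grid structure): the number of shortest paths reaching a node equals the sum of the counts of its shortest-path predecessors.

```python
from collections import deque

def count_shortest_paths(free, src, dst):
    """Count distinct shortest 4-connected (isothetic) paths between two
    grid points, moving only through cells marked free (True)."""
    rows, cols = len(free), len(free[0])
    dist = {src: 0}
    ways = {src: 1}
    q = deque([src])
    while q:
        r, c = q.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and free[nr][nc]:
                if (nr, nc) not in dist:            # first visit: next BFS layer
                    dist[(nr, nc)] = dist[(r, c)] + 1
                    ways[(nr, nc)] = ways[(r, c)]
                    q.append((nr, nc))
                elif dist[(nr, nc)] == dist[(r, c)] + 1:
                    ways[(nr, nc)] += ways[(r, c)]  # another shortest predecessor
    return ways.get(dst, 0)
```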

Mousumi Dutt, Arindam Biswas, Bhargab B. Bhattacharya
Modified Exemplar-Based Image Inpainting via Primal-Dual Optimization

In this paper we present a modified exemplar-based image inpainting technique to remove objects from digital images. Traditional exemplar-based image inpainting techniques do not take into account the similarity among the patches to be filled and their neighbors inside the hole, which gives visually incoherent results. To correct this problem, we formulate image inpainting as a global energy optimization problem and use the primal-dual schema of linear programming for optimization. We also modify the criterion for determining priority among candidate patches to be inpainted by introducing an ‘edge length’ term, which propagates linear structures better than existing techniques. Results show the effectiveness of our method compared to other recent methods.

Veepin Kumar, Jayanta Mukhopadhyay, Shyamal Kumar Das Mandal
A Novel Approach for Image Super Resolution Using Kernel Methods

We present a learning based method for the image super resolution problem. Our approach uses kernel methods to build an efficient representation and also to learn the regression model. For constructing an efficient set of features, we apply Kernel Principal Component Analysis (Kernel-PCA) with a Gaussian kernel on a patch based database constructed from 69 training images up-scaled using bi-cubic interpolation. These features were given as input to a non-linear Support Vector Regression (SVR) model, with a Gaussian kernel, to predict the pixels of the high resolution image. The model selection for SVR was performed using grid search. We tested our algorithm on an unseen data-set of 13 images. Our method outperformed a state-of-the-art method, achieving an average of 0.92 dB higher peak signal-to-noise ratio (PSNR). The average improvement in PSNR over bi-cubic interpolation was found to be 3.38 dB.
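For reference, the PSNR values quoted above follow the standard definition; a minimal implementation, assuming 8-bit images, is:

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```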

Adhish Prasoon, Himanshu Chaubey, Abhinav Gupta, Rohit Garg, Santanu Chaudhury
Generation of Random Triangular Digital Curves Using Combinatorial Techniques

This work presents an algorithm to generate simple closed random triangular digital curves of finite length imposed on a background triangular grid. A novel timestamp-based combinatorial technique is incorporated to allow the curve to grow freely without intersecting itself. The algorithm runs in linear time, as only a fixed set of vertices is consulted to find the next direction and no backtracking is required. The proposed algorithm is implemented and tested exhaustively.

Apurba Sarkar, Arindam Biswas, Mousumi Dutt, Arnab Bhattacharya

Image Retrieval

Frontmatter
Tackling Curse of Dimensionality for Efficient Content Based Image Retrieval

This paper proposes a content based image retrieval (CBIR) technique for tackling the curse of dimensionality arising from the high dimensional feature representation of database images, and for search space reduction by clustering. Kernel principal component analysis (KPCA) is applied to the MPEG-7 Color Structure Descriptor (CSD) (64 bins) to obtain a low-dimensional nonlinear subspace. The reduced feature space is clustered using the Partitioning Around Medoids (PAM) algorithm, with the number of clusters chosen from the optimum average silhouette width. The clusters are refined to remove possible outliers to enhance retrieval accuracy. The training samples for a query are marked manually and fed to a One-Class Support Vector Machine (OCSVM) to search the refined cluster containing the query image. Images are ranked and retrieved from the positively labeled outcome of the matching cluster. The effectiveness of the proposed method is supported with comparative results obtained from (i) MPEG-7 CSD features directly and (ii) other dimensionality reduction techniques.

Minakshi Banerjee, Seikh Mazharul Islam
Face Profile View Retrieval Using Time of Flight Camera Image Analysis

A method for retrieving the profile view of a human face is presented. The depth data from a 3D camera is taken as input. Besides standard filtering, the preprocessing is extended by filling the holes present in the depth data. Keypoints, defined as the nose tip and the chin, are detected in the user's face and tracked. Kalman filtering is applied to smooth the coordinates of these points, which can vary from frame to frame because of the subject's movement in front of the camera. From the keypoint locations and the depth data, retrieval of the contour of the user's face profile is attempted. Further filtering and modifications are applied to the profile view in order to enhance its representation. These data processing enhancements emphasize minima and maxima in the contour signals, leading to discrimination of face profiles, and enable robust facial landmark tracking.

Piotr Bratoszewski, Andrzej Czyżewski
Context-Based Semantic Tagging of Multimedia Data

With the rapid growth of broadcast systems and the ease of accessing internet services, a lot of information is available and accessible on the web. The information available in multimedia documents may have different context and content. Since interpretation of multimedia content cannot be free of context, tagging on the basis of context is indispensable for dealing with this problem. Tagging plays an important role in retrieving multimedia data, as nowadays most videos are retrieved based on the text describing them and not by the actual context embodied in them. In this paper we propose a scheme for tagging multimedia data based on the content and context identified from web-based resources. Hierarchical LDA (hLDA) is used to model the context information, while Correspondence-LDA (Corr-LDA) is used to model the content information of multimedia data. Finally, multimedia data is tagged with the relevant content and context information on the basis of a Context-Matching Algorithm. These tags can then be used by search engines to increase the precision and recall of multimedia search results.

Nisha Pahal, Santanu Chaudhury, Brejesh Lall

Image Tracking

Frontmatter
Real-Time Distributed Multi-object Tracking in a PTZ Camera Network

A visual surveillance system should have the ability to view an object of interest at a certain size so that important information related to that object can be collected and analyzed as the object moves in the area observed by multiple cameras. In this paper, we propose a novel framework for real-time, distributed, multi-object tracking in a PTZ camera network with this capability. In our framework, the user is provided a tool to mark an object of interest such that the object is tracked at a certain size as it moves in the view of various cameras across space and time. The pan, tilt and zoom capabilities of the PTZ cameras are leveraged to ensure that the object of interest remains within the predefined size range as it is seamlessly tracked in the PTZ camera network. In our distributed system, each camera tracks the objects in its view using particle filter tracking, and multi-layered belief propagation is used for seamlessly tracking objects across cameras.

Ayesha Choudhary, Shubham Sharma, Indu Sreedevi, Santanu Chaudhury
Improved Simulation of Holography Based on Stereoscopy and Face Tracking

To meet the requirements of the market, people are improving communication with virtual reality systems. We propose a method for simulating holography, where a person can see the object from different points of view. To achieve this effect we used stereoscopy in combination with face tracking, which enables us to manipulate the content on the screen. Despite the heavy computational load, we were able to maintain interactivity of the whole system.

Łukasz Dąbała, Przemysław Rokita
Head Pose Tracking from RGBD Sensor Based on Direct Motion Estimation

We propose to use a state-of-the-art visual odometry technique for the purpose of head pose estimation. We demonstrate that with small adaptations this algorithm achieves more accurate head pose estimation from an RGBD sensor than all the methods published to date. We also propose a novel methodology to automatically assess the accuracy of a tracking algorithm without the need to manually label or otherwise annotate each image in a test sequence.

Adam Strupczewski, Błażej Czupryński, Władysław Skarbek, Marek Kowalski, Jacek Naruniec

Pattern Recognition

Frontmatter
A Novel Hybrid CNN-AIS Visual Pattern Recognition Engine

Machine learning methods are used today mostly for recognition problems. Convolutional Neural Networks (CNN) have time and again proved successful for many image processing tasks, primarily because of their architecture. In this paper we propose to apply CNNs to small data sets, for example personal photo albums or similar environments where the size of the training dataset is a limitation, within the framework of a proposed hybrid CNN-AIS model. We use Artificial Immune System principles to augment the small training data set: a layer of Clonal Selection is added to the local filtering and max pooling of the CNN architecture. The proposed architecture is evaluated using the standard MNIST dataset by limiting the data size, and also with a small personal data sample belonging to two different classes. Experimental results show that the proposed hybrid CNN-AIS based recognition engine works well when the training data is limited in size.

Vandna Bhalla, Santanu Chaudhury, Arihant Jain
Modified Orthogonal Neighborhood Preserving Projection for Face Recognition

In recent times most face recognition algorithms have been based on subspace analysis: high dimensional image data are transformed into a lower dimensional subspace, and recognition proceeds by embedding a new image into that lower dimensional space. Starting from Principal Component Analysis (PCA), many such dimensionality reduction procedures have been utilized for face recognition. A recent addition is Neighborhood Preserving Projection (NPP). All such methods create an orthogonal transformation based on some criterion. Orthogonal NPP builds a linear relation within a small neighborhood of the data and then assumes its validity in the lower dimensional space. However, the assumption of linearity could be invalid in some applications. With this in mind, the current paper introduces an approximate non-linearity, in particular piecewise linearity, within the small neighborhood, which gives rise to a more compact data representation that can be utilized for recognition. The proposed scheme is implemented on synthetic as well as real data. Its suitability is tested on a set of face images, and a significant improvement in recognition is observed.

Purvi Koringa, Gitam Shikkenawis, Suman K. Mitra, S. K. Parulkar
An Optimal Greedy Approximate Nearest Neighbor Method in Statistical Pattern Recognition

The insufficient performance of statistical recognition of composite objects (images, speech signals) is explored in the case of a medium-sized database (thousands of classes). In contrast to heuristic approximate nearest-neighbor methods, we propose a statistically optimal greedy algorithm. The decision is made based on the Kullback-Leibler minimum information discrimination principle. The model object to be checked at the next step is selected from the class with the maximal likelihood (joint density) of distances to previously checked models. Results of an experimental study on a face recognition task with the FERET dataset are presented. It is shown that the proposed method is much more effective than brute force and fast approximate nearest neighbor algorithms, such as randomized kd-trees, perm-sort and the directed enumeration method.
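As background for the decision rule, a brute-force version of the minimum information discrimination principle is sketched below; the greedy model-selection schedule that makes the authors' method fast is not shown, and the histogram representation is our illustrative choice:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two normalized histograms."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return np.sum(p * np.log(p / q))

def nearest_model(query_hist, model_hists):
    """Brute-force minimum-information-discrimination decision:
    model_hists maps a class name to its model histogram."""
    return min(model_hists, key=lambda m: kl_divergence(query_hist, model_hists[m]))
```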

Andrey V. Savchenko
Ear Recognition Using Block-Based Principal Component Analysis and Decision Fusion

In this paper, we propose a fast and accurate ear recognition system based on principal component analysis (PCA) and fusion at the classification and feature levels. Conventional PCA suffers from time and space complexity when dealing with high-dimensional data sets. Our proposed algorithm divides a large image into smaller blocks, and then applies PCA on each block separately, followed by classification using a minimum distance classifier. While the recognition rates on small blocks are lower than on the whole ear image, combining the outputs of the classifiers is shown to increase the recognition rate. Experimental results confirm that our proposed algorithm is fast and achieves recognition performance superior to that yielded when using whole ear images.
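One plausible reading of the pipeline is sketched below; the block grid size, the PCA dimension and the majority-vote fusion at decision level are our illustrative choices, not values from the paper:

```python
import numpy as np
from collections import Counter
from sklearn.decomposition import PCA

def split_blocks(img, n):
    """Split a 2D image into an n x n grid of flattened blocks."""
    h, w = img.shape
    return [img[i*h//n:(i+1)*h//n, j*w//n:(j+1)*w//n].ravel()
            for i in range(n) for j in range(n)]

def predict(train_imgs, train_labels, test_img, n=4, dim=20):
    """Per-block PCA + minimum-distance classifier, fused by majority vote."""
    votes = []
    test_blocks = split_blocks(test_img, n)
    for b in range(n * n):
        Xb = np.array([split_blocks(im, n)[b] for im in train_imgs])
        pca = PCA(n_components=min(dim, len(Xb) - 1)).fit(Xb)
        proj = pca.transform(Xb)
        q = pca.transform(test_blocks[b].reshape(1, -1))
        # minimum-distance decision on this block
        votes.append(train_labels[np.argmin(np.linalg.norm(proj - q, axis=1))])
    return Counter(votes).most_common(1)[0][0]
```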

Alaa Tharwat, Abdelhameed Ibrahim, Aboul Ella Hassanien, Gerald Schaefer

Data Mining Techniques for Large Scale Data

Frontmatter
Binarizing Change for Fast Trend Similarity Based Clustering of Time Series Data

It is observed that traditional clustering methods do not necessarily perform well on time-series data because of the temporal relationships among the values observed over a period of time. Another issue with time series is that databases contain bulky data in terms of both dimension and size. Clustering algorithms based on traditional measures of dissimilarity trade off efficiency against accuracy. In addition, time series analysis should be concerned more with the patterns of change and the points of change than with the values of change. In this paper a new representation technique and a new similarity measure are proposed for agglomerative hierarchical clustering.
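A minimal sketch of the binarize-the-change idea is shown below; the paper's exact representation and similarity measure may differ, and the function names are ours:

```python
import numpy as np

def binarize_change(series):
    """Represent a time series by the sign of its changes: 1 = up, 0 = down/flat."""
    diffs = np.diff(np.asarray(series, float))
    return (diffs > 0).astype(int)

def trend_similarity(a, b):
    """Fraction of steps on which two equal-length series move the same way;
    usable as a cheap trend similarity for hierarchical clustering."""
    return np.mean(binarize_change(a) == binarize_change(b))
```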

Ibrahim K. A. Abughali, Sonajharia Minz
Big Data Processing by Volunteer Computing Supported by Intelligent Agents

In this paper, volunteer computing systems are proposed for big data processing, and intelligent agents are developed to improve the efficiency of a grid middleware layer. In consequence, an intelligent volunteer grid has been equipped with agents belonging to five sets. The first set consists of user tasks. Furthermore, two kinds of semi-intelligent tasks have been introduced to implement the middleware layer. Finally, two agents based on genetic programming and harmony search have been applied to optimize big data processing.

Jerzy Balicki, Waldemar Korłub, Jacek Paluszak
Two Stage SVM and kNN Text Documents Classifier

The paper presents an approach to the large scale text document classification problem in parallel environments. A two stage classifier is proposed, based on a combination of the k-nearest neighbors and support vector machine classification methods. The details of the classifier and the parallelisation of the classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near, an extension of the one-vs-all approach typically used with binary classifiers to solve multiclass problems. The experiments were performed on a large scale dataset, using many parallel threads on a supercomputer. The results show that the proposed classifier scales well and gives results of reasonable quality. Finally, it is shown that the proposed method gives better performance than the traditional approach.
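An illustrative, single-threaded reading of the one-vs-near idea is sketched below: k-NN proposes a few nearby candidate classes, and only the binary SVMs of those classes are evaluated. Class and parameter names are ours, and the paper's parallel implementation is not reproduced.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

class OneVsNear:
    """Two-stage sketch: stage 1 (k-NN) narrows the label set,
    stage 2 scores only the binary SVMs of the near classes."""
    def __init__(self, k=20, n_candidates=3):
        self.knn = KNeighborsClassifier(n_neighbors=k)
        self.n_candidates = n_candidates

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.knn.fit(X, y)
        # one binary (one-vs-rest) SVM per class, as in one-vs-all
        self.svms = {c: LinearSVC().fit(X, (y == c).astype(int))
                     for c in self.classes_}
        return self

    def predict_one(self, x):
        x = x.reshape(1, -1)
        proba = self.knn.predict_proba(x)[0]
        near = self.knn.classes_[np.argsort(proba)[-self.n_candidates:]]
        # only the SVMs of near classes are consulted
        return max(near, key=lambda c: self.svms[c].decision_function(x)[0])
```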

Marcin Kępa, Julian Szymański
Task Allocation and Scalability Evaluation for Real-Time Multimedia Processing in a Cluster Environment

An allocation algorithm for stream processing tasks is proposed (Modified Best Fit Descendent, MBFD). A comparison with another solution (BFD) is provided. Tests of the algorithms in an HPC environment are described and the results are presented. A proper scalability metric is proposed and used for the evaluation of the allocation algorithm.

Jerzy Proficz, Henryk Krawczyk

Fuzzy Computing

Frontmatter
Concept Synthesis Using Logic of Prototypes and Counterexamples: A Graded Consequence Approach

This paper is a preliminary step towards proposing a scheme for synthesizing a concept out of a set of concepts, focusing on the following aspects. The first is that the semantics of a set of simple (or independent) concepts is understood in terms of its prototypes and counterexamples, where these instances of positive and negative cases may vary with the change of context, i.e., a set of situations which works as a precursor of an information system. Secondly, based on the classification of a concept in terms of the situations where it strictly applies and where it does not, a degree of application of the concept to a new situation/world is determined. This layer of reasoning is named the logic of prototypes and counterexamples. In the next layer, the method of concept synthesis is designed as a graded concept based on the already developed degree based approach to the logic of prototypes and counterexamples.

Soma Dutta, Piotr Wasilewski
Fuzzy Rough Sets Theory Reducts for Quantitative Decisions – Approach for Spatial Data Generalization

One of the most important objectives within the scope of current cartography is the creation of a system controlling the process of geographical data generalisation. Firstly, this requires selection of the features crucial from the point of view of the decision making process. Such tools as reducts and fuzzy reducts, though useful, are still insufficient for the quantitative decisions common in cartographical generalisation. Thus the author proposes a modification of fuzzy reduct calculation which allows them to be computed with regard to a continuous decision variable. The proposed method is based on the t-norm of fuzzy indiscernibility based on attribute values and fuzzy indiscernibility based on the decision, calculated for each pair of objects. The solution seems more intuitive than the ones established previously.

Anna Fiedukowicz
Fuzzy Rough Sets Theory Applied to Parameters of Eye Movements Can Help to Predict Effects of Different Treatments in Parkinson’s Patients

Parkinson's disease (PD) is the second most common neurodegenerative disease (ND), with characteristic movement disorders. There are well defined standard procedures to measure disease stage (Hoehn and Yahr scale), progression and effects of treatments (UPDRS, the Unified Parkinson's Disease Rating Scale), but these procedures can only be performed by an experienced neurologist and are partly subjective. The purpose of our work was to test an objective and non-invasive method that may help to estimate disease stage by measuring fast and slow eye movements (EM); it was demonstrated earlier that EM changes in PD. We measured reflexive saccades (RS) and slow pursuit ocular movements (POM) in four sessions related to different treatments. With the help of fuzzy rough sets theory (FRST) we related the measurements to the expert's opinion by generalizing experimental findings into fuzzy rules. In order to test our approach, we divided our measurements into training and testing sets. In the second test, we removed the expert's decisions and predicted them from the training set in two situations: on the basis of only classical neurological measurements, and on the basis of EM measurements. On the basis of 12 PD patients, we observed an increase in prediction accuracy when eye movements were included as condition attributes. Our results with the FRST suggest that EM measurements may become an important diagnostic tool in PD.

Anna Kubis, Artur Szymański, Andrzej W. Przybyszewski
Determining OWA Operator Weights by Maximum Deviation Minimization

The ordered weighted averaging (OWA) operator uses the weights assigned to the ordered values of the attributes. This allows one to model various aggregation preferences characterized by the so-called orness measure. The determination of the OWA operator weights is a crucial issue in applying the operator to decision making. In this paper, for a given orness value, monotonic weights of the OWA operator are determined by minimization of the maximum absolute deviation inequality measure. This leads to a linear programming model which can also be solved analytically.
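For reference, the OWA aggregation and its orness measure, which the optimization model constrains, follow the standard textbook definitions; the sketch below illustrates them, not the authors' deviation-minimizing LP:

```python
import numpy as np

def owa(x, w):
    """Ordered weighted average: weights apply to values sorted in
    non-increasing order, not to particular attributes."""
    return np.dot(np.sort(np.asarray(x, float))[::-1], w)

def orness(w):
    """Orness of a weight vector: 1 for max, 0 for min, 0.5 for the mean."""
    n = len(w)
    return sum(w[i] * (n - 1 - i) for i in range(n)) / (n - 1)

w = np.array([0.4, 0.35, 0.25])          # monotonic weights
print(owa([3.0, 7.0, 5.0], w))           # 5.3
print(orness(w))                         # 0.575
```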

Wlodzimierz Ogryczak, Jaroslaw Hurkala
Fuzzy Set Interpretation of Comparator Networks

We discuss how to model similarities between compound objects by utilizing networks of comparators. The framework is used to construct identification and classification systems. Compared with our previous research, we pay special attention to fuzzy-set-inspired foundations of how compound signals are processed through the network. We also reconsider some already-known examples of applications of comparator networks, now using the proposed fuzzy-set-based terminology.

Łukasz Sosnowski, Dominik Ślęzak
Inverted Fuzzy Implications in Backward Reasoning

Fuzzy inference systems generate inference results based on fuzzy IF-THEN rules. Fuzzy implications are mostly used as a way of interpreting IF-THEN rules with a fuzzy antecedent and a fuzzy consequent. Over more than eight decades a number of different fuzzy implications have been described, e.g. [6-10]. This leads to the following question: how to choose the proper function among the basic fuzzy implications. In this paper, we propose a new method for choosing an implication, which allows us to compare two fuzzy implications. If the truth value of the consequent and the truth value of the implication are given, by means of inverse fuzzy implications we can easily optimize the truth value of the implication antecedent. In other words, we can choose the fuzzy implication which has the highest or the lowest truth value of the implication antecedent, or which has a higher or lower truth value than another implication.
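As a concrete example of inverting an implication, the sketch below uses the Łukasiewicz implication, one of the basic fuzzy implications; given the truth values of the implication and the consequent, the antecedent truth value can be recovered in closed form. This is our illustration of the inversion idea, not the paper's full comparison method.

```python
def lukasiewicz(a, b):
    """Lukasiewicz implication I(a, b) = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)

def inverse_lukasiewicz(y, b):
    """Given y = I(a, b) and the consequent truth value b, recover the
    antecedent truth value a. For y < 1 the equation y = 1 - a + b has
    the unique solution a = 1 - y + b."""
    if y >= 1.0:
        return b   # any a <= b yields I(a, b) = 1; b is the largest such antecedent
    return 1.0 - y + b
```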

Zbigniew Suraj, Agnieszka Lasek

Rough Sets

Frontmatter
Generating Core Based on Discernibility Measure and MapReduce

In this paper we propose a parallel method for generating the attribute core based on the MapReduce distributed programming model and rough set theory. The results of experiments on a real dataset show that the proposed method is effective for big data.

Michal Czolombitko, Jaroslaw Stepaniuk
Music Genre Recognition in the Rough Set-Based Environment

The aim of this paper is to investigate music genre recognition in the rough set-based environment. Experiments involve a parameterized music database containing 1100 music excerpts. The database is divided into 11 classes corresponding to music genres. Tests are conducted using the Rough Set Exploration System (RSES), a toolset for analyzing data with methods based on rough set theory. Classification effectiveness employing rough sets is compared against k-Nearest Neighbors (k-NN) and Local Transfer Function Classifier (LTF-C). The results obtained are analyzed in terms of global class recognition and also per genre.

Piotr Hoffmann, Bożena Kostek
Scalability of Data Decomposition Based Algorithms: Attribute Reduction Problem

This paper studies the scalability of data decomposition based algorithms intended for attribute reduction. Two approaches that decompose a decision table and use the relative discernibility matrix method to compute all reducts are investigated. The experimental results reported in this paper show that applying these approaches makes it possible to achieve better scalability compared with the standard algorithm based on the relative discernibility matrix method.

Piotr Hońko
Application of Fuzzy Rough Sets to Financial Time Series Forecasting

This paper investigates experimentally the feasibility of Fuzzy Rough Sets in building trend prediction models for financial time series, as related research is scarce. Aside from the standard classification accuracy measures, financial profit and loss backtesting using a sample market timing strategy was performed, and the profit related quality of the tested methods was compared against that of a buy&hold strategy applied to the market indices used. The experiments show that Fuzzy Rough Sets models present a viable basis for forecasting market movement direction and thus can support profitable market timing strategies.

Mariusz Podsiadło, Henryk Rybinski
A New Post-processing Method to Detect Brain Tumor Using Rough-Fuzzy Clustering

Automatic and accurate brain tumor segmentation from MR images is one of the important problems in cancer research. However, the lack of a shape prior and weak contrast at boundaries make unsupervised brain tumor segmentation more challenging. In this background, a new brain tumor segmentation method is developed, judiciously integrating the merits of multiresolution image analysis and rough-fuzzy clustering. One of the major issues of clustering based segmentation is how to extract the brain tumor accurately, since tumors may not have clearly defined intensity or textural boundaries. In this regard, this paper presents a new post-processing method for clustering based brain tumor detection. It combines the merits of mathematical morphology and a rough set based region growing approach to refine the result obtained after clustering, thereby ensuring the accuracy of the brain tumor segmentation. The performance of the proposed approach, along with a comparison with related methods, is demonstrated on a set of synthetic and real brain MR images.

Shaswati Roy, Pradipta Maji
Rough Set Based Modeling and Visualization of the Acoustic Field Around the Human Head

The presented research aims at modeling acoustical wave propagation phenomena by applying rough set theory in a novel manner. In a typical listening environment sound intensity is determined by numerous factors: the distance from a sound source, signal levels and frequencies, and obstacles' locations and sizes. In contrast, a free field is characterized by direct, unimpeded propagation of the acoustical waves. The proposed approach is focused on processing sound field measurements performed in an anechoic chamber, collected by a dedicated acoustic probe, comprising thousands of datapoints for six signal frequencies, with and without the presence of a dummy head in the free field. Rough set theory is applied for modeling the influence of the obstacle that a dummy head creates in a free field and the effects of the head's acoustic interference, shading and diffraction. A data pre-processing method is proposed, involving coordinate system transformation, data discretization, and classification. Four rule sets are acquired, and the achieved accuracy and coverage are assessed. The final results allow simplification of the model and a new method for visualization.

Piotr Szczuko, Bożena Kostek, Józef Kotus, Andrzej Czyżewski
Global Optimization of Exact Association Rules Relative to Coverage

In the paper, an application of dynamic programming approach to global optimization of exact association rules relative to coverage is presented.

Beata Zielosko

Bioinformatics

Frontmatter
PDP-RF: Protein Domain Boundary Prediction Using Random Forest Classifier

Domain boundary prediction is a crucial task for functional classification of proteins, homology-based protein structure prediction and high-throughput structural genomics. Each amino acid is represented using a set of physico-chemical properties. A Random Forest classifier is explored for accurate prediction of domain regions by training on a curated dataset obtained from the CATH database. The software is tested on proteins of the CASP-6, CASP-8, CASP-9 and CASP-10 targets in order to evaluate its prediction accuracy using three-fold cross validation experiments. Finally, a consensus approach is used to combine the results of the classifiers obtained through the cross-validation experiments. The average recall and precision scores achieved by the developed consensus based Random Forest classifier (PDP-RF) are 0.98 and 0.88 respectively for prediction of CASP targets. The overall accuracy and F-score of PDP-RF are 0.87 and 0.91 respectively.

Piyali Chatterjee, Subhadip Basu, Julian Zubek, Mahantapas Kundu, Mita Nasipuri, Dariusz Plewczynski
A New Similarity Measure for Identification of Disease Genes

One of the important problems in functional genomics is how to select the disease genes. In this regard, the paper presents a new similarity measure to compute the functional similarity between two genes. It is based on the information of protein-protein interaction networks. A new gene selection algorithm is introduced to identify disease genes, integrating judiciously the information of gene expression profiles and protein-protein interaction networks. The proposed algorithm selects a set of genes from microarray data as disease genes by maximizing the relevance and functional similarity of the selected genes. The performance of the proposed algorithm, along with a comparison with other related methods, is demonstrated on colorectal cancer data set.

Pradipta Maji, Ekta Shah, Sushmita Paul
MaER: A New Ensemble Based Multiclass Classifier for Binding Activity Prediction of HLA Class II Proteins

Human Leukocyte Antigen class II (HLA II) proteins are crucial for the activation of the adaptive immune response. In HLA class II molecules, a high rate of polymorphism has been observed. Hence, the accurate prediction of HLA II-peptide interactions is a challenging task that can both improve the understanding of immunological processes and facilitate decision-making in vaccine design. During the last decade various computational tools have been developed for this purpose, mainly focused on the binding activity prediction of the different HLA II isotypes (such as DP, DQ and DR) separately. This fact motivated us to make a humble contribution towards the prediction of isotype binding propensity as a multiclass classification task. We have analysed a binding affinity dataset containing the interactions of 27 HLA II proteins with 636 variable length peptides, in order to prepare new multiclass datasets for strong and weak binding peptides. Thereafter, a new ensemble based multiclass classifier, called MetaEnsembleR (MaER), is proposed to predict the activity of weak/unknown binding peptides by integrating the results of various heterogeneous classifiers. It pre-processes the training and testing datasets by making feature subsets and bootstrap samples, and creates diverse datasets using principal component analysis, which are then used to train and test MaER. The performance of MaER with respect to other existing state-of-the-art classifiers has been estimated using validity measures, ROC curves and gain value analysis. Finally, the Friedman statistical test has been conducted to judge the statistical significance of the results produced by MaER.

Giovanni Mazzocco, Shib Sankar Bhowmick, Indrajit Saha, Ujjwal Maulik, Debotosh Bhattacharjee, Dariusz Plewczynski
Selection of a Consensus Area Size for Multithreaded Wavefront-Based Alignment Procedure for Compressed Sequences of Protein Secondary Structures

The multithreaded wavefront-based alignment procedure is used in the PSS-SQL language, which allows for flexible scanning of databases of protein secondary structures and finding similarities among protein molecules. Efficiency of the process depends on several factors, including the way the similarity matrix calculated during the process is divided into areas, the number of CPU cores possessed by the computer hosting the database with the PSS-SQL extension, and the structural patterns submitted by users in PSS-SQL queries. In this paper, we show how we arrived at consensus values of area sizes for the multithreaded wavefront-based alignment procedure through a series of experimental trials. Availability: the PSS-SQL extension for the Microsoft SQL Server database management system can be downloaded from the PSS-SQL project home page at: http://www.zti.aei.polsl.pl/w3/dmrozek/science/pss-sql.htm.

Dariusz Mrozek, Bożena Małysiak-Mrozek, Bartek Socha, Stanisław Kozielski
Supervised Cluster Analysis of miRNA Expression Data Using Rough Hypercuboid Partition Matrix

MicroRNAs are small, endogenous non-coding RNAs found in plants and animals, which suppress the expression of genes post-transcriptionally. Various genome-wide studies suggest that a substantial fraction of miRNA genes is likely to form clusters. The coherent expression of the miRNA clusters can then be used to classify samples according to clinical outcome. In this background, a new rough hypercuboid based supervised similarity measure is proposed and integrated with supervised attribute clustering to find groups of miRNAs whose coherent expression can classify samples. The proposed method directly incorporates the information of sample categories into the miRNA clustering process, generating a supervised clustering algorithm for miRNAs. The effectiveness of the rough hypercuboid based algorithm, along with a comparison with other related algorithms, is demonstrated on three miRNA microarray expression data sets using the $B.632+$ bootstrap error rate of the support vector machine. The association of the miRNA clusters to various biological pathways is also shown by pathway enrichment analysis.

Sushmita Paul, Julio Vera
Analysis of AmpliSeq RNA-Sequencing Enrichment Panels

This study presents a proof of concept of encoding genomic signatures with the AmpliSeq technology. Samples of patients with a disease and of healthy donors have been processed using a custom-designed AmpliSeq RNA sequencing kit that includes 290 amplicons, sequenced using an IonTorrent machine. The read count data show sufficient coverage in most of the chosen amplicons, which results in good separability between the disease patients and healthy donors. In addition, several amplicons allow for checking useful genomic variants (SNPs), whenever the coverage level permits. The paper presents a machine-learning classifier evaluation of the difference between the patients and healthy donors, based upon the AmpliSeq panel data. The outcome confirms the potential utility of similar RNA amplicon kits in research and clinical practice to encode gene expression signatures of diseases and their phenotypes.

Marek S. Wiewiorka, Alicja Szabelska, Michal J. Okoniewski
Consensus-Based Prediction of RNA and DNA Binding Residues from Protein Sequences

Computational prediction of RNA- and DNA-binding residues from protein sequences offers a high-throughput and accurate solution to functionally annotate the avalanche of the protein sequence data. Although many predictors exist, the efforts to improve predictive performance with the use of consensus methods are so far limited. We explore and empirically compare a comprehensive set of different designs of consensuses including simple approaches that combine binary predictions and more sophisticated machine learning models. We consider both DNA- and RNA-binding motivated by similarities in these interactions, which should lead to similar conclusions. We observe that the simple consensuses do not provide improved predictive performance when applied to sequences that share low similarity with the datasets used to build their input predictors. However, use of machine learning models, such as linear regression, Support Vector Machine and Naïve Bayes, results in improved predictive performance when compared with the best individual predictors for the prediction of DNA- and RNA-binding residues.

Jing Yan, Lukasz Kurgan

Applications of Artificial Intelligence

Frontmatter
Fusion of Static and Dynamic Parameters at Decision Level in Human Gait Recognition

This paper presents a bimodal biometric system based on human gait data of two types: dynamic (ground reaction forces) and static (anthropometric data of the human body derived by means of Kinect). The innovation of this work is the use of a set of signals hitherto unprecedented in the literature. The study was conducted on a group of 31 people (606 gait cycles). Kistler force plates, a Kinect device, and the authors' software were used to measure and process the data. The following anthropometric parameters were used: torso, hip width, length of left thigh, length of right thigh and body height. These signals have been combined at the decision level of the biometric system. Our biometric system involves both a k-nearest neighbour classifier and a majority voting system in the gait recognition process. For users, the False Rejection Rate (FRR) reaches 4.55 % and the False Acceptance Rate (FAR) equals 0.85 %. In the case of impostors it has been possible to reject 26 cases previously classified by 5NN. The presented biometric system fills gaps in the existing studies and confirms the superiority of fusion-based systems over typical methods of human gait recognition.

Marcin Derlatka, Mariusz Bogdan
Web Search Results Clustering Using Frequent Termset Mining

We present a novel method for clustering web search results based on frequent termset mining. First, we acquire the senses of a query by means of a word sense induction method that identifies meanings as trees of closed frequent termsets. Then we cluster the search results based on their lexical and semantic intersection with the induced senses. We show that our approach is better than or comparable with state-of-the-art classical search result clustering methods in terms of both clustering quality and degree of diversification.

Marek Kozlowski
Effective Imbalanced Classification of Breast Thermogram Features

Breast cancer is the most commonly occurring form of cancer in women, and can be diagnosed using various imaging modalities including thermography. In this paper, we present an approach to analysing breast thermograms based on statistical image features and an effective ensemble method for imbalanced classification problems. We extract a series of features from the images to arrive at indications of asymmetry between left and right breast regions. These then form the input to a classification stage, for which we develop a dedicated multiple classifier system that employs neural networks or support vector machines as base classifiers, trains base classifiers on balanced subsets of the training data to address the class imbalance that is typically inherent in medical decision making problems, and fuses the decisions using a neural network combined with a fuzzy diversity measure to remove individual classifiers from the ensemble and to enhance prediction performance. Experimental results on a large dataset of about 150 breast thermograms confirm that our approach provides excellent classification performance and outperforms other classifier ensembles designed for imbalanced datasets.

Bartosz Krawczyk, Gerald Schaefer
Rician Noise Removal Approach for Brain MR Images Using Kernel Principal Component Analysis

It has been observed that the noise accumulated in medical images, due to various causes during the acquisition process, is Rician in nature. A Rician noise removal method for brain Magnetic Resonance (MR) images using Kernel Principal Component Analysis (KPCA) is proposed in this paper. The proposed approach is non-parametric in nature. It explores the image space for non-local similar patches and clusters them accordingly. The basis vectors are then learned using KPCA for each cluster, which makes the proposed method data adaptive. The approach has been applied to 2D phantom brain MR images, and the experimental results are comparable to other state-of-the-art methods in terms of various quantitative measures.
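The per-cluster KPCA step can be sketched with scikit-learn as below; this is a generic kernel PCA reconstruction of patches, with an arbitrary kernel width, and does not reproduce the paper's non-local patch clustering or Rician-specific handling:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def kpca_denoise_patches(noisy_patches, n_components=16):
    """Project patches onto a KPCA basis and map back to image space.
    fit_inverse_transform learns the pre-image map needed for reconstruction."""
    kpca = KernelPCA(n_components=n_components, kernel="rbf",
                     fit_inverse_transform=True, gamma=1e-3)  # gamma is illustrative
    codes = kpca.fit_transform(noisy_patches)
    return kpca.inverse_transform(codes)

# example: one cluster of similar 8x8 patches, flattened to rows
patches = np.random.rand(200, 64)
denoised = kpca_denoise_patches(patches)
```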

Ashish Phophalia, Suman K. Mitra
Climate Network Based Index Discovery for Prediction of Indian Monsoon

Identification of climatic indices is vital because of their ability to characterize different climatic events. We focus on the discovery of climatic indices important for the Indian summer monsoon from the climatic parameters surface pressure and zonal wind velocity, using a climate network based community detection approach. The new indices show better correlation with the monsoon than existing indices. Regression and non-linear models are designed using the newly discovered climatic indices for prediction of the Indian summer monsoon, and show superior accuracy to existing state-of-the-art models.

Moumita Saha, Pabitra Mitra
Using Patterns in Computer Go

Building good heuristics for a computer Go program is difficult: the game tree is highly branched and there is a risk that the heuristics would eliminate strong moves. Human players often use patterns to decide where to put stones. Therefore, one idea is to develop heuristics based on a database of “good” moves denoted by patterns. A pattern is a small segment of the board; each point of a pattern can be vacant, occupied by a black or white stone, or an off-board point. A potential move is executed in the center of the pattern. Patterns can be acquired from a human expert or through machine learning. This paper presents a technique for: (1) retrieving patterns from a collection of records of games played between human expert players, (2) storing patterns, and (3) implementing patterns in a computer program for Go.

Leszek Stanisław Śliwa
Event Detection from Business News

An event is usually defined as a specific happening associated with a particular location and time. Though there has been a lot of focus on detecting events from political and other general news articles, there has not been much work on detecting business-critical events from business news. The major difference of business events from other events is that business events are often announcements that may refer to future happenings rather than happenings that have already occurred. In this paper, we propose a method to identify business-critical events within news text and classify them into pre-defined categories using a k-NN method. We also present an event-based retrieval mechanism for business news collections.

Ishan Verma, Lipika Dey, Ramakrishnan S. Srinivasan, Lokendra Singh
Backmatter
Metadata
Title
Pattern Recognition and Machine Intelligence
Edited by
Marzena Kryszkiewicz
Sanghamitra Bandyopadhyay
Henryk Rybinski
Sankar K. Pal
Copyright Year
2015
Electronic ISBN
978-3-319-19941-2
Print ISBN
978-3-319-19940-5
DOI
https://doi.org/10.1007/978-3-319-19941-2
