
2012 | Book

Research and Development in Intelligent Systems XXIX

Incorporating Applications and Innovations in Intelligent Systems XX Proceedings of AI-2012, The Thirty-second SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence

Edited by: Max Bramer, Miltos Petridis

Publisher: Springer London


About this book

The papers in this volume are the refereed papers presented at AI-2012, the Thirty-second SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, held in Cambridge in December 2012 in both the technical and the application streams.

They present new and innovative developments and applications, divided into technical stream sections on Data Mining, Data Mining and Machine Learning, Planning and Optimisation, and Knowledge Management and Prediction, followed by application stream sections on Language and Classification, Recommendation, Practical Applications and Systems, and Data Mining and Machine Learning. The volume also includes the text of short papers presented as posters at the conference.

This is the twenty-ninth volume in the Research and Development in Intelligent Systems series, which also incorporates the twentieth volume in the Applications and Innovations in Intelligent Systems series. These series are essential reading for those who wish to keep up to date with developments in this important field.

Table of contents

Frontmatter

Research and Development in Intelligent Systems XXIX

Biologically inspired speaker verification using Spiking Self-Organising Map

This paper presents a speaker verification system that uses a self-organising map composed of spiking neurons. The architecture of the system is inspired by the biomechanical mechanism of the human auditory system, which converts speech into electrical spikes inside the cochlea. A spike-based rank order coding input feature vector is proposed, designed to be representative of the real biological spike trains found within the human auditory nerve. The Spiking Self-Organising Map (SSOM) updates its winner neuron only when its activity exceeds a specified threshold. The algorithm is evaluated using 50 speakers from the Centre for Spoken Language Understanding (CSLU2002) speaker verification database and shows a speaker verification performance of 90.1%. This compares favorably with a previous non-spiking self-organising map that used a Discrete Fourier Transform (DFT)-based input feature vector with the same dataset.

Tariq Tashan, Tony Allen, Lars Nolle
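As an illustration of the rank order coding mentioned above, the toy function below encodes a feature vector purely by the order in which its components would "fire" (largest first). This is a minimal sketch under our own naming, not the authors' implementation:

```python
def rank_order_code(features):
    """Encode a feature vector by rank order: return the component
    indices sorted by descending magnitude, discarding amplitudes.
    This keeps only the relative "firing order", mimicking spike
    timing on the auditory nerve."""
    return sorted(range(len(features)), key=lambda i: features[i], reverse=True)

# Component 2 is largest, so it "fires" first, then 0, then 1.
print(rank_order_code([0.4, 0.1, 0.9]))  # [2, 0, 1]
```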

DATA MINING

Parallel Random Prism: A Computationally Efficient Ensemble Learner for Classification

Generally classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier’s classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers like any ensemble learner from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest sized datasets may impose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.

Frederic Stahl, David May, Max Bramer
Questionnaire Free Text Summarisation Using Hierarchical Classification

This paper presents an investigation into the summarisation of the free text element of questionnaire data using hierarchical text classification. The process makes the assumption that text summarisation can be achieved using a classification approach whereby several class labels can be associated with documents which then constitute the summarisation. A hierarchical classification approach is suggested which offers the advantage that different levels of classification can be used and the summarisation customised according to which branch of the tree the current document is located. The approach is evaluated using free text from questionnaires used in the SAVSNET (Small Animal Veterinary Surveillance Network) project. The results demonstrate the viability of using hierarchical classification to generate free text summaries.

Matias Garcia-Constantino, Frans Coenen, P-J Noble, Alan Radford
Mining Interesting Correlated Contrast Sets

Contrast set mining has been developed as a data mining task which aims at discerning differences across groups. These groups can be patients, organizations, molecules, and even time-lines. A valid correlated contrast set is a conjunction of attribute-value pairs that are highly correlated with each other and differ significantly in their distribution across groups. Although the search for valid correlated contrast sets produces a comparatively smaller set of results than the search for valid contrast sets, these results must still be further filtered in order to be examined by a domain expert and have decisions enacted from them. In this paper, we apply the minimum support ratio threshold which measures the ratio of maximum to minimum support across groups. We propose a contrast set mining technique which utilizes the minimum support ratio threshold to discover maximal valid correlated contrast sets. We also demonstrate how four probability-based objective measures developed for association rules can be used to rank contrast sets. Our experiments on real datasets demonstrate the efficiency and effectiveness of our approach.

Mondelle Simeon, Robert J. Hilderman, Howard J. Hamilton
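The minimum support ratio threshold described above admits a very small sketch; the function names below are illustrative and not taken from the paper:

```python
def support_ratio(supports):
    """Ratio of maximum to minimum support of a contrast set across
    groups; infinite when the set never occurs in some group."""
    lo = min(supports)
    return float('inf') if lo == 0 else max(supports) / lo

def passes_min_support_ratio(supports, min_ratio):
    """Keep a contrast set only if its distribution across groups
    differs by at least the given ratio."""
    return support_ratio(supports) >= min_ratio
```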

DATA MINING AND MACHINE LEARNING

eRules: A Modular Adaptive Classification Rule Learning Algorithm for Data Streams

Advances in hardware and software in the past decade have made it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances, in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data from continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time as soon as it is captured, for example if the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drift. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm: an algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.

Frederic Stahl, Mohamed Medhat Gaber, Manuel Martin Salvador
A Geometric Moving Average Martingale method for detecting changes in data streams

In this paper, we propose a Geometric Moving Average Martingale (GMAM) method for detecting changes in data streams. Two components underpin the GMAM method. The first is the exponential weighting of observations, which has the capability of reducing false detections. The second is the use of the GMAM value for hypothesis testing. When a new data point is observed, the hypothesis test decides, based on the GMAM value, whether a change has occurred at that point. Once a change is detected, all variables of the GMAM algorithm are re-initialized in order to find further changes. The experiments show that the GMAM method is effective in detecting concept changes in two synthetic time-varying data streams and a real-world dataset, the 'Respiration' dataset.

X. Z. Kong, Y. X. Bi, D. H. Glass
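A bare-bones sketch of exponentially weighted change detection with re-initialisation after each detection is given below; the weighting factor, threshold and deviation test are simplified stand-ins for the paper's martingale machinery, and the parameter names are ours:

```python
def detect_changes(stream, lam=0.2, threshold=2.0):
    """Flag points that deviate strongly from a geometric (exponentially
    weighted) moving average; all state is re-initialised after each
    detection so that further changes can be found."""
    changes = []
    ewma = None
    for i, x in enumerate(stream):
        if ewma is None:        # (re-)initialise on the first point
            ewma = x
            continue
        ewma = lam * x + (1 - lam) * ewma
        if abs(x - ewma) > threshold:
            changes.append(i)
            ewma = None         # restart after a detection
    return changes

# A level shift after 20 quiet observations is flagged exactly once.
print(detect_changes([0.0] * 20 + [10.0, 10.0, 10.0]))  # [20]
```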
Using Chunks to Categorise Chess Positions

Expert computer performances in domains such as chess are achieved by techniques different from those used by human experts. The match between Garry Kasparov and Deep Blue showed that human expertise is able to balance an eight-order-of-magnitude difference in computational speed. Theories of human expertise, in particular the chunking and template theories, provide detailed computational models of human long-term memory and of how it is acquired and retrieved. We extend an implementation of the template theory, CHREST, to support the learning and retrieval of categorisations of chess positions. Our extended model provides equivalent performance to a support-vector machine in categorising chess positions by opening, and reveals how learning for retrieval relates to learning for content.

Peter C. R. Lane, Fernand Gobet

PLANNING AND OPTIMISATION

S-Theta: low steering path-planning algorithm

The path-planning problem for autonomous mobile robots has been addressed by classical search techniques such as A* or, more recently, Theta*. However, research usually focuses on reducing the length of the path or the processing time. Applying these advances to autonomous robots may result in the obtained "short" routes being less suitable for the robot locomotion subsystem. That is, in some types of exploration robot the heading changes can be very costly (i.e. consume a lot of battery), and it may therefore be beneficial to slightly increase the length of the path in order to decrease the number of turns (and thus reduce the battery consumption). In this paper we present a path-planning algorithm called S-Theta* that smoothes the turns of the path. This algorithm significantly reduces the heading changes in both indoor and outdoor problems, as the results show, making it especially suitable for robots whose ability to turn is limited or for which the associated cost is high.

Pablo Muñoz, María D. R-Moreno
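The quantity that S-Theta* trades path length against, the total heading change along a route, can be computed as follows (an illustrative helper of our own, not part of the published algorithm):

```python
import math

def total_heading_change(path):
    """Sum of absolute heading changes (radians) along a 2-D path
    given as a list of (x, y) waypoints."""
    total, prev = 0.0, None
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        heading = math.atan2(y1 - y0, x1 - x0)
        if prev is not None:
            d = abs(heading - prev)
            total += min(d, 2 * math.pi - d)  # wrap angle into [0, pi]
        prev = heading
    return total

# A straight line needs no turning at all.
print(total_heading_change([(0, 0), (1, 0), (2, 0)]))  # 0.0
```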
Comprehensive Parent Selection-based Genetic Algorithm

During the past few years, many variations of the genetic algorithm (GA) have been proposed. These algorithms have been successfully used to solve problems in different disciplines such as engineering, business, science, and networking. Real-world optimization problems are divided into two categories: (1) single-objective, and (2) multi-objective. Genetic algorithms have key advantages over other optimization techniques in dealing with multi-objective optimization problems. One of the most popular GA techniques for obtaining the Pareto-optimal set of solutions for multi-objective problems is the non-dominated sorting genetic algorithm II (NSGA-II). In this paper, we propose a variant of NSGA-II that we call the comprehensive parent selection-based genetic algorithm (CPSGA). The proposed strategy uses the information of all the individuals to generate new offspring from the selected parents. This strategy ensures diversity and discourages premature convergence. CPSGA is tested using the standard ZDT benchmark problems and performance metrics taken from the literature, and the results produced are compared with the original NSGA-II algorithm. The results show that the proposed approach is a viable alternative for solving multi-objective optimization problems.

Hamid Ali, Farrukh Aslam Khan
Run-Time Analysis of Classical Path-Planning Algorithms

Run-time analysis is an empirical tool that studies the time consumed by running an algorithm. This type of analysis has been used successfully in some Artificial Intelligence (AI) fields, in particular in metaheuristics. This paper is an attempt to bring this tool to the path-planning community. In particular, we analyse the statistical properties of the run-time of the A*, Theta* and S-Theta* algorithms on a variety of problems of different degrees of complexity. Traditionally, the path-planning literature has compared run-times simply by comparing their mean values. This practice, which unfortunately is quite common in the literature, raises serious concerns from a methodological and statistical point of view: simple mean comparison provides poorly supported conclusions, and in general it should be avoided.

After our analysis, we conclude that the time required by these three algorithms follows a lognormal distribution. In low-complexity problems, the lognormal distribution loses some accuracy in describing the algorithm run-times. The lognormality of the run-times opens up the use of powerful parametric statistics to compare execution times, which could lead to stronger empirical methods.

Pablo Muñoz, David F. Barrero, María D. R-Moreno
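The lognormality finding above is easy to exploit in practice: if run-times T are lognormal then log T is normal, so the distribution can be fitted from ordinary moments of the log-times. A minimal stdlib-only sketch, with names of our own choosing:

```python
import math
import statistics

def fit_lognormal(runtimes):
    """Maximum-likelihood estimates (mu, sigma) of a lognormal fitted
    to positive run-times: simply the mean and population standard
    deviation of the log run-times."""
    logs = [math.log(t) for t in runtimes]
    return statistics.fmean(logs), statistics.pstdev(logs)
```

Once fitted, standard parametric tests on the log run-times (e.g. a t-test) can replace the bare mean comparison the paper argues against.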

KNOWLEDGE MANAGEMENT AND PREDICTION

A Generic Platform for Ontological Query Answering

The paper presents ALASKA, a multi-layered platform for performing ontological conjunctive query answering (OCQA) over heterogeneously stored knowledge bases in a generic, logic-based manner. While this problem enjoys renewed interest in knowledge-based systems, and the semantic equivalence of different languages is widely studied, from a practical viewpoint this equivalence has not been made explicit. Moreover, the emergence of graph databases provides competitive storage methods not yet addressed by the existing literature.

Bruno Paiva Lima da Silva, Jean-François Baget, Madalina Croitoru
Multi-Agent Knowledge Allocation

Classical query answering either assumes the existence of just one knowledge requester, or treats knowledge requests from distinct parties independently. Yet this assumption is inappropriate in practical applications where requesters are in direct competition for knowledge. We provide a formal model for such scenarios by proposing the Multi-Agent Knowledge Allocation (MAKA) setting, which combines the fields of query answering in information systems and multi-agent resource allocation. We define a bidding language based on exclusivity-annotated conjunctive queries and succinctly translate the allocation problem into a graph structure which allows network-flow-based constraint solving techniques to be employed for optimal allocation.

Sebastian Rudolph, Madalina Croitoru
A Hybrid Model for Business Process Event Prediction

Process event prediction is the prediction of various properties of the remaining path of a process sequence or workflow. The prediction is based on data extracted from a combination of historical (closed) and/or live (open) workflows (jobs or process instances). In real-world applications, the problem is compounded by the fact that the number of unique workflows (process prototypes) can be enormous, their occurrences can be limited, and a real process may deviate from the designed process when executed in a real environment and under realistic constraints. An efficient predictor must be able to cope with the diverse characteristics of the data. We also have to ensure that useful process data is collected to build the appropriate predictive model. In this paper we propose an extension of Markov models for predicting the next step in a process instance. We have shown, via a set of experiments, that our model offers better results than methods based on random guessing, Markov models and Hidden Markov models. The data for our experiments comes from a real live process in a major telecommunications company.

Mai Le, Bogdan Gabrys, Detlef Nauck
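A first-order Markov predictor of the kind the paper extends can be sketched in a few lines; the workflow event names below are invented for illustration:

```python
from collections import Counter, defaultdict

def train_markov(sequences):
    """Count transitions between consecutive events in historical
    workflows, giving a first-order Markov model."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            transitions[a][b] += 1
    return transitions

def predict_next(transitions, current):
    """Predict the most frequent successor of the current step, or
    None if the step was never observed in training."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]

model = train_markov([["open", "check", "close"],
                      ["open", "check", "escalate"],
                      ["open", "check", "close"]])
print(predict_next(model, "check"))  # close
```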

SHORT PAPERS

A Comparison of Machine Learning Techniques for Recommending Search Experiences in Social Search

In this paper we focus on one particular implementation of social search, namely HeyStaks, which combines ideas from web search, content curation, and social networking to make recommendations to users, at search time, based on topics that matter to them. The central concept in HeyStaks is the search stak. Users can create and share staks as a way to curate their search experiences. A key problem for HeyStaks is the need for users to pre-select their active stak at search time, to provide a context for their current search experience so that HeyStaks can index and store what they find. The focus of this paper is to look at how machine learning techniques can be used to recommend a suitable active stak to the user at search time automatically.

Zurina Saaya, Markus Schaal, Maurice Coyle, Peter Briggs, Barry Smyth
A Cooperative Multi-objective Optimization Framework based on Dendritic Cells Migration Dynamics

Clonal Selection and Immune Network Theory are commonly adopted for solving optimization problems. Here, the mechanisms of migration and maturation of Dendritic Cells (DCs) are adopted for pursuing Pareto-optimal solution(s) in complex problems; specifically, the adoption of multiple characters of distinct clones of DCs and the immunological control parameters in the process of signal cascading. This unconventional approach, namely the DC-mediated Signal Cascading Framework, further exploits the intrinsic abilities of DCs, with the added benefit of overcoming some of the limitations of conventional optimization algorithms, such as convergence of the Pareto front.

N. M. Y. Lee, H. Y. K. Lau
R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora

Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, the first such documented study, where text is known to be rich in so-called unnatural language, also introducing a novel corpus of Facebook photo comments. Despite hoping that the flexibility of the character n-gram approach would be well suited to unnatural language phenomena, we find little improvement over baseline algorithms employing the word n-gram language model.

Ben Blamey, Tom Crick, Giles Oatley
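The character n-gram features discussed above are straightforward to extract; a minimal sketch (the word-gram baseline would tokenise on whitespace instead):

```python
def char_ngrams(text, n=3):
    """Overlapping character n-grams of a string, usable directly as
    classification features (e.g. in a bag-of-n-grams model)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Trigrams cross word boundaries, capturing emoticons and misspellings.
print(char_ngrams("r u :-)", 3))
```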
Predicting Multi-class Customer Profiles Based on Transactions: a Case Study in Food Sales

Predicting the class of customer profiles is a key task in marketing, which enables businesses to approach customers in the right way to satisfy their evolving needs. However, due to costs, privacy and/or data protection, only the business's own transactional data is typically available for constructing customer profiles. We present a new approach designed to efficiently and accurately handle the multi-class classification of customer profiles built using sparse and skewed transactional data. Our approach first bins the customer profiles on the basis of the number of items transacted. The discovered bins are then partitioned, and prototypes within each bin are selected to build the multi-class classifier models. The results obtained from using four multi-class classifiers on real-world transactional data consistently show the critical numbers of items at which the predictive performance of customer profiles can be substantially improved.

Edward Apeh, Indrė Žliobaitė, Mykola Pechenizkiy, Bogdan Gabrys
Controlling Anytime Scheduling of Observation Tasks

This paper describes how multiple independent observation tasks can be scheduled for an autonomous vehicle. Presented with large numbers of tasks of differing reward levels, a vehicle has to evaluate the best schedule to execute given limited time to both plan and act. A meta-management framework acting on top of an anytime scheduler analyses the problem and the progress made in generating solutions to identify when to stop planning and start executing. We compare a probabilistic management technique with active monitoring of the current execution reward and conclude that, in this case, detecting a local maximum in the predicted reward is the most effective policy.

J W Baxter, J Hargreaves, N Hawes, R Stolkin

Applications and Innovations in Intelligent Systems XX

Swing Up and Balance Control of the Acrobot Solved by Genetic Programming

The evolution of controllers using genetic programming is described for the continuous, limited torque minimum time swing-up and inverted balance problems of the acrobot. The best swing-up controller found is able to swing the acrobot up to a position very close to the inverted ‘handstand’ position in a very short time, which is comparable to the results which have been achieved by other methods using similar parameters for the dynamic system. The balance controller is successful at keeping the acrobot in the unstable, inverted position when starting from the inverted position.

Dimitris C. Dracopoulos, Barry D. Nichols

LANGUAGE AND CLASSIFICATION

Biologically inspired Continuous Arabic Speech Recognition

Despite many years of research into speech recognition systems, there are limited research publications available covering Arabic speech recognition. Although statistical techniques have been the most applied techniques for such classification problems, Neural Networks have also recorded successful results in speech recognition. In this research three different biologically inspired Continuous Arabic Speech Recognition neural network system structures are presented. An Arabic phoneme database (APD) of six male speakers was constructed manually from the King Abdulaziz Arabic Phonetics Database (KAPD). The Mel-Frequency Cepstrum Coefficients (MFCCs) algorithm was used to extract the phoneme features from the speech signals of this database. The normalized dataset was used to train and test three different architectures of Multilayer Perceptron (MLP) neural network identification systems.

N. Hmad, T. Allen
A cross-study of Sentiment Classification on Arabic corpora

Sentiment analysis is a research area focused on processing and analyzing the opinions available on the web. Several interesting and advanced works have been performed on English; in contrast, very few have been conducted on Arabic. This paper presents a study investigating supervised sentiment classification in an Arabic context. We use two Arabic corpora which differ in many aspects. We use three common classifiers known for their effectiveness, namely Naïve Bayes, Support Vector Machines and k-Nearest Neighbor. We investigate a number of settings to identify those that achieve the best results, concerning stemming type, term frequency thresholding, term weighting and n-gram words. We show that Naïve Bayes and Support Vector Machines are competitively effective; however, k-Nearest Neighbor's effectiveness depends on the corpus. Through this study, we recommend using light stemming rather than stemming, removing terms that occur only once, combining unigram and bigram words, and using presence-based rather than frequency-based weighting. Our results also show that classification performance may be influenced by document length, document homogeneity and the nature of the document authors. However, the size of the datasets does not have an impact on classification results.

A. Mountassir, H. Benbrahim, I. Berrada
Towards the Profiling of Twitter Users for Topic-Based Filtering

There is no doubting the incredible impact of Twitter on how we communicate, access and share information online. Currently users can follow other users or hashtags in order to benefit from a stream of data from people they trust or on topics that matter to them. However, at the moment the following granularity of Twitter means that users cannot limit their information streams to a set of topics by a given user. Thus, even the most carefully curated information streams can quickly become polluted with extraneous content. In this paper we describe our initial steps to improve this situation by proposing a profiling approach that can be used for information filtering purposes as well as recommendation purposes. First, we demonstrate that it is feasible to automatically profile the interests of users by using machine learning techniques to classify the pages that they share via their tweets. We then go on to describe how this profiling mechanism can be used to organise and filter Twitter information streams. In particular we present a system that provides a more fine-grained way to follow users on specific topics and thereby refine the standard Twitter timeline based on a user's core topical interests.

Sandra Garcia Esparza, Michael P. O’Mahony, Barry Smyth

RECOMMENDATION

Content vs. Tags for Friend Recommendation

Recently, friend recommendation has become an important application in a variety of social networking contexts, whether as part of in-house enterprise networks or as part of public networks like Twitter and Facebook. The value of these social networks is based, in part at least, on connecting the right people. But friend recommendation is challenging, and many systems do little to help users make these valuable connections. In this paper, we build on previous work to consider new strategies for friend recommendation on Twitter. In particular, we compare strategies based on the content of users' tweets, recommending users who tweet about similar things, to strategies based on Twitter-list tags, recommending users who are members of lists on similar topics. We describe a comprehensive evaluation to highlight the different benefits of these complementary strategies. We also discuss the most appropriate ways to evaluate their recommendations.

John Hannon, Kevin McCarthy, Barry Smyth
Collaborative Filtering For Recommendation In Online Social Networks

In the past, recommender systems have relied heavily on the availability of ratings data as the raw material for recommendation. Moreover, popular collaborative filtering approaches generate recommendations by drawing on the interests of users who share similar ratings patterns. This is set to change because of the unbundling of social networks (via open APIs), providing a richer world of recommendation data. For example, we now have access to a richer source of ratings and preference data, across many item types. In addition, we also have access to mature social graphs, which means we can explore different ways of creating recommendations, often based on explicit social links and friendships. In this paper we evaluate a conventional collaborative filtering framework in the context of this richer source of social data and clarify some important new opportunities for improved recommendation performance.

Steven Bourke, Michael P O’Mahony, Rachael Rafter, Kevin McCarthy, Barry Smyth
Unsupervised Topic Extraction for the Reviewer’s Assistant

User-generated reviews are now a familiar and valuable part of most ecommerce sites, since high-quality reviews are known to influence purchasing decisions. In this paper we describe work on the Reviewer's Assistant (RA), which is a recommendation system designed to help users write better reviews. It does this by suggesting relevant topics that they may wish to discuss based on the product they are reviewing and the content of their review so far. We build on prior work and describe an unsupervised topic extraction module for the RA system that enhances the system's ability to automatically adapt to new content categories and application domains. Our main contribution includes the results of a controlled, live-user study showing that the RA system is capable of supporting users to create reviews that enjoy higher quality ratings than Amazon's own high-quality reviews, even without using manually created topic models.

Ruihai Dong, Markus Schaal, Michael P. O’Mahony, Kevin McCarthy, Barry Smyth

PRACTICAL APPLICATIONS AND SYSTEMS

Adapting Bottom-up, Emergent Behaviour for Character-Based AI in Games

It is widely acknowledged that there is a demand for alternatives to handcrafted character behaviour in interactive entertainment/video games. This paper investigates a simple agent architecture inspired by the thought experiment “Vehicles: Experiments in Synthetic Psychology” by the cyberneticist and neuroscientist Valentino Braitenberg [1]. It also shows how architectures based on the core principles of bottom-up, sensory driven behaviour controllers can demonstrate emergent behaviour and increase the believability of virtual agents, in particular for application in games.

Micah Rosenkind, Graham Winstanley, Andrew Blake
Improving WRF-ARW Wind Speed Predictions using Genetic Programming

Numerical weather prediction models can produce wind speed forecasts at a very high spatial resolution. However, running these models at that level of precision is time and resource consuming. In this paper, the integration of the Weather Research and Forecasting – Advanced Research WRF (WRF-ARW) mesoscale model with four different downscaling approaches is presented. Three of the proposed methods are mathematically based approaches that need a predefined model to be applied. The fourth approach, based on genetic programming (GP), implicitly finds the optimal model to downscale WRF forecasts, so no previous assumptions about the model need to be made. WRF-ARW forecasts and observations at three different sites in the state of Illinois in the USA are analysed before and after applying the downscaling techniques. Results show that GP is able to successfully downscale the wind speed predictions, significantly reducing the inherent error of the numerical models.

Giovanna Martinez-Arellano, Lars Nolle, John Bland
Optimizing Opening Strategies in a Real-time Strategy Game by a Multi-objective Genetic Algorithm

This paper presents the modeling, forward simulation, and optimization of different opening strategies in the real-time strategy game Starcraft 2. We implemented an event-driven simulator in C# with a graphical user interface. In order to find optimal build orders, we employ a modified version of the multi-objective genetic algorithm NSGA-II. Procedural constraints, e.g. those given by the tech-tree or other game mechanisms, are implicitly encoded into the chromosomes. Additionally, the size of the active part of the chromosomes is not known a priori, and the objective values have a small diversity. The model was tested on different tech-pushes and rushes, and validated against empirical data from expert Starcraft 2 players.

Björn Gmeiner, Gerald Donnert, Harald Köstler
Managing Uncertainties in the Field of Planning and Budgeting – An Interactive Fuzzy Approach

Despite all efforts in recent decades, Planning and Budgeting (P&B) is still a challenging task. The corresponding processes for large organizations are usually very resource-intensive, time-consuming and costly. One of the major issues is managing uncertainty. Apart from that, P&B methods have to fulfill other requirements: they have to be consistent with human thinking and should allow realistic modeling. In this paper we analyze whether interactive fuzzy approaches solve these issues and whether they could be a new way to fulfill the above-mentioned requirements efficiently. For that purpose, prior research related to P&B problems as well as practical issues and challenges is outlined. Based on a case study, fuzzy interactive solutions are analyzed. Possible benefits and issues of the applied approach are discussed, and ideas about ongoing and further research are given.

Peter Rausch, Heinrich J. Rommelfanger, Michael Stumpf, Birgit Jehle

DATA MINING AND MACHINE LEARNING

Identification of Correlations Between 3D Surfaces Using Data Mining Techniques: Predicting Springback in Sheet Metal Forming

A classification framework for identifying correlations between 3D surfaces in the context of sheet metal forming, especially Asymmetric Incremental Sheet Forming (AISF), is described. The objective is to predict “springback”, the deformation that results as a consequence of applying a sheet metal forming process. Central to the framework are two proposed mechanisms for representing the geometry of 3D surfaces that are compatible with the concept of classification. The first is founded on the concept of a Local Geometry Matrix (LGM), which concisely describes the geometry surrounding a location on a 3D surface. The second is founded on the concept of a Local Distance Measure (LDM), derived from the observation that springback is greater at locations that are away from edges and corners. The representations have been built into a classification framework directed at the prediction of springback values. The proposed framework and representations have been evaluated using two surfaces, a small and a large flat-topped pyramid, and by considering a variety of classification mechanisms and parameter settings.

Subhieh El-Salhi, Frans Coenen, Clare Dixon, Muhammad Sulaiman Khan
Towards The Collection of Census Data From Satellite Imagery Using Data Mining: A Study With Respect to the Ethiopian Hinterland

The collection of census data is an important task with respect to providing support for decision makers. However, it is also resource-intensive, especially in areas with poor communication and transport networks. In this paper a method is proposed for collecting census data by applying classification techniques to relevant satellite imagery. The test site for the work is a collection of villages lying some 300 km to the northwest of Addis Ababa in Ethiopia. The idea is to build a classifier that can label households according to “family” size. To this end, training data has been obtained by collecting on-the-ground census data and aligning it with satellite data. The fundamental idea is to segment satellite images so as to obtain the satellite pixels describing individual households, and to represent these segmentations as histograms. By pairing each histogram-represented household with the collated census data, namely family size, a classifier can be constructed to predict household sizes according to the nature of the histograms. This classifier can then be used to provide a quick and easy mechanism for the approximate collection of census data that does not require significant resources.
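A minimal sketch of the histogram-and-classify idea, assuming grayscale pixel intensities and substituting a simple nearest-neighbour rule for the paper's classifier; the toy data and family sizes below are invented for illustration only:

```python
import numpy as np

def segment_histogram(pixels, bins=8):
    """Represent a household's segmented satellite pixels as a normalised
    intensity histogram (the per-segment feature described above)."""
    hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    return hist / hist.sum()

def predict_family_size(train_hists, train_sizes, query_hist):
    """1-nearest-neighbour stand-in for the classifier: label the query
    segment with the family size of the most similar training histogram."""
    dists = [np.abs(h - query_hist).sum() for h in train_hists]
    return train_sizes[int(np.argmin(dists))]

# Toy data: darker segments ~ small households, brighter ~ large (illustrative only).
rng = np.random.default_rng(0)
train = [segment_histogram(rng.integers(0, 100, 50)),
         segment_histogram(rng.integers(150, 256, 50))]
sizes = [3, 7]
query = segment_histogram(rng.integers(0, 100, 50))
print(predict_family_size(train, sizes, query))  # 3
```

Normalising the histograms makes segments of different pixel counts comparable, which matters when household footprints vary in size.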

Kwankamon Dittakan, Frans Coenen, Rob Christley
Challenges in Applying Machine Learning to Media Monitoring

The Gorkana Group provides high-quality media monitoring services to its clients. This paper describes an ongoing project aimed at increasing the amount of automation in Gorkana Group’s workflow through the application of machine learning and language processing technologies. It is important that Gorkana Group’s clients have a very high level of confidence that, if a published article is relevant to one of their briefs, they will be shown that article. However, delivering this high-quality media monitoring service means that humans have to read through very large quantities of data, only a small portion of which is typically deemed relevant. The challenge addressed by the work reported in this paper is how to efficiently achieve such high-quality media monitoring in the face of huge increases in the amount of data that needs to be monitored. This paper discusses some of the findings that have emerged during the early stages of the project. We show that, while machine learning can be applied successfully to this real-world business problem, the distinctive constraints of the task give rise to a number of interesting challenges.

Matti Lyra, Daoud Clarke, Hamish Morgan, Jeremy Reffin, David Weir

SHORT PAPERS

Comparative Study of One-Class Classifiers for Item-based Filtering

In this paper, we address the recommendation process as a one-class classification problem. One-class classification is an umbrella term that covers a specific subset of learning problems which try to induce a general function that can discriminate between two classes of interest, given the constraint that training patterns are available only from one class. Usually, users provide ratings only for items that they are interested in and that belong to their preferences, without giving information about items that they dislike. The problem in one-class classification is to build a description of a target set of items and to detect which items are similar to this training set. We conduct a comparative study of one-class classifiers from density, boundary and reconstruction methods. The experimental results show that one-class classifiers not only cope with the problem of missing negative examples but also succeed in performing efficiently in the recommendation process.
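A minimal sketch of a density-method one-class classifier of the kind surveyed here: a per-feature Gaussian is fitted to liked items only, and a new item is accepted when it lies close to that target description. The item features and the threshold are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fit_gaussian_density(X):
    """Density-method one-class model: fit a Gaussian to the target
    (liked) items only -- no negative examples are needed."""
    return X.mean(axis=0), X.std(axis=0) + 1e-9

def is_target(x, mean, std, threshold=3.0):
    """Accept an item as similar to the user's preferences when every
    feature lies within `threshold` standard deviations of the target."""
    z = np.abs((x - mean) / std)
    return bool(np.all(z < threshold))

# Toy item features (e.g. two audio descriptors; purely illustrative):
liked = np.array([[0.90, 0.10], [1.00, 0.20], [0.80, 0.15], [0.95, 0.12]])
mean, std = fit_gaussian_density(liked)
print(is_target(np.array([0.90, 0.14]), mean, std))  # True  -- fits the profile
print(is_target(np.array([0.10, 0.90]), mean, std))  # False -- far from it
```

Boundary methods (e.g. one-class SVMs) and reconstruction methods (e.g. autoencoders) replace the density estimate with a decision boundary or a reconstruction error, but the accept/reject interface is the same.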

Aristomenis S. Lampropoulos, George A. Tsihrintzis
Hybridization of Adaptive Differential Evolution with BFGS

Local search (LS) methods start from a point and use the gradient or objective function value to guide the search. Such methods are good at searching the neighborhood of a given solution (i.e., they are good at exploitation), but they are poor at exploration. Evolutionary Algorithms (EAs) are nature-inspired population-based search optimizers. They are good at exploration, but not as good at exploitation as LS methods. Thus, it makes sense to hybridize EAs with LS techniques to arrive at a method that benefits from both and, as a result, has good search ability. The Broyden–Fletcher–Goldfarb–Shanno (BFGS) method is a gradient-based LS method designed for nonlinear optimization. It is efficient, but expensive. Adaptive Differential Evolution with Optional External Archive (JADE) is an efficient EA. Nonetheless, its performance decreases as the problem dimension increases. In this paper, we present a new hybrid algorithm of JADE and BFGS, called Hybrid of Adaptive Differential Evolution and BFGS, or DEELS, to solve unconstrained continuous optimization problems. The performance of DEELS is compared with that of JADE in terms of the statistics of the function error values.
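The exploration/exploitation hybridisation pattern described above can be sketched as basic differential evolution followed by a BFGS polish of the best individual. This uses SciPy's BFGS and a plain DE/rand/1/bin loop to illustrate the pattern; it is not the authors' DEELS algorithm, and all parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def de_bfgs(f, bounds, pop_size=20, gens=50, seed=0):
    """Sketch of the EA/LS hybrid: differential evolution explores the
    search space, then BFGS refines the best point found."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, (pop_size, len(bounds)))
    fit = np.array([f(x) for x in pop])
    F, CR = 0.5, 0.9                          # DE step size and crossover rate
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(len(bounds)) < CR
            trial = np.where(cross, mutant, pop[i])
            ft = f(trial)
            if ft < fit[i]:                   # greedy one-to-one selection
                pop[i], fit[i] = trial, ft
    best = pop[int(np.argmin(fit))]
    return minimize(f, best, method="BFGS")   # local exploitation step

sphere = lambda x: float(np.sum(x ** 2))
res = de_bfgs(sphere, [(-5, 5)] * 3)
print(res.fun < 1e-8)  # True
```

Running BFGS only once, from the DE champion, keeps the expensive gradient-based step from dominating the overall cost.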

R. A. Khanum, M. A. Jan
Metadata
Title
Research and Development in Intelligent Systems XXIX
Edited by
Max Bramer
Miltos Petridis
Copyright year
2012
Publisher
Springer London
Electronic ISBN
978-1-4471-4739-8
Print ISBN
978-1-4471-4738-1
DOI
https://doi.org/10.1007/978-1-4471-4739-8
