
About this book

Data mining deals with finding patterns in data that are, by the user's definition, interesting and valid. It is an interdisciplinary area involving databases, machine learning, pattern recognition, statistics, visualization and others.
Decision support focuses on developing systems to help decision-makers solve problems. Decision support provides a selection of data analysis, simulation, visualization and modeling techniques, and software tools such as decision support systems, group decision support and mediation systems, expert systems, databases and data warehouses.

Independently, data mining and decision support are well-developed research areas, but until now there has been no systematic attempt to integrate them. Data Mining and Decision Support: Integration and Collaboration, written by leading researchers in the field, presents a conceptual framework, plus the methods and tools for integrating the two disciplines and for applying this technology to business problems in a collaborative setting.



Basic Technologies


Chapter 1. Data Mining

This chapter gives an informal introduction to data mining, an area that grew into a recognizable scientific and engineering discipline through the nineties. This development is due to advances in data analysis research, growth in the database industry and the resulting market need for methods capable of extracting value from large data stores. In this chapter, data mining is presented from historical, application and scientific perspectives. The chapter describes selected data mining methods that proved useful in the applications described in this book.
Nada Lavrač, Marko Grobelnik

Chapter 2. Text and Web Mining

This chapter describes text and Web mining, illustrating the potential of the methods by giving examples of several applications, identified through interaction with end users. We provide short descriptions and references for the selected text and web mining methods that were shown to be useful for the problems we have addressed.
Dunja Mladenić, Marko Grobelnik

Chapter 3. Decision Support

This chapter describes and clarifies the meaning of the term decision support. Taking a broad view, a classification of decision support and related disciplines is presented. Decision support is put in the context of decision making, and an overview of some of the most important disciplines within decision support is provided including: operations research, decision analysis, decision support systems, data warehousing, and group decision support. Among these, the chapter focuses on the multi-attribute modeling methodology, presenting its basic concepts, introducing the modeling tools DEX and DEXi, and describing main model development phases.
Marko Bohanec

Chapter 4. Integration of Data Mining and Decision Support

This chapter discusses some integration aspects of two separate research areas: data mining and decision support. It investigates how data mining can be used to enhance decision support, and how to use data mining for making better decisions. It also highlights the standardization efforts and other recent trends in the integration of the two approaches.
Nada Lavrač, Marko Bohanec

Chapter 5. Collaboration in a Data Mining Virtual Organization

Both data mining and decision support are branches of applied problem solving. Neither field is simply about technology; both are processes that require highly skilled humans. As with any knowledge-intensive enterprise, collaboration, be it local or remote, offers the potential of improved results by harnessing dispersed expertise and enabling knowledge sharing and learning. This was precisely the objective of the SolEuNet Project: to solve problems utilizing teams of geographically dispersed experts. Unfortunately, organizations find that realizing the potential of remote e-collaboration is not an easy process. To assist in understanding the difficulties of e-collaborative enterprises, a model of the e-collaboration space is reviewed. The SolEuNet Remote Data Mining Virtual Organization and its implemented methodology, a key factor for success, are analyzed with respect to the e-collaboration space model. Case studies of three instances of using the Remote Data Mining Virtual Organization are presented.
Steve Moyle, Jane McKenzie, Alípio Jorge

Chapter 6. Data Mining Processes and Collaboration Principles

Data mining is a process involving the application of human skill as well as technology, and as such it can be supported by clearly defined processes and procedures. This chapter presents the CRISP-DM process, a well-developed standard data mining process, which contains clearly defined phases with clearly defined steps and deliverables. The nature of some of the CRISP-DM phases is such that it is possible to perform them in an e-collaboration setting. The principles for extending the CRISP-DM process to support collaborative data mining are described in the RAMSYS approach to data mining. The tools, systems, and evaluation procedures that are required for the RAMSYS approach to reach its potential are described.
Alípio Jorge, Steve Moyle, Hendrik Blockeel, Angi Voß

Integration Aspects of Data Mining and Decision Support


Chapter 7. Decision Support for Data Mining

An introduction to ROC analysis and its applications
In this chapter we give an introduction to ROC (‘receiver operating characteristics’) analysis and its applications to data mining. We argue that ROC analysis provides decision support for data mining in several ways. For model selection, ROC analysis establishes a method to determine the optimal model once the operating characteristics for the model deployment context are known. We also show how ROC analysis can aid in constructing and refining models in the modeling stage.
Peter Flach, Hendrik Blockeel, Cèsar Ferri, José Hernández-Orallo, Jan Struyf
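
The model selection idea in the Chapter 7 abstract can be illustrated with a small sketch. It is not taken from the chapter: the model names, the ROC points, and the function names `expected_cost` and `select_model` are hypothetical, and the expected-cost criterion is one common way to express "operating characteristics" (class distribution plus misclassification costs).

```python
from typing import Dict, Tuple

# Hypothetical models, each summarized by one ROC point:
# (false positive rate, true positive rate).
ROC_POINTS: Dict[str, Tuple[float, float]] = {
    "conservative": (0.05, 0.60),
    "balanced": (0.20, 0.85),
    "liberal": (0.50, 0.98),
}


def expected_cost(fpr: float, tpr: float, pos_rate: float,
                  cost_fp: float, cost_fn: float) -> float:
    """Expected misclassification cost per example at one ROC point."""
    missed_positives = pos_rate * (1.0 - tpr) * cost_fn
    false_alarms = (1.0 - pos_rate) * fpr * cost_fp
    return missed_positives + false_alarms


def select_model(points: Dict[str, Tuple[float, float]], pos_rate: float,
                 cost_fp: float, cost_fn: float) -> str:
    """Return the name of the model with the lowest expected cost."""
    return min(
        points,
        key=lambda name: expected_cost(*points[name], pos_rate, cost_fp, cost_fn),
    )


if __name__ == "__main__":
    # Balanced classes, equal costs.
    print(select_model(ROC_POINTS, pos_rate=0.5, cost_fp=1.0, cost_fn=1.0))
    # Missing a positive is ten times worse than a false alarm.
    print(select_model(ROC_POINTS, pos_rate=0.5, cost_fp=1.0, cost_fn=10.0))
```

Under these assumed numbers, changing the deployment context changes which model is preferred, which is the point the abstract makes about knowing the operating characteristics first.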

Chapter 8. Data Mining for Decision Support

Supporting marketing decisions through subgroup discovery
This chapter presents two methods that combine data mining and decision support techniques with the aim of generating actionable knowledge. Both methods follow the same methodology, in which data mining is used to support decision-making. The methodology consists of the following phases: business understanding; data acquisition, data understanding and preprocessing; data mining through subgroup discovery; subgroup evaluation; and deployment for decision support. The two methods have been applied to support decision-making in marketing.
Bojan Cestnik, Nada Lavrač, Peter Flach, Dragan Gamberger, Mihael Kline
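
To make the subgroup evaluation phase of Chapter 8 concrete, here is a minimal sketch of one widely used subgroup quality measure, weighted relative accuracy (WRAcc). The abstract does not say which measures the chapter uses, so treat the choice of WRAcc, the function name `wracc`, and the customer counts as assumptions made for illustration.

```python
def wracc(n_total: int, n_pos: int, n_covered: int, n_covered_pos: int) -> float:
    """Weighted relative accuracy of a subgroup.

    WRAcc = coverage * (precision within the subgroup - overall positive rate).
    Returns 0.0 for an empty subgroup or an empty dataset.
    """
    if n_total == 0 or n_covered == 0:
        return 0.0
    coverage = n_covered / n_total
    precision = n_covered_pos / n_covered
    base_rate = n_pos / n_total
    return coverage * (precision - base_rate)


if __name__ == "__main__":
    # Hypothetical marketing data: 1000 customers, 200 of them buyers.
    # A candidate subgroup covers 100 customers, 50 of them buyers.
    print(wracc(n_total=1000, n_pos=200, n_covered=100, n_covered_pos=50))
```

A positive value means the subgroup contains proportionally more target customers than the whole population; weighting by coverage keeps very small subgroups from dominating the ranking.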

Chapter 9. Preprocessing for Data Mining and Decision Support

The goal of this chapter is to identify data preprocessing tasks that can benefit from software support, and to describe the basic requirements for a tool that can serve this purpose. These requirements are implemented in the data transformation tool SumatraTT. The design principles and basic functionality of SumatraTT are explained. The chapter concludes with a brief evaluation of the experience gained using SumatraTT in different tasks, and with a summary of plans for its further development.
Olga Štěpánková, Petr Aubrecht, Zdeněk Kouba, Petr Mikšovský

Chapter 10. Data Mining and Decision Support Integration through the Predictive Model Markup Language Standard and Visualization

The emerging standard for the platform- and system-independent representation of data mining models, PMML (Predictive Model Markup Language), is currently supported by a number of knowledge discovery support engines (KDDSE). The primary purpose of the PMML standard is to separate model generation from model storage in order to enable users to view, post-process, and utilize data mining models independently of the KDDSE that generated the model. In this chapter, an architectural framework for collaborative data mining and decision support that utilizes PMML is described. Important parts of such a general framework are visualization and evaluation methods for data mining models. Two such systems, called VizWiz and PEAR, are described in some detail.
Dietrich Wettschereck, Alípio Jorge, Steve Moyle
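
The separation that Chapter 10 describes, where a model is consumed independently of the engine that produced it, can be sketched with Python's standard XML parser. The PMML fragment below is a deliberately incomplete toy (it is not schema-valid and omits the tree itself); the field names, the model name, and the helper `summarize_pmml` are hypothetical, and the 4.4 namespace is an assumption about which PMML version is in use.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Toy PMML-like document; a real export would contain a full model body.
PMML_TEXT = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="toy_tree" functionName="classification"/>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def summarize_pmml(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """List the data fields and the model elements found in a PMML string."""
    root = ET.fromstring(text)
    fields = [
        (field.get("name"), field.get("optype"))
        for field in root.findall("pmml:DataDictionary/pmml:DataField", NS)
    ]
    # Top-level elements whose local tag name ends in "Model", e.g. TreeModel.
    models = [
        (child.tag.split("}", 1)[-1], child.get("modelName"))
        for child in root
        if child.tag.endswith("Model")
    ]
    return {"fields": fields, "models": models}


if __name__ == "__main__":
    print(summarize_pmml(PMML_TEXT))
```

A viewer or evaluation tool built this way only needs the PMML file, not the mining engine, which is the architectural property the abstract attributes to the standard.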

Applications of Data Mining and Decision Support


Chapter 11. Analysis of Slovenian Media Space

Media space consists of many different factors competing for the attention of the customer population in a certain environment. A common problem in bigger environments (or countries) is that datasets describing the complete media space are hard or almost impossible to obtain, since the detailed picture is too complex or too expensive to compose. However, this is not the case in smaller environments, where it is easier to collect the data. In this study, access was provided to the data describing the entire media space of a population of 2 million people in Slovenia. Because of the language and the economy, this media space behaves relatively independently of various influences, particularly those from outside the country. The data was collected in 1998 by the Media Research Institute, Mediana, and assembled into a database consisting of 8000 questionnaires. The sampling method and the structure of the questionnaires were designed according to well-established international research standards. In this chapter we present and discuss different types of analyses performed on the data to better understand the media space.
Marko Grobelnik, Maja Škrjanc, Darko Zupanič

Chapter 12. On the Road to Knowledge

Mining 21 years of UK traffic accident reports
In this chapter we describe our experience with mining a large multi-relational database of traffic accident reports. We applied a range of data mining techniques to this dataset, including text mining, clustering of time series, subgroup discovery, multi-relational data mining, and association rule learning. We also describe a collaborative data mining challenge on part of the dataset.
Peter Flach, Hendrik Blockeel, Thomas Gärtner, Marko Grobelnik, Branko Kavšek, Martin Kejkula, Darek Krzywania, Nada Lavrač, Peter Ljubič, Dunja Mladenić, Steve Moyle, Stefan Raeymaekers, Jan Rauch, Simon Rawles, Rita Ribeiro, Gert Sclep, Jan Struyf, Ljupčo Todorovski, Luis Torgo, Dietrich Wettschereck, Shaomin Wu

Chapter 13. Analysis of a Database of Research Projects Using Text Mining and Link Analysis

This chapter describes an application of text mining and link analysis to the database of research and development projects funded within the information technology European program in the years 2000–2005. The main items in the research project database were a textual description of each project and the list of organizations participating in the project. The goal was to gain informative insights into the research project database, which would enable a better understanding of past dynamics and provide grounds for better planning of future research programs. In the analysis we used three types of analytic methods: text mining, link analysis, and several visualization techniques. The main emphasis was on the analysis of various aspects of research collaboration between different objects (such as institutions, countries, and research areas).
Marko Grobelnik, Dunja Mladenić

Chapter 14. Web Site Access Analysis for a National Statistical Agency

Web access log analysis is gaining popularity, especially with the growing number of commercial web sites selling their products. The driver for this increase in interest is the promise of gaining some insights into the behaviour of users/customers when browsing through a Web site, fuelled by the desire to improve the user experience. In this chapter we describe the approach taken in analysing web access logs of a non-commercial Web site disseminating Portuguese statistical data. In developing the approach, we follow the common steps for data mining applications (the CRISP-DM phases), and give details about several phases involved in developing the data mining solution. Through intensive communication with the web site owner, we identified three data mining problems which were successfully addressed using different tools and methods. The solution methodology is briefly described here, accompanied by some of the results for illustrative purposes. We conclude with an attempt to generalize our experience and provide a number of lessons learned.
Alípio Jorge, Mário A. Alves, Marko Grobelnik, Dunja Mladenić, Johann Petrak

Chapter 15. Five Decision Support Applications

This chapter presents five real-life applications of decision support methods and techniques, conducted within the SolEuNet Project. The problem areas were: (1) the selection of banks for the Slovenian National Housing Program, (2) housing loan allocation, (3) diabetic foot risk assessment, (4) model development for the selection of information technology providers, and (5) the evaluation of research project proposals. The approach was based primarily on qualitative multi-attribute modeling, and was combined with databases and other general modeling methods. The bank selection case is presented in detail whereas the remaining four applications are presented quantitatively.
Marko Bohanec, Vladislav Rajkovič, Bojan Cestnik

Chapter 16. Large and Tall Buildings

A case study in the application of decision support and data mining
Large and tall buildings can be broadly classified into three groups: sprawling, squat, or tall. The decision to build a particular type of large building can be based on a vast number of attributes. A building construction expert’s analysis of seventy international building projects was used as input to further decision support and data mining analyses. Decision models were developed that incorporated customer values of proposed construction project attributes. Data mining was used to model the feasibility of construction projects from their input attributes. On this basis, we propose a novel way of integrating data mining and decision support methods, in which both techniques operate on the same input vectors. While decision support models are designed to assess and possibly maximize utility, data mining models provide a test for feasibility.
Steve Moyle, Marko Bohanec, Eric Osrowski

Chapter 17. A Combined Data Mining and Decision Support Approach to Educational Planning

Multi-attribute hierarchical models for the prediction of final academic achievement in a particular high school educational program were developed by a sequential application of data mining and decision support methods. A database of pupils’ achievements was first analyzed by different data mining methods. Then the findings were incorporated into expert-developed decision support models. The predictive accuracies of these models were comparable to those of experienced human experts.
Silvana Gasar, Marko Bohanec, Vladislav Rajkovič

Collaboration Aspects


Chapter 18. Collaborative Data Mining With Ramsys and Sumatra TT

Prediction of resources for a health farm
This chapter catalogs the experience gained during a collaborative data mining project solved using the RAMSYS methodology. The data mining project aimed to produce a system for planning the allocation of resources in a spa (health farm). The chapter discusses and describes how past data can be used as a source for data mining leading to the discovery of models useful for the prediction of resource requirements. Data preprocessing using the SumatraTT tool is emphasized. Difficulties which appeared during the collaborative data mining process are highlighted, and their reasons are identified. The chapter concludes with several suggestions for effective knowledge management supporting concise and transparent information exchange among all participating partners.
Olga Štěpánková, Jiří Kléma, Petr Mikšovský

Chapter 19. Collaborative Decision Making

An environmental case study
This chapter presents a method for supporting collaborative decision making with groups of people having different backgrounds and varying levels of expertise. A method of multi-attribute decision modeling is proposed for such situations. An experiment was carried out in which the participants were involved in collaborative decision modeling to choose a location for low and intermediate level radioactive waste disposal. The results show that, due to the well-defined procedures, the participants were able to produce complex decision models that were evaluated by the experts as reasonable and relevant. This opens new perspectives in the practice of environmental decision making and confirms the applicability of collaborative multi-attribute decision modeling to a wide range of demanding real-world domains.
Tanja Urbančič, Marko Bohanec, Branko Kontić

Chapter 20. Lessons Learned From Data Mining, Decision Support and Collaboration

This chapter reports on the experience of the partners of the SolEuNet project in solving data mining and decision support problems in a collaborative way. The lessons learned are presented from the following perspectives: research, business, collaborative problem solving, and customer applications of data mining, text mining and decision support methods.
Dunja Mladenić, Nada Lavrač, Marko Bohanec

Chapter 21. Internet Support to Collaboration

A knowledge management and organizational memory view
The knowledge gathered by an organization throughout its activity is too valuable an asset to be kept volatile, always dependent on those who produced it. Organizational knowledge also tends to be tacit, and distributed, so only a small part of it is likely to be acquired and retained. This chapter describes a number of techniques and tools for capturing this kind of knowledge, applied to a particular research project organization. These techniques cover the design of versatile information collection templates and ways of collecting information from members of the organization. Another important aim is to make such information available internally to the organization, as well as externally to the world. The collection and dissemination of organizational knowledge is realized through Web systems. The Web is also used to link the (distributed) set of tools into an integrated system, allowing some of these tools to communicate automatically. Important issues such as security and ease of maintenance are also addressed.
Alípio Jorge, Damjan Bojadžiev, Mário Amado Alves, Olga Štěpánková, Dunja Mladenić, Jiří Palouš, Peter Flach, Johann Petrak

Chapter 22. Mind The Gap

Academia-business partnership models and e-collaboration lessons learned
This chapter presents organizational models for the networking of international expert teams from academia and business in the area of data mining and decision support. Among the considered models, the virtual enterprise model was chosen as the basis for a flexible collaboration between academic institutions and business entities, established with the objective of promoting and selling advanced services on the market. Besides the analysis of the partnership models considered, this chapter presents some other e-collaboration lessons learned during the course of a three-year academia-business project partnership.
Nada Lavrač, Tanja Urbančič

