
2008 | Book


Visual Data Mining

Theory, Techniques and Tools for Visual Analytics

Edited by: Simeon J. Simoff, Michael H. Böhlen, Arturas Mazeika

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

Visual Data Mining: Opening the Black Box

Knowledge discovery holds the promise of insight into large, otherwise opaque datasets. The nature of what makes a rule interesting to a user has been discussed widely, but most agree that it is a subjective quality based on the practical usefulness of the information. Because interestingness is subjective, the user needs to provide feedback to the system and, as is the case for all systems, the sooner the feedback is given, the quicker it can influence the behavior of the system. There have been some impressive research activities over the past few years, but the question to be asked is why visual data mining is only now being investigated commercially. Certainly, there have been arguments for visual data mining for a number of years: Ankerst and others argued in 2002 that current (autonomous and opaque) analysis techniques are inefficient, as they fail to directly embed the user in dataset exploration, and that a better solution involves the user and algorithm being more tightly coupled. Grinstein stated that the “current state of the art data mining tools are automated, but the perfect data mining tool is interactive and highly participatory,” while Han has suggested that the “data selection and viewing of mining results should be fully interactive, the mining process should be more interactive than the current state of the art and embedded applications should be fairly automated.” A good survey on techniques until 2003 was published by de Oliveira and Levkowitz.

Table of contents

Frontmatter

Visual Data Mining: An Introduction and Overview

Abstract

In our everyday life we interact with various information media, which present us with facts and opinions, supported with some evidence, based, usually, on condensed information extracted from data. It is common to communicate such condensed information in a visual form: a static or animated, preferably interactive, visualisation. For example, when we watch familiar weather programs on the TV, landscapes with cloud, rain and sun icons and numbers next to them quickly allow us to build a picture about the predicted weather pattern in a region. Playing sequences of such visualisations will easily communicate the dynamics of the weather pattern, based on the large amount of data collected by many thousands of climate sensors and monitors scattered across the globe and on weather satellites. These pictures are fine when one watches the weather on Friday to plan what to do on Sunday; after all, if the patterns are wrong there are always alternative ways of enjoying a holiday. Professional decision making would be a rather different scenario. It will require weather forecasts at a high level of granularity and precision, and in real-time. Such requirements translate into requirements for high volume data collection, processing, mining, modelling and communicating the models quickly to the decision makers. Further, the requirements translate into high-performance computing with integrated efficient interactive visualisation. From a practical point of view, if a weather pattern cannot be depicted fast enough, then it has no value. Recognising the power of the human visual perception system and pattern recognition skills adds another twist to the requirements: data manipulations need to be completed at least an order of magnitude faster than real-time in order to combine them with a variety of highly interactive visualisations, allowing easy remapping of data attributes to the features of the visual metaphor used to present the data.
In these few steps in the weather domain, we have specified some requirements for a visual data mining system.

Simeon J. Simoff, Michael H. Böhlen, Arturas Mazeika

Part 1 – Theory and Methodologies

The 3DVDM Approach: A Case Study with Clickstream Data

Abstract

Clickstreams are among the most popular data sources because Web servers automatically record each action and the Web log entries promise to add up to a comprehensive description of behaviors of users. Clickstreams, however, are large and raise a number of unique challenges with respect to visual data mining. At the technical level the huge amount of data requires scalable solutions and limits the presentation to summary and model data. Equally challenging is the interpretation of the data at the conceptual level. Many analysis tools are able to produce different types of statistical charts. However, the step from statistical charts to comprehensive information about customer behavior is still largely unresolved. We propose a density surface based analysis of 3D data that uses state-of-the-art interaction techniques to explore the data at various granularities.

Michael H. Böhlen, Linas Bukauskas, Arturas Mazeika, Peer Mylov

Form-Semantics-Function – A Framework for Designing Visual Data Representations for Visual Data Mining

Abstract

Visual data mining, as an art and science of teasing meaningful insights out of large quantities of data that are incomprehensible in any other way, requires consistent visual data representations (information visualisation models). The frequently used expression "the art of information visualisation" appropriately describes the situation. Though substantial work has been done in the area of information visualisation, it is still a challenging activity to find out the methods, techniques and corresponding tools that support visual data mining of a particular type of information. The comparison of visualisation techniques across different designs is not a trivial problem either. This chapter presents an attempt at a consistent approach to the formal development, evaluation and comparison of visualisation methods. The application of the approach is illustrated with examples of visualisation models for data from the area of team collaboration in virtual environments and from the results of text analysis.

Simeon J. Simoff

A Methodology for Exploring Association Models

Abstract

Visualization in data mining is typically related to data exploration. In this chapter we present a methodology for the post-processing and visualization of association rule models. One aim is to provide the user with a tool that enables the exploration of a large set of association rules. The method is inspired by the hypertext metaphor. The initial set of rules is dynamically divided into small comprehensible sets or pages, according to the interest of the user. From each set, the user can move to other sets by choosing one appropriate operator. The set of available operators transforms sets of rules into sets of rules, allowing the user to focus on interesting regions of the rule space. Each set of rules can then also be viewed with different graphical representations. The tool is web-based and dynamically generates SVG pages to represent graphics. Association rules are given in PMML format.

Alipio Jorge, João Poças, Paulo J. Azevedo

Visual Exploration of Frequent Itemsets and Association Rules

Abstract

Frequent itemsets and association rules are defined on the powerset of a set of items and reflect the many-to-many relationships among the items. They bring technical challenges to information visualization, which in general lacks effective visual techniques to describe many-to-many relationships. This paper describes an approach for visualizing frequent itemsets and association rules by a novel use of parallel coordinates. An association rule is visualized by connecting its items, one on each parallel coordinate, with polynomial curves. In the presence of an item taxonomy, an item taxonomy tree is displayed as a coordinate and can be expanded or shrunk by user interaction. This interaction introduces a border in the generalized itemset lattice, which separates displayable itemsets from non-displayable ones. Only those frequent itemsets on the border are displayed. This approach can be generalized to the visualization of general monotone Boolean functions on a lattice structure. Its usefulness is demonstrated through examples.

Li Yang
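
The notions of frequent itemset and association rule used in the abstract above can be illustrated with a minimal Python sketch. It relies only on the standard definitions of support and confidence; the function names, the brute-force enumeration and the toy transactions are assumptions made for illustration and are not taken from the chapter, which is concerned with visualizing such rules rather than mining them.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
                found = True
        if not found:
            # Support never grows when items are added, so if no itemset
            # of this size is frequent, no larger one can be either.
            break
    return result


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical toy data: each transaction is a set of items.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

frequent = frequent_itemsets(transactions, min_support=0.5)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

Because every subset of a frequent itemset is itself frequent, the result forms a downward-closed family in the itemset lattice, which is the kind of monotone structure the abstract refers to.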

Visual Analytics: Scope and Challenges

Abstract

In today’s applications data is produced at unprecedented rates. While the capacity to collect and store new data rapidly grows, the ability to analyze these data volumes increases at much lower rates. This gap leads to new challenges in the analysis process, since analysts, decision makers, engineers, or emergency response teams depend on information hidden in the data. The emerging field of visual analytics focuses on handling these massive, heterogeneous, and dynamic volumes of information by integrating human judgement by means of visual representations and interaction techniques in the analysis process. Furthermore, it is the combination of related research areas including visualization, data mining, and statistics that turns visual analytics into a promising field of research. This paper aims at providing an overview of visual analytics, its scope and concepts, addresses the most important research challenges and presents use cases from a wide variety of application scenarios.

Daniel A. Keim, Florian Mansmann, Jörn Schneidewind, Jim Thomas, Hartmut Ziegler

Part 2 – Techniques

Using Nested Surfaces for Visual Detection of Structures in Databases

Abstract

We define, compute, and evaluate nested surfaces for the purpose of visual data mining. Nested surfaces enclose the data at various density levels, and make it possible to equalize the more and less pronounced structures in the data. This facilitates the detection of multiple structures, which is important for data mining where the less obvious relationships are often the most interesting ones. The experimental results illustrate that surfaces are fairly robust with respect to the number of observations, easy to perceive, and intuitive to interpret. We give a topology-based definition of nested surfaces and establish a relationship to the density of the data. Several algorithms are given that compute surface grids and surface contours, respectively.

Arturas Mazeika, Michael H. Böhlen, Peer Mylov
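
The idea of surfaces that enclose the data at several density levels can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a Gaussian kernel density estimate on a 2D grid (rather than 3D data), and the bandwidth, grid, level fractions and function names are all hypothetical. It does not reproduce the chapter's topology-based definition or its surface algorithms.

```python
import math


def kde(point, data, bandwidth=0.5):
    """Gaussian kernel density estimate at `point` (2D for brevity)."""
    norm = 1.0 / (len(data) * 2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for x, y in data:
        d2 = (point[0] - x) ** 2 + (point[1] - y) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total


def region(data, level, grid):
    """Grid points whose estimated density is at least `level`."""
    return {p for p in grid if kde(p, data) >= level}


# Hypothetical data: a pronounced cluster near the origin and one
# isolated observation that forms a much weaker structure.
data = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (3.0, 3.0)]
grid = [(i * 0.25, j * 0.25) for i in range(-8, 21) for j in range(-8, 21)]

peak = max(kde(p, data) for p in grid)
levels = [0.6 * peak, 0.3 * peak, 0.1 * peak]  # from high to low density
regions = [region(data, lvl, grid) for lvl in levels]
```

Each region is contained in the next one, since lowering the density level can only add grid points. In this toy data the weak structure around (3.0, 3.0) is absent at the highest level and appears once the level is lowered, which mirrors the abstract's point that several levels help to detect less pronounced structures.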

Visual Mining of Association Rules

Abstract

Association rules are one of the most widespread data mining tools because they can be easily mined, even from very large databases, and they provide valuable information for many application fields such as marketing, credit scoring, business, etc. The counterpart is that a massive effort is required (due to the large number of rules usually mined) to make the retained knowledge actionable. In this framework, visualization tools become essential for gaining deep insight into the association structures, and interactive features have to be exploited for highlighting the most relevant and meaningful rules.

Dario Bruzzese, Cristina Davino

Interactive Decision Tree Construction for Interval and Taxonomical Data

Abstract

Visual data-mining strategy lies in tightly coupling the visualizations and analytical processes into one data-mining tool that takes advantage of the assets from multiple sources. This paper presents two graphical interactive decision tree construction algorithms able to deal either with (usual) continuous data or with interval and taxonomical data. They are the extensions of two existing algorithms: CIAD [17] and PBC [3]. Both CIAD and PBC algorithms can be used in an interactive or cooperative mode (with an automatic algorithm to find the best split of the current tree node). We have modified the corresponding help mechanisms to allow them to deal with interval-valued attributes. Some of the results obtained on interval-valued and taxonomical data sets are presented with the methods we have used to create these data sets.

François Poulet, Thanh-Nghi Do

Visual Methods for Examining SVM Classifiers

Abstract

Support vector machines (SVM) offer a theoretically well-founded approach to automated learning of pattern classifiers. They have been proven to give highly accurate results in complex classification problems, for example, gene expression analysis. The SVM algorithm is also quite intuitive with a few inputs to vary in the fitting process and several outputs that are interesting to study. For many data mining tasks (e.g., cancer prediction) finding classifiers with good predictive accuracy is important, but understanding the classifier is equally important. By studying the classifier outputs we may be able to produce a simpler classifier, learn which variables are the important discriminators between classes, and find the samples that are problematic to the classification. Visual methods for exploratory data analysis can help us to study the outputs and complement automated classification algorithms in data mining. We present the use of tour-based methods to plot aspects of the SVM classifier. This approach provides insights about the cluster structure in the data, the nature of boundaries between clusters, and problematic outliers. Furthermore, tours can be used to assess the variable importance. We show how visual methods can be used as a complement to cross-validation methods in order to find good SVM input parameters for a particular data set.

Doina Caragea, Dianne Cook, Hadley Wickham, Vasant Honavar

Text Visualization for Visual Text Analytics

Abstract

The term visual text analytics describes a class of information analysis techniques and processes that enable knowledge discovery via the use of interactive graphical representations of textual data. These techniques enable discovery and understanding via the recruitment of human visual pattern recognition and spatial reasoning capabilities. Visual text analytics is a subclass of visual data mining / visual analytics, which more generally encompasses analytical techniques that employ visualization of non-physically-based (or “abstract”) data of all types. Text visualization is a key component in visual text analytics. While the term “text visualization” has been used to describe a variety of methods for visualizing both structured and unstructured characteristics of text-based data, it is most closely associated with techniques for depicting the semantic characteristics of the free-text components of documents in large document collections. In contrast with text clustering techniques which serve only to partition text corpora into sets of related items, these so-called semantic mapping methods also typically strive to depict detailed inter- and intra-set similarity structure. Text analytics software typically couples semantic mapping techniques with additional visualization techniques to enable interactive comparison of semantic structure with other characteristics of the information, such as publication date or citation information. In this way, value can be derived from the material in the form of multidimensional relationship patterns existing among the discrete items in the collection. The ultimate goal of these techniques is to enable human understanding and reasoning about the contents of large and complexly related text collections.

John Risch, Anne Kao, Stephen R. Poteet, Y. -J. Jason Wu

Visual Discovery of Network Patterns of Interaction between Attributes

Abstract

Visual discovery of network patterns of interaction between attributes in a data set identifies emergent networks between myriads of individual data items and utilises special algorithms that aid visualisation of ‘emergent’ patterns and trends in the linkage. It complements conventional data mining methods, which assume the independence between the attributes and the independence between the values of these attributes. The approach complements analytical data mining techniques where the rules or definitions of what might constitute an exception are able to be known and specified ahead of time. For example, in the analysis of transaction data there are no known suspicious transactions. This chapter presents a human-centred visual data mining methodology that addresses the issues of depicting implicit relationships between data attributes and/or specific values of these attributes. Different aspects of the approach are demonstrated through the reflection of the analytical process in two cases: one looking at fraudulent activity which would be difficult, if not impossible, to detect with conventional exception detection methods, and the other one looking at exploring a large data set of low level communication data. The chapter argues that for many problems, a ‘discovery’ phase in the investigative process based on visualisation and human cognition is a logical precedent to, and complement of, more automated ‘exception detection’ phases.

Simeon J. Simoff, John Galloway

Mining Patterns for Visual Interpretation in a Multiple-Views Environment

Abstract

This chapter introduces a novel systematization aiming at extending the application range of Information Visualization and Visual Data Mining. We present an innovative framework named Visualization Tree in order to integrate multiple data visualizations assisted by novel visual exploration techniques. These exploration techniques are named Frequency Plot, Relevance Plot and Representative Plot, and are integrated according to the proposed Visualization Tree framework. The systematization of visualization techniques enabled by these concepts defines a Visual Data Mining environment where multiple presentation workspaces are kept together, linked according to analytical decisions taken by the user. Our emphasis is on developing an intuitive and versatile multiple-views system that helps the user to identify visual patterns while interpreting multiple data subsets. In this context, the analyst is able to draw and summarize several subsets that are inspected simultaneously, each in a dedicated workspace.

José F. Rodrigues Jr., Agma J. M. Traina, Caetano Traina Jr.

Using 2D Hierarchical Heavy Hitters to Investigate Binary Relationships

Abstract

This chapter presents VHHH: a visual data mining tool to compute and investigate hierarchical heavy hitters (HHHs) for two-dimensional data. VHHH computes the HHHs for a two-dimensional categorical dataset and a given threshold, and visualizes the HHHs in three-dimensional space. The chapter evaluates VHHH on synthetic and real-world data, provides an interpretation alphabet, and identifies common visualization patterns of HHHs.

Daniel Trivellato, Arturas Mazeika, Michael H. Böhlen

Complementing Visual Data Mining with the Sound Dimension: Sonification of Time Dependent Data

Abstract

This chapter explores the extension of visual data mining by adding a sound dimension to the data representation. It presents the results of early 2001 experiments with sonification of 2D and 3D time series data. A number of sonification means for these experiments have been implemented. The goal of these experiments was to determine how sonification of two- and three-dimensional graphs can support and complement or even be an alternative to visually displayed graphs. The research methodology used the triangulation method, combining the automated generation of the sound patterns with two evaluation techniques. The first one included the assessment and evaluation of the sound sequences of the sonified data by the participants in the experiment via a dedicated server. The second one was based on the analysis of an evaluation questionnaire, filled in by each participant who performed the tests. The chapter presents the results and the issues raised by the experiments.

Monique Noirhomme-Fraiture, Olivier Schöller, Christophe Demoulin, Simeon J. Simoff
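
One simple way to sonify a time series is to map each value to a pitch, so that rising values are heard as rising tones. The sketch below is a generic illustration and not the sonification means implemented for the experiments, which the abstract does not specify. The linear value-to-note mapping, the note range and the function name are assumptions; the conversion from MIDI note number to frequency uses the standard equal-temperament formula with A4 = 440 Hz.

```python
def to_pitches(series, low_note=57, high_note=81):
    """Map each value of a time series to a frequency in Hz.

    Values are scaled linearly onto MIDI note numbers between
    `low_note` and `high_note` (assumed defaults: A3 to A5), then
    converted to frequencies with f = 440 * 2 ** ((note - 69) / 12).
    """
    lo, hi = min(series), max(series)
    span = hi - lo
    pitches = []
    for value in series:
        # A constant series has no range; place it mid-scale.
        frac = 0.5 if span == 0 else (value - lo) / span
        note = round(low_note + frac * (high_note - low_note))
        pitches.append(440.0 * 2.0 ** ((note - 69) / 12))
    return pitches


# Hypothetical series: smallest, middle and largest value.
tones = to_pitches([0, 5, 10])
```

With the assumed defaults, the smallest value maps to 220 Hz, the midpoint to 440 Hz and the largest to 880 Hz. A second data dimension could be mapped to another sound parameter such as loudness or duration, but that choice is left open here.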

Context Visualization for Visual Data Mining

Abstract

Context and history visualization plays an important role in visual data mining, especially in the visual exploration of large and complex data sets. The preservation of context and history information in the visualization can improve user comprehension of the exploration process as well as enhance the reusability of mining techniques and parameters to achieve the desired results. This chapter presents a methodology and various interactive visualization techniques supporting visual data mining in general as well as the visual preservation of context and history information. Algorithms supporting this methodology for visual data mining in real time are also described.

Mao Lin Huang, Quang Vinh Nguyen

Assisting Human Cognition in Visual Data Mining

Abstract

As discussed in Part 1 of the book in chapter “Form-Semantics-Function – A Framework for Designing Visualisation Models for Visual Data Mining”, the development of consistent visualisation techniques requires a systematic approach related to the tasks of the visual data mining process. Chapter “Visual discovery of network patterns of interaction between attributes” presents a methodology based on viewing visual data mining as a “reflection-in-action” process. This chapter follows the same perspective and focuses on the subjective bias that may appear in visual data mining. The work is motivated by the fact that visual, though very attractive, means also subjective, and non-experts are often left to utilise visualisation methods (as an understandable alternative to the highly complex statistical approaches) without the ability to understand their applicability and limitations. The chapter presents two strategies addressing the subjective bias: “guided cognition” and “validated cognition”, which result in two types of visual data mining techniques: interaction with visual data representations, mediated by statistical techniques, and validation of the hypotheses coming as an output of the visual analysis through another analytics method, respectively.

Simeon J. Simoff, Michael H. Böhlen, Arturas Mazeika

Part 3 – Tools and Applications

Immersive Visual Data Mining: The 3DVDM Approach

Abstract

A software system has been developed for the study of static and dynamic data visualization in the context of Visual Data Mining in Virtual Reality. We use a specific data set to illustrate how the visualization tools of the 3D Visual Data Mining (3DVDM) system can assist in detecting potentially interesting non-linear data relationships that are hard to discover using traditional statistical methods of analysis. These detected data structures can form a basis for specification of further explanatory statistical analysis. The visualization tools are shown to reveal many interesting patterns and in particular the dynamic data visualization appears to have a very promising potential.

To further explore the human faculties, sound has also been used to represent statistical data. Current technology enables us to create advanced real-time 3D soundscapes which may prove useful since the human ears’ field of hearing is larger than the eyes’ field of view, and thus is able to inform us about events happening in areas that we cannot see. The audio-visual tools in the 3DVDM system are tested and their effectiveness is discussed for situations where sound acts as support for visual exploration, as well as for the use of sound as the sole cue for analyzing data in VR.

Henrik R. Nagel, Erik Granum, Søren Bovbjerg, Michael Vittrup

DataJewel: Integrating Visualization with Temporal Data Mining

Abstract

In this chapter we describe DataJewel, a new temporal data mining architecture. DataJewel tightly integrates a visualization component, an algorithmic component and a database component. We introduce a new visualization technique called CalendarView as an implementation of the visualization component, and we introduce a data structure that supports temporal mining of large databases. In our architecture, algorithms can be tightly integrated with the visualization component and most existing temporal data mining algorithms can be leveraged by embedding them into DataJewel. This integration is achieved by an interface that is used by both the user and the algorithms to assign colors to events. The user interactively assigns colors to incorporate domain knowledge or to formulate hypotheses. The algorithm assigns colors based on discovered patterns. The same visualization technique is used for displaying both data and patterns to make it more intuitive for the user to identify useful patterns while exploring data interactively or while using algorithms to search for patterns. Our experiments in analyzing several large datasets from the airplane maintenance domain demonstrate the usefulness of our approach and we discuss its applicability to domains like homeland security, market basket analysis and web mining.

Mihael Ankerst, Anne Kao, Rodney Tjoelker, Changzhou Wang

A Visual Data Mining Environment

Abstract

It cannot be overstated that the knowledge discovery process still presents formidable challenges. One of the main issues in knowledge discovery is the need for an overall framework that can support the entire discovery process. It is worth noting the role and place of visualization in such a framework. Visualization enables or triggers the user to use his/her outstanding visual and mental capabilities, thereby gaining insight and understanding of data. The foregoing points to the pivotal role that visualization can play in supporting the user throughout the entire discovery process. The work reported in this chapter is part of a project aiming at developing an open data mining system with a visual interaction environment that supports the user in the entire process of mining knowledge.

Stephen Kimani, Tiziana Catarci, Giuseppe Santucci

Integrative Visual Data Mining of Biomedical Data: Investigating Cases in Chronic Fatigue Syndrome and Acute Lymphoblastic Leukaemia

Abstract

This chapter presents an integrative visual data mining approach towards biomedical data. This approach and supporting methodology are presented at a high level. They combine in a consistent manner a set of visualisation and data mining techniques that operate over an integrated data set of several diverse components, including medical (clinical) data, patient outcome and interview data, corresponding gene expression and SNP data, domain ontologies and health management data. The practical application of the methodology and the specific data mining techniques engaged are demonstrated on two case studies focused on the biological mechanisms of two different types of diseases: Chronic Fatigue Syndrome and Acute Lymphoblastic Leukaemia, respectively. What the two cases have in common is the structure of the data sets.

Paul Kennedy, Simeon J. Simoff, Daniel R. Catchpoole, David B. Skillicorn, Franco Ubaudi, Ahmad Al-Oqaily

Towards Effective Visual Data Mining with Cooperative Approaches

Abstract

Visual data-mining strategy lies in tightly coupling the visualizations and analytical processes into one data-mining tool that takes advantage of the strengths from multiple sources. We present concrete cooperation between automatic algorithms, interactive algorithms and visualization methods. The first kind of cooperation is an interactive decision tree algorithm, CIAD. It allows the user to be helped by an automatic algorithm based on a support vector machine (SVM) to optimize the interactive split performed in the current tree node or to compute the best split in an automatic mode. Another effective cooperation is a visualization algorithm used to explain the results of the SVM algorithm. The same visualization method can also be used to help the user in the parameter-tuning step for the input of automatic SVM algorithms. We then present methods that combine automatic and interactive approaches to deal with very large datasets. The results obtained suggest that this is a promising way to deal with very large datasets.

François Poulet

Backmatter

Title: Visual Data Mining
Edited by: Simeon J. Simoff, Michael H. Böhlen, Arturas Mazeika
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-540-71080-6
Print ISBN: 978-3-540-71079-0
DOI: https://doi.org/10.1007/978-3-540-71080-6