Skip to main content

2014 | Buch

Frequent Pattern Mining

insite
SUCHEN

Über dieses Buch

This comprehensive reference consists of 18 chapters from prominent researchers in the field. Each chapter is self-contained, and synthesizes one aspect of frequent pattern mining. An emphasis is placed on simplifying the content, so that students and practitioners can benefit from the book. Each chapter contains a survey describing key research on the topic, a case study and future directions. Key topics include: Pattern Growth Methods, Frequent Pattern Mining in Data Streams, Mining Graph Patterns, Big Data Frequent Pattern Mining, Algorithms for Data Clustering and more. Advanced-level students in computer science, researchers and practitioners from industry will find this book an invaluable reference.

Inhaltsverzeichnis

Frontmatter
1. An Introduction to Frequent Pattern Mining
Abstract
The problem of frequent pattern mining has been widely studied in the literature because of its numerous applications to a variety of data mining problems such as clustering and classification. In addition, frequent pattern mining also has numerous applications in diverse domains such as spatiotemporal data, software bug detection, and biological data. The algorithmic aspects of frequent pattern mining have been explored very widely. This chapter provides an overview of these methods, as it relates to the organization of this book.
Charu C. Aggarwal
2. Frequent Pattern Mining Algorithms: A Survey
Abstract
This chapter will provide a detailed survey of frequent pattern mining algorithms. A wide variety of algorithms will be covered starting from Apriori. Many algorithms such as Eclat, TreeProjection, and FP-growth will be discussed. In addition a discussion of several maximal and closed frequent pattern mining algorithms will be provided. Thus, this chapter will provide one of most detailed surveys of frequent pattern mining algorithms available in the literature.
Charu C. Aggarwal, Mansurul A. Bhuiyan, Mohammad Al Hasan
3. Pattern-Growth Methods
Abstract
Mining frequent patterns has been a focused topic in data mining research in recent years, with the development of numerous interesting algorithms for mining association, correlation, causality, sequential patterns, partial periodicity, constraint-based frequent pattern mining, associative classification, emerging patterns, etc. Many studies adopt an Apriori-like, candidate generation-and-test approach. However, based on our analysis, candidate generation and test may still be expensive, especially when encountering long and numerous patterns.
A new methodology, called frequent pattern growth, which mines frequent patterns without candidate generation, has been developed. The method adopts a divide-and-conquer philosophy to project and partition databases based on the currently discovered frequent patterns and grow such patterns to longer ones in the projected databases. Moreover, efficient data structures have been developed for effective database compression and fast in-memory traversal. Such a methodology may eliminate or substantially reduce the number of candidate sets to be generated and also reduce the size of the database to be iteratively examined, and, therefore, lead to high performance.
In this paper, we provide an overview of this approach and examine its methodology and implications for mining several kinds of frequent patterns, including association, frequent closed itemsets, max-patterns, sequential patterns, and constraint-based mining of frequent patterns. We show that frequent pattern growth is efficient at mining large data-bases and its further development may lead to scalable mining of many other kinds of patterns as well.
Jiawei Han, Jian Pei
4. Mining Long Patterns
Abstract
The value and importance of long patterns are gaining increasing recognition in a wide range of domains including bioinformatics, social network analysis, software engineering and business intelligence. Yet the task of mining long patterns has remained a challenge due to the prohibitively large number of smaller patterns which often need to be generated first. In this chapter, we first use a pattern lattice model to illustrate and compare various mining paradigms. Then we present recent studies for mining long patterns according to their respective pattern mining paradigms. For each category, we discuss the representative algorithms and the state-of-the-art development.
Feida Zhu
5. Interesting Patterns
Abstract
Pattern mining is one of the most important aspects of data mining. By far the most popular and well-known approach is frequent pattern mining. That is, to discover patterns that occur in many transactions. This approach has many virtues including monotonicity, which allows efficient discovery of all frequent patterns. Nevertheless, in practice frequent pattern mining rarely gives good results—the number of discovered patterns is typically gargantuan and they are heavily redundant.
Consequently, a lot of research effort has been invested toward improving the quality of the discovered patterns. In this chapter we will give an overview of the interestingness measures and other redundancy reduction techniques that have been proposed to this end.
In particular, we first present classic techniques such as closed and non-derivable itemsets that are used to prune unnecessary itemsets. We then discuss techniques for ranking patterns on how expected their score is under a null hypothesis—considering patterns that deviate from this expectation to be interesting. These models can either be static, as well as dynamic; we can iteratively update this model as we discover new patterns. More generally, we also give a brief overview on pattern set mining techniques, where we measure quality over a set of patterns, instead of individually. This setup gives us freedom to explicitly punish redundancy which leads to a more to-the-point results.
Jilles Vreeken, Nikolaj Tatti
6. Negative Association Rules
Abstract
Mining association rules associates events that took place together. In market basket analysis, these discovered rules associate items purchased together. Items that are not part of a transaction are not considered. In other words, typical association rules do not take into account items that are part of the domain but that are not together part of a transaction. Association rules are based on frequencies and count the transactions where items occur together. However, counting absences of items is prohibitive if the number of possible items is very large, which is typically the case. Nonetheless, knowing the relationship between the absence of an item and the presence of another can be very important in some applications. These rules are called negative association rules. We review current approaches for mining negative association rules and we discuss limitations and future research directions.
Luiza Antonie, Jundong Li, Osmar Zaiane
7. Constraint-Based Pattern Mining
Abstract
Many pattern mining systems are designed to solve one specific problem, such as frequent, closed or maximal frequent itemset mining, efficiently. Even though efficient, their specialized nature can make these systems difficult to apply in other situations than the one they were designed for. This chapter provides an overview of generic constraint-based mining systems. Constraint-based pattern mining systems are systems that with minimal effort can be programmed to find different types of patterns satisfying constraints. They achieve this genericity by providing (1) high-level languages in which programmers can easily specify constraints; (2) generic search algorithms that find patterns for any task expressed in the specification language. The development of generic systems requires an understanding of different classes of constraints. This chapter will first provide an overview of such classes constraints, followed by a discussion of search algorithms and specification languages.
Siegfried Nijssen, Albrecht Zimmermann
8. Mining and Using Sets of Patterns through Compression
Abstract
In this chapter we describe how to successfully apply the MDL principle to pattern mining. In particular, we discuss how pattern-based models can be designed and induced by means of compression, resulting in succinct and characteristic descriptions of the data.
As motivation, we argue that traditional pattern mining asks the wrong question: instead of asking for all patterns satisfying some interestingness measure, one should ask for a small, non-redundant, and interesting set of patterns—which allows us to avoid the pattern explosion. Firmly rooted in algorithmic information theory, the approach we discuss in this chapter states that the best set of patterns is that set that compresses the data best. We formalize this problem using the Minimum Description Length (MDL) principle, describe useful model classes, and briefly discuss algorithmic approaches to inducing good models from data. Last but not least, we describe how the obtained models—in addition to showing the key patterns of the data—can be used for a wide range of data mining tasks; hence showing that MDL selects useful patterns.
Matthijs van Leeuwen, Jilles Vreeken
9. Frequent Pattern Mining in Data Streams
Abstract
As the volume of digital commerce and communication has exploded, the demand for data mining of streaming data has likewise grown. One of the fundamental data mining tasks, for both static and streaming data, is frequent pattern mining. The goal of pattern mining is to identity frequently occurring patterns and structures. Such patterns may indicate scientific phenomena, economic or social trends, or even security threats. Moreover, not only is pattern discovery important by itself, but it is also a building block for machine learning tasks such as association rule induction. Traditionally, algorithms for pattern discovery have processed the entire dataset as a batch, with no restriction on how many passes through the data would be taken.
However, when the data are arriving in a continuous and unending stream, our algorithm must be limited to a single pass. Moreover, the length of the stream is indeterminate, so we cannot wait for it to end. We generate an initial result after seeing a certain quantity of data, and then we periodically revise the result. A particular challenge for frequent pattern discovery is the combinatorial explosion of candidate patterns
In this chapter, we present a structured review of online frequent pattern mining techniques. We classify the methods according to the type of pattern and data, the time window being considered, and the quality of the approximation.
Victor E. Lee, Ruoming Jin, Gagan Agrawal
10. Big Data Frequent Pattern Mining
Abstract
Frequent pattern mining is an essential data mining task, with a goal of discovering knowledge in the form of repeated patterns. Many efficient pattern mining algorithms have been discovered in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called “Big Data”. Scalable parallel algorithms hold the key to solving the problem in this context. In this chapter, we review recent advances in parallel frequent pattern mining, analyzing them through the Big Data lens. We identify three areas as challenges to designing parallel frequent pattern mining algorithms: memory scalability, work partitioning, and load balancing. With these challenges as a frame of reference, we extract and describe key algorithmic design patterns from the wealth of research conducted in this domain.
David C. Anastasiu, Jeremy Iverson, Shaden Smith, George Karypis
11. Sequential Pattern Mining
Abstract
Sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database, has been a focused theme in data mining research for over a decade. This problem has broad applications, such as mining customer purchase patterns and Web access patterns. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Abundant literature has been dedicated to this research and tremendous progress has been made so far. This chapter will present a thorough overview and analysis of the main approaches to sequential pattern mining.
Wei Shen, Jianyong Wang, Jiawei Han
12. Spatiotemporal Pattern Mining: Algorithms and Applications
Abstract
With the fast development of positioning technology, spatiotemporal data has become widely available nowadays. Mining patterns from spatiotemporal data has many important applications in human mobility understanding, smart transportation, urban planning and ecological studies. In this chapter, we provide an overview of spatiotemporal data mining methods. We classify the patterns into three categories: (1) individual periodic pattern; (2) pairwise movement pattern and (3) aggregative patterns over multiple trajectories. This chapter states the challenges of pattern discovery, reviews the state-of-the-art methods and also discusses the limitations of existing methods.
Zhenhui Li
13. Mining Graph Patterns
Abstract
Graph pattern mining becomes increasingly crucial to applications in a variety of domains including bioinformatics, cheminformatics, social network analysis, computer vision and multimedia. In this chapter, we first examine the existing frequent subgraph mining algorithms and discuss their computational bottleneck. Then we introduce recent studies on mining various types of graph patterns, including significant, representative and dense subgraph patterns. We also discuss the mining tasks in new problem settings such as a graph stream and an uncertain graph model. These new mining algorithms represent the state-of-the-art graph mining techniques: they not only avoid the exponential size of mining result, but also improve the applicability of graph patterns significantly.
Hong Cheng, Xifeng Yan, Jiawei Han
14. Uncertain Frequent Pattern Mining
Abstract
Frequent pattern mining aims to discover implicit, previously unknown and potentially useful knowledge—in the form of frequently occurring sets of items—that are embedded in data. Many of the models and algorithms developed in the early days mine frequent patterns from traditional transaction databases of precise data such as shopper market basket data, in which the contents of databases are known. However, we are living in an uncertain world, in which uncertain data can be found in various real-life applications. Hence, in recent years, researchers have paid more attention to frequent pattern mining from probabilistic datasets of uncertain data. This chapter covers key models, algorithms and topics about uncertain frequent pattern mining.
Carson Kai-Sang Leung
15. Privacy Issues in Association Rule Mining
Abstract
Data mining services require accurate input data for their results to be meaningful, but privacy concerns may impel users to provide spurious information. In this chapter, we study the different aspects of privacy that arise in association rule mining, with special emphasis on input data privacy, output rule privacy and owner privacy. For input privacy, we examine whether users could be encouraged to provide accurate data by ensuring that the mining process cannot, with any reasonable degree of certainty, discover specific information that violates their privacy. Then, in the context of output privacy, we present a taxonomy and a survey of recent approaches that have been applied to the association rule hiding problem. Here, the objective is to minimally modify the original database in a manner that makes the sensitive association rules to disappear while retaining the non-sensitive rules. Finally, we study popular cryptographic methods for preserving the privacy of the individual sources participating in distributed association rule mining.
Aris Gkoulalas-Divanis, Jayant Haritsa, Murat Kantarcioglu
16. Frequent Pattern Mining Algorithms for Data Clustering
Abstract
Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many frequent pattern mining related influences. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say that frequent pattern mining was at the cradle of subspace clustering—yet, it quickly developed into an independent research field.
In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data. In particular, we discuss several example algorithms for subspace clustering or projected clustering as well as point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining.
Arthur Zimek, Ira Assent, Jilles Vreeken
17. Supervised Pattern Mining and Applications to Classification
Abstract
In this chapter we describe the use of patterns in the analysis of supervised data. We survey the different settings for finding patterns as well as sets of patterns. The pattern mining settings are categorized according to whether they include class labels as attributes in the data or whether they partition the data based on these labels. The pattern set mining settings are categorized along several dimensions, including whether they perform iterative mining or post-processing, operate globally or locally, and whether they use patterns directly or indirectly for prediction.
Albrecht Zimmermann, Siegfried Nijssen
18. Applications of Frequent Pattern Mining
Abstract
Frequent pattern mining has broad applications which encompass clustering, classification, software bug detection, recommendations, and a wide variety of other problems. In fact, the greatest utility of frequent pattern mining (unlike other major data mining problems such as outlier analysis and classification), is as an intermediate tool to provide pattern-centered insights for a variety of problems. In this chapter, we will study a wide variety of applications of frequent pattern mining. The purpose of this chapter is not to provide a detailed description of every possible application, but to provide the reader an overview of what is possible with the use of methods such as frequent pattern mining.
Charu C. Aggarwal
Backmatter
Metadaten
Titel
Frequent Pattern Mining
herausgegeben von
Charu C. Aggarwal
Jiawei Han
Copyright-Jahr
2014
Electronic ISBN
978-3-319-07821-2
Print ISBN
978-3-319-07820-5
DOI
https://doi.org/10.1007/978-3-319-07821-2