ABSTRACT
Discovery of sequential patterns is an essential data mining task with broad applications. Among several variations of sequential patterns, closed sequential pattern is the most useful one since it retains all the information of the complete pattern set but is often much more compact than it. Unfortunately, there is no parallel closed sequential pattern mining method proposed yet. In this paper we develop an algorithm, called Par-CSP (Parallel Closed Sequential Pattern mining), to conduct parallel mining of closed sequential patterns on a distributed memory system. Par-CSP partitions the work among the processors by exploiting the divide-and-conquer property so that the overhead of interprocessor communication is minimized. Par-CSP applies dynamic scheduling to avoid processor idling. Moreover, it employs a technique, called selective sampling to address the load imbalance problem. We implement Par-CSP using MPI on a 64-node Linux cluster. Our experimental results show that Par-CSP attains good parallelization efficiencies on various input datasets.
- R. Agrawal and R.Srikant. Mining sequential patterns.In Eleventh International Conference on Data Engineering pages 3--14, Taipei, Taiwan, 1995. Google ScholarDigital Library
- M.N. Garofalakis, R. Rastogi, and K.Shim.SPIRIT: Sequential pattern mining with regular expression constraints.In The VLDB Journal pages 223--234, 1999. Google ScholarDigital Library
- V. Guralnik and G. Karypis. Parallel tree-projection-based sequence mining algorithms. Parallel Comput. 30(4):443--472, 2004. Google ScholarDigital Library
- J. Han and J. Pei. Mining frequent patterns by pattern-growth: methodology and implications. SIGKDD Explor. Newsl. 2(2):14--20,2000. Google ScholarDigital Library
- J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal,and M.-C. Hsu. Freespan: frequent pattern-projected sequential pattern mining. In KDD'00 pages 355--359. ACM Press. Google ScholarDigital Library
- IBM datset generator forsequential patterns. http://www.almaden.ibm.com/software/quest/ResourcesGoogle Scholar
- J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan mining sequential patterns efficiently by prefix projected pattern growth. In ICDE'01 pages 215--226. Google ScholarDigital Library
- R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In P.M.G. Apers, M. Bouzeghoub, and G. Gardarin, editors, Proc. 5th Int. Conf. Extending Database Technology, EDBT volume 1057, pages 3--17. Springer-Verlag. Google ScholarDigital Library
- J. Wang and J. Han. BIDE efficient mining of frequent closed sequences. In ICDE'04 pages 79--91. Google ScholarDigital Library
- X. Yan, J. Han, and R. Afshar. Clospan: Mining closed sequential patterns in large datasets. In SDM'03 2003. Google ScholarDigital Library
- M.J. Zaki. Parallel sequence mining on shared-memory machines. Journal of Parallel and Distributed Computing 61(3):401--426,2001. Google ScholarDigital Library
- M.J. Zaki. Spade: An efficient algorithm for mining frequent sequences. Machine Learning 42(1-2):31--60, 2001. Google ScholarDigital Library
Index Terms
- Parallel mining of closed sequential patterns
Recommendations
A sampling-based framework for parallel data mining
PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programmingThe goal of data mining algorithm is to discover useful information embedded in large databases. Frequent itemset mining and sequential pattern mining are two important data mining problems with broad applications. Perhaps the most efficient way to ...
Parallelizing Subroutines in Sequential Programs
An algorithm for making sequential programs parallel is described, which first identifies all subroutines, then determines the appropriate execution mode and restructures the code. It works recursively to parallelize the entire program. We use Fortran ...
A Parallel Mining Algorithm for Closed Sequential Patterns
AINAW '07: Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops - Volume 01Mining closed sequential patterns is an important data mining task with broad applications, the large dataset acquires us to use the parallel technique to solve the problems in data mining. A new parallel algorithm named Par-ClosP is introduced in this ...
Comments