ABSTRACT
The classic Generalized Sequential Patterns (GSP) algorithm returns all frequent sequences present in a database. However, usually a few ones are interesting from a user's point of view. Thus, post-processing tasks are required in order to discard uninteresting sequences. To avoid this drawback, languages based on regular expressions (RE) were proposed to restrict frequent sequences to the ones that satisfy user-specified constraints. In all of these languages, REs are applied over items, which limits their applicability in complex real-world situations. We propose a much powerful language, based on regular expressions, denoted RE-SPaM, where the basic elements are constraints defined over the (temporal and non-temporal) attributes of the items to be mined. Expressions in this language may include attributes, functions over attributes, and variables. We specify the syntax and semantics of RE-SPaM, and present a comprehensive set of examples to illustrate its expressive power. We study in detail how the expressions can be used to prune the resulting sequences in the mining process. In addition, we introduce techniques that allow pruning sequences in the early stages of the process, reducing the need to access the database, making use of the categorization of the attributes that compose the items, and of the automaton that accepts the language generated by the RE. Finally, we present experimental results. Although in this paper we focus on trajectory databases, our approach is general enough for being applied to other settings.
- R. Agrawal, T. Imielinski, and A. Arun Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Int'l Conference on Management of Data, 1993. Google ScholarDigital Library
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th Int'l Conference on Very Large Databases, 1994. Google ScholarDigital Library
- R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. of the Int'l Conference on Data Engineering (ICDE), 1995. Google ScholarDigital Library
- L. Cabibbo and R. Torlone. Querying multidimensional databases. In Proceedings DBPL'97, pages 253--269, East Park, Colorado, USA, 1997. Google ScholarDigital Library
- C. du Mouza and P. Rigaux. Mobility patterns. In Proceedings of the STDBM'04, pages 1--8, Toronto, Canada, 2004.Google Scholar
- M. N. Garofalakis, R. Rastogi, and K. Shim. Spirit: Sequential pattern mining with regular expression constraints. In Proceedings of the 25th VLDB Conference, 1999. Google ScholarDigital Library
- M. N. Garofalakis, R. Rastogi, and K. Shim. Mining sequential patterns with regular expression constraints. In IEEE Transactions on Knowledge and Data Engineering, 2002. Google ScholarDigital Library
- R. H. Güting, M. H. Böhlen, M. Erwig, C. S. Jensen, N. A. Lorentzos, M. Schneider, and M. Vazirgiannis. A foundation for representing and quering moving objects. ACM Trans. Database Syst., 25(1):1--42, 2000. Google ScholarDigital Library
- J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. Freespan: frequent pattern-projected sequential pattern mining. In KDD '00: Proceedings of the sixth ACM SIGKDD Int'l conference on Knowledge discovery and data mining, pages 355--359, 2000. Google ScholarDigital Library
- H. Mannila, H. Toivonen, and I. Verkamo. Discovering frequent episodes in sequences. In Proceedings of the First Int'l Conference on Knowledge Discovery and Data Mining (KDD'95), pages 210--215, 1995.Google Scholar
- J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. Prefixspan: Mining sequential patterns by prefix-projected growth. In Proceedings of the 17th Int'l Conference on Data Engineering, pages 215--224, Washington, DC, USA, 2001. Google ScholarDigital Library
- J. Pei, J. Han, and W. Wang. Constraint-based sequential pattern mining: the pattern-growth methods. J. Intell. Inf. Syst., 28(2):133--160, 2007. Google ScholarDigital Library
- S. Spaccapietra, C. Parent, M. L. Damiani, J. A. Fernandes de Macedo, F. Porto, and C. Vangenot. A conceptual view of trajectories. In Technical Report, 2007.Google Scholar
- R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), 1996. Google ScholarDigital Library
- X. Yan, J. Han, and R. Afshar. Clospan: Mining closed sequential patterns in large databases. In SDM, 2003.Google ScholarCross Ref
Recommendations
Constraint-based sequential pattern mining: the pattern-growth methods
Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining . In this paper, we investigate this issue and point out that the framework developed for ...
Constraint-Based Pattern Mining in Multi-relational Databases
ICDMW '11: Proceedings of the 2011 IEEE 11th International Conference on Data Mining WorkshopsWe propose a new framework for constraint-based pattern mining in multi-relational databases. Distinguishing features of the framework are that (1) it allows finding patterns not only under anti-monotonic constraints, but also under monotonic ...
Efficient algorithms for mining high-utility itemsets in uncertain databases
High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both quantity and profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the ...
Comments