skip to main content
10.1145/1516360.1516423acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free Access

Efficient constraint evaluation in categorical sequential pattern mining for trajectory databases

Authors Info & Claims
Published:24 March 2009Publication History

ABSTRACT

The classic Generalized Sequential Patterns (GSP) algorithm returns all frequent sequences present in a database. However, usually a few ones are interesting from a user's point of view. Thus, post-processing tasks are required in order to discard uninteresting sequences. To avoid this drawback, languages based on regular expressions (RE) were proposed to restrict frequent sequences to the ones that satisfy user-specified constraints. In all of these languages, REs are applied over items, which limits their applicability in complex real-world situations. We propose a much powerful language, based on regular expressions, denoted RE-SPaM, where the basic elements are constraints defined over the (temporal and non-temporal) attributes of the items to be mined. Expressions in this language may include attributes, functions over attributes, and variables. We specify the syntax and semantics of RE-SPaM, and present a comprehensive set of examples to illustrate its expressive power. We study in detail how the expressions can be used to prune the resulting sequences in the mining process. In addition, we introduce techniques that allow pruning sequences in the early stages of the process, reducing the need to access the database, making use of the categorization of the attributes that compose the items, and of the automaton that accepts the language generated by the RE. Finally, we present experimental results. Although in this paper we focus on trajectory databases, our approach is general enough for being applied to other settings.

References

  1. R. Agrawal, T. Imielinski, and A. Arun Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Int'l Conference on Management of Data, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th Int'l Conference on Very Large Databases, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. of the Int'l Conference on Data Engineering (ICDE), 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. L. Cabibbo and R. Torlone. Querying multidimensional databases. In Proceedings DBPL'97, pages 253--269, East Park, Colorado, USA, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. du Mouza and P. Rigaux. Mobility patterns. In Proceedings of the STDBM'04, pages 1--8, Toronto, Canada, 2004.Google ScholarGoogle Scholar
  6. M. N. Garofalakis, R. Rastogi, and K. Shim. Spirit: Sequential pattern mining with regular expression constraints. In Proceedings of the 25th VLDB Conference, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. N. Garofalakis, R. Rastogi, and K. Shim. Mining sequential patterns with regular expression constraints. In IEEE Transactions on Knowledge and Data Engineering, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. H. Güting, M. H. Böhlen, M. Erwig, C. S. Jensen, N. A. Lorentzos, M. Schneider, and M. Vazirgiannis. A foundation for representing and quering moving objects. ACM Trans. Database Syst., 25(1):1--42, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. Freespan: frequent pattern-projected sequential pattern mining. In KDD '00: Proceedings of the sixth ACM SIGKDD Int'l conference on Knowledge discovery and data mining, pages 355--359, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H. Mannila, H. Toivonen, and I. Verkamo. Discovering frequent episodes in sequences. In Proceedings of the First Int'l Conference on Knowledge Discovery and Data Mining (KDD'95), pages 210--215, 1995.Google ScholarGoogle Scholar
  11. J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. Prefixspan: Mining sequential patterns by prefix-projected growth. In Proceedings of the 17th Int'l Conference on Data Engineering, pages 215--224, Washington, DC, USA, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Pei, J. Han, and W. Wang. Constraint-based sequential pattern mining: the pattern-growth methods. J. Intell. Inf. Syst., 28(2):133--160, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Spaccapietra, C. Parent, M. L. Damiani, J. A. Fernandes de Macedo, F. Porto, and C. Vangenot. A conceptual view of trajectories. In Technical Report, 2007.Google ScholarGoogle Scholar
  14. R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. X. Yan, J. Han, and R. Afshar. Clospan: Mining closed sequential patterns in large databases. In SDM, 2003.Google ScholarGoogle ScholarCross RefCross Ref

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Other conferences
    EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
    March 2009
    1180 pages
    ISBN:9781605584225
    DOI:10.1145/1516360

    Copyright © 2009 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 24 March 2009

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate7of10submissions,70%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader