skip to main content
10.1145/2452376.2452403acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article

Mining frequent serial episodes over uncertain sequence data

Published:18 March 2013Publication History

ABSTRACT

Data uncertainty has posed many unique challenges to nearly all types of data mining tasks, creating a need for uncertain data mining. In this paper, we focus on the particular task of mining probabilistic frequent serial episodes (P-FSEs) from uncertain sequence data, which applies to many real applications including sensor readings as well as customer purchase sequences. We first define the notion of P-FSEs, based on the frequentness probabilities of serial episodes under possible world semantics. To discover P-FSEs over an uncertain sequence, we propose: 1) an exact approach that computes the accurate frequentness probabilities of episodes; 2) an approximate approach that approximates the frequency of episodes using probability models; 3) an optimized approach that efficiently prunes a candidate episode by estimating an upper bound of its frequentness probability using approximation techniques.

We conduct extensive experiments to evaluate the performance of the developed data mining algorithms. Our experimental results show that: 1) while existing research demonstrates that approximate approaches are orders of magnitudes faster than exact approaches, for P-FSE mining, the efficiency improvement of the approximate approach over the exact approach is marginal; 2) although it has been recognized that the normal distribution based approximation approach is fairly accurate when the data set is large enough, for P-FSE mining, the binomial distribution based approximation achieves higher accuracy when the the number of episode occurrences is limited; 3) the optimized approach clearly outperforms the other two approaches in terms of the runtime, and achieves very high accuracy.

References

  1. A. Achar, S. Laxman, and P. S. Sastry. A unified view of the apriori-based algorithms for frequent episode discovery. Knowl. Inf. Syst., 31(2):223--250, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. C. C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In KDD, pages 29--38, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. C. Aggarwal and P. S. Yu. A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng., 21(5):609--623, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE, pages 3--14, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. Bernecker, H.-P. Kriegel, M. Renz, F. Verhein, and A. Züfle. Probabilistic frequent itemset mining in uncertain databases. In KDD, pages 119--128, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Bi and T. Zhang. Support vector classification with input data uncertainty. In NIPS, 2004.Google ScholarGoogle Scholar
  7. T. Calders, C. Garboni, and B. Goethals. Approximation of frequentness probability of itemsets in uncertain data. In ICDM, pages 749--754, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. K. Chui, B. Kao, and E. Hung. Mining frequent itemsets from uncertain data. In PAKDD, pages 47--58, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. K. Iwanuma, Y. Takano, and H. Nabeshima. A unified view of the apriori-based algorithms for frequent episode discovery. In IEEE Conference on Cybernetics and Intelligent Systems, pages 213--217, 2004.Google ScholarGoogle Scholar
  10. G. Karypis, M. V. Joshi, and V. Kumar. A Universal formulation of sequential patterns. Technical Report TR99-021, Department of Computer Science, University of Minnesota, Minneapolis, 1999.Google ScholarGoogle Scholar
  11. H.-P. Kriegel and M. Pfeifle. Density-based clustering of uncertain data. In KDD, pages 672--677, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Laxman. Discovering frequent episodes: fast algorithms, connections with HMMs and generalizations. PhD thesis, Banalore, India, 2006.Google ScholarGoogle Scholar
  13. S. Laxman, P. S. Sastry, and K. P. Unnikrishnan. Discovering frequent episodes and learning hidden markov models: A formal connection. IEEE Trans. Knowl. Data Eng., 17(11):1505--1517, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Laxman, V. Tankasali, and R. W. White. Stream prediction using a generative model based on frequent episodes in event sequences. In KDD, pages 453--461, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. C. K.-S. Leung, M. A. F. Mateo, and D. A. Brajczuk. A tree-based approach for frequent pattern mining from uncertain data. In PAKDD, pages 653--661, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Min. Knowl. Discov., 1(3):259--289, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Muzammal and R. Raman. Mining sequential patterns from probabilistic databases. In PAKDD (2), pages 210--221, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. W. K. Ngai, B. Kao, C. K. Chui, R. Cheng, M. Chau, and K. Y. Yip. Efficient clustering of uncertain data. In ICDM, pages 436--445, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. B. Qin, Y. Xia, S. Prabhakar, and Y.-C. Tu. A rule-based classification algorithm for uncertain data. In ICDE, pages 1633--1640, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Ren, S. D. Lee, X. Chen, B. Kao, R. Cheng, and D. W.-L. Cheung. Naive bayes classification of uncertain data. In ICDM, pages 944--949, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Sun, R. Cheng, D. W. Cheung, and J. Cheng. Mining uncertain data with probabilistic guarantees. In KDD, pages 273--282, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Tong, L. Chen, Y. Cheng, and P. S. Yu. Mining frequent itemsets over uncertain databases. PVLDB, 5(11):1650--1661, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. K. P. Unnikrishnan, B. Q. Shadid, P. S. Sastry, and S. Laxman. Root cause diagnostics using temporal data mining. Patent Number(s) US 7509234, 2009.Google ScholarGoogle Scholar
  24. L. Wang, R. Cheng, S. D. Lee, and D. W.-L. Cheung. Accelerating probabilistic frequent itemset mining: a model-based approach. In CIKM, pages 429--438, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Z. Zhao, D. Yan, and W. Ng. Mining probabilistically frequent sequential patterns in uncertain databases. In EDBT, pages 74--85, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Mining frequent serial episodes over uncertain sequence data

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Other conferences
        EDBT '13: Proceedings of the 16th International Conference on Extending Database Technology
        March 2013
        793 pages
        ISBN:9781450315975
        DOI:10.1145/2452376

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 18 March 2013

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate7of10submissions,70%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader