ABSTRACT
Data uncertainty has posed many unique challenges to nearly all types of data mining tasks, creating a need for uncertain data mining. In this paper, we focus on the particular task of mining probabilistic frequent serial episodes (P-FSEs) from uncertain sequence data, which applies to many real applications including sensor readings as well as customer purchase sequences. We first define the notion of P-FSEs, based on the frequentness probabilities of serial episodes under possible world semantics. To discover P-FSEs over an uncertain sequence, we propose: 1) an exact approach that computes the accurate frequentness probabilities of episodes; 2) an approximate approach that approximates the frequency of episodes using probability models; 3) an optimized approach that efficiently prunes a candidate episode by estimating an upper bound of its frequentness probability using approximation techniques.
We conduct extensive experiments to evaluate the performance of the developed data mining algorithms. Our experimental results show that: 1) while existing research demonstrates that approximate approaches are orders of magnitudes faster than exact approaches, for P-FSE mining, the efficiency improvement of the approximate approach over the exact approach is marginal; 2) although it has been recognized that the normal distribution based approximation approach is fairly accurate when the data set is large enough, for P-FSE mining, the binomial distribution based approximation achieves higher accuracy when the the number of episode occurrences is limited; 3) the optimized approach clearly outperforms the other two approaches in terms of the runtime, and achieves very high accuracy.
- A. Achar, S. Laxman, and P. S. Sastry. A unified view of the apriori-based algorithms for frequent episode discovery. Knowl. Inf. Syst., 31(2):223--250, 2012. Google ScholarDigital Library
- C. C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In KDD, pages 29--38, 2009. Google ScholarDigital Library
- C. C. Aggarwal and P. S. Yu. A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng., 21(5):609--623, 2009. Google ScholarDigital Library
- R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE, pages 3--14, 1995. Google ScholarDigital Library
- T. Bernecker, H.-P. Kriegel, M. Renz, F. Verhein, and A. Züfle. Probabilistic frequent itemset mining in uncertain databases. In KDD, pages 119--128, 2009. Google ScholarDigital Library
- J. Bi and T. Zhang. Support vector classification with input data uncertainty. In NIPS, 2004.Google Scholar
- T. Calders, C. Garboni, and B. Goethals. Approximation of frequentness probability of itemsets in uncertain data. In ICDM, pages 749--754, 2010. Google ScholarDigital Library
- C. K. Chui, B. Kao, and E. Hung. Mining frequent itemsets from uncertain data. In PAKDD, pages 47--58, 2007. Google ScholarDigital Library
- K. Iwanuma, Y. Takano, and H. Nabeshima. A unified view of the apriori-based algorithms for frequent episode discovery. In IEEE Conference on Cybernetics and Intelligent Systems, pages 213--217, 2004.Google Scholar
- G. Karypis, M. V. Joshi, and V. Kumar. A Universal formulation of sequential patterns. Technical Report TR99-021, Department of Computer Science, University of Minnesota, Minneapolis, 1999.Google Scholar
- H.-P. Kriegel and M. Pfeifle. Density-based clustering of uncertain data. In KDD, pages 672--677, 2005. Google ScholarDigital Library
- S. Laxman. Discovering frequent episodes: fast algorithms, connections with HMMs and generalizations. PhD thesis, Banalore, India, 2006.Google Scholar
- S. Laxman, P. S. Sastry, and K. P. Unnikrishnan. Discovering frequent episodes and learning hidden markov models: A formal connection. IEEE Trans. Knowl. Data Eng., 17(11):1505--1517, 2005. Google ScholarDigital Library
- S. Laxman, V. Tankasali, and R. W. White. Stream prediction using a generative model based on frequent episodes in event sequences. In KDD, pages 453--461, 2008. Google ScholarDigital Library
- C. K.-S. Leung, M. A. F. Mateo, and D. A. Brajczuk. A tree-based approach for frequent pattern mining from uncertain data. In PAKDD, pages 653--661, 2008. Google ScholarDigital Library
- H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Min. Knowl. Discov., 1(3):259--289, 1997. Google ScholarDigital Library
- M. Muzammal and R. Raman. Mining sequential patterns from probabilistic databases. In PAKDD (2), pages 210--221, 2011. Google ScholarDigital Library
- W. K. Ngai, B. Kao, C. K. Chui, R. Cheng, M. Chau, and K. Y. Yip. Efficient clustering of uncertain data. In ICDM, pages 436--445, 2006. Google ScholarDigital Library
- B. Qin, Y. Xia, S. Prabhakar, and Y.-C. Tu. A rule-based classification algorithm for uncertain data. In ICDE, pages 1633--1640, 2009. Google ScholarDigital Library
- J. Ren, S. D. Lee, X. Chen, B. Kao, R. Cheng, and D. W.-L. Cheung. Naive bayes classification of uncertain data. In ICDM, pages 944--949, 2009. Google ScholarDigital Library
- L. Sun, R. Cheng, D. W. Cheung, and J. Cheng. Mining uncertain data with probabilistic guarantees. In KDD, pages 273--282, 2010. Google ScholarDigital Library
- Y. Tong, L. Chen, Y. Cheng, and P. S. Yu. Mining frequent itemsets over uncertain databases. PVLDB, 5(11):1650--1661, 2012. Google ScholarDigital Library
- K. P. Unnikrishnan, B. Q. Shadid, P. S. Sastry, and S. Laxman. Root cause diagnostics using temporal data mining. Patent Number(s) US 7509234, 2009.Google Scholar
- L. Wang, R. Cheng, S. D. Lee, and D. W.-L. Cheung. Accelerating probabilistic frequent itemset mining: a model-based approach. In CIKM, pages 429--438, 2010. Google ScholarDigital Library
- Z. Zhao, D. Yan, and W. Ng. Mining probabilistically frequent sequential patterns in uncertain databases. In EDBT, pages 74--85, 2012. Google ScholarDigital Library
Index Terms
- Mining frequent serial episodes over uncertain sequence data
Recommendations
Mining uncertain data for constrained frequent sets
IDEAS '09: Proceedings of the 2009 International Database Engineering & Applications SymposiumData mining aims to search for implicit, previously unknown, and potentially useful pieces of information---such as sets of items that are frequently co-occurring together---that are embedded in data. The mined frequent sets can be used in the discovery ...
Efficient algorithms for mining constrained frequent patterns from uncertain data
U '09: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain DataMining of frequent patterns is one of the popular knowledge discovery and data mining (KDD) tasks. It also plays an essential role in the mining of many other patterns such as correlation, sequences, and association rules. Hence, it has been the subject ...
Mining of Frequent Itemsets from Streams of Uncertain Data
ICDE '09: Proceedings of the 2009 IEEE International Conference on Data EngineeringFrequent itemset mining plays an essential role in the mining of various patterns and is in demand in many real-life applications. Hence, mining of frequent itemsets has been the subject of numerous studies since its introduction. Generally, most of ...
Comments