Skip to main content
Top
Published in: Knowledge and Information Systems 2/2015

01-08-2015 | Regular Paper

Mining sequential patterns from probabilistic databases

Authors: Muhammad Muzammal, Rajeev Raman

Published in: Knowledge and Information Systems | Issue 2/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

This paper considers the problem of sequential pattern mining (SPM) in probabilistic databases. Specifically, we consider SPM in situations where there is uncertainty in associating an event with a source, model this kind of uncertainty in the probabilistic database framework and consider the problem of enumerating all sequences whose expected support is sufficiently large. We give an algorithm based on dynamic programming to compute the expected support of a sequential pattern. Next, we propose three algorithms for mining sequential patterns from probabilistic databases. The first two algorithms are based on the candidate generation framework—one each based on a breadth-first (similar to GSP) and a depth-first (similar to SPAM) exploration of the search space. The third one is based on the pattern-growth framework (similar to PrefixSpan). We propose optimizations that mitigate the effects of the expensive dynamic programming computation step. We give an empirical evaluation of the probabilistic SPM algorithms and the optimizations and demonstrate the scalability of the algorithms in terms of CPU time and the memory usage. We also demonstrate the effectiveness of the probabilistic SPM framework in extracting meaningful sequences in the presence of noise.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
In this section, we use parameter names derived from [1] rather than those introduced in Sect. 2 for consistency with previous work. For example, we use \(N\) rather than \(q\) to denote the number of items.
 
Literature
1.
go back to reference Agrawal R, Srikant R (1995) Mining sequential patterns. In: Yu Philip S, Chen Arbee LP (eds) ICDE. IEEE Computer Society, pp 3–14. ISBN 0-8186-6910-1 Agrawal R, Srikant R (1995) Mining sequential patterns. In: Yu Philip S, Chen Arbee LP (eds) ICDE. IEEE Computer Society, pp 3–14. ISBN 0-8186-6910-1
2.
go back to reference Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Apers Peter MG, Bouzeghoub Mokrane, Gardarin Georges (eds) EDBT, volume 1057 of LNCS. Springer, pp 3–17. ISBN 3-540-61057-X Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Apers Peter MG, Bouzeghoub Mokrane, Gardarin Georges (eds) EDBT, volume 1057 of LNCS. Springer, pp 3–17. ISBN 3-540-61057-X
3.
go back to reference Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Mach Learn 42(1/2):31–60CrossRef Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Mach Learn 42(1/2):31–60CrossRef
4.
go back to reference Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440CrossRef Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440CrossRef
5.
go back to reference Ayres J, Flannick J, Gehrke Js, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: KDD, pp 429–435. ACM. ISBN 1-58113-567-X Ayres J, Flannick J, Gehrke Js, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: KDD, pp 429–435. ACM. ISBN 1-58113-567-X
6.
go back to reference Aggarwal CC (ed) (2009) Managing and mining uncertain data. Springer, Berlin Aggarwal CC (ed) (2009) Managing and mining uncertain data. Springer, Berlin
7.
go back to reference Suciu D, Dalvi Nilesh N (2005) Foundations of probabilistic answers to queries. In: Özcan Fatma (ed) SIGMOD Conference. ACM, pp 963. ISBN 1-59593-060-4 Suciu D, Dalvi Nilesh N (2005) Foundations of probabilistic answers to queries. In: Özcan Fatma (ed) SIGMOD Conference. ACM, pp 963. ISBN 1-59593-060-4
8.
go back to reference Zhang Q, Li F, Yi K. Finding frequent items in probabilistic data. In: Wang [31], pp 819–832. ISBN 978-1-60558-102-6 Zhang Q, Li F, Yi K. Finding frequent items in probabilistic data. In: Wang [31], pp 819–832. ISBN 978-1-60558-102-6
9.
go back to reference Cormode G, Li F, Yi K (2009) Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE. IEEE, pp 305–316. ISBN 978-0-7695-3545-6 Cormode G, Li F, Yi K (2009) Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE. IEEE, pp 305–316. ISBN 978-0-7695-3545-6
10.
go back to reference Aggarwal CC, Li Y, Wang J, Wang J. Frequent pattern mining with uncertain data. In: Elder et al. [32], pp 29–38. ISBN 978-1-60558-495-9 Aggarwal CC, Li Y, Wang J, Wang J. Frequent pattern mining with uncertain data. In: Elder et al. [32], pp 29–38. ISBN 978-1-60558-495-9
11.
go back to reference Bernecker T, Kriegel H-P, Renz M, Verhein F, Züfle A. Probabilistic frequent itemset mining in uncertain databases. In: Elder et al. [32], pp 119–128. ISBN 978-1-60558-495-9 Bernecker T, Kriegel H-P, Renz M, Verhein F, Züfle A. Probabilistic frequent itemset mining in uncertain databases. In: Elder et al. [32], pp 119–128. ISBN 978-1-60558-495-9
12.
go back to reference Chui CK, Kao B (2008) A decremental approach for mining frequent itemsets from uncertain data. In: Washio T, Suzuki E, Ting KM, Inokuchi A (eds) PAKDD, volume 5012 of LNCS. Springer, pp 64–75. ISBN 978-3-540-68124-3 Chui CK, Kao B (2008) A decremental approach for mining frequent itemsets from uncertain data. In: Washio T, Suzuki E, Ting KM, Inokuchi A (eds) PAKDD, volume 5012 of LNCS. Springer, pp 64–75. ISBN 978-3-540-68124-3
13.
go back to reference Chui CK, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data. In: Zhou Z-H, Li H, Yang Q (eds) PAKDD, volume 4426 of LNCS. Springer, pp 47–58. ISBN 978-3-540-71700-3 Chui CK, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data. In: Zhou Z-H, Li H, Yang Q (eds) PAKDD, volume 4426 of LNCS. Springer, pp 47–58. ISBN 978-3-540-71700-3
14.
go back to reference Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: Rao B, Krishnapuram B, Tomkins A, Yang Q (eds) KDD. ACM, pp 273–282. ISBN 978-1-4503-0055-1 Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: Rao B, Krishnapuram B, Tomkins A, Yang Q (eds) KDD. ACM, pp 273–282. ISBN 978-1-4503-0055-1
15.
go back to reference Wang L, Cheng R, Lee SD, Cheung David Wai-Lok (2010) Accelerating probabilistic frequent itemset mining: a model-based approach. In: Huang J, Koudas N, Jones GJF, Wu X, Collins-Thompson K, An A (eds) CIKM. ACM, pp 429–438. ISBN 978-1-4503-0099-5 Wang L, Cheng R, Lee SD, Cheung David Wai-Lok (2010) Accelerating probabilistic frequent itemset mining: a model-based approach. In: Huang J, Koudas N, Jones GJF, Wu X, Collins-Thompson K, An A (eds) CIKM. ACM, pp 429–438. ISBN 978-1-4503-0099-5
16.
go back to reference Calders T, Garboni C, Goethals B (2010) Approximation of frequentness probability of itemsets in uncertain data. In: Webb GI, Liu B, Zhang C, Gunopulos D, Wu X (eds) ICDM. IEEE Computer Society, pp 749–754 Calders T, Garboni C, Goethals B (2010) Approximation of frequentness probability of itemsets in uncertain data. In: Webb GI, Liu B, Zhang C, Gunopulos D, Wu X (eds) ICDM. IEEE Computer Society, pp 749–754
17.
go back to reference Yang J, Wang W, Yu PS, Han J (2002) Mining long sequential patterns in a noisy environment. In: Franklin MJ, Moon B, Ailamaki A (eds) SIGMOD Conference. ACM, pp 406–417. ISBN 1-58113-497-5 Yang J, Wang W, Yu PS, Han J (2002) Mining long sequential patterns in a noisy environment. In: Franklin MJ, Moon B, Ailamaki A (eds) SIGMOD Conference. ACM, pp 406–417. ISBN 1-58113-497-5
18.
go back to reference Sun X, Orlowska ME, Li X (2003) Introducing uncertainty into pattern discovery in temporal event sequences. In: ICDM. IEEE Computer Society, pp 299–306. ISBN 0-7695-1978-4 Sun X, Orlowska ME, Li X (2003) Introducing uncertainty into pattern discovery in temporal event sequences. In: ICDM. IEEE Computer Society, pp 299–306. ISBN 0-7695-1978-4
21.
go back to reference Muzammal M, Raman R (2010) On probabilistic models for uncertain sequential pattern mining. In: Cao L, Feng Y, Zhong J (eds) ADMA (1), volume 6440 of LNCS, pp 60–72. Springer. ISBN 978-3-642-17315-8 Muzammal M, Raman R (2010) On probabilistic models for uncertain sequential pattern mining. In: Cao L, Feng Y, Zhong J (eds) ADMA (1), volume 6440 of LNCS, pp 60–72. Springer. ISBN 978-3-642-17315-8
22.
go back to reference Tong Y, Chen L, Cheng Y, Yu PS (2012) Mining frequent itemsets over uncertain databases. PVLDB 5(11):1650–1661 Tong Y, Chen L, Cheng Y, Yu PS (2012) Mining frequent itemsets over uncertain databases. PVLDB 5(11):1650–1661
23.
go back to reference Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: a probabilistic threshold approach. In: Wang [31], pp 673–686. ISBN 978-1-60558-102-6 Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: a probabilistic threshold approach. In: Wang [31], pp 673–686. ISBN 978-1-60558-102-6
24.
go back to reference Hooshadat M, Bayat S, Naeimi P, Mirian MS, Zaiane OR (2012) Uapriori: an algorithm for finding sequential patterns in probabilistic data. In: Kahraman C, Bozbura FT, Kerre EE (eds) UMKEDM. World Scientific, pp 907–912. ISBN 978-9814417730 Hooshadat M, Bayat S, Naeimi P, Mirian MS, Zaiane OR (2012) Uapriori: an algorithm for finding sequential patterns in probabilistic data. In: Kahraman C, Bozbura FT, Kerre EE (eds) UMKEDM. World Scientific, pp 907–912. ISBN 978-9814417730
25.
go back to reference Zhao Z, Yan D, Ng W (2012) Mining probabilistically frequent sequential patterns in uncertain databases. In: Rundensteiner EA, Markl V, Manolescu I, Amer-Yahia S, Naumann F, Ari I (eds) EDBT. ACM, pp 74–85. ISBN 978-1-4503-0790-1 Zhao Z, Yan D, Ng W (2012) Mining probabilistically frequent sequential patterns in uncertain databases. In: Rundensteiner EA, Markl V, Manolescu I, Amer-Yahia S, Naumann F, Ari I (eds) EDBT. ACM, pp 74–85. ISBN 978-1-4503-0790-1
26.
go back to reference Wan L, Chen L, Zhang C (2013) Mining frequent serial episodes over uncertain sequence data. In: Guerrini G, Paton NW (eds) EDBT, pp 215–226. ACM. ISBN 978-1-4503-1597-5 Wan L, Chen L, Zhang C (2013) Mining frequent serial episodes over uncertain sequence data. In: Guerrini G, Paton NW (eds) EDBT, pp 215–226. ACM. ISBN 978-1-4503-1597-5
27.
go back to reference Achar A, Ibrahim A, Sastry PS (2013) Pattern-growth based frequent serial episode discovery. Data Knowl Eng 87:91–108CrossRef Achar A, Ibrahim A, Sastry PS (2013) Pattern-growth based frequent serial episode discovery. Data Knowl Eng 87:91–108CrossRef
28.
go back to reference Kohavi R, Brodley C, Frasca B, Mason L, Zheng Z (2000) KDD-Cup 2000 organizers’ report: peeling the onion. SIGKDD Explor 2(2):86–98CrossRef Kohavi R, Brodley C, Frasca B, Mason L, Zheng Z (2000) KDD-Cup 2000 organizers’ report: peeling the onion. SIGKDD Explor 2(2):86–98CrossRef
29.
go back to reference Manning CD, Prabhakar R, Hinrich S (2008) Introduction to information retrieval. Cambridge University Press, New York, NY, USA. ISBN 0521865719, 9780521865715 Manning CD, Prabhakar R, Hinrich S (2008) Introduction to information retrieval. Cambridge University Press, New York, NY, USA. ISBN 0521865719, 9780521865715
30.
go back to reference Ugarte W, Boizumault P, Loudni S, Crémilleux B (2012) Soft threshold constraints for pattern mining. In: Ganascia J-G, Lenca P, Petit J-M (eds) Discovery Science, volume 7569 of Lecture notes in computer science. Springer, pp 313–327. ISBN 978-3-642-33491-7 Ugarte W, Boizumault P, Loudni S, Crémilleux B (2012) Soft threshold constraints for pattern mining. In: Ganascia J-G, Lenca P, Petit J-M (eds) Discovery Science, volume 7569 of Lecture notes in computer science. Springer, pp 313–327. ISBN 978-3-642-33491-7
31.
go back to reference Wang JTL (ed) (2008) Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2008, Vancouver, BC, Canada, June 10–12, ACM. ISBN 978-1-60558-102-6 Wang JTL (ed) (2008) Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2008, Vancouver, BC, Canada, June 10–12, ACM. ISBN 978-1-60558-102-6
32.
go back to reference Elder JF, Fogelman-Soulié F, Flach PA, Zaki MJ (eds) (2009) Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris, France, June 28 - July 1, 2009. ACM. ISBN 978-1-60558-495-9 Elder JF, Fogelman-Soulié F, Flach PA, Zaki MJ (eds) (2009) Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris, France, June 28 - July 1, 2009. ACM. ISBN 978-1-60558-495-9
Metadata
Title
Mining sequential patterns from probabilistic databases
Authors
Muhammad Muzammal
Rajeev Raman
Publication date
01-08-2015
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 2/2015
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0766-7

Other articles of this Issue 2/2015

Knowledge and Information Systems 2/2015 Go to the issue

Premium Partner