Skip to main content
Erschienen in: Knowledge and Information Systems 2/2015

01.08.2015 | Regular Paper

Mining sequential patterns from probabilistic databases

verfasst von: Muhammad Muzammal, Rajeev Raman

Erschienen in: Knowledge and Information Systems | Ausgabe 2/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

This paper considers the problem of sequential pattern mining (SPM) in probabilistic databases. Specifically, we consider SPM in situations where there is uncertainty in associating an event with a source, model this kind of uncertainty in the probabilistic database framework and consider the problem of enumerating all sequences whose expected support is sufficiently large. We give an algorithm based on dynamic programming to compute the expected support of a sequential pattern. Next, we propose three algorithms for mining sequential patterns from probabilistic databases. The first two algorithms are based on the candidate generation framework—one each based on a breadth-first (similar to GSP) and a depth-first (similar to SPAM) exploration of the search space. The third one is based on the pattern-growth framework (similar to PrefixSpan). We propose optimizations that mitigate the effects of the expensive dynamic programming computation step. We give an empirical evaluation of the probabilistic SPM algorithms and the optimizations and demonstrate the scalability of the algorithms in terms of CPU time and the memory usage. We also demonstrate the effectiveness of the probabilistic SPM framework in extracting meaningful sequences in the presence of noise.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
In this section, we use parameter names derived from [1] rather than those introduced in Sect. 2 for consistency with previous work. For example, we use \(N\) rather than \(q\) to denote the number of items.
 
Literatur
1.
Zurück zum Zitat Agrawal R, Srikant R (1995) Mining sequential patterns. In: Yu Philip S, Chen Arbee LP (eds) ICDE. IEEE Computer Society, pp 3–14. ISBN 0-8186-6910-1 Agrawal R, Srikant R (1995) Mining sequential patterns. In: Yu Philip S, Chen Arbee LP (eds) ICDE. IEEE Computer Society, pp 3–14. ISBN 0-8186-6910-1
2.
Zurück zum Zitat Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Apers Peter MG, Bouzeghoub Mokrane, Gardarin Georges (eds) EDBT, volume 1057 of LNCS. Springer, pp 3–17. ISBN 3-540-61057-X Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Apers Peter MG, Bouzeghoub Mokrane, Gardarin Georges (eds) EDBT, volume 1057 of LNCS. Springer, pp 3–17. ISBN 3-540-61057-X
3.
Zurück zum Zitat Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Mach Learn 42(1/2):31–60CrossRef Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Mach Learn 42(1/2):31–60CrossRef
4.
Zurück zum Zitat Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440CrossRef Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440CrossRef
5.
Zurück zum Zitat Ayres J, Flannick J, Gehrke Js, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: KDD, pp 429–435. ACM. ISBN 1-58113-567-X Ayres J, Flannick J, Gehrke Js, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: KDD, pp 429–435. ACM. ISBN 1-58113-567-X
6.
Zurück zum Zitat Aggarwal CC (ed) (2009) Managing and mining uncertain data. Springer, Berlin Aggarwal CC (ed) (2009) Managing and mining uncertain data. Springer, Berlin
7.
Zurück zum Zitat Suciu D, Dalvi Nilesh N (2005) Foundations of probabilistic answers to queries. In: Özcan Fatma (ed) SIGMOD Conference. ACM, pp 963. ISBN 1-59593-060-4 Suciu D, Dalvi Nilesh N (2005) Foundations of probabilistic answers to queries. In: Özcan Fatma (ed) SIGMOD Conference. ACM, pp 963. ISBN 1-59593-060-4
8.
Zurück zum Zitat Zhang Q, Li F, Yi K. Finding frequent items in probabilistic data. In: Wang [31], pp 819–832. ISBN 978-1-60558-102-6 Zhang Q, Li F, Yi K. Finding frequent items in probabilistic data. In: Wang [31], pp 819–832. ISBN 978-1-60558-102-6
9.
Zurück zum Zitat Cormode G, Li F, Yi K (2009) Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE. IEEE, pp 305–316. ISBN 978-0-7695-3545-6 Cormode G, Li F, Yi K (2009) Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE. IEEE, pp 305–316. ISBN 978-0-7695-3545-6
10.
Zurück zum Zitat Aggarwal CC, Li Y, Wang J, Wang J. Frequent pattern mining with uncertain data. In: Elder et al. [32], pp 29–38. ISBN 978-1-60558-495-9 Aggarwal CC, Li Y, Wang J, Wang J. Frequent pattern mining with uncertain data. In: Elder et al. [32], pp 29–38. ISBN 978-1-60558-495-9
11.
Zurück zum Zitat Bernecker T, Kriegel H-P, Renz M, Verhein F, Züfle A. Probabilistic frequent itemset mining in uncertain databases. In: Elder et al. [32], pp 119–128. ISBN 978-1-60558-495-9 Bernecker T, Kriegel H-P, Renz M, Verhein F, Züfle A. Probabilistic frequent itemset mining in uncertain databases. In: Elder et al. [32], pp 119–128. ISBN 978-1-60558-495-9
12.
Zurück zum Zitat Chui CK, Kao B (2008) A decremental approach for mining frequent itemsets from uncertain data. In: Washio T, Suzuki E, Ting KM, Inokuchi A (eds) PAKDD, volume 5012 of LNCS. Springer, pp 64–75. ISBN 978-3-540-68124-3 Chui CK, Kao B (2008) A decremental approach for mining frequent itemsets from uncertain data. In: Washio T, Suzuki E, Ting KM, Inokuchi A (eds) PAKDD, volume 5012 of LNCS. Springer, pp 64–75. ISBN 978-3-540-68124-3
13.
Zurück zum Zitat Chui CK, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data. In: Zhou Z-H, Li H, Yang Q (eds) PAKDD, volume 4426 of LNCS. Springer, pp 47–58. ISBN 978-3-540-71700-3 Chui CK, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data. In: Zhou Z-H, Li H, Yang Q (eds) PAKDD, volume 4426 of LNCS. Springer, pp 47–58. ISBN 978-3-540-71700-3
14.
Zurück zum Zitat Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: Rao B, Krishnapuram B, Tomkins A, Yang Q (eds) KDD. ACM, pp 273–282. ISBN 978-1-4503-0055-1 Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: Rao B, Krishnapuram B, Tomkins A, Yang Q (eds) KDD. ACM, pp 273–282. ISBN 978-1-4503-0055-1
15.
Zurück zum Zitat Wang L, Cheng R, Lee SD, Cheung David Wai-Lok (2010) Accelerating probabilistic frequent itemset mining: a model-based approach. In: Huang J, Koudas N, Jones GJF, Wu X, Collins-Thompson K, An A (eds) CIKM. ACM, pp 429–438. ISBN 978-1-4503-0099-5 Wang L, Cheng R, Lee SD, Cheung David Wai-Lok (2010) Accelerating probabilistic frequent itemset mining: a model-based approach. In: Huang J, Koudas N, Jones GJF, Wu X, Collins-Thompson K, An A (eds) CIKM. ACM, pp 429–438. ISBN 978-1-4503-0099-5
16.
Zurück zum Zitat Calders T, Garboni C, Goethals B (2010) Approximation of frequentness probability of itemsets in uncertain data. In: Webb GI, Liu B, Zhang C, Gunopulos D, Wu X (eds) ICDM. IEEE Computer Society, pp 749–754 Calders T, Garboni C, Goethals B (2010) Approximation of frequentness probability of itemsets in uncertain data. In: Webb GI, Liu B, Zhang C, Gunopulos D, Wu X (eds) ICDM. IEEE Computer Society, pp 749–754
17.
Zurück zum Zitat Yang J, Wang W, Yu PS, Han J (2002) Mining long sequential patterns in a noisy environment. In: Franklin MJ, Moon B, Ailamaki A (eds) SIGMOD Conference. ACM, pp 406–417. ISBN 1-58113-497-5 Yang J, Wang W, Yu PS, Han J (2002) Mining long sequential patterns in a noisy environment. In: Franklin MJ, Moon B, Ailamaki A (eds) SIGMOD Conference. ACM, pp 406–417. ISBN 1-58113-497-5
18.
Zurück zum Zitat Sun X, Orlowska ME, Li X (2003) Introducing uncertainty into pattern discovery in temporal event sequences. In: ICDM. IEEE Computer Society, pp 299–306. ISBN 0-7695-1978-4 Sun X, Orlowska ME, Li X (2003) Introducing uncertainty into pattern discovery in temporal event sequences. In: ICDM. IEEE Computer Society, pp 299–306. ISBN 0-7695-1978-4
21.
Zurück zum Zitat Muzammal M, Raman R (2010) On probabilistic models for uncertain sequential pattern mining. In: Cao L, Feng Y, Zhong J (eds) ADMA (1), volume 6440 of LNCS, pp 60–72. Springer. ISBN 978-3-642-17315-8 Muzammal M, Raman R (2010) On probabilistic models for uncertain sequential pattern mining. In: Cao L, Feng Y, Zhong J (eds) ADMA (1), volume 6440 of LNCS, pp 60–72. Springer. ISBN 978-3-642-17315-8
22.
Zurück zum Zitat Tong Y, Chen L, Cheng Y, Yu PS (2012) Mining frequent itemsets over uncertain databases. PVLDB 5(11):1650–1661 Tong Y, Chen L, Cheng Y, Yu PS (2012) Mining frequent itemsets over uncertain databases. PVLDB 5(11):1650–1661
23.
Zurück zum Zitat Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: a probabilistic threshold approach. In: Wang [31], pp 673–686. ISBN 978-1-60558-102-6 Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: a probabilistic threshold approach. In: Wang [31], pp 673–686. ISBN 978-1-60558-102-6
24.
Zurück zum Zitat Hooshadat M, Bayat S, Naeimi P, Mirian MS, Zaiane OR (2012) Uapriori: an algorithm for finding sequential patterns in probabilistic data. In: Kahraman C, Bozbura FT, Kerre EE (eds) UMKEDM. World Scientific, pp 907–912. ISBN 978-9814417730 Hooshadat M, Bayat S, Naeimi P, Mirian MS, Zaiane OR (2012) Uapriori: an algorithm for finding sequential patterns in probabilistic data. In: Kahraman C, Bozbura FT, Kerre EE (eds) UMKEDM. World Scientific, pp 907–912. ISBN 978-9814417730
25.
Zurück zum Zitat Zhao Z, Yan D, Ng W (2012) Mining probabilistically frequent sequential patterns in uncertain databases. In: Rundensteiner EA, Markl V, Manolescu I, Amer-Yahia S, Naumann F, Ari I (eds) EDBT. ACM, pp 74–85. ISBN 978-1-4503-0790-1 Zhao Z, Yan D, Ng W (2012) Mining probabilistically frequent sequential patterns in uncertain databases. In: Rundensteiner EA, Markl V, Manolescu I, Amer-Yahia S, Naumann F, Ari I (eds) EDBT. ACM, pp 74–85. ISBN 978-1-4503-0790-1
26.
Zurück zum Zitat Wan L, Chen L, Zhang C (2013) Mining frequent serial episodes over uncertain sequence data. In: Guerrini G, Paton NW (eds) EDBT, pp 215–226. ACM. ISBN 978-1-4503-1597-5 Wan L, Chen L, Zhang C (2013) Mining frequent serial episodes over uncertain sequence data. In: Guerrini G, Paton NW (eds) EDBT, pp 215–226. ACM. ISBN 978-1-4503-1597-5
27.
Zurück zum Zitat Achar A, Ibrahim A, Sastry PS (2013) Pattern-growth based frequent serial episode discovery. Data Knowl Eng 87:91–108CrossRef Achar A, Ibrahim A, Sastry PS (2013) Pattern-growth based frequent serial episode discovery. Data Knowl Eng 87:91–108CrossRef
28.
Zurück zum Zitat Kohavi R, Brodley C, Frasca B, Mason L, Zheng Z (2000) KDD-Cup 2000 organizers’ report: peeling the onion. SIGKDD Explor 2(2):86–98CrossRef Kohavi R, Brodley C, Frasca B, Mason L, Zheng Z (2000) KDD-Cup 2000 organizers’ report: peeling the onion. SIGKDD Explor 2(2):86–98CrossRef
29.
Zurück zum Zitat Manning CD, Prabhakar R, Hinrich S (2008) Introduction to information retrieval. Cambridge University Press, New York, NY, USA. ISBN 0521865719, 9780521865715 Manning CD, Prabhakar R, Hinrich S (2008) Introduction to information retrieval. Cambridge University Press, New York, NY, USA. ISBN 0521865719, 9780521865715
30.
Zurück zum Zitat Ugarte W, Boizumault P, Loudni S, Crémilleux B (2012) Soft threshold constraints for pattern mining. In: Ganascia J-G, Lenca P, Petit J-M (eds) Discovery Science, volume 7569 of Lecture notes in computer science. Springer, pp 313–327. ISBN 978-3-642-33491-7 Ugarte W, Boizumault P, Loudni S, Crémilleux B (2012) Soft threshold constraints for pattern mining. In: Ganascia J-G, Lenca P, Petit J-M (eds) Discovery Science, volume 7569 of Lecture notes in computer science. Springer, pp 313–327. ISBN 978-3-642-33491-7
31.
Zurück zum Zitat Wang JTL (ed) (2008) Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2008, Vancouver, BC, Canada, June 10–12, ACM. ISBN 978-1-60558-102-6 Wang JTL (ed) (2008) Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2008, Vancouver, BC, Canada, June 10–12, ACM. ISBN 978-1-60558-102-6
32.
Zurück zum Zitat Elder JF, Fogelman-Soulié F, Flach PA, Zaki MJ (eds) (2009) Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris, France, June 28 - July 1, 2009. ACM. ISBN 978-1-60558-495-9 Elder JF, Fogelman-Soulié F, Flach PA, Zaki MJ (eds) (2009) Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris, France, June 28 - July 1, 2009. ACM. ISBN 978-1-60558-495-9
Metadaten
Titel
Mining sequential patterns from probabilistic databases
verfasst von
Muhammad Muzammal
Rajeev Raman
Publikationsdatum
01.08.2015
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 2/2015
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0766-7

Weitere Artikel der Ausgabe 2/2015

Knowledge and Information Systems 2/2015 Zur Ausgabe