Skip to main content

2017 | OriginalPaper | Buchkapitel

Exact and Approximate Minimal Pattern Mining

verfasst von : Arnaud Soulet, François Rioult

Erschienen in: Advances in Knowledge Discovery and Management

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Condensed representations have been studied extensively for 15 years. In particular, the maximal patterns of the equivalence classes have received much attention with very general proposals. In contrast, the minimal patterns remained in the shadows in particular because they are too numerous and they are difficult to extract. In this paper, we present a generic framework for exact and approximate minimal patterns mining by introducing the concept of minimizable set system. This framework based on set systems addresses various languages such as itemsets or strings, and at the same time, different metrics such as frequency. For instance, the free, \(\delta \)-free and the essential patterns are naturally handled by our approach, just as the minimal strings. Then, for any minimizable set system, we introduce a fast minimality checking method that is easy to incorporate in a depth-first search algorithm for mining the \(\delta \)-minimal patterns. We demonstrate that it is polynomial-delay and polynomial-space. Experiments on traditional benchmarks complete our study by showing that our approach is competitive with the best proposals.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
We use the notation Xe instead of \(X \cup \{e\}\).
 
2
As this prototype mines non-derivable itemsets, it enable us to compute free patterns when the depth parameter is set to 1.
 
4
This dataset is provided with \({{\textsc {maxMotif}}}\): http://​research.​nii.​ac.​jp/​~uno/​codes.​htm.
 
Literatur
Zurück zum Zitat Arimura, H., & Uno, T. (2009). Polynomial-delay and polynomial-space algorithms for mining closed sequences, graphs, and pictures in accessible set systems. In SDM (pp. 1087–1098). SIAM. Arimura, H., & Uno, T. (2009). Polynomial-delay and polynomial-space algorithms for mining closed sequences, graphs, and pictures in accessible set systems. In SDM (pp. 1087–1098). SIAM.
Zurück zum Zitat Boulicaut, J.-F., Bykowski, A., & Rigotti, C. (2000). Approximation of frequency queries by means of free-sets. In D. A. Zighed, J. Komorowski & J. Żytkow (Eds.), PKDD. LNCS (Vol. 1910, pp. 75–85). Heidelberg: Springer. Boulicaut, J.-F., Bykowski, A., & Rigotti, C. (2000). Approximation of frequency queries by means of free-sets. In D. A. Zighed, J. Komorowski & J. Żytkow (Eds.), PKDD. LNCS (Vol. 1910, pp. 75–85). Heidelberg: Springer.
Zurück zum Zitat Boulicaut, J.-F., Bykowski, A., & Rigotti, C. (2003). Free-sets: A condensed representation of boolean data for the approximation of frequency queries. Data Mining and Knowledge Discovery, 7(1), 5–22.MathSciNetCrossRef Boulicaut, J.-F., Bykowski, A., & Rigotti, C. (2003). Free-sets: A condensed representation of boolean data for the approximation of frequency queries. Data Mining and Knowledge Discovery, 7(1), 5–22.MathSciNetCrossRef
Zurück zum Zitat Calders, T., & Goethals, B. (2003). Minimal k-free representations of frequent sets. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003) (pp. 71–82). Heidelberg: Springer. Calders, T., & Goethals, B. (2003). Minimal k-free representations of frequent sets. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003) (pp. 71–82). Heidelberg: Springer.
Zurück zum Zitat Calders, T., & Goethals, B. (2005). Depth-first non-derivable itemset mining. In SDM (pp. 250–261). Calders, T., & Goethals, B. (2005). Depth-first non-derivable itemset mining. In SDM (pp. 250–261).
Zurück zum Zitat Calders, T., Rigotti, C., & Boulicaut, J. F. (2004). A survey on condensed representations for frequent sets. In J.-F. Boulicaut, L. De Raedt, & H. Mannila (Eds.), Constraint-based mining and inductive databases. Lecture notes in computer science (Vol. 3848, pp. 64–80). Heidelberg: Springer. Calders, T., Rigotti, C., & Boulicaut, J. F. (2004). A survey on condensed representations for frequent sets. In J.-F. Boulicaut, L. De Raedt, & H. Mannila (Eds.), Constraint-based mining and inductive databases. Lecture notes in computer science (Vol. 3848, pp. 64–80). Heidelberg: Springer.
Zurück zum Zitat Casali, A., Cicchetti, R., & Lakhal, L. (2005). Essential patterns: A perfect cover of frequent patterns. In A. M. Tjoa & J. Trujillo (Eds.), DaWaK. Lecture notes in computer science (Vol. 3589, pp. 428–437). Heidelberg: Springer. Casali, A., Cicchetti, R., & Lakhal, L. (2005). Essential patterns: A perfect cover of frequent patterns. In A. M. Tjoa & J. Trujillo (Eds.), DaWaK. Lecture notes in computer science (Vol. 3589, pp. 428–437). Heidelberg: Springer.
Zurück zum Zitat Crémilleux, B., & Boulicaut, J.-F. (2003). Simplest rules characterizing classes generated by \(\delta \)-free sets. In M. Bramer, A. Preece, & F. Coenen (Eds.), Research and development in intelligent systems XIX (pp. 33–46). London: Springer.CrossRef Crémilleux, B., & Boulicaut, J.-F. (2003). Simplest rules characterizing classes generated by \(\delta \)-free sets. In M. Bramer, A. Preece, & F. Coenen (Eds.), Research and development in intelligent systems XIX (pp. 33–46). London: Springer.CrossRef
Zurück zum Zitat Eiter, T., & Gottlob, G. (2002). Hypergraph transversal computation and related problems in logic and AI. In S. Flesca, S. Greco, G. Ianni, & N. Leone (Eds.), JELIA. Lecture notes in computer science (Vol. 2424, pp. 549–564). Heidelberg: Springer. Eiter, T., & Gottlob, G. (2002). Hypergraph transversal computation and related problems in logic and AI. In S. Flesca, S. Greco, G. Ianni, & N. Leone (Eds.), JELIA. Lecture notes in computer science (Vol. 2424, pp. 549–564). Heidelberg: Springer.
Zurück zum Zitat Gao, C., Wang, J., He, Y., & Zhou, L. (2008). Efficient mining of frequent sequence generators. In WWW (pp. 1051–1052). ACM. Gao, C., Wang, J., He, Y., & Zhou, L. (2008). Efficient mining of frequent sequence generators. In WWW (pp. 1051–1052). ACM.
Zurück zum Zitat Gasmi, G., Yahia, S. B., Nguifo, E. M., & Bouker, S. (2007). Extraction of association rules based on literalsets. In Y. Song, J. Eder, & T. M. Nguyen (Eds.), DaWaK. Lecture notes in computer science (Vol. 4654, pp. 293–302). Heidelberg: Springer. Gasmi, G., Yahia, S. B., Nguifo, E. M., & Bouker, S. (2007). Extraction of association rules based on literalsets. In Y. Song, J. Eder, & T. M. Nguyen (Eds.), DaWaK. Lecture notes in computer science (Vol. 4654, pp. 293–302). Heidelberg: Springer.
Zurück zum Zitat Giacometti, A., Li, D. H., Marcel, P., & Soulet, A. (2013). 20 years of pattern mining: a bibliometric survey. SIGKDD Explorations, 15(1), 41–50.CrossRef Giacometti, A., Li, D. H., Marcel, P., & Soulet, A. (2013). 20 years of pattern mining: a bibliometric survey. SIGKDD Explorations, 15(1), 41–50.CrossRef
Zurück zum Zitat Hamrouni, T. (2012). Key roles of closed sets and minimal generators in concise representations of frequent patterns. Intelligent Data Analysis, 16(4), 581–631. Hamrouni, T. (2012). Key roles of closed sets and minimal generators in concise representations of frequent patterns. Intelligent Data Analysis, 16(4), 581–631.
Zurück zum Zitat Hébert, C., & Crémilleux, B. (2005). Mining frequent delta-free patterns in large databases. In A. Hoffmann, H. Motoda, & T. Scheffer (Eds.), Discovery science. Lecture notes in computer science (Vol. 3735, pp. 124–136). Heidelberg: Springer. Hébert, C., & Crémilleux, B. (2005). Mining frequent delta-free patterns in large databases. In A. Hoffmann, H. Motoda, & T. Scheffer (Eds.), Discovery science. Lecture notes in computer science (Vol. 3735, pp. 124–136). Heidelberg: Springer.
Zurück zum Zitat Jelassi, M. N., Largeron, C., & Yahia, S. B. (2014). Efficient unveiling of multi-members in a social network. Journal of Systems and Software, 94, 30–38.CrossRef Jelassi, M. N., Largeron, C., & Yahia, S. B. (2014). Efficient unveiling of multi-members in a social network. Journal of Systems and Software, 94, 30–38.CrossRef
Zurück zum Zitat Kryszkiewicz, M. (2005). Generalized disjunction-free representation of frequent patterns with negation. Journal of Experimental and Theoretical Artificial Intelligence, 17(1–2), 63–82.CrossRefMATH Kryszkiewicz, M. (2005). Generalized disjunction-free representation of frequent patterns with negation. Journal of Experimental and Theoretical Artificial Intelligence, 17(1–2), 63–82.CrossRefMATH
Zurück zum Zitat Li, J., Li, H., Wong, L., Pei, J. & Dong, G. (2006). Minimum description length principle: Generators are preferable to closed patterns. In AAAI (pp. 409–414). Li, J., Li, H., Wong, L., Pei, J. & Dong, G. (2006). Minimum description length principle: Generators are preferable to closed patterns. In AAAI (pp. 409–414).
Zurück zum Zitat Liu, B., Hsu, W. & Ma, Y. (1998). Integrating classification and association rule mining. In KDD (pp. 80–86). Liu, B., Hsu, W. & Ma, Y. (1998). Integrating classification and association rule mining. In KDD (pp. 80–86).
Zurück zum Zitat Liu, G., Li, J., & Wong, L. (2008). A new concise representation of frequent itemsets using generators and a positive border. Knowledge and Information Systems, 17(1), 35–56.MathSciNetCrossRef Liu, G., Li, J., & Wong, L. (2008). A new concise representation of frequent itemsets using generators and a positive border. Knowledge and Information Systems, 17(1), 35–56.MathSciNetCrossRef
Zurück zum Zitat Lo, D., Khoo, S. -C., & Li, J. (2008). Mining and ranking generators of sequential patterns. In SDM (pp. 553–564). SIAM. Lo, D., Khoo, S. -C., & Li, J. (2008). Mining and ranking generators of sequential patterns. In SDM (pp. 553–564). SIAM.
Zurück zum Zitat Lo, D., Khoo, S.-C., & Wong, L. (2009). Non-redundant sequential rules-theory and algorithm. Information Systems, 34(4–5), 438–453.CrossRef Lo, D., Khoo, S.-C., & Wong, L. (2009). Non-redundant sequential rules-theory and algorithm. Information Systems, 34(4–5), 438–453.CrossRef
Zurück zum Zitat Mannila, H. & Toivonen, H. (1996). Multiple uses of frequent sets and condensed representations (extended abstract). In E. Simoudis, J. Han & U. M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, USA (pp. 189–194). AAAI Press. Mannila, H. & Toivonen, H. (1996). Multiple uses of frequent sets and condensed representations (extended abstract). In E. Simoudis, J. Han & U. M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, USA (pp. 189–194). AAAI Press.
Zurück zum Zitat Murakami, K. & Uno, T. (2013). Efficient algorithms for dualizing large-scale hypergraphs. In ALENEX (pp. 1–13). Murakami, K. & Uno, T. (2013). Efficient algorithms for dualizing large-scale hypergraphs. In ALENEX (pp. 1–13).
Zurück zum Zitat Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1), 25–46.CrossRefMATH Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1), 25–46.CrossRefMATH
Zurück zum Zitat Rioult, F., Zanuttini, B., & Crémilleux, B. (2010). Nonredundant generalized rules and their impact in classification. In Z. W. Ras & L.-S. Tsay (Eds.), Advances in intelligent information systems. Studies in computational intelligence (Vol. 265, pp. 3–25). Heidelberg: Springer. Rioult, F., Zanuttini, B., & Crémilleux, B. (2010). Nonredundant generalized rules and their impact in classification. In Z. W. Ras & L.-S. Tsay (Eds.), Advances in intelligent information systems. Studies in computational intelligence (Vol. 265, pp. 3–25). Heidelberg: Springer.
Zurück zum Zitat Soulet, A., & Crémilleux, B. (2008). Adequate condensed representations of patterns. Data Mining and Knowledge Discovery, 17(1), 94–110.MathSciNetCrossRef Soulet, A., & Crémilleux, B. (2008). Adequate condensed representations of patterns. Data Mining and Knowledge Discovery, 17(1), 94–110.MathSciNetCrossRef
Zurück zum Zitat Soulet, A., Crémilleux, B., & Rioult, F. (2004). Condensed representation of EPs and patterns quantified by frequency-based measures. In Post-proceedings of knowledge discovery in inductive databases, pise. Heidelberg: Springer. Soulet, A., Crémilleux, B., & Rioult, F. (2004). Condensed representation of EPs and patterns quantified by frequency-based measures. In Post-proceedings of knowledge discovery in inductive databases, pise. Heidelberg: Springer.
Zurück zum Zitat Soulet, A., & Rioult, F. (2014). Efficiently depth-first minimal pattern mining. In V. S. Tseng., T. B. Ho., Z. Zhou., A. L. P. Chen., & H. Kao (Eds.), Proceedings 18th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2014, Part I, Tainan, Taiwan, May 13–16, 2014. Lecture notes in computer science (Vol. 8443, pp. 28–39). Heidelberg: Springer. Soulet, A., & Rioult, F. (2014). Efficiently depth-first minimal pattern mining. In V. S. Tseng., T. B. Ho., Z. Zhou., A. L. P. Chen., & H. Kao (Eds.), Proceedings 18th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2014, Part I, Tainan, Taiwan, May 13–16, 2014. Lecture notes in computer science (Vol. 8443, pp. 28–39). Heidelberg: Springer.
Zurück zum Zitat Szathmary, L., Valtchev, P., Napoli, A., & Godin, R. (2009). Efficient vertical mining of frequent closures and generators. In IDA. LNCS (Vol. 5772, pp. 393–404). Heidelberg: Springer. Szathmary, L., Valtchev, P., Napoli, A., & Godin, R. (2009). Efficient vertical mining of frequent closures and generators. In IDA. LNCS (Vol. 5772, pp. 393–404). Heidelberg: Springer.
Zurück zum Zitat Zaki, M.J. (2000). Generating non-redundant association rules. In KDD (pp. 34–43). Zaki, M.J. (2000). Generating non-redundant association rules. In KDD (pp. 34–43).
Zurück zum Zitat Zeng, Z., Wang, J., Zhang, J., & Zhou, L. (2009). FOGGER: an algorithm for graph generator discovery. In EDBT (pp. 517–528). Zeng, Z., Wang, J., Zhang, J., & Zhou, L. (2009). FOGGER: an algorithm for graph generator discovery. In EDBT (pp. 517–528).
Metadaten
Titel
Exact and Approximate Minimal Pattern Mining
verfasst von
Arnaud Soulet
François Rioult
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-45763-5_4