Published in: Journal of Intelligent Information Systems 3/2015

1 December 2015

Objectively evaluating condensed representations and interestingness measures for frequent itemset mining

Author: Albrecht Zimmermann


Abstract

Itemset mining approaches, while having been studied for more than 15 years, have been evaluated only on a handful of data sets. In particular, they have never been evaluated on data sets for which the ground truth was known. Thus, it is currently unknown whether itemset mining techniques actually recover underlying patterns. Since the weakness of the algorithmically attractive support/confidence framework became apparent early on, a number of interestingness measures have been proposed. Their utility, however, has not been evaluated, except for attempts to establish congruence with expert opinions. Using an extension of the Quest generator proposed in the original itemset mining paper, we propose to evaluate these measures objectively for the first time, showing how many non-relevant patterns slip through the cracks.
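The weakness of the support/confidence framework mentioned in the abstract can be illustrated with a minimal sketch on made-up transactions (the data, the item names, and the helper functions `support`, `confidence`, and `lift` are assumptions for illustration, not taken from the paper). It shows a rule with high confidence whose lift, one commonly used interestingness measure, is below 1, meaning the antecedent actually makes the consequent less likely than its base rate.

```python
from fractions import Fraction

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's base rate (1 means independence)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Hypothetical toy data: coffee is bought in 9 of 10 transactions.
transactions = (
    [frozenset({"tea", "coffee"})] * 3
    + [frozenset({"tea"})]
    + [frozenset({"coffee"})] * 6
)

rule_confidence = confidence({"tea"}, {"coffee"}, transactions)
rule_lift = lift({"tea"}, {"coffee"}, transactions)

print(f"confidence(tea -> coffee) = {float(rule_confidence):.2f}")
print(f"support(coffee)           = {float(support({'coffee'}, transactions)):.2f}")
print(f"lift(tea -> coffee)       = {float(rule_lift):.2f}")
```

Here the rule tea -> coffee would pass a confidence threshold of 0.7, yet tea buyers are less likely to buy coffee than customers overall. Exact fractions are used only to keep the arithmetic free of rounding issues.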


Metadata
Title: Objectively evaluating condensed representations and interestingness measures for frequent itemset mining
Author: Albrecht Zimmermann
Publication date: 1 December 2015
Publisher: Springer US
Published in: Journal of Intelligent Information Systems, Issue 3/2015
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-013-0297-9
