Published in: Knowledge and Information Systems 3/2016

1 June 2016 | Regular Paper

Mining exceptional relationships with grammar-guided genetic programming

Authors: Jose Maria Luna, Mykola Pechenizkiy, Sebastian Ventura


Abstract

Given a database of records, it may be possible to identify small subsets of data whose distribution is exceptionally different from the distribution in the complete set of data records. Finding such interesting relationships, which we call exceptional relationships, in an automated way would allow unusual or exceptional hidden behaviour to be discovered. In this paper, we formulate the problem of mining exceptional relationships as a special case of exceptional model mining and propose a grammar-guided genetic programming algorithm (MERG3P) that enables the discovery of any exceptional relationships. In particular, MERG3P can work directly not only with categorical but also with numerical data. In the experimental evaluation, we conduct a case study on mining exceptional relations between well-known and widely used quality measures of association rules, whose exceptional behaviour would be of interest to pattern mining experts. For this purpose, we constructed a data set comprising a wide range of values for each considered association rule quality measure, so that possible exceptional relations between measures could be discovered. Thus, besides validating MERG3P, we found that the Support and Leverage measures are in fact negatively correlated under certain conditions, whereas experts in the field generally expect these measures to be positively correlated.
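
The idea of a subset whose behaviour differs from that of the complete data set can be illustrated with a minimal Python sketch. It is not MERG3P and not the authors' quality measure: it assumes that "exceptionalness" is scored simply as the absolute difference between the Pearson correlation of two numeric attributes inside a subset and in the complete set. The function names (`pearson`, `correlation_gap`), the attribute names (`group`, `a`, `b`) and the toy records are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / sqrt(var_x * var_y)


def correlation_gap(records, condition, x, y):
    """Compare the x-y correlation in a subset with that of the complete set.

    `condition` is a predicate that selects the subset (assumed scoring, not
    the paper's measure). Returns (rho_subset, rho_all, absolute difference).
    """
    subset = [r for r in records if condition(r)]
    if len(subset) < 2:
        raise ValueError("the subset needs at least two records")
    rho_all = pearson([r[x] for r in records], [r[y] for r in records])
    rho_sub = pearson([r[x] for r in subset], [r[y] for r in subset])
    return rho_sub, rho_all, abs(rho_sub - rho_all)


# Hypothetical toy data: attributes a and b rise together overall,
# but move in opposite directions inside the small group "g1".
records = [
    {"group": "g1", "a": 1, "b": 3},
    {"group": "g1", "a": 2, "b": 2},
    {"group": "g1", "a": 3, "b": 1},
    {"group": "g0", "a": 10, "b": 10},
    {"group": "g0", "a": 11, "b": 11},
    {"group": "g0", "a": 12, "b": 12},
    {"group": "g0", "a": 13, "b": 13},
]

rho_sub, rho_all, gap = correlation_gap(
    records, lambda r: r["group"] == "g1", "a", "b"
)
print(f"subset: {rho_sub:.3f}  complete set: {rho_all:.3f}  gap: {gap:.3f}")
```

In this toy example the two attributes are positively correlated over all records and negatively correlated within the selected group, which is the kind of sign reversal the abstract reports for Support and Leverage. A search procedure such as the one proposed in the paper would have to explore many candidate conditions; the sketch only scores a single, hand-picked one.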

Footnotes
2
A sensitivity analysis was carried out. The results and statistical analysis are available at http://www.uco.es/grupos/kdis/kdiswiki/index.php/Exceptional_ARM.
 
3
JCLEC is available for download (http://jclec.sourceforge.net).
 
4
All the data sets are publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/).
 
Metadata
Title
Mining exceptional relationships with grammar-guided genetic programming
Authors
Jose Maria Luna
Mykola Pechenizkiy
Sebastian Ventura
Publication date
1 June 2016
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2016
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-015-0859-y
