Abstract
Relational rule learning algorithms are typically designed to construct classification and prediction rules. However, relational rule learning can also be adapted to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved by appropriately adapting rule learning and first-order feature construction. The proposed approach was successfully applied to standard ILP problems (East-West trains, King-Rook-King chess endgame, and mutagenicity prediction) and to two real-life problems (analysis of telephone calls and traffic accident analysis).
Editors: Hendrik Blockeel, David Jensen and Stefan Kramer
An erratum to this article is available at http://dx.doi.org/10.1007/s10994-006-8633-8.
Cite this article
Železný, F., Lavrač, N. Propositionalization-based relational subgroup discovery with RSD. Mach Learn 62, 33–63 (2006). https://doi.org/10.1007/s10994-006-5834-0
DOI: https://doi.org/10.1007/s10994-006-5834-0