Cluster-grouping: from subgroup discovery to clustering

Zimmermann, Albrecht; De Raedt, Luc

doi:10.1007/s10994-009-5121-y

Published: 16 June 2009

Volume 77, pages 125–159 (2009)

Machine Learning

Albrecht Zimmermann¹ & Luc De Raedt¹

Abstract

We introduce the problem of cluster-grouping and show that it can be considered a subtask in several important data mining tasks, such as subgroup discovery, mining correlated patterns, clustering and classification. The algorithm CG for solving cluster-grouping problems is then introduced, and it is incorporated as a component in several existing and novel algorithms for tackling subgroup discovery, clustering and classification. The resulting systems are empirically compared to state-of-the-art systems such as CN2, CBA, Ripper, Autoclass and CobWeb. The results indicate that the CG algorithm can be useful as a generic local pattern mining component in a wide variety of data mining and machine learning algorithms.

References

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings of the 20th international conference on very large databases (pp. 487–499), Santiago de Chile, Chile, September 1994. San Mateo: Morgan Kaufmann.
Atzmüller, M., & Puppe, F. (2006). SD-Map—a fast algorithm for exhaustive subgroup discovery. In J. Fürnkranz, T. Scheffer, & M. Spiliopoulou (Eds.), Proceedings of the tenth European conference on principles and practice of knowledge discovery in databases (pp. 6–17). Berlin: Springer.
Bay, S. D., & Pazzani, M. J. (2001). Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246.
Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases.
Blockeel, H., De Raedt, L., & Ramon, J. (1998). Top-down induction of clustering trees. In J. W. Shavlik (Ed.), Proceedings of the fifteenth international conference on machine learning (pp. 55–63). San Mateo: Morgan Kaufmann.
Brin, S., Motwani, R., & Silverstein, C. (1997). Beyond market baskets: Generalizing association rules to correlations. In J. Peckham (Ed.), SIGMOD 1997, Proceedings ACM SIGMOD international conference on management of data (pp. 265–276). New York: ACM.
Bringmann, B., & Zimmermann, A. (2005). Tree²—Decision trees for tree structured data. In A. Jorge, L. Torgo, P. Brazdil, R. Camacho, & J. Gama (Eds.), 9th European conference on principles and practice of knowledge discovery in databases (pp. 46–58). Berlin: Springer.
Cardie, C. (1993). Using decision trees to improve case-based learning. In Proceedings of the tenth international conference on machine learning (pp. 25–32), Amherst, Massachusetts, USA, June 1993. San Mateo: Morgan Kaufmann.
Cheeseman, P., Kelly, J., Self, M., Stutz, J., Taylor, W., & Freeman, D. (1988). Autoclass: A Bayesian classification system. In J. E. Laird (Ed.), Proceedings of the fifth international conference on machine learning (pp. 54–64), Ann Arbor, Michigan, USA, June 1988. San Mateo: Morgan Kaufmann.
Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3, 261–283.
Coenen, F., & Leng, P. (2005). Obtaining best parameter values for accurate classification. In J. Han, B. W. Wah, V. Raghavan, X. Wu, & R. Rastogi (Eds.), Proceedings of the fifth IEEE international conference on data mining (pp. 597–600), Houston, Texas, USA, November 2005. New York: IEEE.
Cohen, W. W. (1995). Fast effective rule induction. In A. Prieditis, & S. J. Russell (Eds.), Proceedings of the twelfth international conference on machine learning (pp. 115–123), Tahoe City, California, USA, July 1995. San Mateo: Morgan Kaufmann.
De Raedt, L. (2008). Logical and relational learning. Cognitive technologies. Berlin: Springer.
Dietterich, T. G., & Bakiri, G. (1991). Error-correcting output codes: A general method for improving multiclass inductive learning programs. In Proceedings of the 9th national conference on artificial intelligence (pp. 572–577), Anaheim, California, USA, July 1991. Menlo Park/Cambridge: AAAI Press/MIT Press.
Fayyad, U. M., & Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th international joint conference on artificial intelligence (pp. 1022–1029), Chambéry, France, August 1993. San Mateo: Morgan Kaufmann.
Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2(2), 139–172.
Fisher, D. H. (1996). Iterative optimization and simplification of hierarchical clusterings. Journal of Artificial Intelligence Research (JAIR), 4, 147–178.
Fisher, D. H., & Hapanyengwi, G. (1993). Database management and analysis tools of machine learning. Journal of Intelligent Information Systems, 2, 5–38.
Flach, P. A., & Lachiche, N. (2001). Confirmation-guided discovery of first-order rules with Tertius. Machine Learning, 42(1/2), 61–95.
Frank, E., & Witten, I. H. (1999). Data mining: practical machine learning tools and techniques with Java implementations. San Mateo: Morgan Kaufmann.
Fürnkranz, J. (2004). From local to global patterns: Evaluation issues in rule learning algorithms. In Morik et al. (2004) (pp. 20–38).
Fürnkranz, J., & Flach, P. A. (2005). ROC ‘n’ rule learning: towards a better understanding of covering algorithms. Machine Learning, 58(1), 39–77.
Gluck, M. A., & Corter, J. E. (1985). Information, uncertainty, and the utility of categories. In Proceedings of the 7th annual conference of the cognitive science society (pp. 283–287), Irvine, California, USA, 1985. Hillsdale: Erlbaum.
Höppner, F. (2004). Local pattern detection and clustering. In Morik et al. (2004) (pp. 53–70).
Kavšek, B., & Lavrač, N. (2006). Apriori-SD: Adapting association rule learning to subgroup discovery. Applied Artificial Intelligence, 20(7), 543–583.
Klösgen, W. (1996). Explora: A multipattern and multistrategy discovery assistant. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining. Cambridge: MIT Press.
Lavrač, N., Flach, P. A., & Zupan, B. (1999). Rule evaluation measures: A unifying view. In S. Džeroski, & P. A. Flach (Eds.), Proceedings of the 9th international workshop on inductive logic programming (pp. 174–185), Bled, Slovenia, June 1999. Berlin: Springer.
Lavrač, N., Kavšek, B., Flach, P. A., & Todorovski, L. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5, 153–188.
Li, W., Han, J., & Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules. In N. Cercone, T. Y. Lin, & X. Wu (Eds.), Proceedings of the 2001 IEEE international conference on data mining (pp. 369–376), San José, California, USA, November 2001. Los Alamitos: IEEE Computer Society.
Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In R. Agrawal, P. E. Stolorz, & G. Piatetsky-Shapiro (Eds.), Proceedings of the fourth international conference on knowledge discovery and data mining (pp. 80–86), New York City, New York, USA, August 1998. Menlo Park: AAAI Press.
Masulli, F., & Valentini, G. (2000). Effectiveness of error correcting output codes in multiclass learning problems. In J. Kittler, & F. Roli (Eds.), Proceedings on the first international workshop on multiple classifier systems (pp. 107–116), Cagliari, Italy, June 2000. Berlin: Springer.
Michalski, R. S., & Stepp, R. E. (1983). Learning from observation: Conceptual clustering. Machine Learning, An Artificial Intelligence Approach, 1, 331–363.
Morik, K., Boulicaut, J.-F., & Siebes, A. (Eds.) (2004). Local pattern detection, international seminar, revised selected papers. Dagstuhl Castle, Germany, April 2004. Berlin: Springer.
Morishita, S., & Sese, J. (2000). Traversing itemset lattices with statistical metric pruning. In Proceedings of the nineteenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems (pp. 226–236), Dallas, Texas, USA, May 2000. New York: ACM.
Murthy, S. K. (1997). On growing better decision trees from data. PhD thesis, Johns Hopkins University, Baltimore, Maryland, USA.
Mutter, S., Hall, M., & Frank, E. (2004). Using classification to evaluate the output of confidence-based association rule mining. In G. I. Webb, & X. Yu (Eds.), Proceedings of the 17th Australian joint conference on artificial intelligence (pp. 538–549), Cairns, Australia, December 2004. Berlin: Springer.
Nevins, A. J. (1995). A branch and bound incremental conceptual clusterer. Machine Learning, 18(1), 5–22.
Perkowitz, M., & Etzioni, O. (1999). Adaptive web sites: Conceptual cluster mining. In T. Dean (Ed.), Proceedings of the sixteenth international joint conference on artificial intelligence (pp. 264–269), Stockholm, Sweden, July 1999. San Mateo: Morgan Kaufmann.
Riddle, P. J., Segal, R., & Etzioni, O. (1994). Representation design and brute-force induction in a Boeing manufacturing domain. Applied Artificial Intelligence, 8(1), 125–147.
Scheffer, T., & Wrobel, S. (2002). Finding the most interesting patterns in a database quickly by using sequential sampling. Journal of Machine Learning Research, 3, 833–862.
Sese, J., & Morishita, S. (2004). Itemset classified clustering. In J.-F. Boulicaut, F. Esposito, F. Giannotti, & D. Pedreschi (Eds.), Proceedings of the 8th European conference on principles of data mining and knowledge discovery (pp. 398–409), Pisa, Italy, September 2004. Berlin: Springer.
Talavera, L. (2000). Dynamic feature selection in incremental hierarchical clustering. In R. L. de Mántaras, & E. Plaza (Eds.), Proceedings of the 11th European conference on machine learning (pp. 392–403), Barcelona, Catalonia, Spain, May 2000. Berlin: Springer.
Webb, G. I. (1995). Opus: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research, 3, 431–465.
Webb, G. I. (2007). Discovering significant patterns. Machine Learning, 68(1), 1–33.
Webb, G. I., & Zhang, S. (2005). K-optimal rule discovery. Data Mining and Knowledge Discovery, 10(1), 39–79.
Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In J. Komorowski, & J. Zytkow (Eds.), Proceedings of the first European symposium on principles of data mining and knowledge discovery (PKDD ’97) (pp. 78–87), Trondheim, Norway, 1997. Berlin: Springer.
Zimmermann, A., & De Raedt, L. (2004a). Cluster-grouping: From subgroup discovery to clustering. In J.-F. Boulicaut, F. Esposito, F. Giannotti, & D. Pedreschi (Eds.), Proceedings of the 15th European conference on machine learning (pp. 575–577), Pisa, Italy, September 2004. Berlin: Springer.
Zimmermann, A., & De Raedt, L. (2004b). Corclass: Correlated association rule mining for classification. In E. Suzuki, & S. Arikawa (Eds.), Proceedings of the 7th international conference on discovery science (pp. 60–72), Padova, Italy, October 2004. Berlin: Springer.
Zimmermann, A., & De Raedt, L. (2004c). Inductive querying for discovering subgroups and clusters. In J.-F. Boulicaut, L. De Raedt, & H. Mannila (Eds.), Constraint-based mining and inductive databases (pp. 380–399). Berlin: Springer.

Author information

Authors and Affiliations

Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Leuven, P.O. Box 2402, Belgium
Albrecht Zimmermann & Luc De Raedt

Corresponding author

Correspondence to Albrecht Zimmermann.

Additional information

Editor: Johannes Fürnkranz.

This paper integrates and extends an abstract published at ECML 2004 (Zimmermann and De Raedt 2004a) as well as related work, published at DS 2004 (Zimmermann and De Raedt 2004b), and as a book contribution (Zimmermann and De Raedt 2004c).

About this article

Cite this article

Zimmermann, A., De Raedt, L. Cluster-grouping: from subgroup discovery to clustering. Mach Learn 77, 125–159 (2009). https://doi.org/10.1007/s10994-009-5121-y

Received: 18 March 2005
Revised: 07 May 2009
Accepted: 07 May 2009
Published: 16 June 2009
Issue Date: October 2009
DOI: https://doi.org/10.1007/s10994-009-5121-y
