ABSTRACT
Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of association rule mining, from finding implications to equivalences; as a form of conceptual clustering, where the goal is to identify clusters that afford dual characterizations; and as a form of constructive induction, to build features based on given descriptors that mutually reinforce each other. In this paper, we present the use of redescription mining as an important tool to reason about a collection of sets, especially their overlaps, similarities, and differences. We outline algorithms to mine all minimal (non-redundant) redescriptions underlying a dataset using notions of minimal generators of closed itemsets. We also show the use of these algorithms in an interactive context, supporting constraint-based exploration and querying. Specifically, we showcase a bioinformatics application that empowers the biologist to define a vocabulary of sets underlying a domain of genes and to reason about these sets, yielding significant biological insight.
- Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In 1st International Conference on Computational Logic, July 2000.]] Google ScholarDigital Library
- D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: a maximal frequent itemset algorithm for transactional databases. In IEEE Intl. Conf. on Data Engineering, pages pp. 443--452, April 2001.]] Google ScholarDigital Library
- B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, 1999.]] Google ScholarDigital Library
- A.P. Gasch, P.T. Spellman, C.M. Kao, O. Carmel-Harel, M.B. Eisen, G. Storz, D. Botstein, and P.O. Brown. Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Mol. Biol. Cell, Vol. 11:pages 4241--4257, 2000.]]Google ScholarCross Ref
- L. Nourine and O. Raynaud. A fast algorithm for building lattices. Information Processing Letters, 71:199--204, 1999.]] Google ScholarDigital Library
- J.L. Pflatz and R.E. Jamison. Closure systems and their structure. Information Sciences, 139:275--286, 2001.]]Google ScholarCross Ref
- N. Ramakrishnan, D. Kumar, B. Mishra, M. Potts, and R.F. Helm. Turning CARTwheels: An Alternating Algorithm for Mining Redescriptions. In Proc. KDD'04, pages 266--275, Aug 2004.]] Google ScholarDigital Library
- A. Sturn, J. Quackenbush, and Z. Trajanoski. Genesis: Cluster Analysis of Microarray Data. Bioinformatics, Vol. 18(1):pages 207--208, 2002.]]Google Scholar
- J. Wang, J. Han, and J. Pei. Closet+: Searching for the best strategies for mining frequent closed itemsets. In ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, August 2003.]] Google ScholarDigital Library
- J.J. Wyrick, F.C. Holstege, E.G. Jennings, H.C. Causton, D. Shore, M. Grunstein, E.S. Lander, and R.A. Young. Chromosomal Landscape of Nucleosome-Dependent Gene Expression and Silencing in Yeast. Nature, Vol. 402:pages 418--421, 1999.]]Google Scholar
- M. J. Zaki. Generating non-redundant association rules. In 6th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, pages pp. 34--43, August 2000.]] Google ScholarDigital Library
- M. J. Zaki and C.-J. Hsiao. ChARM: An efficient algorithm for closed itemset mining. In 2nd SIAM International Conference on Data Mining, pages pp. 457--473, April 2002.]]Google ScholarCross Ref
- M. J. Zaki and C.-J. Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4):462--478, April 2005.]] Google ScholarDigital Library
Index Terms
- Reasoning about sets using redescription mining
Recommendations
Dataless Transitions Between Concise Representations of Frequent Patterns
For many data mining problems in order to solve them it is required to discover frequent patterns. Frequent itemsets are useful e.g. in the discovery of association and episode rules, sequential patterns and clusters. Nevertheless, the number of ...
An efficient algorithm for incrementally mining frequent closed itemsets
The purpose of mining frequent itemsets is to identify the items in groups that always appear together and exceed the user-specified threshold of a transaction database. However, numerous frequent itemsets may exist in a transaction database, hindering ...
Mining uncertain data for constrained frequent sets
IDEAS '09: Proceedings of the 2009 International Database Engineering & Applications SymposiumData mining aims to search for implicit, previously unknown, and potentially useful pieces of information---such as sets of items that are frequently co-occurring together---that are embedded in data. The mined frequent sets can be used in the discovery ...
Comments