skip to main content
10.1145/1081870.1081912acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Reasoning about sets using redescription mining

Published:21 August 2005Publication History

ABSTRACT

Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of association rule mining, from finding implications to equivalences; as a form of conceptual clustering, where the goal is to identify clusters that afford dual characterizations; and as a form of constructive induction, to build features based on given descriptors that mutually reinforce each other. In this paper, we present the use of redescription mining as an important tool to reason about a collection of sets, especially their overlaps, similarities, and differences. We outline algorithms to mine all minimal (non-redundant) redescriptions underlying a dataset using notions of minimal generators of closed itemsets. We also show the use of these algorithms in an interactive context, supporting constraint-based exploration and querying. Specifically, we showcase a bioinformatics application that empowers the biologist to define a vocabulary of sets underlying a domain of genes and to reason about these sets, yielding significant biological insight.

References

  1. Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In 1st International Conference on Computational Logic, July 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: a maximal frequent itemset algorithm for transactional databases. In IEEE Intl. Conf. on Data Engineering, pages pp. 443--452, April 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A.P. Gasch, P.T. Spellman, C.M. Kao, O. Carmel-Harel, M.B. Eisen, G. Storz, D. Botstein, and P.O. Brown. Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Mol. Biol. Cell, Vol. 11:pages 4241--4257, 2000.]]Google ScholarGoogle ScholarCross RefCross Ref
  5. L. Nourine and O. Raynaud. A fast algorithm for building lattices. Information Processing Letters, 71:199--204, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J.L. Pflatz and R.E. Jamison. Closure systems and their structure. Information Sciences, 139:275--286, 2001.]]Google ScholarGoogle ScholarCross RefCross Ref
  7. N. Ramakrishnan, D. Kumar, B. Mishra, M. Potts, and R.F. Helm. Turning CARTwheels: An Alternating Algorithm for Mining Redescriptions. In Proc. KDD'04, pages 266--275, Aug 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Sturn, J. Quackenbush, and Z. Trajanoski. Genesis: Cluster Analysis of Microarray Data. Bioinformatics, Vol. 18(1):pages 207--208, 2002.]]Google ScholarGoogle Scholar
  9. J. Wang, J. Han, and J. Pei. Closet+: Searching for the best strategies for mining frequent closed itemsets. In ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, August 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J.J. Wyrick, F.C. Holstege, E.G. Jennings, H.C. Causton, D. Shore, M. Grunstein, E.S. Lander, and R.A. Young. Chromosomal Landscape of Nucleosome-Dependent Gene Expression and Silencing in Yeast. Nature, Vol. 402:pages 418--421, 1999.]]Google ScholarGoogle Scholar
  11. M. J. Zaki. Generating non-redundant association rules. In 6th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, pages pp. 34--43, August 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. J. Zaki and C.-J. Hsiao. ChARM: An efficient algorithm for closed itemset mining. In 2nd SIAM International Conference on Data Mining, pages pp. 457--473, April 2002.]]Google ScholarGoogle ScholarCross RefCross Ref
  13. M. J. Zaki and C.-J. Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4):462--478, April 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Reasoning about sets using redescription mining

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
        August 2005
        844 pages
        ISBN:159593135X
        DOI:10.1145/1081870

        Copyright © 2005 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 21 August 2005

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • Article

        Acceptance Rates

        Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader