skip to main content
10.1145/773153.773166acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Concise descriptions of subsets of structured sets

Published:09 June 2003Publication History

ABSTRACT

We study the problem of economical representation of subsets of structured sets, that is, sets equipped with a set cover. Given a structured set U, and a language L whose expressions define subsets of U, the problem of Minimum Description Length in L (L-MDL) is: "given a subset V of U, find a shortest string in L that defines V".We show that the simple set cover is enough to model a number of realistic database structures. We focus on two important families: hierarchical and multidimensional organizations. The former is found in the context of semistructured data such as XML, the latter in the context of statistical and OLAP databases. In the case of general OLAP databases, data organization is a mixture of multidimensionality and hierarchy, which can also be viewed naturally as a structured set. We study the complexity of the L-MDL problem in several settings, and provide an efficient algorithm for the hierarchical case.Finally, we illustrate the application of the theory to summarization of large result sets, (multi) query optimization for ROLAP queries, and XML queries.

References

  1. R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. SIGMOD, pages 94--105. ACM Press, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, New York, 1979.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. Gudmandsson and C. Levcopoulos. Close approximations of minimum rectangular covering. In FST & TCS'96, volume 1180 of LNCS, pages 135--146, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. P. Kalnis and D. Papadias. Optimization algorithms for simultaneous multidimensional queries in OLAP environments. Lecture Notes in Computer Science, 2114:264--??, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Kimball. The Data Warehouse Toolkit. Wiley, 1996.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. V. S. Anil Kumar and H. Ramesh. Covering rectilinear polygons with axis-parallel rectangles. In Proceedings of the ACM Symposium on Theory of Computing, pages 445--454, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. L.V.S. Lakshmanan, R. T. Ng, C. X. Wang, X. Zhou, and T. J. Johnson. The generalized MDL approach for summarization. In Proc. 28th VLDB Conference, pages 445--454, 1999.]]Google ScholarGoogle Scholar
  8. C. Levcopoulos and J. Gudmundsson. Approximation algorithms for covering polygons with squares and similar problems. In RANDOM: International Workshop on Randomization and Approximation Techniques in Computer Science. LNCS, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. N. De Meneses and C. C. De Souza. Exact solutions of rectangular partitions via integer programming. International Journal of Computational Geometry and Applications, 10(5):477--522, 2000.]]Google ScholarGoogle ScholarCross RefCross Ref
  10. Y. Zhao, P.M. Deshpande, J.F. Naughton, and A. Shukla. Simultaneous optimization and evaluation of multiple dimensional queries. In Proc. SIGMOD, pages 271--282, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Concise descriptions of subsets of structured sets

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
                June 2003
                291 pages
                ISBN:1581136706
                DOI:10.1145/773153
                • Conference Chair:
                • Frank Neven,
                • General Chair:
                • Catriel Beeri,
                • Program Chair:
                • Tova Milo

                Copyright © 2003 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 9 June 2003

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • Article

                Acceptance Rates

                PODS '03 Paper Acceptance Rate27of136submissions,20%Overall Acceptance Rate642of2,707submissions,24%

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader