ABSTRACT
We study the problem of economical representation of subsets of structured sets, that is, sets equipped with a set cover. Given a structured set U, and a language L whose expressions define subsets of U, the problem of Minimum Description Length in L (L-MDL) is: "given a subset V of U, find a shortest string in L that defines V".We show that the simple set cover is enough to model a number of realistic database structures. We focus on two important families: hierarchical and multidimensional organizations. The former is found in the context of semistructured data such as XML, the latter in the context of statistical and OLAP databases. In the case of general OLAP databases, data organization is a mixture of multidimensionality and hierarchy, which can also be viewed naturally as a structured set. We study the complexity of the L-MDL problem in several settings, and provide an efficient algorithm for the hierarchical case.Finally, we illustrate the application of the theory to summarization of large result sets, (multi) query optimization for ROLAP queries, and XML queries.
- R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. SIGMOD, pages 94--105. ACM Press, 1998.]] Google ScholarDigital Library
- M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, New York, 1979.]] Google ScholarDigital Library
- J. Gudmandsson and C. Levcopoulos. Close approximations of minimum rectangular covering. In FST & TCS'96, volume 1180 of LNCS, pages 135--146, 1996.]] Google ScholarDigital Library
- P. Kalnis and D. Papadias. Optimization algorithms for simultaneous multidimensional queries in OLAP environments. Lecture Notes in Computer Science, 2114:264--??, 2001.]] Google ScholarDigital Library
- R. Kimball. The Data Warehouse Toolkit. Wiley, 1996.]]Google ScholarDigital Library
- V. S. Anil Kumar and H. Ramesh. Covering rectilinear polygons with axis-parallel rectangles. In Proceedings of the ACM Symposium on Theory of Computing, pages 445--454, 1999.]] Google ScholarDigital Library
- L.V.S. Lakshmanan, R. T. Ng, C. X. Wang, X. Zhou, and T. J. Johnson. The generalized MDL approach for summarization. In Proc. 28th VLDB Conference, pages 445--454, 1999.]]Google Scholar
- C. Levcopoulos and J. Gudmundsson. Approximation algorithms for covering polygons with squares and similar problems. In RANDOM: International Workshop on Randomization and Approximation Techniques in Computer Science. LNCS, 1997.]] Google ScholarDigital Library
- C. N. De Meneses and C. C. De Souza. Exact solutions of rectangular partitions via integer programming. International Journal of Computational Geometry and Applications, 10(5):477--522, 2000.]]Google ScholarCross Ref
- Y. Zhao, P.M. Deshpande, J.F. Naughton, and A. Shukla. Simultaneous optimization and evaluation of multiple dimensional queries. In Proc. SIGMOD, pages 271--282, 1998.]] Google ScholarDigital Library
Index Terms
- Concise descriptions of subsets of structured sets
Recommendations
Concise descriptions of subsets of structured sets
Special Issue: SIGMOD/PODS 2003We study the problem of economical representation of subsets of structured sets, which are sets equipped with a set cover or a family of preorders. Given a structured set U, and a language L whose expressions define subsets of U, the problem of minimum ...
Relationships between covering-based rough sets and relation-based rough sets
Rough set theory is an important technique to deal with vagueness and granularity in information systems. In rough set theory, relation-based rough sets and covering-based rough sets are two important extensions of the classical rough sets. This paper ...
Granular rough sets and granular shadowed sets: Three-way approximations in Pawlak approximation spaces
AbstractA Pawlak approximation space is a pair of a ground set/space and a quotient set/space of the ground set induced by an equivalence relation on the ground set. The quotient space is a simple granulation of the ground space such that an ...
Comments