skip to main content
article

Concise descriptions of subsets of structured sets

Published:01 March 2005Publication History
Skip Abstract Section

Abstract

We study the problem of economical representation of subsets of structured sets, which are sets equipped with a set cover or a family of preorders. Given a structured set U, and a language L whose expressions define subsets of U, the problem of minimum description length in L (L-MDL) is: “given a subset V of U, find a shortest string in L that defines V.” Depending on the structure and the language, the MDL-problem is in general intractable. We study the complexity of the MDL-problem for various structures and show that certain specializations are tractable. The families of focus are hierarchy, linear order, and their multidimensional extensions; these are found in the context of statistical and OLAP databases. In the case of general OLAP databases, data organization is a mixture of multidimensionality, hierarchy, and ordering, which can also be viewed naturally as a cover-structured ordered set. Efficient algorithms are provided for the MDL-problem for hierarchical and linearly ordered structures, and we prove that the multidimensional extensions are NP-complete. Finally, we illustrate the application of the theory to summarization of large result sets and (multi) query optimization for ROLAP queries.

References

  1. Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P. 1998. Automatic subspace clustering of high dimensional data for data mining applications. In Proceedings of SIGMOD 1998. ACM Press, New York, NY, 94--105.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Agrawal, R., Gupta, A., and Sarawagi, S. 1997. Modeling multidimensional databases. In Proceedings of ICDE 1997. 232--243.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Batchelor, B. 1980. Hierarchical shape description based on convex hulls of concavities. J. Cybernet. 10, 205--210.]]Google ScholarGoogle ScholarCross RefCross Ref
  4. Cabibbo, L. and Torlone, R. 1997. Querying multidimensional databases. In Proceedings of the 6th DBPL Workshop. 319--335.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Cabibbo, L. and Torlone, R. 1998. A logical approach to multidimensional databases. In Proceedings of EDBT 1998. 183--197.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Edmonds, J., Gryz, J., Liang, D., and Miller, R. J. 2001. Mining for empty rectangles in large data sets. In Proceedings of ICDT 2001. 174--188.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Garey, M. R. and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York, NY.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Gyssens, M. and Lakshmanan, L. 1997. A foundation for multi-dimensional databases. In Proceedings of VLDB 1997. 106--115.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Hansen, M. H. and Yu, B. 2001. Model selection and the principle of minimum description length. J. Amer. Statist. Assoc. 96, 454, 746--774.]]Google ScholarGoogle ScholarCross RefCross Ref
  10. Hurtado, C. and Mendelzon, A. 2002. OLAP dimensional constraints. In Proceedings of PODS 2002. 169--179.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Kalnis, P. and Papadias, D. 2001. Optimization Algorithms for Simultaneous Multidimensional Queries in OLAP Environments. Lecture Notes in Computer Science, vol. 2114. Springer-Verlag, Berlin, Germany, 264--273.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Keil, J. 1999. Polygon decomposition. In Handbook of Computational Geometry. Elsevier Sciences, Amsterdem, The Netherlands, Chap. 11, 491--518.]]Google ScholarGoogle Scholar
  13. Kimball, R. 1996. The Data Warehouse Toolkit. Wiley, New York, NY.]]Google ScholarGoogle Scholar
  14. Kumar, V. S. A. and Ramesh, H. 1999. Covering rectilinear polygons with axis-parallel rectangles. In Proceedings of the ACM Symposium on Theory of Computing 1999. ACM Press, New York, NY, 445--454.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Lakshmanan, L., Ng, R. T., Wang, C. X., Zhou, X., and Johnson, T. J. 1999. The generalized MDL approach for summarization. In Proceedings of VLDB 1999. 445--454.]]Google ScholarGoogle Scholar
  16. Lam, W. and Bacchus, F. 1994. Learning Bayesian belief networks: An approach based on the MDL principle. Comput. Intel. 10, 269--293.]]Google ScholarGoogle ScholarCross RefCross Ref
  17. Levcopoulos, C. and Gudmundsson, J. 1997. Approximation algorithms for covering polygons with squares and similar problems. In Proceedings of the International Workshop on Randomization and Approximation Techniques in Computer Science. Lecture Notes in Computer Science, vol. 1269. Springer, Berlin, Germany, 27--41.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Liang, W. and Orlowska, M. 2000. Optimizing multiple dimensional queries simultaneously in multidimensional databases. VLDB J. 8, 319--338.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Lodi, E., Luccio, F., Mugnai, C., and Pagli, L. 1979. On two-dimensional data organization I. Fundam. Inform. 2, 211--226.]]Google ScholarGoogle Scholar
  20. Park, C., Kim, M., and Lee, Y. 2001. Rewriting OLAP queries using materialized views and dimension hierarchies in data warehouses. In Proceedings of ICDE 2001. 515--523.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Tor, S. and Middleditch, A. 1984. Convex decomposition of simple polygons. ACM Trans. Graph. 3, 244--265.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Zhao, Y., Deshpande, P., Naughton, J., and Shukla, A. 1998. Simultaneous optimization and evaluation of multiple dimensional queries. In Proceedings of SIGMOD 1998. 271--282.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Concise descriptions of subsets of structured sets

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader