skip to main content
article
Free Access

Adaptive record clustering

Published:01 June 1985Publication History
Skip Abstract Section

Abstract

An algorithm for record clustering is presented. It is capable of detecting sudden changes in users' access patterns and then suggesting an appropriate assignment of records to blocks. It is conceptually simple, highly intuitive, does not need to classify queries into types, and avoids collecting individual query statistics. Experimental results indicate that it converges rapidly; its performance is about 50 percent better than that of the total sort method, and about 100 percent better than that of randomly assigning records to blocks.

References

  1. 1 CHANG, J., AND FU, K. Extended K-d tree database organization: A dynamic multiattribute clustering method. IEEE Softw. Eng. (1981), 284-290.]]Google ScholarGoogle Scholar
  2. 2 DEW{TT, D., et al. Implementation techniques for main memory database systems. ACM SIGMOD, (1984), 1-8.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3 ESWARAN, K.P. Placement of records in a file and file allocation in a computer network. IFIP (Aug. 1974), 304-307.]]Google ScholarGoogle Scholar
  4. 4 FLORY, A., GUNTHER, J., AND KOULOUMDIJAN, J. Database reorganization by clustering methods. Inf. Syst. 3, 1 (1978), 59-62.]]Google ScholarGoogle ScholarCross RefCross Ref
  5. 5 GHOSH, S.P. Database organization for Data Management. Academic Press, New York, 1977.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6 HAMMER, M., AND CHAN, A. Index selection in a self-adaptive database management system. ACM SIGMOD (1976), 1-8.]] Google ScholarGoogle Scholar
  7. 7 HAMMER, IVL, AND NIAMIR, B. A heuristic approach to attribute partitioning in a self-adaptive database management system. ACM SIGMOD (1976), 1-8.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8 JAKOBSSON, M. Reducing block accesses in inverted files by partial clustering. Inf. Syst. 5 (1980),1-5.]]Google ScholarGoogle ScholarCross RefCross Ref
  9. 9 KNUTH, D. Sorting and Searching. Vol. 3, Addison-Wesley, Reading, Mass., 397-398.]]Google ScholarGoogle Scholar
  10. 10 Llou, J. H., AND YAO, S.B. Multidimensional clustering for database organization, inf. Syst. 2 (!977), !87-!98.]]Google ScholarGoogle Scholar
  11. 11 OMIECINSKI, E., AND SCHEUERMANN, P. A global approach to record clustering and file reorganization. Tech. Rep., Dept. of EECS, Northwestern Univ., Dec. 1983.]]Google ScholarGoogle Scholar
  12. 12 RIVEST, R. On self-organizing sequential search heuristics. Commun. ACM (!976), 63-67.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13 SALTON, G. Dynamic Information and Library Processing. Prentice-Hail, Englewood Cliffs, N.J., 1975.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. 14 SCHEFFE, H. The Analysis of Variance. John Wiley, New York, 1959.]]Google ScholarGoogle Scholar
  15. 15 VAN RIGSBERGEN, C.J. Information Retrieval. 2nd. Ed., Butterworth, London, 1980.]]Google ScholarGoogle Scholar
  16. 16 WILLIARD, D. Efficiently processing relational calculus expressions using range query'theory. ACM SIGMOD (1984), 164-175.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17 YAO, S.B. Approximating block accesses in database organization. Commun. ACM 20 (1977), 260-261.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18 Yu, C. T., AND CHEN, C. H. Adaptive document clustering. To appear in ACM SIGIR Conference, June 1985.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19 Yu, C. T., SIu, M. K., AND CHEN, C.H. File allocation in distributed databases with interaction between files. In Proceedings of Conference on Very Large Data Bases, (1983), 248-259.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20 Yu, C. T., Sxu, M. K., LAM, K., AND CHEN, C.H. Adaptive file allocation in'a star computer network. IEEE COMPSAC (1983), 537-546. (Selected for reprint in IEEE Trans. Softw. Eng.).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. 21 Yu, C. T., SIu, M. K., LAM, K., AND TA{, F. Adaptive clustering schemes: General framework. IEEE COMPAC (Nov. 1981), 81-89.]]Google ScholarGoogle Scholar

Index Terms

  1. Adaptive record clustering

            Recommendations

            Reviews

            Donald Harris Kraft

            This paper defines the problem under consideration as the appropriate assignment of physical records (clustering) in a database to blocks so as to minimize the expected number of blocks accessed in response to a query. The average is taken over all possible queries. Since the problem has been shown to be NP-hard, heuristic clustering algorithms have been developed. One can totally sort the records on the basis of a concatenated sort key. This sort key is formed by concatenating each record's values for the various attributes in order of the frequency of use of the attributes. A partial sort modification sorts all the records in a buffer but does not bother to merge the buffers in the interest of efficiency. The algorithm presented here, however, allows the records to be clustered adaptively. First, each record is assigned an arbitrary position on the real line. Then, as each query occurs, those records deemed relevant to the query are moved closer to their centroid by an amount proportional to their distances from the centroid. The centroid is the average position of the relevant records. Moreover, as part of the query processing mechanism, an identical number of records is chosen at random, and each record is moved away from its centroid. Over time, a clear pattern of clustering will emerge. An analysis of the algorithm shows conditions for a proper constant of proportionality for moving similar records closer together towards their centroid. The objective is to decrease the sum of the squared distances of records in a cluster from their centroid summed over all clusters, while increasing the sum of the squared distances of cluster centroids from the centroid of all records. An Appendix gives the gory details of this analysis. The algorithm is then refined to take into account the relative cost, in terms of the expected number of blocks accessed, of the minimum clustering mechanism. This is done by adjusting the distance each record is moved. Of course, the final step is the assignment of clustered records to blocks. The authors present an impressive array of experimental results. They conclude that adaptive clustering can yield significant improvement over random assignment and even over total sorting. This is explained by the uneven access distribution (20 percent of the records are retrieved in response to 80 percent). The ability of this algorithm to yield improvement makes this paper necessary reading for those interested in retrieval issues.

            Access critical reviews of Computing literature here

            Become a reviewer for Computing Reviews.

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM Transactions on Database Systems
              ACM Transactions on Database Systems  Volume 10, Issue 2
              June 1985
              159 pages
              ISSN:0362-5915
              EISSN:1557-4644
              DOI:10.1145/3857
              Issue’s Table of Contents

              Copyright © 1985 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 June 1985
              Published in tods Volume 10, Issue 2

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader