article

Free Access

Adaptive record clustering

Authors:
C. T. Yu

Univ. of Illinois at Chicago, Chicago

Univ. of Illinois at Chicago, Chicago
View Profile

,
Cheing-mei Suen

Univ. of Illinois at Chicago, Chicago

Univ. of Illinois at Chicago, Chicago
View Profile

,
K. Lam

Univ. of Illinois at Chicago, Chicago

Univ. of Illinois at Chicago, Chicago
View Profile

,
M. K. Siu

Univ. of Illinois at Chicago, Chicago

Univ. of Illinois at Chicago, Chicago
View Profile

Authors Info & Claims

ACM Transactions on Database Systems Volume 10 Issue 2pp 180–204https://doi.org/10.1145/3857.3861

Published:01 June 1985Publication History

ACM Transactions on Database Systems

Abstract

An algorithm for record clustering is presented. It is capable of detecting sudden changes in users' access patterns and then suggesting an appropriate assignment of records to blocks. It is conceptually simple, highly intuitive, does not need to classify queries into types, and avoids collecting individual query statistics. Experimental results indicate that it converges rapidly; its performance is about 50 percent better than that of the total sort method, and about 100 percent better than that of randomly assigning records to blocks.

References

1 CHANG, J., AND FU, K. Extended K-d tree database organization: A dynamic multiattribute clustering method. IEEE Softw. Eng. (1981), 284-290.]]Google Scholar
2 DEW{TT, D., et al. Implementation techniques for main memory database systems. ACM SIGMOD, (1984), 1-8.]] Google ScholarDigital Library
3 ESWARAN, K.P. Placement of records in a file and file allocation in a computer network. IFIP (Aug. 1974), 304-307.]]Google Scholar
4 FLORY, A., GUNTHER, J., AND KOULOUMDIJAN, J. Database reorganization by clustering methods. Inf. Syst. 3, 1 (1978), 59-62.]]Google ScholarCross Ref
5 GHOSH, S.P. Database organization for Data Management. Academic Press, New York, 1977.]] Google ScholarDigital Library
6 HAMMER, M., AND CHAN, A. Index selection in a self-adaptive database management system. ACM SIGMOD (1976), 1-8.]] Google Scholar
7 HAMMER, IVL, AND NIAMIR, B. A heuristic approach to attribute partitioning in a self-adaptive database management system. ACM SIGMOD (1976), 1-8.]] Google ScholarDigital Library
8 JAKOBSSON, M. Reducing block accesses in inverted files by partial clustering. Inf. Syst. 5 (1980),1-5.]]Google ScholarCross Ref
9 KNUTH, D. Sorting and Searching. Vol. 3, Addison-Wesley, Reading, Mass., 397-398.]]Google Scholar
10 Llou, J. H., AND YAO, S.B. Multidimensional clustering for database organization, inf. Syst. 2 (!977), !87-!98.]]Google Scholar
11 OMIECINSKI, E., AND SCHEUERMANN, P. A global approach to record clustering and file reorganization. Tech. Rep., Dept. of EECS, Northwestern Univ., Dec. 1983.]]Google Scholar
12 RIVEST, R. On self-organizing sequential search heuristics. Commun. ACM (!976), 63-67.]] Google ScholarDigital Library
13 SALTON, G. Dynamic Information and Library Processing. Prentice-Hail, Englewood Cliffs, N.J., 1975.]] Google ScholarDigital Library
14 SCHEFFE, H. The Analysis of Variance. John Wiley, New York, 1959.]]Google Scholar
15 VAN RIGSBERGEN, C.J. Information Retrieval. 2nd. Ed., Butterworth, London, 1980.]]Google Scholar
16 WILLIARD, D. Efficiently processing relational calculus expressions using range query'theory. ACM SIGMOD (1984), 164-175.]] Google ScholarDigital Library
17 YAO, S.B. Approximating block accesses in database organization. Commun. ACM 20 (1977), 260-261.]] Google ScholarDigital Library
18 Yu, C. T., AND CHEN, C. H. Adaptive document clustering. To appear in ACM SIGIR Conference, June 1985.]] Google ScholarDigital Library
19 Yu, C. T., SIu, M. K., AND CHEN, C.H. File allocation in distributed databases with interaction between files. In Proceedings of Conference on Very Large Data Bases, (1983), 248-259.]] Google ScholarDigital Library
20 Yu, C. T., Sxu, M. K., LAM, K., AND CHEN, C.H. Adaptive file allocation in'a star computer network. IEEE COMPSAC (1983), 537-546. (Selected for reprint in IEEE Trans. Softw. Eng.).]] Google ScholarDigital Library
21 Yu, C. T., SIu, M. K., LAM, K., AND TA{, F. Adaptive clustering schemes: General framework. IEEE COMPAC (Nov. 1981), 81-89.]]Google Scholar

Index Terms

Adaptive record clustering

Recommendations

Document clustering as a record linkage problem
DocEng '18: Proceedings of the ACM Symposium on Document Engineering 2018

This work examines document clustering as a record linkage problem, focusing on named-entities and frequent terms, using several vector and graph-based document representation methods and k-means clustering with different similarity measures. The JedAI ...
Read More
Hierarchical Adaptive Clustering

This paper studies an adaptive clustering problem. We focus on re-clustering an object set, previously clustered, when the feature set characterizing the objects increases. We propose an adaptive clustering method based on a hierarchical agglomerative ...
Read More
An Approach of Standardization and Searching based on Hierarchical Bayesian Clustering (HBC) for Record Linkage System
C5 '07: Proceedings of the Fifth International Conference on Creating, Connecting and Collaborating through Computing

Information sources on the Web are controlled by different text formats, and have varying inconsistencies. Data form many online sources do not contain enough information to accurately link the records. To link record from different data sources, any ...
Read More

Reviews

Reviewer: Donald Harris Kraft

This paper defines the problem under consideration as the appropriate assignment of physical records (clustering) in a database to blocks so as to minimize the expected number of blocks accessed in response to a query. The average is taken over all possible queries. Since the problem has been shown to be NP-hard, heuristic clustering algorithms have been developed. One can totally sort the records on the basis of a concatenated sort key. This sort key is formed by concatenating each record's values for the various attributes in order of the frequency of use of the attributes. A partial sort modification sorts all the records in a buffer but does not bother to merge the buffers in the interest of efficiency. The algorithm presented here, however, allows the records to be clustered adaptively. First, each record is assigned an arbitrary position on the real line. Then, as each query occurs, those records deemed relevant to the query are moved closer to their centroid by an amount proportional to their distances from the centroid. The centroid is the average position of the relevant records. Moreover, as part of the query processing mechanism, an identical number of records is chosen at random, and each record is moved away from its centroid. Over time, a clear pattern of clustering will emerge. An analysis of the algorithm shows conditions for a proper constant of proportionality for moving similar records closer together towards their centroid. The objective is to decrease the sum of the squared distances of records in a cluster from their centroid summed over all clusters, while increasing the sum of the squared distances of cluster centroids from the centroid of all records. An Appendix gives the gory details of this analysis. The algorithm is then refined to take into account the relative cost, in terms of the expected number of blocks accessed, of the minimum clustering mechanism. This is done by adjusting the distance each record is moved. Of course, the final step is the assignment of clustered records to blocks. The authors present an impressive array of experimental results. They conclude that adaptive clustering can yield significant improvement over random assignment and even over total sorting. This is explained by the uneven access distribution (20 percent of the records are retrieved in response to 80 percent). The ability of this algorithm to yield improvement makes this paper necessary reading for those interested in retrieval issues.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Database Systems Volume 10, Issue 2
June 1985
159 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/3857
Issue’s Table of Contents

Copyright © 1985 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 1985
Published in tods Volume 10, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 51
  Total Citations
  View Citations
- 447
  Total Downloads
- Downloads (Last 12 months)27
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Adaptive record clustering

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Document clustering as a record linkage problem

Hierarchical Adaptive Clustering

An Approach of Standardization and Searching based on Hierarchical Bayesian Clustering (HBC) for Record Linkage System

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Adaptive record clustering

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Document clustering as a record linkage problem

Hierarchical Adaptive Clustering

An Approach of Standardization and Searching based on Hierarchical Bayesian Clustering (HBC) for Record Linkage System

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media