
Duplicate record elimination in large data files

Published: 01 June 1983

Abstract

The issue of duplicate elimination for large data files in which many occurrences of the same record may appear is addressed. A comprehensive cost analysis of the duplicate elimination operation is presented. This analysis is based on a combinatorial model developed for estimating the size of intermediate runs produced by a modified merge-sort procedure. The performance of this modified merge-sort procedure is demonstrated to be significantly superior to the standard duplicate elimination technique of sorting followed by a sequential pass to locate duplicate records. The results can also be used to provide critical input to a query optimizer in a relational database system.
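The contrast drawn in the abstract can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the authors' procedure or cost model: in-memory lists stand in for the sorted runs an external sort would write to disk, and `run_size` (records per initial run) and `fan_in` (runs merged per pass) are hypothetical parameters. The baseline `sort_then_scan` sorts every record, duplicates included, and removes duplicates in one final sequential pass. `modified_merge_sort` instead discards a duplicate as soon as it becomes adjacent to an equal record, both when an initial run is sorted and whenever runs are merged, so intermediate runs can shrink from pass to pass. It also returns the total number of records held in the runs after each pass.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def sort_then_scan(records: List[int]) -> List[int]:
    """Baseline: sort all records (duplicates included), then one sequential pass."""
    out: List[int] = []
    for r in sorted(records):
        if not out or out[-1] != r:  # keep r only if it differs from the last kept value
            out.append(r)
    return out


def dedup_sorted(stream: Iterable[int]) -> Iterator[int]:
    """Yield each value of a sorted stream once; equal values are adjacent."""
    first = True
    prev = 0
    for r in stream:
        if first or r != prev:
            yield r
            prev, first = r, False


def make_runs(records: List[int], run_size: int) -> List[List[int]]:
    """Run generation: sort each memory-sized chunk, dropping duplicates inside it."""
    return [
        list(dedup_sorted(sorted(records[i:i + run_size])))
        for i in range(0, len(records), run_size)
    ]


def merge_pass(runs: List[List[int]], fan_in: int) -> List[List[int]]:
    """One merge pass: merge groups of fan_in runs, dropping duplicates across them."""
    return [
        list(dedup_sorted(heapq.merge(*runs[i:i + fan_in])))
        for i in range(0, len(runs), fan_in)
    ]


def modified_merge_sort(
    records: List[int], run_size: int = 4, fan_in: int = 2
) -> Tuple[List[int], List[int]]:
    """Return the distinct records in sorted order and the record count after each pass."""
    if run_size < 1 or fan_in < 2:
        raise ValueError("run_size must be >= 1 and fan_in must be >= 2")
    runs = make_runs(records, run_size)
    sizes = [sum(len(run) for run in runs)]  # records written by run generation
    while len(runs) > 1:
        runs = merge_pass(runs, fan_in)
        sizes.append(sum(len(run) for run in runs))  # records written by this merge pass
    return (runs[0] if runs else []), sizes


if __name__ == "__main__":
    data = [3, 1, 3, 2, 1, 3, 2, 2, 1, 3, 1, 2]
    print(sort_then_scan(data))  # [1, 2, 3]
    print(modified_merge_sort(data, run_size=4, fan_in=2))  # ([1, 2, 3], [9, 6, 3])
```

For the 12-record input above, the per-pass totals are 9, 6 and 3, whereas a plain merge-sort with the same run size and fan-in would write all 12 records in each of its three passes before the final duplicate-removal scan. The saving grows with the number of occurrences of each distinct record, which is why an estimate of intermediate run sizes is central to the cost analysis.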



        Published in

          ACM Transactions on Database Systems, Volume 8, Issue 2
          June 1983
          120 pages
          ISSN: 0362-5915
          EISSN: 1557-4644
          DOI: 10.1145/319983

          Copyright © 1983 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States
