skip to main content
article
Free Access

Declustering of key-based partitioned signature files

Published:01 September 1996Publication History
Skip Abstract Section

Abstract

Access methods based on signature files can largely benefit from possibilities offered by parallel environments. To this end, an effective declustering strategy that would distribute signatures over a set of parallel independent disks has to be combined with a synergic clustering which is employed to avoid searching the whole signature file while executing a query. This article proposes two parallel signature file organizations, Hamming Filter (HF) and Hamming + Filter (H+F), whose common declustering strategy is based on error correcting codes, and where clustering is achieved by organizing signatures into fixed-size buckets, each containing signatures sharing the same key value. HF allocates signatures on disks in a static way and works well if a correct relationship holds between the parameters of the code and the size of the file. H+F is a generalization of HF suitable to manage highly dynamic files. It uses a dynamic declustering, obtained through a sequence of codes, and organizes a smooth migration of signatures between disks so that high performance levels are retained regardless of current file size. Theoretical analysis characterizes the best-case, expected, and worst-case behaviors of these organizations. Analytical results are verified by experiments on prototype systems.

References

  1. ABI)EI.-GHAFFAR, K. A. AND EL ABBADI, A. 1993. Optimal disk allocation for partial match queries. ACM Trans. Database Syst. 18, 1 !March), 132-156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Ant~aA, S. R. AND ROBERTS, C.S. 1980. An associative parallel processor fi~r partial match retrieval using superimposed codes. In Proceedings of the Seventh Annual Symposium on Computer Architecture (France, May~, IEEE, 218-227. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. CI^C~IA, P. AND ZgZm.A, P. 1993. Estimating accesses in partitioned signature file organizations. ACM Trans. Infi Syst. II, 2 (Apri}), 133-142. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. DFpPIS~~, U. 1986. S-tree: A dynamic ba}anced signature index for office retrieval. In Proceedingx of thc Nlnth ACM SIG1R International Conference on Research and Developmets! in ln/brmatio, Retrieval fPisa, Ita}y, Sept.), 77-87. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D~:WITT, D. AND GR~-W, J. 1992. Para}le} database systems: The future of high performance database systems. (',nnmu~z. ACM 35, 6 (dune), 85-98. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D~,, H. C.~~r~ S(m~~l~:wsKh J. S. 1982. Disk allocation for Cartesian product files on multiple disk systems. ACM Trans. Database Syst. 7, I tMarch), 82-101. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. FALO('TSOS, C. 1992. Signature files. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice-Hall, Englewood Cliffs, NJ, Chapter 4, 44-65. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. F^l,oIwsos, C. AND CHRISTODOULAKIS, S. 1987. Optimal signature extraction and information loss~ ACM Trans. Database Syst. 12, 3 (Sept. l, 395-428. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. FAL()tTTSOS, C. AND CHR1STOI)OULAKIS, S. 1984. Signature files: An access method for documents and its analytical performance evaluation. ACM Trans. Office Infi Syst. 2, 4 ~Oct.), 267 288. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. FALOt5TSOS, C. AND METAXAS, D. 1991. Disk allocation methods using error correcting codes. IEEE Trans. Comput. 40, 8 (Aug.), 907-914. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Ft'.nWARA, T., ITO, M. I'~\SAMI, T., KATAOKA, M., AND OKtZl, O. 1991. Performance analysis of disk allocation method using error-correcting codes. IEEE Trans. lnfi TheoO' 37, '2 (March~, 379-384.Google ScholarGoogle Scholar
  12. GRANI)I, F., TIBERIO, P. AND ZEZULA, P. 1992. Frame-slice partitioned para}lel signature files. In Proceedings of the Fifteenth ACM SIGIR International Conference ICopenhagen, Denmark. Junel, 286-297. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. LEE, D.L. 1986. A word-parallel, bit-serial signature processor for superimposed coding. In Proceedings of the Second International Conferenee on Data Engineering, (Las Angeles, CA, Feb.) IEEE, New York, 352-359. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. LEE, D. L. ^ND LENG, C.-W. 1990. A partitioned signature file structure for multiattribute and text retrieval. In Proceedings of the Sixth International Conference on Data Engineering, IEEE, (Las Angeles, CA, Feb.), 389-397. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. LEE, D. L. AND LENG, C.-W. 1989. Partitioned signature files: Design issues and performance evaluation. ACM Trans. Office lnf Syst. 7, 2 (April), 158-180. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. LIN, Z. 1993. Concurrent frame signature ff}es. Distrib. Parallel Databases 1, 3 (July), 231-249. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. LiN, Z. AND F^LOUTSOS, C. 1992. Frame-sliced signature files. IEEE Trans. Knotvl. Data Eng. 4, 3 (June), 281-289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. MACWILLI^MS, F. J. AND SLO^NE, N. J. A. 1988. The Theorv of Error-Correcting Codes. North-Holland, Amsterdam, The Netherlands.Google ScholarGoogle Scholar
  19. PETERSON, W. W. AND WELDON, E.J. 1972. Error Correcting Codes. MIT Press, Cambridge, MA.Google ScholarGoogle Scholar
  20. POGUE, C. AND W{LLETT, P. 1987. Use of text signatures for document retrieval in a high parallel environment. Parallel Comput. 4, 259-268.Google ScholarGoogle ScholarCross RefCross Ref
  21. R()BErtTS, C.S. 1979. Partial match retrieval via the method of superimposed codes. Proc. IEEE 67, 12 (Dec.), 1624-1642.Google ScholarGoogle Scholar
  22. STANF{LL, C. AND KAHLE, B. 1986. Paralle} free-text search on the connection machine system. Commun. ACM 29, 12 (Der.), 1229-1239. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. SUNG, Y.Y. 1985. Parallel searching for binary Cartesian product files. In Proceedings of the 1985 ACM CSC Conference (New Or{eans, LA, March), 163-172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. TmERIO, P. AND ZEZULA, P. 1993. Selecting signature files for specif~c applications. Inf Process. Manage. 29, 4, 487-498. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. VERSOrFr, T. 1987. An updated table of minimum-distance bounds for binary linear codes. IEEE Trans. Inf Theory IT-33 (Sept.), 665-680. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. ZEZULA, P., CIACCIA, P., AND TIBER{O, P. 1993. Hamming filter: A dynamic signature file arganization for parallel stores. In Proceedings of the Nineteenth VLDB International Conference, (Dublin, Ireland, Aug.), 314-327. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. ZEZULA, P., RABITTI, F., AND TIBER~O, P. 1991. Dynamic partitioning of signature files. ACM Trans. Inf OEvst. 9, 4 (Oct.), 336-369. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Declustering of key-based partitioned signature files

                Recommendations

                Reviews

                Jaroslav Pokorny

                Declustering data on multiple disk systems has become an important research issue concerning partial match and range queries on both formatted and unformatted data. The paper proposes two parallel signature file organizations, Hamming filter and Hamming + filter. These organizations are based on parallelization of the quick filter (QF) key-based organization. Parallelization of QF can be considered as the problem of declustering buckets on disks so that, for each possible query, the I/O load is distributed uniformly among the available disks. In section 2, the authors give basic concepts related to binary linear error-correcting codes, key-based partitioned signature files, and Hamming filter (HF) organization [1]. This dynamic key-based signature file organization for multiple disks combines the dynamic clustering features of QF with a static declustering strategy based on linear error-correcting codes. Section 3 is devoted to an analysis of HF performance. Optimal, expected, and worst-case response times are derived. Further, it is shown that the performance of HF depends on the file size. Section 4 defines Hamming + filter (H + F), a new organization that guarantees high performance by automatically adjusting the static disk allocation strategy used in HF. The basic idea is to use variable-length keys not only to manage the signatures stored on each disk, but to assign signatures to disks. This requires using a sequence of error-correcting codes and one code for specific file size; it results in a smooth migration of signatures between disks, so no reorganization phase is needed. Section 5 summarizes experimental results on HF and H + F obtained by prototype systems developed by the authors. The tests focus on the quality of declustering, that is, how evenly the organization can distribute the I/O load on disks. H + F can guarantee good performance levels for even highly dynamic files, justifying the migration process that linearly reallocates signatures when the file grows or shrinks. Section 6 discusses related work and states some conclusions. The paper concludes with a number of appendices that contain the proofs of the most important theorems. The paper is written at a high level and significantly contributes to the theory of signatures. The results are also of practical importance.

                Access critical reviews of Computing literature here

                Become a reviewer for Computing Reviews.

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in

                Full Access

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader