skip to main content
article
Free Access

Dynamic partitioning of signature files

Published:01 October 1991Publication History
First page image

References

  1. 1 AHUJA, S. R., AND ROBERTS, C. S. An associative parallel processor for partial match retrieval using superimposed codes. In Proceedings of the 7 th IEEE Annual Symposium on Computer Architecture (May 1980), pp. 218-227. Google ScholarGoogle Scholar
  2. 2 BERTINO, E., RABITTI, F., AND GraBS, S. Query processing in a multimedia document system. ACM Trans. Off. Inf. Syst. 6 i (Jan. 1988), 1-41. Google ScholarGoogle Scholar
  3. 3 BLOOM, B.H. Space time trade-offs in hash coding with allowable errors. Commun. ACM 13 7 (July 1970), 422-426. Google ScholarGoogle Scholar
  4. 4 CHANG, W. W., ANn SCHEK, H.J. A Signature Access Method for the Starburst Database System. In Proceedings of l5th International Conference on VLDB (Amsterdam, Aug. 1989), pp. 145-153. Google ScholarGoogle Scholar
  5. 5 CHANG, J. W., LEE, J. H., AND LEE, Y. J. Multikey access methods based on term discrimination and signature clustering. In ACM SIGIR, Proceedings of International Conference on Research and Development ~n Information Retrieval (Cambridge, Mass., June 1989), pp. 176-185. Google ScholarGoogle Scholar
  6. 6 CHOU, H. T., DEWITT, D. J., KATZ, R. H., AND KLUG, A. C. Design and implementation of the Wisconsin storage system. Softw. Pract. Exper. 15, 10 (Oct. 1985), 943-962. Google ScholarGoogle Scholar
  7. 7 CHRISTODOULAKIS, S., AND FALOUTSOS, C. Signature files: An access method for documents and its analytical performance evaluation. ACM Trans. Off. Inf. Syst. 2, 4 (Oct. 1984), 267-288. Google ScholarGoogle Scholar
  8. 8 CHRISTODOULAKIS, S, THEODORIDOV, M., Ho, F., PAPA, M., PATHRIA, et. al. Multimedia document presentation, information extraction and document formation in MINOS: A model and a system. ACM Trans. Off. Inf. Syst. 4, 4 (Oct. 1986), 345-383. Google ScholarGoogle Scholar
  9. 9 COLOMB, R. M., AND JAYASOORIAH, J. A clause indexing system for PROLOG based on superimposed coding. Australian Comput. J. 18, I (Feb. 1986), 18-25. Google ScholarGoogle Scholar
  10. 10 DEPPISCn, U. S-tree: A dynamic balanced signature index for office retrieval. In Proceedings of the ACM Conference on Research and Development in Informatwn Retrieval (Pisa, Sept. 8-10, 1986), pp. 77-87. Google ScholarGoogle Scholar
  11. 11 DUBOIS, D., AND PRADE, H. Fuzzy Sets and Systems. Theory and Applications. Academic Press, New York, 1980. Google ScholarGoogle Scholar
  12. 12 FAGIN, R., NIEVERGELT, N., PIPPENGER, N., AND STRONG, H.R. Extendible hashing - A fast access method for dynamic files. ACM Trans. Database Syst. 4, 3 (Sept. 1979), 315-344. Google ScholarGoogle Scholar
  13. 13 FALOUTSOS, C. Integrated access methods for message using signature files. In IFIP Working Conference on Methods and Tools for Office Systems (Pisa, Oct. 1986), pp. 135-157.Google ScholarGoogle Scholar
  14. 14 FALOUTSOS, C. Multiattribute hashing using gray codes. In Proceedings ofACM SIGMOD '86 (Washington, D.C., May 1986), pp. 227-238. Google ScholarGoogle Scholar
  15. 15 FALOTZTSOS, C. Signature-based text retrieval methods: A survey. In IEEE Computer Society Technical Committee on Data Engineering. Special issue on document retrieval, 13, 1 (Mar. 1990), 25 32. Google ScholarGoogle Scholar
  16. 16 FALOUTSOS, C., AND CYIRISTOI)OULAKIS, S. Description and performance analysis of signature file methods for office filing. ACM Trans. Off. Inf. Syst. 5, 3 (July 1987), 237-257. Google ScholarGoogle Scholar
  17. 17 FREESTON, M.W. The BANG file: a new kind of grid file. In Proceedings ofACM SIGMOD '87 (San Francisco, May 1987), pp. 260-269. Google ScholarGoogle Scholar
  18. 18 GONNET, G. H., AND LA~SON~ P. External hashing with limited internal storage. In Proceedings of ACM Symposium on Principles of Database Systems (Los Angeles, Calif. 1982), pp. 256-261. Google ScholarGoogle Scholar
  19. 19 GONNET, G. H., AND LARSON, P. External hashing with limited internal storage. J. ACM 35, 1 (Jan. lggS), 161-261. Google ScholarGoogle Scholar
  20. 20 GRAY, F. Pulse code communications. U.S. Patent 2632058, March 17, 1953.Google ScholarGoogle Scholar
  21. 21 HUTFLESZ, A., Six, H. W., AND WIDMAYER, P. Twin grid files: Space optimizing accessGoogle ScholarGoogle Scholar
  22. 22 LARSON, P. Linear hashing with partial expansions In Froceeatngs o1' Oth lnternattonat Conference on VLDB (Montreal, Oct. 1980), VLDB Endowment, 1980, pp. 212-223Google ScholarGoogle Scholar
  23. 23 LARSON, P., AND KAJLA, A. File organization - implementation of a method guaranteeing retrieval m one access. Commun. ACM 27, 7 (July 1984), 670-677 Google ScholarGoogle Scholar
  24. 24 LEE, D. L. A word-parallel, bit-serial signature processor for superimposed coding. In Proceedings of the 2nd IEEE International Conference on Data Engineering (Los Angeles, Calif., Feb. 1986), pp. 352-359. Google ScholarGoogle Scholar
  25. 25 LEE, D. L., AND LENG, C. Partitioned signature files: Design issues and performance evaluation. ACM Trans. Off. Inf. Syst. 7, 2 (April 1989), 158-180 Google ScholarGoogle Scholar
  26. 26 LEE, D. L., AND LENG, C. A Partitioned signature file structure for multiattribute and text retrieval. In Proceedings of the Sixth IEEE International Conference on Data Engineering (Los Angeles, Calif., Feb. 1990), pp. 389-397. Google ScholarGoogle Scholar
  27. 27 LITWlN, W. Linear hashing: a new tool for files and table addressing. In Proceedings of 6th Internatzonal Conference on VLDB (Montreal, Oct. 1980), pp. 212-223Google ScholarGoogle Scholar
  28. 28 MClLROY, M, D. Development of a spelling list. IEEE Trans Commun. COM-30, 1 (Jan. 1982), 91-99.Google ScholarGoogle Scholar
  29. 29 NmVEROELT, J., HINTERBERGER, H., AND SEVC~K, K C. The grid file: An adaptable, symmetric multikey file structure. ACM Trans. Database Syst. 9, 1 (Mar. 1984), 38-71. Google ScholarGoogle Scholar
  30. 30 OTOO, E. J Balanced multidimensional extendible hash tree. In Procee&ngs of the 5th A CM SIGA CT / SIGMOD Symposium on Principles of Database Systems. (Cambmdge, Mass., Mar 1986) pp 100-113. Google ScholarGoogle Scholar
  31. 31 PFALTZ, J. L., BERMAN, W. J., AND CACLEY, E. M. Partial match retrieval using indexed descriptor files. Comrnun. ACM. 23, 9 (Sept 1980), 522-528. Google ScholarGoogle Scholar
  32. 32 RABITTI, F., AND ZEZULA, P. A dynamic signature technique for multimedia databases. In Proceedings of the ACM SIGIR '90. Internatwnal Conference on Research and Development m Informatwn Retrieval (Brussels, Sept. 1990), pp. 193-210. Google ScholarGoogle Scholar
  33. 33 RABIrTI, F., AND ZLZKA, J. Evaluation of access methods to text documents in office systems. In Proceedzngs of 3rd Joint ACM-BCS Symposium on Research and Development ~n Informatwn Retrieval (Cambridge, England, July 1984), pp. 21-40. Google ScholarGoogle Scholar
  34. 34 RAMAMOHANARAO, t~., AND LLOYD, L W Dynamic hashing schemes Comput J. 25, 4 (Nov. 1982), 478-485.Google ScholarGoogle Scholar
  35. 35 RAMAMOHANARAO, K., AND SACKS-DAvIS, R. Recursive linear hashing. ACM Trans. Database Syst. 9, 3 (Sept. 1984), 369-391. Google ScholarGoogle Scholar
  36. 36 RAMAMOHANARAO, K , AND SHEPHERD, J. A superimposed codeword indexing scheme for very large prolog databases. In Proceedings of the 3rd International Conference on Logic Programming (London, July 1986), pp. 569-576. Google ScholarGoogle Scholar
  37. 37 RAMAMOHANARAO, K., AND TOUT, W.R. Dynamic external hashing with guaranteed single access retrieval. In Proceedings of FODO 1989 (Paris, June 1989), Springer-Verlag, pp. 187-201 Google ScholarGoogle Scholar
  38. 38 ROBERTS, C. S. Partial match retrieval via the method of the superimposed codes. Prec. IEEE 67, 12 (Dec. 1979), 1624-1642.Google ScholarGoogle Scholar
  39. 39 Romr4SON, J T. The K-D-B-Tree: A search structure for large multidimensional dynamic index. In Proceedings ofACM SIGNOD '81 (Ann Arbor, Mich., Apr.-May 1981), pp. 10-18. Google ScholarGoogle Scholar
  40. 40 SACKS-DAviS, R., AND RAMAMOHANARAO, K. Multikey access methods based on superimposed coding t~chniquos. ACM Trans. Database Syst 12, 4 (Dec. 1987), 655 696. Google ScholarGoogle Scholar
  41. 41 SACK-DAviS, R., AND RAMAMOHANARAO, K. A two level superimposed coding scheme for partial match retrieval. Inf. Syst. 8, 4 (Oct. 1983), 273-280.Google ScholarGoogle Scholar
  42. 42 SPRU6NOLI, R.J. Perfect hashing functions: A single probe retrieval method for static sets. Comrnun. ACM. 29, 12 (Dec. 1986), 1229-1239. Google ScholarGoogle Scholar
  43. 43 STANFILL, C., AND KAHLE, B. Parallel free-text on the connection machine system. Commun ACM. 29, 12 (Dec. 1986), 1229-1239. Google ScholarGoogle Scholar
  44. 44 THANOS, C., ED. Multimedia office filing and retrieval: The MULTOS approach. In North- Holland Series ~n Human Factors in Information Technology, North-Holland, Amsterdam, 1990. Google ScholarGoogle Scholar
  45. 45 TSICHRITZIS, D., CHRISTODOULAKIS, S., ECONOMOPOULES, P., FALOUTSOS, C., LEE, A., LEE, D., VA~DEMBROEK, J., AND Woo, C. A multimedia office filing system. In Proceedings of 9th International Conference on VLDB (Florence, Oct. 1983), VLDB Endowment, 1983, pp. 2-7. Google ScholarGoogle Scholar
  46. 46 WONG, H. K. T., LIu, H., OLKEN, F., ROTEM, D., AND WONG, L. Bit transposed files. In Proceedings of llth International Conference on VLDB (Stockholm, Aug. 1985), pp. 448-457.Google ScholarGoogle Scholar
  47. 47 Z~ZUL^, P., AND ZIZKA, J. FOPES: File organization performance prediction system In Foundations of Data Organization. S. Ghosh, Y. Kambayashi, and K. Tanaka, Eds. Plenum Press, New York, 1987, pp. 399-406.Google ScholarGoogle Scholar
  48. 48 ZEZULA, P. How to organize statistical files. J. Official Stat 3, i (Jan. 1987), 3-17.Google ScholarGoogle Scholar
  49. 49 ZEZULA, P. Linear hashing for signature files. In Proceedings of the IFIP TC6 and TC8 International Symposium on Network Information Processing Systems. (Sofia, Bulgaria, May 1988), pp. 192-196. Also available from Network Information Processing Systems, K. Boyanov, and R. Angelinov Eds. Elsevier Science, IFIP, 1989, 243-250.Google ScholarGoogle Scholar
  50. 50 ZEZULA, P., AND TmEmo, P. Selecting signature files for specific applications. CIOC-CNR, Bologna, Tech. Rep. 74, Nov. 1990.Google ScholarGoogle Scholar

Index Terms

  1. Dynamic partitioning of signature files

                    Recommendations

                    Reviews

                    Fazli Can

                    Signature files are especially useful for the retrieval of unformatted data such as text. In a signature file, each record is represented by a bit string called its signature. The weak point of this approach is the difficulty of term weight preservation; it is known that the quality of information retrieval is greatly affected by term weights. In this paper, a new signature file structure, quick filter, is introduced and evaluated. The method exploits the linear hashing scheme and, therefore, is appropriate for dynamic environments without the worry of updating an index file. The method is applicable to superimposed signatures, and it partitions the file using signature suffixes. The experiments show that it is suitable for applications supported by large files that are searched by query signatures with high weights (that is, with many 1s). The experiments also considered the relationship between query response time and the distribution of signature pages on the secondary storage. The space and retrieval performance comparisons of quick filter with sequential, bit-slice, and S-tree signature file structures demonstrate that the method is space efficient and provides the quickest file access for queries with high weights. This clearly written paper is useful for researchers working on methods of access to unformatted data. The extensive literature review, the derivation of the performance estimation formulas, the experimental design, and the pointers for further research will be especially beneficial. The method is simple and nice. The paper is enjoyable and worthy of the reader's attention.

                    Access critical reviews of Computing literature here

                    Become a reviewer for Computing Reviews.

                    Comments

                    Login options

                    Check if you have access through your login credentials or your institution to get full access on this article.

                    Sign in

                    Full Access

                    • Published in

                      cover image ACM Transactions on Information Systems
                      ACM Transactions on Information Systems  Volume 9, Issue 4
                      Oct. 1991
                      111 pages
                      ISSN:1046-8188
                      EISSN:1558-2868
                      DOI:10.1145/119311
                      Issue’s Table of Contents

                      Copyright © 1991 ACM

                      Publisher

                      Association for Computing Machinery

                      New York, NY, United States

                      Publication History

                      • Published: 1 October 1991
                      Published in tois Volume 9, Issue 4

                      Permissions

                      Request permissions about this article.

                      Request Permissions

                      Check for updates

                      Qualifiers

                      • article

                    PDF Format

                    View or Download as a PDF file.

                    PDF

                    eReader

                    View online with eReader.

                    eReader