The design of a retrieval technique for high-dimensional data on tertiary storage

Published: 01 June 2002

Abstract

In high-energy physics experiments, large particle accelerators produce enormous quantities of data, measured in hundreds of terabytes or petabytes per year, which are deposited onto tertiary storage. The experiments are designed to study the collisions of fundamental particles, called "events", each of which is represented as a point in a multi-dimensional universe. In these environments, the best retrieval performance can be achieved only if the data is clustered on the tertiary storage by all searchable attributes of the events. Since the number of these attributes is high, the underlying data-management facility must be able to cope with extremely large volumes and very high dimensionalities of data at the same time. The proposed indexing technique is designed to facilitate both clustering and efficient retrieval of high-dimensional data on tertiary storage. The structure uses an original space-partitioning scheme, which has numerous advantages over other space-partitioning techniques. While the main objective of the design is to support high-energy physics experiments, the proposed solution is appropriate for many other scientific applications.
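The abstract does not describe the proposed space-partitioning scheme, so the following is only a minimal sketch of the general idea it motivates: if points are clustered into storage units by all of their attributes, a multi-attribute range query needs to read only the units whose region overlaps the query. The sketch substitutes a generic k-d-style median split for the paper's original scheme. The names `partition`, `bounding_box`, `overlaps`, `range_query`, and the `capacity` parameter are hypothetical, and each bucket stands in for one clustered unit on tertiary storage.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def partition(points: Sequence[Point], capacity: int, depth: int = 0) -> List[List[Point]]:
    """Split points at the median, cycling through the dimensions.

    Generic k-d-style partitioning, NOT the scheme proposed in the paper.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if not points:
        return []
    if len(points) <= capacity:
        return [list(points)]
    dim = depth % len(points[0])                     # attribute used for this split
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return (partition(ordered[:mid], capacity, depth + 1)
            + partition(ordered[mid:], capacity, depth + 1))


def bounding_box(bucket: Sequence[Point]) -> Box:
    """Smallest axis-aligned box that contains every point of the bucket."""
    dims = range(len(bucket[0]))
    lower = tuple(min(p[d] for p in bucket) for d in dims)
    upper = tuple(max(p[d] for p in bucket) for d in dims)
    return lower, upper


def overlaps(box: Box, q_lo: Point, q_hi: Point) -> bool:
    """True if the box intersects the query range in every dimension."""
    lower, upper = box
    return all(lo <= qh and ql <= hi
               for lo, hi, ql, qh in zip(lower, upper, q_lo, q_hi))


def range_query(buckets: Sequence[Sequence[Point]], q_lo: Point, q_hi: Point):
    """Return (matching points, number of buckets that had to be read)."""
    hits: List[Point] = []
    buckets_read = 0
    for bucket in buckets:
        if overlaps(bounding_box(bucket), q_lo, q_hi):
            buckets_read += 1                        # stands in for one storage access
            hits.extend(p for p in bucket
                        if all(a <= x <= b for x, a, b in zip(p, q_lo, q_hi)))
    return hits, buckets_read


# Toy data: a 4 x 4 grid of two-attribute "events", at most 4 per bucket.
events = [(x, y) for x in range(4) for y in range(4)]
buckets = partition(events, capacity=4)
hits, buckets_read = range_query(buckets, (0, 0), (1, 1))
```

On this toy grid the query touches one of the four buckets. A real index would keep the bounding regions in a separate directory rather than recomputing them per query, and the behaviour of such simple splits at the very high dimensionalities the abstract mentions is exactly the problem the paper addresses.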

