T2: a customizable parallel database for multi-dimensional data

Authors:
Chialin Chang, Dept. of Computer Science, University of Maryland, College Park, MD
Anurag Acharya, Dept. of Computer Science, University of California, Santa Barbara, CA
Alan Sussman, Dept. of Computer Science, University of Maryland, College Park, MD
Joel Saltz, Dept. of Computer Science, University of Maryland, College Park, MD and Dept. of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD

ACM SIGMOD Record, Volume 27, Issue 1, March 1998, pp. 58–66. https://doi.org/10.1145/273244.273264

Published: 1 March 1998

Abstract

As computational power and storage capacity increase, processing and analyzing large volumes of data play an increasingly important part in many domains of scientific research. Typical examples of large scientific datasets include long-running simulations of time-dependent phenomena that periodically generate snapshots of their state (e.g., hydrodynamics and chemical transport simulation for estimating pollution impact on water bodies [4, 6, 20], magnetohydrodynamics simulation of planetary magnetospheres [32], simulation of a flame sweeping through a volume [28], airplane wake simulations [21]), archives of raw and processed remote sensing data (e.g., AVHRR [25], Thematic Mapper [17], MODIS [22]), and archives of medical images (e.g., confocal light microscopy, CT imaging, MRI, sonography).

These datasets are usually multi-dimensional. The data dimensions can be spatial coordinates, time, or experimental conditions such as temperature, velocity or magnetic field. The importance of such datasets has been recognized by several database research groups and vendors, and several systems have been developed for managing and/or visualizing them [2, 7, 14, 19, 26, 27, 29, 31].

These systems, however, focus on lineage management, retrieval and visualization of multi-dimensional datasets. They provide little or no support for analyzing or processing these datasets -- the assumption is that this is too application-specific to warrant common support. As a result, applications that process these datasets are usually decoupled from data storage and management, resulting in inefficiency due to copying and loss of locality. Furthermore, every application developer has to implement complex support for managing and scheduling the processing.

Over the past three years, we have been working with several scientific research groups to understand the processing requirements for such applications [1, 5, 6, 10, 18, 23, 24, 28]. Our study of a large set of applications indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset and the result being computed have underlying multi-dimensional grids, and queries into the dataset take the form of ranges within each dimension of the grid. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid, and computing each output item by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. For example, remote-sensing earth images are often generated by performing atmospheric correction on several days' worth of raw telemetry data, mapping all the data to a latitude-longitude grid, and selecting those measurements that provide the clearest view.
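The processing pattern described above (range query, transform, map to an output grid, aggregate per grid point) can be illustrated with a minimal Python sketch. This is not T2's code or interface: the names `in_range` and `process`, the toy readings, and the stand-in "correction" are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple

Point = Tuple[float, ...]               # coordinates of one input item
Box = Tuple[Tuple[float, float], ...]   # one (low, high) range per dimension
Cell = Tuple[int, ...]                  # index of an output grid point


def in_range(point: Point, box: Box) -> bool:
    """True if the point lies inside the query range in every dimension."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, box))


def process(
    items: Iterable[Tuple[Point, int]],
    box: Box,
    transform: Callable[[int], int],
    map_to_cell: Callable[[Point], Cell],
    aggregate: Callable[[int, int], int],
) -> Dict[Cell, int]:
    """Hypothetical helper: select, transform, map, and aggregate input items."""
    out: Dict[Cell, int] = {}
    for point, value in items:
        if not in_range(point, box):
            continue                      # item is outside the queried ranges
        v = transform(value)              # e.g. a per-item correction
        cell = map_to_cell(point)         # output grid point for this item
        out[cell] = v if cell not in out else aggregate(out[cell], v)
    return out


# Toy (latitude, longitude) readings; a larger value means a clearer view
# (an assumption made only for this example).
readings = [
    ((10.2, 20.7), 40),
    ((10.8, 20.1), 90),
    ((11.5, 20.3), 60),
    ((50.0, 50.0), 99),   # falls outside the query ranges below
]
query_box = ((10.0, 12.0), (20.0, 21.0))

result = process(
    readings,
    query_box,
    transform=lambda v: v - 5,                       # stand-in for a correction step
    map_to_cell=lambda p: (int(p[0]), int(p[1])),    # one-degree output grid
    aggregate=max,                                   # keep the clearest measurement
)
print(result)
```

Here the two readings that fall in the same one-degree cell are reduced to the single clearest one, mirroring the remote-sensing example in the text.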

In this paper, we present T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for many operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine and user interaction. It achieves its primary advantage from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and process multiple datasets with different underlying grids. Most other systems for multi-dimensional data have focused on uniformly distributed datasets, such as images, maps, and dense multi-dimensional arrays. Many real datasets, however, are non-uniform or unstructured. For example, satellite data is a two-dimensional strip embedded in a three-dimensional space, and water contamination studies use unstructured meshes to selectively simulate regions. T2 can handle both uniform and non-uniform datasets.

T2 has been developed as a set of modular services. Since its structure mirrors that of a wide variety of applications, T2 is easy to customize for different types of processing. To build a version of T2 customized for a particular application, a user has to provide functions to pre-process the input data, map input data to elements in the output data, and aggregate multiple input data items that map to the same output element.
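One way to picture this customization step is as a bundle of three user-supplied functions handed to a generic driver. The sketch below assumes a hypothetical `Customization` record and `run` driver; neither name comes from T2, and the comma-separated input format is invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Customization:
    """Hypothetical bundle of the three user-supplied functions."""
    preprocess: Callable[[Any], Any]                 # raw input -> input item
    map_to_output: Callable[[Any], Tuple[int, ...]]  # input item -> output element
    aggregate: Callable[[Any, Any], Any]             # combine items on one element


def run(custom: Customization, raw_items: Iterable[Any]) -> Dict[Tuple[int, ...], Any]:
    """Generic driver: it knows nothing about the application's data."""
    output: Dict[Tuple[int, ...], Any] = {}
    for raw in raw_items:
        item = custom.preprocess(raw)
        key = custom.map_to_output(item)
        output[key] = item if key not in output else custom.aggregate(output[key], item)
    return output


def parse_line(line: str) -> Tuple[int, int, float]:
    """Parse an invented 'x,t,temperature' record."""
    x, t, temp = line.split(",")
    return int(x), int(t), float(temp)


# An invented application: keep the warmest sample in each 10-unit-wide bin.
warmest_per_bin = Customization(
    preprocess=parse_line,
    map_to_output=lambda item: (item[0] // 10,),
    aggregate=lambda a, b: a if a[2] >= b[2] else b,
)

output = run(warmest_per_bin, ["3,0,12.5", "7,1,14.0", "15,0,9.0"])
print(output)
```

Swapping in a different `Customization` changes the application without touching `run`, which is the kind of separation the paragraph above describes.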

T2 presents a uniform interface to the end users (the clients of the database system). Users specify the dataset(s) of interest, a region of interest within the dataset(s), and the desired format and resolution of the output. In addition, they select the mapping and aggregation functions to be used. T2 analyzes the user request, builds a suitable plan to retrieve and process the datasets, executes the plan and presents the results in the desired format.
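A user request of this shape, and a very simple plan built from it, might be sketched as follows. The abstract does not say how T2 represents requests or constructs plans, so everything here is an assumption: the `Query` and `Chunk` records, the idea that a dataset is stored as chunks with bounding boxes, and the round-robin assignment of chunks to processors.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[Tuple[float, float], ...]   # one (low, high) range per dimension


@dataclass(frozen=True)
class Query:
    """Hypothetical user request: what to read and how to process it."""
    dataset: str
    region: Box                      # region of interest
    output_shape: Tuple[int, ...]    # desired output resolution
    map_fn: str                      # name of the selected mapping function
    aggregate_fn: str                # name of the selected aggregation function


@dataclass(frozen=True)
class Chunk:
    """Assumed storage unit: a block of data with a known bounding box."""
    chunk_id: int
    bounds: Box


def intersects(a: Box, b: Box) -> bool:
    """True if two boxes overlap in every dimension."""
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))


def build_plan(query: Query, chunks: List[Chunk], num_procs: int) -> Dict[int, List[int]]:
    """Pick the chunks that overlap the region and deal them out round-robin."""
    plan: Dict[int, List[int]] = {p: [] for p in range(num_procs)}
    selected = [c for c in chunks if intersects(c.bounds, query.region)]
    for i, chunk in enumerate(selected):
        plan[i % num_procs].append(chunk.chunk_id)
    return plan


chunks = [
    Chunk(0, ((0, 10), (0, 10))),
    Chunk(1, ((10, 20), (0, 10))),
    Chunk(2, ((0, 10), (10, 20))),
    Chunk(3, ((10, 20), (10, 20))),
]
query = Query("demo_dataset", ((12, 18), (5, 15)), (64, 64), "to_grid", "max")

plan = build_plan(query, chunks, num_procs=2)
print(plan)
```

Only chunks 1 and 3 overlap the requested region, so each of the two processors receives one chunk. A real planner would also need to account for memory limits and output partitioning, which this sketch ignores.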

In Section 2 we first present several motivating applications and illustrate their common structure. Section 3 then presents an overview of T2, including its distinguishing features and a running example. Section 4 describes each database service in some detail. An example of how to customize several of the database services for a particular application is given in Section 5. T2 is a system in evolution. We conclude in Section 6 with a description of the current status of both the T2 design and the implementation of various applications with T2.

Index Terms

T2: a customizable parallel database for multi-dimensional data
1. Information systems
  1. Data management systems
    1. Database design and models
  2. Information systems applications

Published in
ACM SIGMOD Record, Volume 27, Issue 1, March 1998, 103 pages
ISSN: 0163-5808
DOI: 10.1145/273244
Editor: Michael Franklin, Univ. of Maryland, College Park
Copyright © 1998 Authors
Publisher: Association for Computing Machinery, New York, NY, United States