2001 | OriginalPaper | Chapter
Data Mining in Integrated Data Access and Data Analysis Systems
Authors: Ruixin Yang, Menas Kafatos, Kwang-Su Yang, X. Sean Wang
Published in: Data Mining for Scientific and Engineering Applications
Publisher: Springer US
Included in: Professional Book Archive
The rapid increase in the volume of scientific data sets has given rise to distributed data information systems for Earth system science. Such a system should help users locate data sets, obtain preliminary research results quickly, and receive data deliveries on request. At George Mason University, we have been developing a data information system with both search and analysis components. The system supports three phases of data access: phase one for meta-data search, phase two for on-line data analysis, and phase three for data ordering. For large volumes of data, searching on meta-data alone is not adequate; scientists often need to search for data based on actual data values. This is a particular kind of data mining, which searches for data sets based on data content.

In this chapter, we first describe the system architecture. We then develop the concept of a data pyramid model and propose a histogram clustering technique for content-based searches. We use the model and the related technique to answer content-based queries approximately but efficiently. We also describe our prototypes that integrate content-based searches into a data information system.
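To make the idea of an approximate content-based query concrete, the following is a minimal Python sketch, not the chapter's actual method. It assumes each spatial cell of a data set is summarized in advance by a histogram of its values, and that a query such as "find cells where at least 75% of the values lie between 20 and 40" is answered from those histograms alone, without reading the raw data. The function names, bin edges, cell names, and sample values are all hypothetical, and the sketch omits both the pyramid of resolutions and the clustering of histograms that the abstract mentions.

```python
from bisect import bisect_right


def build_histogram(values, edges):
    """Count the values falling in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


def estimate_fraction(counts, edges, lo, hi):
    """Estimate the fraction of values in [lo, hi) from the histogram only.

    Assumption: values are spread uniformly inside each bin, so a bin that
    partially overlaps the query range contributes proportionally.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    covered = 0.0
    for i, count in enumerate(counts):
        left, right = edges[i], edges[i + 1]
        overlap = max(0.0, min(hi, right) - max(lo, left))
        covered += count * overlap / (right - left)
    return covered / total


def content_search(summaries, edges, lo, hi, min_fraction):
    """Return the names of cells whose estimated fraction meets the threshold."""
    return [
        name
        for name, counts in summaries.items()
        if estimate_fraction(counts, edges, lo, hi) >= min_fraction
    ]


# Hypothetical data: three cells with a handful of values each.
edges = [0, 10, 20, 30, 40]
cells = {
    "cell_a": [1, 2, 3, 25],
    "cell_b": [21, 22, 35, 38],
    "cell_c": [5, 15, 25, 35],
}

# The summaries are built once; queries afterwards touch only the histograms.
summaries = {name: build_histogram(vals, edges) for name, vals in cells.items()}
matches = content_search(summaries, edges, lo=20, hi=40, min_fraction=0.75)
```

In this sketch the answer is exact only when the query bounds coincide with bin edges; otherwise it is an estimate whose accuracy depends on the bin width, which is the kind of trade-off between approximation and efficiency the abstract describes.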