2001 | OriginalPaper | Book chapter
KDD Services at the Goddard Earth Sciences Distributed Active Archive Center
Authors: Christopher Lynnes, Robert Mack
Published in: Data Mining for Scientific and Engineering Applications
Publisher: Springer US
Included in: Professional Book Archive
NASA’s Goddard Earth Sciences Distributed Active Archive Center (GES DAAC) processes, stores, and distributes Earth science data from a variety of remote sensing satellites. End users of the data range from instrument scientists to global change and climate researchers to federal agencies and foreign governments. Many of these users apply Knowledge Discovery from Databases (KDD) techniques to large volumes of data (on the order of a terabyte) received from the GES DAAC. However, rapid advances in computing power are enabling growth in data processing that outpaces tape drive performance and network capacity. As a result, the proportion of data that can be distributed to users continues to decrease. To mitigate this, we are migrating more knowledge extraction activities (e.g., data mining and data reduction) into the data center, both to reduce the data volume that needs to be distributed and to offer users a more useful and manageable product. This migration faces several technical and human-factor challenges. Because data reduction and mining algorithms are often quite specific to the user’s research needs, the user’s algorithm must be integrated virtually unchanged into the archive environment. In addition, the archive itself is busy with everyday data archive and distribution activities and cannot be dedicated to, or even impacted by, the mining activities. Therefore, we schedule KDD “campaigns”, during which we carry out a wholesale retrieval of specific data products and offer users the opportunity to extract information from the data as it is retrieved.
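The campaign idea can be illustrated with a minimal sketch. It is not the GES DAAC implementation: the `Campaign` class, the `register` and `run` methods, the user names, and the representation of a data granule as a list of floats are all hypothetical, chosen only to show the pattern the abstract describes. Each granule is read once during the bulk retrieval, and every registered user algorithm is applied to it as an unmodified callable, so only the reduced results need to be distributed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-ins: a "granule" is one retrieved data file,
# modeled here as a plain list of floats.
Granule = List[float]
UserAlgorithm = Callable[[Granule], float]


@dataclass
class Campaign:
    """One scheduled bulk retrieval of a data product, shared by many users."""
    product: str
    algorithms: Dict[str, UserAlgorithm] = field(default_factory=dict)

    def register(self, user: str, algorithm: UserAlgorithm) -> None:
        # The user's algorithm is stored as an opaque callable and never modified.
        self.algorithms[user] = algorithm

    def run(self, granules: Iterable[Granule]) -> Dict[str, List[float]]:
        results: Dict[str, List[float]] = {user: [] for user in self.algorithms}
        for granule in granules:  # single pass: each granule is retrieved once
            for user, algorithm in self.algorithms.items():
                results[user].append(algorithm(granule))
        return results


def demo() -> Dict[str, List[float]]:
    campaign = Campaign(product="example_product")  # placeholder product name
    campaign.register("alice", max)                         # per-granule maximum
    campaign.register("bob", lambda g: sum(g) / len(g))     # per-granule mean
    granules = [[1.0, 2.0, 3.0], [4.0, 6.0]]
    return campaign.run(granules)
```

In this sketch the archive pays the retrieval cost once per granule regardless of how many users participate, and each user receives one number per granule instead of the granule itself.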