19th International CODATA Conference
Category: Knowledge Discovery

Data Mining Techniques in Support of Science Data Stewardship

Eric A. Kihn (Eric.A.Kihn@noaa.gov), NOAA/NGDC, USA
Mikhail Zhizhin (jjn@wdcb.ru), RAS/CGDS, Russia


"It's sink or swim as a tidal wave of data approaches. Are scientists ready for the flood?" -Nature, June 1999. This quote illustrates a problem increasingly facing not only the NOAA/NESDIS Data Centers but also the scientific community at large: the number of eyes looking at data remains constant while the amount of data collected tends to follow Moore's law. Most researchers are accustomed to focusing on a relatively small data set for a long time, using specific techniques to tease out patterns. At some fundamental level, however, that paradigm has broken down. The most interesting problems facing science today involve the integration and analysis of very large and often distributed environmental archives, such as those managed by the NESDIS environmental data centers. New techniques are required to turn data into knowledge effectively and completely by extracting the information content from multiple, disparate data sources. Mathematically based methods exist for analysis, classification, and forecasting over large data volumes; in particular, fuzzy-logic-based systems hold great promise as knowledge tools. These systems are capable of both mimicking and capturing human expertise, thereby increasing the effective utilization of our vast environmental archives.

Tools based on fuzzy-logic search and classification schemes can be coupled with the emerging global data access infrastructure for:

  • Data quality control
  • Event detection and long-term trend detection
  • Data classification
  • Forecast
  • Change detection

The last item above is particularly important because it focuses the limited personnel and expertise of the scientific community on the most interesting parts of the data. Whether the deviations detected represent fundamental changes in the environment being observed or changes in the instrumentation, capturing and analyzing that knowledge is crucial to understanding the whole Earth system. This talk will present results of our work in this area and discuss trends and challenges in the broader community.
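
As an illustration only, the idea of fuzzy change detection can be sketched in a few lines of Python. This is not the authors' system: the function names (`ramp`, `change_scores`), the trailing-window baseline, and the thresholds `lo` and `hi` are all assumptions chosen for the example. Each sample receives a degree of membership between 0 and 1 in the fuzzy set "unusual", based on how far it deviates from the samples just before it.

```python
from statistics import mean, stdev


def ramp(x, lo, hi):
    """Piecewise-linear fuzzy membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def change_scores(series, window=5, lo=2.0, hi=4.0):
    """Return, for each sample, its degree of membership (0..1) in 'unusual'.

    Hypothetical scheme: the deviation of a sample is measured in standard
    deviations of the preceding `window` samples, then mapped through `ramp`.
    The first `window` samples have no baseline and are scored 0.
    """
    scores = [0.0] * len(series)
    for i in range(window, len(series)):
        ref = series[i - window:i]
        mu, sigma = mean(ref), stdev(ref)
        diff = abs(series[i] - mu)
        if sigma == 0:
            # Flat baseline: any departure at all counts as fully unusual.
            z = 0.0 if diff == 0 else float("inf")
        else:
            z = diff / sigma
        scores[i] = ramp(z, lo, hi)
    return scores


if __name__ == "__main__":
    # Made-up readings with a step change at index 6.
    readings = [10.0, 10.1, 9.9, 10.0, 10.1, 10.0, 15.0, 15.1]
    for value, score in zip(readings, change_scores(readings)):
        print(f"{value:6.1f}  unusual={score:.2f}")
```

The graded score, rather than a hard yes/no flag, is what lets an analyst rank candidate events and review the strongest ones first. Whether a flagged change reflects the environment or the instrument still requires expert judgment.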