Workshop on Mathematical and Telematics Techniques for Large Data Sets Characterization and Compression
15-16 July 1999

Alexei Gvishiani*, Jacques-Octave Dubois**, Alexander Beriozko*

* Centre of Geophysical Data Studies and Telematics Applications (CGDS) RAS
** Institut de Physique du Globe de Paris (IPGP)

On 15-16 July 1999 the CODATA Task Group (TG) "Comparative Mathematical Methodologies for Data Handling and Knowledge Extraction" held the workshop "Mathematical and Telematics Techniques for Large Data Sets Characterization and Compression". The workshop was organized in Paris, France, in collaboration with the Paris Institute of Physics of the Earth (Institut de Physique du Globe de Paris, IPGP). The sessions took place on the premises of IPGP, one of the largest Earth Sciences institutes in Europe.

The principal objectives of the workshop were:

to discuss, evaluate and select a number of concrete comparative mathematical, artificial intelligence and telematics technologies to be applied by the TG in 1999-2000 to "sample" databases;
to establish "sample" databases, accessible on-line, in Earth and Planetary Sciences and other disciplines, to be studied by the TG using the above techniques;
to elaborate a plan for organizing the TG work as a structured, goal-oriented virtual laboratory based in different parts of the world;
to discuss and prepare concrete plans for the TG projects and for the demonstration of their results at the CODATA Conference in Italy in October 2000.

Thirty-seven TG members, experts and affiliated participants from different parts of the world took an active part in the workshop, representing a wide circle of specialists in pure and applied mathematics, artificial intelligence, computer networking technologies, Earth Sciences, civil engineering, botany, space physics, chemistry, archeology, etc. Scientists and data specialists from France, Germany, Russia, Ukraine, USA and Uzbekistan participated in the workshop.

The workshop was opened with a short presentation by the TG Chairman, Prof. Alexei Gvishiani (CGDS, Russia), who formulated the workshop goals and objectives and presented the necessary background information. He presented the working plan of the workshop and outlined its expected output. Ms. Kathleen Cass, the new CODATA Executive Director, and Prof. Jacques-Emile Dubois, CODATA past President and the Executive Council member coordinating the TG activities, greeted the workshop on behalf of CODATA and introduced CODATA's objectives and structure to the workshop participants. Prof. Claude Jaupart, IPGP Director, gave welcoming remarks, emphasizing that Earth Sciences in general and geophysics in particular provide one of the most important test areas for mathematical and telematics techniques of data compression. He welcomed recent CODATA moves in the direction of geophysics. Prof. Jacques-Octave Dubois (IPGP, France), TG Vice-Chairman, presented the workshop programme in detail.

UNESCO was represented at the workshop by Dr. John Rose, the coordinator of the "UNESCO Virtual Laboratories Programme". The workshop also attracted a number of IPGP students.

The workshop was divided into three working sessions and computer demonstrations. The sessions focused on: 1) virtual laboratories for data handling and representation; 2) mathematical, logical, informatics and telematics techniques for large geophysical and geodynamical data sets representation; 3) mathematical and artificial intelligence techniques for data characterization and compression. The complete workshop programme and the list of participants will be available shortly on the Web-site of the TG. The draft version of the Web-site was discussed and positively evaluated by the TG.

In Session 1, devoted to virtual laboratories, a kick-off paper on teleworking for the Integrated World Data Center System (WDC) was presented by its authors (Herbert Kroehl, Eric Kihn, NOAA, USA; Alexei Gvishiani, Mikhail Zhizhin, CGDS, Russia). The paper described the history, needs, requirements, system design, applications and benefits of the WDCs, and introduced powerful new tools for the discovery, access, browsing and delivery of data on the Internet.

The following presentation narrowed the topic of the first paper to the American-Russian teleworking project SPIDR-II (Mikhail Zhizhin, Alexei Gvishiani, Alexei Burtsev, Tatiana Ilyina, CGDS, Russia; Herbert Kroehl, Eric Kihn, NOAA, USA), developed in the field of solar-terrestrial and satellite data representation, characterization and extraction. SPIDR-II is based on a distributed network of synchronous databases and application servers, with Java middleware for data mining, visualization and delivery. The major features of the system are: open source; portability; scalability; unified access to various "pluggable" relational databases; a Web-based data-shopping interface; automatic database and software synchronization; dynamic time-series and imagery visualization; fuzzy search engines; and AI-based data mining and forecasting.

These two key presentations of Session 1 demonstrated to the CODATA audience a system and working techniques capable of ingesting and processing any data at any time, making automated inventories, accessing and retrieving data values, disseminating data in any format, and conducting data quality assessment. The system was developed using virtual laboratory techniques, which are based on new methods of teamwork. Such methods allow distributed teams to develop concrete, goal-oriented projects efficiently, and they support inter-team communication that overcomes language barriers, as well as distributed project management and synchronization. This new approach to large data sets characterization and compression can be extremely useful in the future for the TG and for CODATA in general.

Continuing the theme of distributed teamwork, Prof. Jean Bonnin (Institute of Physics of the Earth, Strasbourg, France) presented the application of the virtual laboratory methodology in the "Europa Major Risks Project" of the Open Partial Agreement on Major Disasters of the Council of Europe. In his second presentation J. Bonnin narrowed this topic to earthquake parametrization as an example of knowledge extraction.

Dr. John Rose (UNESCO) reported on the summary, findings and recommendations of the International Expert Meeting on Virtual Laboratories, which was held on 10-12 May 1999 in Ames, Iowa, USA, by the International Institute of Theoretical and Applied Physics (IITAP) and UNESCO. Several strategies and technical recommendations, including the collaborative browsing approach, are of real interest for the TG and for CODATA in general.

Prof. Anne Kiremidjian (Stanford University, USA) gave an in-depth presentation on methods and issues of data acquisition, integration, processing and referencing for seismic hazard and risk analysis, based on state-of-the-art Geographical Information Systems (GIS) and the newest Internet technologies. Although focused on seismic hazard problems in California, the presentation also gave an excellent example of large data sets compression and knowledge extraction. It was recommended that the GIS-based approach developed by A. Kiremidjian be actively used by the TG as one of the selected technologies for virtual laboratory development.

Presenting the state-of-the-art telematics activities of CODATA, Dr. Jean-Jacques Royer (CODATA, France) introduced "CODATA on the Web", which contains links to several comprehensive databases and to data extraction, processing and visualization tools and services. It was recommended that the main CODATA Web-site be mirrored in Moscow at the CGDS 100 MBps Internet connection node.

Continuing the workshop's focus on geomagnetic databases, Dr. Mioara Alexandrescu-Mandea (IPGP, France) presented her joint study with Prof., Acad. Jean-Louis LeMouel (IPGP, France), devoted to data handling and compression in the INTERMAGNET project.

In Session 2, devoted to mathematical, logical, informatics and telematics techniques for large geophysical and geodynamical data sets representation, Prof., Acad. Vladimir Strakhov (Institute of Physics of the Earth, Russia) described his fundamentally new linear approximation approach for knowledge extraction from observation data on the Earth's gravity and magnetic fields. This new mathematical technique makes it possible to reduce significantly both the dimensionality of the system of equations and the approximation error of the result. The method was recommended as one of the technologies to be applied by the TG to gravity "sample" databases. Following a proposal by V. Strakhov, it was decided to organize a workshop with the tentative title "Homogeneous Analytical Approximations of Magnetic and Gravity Anomalies for the Whole Europe and Natural Disaster Studies" in 2000, and to apply to the Open Partial Agreement on Major Disasters of the Council of Europe for financial support for this workshop.

An important contribution to the workshop was provided by the GIS community. A French-Russian team (Alexander Beriozko, CGDS, Russia; Michael Diament, Christine Deplus, Helene Hebert, IPGP, France) presented a pattern recognition system for the automated detection of linear structures in large geodynamic data sets. The system is based on the ATTILA GIS technology and was successfully applied to geodynamical data studies using bathymetry data sets. Prof. Hans-Jurgen Goetze (Free University of Berlin, Germany) presented new methods of 3D potential field modeling in an interoperable GIS (IOGIS) environment, which provide new opportunities in data processing.

Session 3, devoted to mathematical and artificial intelligence techniques for data characterization and compression, was opened by a presentation by Prof. Jacques-Octave Dubois (IPGP, France) on dynamical systems for data handling and processing. Prof., Acad. Mikhail Zgurovsky (National Technical University of Ukraine) presented new mathematical methods of environmental modeling. This technique, which is in the mathematical sense dual to the pattern recognition and geodynamic systems approach introduced by A. Gvishiani and J-O. Dubois, makes it possible to apply comparative mathematical methodologies to large data sets in environmental monitoring. An important application of the technique is the analysis of Chernobyl disaster data.

The presentation by Eric Kihn and Herbert Kroehl (NOAA, USA) and Mikhail Zhizhin and Alexander Troussov (CGDS, Russia) was devoted to the Environmental Scenario Generator, a WWW system based on a fuzzy logic approach and applied to space weather data, visualization of solar images and space environmental data representation. New methods of fuzzy clustering for knowledge extraction and interpretation were presented by a French-Russian team (Sergei Agayan, Elena Graeva, Alexei Gvishiani, CGDS, Russia; Michael Diament, Jacques-Octave Dubois, Armando Galdeano, Pascal Sailhac, IPGP, France).

Computer demonstrations allowed the workshop participants to proceed with concrete joint work. The demonstrations included SPIDR (presented by E. Kihn), CODATA on the Web (presented by J-J. Royer), the Basic Support for Cooperative Work (BSCW) System (presented by Horst Kremers, Germany), the automated system of linear structures recognition in geodynamical data studies (ATTILA) (presented by A. Beriozko), and the on-line world-wide Strong Motions Earthquake Database (SMDB) (presented by M. Zhizhin). The demonstration of the prototype of the CODATA TG Web-site (presented by A. Gvishiani and A. Beriozko) attracted special interest. The concrete work on Web-site development that the TG carried out in 1999 was positively evaluated. It was strongly recommended to install the TG Web-site, containing the necessary links to the main CODATA Web-site, on Web-servers in Russia, Western Europe and USA.

The workshop concluded with the discussion "The Potential Synergy of the Systems Reported and Methods Offered". The moderator, H. Kremers, opened the discussion with a presentation of his paper "Formalizing semiotics of virtual laboratories". The discussion focused on the future workplans of the TG. A number of databases were selected by the workshop as prototypes for applications of comparative mathematical methodologies. The corresponding projects will be implemented by the TG in 1999-2000. Among those projects are on-line geomagnetic, space physics and satellite imagery databases integrated by SPIDR; an aeromagnetic, geomagnetic and geodynamic database developed by the Institute of Physics of the Earth in Paris in cooperation with CGDS in Moscow; and earthquake risk and hazard GIS-based databases developed by Stanford University, the Institute of Physics of the Earth in Strasbourg and CGDS in Moscow. The detailed results of the workshop will soon be available on the TG Web-site.

The workshop was followed by the TG business meeting, devoted to the TG recommendations for CODATA publications and some other matters. Three books were recommended to be written and submitted by TG members for review and further publication in the CODATA series "Data and Knowledge in Changing World" in 2000-2001. One of them is "Dynamic Systems and Dynamic Pattern Recognition in Geophysical Applications", Vol. 2, by A. Gvishiani and J-O. Dubois. The second book, to be submitted by M. Zgurovsky, will be devoted to mathematical methods of environmental monitoring with direct applications to the consequences of the Chernobyl disaster. The third monograph, proposed by V. Strakhov, is planned as the summarizing publication of an international project on global geomagnetic and gravity field approximations.

Following an introduction by J-E. Dubois, the TG business meeting discussed ideas and plans concerning the next CODATA International Conference, which will be held in Italy in the autumn of 2000. A number of sessions were proposed by the TG for this conference: dynamical systems and non-linear effects in information handling and knowledge extraction; data compression techniques and computer telecommunication for satellite and weather databases; virtual laboratories in data management and knowledge extraction; informatics and telematics in data handling for life sciences; data and knowledge acquisition and exchange in major natural and technogenic disasters; etc.

