19 July 2000

Thematic Session: DATA MINING & VALIDATION

Summary written by Dr. Enrique Canessa (ICTP, Trieste, Italy)

An introductory overview of the evolution of database technology was given by Dr. A. Faye (from Laboratoire de Recherche en Informatique University in Paris, France). The discussion focused on the motivation for, and importance of, integrating Data Mining with Data Warehousing, multimedia databases, and the Web-based database technology introduced in the 90s. New automated data collection tools and fast computers have led to tremendous amounts of data (terabytes) being stored in unique repositories. Online analytical processing that combines data warehousing (i.e., stored collections of diverse organized data) with data mining (i.e., data analysis) yields new and interesting knowledge such as rules, correlations, and patterns.

Dr. Faye discussed issues of performance, query control, and the quality of data from heterogeneous sources, and the use of consistent data representations, codes, and formats to address them. Data Mining today is big business. For example, in the USA $8 billion was invested in 1998. The production of 200-300 MB per day of consolidated data is useful for market and risk analysis and for predictions. The knowledge discovered from databases by extracting non-trivial, implicit patterns has applications in multiple disciplines, including statistics, visualization, information science, and others. Resource planning (i.e., comparing the resources available and optimizing spending) is another useful application of data mining technologies.

An open discussion then followed, led by Dr. Enrique Canessa (from ICTP in Trieste, Italy). He pointed out that the new research field of data mining requires Information Technology experts with adequate skills. The problem of data exploitation requires parallel knowledge of the latest technology, such as compression (i.e., for multimedia databases and scientific use) and distributed systems over the net, in addition to Web-enabling technology. Prof. O. Ajayi (from Nigeria) briefly described the collection of meteorological data by mobile stations within Nigeria. The data is used for soil studies and mapping (i.e., the Agronet and the Healthnet projects, in which 50 different universities are participating as case studies).

During the discussions, it was emphasized that on-site capacity building for Information Technology is crucial to organize, and then identify, the different local needs of Africa in the field of Data Mining. Africa is going to be a producer, consumer, and collector of data. VSAT (“Very Small Aperture Terminal”)-based projects using satellite communications could be established as pilot projects, with the support of CODATA and a commitment from the university authorities, in order to achieve self-sustainability for a period of, say, 2 years. Dr. Entsua-Mensah related his experiences in Ghana in this regard.

Data mining and the capacity building needed to implement it are already a challenge for Africa. They will open many new opportunities for academia, which will then permeate African society, including remote areas. New solar technologies could help in achieving these development goals.