19th International CODATA Conference
Category: Interoperability
Spatial data integration in GIS environment
Elżbieta Bielecka (elzbieta.bielecka@igik.edu.pl)
Institute of Geodesy and Cartography, Poland
The availability of digital spatial data is growing rapidly, as is the need to use these data in all kinds of GIS applications and to support decision-making. Developments in communication technology make it possible to collect datasets from a variety of sources and for different types of application. A great many databases, datasets and other sources of geographical information, such as satellite images, aerial photographs and maps, already exist, and every user can now share spatial data instead of collecting them from scratch. Sharing data requires, first of all, comprehensive information about the scope of the data and the place where they are stored; it further requires translation from the original data source into the user's system and adaptation to specific GIS applications. This adaptation process can be called data integration. Data integration is the most valuable function of GIS, and integrated data meet user needs more precisely.
Data integration means combining data files, datasets and databases originating from different sources into one common database. Hence the unification of codes, the definition of object models and the unification of data definitions are of the utmost importance. Spatial data integration also consists in creating relations among the various categories of descriptive and geometric data, as well as joining them.
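Code unification can be illustrated with a minimal sketch. It assumes two hypothetical source datasets that encode land cover with their own codes; a per-source lookup table translates each code into a common code before the records are combined into one table. The source names, codes and class labels are invented for illustration and are not taken from any real database.

```python
# Minimal sketch: unify source-specific attribute codes before merging.
# All source names, codes and class labels are hypothetical.

# Per-source lookup tables: source code -> common code
CODE_MAPS = {
    "source_a": {"11": "ARABLE", "12": "GRASSLAND", "31": "FOREST"},
    "source_b": {"R": "ARABLE", "P": "GRASSLAND", "L": "FOREST"},
}


def unify(records, source):
    """Translate each record's code to the common code.

    Unknown codes raise KeyError, so an unresolved semantic conflict is
    reported instead of being passed silently into the common database.
    """
    mapping = CODE_MAPS[source]
    unified = []
    for rec in records:
        code = rec["code"]
        if code not in mapping:
            raise KeyError(f"{source}: no common code for {code!r}")
        # Prefix the identifier with the source name to keep it unique.
        unified.append({"id": f"{source}:{rec['id']}", "class": mapping[code]})
    return unified


records_a = [{"id": 1, "code": "11"}, {"id": 2, "code": "31"}]
records_b = [{"id": 1, "code": "P"}]
common = unify(records_a, "source_a") + unify(records_b, "source_b")
for row in common:
    print(row)
```

Failing loudly on an unmapped code is a deliberate choice: a missing entry in the lookup table is exactly the kind of semantic conflict that should be examined by hand rather than guessed.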
Users should realize that proper data integration usually requires the resolution of two kinds of conflict: semantic and spatial. Resolving semantic heterogeneity in GIS still requires further study in order to develop a more efficient methodology. Spatial data integration requires extensive knowledge of geomatics as well as a technical infrastructure. Merging different databases, datasets and data files is a very complex, time-consuming and expensive task, and it has to be solved in terms of both geometry and topology. A very important aspect of spatial data integration is ensuring the continuity and topology of the data. Data mismatch can stem from many factors, including incompatible projections, inconsistent map units and different plotting scales. Differences in the relative age of datasets may mean differences in data collection methods and accuracy. The improper application of a datum to a dataset is an increasingly common and very important cause of data alignment problems. All these discrepancies, and any others, should be removed during the integration process.
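One simple way to detect such discrepancies is to compare the coordinates of points that occur in both datasets. The sketch below assumes hypothetical control points already expressed in the same projected coordinate system, one source recorded in metres and the other in international feet; it unifies the units and reports the root-mean-square displacement. A value far above the expected accuracy of the sources would point to an unresolved projection, unit or datum problem. The coordinates and function names are illustrative only.

```python
import math

FOOT_TO_M = 0.3048  # international foot (exact); the US survey foot differs slightly


def to_metres(points, unit):
    """Convert {point_id: (x, y)} coordinates from 'm' or 'ft' to metres."""
    factor = {"m": 1.0, "ft": FOOT_TO_M}[unit]
    return {pid: (x * factor, y * factor) for pid, (x, y) in points.items()}


def rms_displacement(ref, other):
    """Root-mean-square distance between points present in both datasets."""
    shared = sorted(ref.keys() & other.keys())
    if not shared:
        raise ValueError("the datasets have no common points")
    sq = [(ref[p][0] - other[p][0]) ** 2 + (ref[p][1] - other[p][1]) ** 2
          for p in shared]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical control points: one source in metres, another in feet.
ref_m = {"P1": (1000.0, 2000.0), "P2": (1500.0, 2500.0)}
other_ft = {"P1": (3281.0, 6562.0), "P2": (4921.0, 8202.0)}

print(rms_displacement(ref_m, other_ft))                   # units not unified
print(rms_displacement(ref_m, to_metres(other_ft, "ft")))  # after unification
```

Without unit unification the apparent misfit in this example is almost 6 km; after converting feet to metres it falls below 0.1 m. Whether the remaining residual is acceptable depends on the scale and accuracy of the source data.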
Data integration also means incorporating vector, raster, TIN and other data models into one seamless geodatabase and using them for analysis and spatial modeling. Although data integration is usually time-consuming and expensive, the result is data that are well structured from an analytical perspective.
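A minimal illustration of using two data models together is to sample a raster at vector point locations, for example attaching an elevation value from a digital elevation model to each point feature. The sketch assumes a north-up grid without rotation, stored as a list of rows and defined by its north-western corner and a square cell size; the grid, its coordinates and the point names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


class Raster:
    """North-up raster grid without rotation (a simplifying assumption)."""

    def __init__(self, values: List[List[float]], x_min: float, y_max: float,
                 cell_size: float):
        self.values = values        # values[row][col]; row 0 is the northern edge
        self.x_min = x_min          # x of the grid's western edge
        self.y_max = y_max          # y of the grid's northern edge
        self.cell_size = cell_size

    def value_at(self, x: float, y: float) -> Optional[float]:
        """Return the value of the cell containing (x, y), or None outside the grid."""
        col = math.floor((x - self.x_min) / self.cell_size)
        row = math.floor((self.y_max - y) / self.cell_size)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[0]):
            return self.values[row][col]
        return None


def attach_raster_values(points: Dict[str, Tuple[float, float]], raster: Raster):
    """Attach the raster value under each vector point as a new attribute."""
    return {pid: raster.value_at(x, y) for pid, (x, y) in points.items()}


# Hypothetical 2 x 3 elevation grid with 100 m cells.
dem = Raster([[120.0, 125.0, 130.0],
              [110.0, 115.0, 118.0]],
             x_min=500000.0, y_max=300200.0, cell_size=100.0)
wells = {"well_1": (500050.0, 300150.0),
         "well_2": (500250.0, 300010.0),
         "well_3": (499900.0, 300100.0)}  # lies west of the grid
print(attach_raster_values(wells, dem))
```

Cells are treated as half-open, so a point lying exactly on a boundary between cells is assigned to the cell to its east or south, and points on the grid's eastern or southern outer edge fall outside it.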
Steps towards an integrated geodatabase are as follows:
- Transferring data to the internal file format used by the GIS software.
- Examining the data (entities) and resolving semantic conflicts.
- Transforming data into one fixed projection and coordinate system; unifying map units.
- Spatial data merging (within one thematic layer):
  - Generalization to provide a similar level of data detail
  - Edge matching and map joining
  - Error correction and entering missing data
  - Forming topology
  - Verification of data consistency and error correction
  - Attaching attributes
- Vertical data matching (among different thematic layers covering the same area).
- Converting data to the appropriate data model.
- Indexing.
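Of the steps listed, edge matching is perhaps the easiest to show in code. The sketch below is a simplified, hypothetical version: lines are lists of (x, y) vertices, two adjacent sheets meet along a vertical edge at a known x coordinate, and endpoints from the two sheets that lie within a 2 m tolerance of each other and of the edge are snapped to a common point on the edge. The tolerance, coordinates and function names are assumptions; production edge matching would also compare attributes and resolve ambiguous candidates rather than pairing them greedily.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Line = List[Point]


def edge_endpoints(lines: List[Line], edge_x: float, tol: float):
    """Return (line index, vertex index) of endpoints within tol of the sheet edge."""
    hits = []
    for i, line in enumerate(lines):
        for j in (0, len(line) - 1):
            if abs(line[j][0] - edge_x) <= tol:
                hits.append((i, j))
    return hits


def edge_match(west: List[Line], east: List[Line], edge_x: float, tol: float):
    """Snap matching endpoints of two adjacent sheets onto their common edge.

    Greedy nearest-neighbour pairing: each western endpoint is joined to the
    closest unused eastern endpoint within tol. Unmatched endpoints are left
    unchanged for manual checking.
    """
    west = [list(line) for line in west]  # work on copies
    east = [list(line) for line in east]
    east_hits = edge_endpoints(east, edge_x, tol)
    used = set()
    matched = 0
    for wi, wj in edge_endpoints(west, edge_x, tol):
        wx, wy = west[wi][wj]
        best, best_d = None, tol
        for ei, ej in east_hits:
            if (ei, ej) in used:
                continue
            ex, ey = east[ei][ej]
            d = math.hypot(wx - ex, wy - ey)
            if d <= best_d:
                best, best_d = (ei, ej), d
        if best is not None:
            ei, ej = best
            y = (wy + east[ei][ej][1]) / 2  # meet halfway along the edge
            west[wi][wj] = (edge_x, y)
            east[ei][ej] = (edge_x, y)
            used.add(best)
            matched += 1
    return west, east, matched


# Hypothetical roads on two sheets separated by the line x = 1000 m.
west_sheet = [[(900.0, 50.0), (999.2, 100.4)], [(950.0, 300.0), (998.0, 400.0)]]
east_sheet = [[(1000.6, 101.0), (1100.0, 150.0)], [(1001.5, 420.0), (1100.0, 430.0)]]
w, e, n = edge_match(west_sheet, east_sheet, edge_x=1000.0, tol=2.0)
print(n, w[0][-1], e[0][0])
```

Only the first road is joined here; the second pair of endpoints is 20 m apart along the edge, far beyond the tolerance, so it is left in place for the error-correction step.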
The aforementioned steps describe the general problem of data integration. Some of them may be omitted, depending on the diversity of the data and the discrepancies that exist. However, examining the data in order to resolve semantic and spatial conflicts is always required.
The goal of this paper is to give an overview of the problems arising from dispersed data sources and to present some possible solutions. The database created for the delimitation of Less-Favoured Areas in Poland serves as an example. Because this database covers the entire country and was built from over 10 different data sources, almost all of the problems connected with data integration occurred.