Privacy-preserving Data Mining of Medical Data using Data Separation Based Techniques

Authors: Gang Kou, Yi Peng, Yong Shi, and Zhengxin Chen

Data mining is concerned with the extraction of useful knowledge from various types of data. Medical data mining has been a popular data mining topic of late. Compared with other data mining areas, medical data mining has some unique characteristics. Since medical files are related to human subjects, privacy concern is taken more seriously than other data mining tasks. This paper applied data separation-based techniques to preserve privacy in classification of medical data. We take two approaches to protect privacy: one approach is to vertically partition the medical data and mine these partitioned data at multiple sites; the other approach is to horizontally split data across multiple sites. In the vertical partition approach, each site uses a portion of the attributes to compute its results and the distributed results are assembled at a central trusted party using majority-vote ensemble method. In the horizontal partition approach, data are distributed among several sites. Each site computes its own data and a central trusted party is responsible to integrate these results. We implement these two approaches using medical datasets from UCI KDD archive and report the experimental results.