Data Sharing: Security and Data Standards

Belinda Seto, National Institute of Biomedical Imaging and Bioengineering

The National Institutes of Health (NIH) implemented a policy on data sharing in 2003. The policy reaffirmed the principle that data should be made as widely and freely available as possible while safeguarding the privacy of research participants and protecting confidential and proprietary data. Restricted availability of unique resources upon which further studies depend can impede the advancement of research and the delivery of medical care. Therefore, research data supported with NIH funds should be made readily available for research purposes to qualified individuals within the scientific community.

One approach to sharing data is to establish a network of databases. However, there are a number of barriers to creating a successful network, including fundamental differences in the informatics infrastructure and communication tools used at various research sites. To the extent that commonalities can be implemented and data and tools shared, studies can be initiated more quickly and data sharing can be facilitated. Solutions will entail standards for data collection, processing, and archiving that allow interoperability among the databases and the ability to query data across them. Consideration should be given to open architectures for data collection and software to facilitate communication across different databases.
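As a rough illustration of what a shared data standard buys, the sketch below maps records from two sites onto one common set of fields so that a single query works across both. The site names, field names, and unit conversion are all hypothetical and are not drawn from any NIH standard; this is a minimal sketch under those assumptions, not a proposed schema.

```python
# Hypothetical common schema that every participating site agrees to (illustrative only).
COMMON_FIELDS = ("subject_code", "modality", "weight_kg")

# Hypothetical per-site mappings from local field names to the common names.
SITE_MAPPINGS = {
    "site_a": {"subj": "subject_code", "scan_type": "modality", "weight_kg": "weight_kg"},
    "site_b": {"id": "subject_code", "modality": "modality", "weight_lb": "weight_kg"},
}

LB_TO_KG = 0.45359237  # exact definition of the pound in kilograms


def to_common(record, site):
    """Translate one site-specific record into the common schema."""
    mapping = SITE_MAPPINGS[site]
    out = {}
    for local_name, value in record.items():
        common_name = mapping.get(local_name)
        if common_name is None:
            continue  # drop fields the common schema does not define
        if local_name == "weight_lb":
            value = round(value * LB_TO_KG, 1)  # harmonize units
        out[common_name] = value
    missing = [name for name in COMMON_FIELDS if name not in out]
    if missing:
        raise ValueError(f"record from {site} lacks required fields: {missing}")
    return out


def query(records, **criteria):
    """Return the harmonized records that match every criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


pooled = [
    to_common({"subj": "A1", "scan_type": "MRI", "weight_kg": 70.0}, "site_a"),
    to_common({"id": "B7", "modality": "MRI", "weight_lb": 154.0}, "site_b"),
]
mri_records = query(pooled, modality="MRI")
```

Once both sites emit the same field names and units, the query no longer needs to know where a record came from, which is the practical meaning of interoperability here.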

An important premise of sharing data is the protection of the privacy of individuals who participate in research and contribute data. Privacy protection hinges on maintaining the confidentiality of data and the security of databases. There should be clear policies for data security, which may include data encryption, coding, and establishing limited access or a tiered approach with varying levels of data access.
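Two of the measures named above, coding and tiered access, can be sketched in a few lines. The example replaces a direct identifier with a keyed code and releases only the fields permitted for a given access tier. The tier names, field names, and key handling are hypothetical; this is a minimal sketch under those assumptions, not a description of any NIH system, and a real deployment would also need key management, encryption at rest and in transit, and review of indirect identifiers.

```python
import hashlib
import hmac

# Hypothetical access tiers and the fields each tier may see (illustrative only).
TIER_FIELDS = {
    "public": {"age_group", "diagnosis"},
    "controlled": {"age_group", "diagnosis", "participant_code", "scan_date"},
}


def code_identifier(participant_id: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed code (HMAC-SHA256, truncated)."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def release_record(record: dict, tier: str, secret_key: bytes) -> dict:
    """Return only the fields allowed for the tier, with the identifier coded."""
    allowed = TIER_FIELDS[tier]
    coded = dict(record)  # copy so the source record is left unchanged
    coded["participant_code"] = code_identifier(coded.pop("participant_id"), secret_key)
    return {name: value for name, value in coded.items() if name in allowed}


record = {
    "participant_id": "P-001",
    "age_group": "40-49",
    "diagnosis": "example",
    "scan_date": "2004-01-15",
}
key = b"demo-key-held-by-the-data-custodian"
public_view = release_record(record, "public", key)
controlled_view = release_record(record, "controlled", key)
```

Because the code is derived with a secret key held by the data custodian, the same participant receives the same code across releases, which permits linkage for secondary analysis, while a recipient without the key cannot recompute the code from a known identifier.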

Sharing research data is not only a laudable goal to which the research community should aspire but also an economical means of making research dollars go further. For clinical trials that are costly or involve unique subject populations, and thus are unlikely to be replicated, sharing the primary dataset is crucial to allow secondary analyses. Together, the primary and secondary analyses will likely contribute to a robust knowledge base. Data sharing, however, must be done with care to protect individual privacy.