6. Data handling

Safe storage of biobank data is important. When genomic databases have been made freely available online, people with the right computing skills have been able to link the data back to the individuals it came from. (1) For this reason, in 2008 the US National Institutes of Health (NIH) had to close its open access database. (2)

Data should be properly ‘encoded’, safely stored and available only to authorised researchers. Legislation in the EU and its member states gives researchers clear standards to work from.

Data can be encoded in one of two ways:

  • Anonymised (no identifying information is linked with the data)
  • Pseudonymised (data are given ‘false’ names, so that genetic and clinical information can be linked, but the identity of the person the data come from is not stored with the data).
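The difference between the two approaches can be illustrated with a short Python sketch. It is not taken from any particular biobank system. It assumes that a secret key is held by a data custodian and never shared with researchers, and the field names (`name`, `address`, `diagnosis`, `variant`) and helper functions are hypothetical. A keyed hash (HMAC-SHA256) turns the same person's identifying details into the same ‘false’ name every time, so clinical and genetic records can still be linked.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key, assumed to be held by a data custodian
# and never shared with the researchers who receive the data.
SECRET_KEY = secrets.token_bytes(32)

# Hypothetical set of directly identifying fields.
IDENTIFYING_FIELDS = {"name", "address"}


def pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Remove identifying fields but attach a pseudonym so records can be linked."""
    out = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    out["pseudonym"] = pseudonym(record["name"] + "|" + record["address"], key)
    return out


def anonymise(record: dict) -> dict:
    """Remove identifying fields and attach nothing that could link records."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


# Usage: the same person's clinical and genetic records get the same pseudonym.
clinical = {"name": "Jane Doe", "address": "1 Example Road", "diagnosis": "asthma"}
genetic = {"name": "Jane Doe", "address": "1 Example Road", "variant": "hypothetical-variant-1"}
print(pseudonymise(clinical)["pseudonym"] == pseudonymise(genetic)["pseudonym"])  # True
```

Anonymised records carry no pseudonym, so they cannot be linked to each other or to the person they describe.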

Pseudonymisation only protects privacy if enough people in the database share the same clinical information. If only one or two people have a particular disease, it may be possible to identify them, and with them their genetic data.
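One common way to check for this risk is a ‘k-anonymity’ style count: group the records by the clinical details that could single someone out, and flag any group shared by fewer than k people. The Python sketch below assumes hypothetical fields (`diagnosis`, `region`) and a hypothetical threshold of k = 5. It is an illustration of the idea, not a complete privacy safeguard.

```python
from collections import Counter
from typing import Iterable


def small_groups(records: Iterable[dict], fields: tuple, k: int = 5) -> dict:
    """Return the combinations of `fields` shared by fewer than k records.

    People in these small groups are at higher risk of re-identification,
    because few others share the same clinical profile.
    """
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical example: six people with asthma, one with a rare disease.
records = [
    {"pseudonym": f"p{i}", "diagnosis": "asthma", "region": "North"} for i in range(6)
] + [{"pseudonym": "p6", "diagnosis": "rare disease X", "region": "North"}]

print(small_groups(records, ("diagnosis", "region"), k=5))
# {('rare disease X', 'North'): 1}
```

The single person with the rare disease is flagged, even though their name and address have been removed.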

The design of the IT systems that control and store data is very important. Researchers in pharmacogenomics use sophisticated systems to manage clinical and genetic data. For example, GENOmatch is a ‘data protection concept’ developed in a joint project between the pharmaceutical industry, software experts and academics. It protects data while still allowing researchers to disclose results to individual participants.

Some other genomic research studies, such as the Personal Genome Project UK, take a different and potentially controversial approach. This project will make its data freely available online, as the NIH previously did. Names and addresses will not appear, but participants are warned that they could easily be identified and that their privacy cannot be guaranteed.

(1) Dondorp and de Wert. 2013. The “1000-dollar genome”: an ethical exploration. European Journal of Human Genetics 21:S6-S26.

(2) Ferguson W. 2013. A hacked database prompts debate about genetic privacy. Scientific American. http://www.scientificamerican.com/article.cfm?id=a-hacked-database-prompts