1. Data sources for pharmacoepidemiological research
1.6. Data Privacy considerations for the use of Big Data
Health data are regarded as sensitive not only by patients and citizens but also by almost all health authorities. It is therefore important to ensure compliance with the applicable regulation, e.g. the General Data Protection Regulation (GDPR) in Europe (https://gdpr.eu). The GDPR took effect on May 25, 2018, to create a data privacy framework across Europe. Although it was drafted and passed by the European Union (EU), it imposes obligations on organisations anywhere, as long as they target or collect data related to people in the EU. The GDPR provides for harsh fines against those who violate its privacy and security standards, with penalties reaching into the tens of millions of euros. While the GDPR also tries to privilege the use of health data for research, this area is still fragmented across Europe, as countries interpret the respective clauses differently when translating the GDPR into more specific national regulation.
The overall aim is to prevent the identification of individuals from the collected data. This is becoming increasingly challenging: as more and more -omics data become available, combining them with other data from major studies or registries carries the risk that an individual patient becomes identifiable.
Several approaches are currently followed to protect patients' personal data while still allowing the best use of the data for future research and improvements in healthcare:
1. De-facto Anonymisation
Pooling patient data in one central database gives the highest flexibility in
using the data. On the other hand, transferring all patient data into a central
database could allow individual patients to be identified. To address this
challenge, the IMI project HARMONY has developed a so-called de-facto
anonymisation concept. It describes how certain key data should be stripped
from a patient's original electronic health record (EHR) data so that an
individual cannot be identified with reasonable effort, while the dataset
remains complete enough to allow high-quality research. This approach has been
evaluated by data privacy experts as GDPR-compliant. For details, see "HARMONY
Anonymization Concept Reconciles Data Quality, Safety, and Privacy" on the
HARMONY Alliance website (harmony-alliance.eu).
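The idea of stripping key data from a record can be illustrated with a minimal Python sketch. It is not the HARMONY procedure, whose actual rules are not described here. The field names, the choice of which fields count as identifying, and the function `strip_key_data` are all assumptions made for illustration.

```python
import secrets
from datetime import date

# Hypothetical field names; not taken from the HARMONY concept itself.
DIRECT_IDENTIFIERS = {"name", "address", "national_id", "phone"}


def strip_key_data(record: dict) -> dict:
    """Return a copy of an EHR-like record with identifying key data removed or coarsened."""
    # Drop fields that identify a person directly.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Coarsen the exact birth date to the year only.
    birth = out.pop("birth_date", None)
    if birth is not None:
        out["birth_year"] = birth.year
    # Replace the source patient ID with a random token that is not derived from it.
    out.pop("patient_id", None)
    out["record_token"] = secrets.token_hex(8)
    return out


# Invented example record.
example = {
    "patient_id": "H-000123",
    "name": "Jane Doe",
    "address": "1 Example Street",
    "birth_date": date(1961, 4, 17),
    "diagnosis": "AML",
    "treatment": "induction chemotherapy",
}
cleaned = strip_key_data(example)
```

In this sketch the clinical fields needed for research are kept unchanged, while the fields that would make re-identification easy are removed or generalised. Which fields fall into which group is a judgement that would have to be made per dataset.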
2. Federated Data Networks
Another approach to pooling data from different sources is the so-called
Federated Data Network. In this case the data stay in their original location
and access is controlled by the respective database owners. This allows linking
of the databases and harmonised searches across the different data sources, but
requires harmonisation of the databases and the use of a common data model (see
above under 8.5). A query is sent to all network partners, each of which runs
it against its own database, and the results from all partners are then
combined. No patient identifiers are transferred to the coordinating entity,
which protects data privacy. For example, this approach is used by the IMI
project EHDEN, which has recently supported OHDSI in a virtual studyathon[1] to
inform healthcare decision makers in response to the COVID-19 pandemic (COVID19
Study-a-thon).
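The query flow described above can be sketched in a few lines of Python. This is a toy illustration only, not how EHDEN or OHDSI tooling is implemented. The `Partner` class, the record fields, and the function `federated_count` are hypothetical, and the records are invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Partner:
    """A network partner that keeps its records local (hypothetical structure)."""
    name: str
    records: list  # patient-level rows in a shared data model; they never leave the partner

    def run_query(self, predicate: Callable[[dict], bool]) -> int:
        # The query executes locally; only an aggregate count is returned.
        return sum(1 for row in self.records if predicate(row))


def federated_count(partners, predicate):
    """Send the same query to every partner and combine the aggregate results."""
    per_partner = {p.name: p.run_query(predicate) for p in partners}
    return per_partner, sum(per_partner.values())


# Invented example data held at two sites.
partners = [
    Partner("site_a", [{"age": 70, "condition": "COVID-19"},
                       {"age": 45, "condition": "asthma"}]),
    Partner("site_b", [{"age": 81, "condition": "COVID-19"},
                       {"age": 66, "condition": "COVID-19"}]),
]


def covid_over_65(row):
    return row["condition"] == "COVID-19" and row["age"] > 65


per_partner, total = federated_count(partners, covid_over_65)
```

The coordinating function only ever sees one number per partner. The same query can run unchanged at every site only because all partners expose the same field names, which is the role the common data model plays in a real network.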
3. Platforms empowering each patient to control access to their health data for different purposes and by different users.
There are early examples of not-for-profit organisations running health data platforms that give participating patients full control and self-determination in their digital health environment. They aim to link the different sources hosting a patient's data and to let the patient decide for which purpose and to whom they want to give access to the data. Examples are the platform Data4Life of the Hasso Plattner Institute in Potsdam, Germany (www.data4life.care/en) or the platform MIDATA in Zurich, Switzerland (www.midata.coop/en/home), which links patient data on a regional and national level with full control by the participating patients.
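Patient-controlled access by purpose and by user can be pictured as a simple consent register. The Python sketch below is an assumption made for illustration and does not reflect the actual design of either platform named above. The class `ConsentRegistry` and the recipient and purpose labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    """Grants recorded by one patient, keyed by (recipient, purpose). Hypothetical model."""
    grants: set = field(default_factory=set)

    def grant(self, recipient: str, purpose: str) -> None:
        self.grants.add((recipient, purpose))

    def revoke(self, recipient: str, purpose: str) -> None:
        self.grants.discard((recipient, purpose))

    def is_allowed(self, recipient: str, purpose: str) -> bool:
        # Access is denied unless the patient has explicitly granted it.
        return (recipient, purpose) in self.grants


consent = ConsentRegistry()
consent.grant("university_hospital", "research")
allowed_research = consent.is_allowed("university_hospital", "research")
allowed_marketing = consent.is_allowed("university_hospital", "marketing")
consent.revoke("university_hospital", "research")
allowed_after_revoke = consent.is_allowed("university_hospital", "research")
```

In this sketch a grant covers one recipient for one purpose only, and the patient can withdraw it at any time.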
[1] A studyathon is a multi-day web event bringing together a comprehensive and diverse range of experts to have an uninterrupted deep-dive on a specific disease or healthcare topic.