1. Data sources for pharmacoepidemiological research

1.6. Data Privacy considerations for the use of Big Data


Health data are regarded as sensitive not only by patients and citizens but also by almost all health authorities. It is therefore important to ensure compliance with the applicable regulation, e.g. the General Data Protection Regulation (GDPR) in Europe (https://gdpr.eu). The GDPR took effect on May 25, 2018, to create a common data privacy framework across Europe. Although it was drafted and passed by the European Union (EU), it imposes obligations on organisations anywhere, so long as they target or collect data related to people in the EU. The GDPR provides for harsh fines against those who violate its privacy and security standards, with penalties reaching into the tens of millions of euros. While the GDPR also tries to privilege the use of health data for research, this area is still fragmented across Europe, as countries interpret the respective clauses differently when translating the GDPR into more specific national regulation.

The overall aim is to prevent the identification of individuals from the collected data. This is becoming increasingly challenging: as more and more -omics data become available, using these data in combination with other data from major studies or registries carries the risk that an individual patient becomes identifiable.
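
The linkage risk described above can be illustrated with a minimal sketch. It assumes a small, entirely hypothetical dataset without names, and counts how many records share each combination of seemingly harmless attributes. The field names (`birth_year`, `postcode`, `sex`) and the helper functions are illustrative assumptions, not part of any cited project.

```python
from collections import Counter

# Hypothetical records: no names, only attributes that look harmless on their own.
records = [
    {"birth_year": 1950, "postcode": "10115", "sex": "F"},
    {"birth_year": 1950, "postcode": "10115", "sex": "F"},
    {"birth_year": 1950, "postcode": "10117", "sex": "M"},
    {"birth_year": 1972, "postcode": "10117", "sex": "M"},
    {"birth_year": 1972, "postcode": "10117", "sex": "F"},
]


def group_sizes(rows, keys):
    """Count how many rows share each combination of values for `keys`."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_records(rows, keys):
    """Return the rows whose combination of `keys` occurs exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


# With a single attribute nobody stands out; with all three combined,
# several records become unique and could be matched against another source.
alone = unique_records(records, ["sex"])
combined = unique_records(records, ["birth_year", "postcode", "sex"])
print(len(alone), len(combined))
```

The more attributes are combined, for example by adding data from another study or registry, the smaller the groups become, which is why richer datasets are harder to protect.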

Several approaches are currently followed to protect patients' personal data while still allowing the best use of the data for future research and improvements in healthcare:

1. De-facto Anonymisation

Pooling the patient data in one database gives the highest flexibility in using the data. On the other hand, this approach would allow the identification of individual patients if all patient data were transferred into the central database. To address this challenge, the IMI project HARMONY has developed a so-called de-facto anonymisation concept. It describes how some key data should be stripped from a patient's original electronic health record (EHR) data, so that identifying an individual becomes impossible with reasonable effort while the dataset stays complete enough to allow high-quality research. This approach has been evaluated by data privacy experts as GDPR compliant. For details see: HARMONY Anonymization Concept Reconciles Data Quality, Safety, and Privacy - HARMONY Alliance (harmony-alliance.eu)
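
As a rough illustration of the general idea of stripping key data before pooling, the following sketch removes direct identifiers from a record and coarsens two further fields. It is not the HARMONY concept itself: the field names, the set of identifiers and the generalisation rules are all assumptions chosen for the example.

```python
from datetime import date

# Assumption: which fields count as direct identifiers is defined per project.
DIRECT_IDENTIFIERS = {"name", "address", "insurance_number"}


def strip_key_data(record):
    """Return a copy of `record` without direct identifiers and with
    date of birth and postcode replaced by coarser values."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Keep only the year of birth instead of the full date (hypothetical rule).
    dob = out.pop("date_of_birth", None)
    if dob is not None:
        out["birth_year"] = dob.year

    # Keep only the first two characters of the postcode (hypothetical rule).
    postcode = out.pop("postcode", None)
    if postcode is not None:
        out["postcode_prefix"] = postcode[:2]

    return out


ehr_record = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "insurance_number": "X123456789",
    "date_of_birth": date(1950, 3, 14),
    "postcode": "10115",
    "diagnosis": "AML",
}
print(strip_key_data(ehr_record))
```

The clinical content (here the `diagnosis` field) is left untouched, which reflects the trade-off described above: remove what identifies the person, keep what the research needs.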

2. Federated Data Networks

Another approach to pooling data from different sources is the so-called Federated Data Network. In this case, the data stay in their original place and access is controlled by the respective database owners. This allows linking of the databases and harmonised searches across the different data sources through harmonisation of the databases and the use of a common data model (see above under 8.5). A search or question is sent to all network partners, each of which runs the search in its own database; the search results from all partners are then combined. No patient identifiers are transferred to the coordinating entity, so data privacy is ensured. For example, this approach is used by the IMI project EHDEN, which has recently supported OHDSI in a virtual studyathon[1] to inform healthcare decision makers in response to the COVID-19 pandemic (COVID19 Study-a-thon).
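
The query flow described above can be sketched in a few lines. This is a simplified illustration, not the EHDEN or OHDSI tooling: the `Site` class, the `federated_count` function and the record fields are hypothetical, and all sites are assumed to store records under the same field names, standing in for a common data model.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A network partner; its records never leave this object."""
    name: str
    records: list

    def run_count(self, predicate):
        # The search runs locally; only an aggregate count is returned.
        return sum(1 for record in self.records if predicate(record))


def federated_count(sites, predicate):
    """Send the same question to every site and combine the aggregate results."""
    per_site = {site.name: site.run_count(predicate) for site in sites}
    return {"per_site": per_site, "total": sum(per_site.values())}


sites = [
    Site("hospital_a", [
        {"patient_id": "a1", "diagnosis": "COVID-19", "age": 71},
        {"patient_id": "a2", "diagnosis": "asthma", "age": 34},
    ]),
    Site("registry_b", [
        {"patient_id": "b1", "diagnosis": "COVID-19", "age": 66},
        {"patient_id": "b2", "diagnosis": "COVID-19", "age": 45},
    ]),
]


def covid_over_60(record):
    return record["diagnosis"] == "COVID-19" and record["age"] > 60


result = federated_count(sites, covid_over_60)
print(result)
```

The coordinating function only ever sees counts, never `patient_id` values. A real network would add access control at each site and would typically also suppress very small counts before returning them.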

3. Platforms empowering each patient to control access to their health data for different purposes and by different users.

There are first examples of not-for-profit organisations running health data platforms that give participating patients total control and self-determination in their digital health environment. They aim to link the different sources hosting a patient's data and to give the patient full control over deciding for which purpose and to whom access to the data is granted. Examples are the platform Data4Life of the Hasso Plattner Institute in Potsdam, Germany (www.data4life.care/en) or the platform MIDATA in Zurich, Switzerland (www.midata.coop/en/home), which links patient data on a regional and national level with full control by the participating patients.



[1] A studyathon is a multi-day web event bringing together a comprehensive and diverse range of experts to have an uninterrupted deep-dive on a specific disease or healthcare topic.