Safely linking big datasets to help find medical answers

Professor Serena Nik-Zainal is leading a Cambridge trial to find ways to analyse data across more than one secure research hub to help speed up advances in science and knowledge.

The aim of the study is to link large research data hubs without any of the original data being moved between sites. This form of data sharing, where trusted research environments are able to securely ‘talk’ to each other, is known as a federation or federated learning.

Professor Nik-Zainal is an Honorary Consultant in Clinical Genetics at Cambridge University Hospitals (CUH) and the trial has received funding of £200K from UK Research and Innovation as part of the DARE UK (Data and Analytics Research Environments UK) programme.

Trusted research environments are secure spaces for researchers to access and analyse sensitive data. They also help prevent unauthorised access and re-identification of individuals from anonymised data. However, researchers cannot currently analyse data across two such environments, which can delay new discoveries.

This project will bridge the gap between the data held by Genomics England and by the NIHR Cambridge Biomedical Research Centre. Both are rich, secure, governed sources of fully consented clinical genomic data from patients.

First, the data in each research environment will be searched to find individuals with certain characteristics. A joint analysis will then be run within both environments, and the results combined in a separate secure cloud environment.

This means that no original data will move, only the results.
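The pattern described above can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the function names, the record fields (`has_variant`, `age`) and the toy records are all invented for illustration, and a pooled mean stands in for whatever joint analysis the trial runs. Each site selects its own cohort and returns only summary figures, which a separate step then combines.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class LocalResult:
    """Summary figures that leave a site; never row-level records."""
    n: int        # number of individuals in the local cohort
    total: float  # sum of the measured value over that cohort


def run_local_analysis(
    records: Iterable[dict],
    in_cohort: Callable[[dict], bool],
    value_of: Callable[[dict], float],
) -> LocalResult:
    """Runs inside one research environment, on data that stays there."""
    cohort = [r for r in records if in_cohort(r)]
    return LocalResult(n=len(cohort), total=sum(value_of(r) for r in cohort))


def combine(results: Iterable[LocalResult]) -> Optional[float]:
    """Runs in the separate environment; sees only the per-site summaries."""
    results = list(results)
    n = sum(r.n for r in results)
    if n == 0:
        return None  # no matching individuals at any site
    return sum(r.total for r in results) / n


# Toy, made-up records standing in for each environment's data (hypothetical).
site_a = [{"has_variant": True, "age": 50}, {"has_variant": False, "age": 30}]
site_b = [{"has_variant": True, "age": 70}, {"has_variant": True, "age": 60}]


def has_variant(record: dict) -> bool:
    return record["has_variant"]


def age(record: dict) -> float:
    return record["age"]


# Each site computes its own summary; only these two objects are shared.
result_a = run_local_analysis(site_a, has_variant, age)
result_b = run_local_analysis(site_b, has_variant, age)

pooled_mean_age = combine([result_a, result_b])
print(pooled_mean_age)
```

In this sketch, `combine` never receives the individual records, only a count and a sum from each site. A real deployment would also need governance checks on what is allowed to leave each environment, which is outside the scope of this example.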

Professor Serena Nik-Zainal said: "There's large amounts of data available for doctors and scientists to do their work but there are concerns about how we share data across multiple different sites.

"This project aims to help reduce the risk to data privacy so data doesn't have to move and stays exactly where it was created and scientists can perform their research situated wherever they are.

"It also aims to highlight the advantages of being able to work with data from more than one research environment and set new standards in the field of data sharing.

"This will hopefully unlock unprecedented possibilities for collaborations across the UK and lead to new discoveries with long term public benefit."

The work has been approved as a Sprint Exemplar Project as part of Phase 1 of the DARE UK programme, which is delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK).


