Comprehensive study finds mutations in non-coding genome are infrequent drivers of cancer

Lung cancer cells. Image credit: Anne Weston, Francis Crick Institute

A clearer picture of how DNA changes lead to cancer has emerged, following the most comprehensive evaluation of non-coding driver mutations to date by researchers at the Wellcome Sanger Institute, the Broad Institute of MIT and Harvard, Massachusetts General Hospital (MGH), Aarhus University Hospital and their collaborators.

The study, published earlier this month in Nature as part of a global Pan-Cancer Project*, discovered several new cancer drivers in non-coding genes. The overall conclusion, however, reaffirms that the vast majority of cancer drivers occur in protein-coding regions of the human genome. This knowledge will help to focus efforts on discovering new causes and treatments for cancer.

Published alongside it in Nature and related journals are 22 further studies from the Pan-Cancer Project. The project represents an unprecedented international exploration of more than 2,600 cancer genomes. It significantly improves our fundamental understanding of cancer and zeroes in on the mechanisms of cancer development.

Driver mutations are DNA changes that ‘drive’ cells down the path towards cancer. Depending on the type of cancer, anywhere from one to ten driver mutations are required for cancer to develop**.

Most large-scale genomic studies of cancer to date have focused on detecting driver mutations in protein-coding genes. Because these coding sequences make up less than two per cent of the human genome, recent years have seen growing investigation of the remaining 98 per cent, the ‘non-coding’ genome***. In 2013, driver mutations were discovered in the non-coding promoter region of the TERT gene across many cancer types, raising the possibility that numerous non-coding driver mutations might lie hidden in the ‘dark matter’ of the genome.

This study is the most comprehensive evaluation to date of the extent of non-coding driver mutations in cancer, in terms of the number of methods used, the number of samples analysed, and the range of cancer types, genomic regions and mutation types studied. In total, more than 2,600 genomes from 38 different tumour types were analysed.

The team identified a number of new non-coding cancer-driving mutations, such as non-coding mutations in the 5’ untranslated region of the TP53 gene, which are associated with this gene being less strongly expressed, or ‘turned off’.

The results showed, however, that driver mutations in the regulatory sequences surrounding cancer genes are relatively rare. Excluding mutations in the TERT promoter, the non-coding driver mutations identified amounted to around one (or fewer) in every 100 tumours. In comparison, protein-coding regions often harbour several driver mutations per tumour. Some non-coding drivers reported in previous studies were found to be the result of less accurate methodologies or of previously uncharacterised hyper-mutation processes.

“The fact that our results contrast so strongly with other studies is largely down to how rigorous our analysis has been. Despite using numerous methods, the largest dataset currently available and surveying a wide range of non-coding regions of the genome, we found very few genuine driver mutations outside protein-coding genes,” said Dr Federico Abascal, of the Wellcome Sanger Institute.

Dr Gad Getz, of the Broad Institute and MGH, said: “The non-coding driver mutations we identified, such as in the TP53 gene, add to the short list of non-coding driver mutations that already includes TERT, FOXA1 and a few other genes. By rigorously analysing the mechanisms that contribute to increased mutation rates, we were not only able to find new drivers but also raise doubts about previously reported ones that are affected by local mutational processes and artefacts uncovered in our study. We hope that our analysis will serve as the basis for future cancer genome studies.”

This unexpected result has important implications for the treatment of cancer. While technological advancements and larger cohorts will undoubtedly lead to the discovery of more non-coding driver mutations, it is unlikely that the ratio of coding to non-coding drivers will change significantly. This implies that efforts to develop new cancer treatments should primarily focus on protein-coding genes.

“Overall, our study suggests that while increasingly large datasets will continue to yield new coding and non-coding driver mutations, the vast majority of cancer drivers occur in the two per cent of the genome that codes for proteins. To us, this was an unexpected and important result. For cancer patients, this means that the vast majority of clinically-relevant mutations in a cancer are likely to be found in protein-coding sequences, which will simplify efforts for the clinical use of genome sequencing in cancer,” added Dr Inigo Martincorena, of the Wellcome Sanger Institute.


*The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG), known as the Pan-Cancer Project, is the largest and most comprehensive study of whole cancer genomes yet. The collaboration, involving more than 1,300 scientists and clinicians from 37 countries, analysed more than 2,600 genomes of 38 different tumour types and has created a huge resource of primary cancer genomes, available to researchers worldwide to advance cancer research.

Main findings from the Pan-Cancer project:

  • The cancer genome is finite and knowable, but enormously complicated. By combining sequencing of the whole cancer genome with a suite of analysis tools, we can characterise every genetic change found in a cancer, all the processes that have generated those mutations, and even the order of key events during a cancer’s life history.
  • We are close to cataloguing all of the biological pathways involved in cancer and having a fuller picture of their actions in the genome. At least one causal mutation was found in virtually all of the cancers analysed, and the processes that generate mutations were found to be hugely diverse, ranging from changes in single DNA letters to the reorganisation of whole chromosomes. Multiple novel regions of the genome controlling how genes switch on and off were identified as targets of cancer-causing mutations.
  • Through a new method of “carbon dating”, the Pan-Cancer Project showed that it is possible to identify mutations that occurred years, sometimes even decades, before the tumour appears. In theory, this opens a window of opportunity for early cancer detection.
  • Tumour types can be identified accurately according to the patterns of genetic changes seen throughout the genome, potentially aiding the diagnosis of a patient’s cancer where conventional clinical tests could not identify its type. Knowledge of the exact tumour type could also help tailor treatments.

For access to all the open tier data in the Pan-Cancer project, go to

**For more information on driver mutations in different types of cancer, see the Sanger Institute website

***More information on protein-coding and non-coding genes is available at:


Esther Rheinbay, Morten Muhlig Nielsen and Federico Abascal et al. (2019). Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature. DOI:

The Nature collection landing page with all PanCancer publications will go live when the papers publish:




The Wellcome Sanger Institute is one of the premier centres of genomic discovery and understanding in the world. It leads ambitious collaborations across the globe to provide the foundations for further research and transformative healthcare innovations.

Wellcome Sanger Institute