Machine learning algorithm helps in the search for new drugs
Researchers have designed a machine learning algorithm for drug discovery that has been shown to be twice as efficient as the industry standard, a result that could accelerate the development of new treatments for disease.
The ability to fish out four active molecules from six million is like finding a needle in a haystack.
- Alpha Lee
The researchers, led by the University of Cambridge, used their algorithm to identify four new molecules that activate a protein thought to be relevant to the symptoms of Alzheimer’s disease and schizophrenia. The results are reported in the journal PNAS.
A key problem in drug discovery is predicting whether a molecule will activate a particular physiological process. It’s possible to build a statistical model by searching for chemical patterns shared among molecules known to activate that process, but the data needed to build these models are limited because experiments are costly, and with so few data points it is unclear which chemical patterns are statistically significant.
“Machine learning has made significant progress in areas such as computer vision where data is abundant,” said Dr Alpha Lee from Cambridge’s Cavendish Laboratory, and the study’s lead author. “The next frontier is scientific applications such as drug discovery, where the amount of data is relatively limited but we do have physical insights about the problem, and the question becomes how to marry data with fundamental chemistry and physics.”
The algorithm developed by Lee and his colleagues, in collaboration with biopharmaceutical company Pfizer, uses mathematics to separate pharmacologically relevant chemical patterns from irrelevant ones.
Importantly, the algorithm looks at both molecules known to be active and molecules known to be inactive, and learns to recognise which parts of the molecules are important for drug action and which are not. A branch of mathematics known as random matrix theory predicts the statistical properties of a purely random, noisy dataset. The algorithm compares these predictions against the statistics of chemical features in active and inactive molecules to distil which chemical patterns are truly important for binding, as opposed to arising simply by chance.
This methodology allows the researchers to fish out important chemical patterns not only from molecules that are active but also from molecules that are inactive – in other words, failed experiments can now be exploited with this technique.
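To make the idea concrete, here is a toy sketch in Python of how random matrix theory can separate genuine chemical patterns from chance, and how inactive molecules can contribute. The specifics are assumptions made for illustration, not details from the study: each molecule is encoded as a vector of 0/1 chemical-feature bits (a fingerprint); the benchmark for "random" is the Marchenko–Pastur law, under which the eigenvalues of a correlation matrix built from pure noise with n samples and p features fall, for large samples, below (1 + √(p/n))²; and a candidate is scored by how strongly it projects onto the significant patterns of the actives minus those of the inactives. The function names (`mp_upper_edge`, `clean_patterns`, `score`) and the synthetic data are hypothetical. This illustrates the general technique and is not the published algorithm.

```python
import numpy as np


def mp_upper_edge(n_samples: int, n_features: int) -> float:
    # Marchenko-Pastur upper edge: the largest eigenvalue expected (for large
    # samples) from the correlation matrix of pure, unit-variance noise.
    return (1.0 + np.sqrt(n_features / n_samples)) ** 2


def clean_patterns(X: np.ndarray):
    # X: (n_molecules, n_features) matrix of hypothetical 0/1 fingerprint bits.
    n, p = X.shape
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                      # constant bits carry no signal
    Z = (X - mu) / sd                      # standardise each feature
    C = Z.T @ Z / n                        # feature-feature correlation matrix
    vals, vecs = np.linalg.eigh(C)         # symmetric eigendecomposition
    keep = vals > mp_upper_edge(n, p)      # keep only beyond-chance patterns
    return vecs[:, keep], mu, sd


def score(x: np.ndarray, active_model, inactive_model) -> float:
    # Energy of a molecule in each set's "significant" subspace: rewarded
    # for the actives' patterns, penalised for the inactives' patterns.
    def energy(model) -> float:
        V, mu, sd = model
        z = (x - mu) / sd
        return float(np.sum((V.T @ z) ** 2))
    return energy(active_model) - energy(inactive_model)


rng = np.random.default_rng(0)
n_mol, n_feat = 400, 100


def synthetic_set(block: slice) -> np.ndarray:
    # Random bits everywhere, plus one planted group of 10 co-occurring bits.
    X = rng.integers(0, 2, size=(n_mol, n_feat)).astype(float)
    hidden = rng.integers(0, 2, size=n_mol)      # pattern present or absent
    flip = rng.random((n_mol, 10)) < 0.1         # 10% per-bit noise
    X[:, block] = np.where(flip, 1 - hidden[:, None], hidden[:, None])
    return X


actives = clean_patterns(synthetic_set(slice(0, 10)))
inactives = clean_patterns(synthetic_set(slice(10, 20)))

base = np.tile([0.0, 1.0], n_feat // 2)          # neutral candidate
cand_a = base.copy()
cand_a[:10] = 1.0                                # carries the active pattern
cand_b = base.copy()
cand_b[10:20] = 1.0                              # carries the inactive pattern

s_a = score(cand_a, actives, inactives)
s_b = score(cand_b, actives, inactives)
print(f"candidate A: {s_a:.1f}, candidate B: {s_b:.1f}")
```

Eigenvalues above the noise edge mark groups of features that co-occur more often than chance would allow; everything below the edge is treated as noise and discarded. Subtracting the inactive-set term is what lets failed experiments contribute in this sketch: a candidate that mainly shares the inactives' characteristic pattern is pushed down the ranking.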
Reproduced courtesy of the University of Cambridge
The University of Cambridge is acknowledged as one of the world's leading higher education and research institutions. The University was instrumental in the formation of the Cambridge Network and its Vice-Chancellor, Professor Stephen Toope, is also the President of the Cambridge Network.