Deep learning applied to drug discovery and repurposing

May 27, 2016

Deep neural networks for drug discovery (credit: Insilico Medicine, Inc.)

Scientists from Insilico Medicine, Inc. have trained deep neural networks (DNNs) to predict the potential therapeutic uses of 678 drugs, using gene-expression data from high-throughput experiments on human cell lines in the Broad Institute’s LINCS database and therapeutic-use categories from the NIH’s MeSH database.

The supervised deep-learning drug-discovery engine used the properties of small molecules, transcriptional data, and literature to predict efficacy, toxicity, tissue-specificity, and heterogeneity of response.

“We used LINCS data from Broad Institute to determine the effects on cell lines before and after incubation with compounds,” co-author and research scientist Polina Mamoshina explained to KurzweilAI.

“We used gene expression data of total mRNA from cell lines extracted and measured before incubation with compound X and after incubation with compound X to identify the response on a molecular level. The goal is to understand how gene expression (the transcriptome) will change after drug uptake. It is a differential value, so we need a reference (molecular state before incubation) to compare.”
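The before-and-after comparison Mamoshina describes can be illustrated as a per-gene log fold change. This is only a minimal sketch, not the study's pipeline: the gene names, the expression values, the `log2_fold_change` helper, and the pseudocount of 1.0 are all hypothetical choices made for illustration.

```python
import math


def log2_fold_change(before, after, pseudocount=1.0):
    """Per-gene log2 ratio of expression after vs. before incubation.

    The pseudocount (an assumption) avoids division by zero for
    genes with no measured expression.
    """
    return {
        gene: math.log2((after[gene] + pseudocount) / (before[gene] + pseudocount))
        for gene in before
        if gene in after
    }


# Hypothetical expression levels for one cell line and one compound.
before = {"GENE_A": 7.0, "GENE_B": 15.0, "GENE_C": 3.0}
after = {"GENE_A": 31.0, "GENE_B": 3.0, "GENE_C": 3.0}

response = log2_fold_change(before, after)
for gene, value in sorted(response.items()):
    print(gene, value)
```

A positive value means the gene is more strongly expressed after incubation, a negative value means it is suppressed, and zero means no change relative to the reference state.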

The research is described in a paper in the upcoming issue of the journal Molecular Pharmaceutics.

Helping pharmas accelerate R&D

Alex Zhavoronkov, PhD, Insilico Medicine CEO, who coordinated the study, said the initial goal of their research was to help pharmaceutical companies significantly accelerate their R&D and increase the number of approved drugs. “In the process we came up with more than 800 strong hypotheses in oncology, cardiovascular, metabolic, and CNS spaces and started basic validation,” he said.

The team measured the “differential signaling pathway activation score for a large number of pathways to reduce the dimensionality of the data while retaining biological relevance.” They then used those scores to train the deep neural networks.*
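The idea of collapsing many gene-level values into far fewer pathway-level scores can be sketched as follows. The article does not describe the scoring algorithm the team used, so this example swaps in a plain average of gene-level changes per pathway as a stand-in; the `pathway_scores` helper, the pathway names, and the gene sets are hypothetical.

```python
def pathway_scores(gene_lfc, pathways):
    """Average the gene-level changes of each pathway's member genes.

    This simple mean is an assumed stand-in, not the pathway
    activation scoring algorithm used in the study.
    """
    scores = {}
    for name, genes in pathways.items():
        values = [gene_lfc[g] for g in genes if g in gene_lfc]
        if values:  # skip pathways with no measured member genes
            scores[name] = sum(values) / len(values)
    return scores


# Hypothetical gene-level log fold changes and pathway memberships.
gene_lfc = {"GENE_A": 2.0, "GENE_B": -2.0, "GENE_C": 0.0, "GENE_D": 1.0}
pathways = {
    "PATHWAY_1": ["GENE_A", "GENE_D"],
    "PATHWAY_2": ["GENE_B", "GENE_C", "GENE_X"],  # GENE_X is unmeasured
}

scores = pathway_scores(gene_lfc, pathways)
print(scores)
```

Here four gene-level features become two pathway-level features, which is the kind of dimensionality reduction the quote refers to, while each feature still maps to a named biological process.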

“This study is a proof of concept that DNNs can be used to annotate drugs using transcriptional response signatures, but we took this concept to the next level,” said Alex Aliper, president of research, Insilico Medicine, Inc., lead author of the study.

Via Pharma.AI, a newly formed subsidiary of Insilico Medicine, “we developed a pipeline for in silico drug discovery — which has the potential to substantially accelerate the preclinical stage for almost any therapeutic — and came up with a broad list of predictions, with multiple in silico validation steps that, if validated in vitro and in vivo, can almost double the number of drugs in clinical practice.”

Despite the commercial orientation of the companies, the authors agreed not to file for intellectual property on these methods and to publish the proof of concept.

Deep-learning age biomarkers

According to Mamoshina, Insilico Medicine scientists earlier this month published the first deep-learned biomarker of human age, which aims to predict the health status of the patient, in a paper titled “Deep biomarkers of human aging: Application of deep neural networks to biomarker development” by Putin et al. in Aging. They also published an overview of recent advances in deep learning, titled “Applications of Deep Learning in Biomedicine,” by Mamoshina et al., also in Molecular Pharmaceutics.

Insilico Medicine is located in the Emerging Technology Centers at Johns Hopkins University in Baltimore, Maryland, in collaboration with Datalytic Solutions and Mind Research Network.

* In this study, scientists used the perturbation samples of 678 drugs across A549, MCF-7 and PC-3 cell lines from the Library of Integrated Network-Based Cellular Signatures (LINCS) project developed by the National Institutes of Health (NIH) and linked those to 12 therapeutic use categories derived from MeSH (Medical Subject Headings) developed and maintained by the National Library of Medicine (NLM) of the NIH.

To train the DNN, scientists used both gene-level transcriptomic data and transcriptomic data processed using a pathway activation scoring algorithm, for a pooled dataset of samples perturbed with different concentrations of the drug for 6 and 24 hours. Cross-validation experiments showed that the DNNs achieved 54.6% accuracy in assigning each drug to the correct one of 12 therapeutic classes.
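The paper's abstract reports classifier performance as cross-validation mean F1 scores. As a rough illustration of how an F1 score averaged over classes is computed for a multiclass problem, here is a minimal sketch; the `macro_f1` helper, the class labels, and the predictions are hypothetical, and the averaging scheme (an unweighted mean over classes) is an assumption, since the article does not say how the mean was taken.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_values = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        f1_values.append(2 * tp / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values)


# Hypothetical therapeutic-class labels for six drugs.
y_true = ["cns", "cns", "cardio", "cardio", "onc", "onc"]
y_pred = ["cns", "cardio", "cardio", "cardio", "onc", "cns"]

score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Unlike plain accuracy, this average gives every class equal weight, so a classifier cannot score well simply by favoring the most common therapeutic category.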

One peculiar finding of this experiment was that a large number of drugs misclassified by the DNNs had dual use, suggesting possible application of DNN confusion matrices in drug repurposing.
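To make the repurposing idea concrete, the sketch below tallies a confusion matrix and lists the most frequent off-diagonal cells, that is, the true-class and predicted-class pairs where the model "errs" most often. It is an illustration under assumed inputs, not the authors' procedure; the `top_confusions` helper, the class labels, and the predictions are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count each (true class, predicted class) pair."""
    return Counter(zip(y_true, y_pred))


def top_confusions(y_true, y_pred, k=3):
    """Return the k most frequent misclassification pairs."""
    off_diagonal = {
        pair: n
        for pair, n in confusion_counts(y_true, y_pred).items()
        if pair[0] != pair[1]
    }
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(off_diagonal.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical annotated and predicted therapeutic classes for six drugs.
y_true = ["cardio", "cardio", "cardio", "cns", "cns", "onc"]
y_pred = ["cns", "cns", "cardio", "cns", "onc", "onc"]

candidates = top_confusions(y_true, y_pred)
for (true_class, predicted_class), count in candidates:
    print(f"{true_class} -> {predicted_class}: {count}")
```

In this toy data, drugs annotated as cardiovascular are repeatedly predicted as CNS drugs. Under the article's suggestion, a recurring pattern like that would flag those drugs for a closer look at a possible second use.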


FutureTechnologies Media Group | Video presentation: Insilico Medicine


Abstract of Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data

Deep learning is rapidly advancing many areas of science and technology with multiple success stories in image, text, voice and video recognition, robotics and autonomous driving. In this paper we demonstrate how deep neural networks (DNN) trained on large transcriptional response data sets can classify various drugs to therapeutic categories solely based on their transcriptional profiles. We used the perturbation samples of 678 drugs across A549, MCF-7 and PC-3 cell lines from the LINCS project and linked those to 12 therapeutic use categories derived from MeSH. To train the DNN, we utilized both gene level transcriptomic data and transcriptomic data processed using a pathway activation scoring algorithm, for a pooled dataset of samples perturbed with different concentrations of the drug for 6 and 24 hours. When applied to normalized gene expression data for “landmark genes,” DNN showed cross-validation mean F1 scores of 0.397, 0.285 and 0.234 on 3-, 5- and 12-category classification problems, respectively. At the pathway level DNN performed best with cross-validation mean F1 scores of 0.701, 0.596 and 0.546 on the same tasks. In both gene and pathway level classification, DNN convincingly outperformed support vector machine (SVM) model on every multiclass classification problem. For the first time we demonstrate a deep learning neural net trained on transcriptomic data to recognize pharmacological properties of multiple drugs across different biological systems and conditions. We also propose using deep neural net confusion matrices for drug repositioning. This work is a proof of principle for applying deep learning to drug discovery and development.