Welcome to project Rephetio. We aim to predict the probability that a given approved small molecule will treat a given complex disease. First, we construct a hetnet for drug repurposing. The network integrates 29 public resources and contains 47 thousand nodes (of 11 types) and 2.3 million edges (of 24 types). Our edge prediction algorithm learns the informative types of paths for distinguishing disease-modifying indications. The versatility of hetnets, paired with the powerful Neo4j graph database, allows us to computationally elucidate mechanisms of efficacy and predict new uses for existing drugs at scale.
Hetnets are networks with multiple node and edge types. Hetnets excel at data integration and are a versatile and intuitive data structure. While specific incarnations of hetnets have long existed, such as bipartite or property graphs, general algorithms that accommodate the multiple types are a recent development. Our research tries to predict edges on hetnets, using an algorithm originally developed for social network analysis. Previously, we predicted disease–gene associations. Here, we predict new uses for existing drugs (drug repurposing).
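Predicting edges on a hetnet relies on counting the paths of each type (each metapath) between a source node and a target node. The sketch below is a minimal illustration under stated assumptions: raw path counts stand in for whatever path-based features the project actually computes, the class and function names and all node identifiers are hypothetical, and every edge type is treated as undirected. Paths that revisit a node are skipped.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]       # (node type, identifier)
Step = Tuple[str, str, str]  # (source type, relation, target type)


class Hetnet:
    """Minimal hetnet: typed nodes joined by typed, undirected edges."""

    def __init__(self) -> None:
        self.adj: Dict[Tuple[Node, str], Set[Node]] = defaultdict(set)

    def add_edge(self, source: Node, relation: str, target: Node) -> None:
        # Store both directions so a path can traverse the edge either way
        self.adj[(source, relation)].add(target)
        self.adj[(target, relation)].add(source)

    def neighbors(self, node: Node, relation: str, target_type: str) -> Set[Node]:
        """Nodes of `target_type` joined to `node` by a `relation` edge."""
        return {n for n in self.adj.get((node, relation), set()) if n[0] == target_type}


def count_paths(graph: Hetnet, source: Node, target: Node, metapath: List[Step]) -> int:
    """Count source-to-target paths following `metapath`, excluding paths that revisit a node."""
    paths: List[List[Node]] = [[source]]
    for source_type, relation, target_type in metapath:
        extended: List[List[Node]] = []
        for path in paths:
            if path[-1][0] != source_type:
                continue  # this metapath step does not start at the current node's type
            for nxt in graph.neighbors(path[-1], relation, target_type):
                if nxt not in path:  # duplicate node exclusion
                    extended.append(path + [nxt])
        paths = extended
    return sum(1 for path in paths if path[-1] == target)
```

Enumerating paths explicitly grows exponentially with metapath length, so this is only meant to make the definition concrete, not to scale to a network of millions of edges.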
We've constructed a state-of-the-art hetnet for drug repurposing called Hetionet. The network contains 47,031 nodes of 11 types and 2,250,197 edges of 24 types. The schema is shown in the metagraph below:
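A metagraph is the schema of a hetnet: its node types (metanodes) and the edge types (metaedges) allowed between them. As a minimal sketch, with hypothetical class names, a few illustrative edge types rather than Hetionet's actual 24, and every edge type treated as undirected (a simplification), a metagraph can be used to reject edges that do not fit the schema:

```python
from typing import FrozenSet, Iterable, NamedTuple, Set


class Metaedge(NamedTuple):
    source: str    # source metanode (node type)
    relation: str  # edge type name
    target: str    # target metanode


class Metagraph:
    """Hetnet schema: node types (metanodes) and allowed edge types (metaedges)."""

    def __init__(self, metaedges: Iterable[Metaedge]) -> None:
        self.metaedges: Set[Metaedge] = set(metaedges)
        # Metanodes are every node type that appears at either end of a metaedge
        self.metanodes: FrozenSet[str] = frozenset(
            kind for edge in self.metaedges for kind in (edge.source, edge.target)
        )

    def allows(self, source_type: str, relation: str, target_type: str) -> bool:
        """True if an edge of this type may join these node types (either orientation)."""
        return (
            Metaedge(source_type, relation, target_type) in self.metaedges
            or Metaedge(target_type, relation, source_type) in self.metaedges
        )
```

Checking each edge against the metagraph during integration catches mistakes such as a "binds" edge accidentally connecting a compound to a disease.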
Nodes are identified using standardized terminologies to facilitate integration and prevent duplication. Edges are integrated from high-throughput databases, which were chosen for their quality, reusability, throughput, and relevance to pharmacology. We thank the community for helping us identify the most appropriate resources.
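Standardized identifiers prevent duplication because the same entity reported by two resources collapses to a single node. A minimal sketch, assuming a hypothetical crosswalk table that maps resource-specific identifiers to standard ones (the function name and all identifiers are made up for illustration):

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (source ID, relation, target ID)


def integrate(resources: Iterable[Iterable[Edge]], crosswalk: Dict[str, str]) -> Set[Edge]:
    """Merge edges from several resources after mapping IDs to a standard vocabulary.

    IDs absent from `crosswalk` are assumed to be standard already. Edges that
    become identical after mapping collapse into one set element. Edge
    orientation is kept exactly as each resource reports it.
    """
    merged: Set[Edge] = set()
    for edges in resources:
        for source, relation, target in edges:
            merged.add((crosswalk.get(source, source), relation, crosswalk.get(target, target)))
    return merged
```

Using a set for the merged edges means a relationship reported by several resources is stored once.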
In general, we've dedicated a single GitHub repository to each resource. Versioning is accomplished using commit-specific URLs. We expect several of these repositories to be helpful outside of this project. One example is our analysis of LINCS L1000.
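Pinning a URL to a commit hash, rather than a branch name, keeps the URL pointing at the same file contents even after the repository changes. A small sketch that builds such a URL for GitHub's raw file host; the function name is hypothetical, and any owner, repository, commit, or path passed to it would be a placeholder:

```python
import re

# A full Git commit SHA-1: 40 lowercase hexadecimal characters
COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_raw_url(owner: str, repo: str, commit: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL pinned to one commit.

    Unlike a branch name such as "master", a commit SHA always refers to the
    same snapshot, so the URL keeps resolving to the same file contents.
    """
    if not COMMIT_SHA.fullmatch(commit):
        raise ValueError("expected a full 40-character hexadecimal commit SHA")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{commit}/{path.lstrip('/')}"
```

Rejecting branch names up front keeps a moving reference from slipping into a pipeline that is supposed to be versioned.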
Our approach requires a catalog of indications (compound–disease treatments) for training and testing. Unfortunately, no open and structured catalog of indications existed, so we created our own by combining four resources. Physicians are now curating this automated compilation to separate disease-modifying indications from symptomatic indications and non-indications.
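Combining several indication resources can be sketched as a union that records which resources report each compound–disease pair, followed by keeping only the pairs that curators label disease-modifying. The function names, resource names, identifiers, and label strings here are hypothetical, and the sketch does not reproduce the project's actual compilation:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (compound ID, disease ID)


def compile_indications(sources: Dict[str, Iterable[Pair]]) -> Dict[Pair, Set[str]]:
    """Union indication pairs across resources, noting which resources report each pair."""
    support: Dict[Pair, Set[str]] = defaultdict(set)
    for name, pairs in sources.items():
        for pair in pairs:
            support[pair].add(name)
    return dict(support)


def disease_modifying(support: Dict[Pair, Set[str]], curation: Dict[Pair, str]) -> Set[Pair]:
    """Keep compiled pairs that curators labeled disease-modifying.

    `curation` maps a pair to a label such as "disease-modifying",
    "symptomatic", or "non-indication"; pairs not yet curated are excluded.
    """
    return {pair for pair in support if curation.get(pair) == "disease-modifying"}
```

Recording the supporting resources for each pair, rather than a bare union, keeps provenance available to the curators.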
We use Neo4j to store and operate on our hetnet. Neo4j is a powerful graph database. Our hetio Python package provides an additional layer of functionality. Our project has led to the first public examples of duplicate node exclusion and network permutation in the Cypher query language.
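Network permutation randomizes edges while preserving every node's degree, which yields null networks for comparison. One standard approach is degree-preserving edge swapping (often called XSwap). The Cypher implementation mentioned above is not reproduced here; the sketch instead shows the swap idea in plain Python for a single edge type whose endpoints are different node types (for example Compound–Gene), with hypothetical function and node names and the assumption that the input has no duplicate edges:

```python
import random
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str]  # (source node, target node) for one edge type


def permute_edges(edges: List[Edge], n_swaps: int, seed: int = 0) -> List[Edge]:
    """Randomize a bipartite edge list while preserving every node's degree.

    Makes `n_swaps` attempts. Each attempt picks edges (a, b) and (c, d) and
    rewires them to (a, d) and (c, b). Attempts that would recreate an
    existing edge are rejected, so the result never contains duplicates.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i = rng.randrange(len(edges))
        j = rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        if i == j or new_1 in present or new_2 in present:
            continue  # no-op or would duplicate an existing edge
        present -= {edges[i], edges[j]}
        present |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def degrees(edges: List[Edge]) -> Tuple[Counter, Counter]:
    """Source-node and target-node degrees, used to check the permutation."""
    return Counter(s for s, _ in edges), Counter(t for _, t in edges)
```

Because each swap only exchanges endpoints between two edges, every node keeps its degree, while the specific connections become randomized.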
We're committed to making this project as useful to the community as possible. Therefore, the project is conducted entirely as an open notebook. We strive to share all outputs upon their creation, under permissive open licenses such as CC0 or CC-BY. Furthermore, we have devoted considerable effort to handling data copyright complications. Much of this effort has been to spare our downstream users the hassle: had we not been compiling an open resource, our legal burden would have been much lighter.
Mechanisms of efficacy
Stay tuned for our investigation into which data sources are informative of drug efficacy.
Drug repurposing predictions
Stay tuned for our predictions of the probability that a given small molecule treats a given complex disease.
Topics: Machine learning, Heterogeneous Networks, Small Molecules, Bioinformatics, Systems Pharmacology, Drug Repurposing, Complex Disease, Multipartite Graphs, HNEP, Hetnets
Cite project as: Daniel Himmelstein, Antoine Lizee, Pouya Khankhanian, Leo Brueggeman, Sabrina Chen, Dexter Hadley, Chrissy Hessler, Ari Green, Sergio Baranzini (2015) Rephetio: Repurposing drugs on a hetnet [project]. Thinklab. doi:10.15363/thinklab.4