Welcome to project Rephetio. We aim to predict the probability that a given approved small molecule will treat a given complex disease. First, we construct a hetnet for drug repurposing. The network integrates 29 public resources and contains 47 thousand nodes (of 11 types) and 2.3 million edges (of 24 types). Our edge prediction algorithm learns the informative types of paths for distinguishing disease-modifying indications. The versatility of hetnets, paired with the powerful Neo4j graph database, allows us to computationally elucidate mechanisms of efficacy and predict new uses for existing drugs at a highly disruptive scale.

Project components


Hetnets are networks with multiple node and edge types [1]. Hetnets excel at data integration and are a versatile and intuitive data structure. While specific incarnations of hetnets, such as bipartite or property graphs, have long existed, general algorithms that accommodate the multiple types are a recent development [2]. Our research predicts edges on hetnets, using an algorithm originally developed for social network analysis [3]. Previously, we predicted disease–gene associations [4]. Here, we predict new uses for existing drugs.
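To make the data structure concrete, here is a minimal Python sketch of a toy hetnet. It is not the project's code: the node names, the edge types (`binds`, `associates`), and the helper functions `metagraph` and `path_count` are all hypothetical. It shows how typed nodes and edges give rise to a schema, and how one might count the paths between two nodes that follow a given sequence of edge types, which is the kind of path-based signal an edge prediction method can learn from.

```python
from collections import defaultdict

# Toy hetnet (all names are illustrative placeholders, not Hetionet content).
node_type = {
    "compound_A": "Compound",
    "compound_B": "Compound",
    "gene_1": "Gene",
    "gene_2": "Gene",
    "gene_3": "Gene",
    "disease_X": "Disease",
}
edges = [
    ("compound_A", "binds", "gene_1"),
    ("compound_A", "binds", "gene_2"),
    ("compound_A", "binds", "gene_3"),
    ("compound_B", "binds", "gene_1"),
    ("gene_1", "associates", "disease_X"),
    ("gene_2", "associates", "disease_X"),
]


def metagraph(node_type, edges):
    """Return the schema as a set of (source type, edge type, target type) triples."""
    return {(node_type[s], kind, node_type[t]) for s, kind, t in edges}


def path_count(edges, source, target, metapath):
    """Count walks from source to target whose edge types follow `metapath`.

    Edges are treated as undirected. Repeated nodes are NOT excluded here.
    """
    adjacency = defaultdict(set)
    for s, kind, t in edges:
        adjacency[(s, kind)].add(t)
        adjacency[(t, kind)].add(s)
    frontier = {source: 1}  # node -> number of partial walks ending at that node
    for kind in metapath:
        next_frontier = defaultdict(int)
        for node, count in frontier.items():
            for neighbor in adjacency[(node, kind)]:
                next_frontier[neighbor] += count
        frontier = next_frontier
    return frontier.get(target, 0)


schema = metagraph(node_type, edges)
n_paths = path_count(edges, "compound_A", "disease_X", ["binds", "associates"])
print(sorted(schema))
print(n_paths)
```

In this toy example, `compound_A` reaches `disease_X` through two genes, so the count along the Compound–binds–Gene–associates–Disease pattern is 2, while `compound_B` has a count of 1.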

Network construction

We've constructed a state-of-the-art hetnet for drug repurposing called Hetionet. The network contains 47,031 nodes of 11 types and 2,250,197 edges of 24 types. The schema is shown in the metagraph below:

Nodes are identified using standardized terminologies to facilitate integration and prevent duplication. Edges are integrated from high-throughput databases, which were chosen for their quality, reusability, throughput, and relevance to pharmacology. We thank the community for helping us identify the most appropriate resources.

In general, we've dedicated a single GitHub repository to each resource. Versioning is accomplished using commit-specific URLs. We expect several of these repositories to be helpful outside of this project. Examples include our analysis of LINCS L1000 [5].
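As an illustration of commit-specific versioning, the sketch below builds a raw-content GitHub URL that is pinned to a single commit, so the file it points to does not change when the repository is updated. The helper `commit_url` and every argument value (owner, repository, commit hash, file path) are hypothetical placeholders, not the project's actual repositories.

```python
def commit_url(owner, repo, commit, path):
    """Build a raw-content GitHub URL pinned to one commit."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{commit}/{path}"


# Hypothetical values, for illustration only.
url = commit_url("example-owner", "example-resource", "0123abc", "data/edges.tsv")
print(url)
```

Because the commit hash identifies an exact snapshot, anyone who downloads from such a URL later should retrieve the same bytes that the original analysis used.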

Indications catalog

Our approach requires a catalog of indications (compound–disease treatments) for training and testing. Unfortunately, there was no open and structured catalog of indications, so we created our own by combining four resources. Physicians are now curating the automated compilation to separate disease-modifying indications from symptomatic indications and non-indications.
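The text does not describe how the four resources were merged, so the following is only one plausible sketch: a union of compound–disease pairs that records which resources support each pair. The function `combine_indications`, the resource names, and the identifiers are hypothetical, and the sketch assumes compounds and diseases have already been mapped to shared identifiers.

```python
from collections import defaultdict


def combine_indications(resources):
    """Map each (compound, disease) pair to the sorted list of resources reporting it."""
    support = defaultdict(set)
    for name, pairs in resources.items():
        for compound, disease in pairs:
            support[(compound, disease)].add(name)
    return {pair: sorted(names) for pair, names in support.items()}


# Hypothetical input: resource name -> set of (compound, disease) pairs.
resources = {
    "resource_a": {("compound_1", "disease_x"), ("compound_2", "disease_y")},
    "resource_b": {("compound_1", "disease_x")},
    "resource_c": {("compound_3", "disease_x")},
}
catalog = combine_indications(resources)
for pair, names in sorted(catalog.items()):
    print(pair, names)
```

Keeping the supporting resources alongside each pair gives curators a starting point, since pairs reported by several resources can be reviewed differently from pairs reported by only one.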


We use Neo4j, a powerful graph database, to store and operate on our hetnet [6]. In addition, our hetio Python package provides a further layer of functionality. Our project has led to the first public examples of duplicate node exclusion [7] and network permutation [8] in the Cypher query language.
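The project implements duplicate node exclusion and network permutation in Cypher; the sketch below only illustrates the two ideas in plain Python and is not a translation of the project's queries. It assumes that permutation means degree-preserving edge swapping within a single edge type connecting two different node types. The function names and toy data are hypothetical.

```python
import random
from collections import Counter


def has_duplicate_nodes(path):
    """Return True if a path (sequence of node ids) visits any node more than once."""
    return len(set(path)) < len(path)


def permute_edges(edges, n_swaps, seed=0):
    """Shuffle (source, target) edges by swapping targets, preserving node degrees."""
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Skip swaps that would recreate an existing edge or change nothing.
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


# Duplicate node exclusion: keep only paths without repeated nodes.
paths = [("c1", "g1", "d1"), ("g1", "d1", "g1")]
simple_paths = [p for p in paths if not has_duplicate_nodes(p)]

# Permutation: edges change, but every node keeps its original degree.
original = [("c1", "g1"), ("c1", "g2"), ("c2", "g2"), ("c2", "g3"), ("c3", "g1")]
permuted = permute_edges(original, n_swaps=20, seed=1)
source_degree = Counter(s for s, _ in permuted)
target_degree = Counter(t for _, t in permuted)
print(simple_paths)
print(sorted(permuted))
```

A permuted network of this kind keeps each node's degree while scrambling which nodes are connected, which makes it useful as a baseline for judging whether a path-based signal reflects more than node degree alone.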

Open science

We're committed to making this project as useful to the community as possible. Therefore, the project is entirely open notebook. We strive to share all outputs upon their creation, under permissive open licenses such as CC0 or CC-BY. Furthermore, we have devoted considerable effort to handling data copyright complications [9]. Much of this effort has been to save our downstream users the hassle. In other words, were we not compiling an open resource, our legal burden would have been much lighter.

Mechanisms of efficacy

Stay tuned for our investigation into which data sources are informative of drug efficacy.

Drug repurposing predictions

Stay tuned for our predictions of the probability that a given small molecule treats a given complex disease.


[1] Renaming ‘heterogeneous networks’ to a more concise and catchy term
Daniel Himmelstein, Casey Greene, Sergio Baranzini (2015) Thinklab. doi:10.15363/thinklab.d104
[2] Mining Heterogeneous Information Networks: Principles and Methodologies
Yizhou Sun, Jiawei Han (2012) Synthesis Lectures on Data Mining and Knowledge Discovery. doi:10.2200/S00433ED1V01Y201207DMK005
[3] Co-author Relationship Prediction in Heterogeneous Bibliographic Networks
Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal, Jiawei Han (2011) 2011 International Conference on Advances in Social Networks Analysis and Mining. doi:10.1109/ASONAM.2011.112
[7] Path exclusion conditions
Daniel Himmelstein (2015) Thinklab. doi:10.15363/thinklab.d134
[9] Integrating resources with disparate licensing into an open network
Daniel Himmelstein, Lars Juhl Jensen, MacKenzie Smith, Katie Fortney, Caty Chung (2015) Thinklab. doi:10.15363/thinklab.d107

Topics: Machine learning, Heterogeneous Networks, Small Molecules, Bioinformatics, Systems Pharmacology, Drug Repurposing, Complex Disease, Multipartite Graphs, HNEP, Hetnets
Cite project as
Daniel Himmelstein, Antoine Lizee, Pouya Khankhanian, Leo Brueggeman, Sabrina Chen, Dexter Hadley, Chrissy Hessler, Ari Green, Sergio Baranzini (2015) Rephetio: Repurposing drugs on a hetnet [project]. Thinklab. doi:10.15363/thinklab.4