Rephetio: Repurposing drugs on a hetnet [rephetio]

Announcing PharmacotherapyDB: the Open Catalog of Drug Therapies for Disease

Introducing PharmacotherapyDB

I'm excited to announce the initial release of our catalog of drug therapies for disease. The catalog contains physician-curated medical indications. It's available on figshare [1] and GitHub [2] and is licensed to be maximally reusable.

This initial release contains 97 diseases and 601 drugs. Among these drug–disease pairs, there are 755 disease-modifying therapies, 390 symptomatic therapies, and 243 non-indications. To enable integrative analyses, drugs and diseases are coded using DrugBank and Disease Ontology identifiers.
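The file layout of the release isn't described in this post, so the record type below is a hypothetical sketch rather than the catalog's actual schema. It assumes each row holds a DrugBank ID ("DB" plus five digits), a Disease Ontology ID ("DOID:" plus digits), and one of the three categories, and shows how such coded records could be validated and tallied.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed ID formats: DrugBank IDs look like "DB" + five digits and
# Disease Ontology IDs look like "DOID:" + digits.
DRUGBANK_ID = re.compile(r"DB\d{5}")
DISEASE_ONTOLOGY_ID = re.compile(r"DOID:\d+")
CATEGORIES = {"DM", "SYM", "NOT"}  # disease modifying, symptomatic, non-indication


@dataclass(frozen=True)
class Indication:
    """One curated drug-disease pair (hypothetical record layout)."""
    drugbank_id: str
    doid: str
    category: str

    def __post_init__(self):
        # Reject records whose identifiers or category are malformed.
        if not DRUGBANK_ID.fullmatch(self.drugbank_id):
            raise ValueError(f"not a DrugBank ID: {self.drugbank_id!r}")
        if not DISEASE_ONTOLOGY_ID.fullmatch(self.doid):
            raise ValueError(f"not a Disease Ontology ID: {self.doid!r}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def summarize(indications):
    """Return category counts plus the number of distinct drugs and diseases.

    Expects a list (it is iterated more than once).
    """
    counts = Counter(ind.category for ind in indications)
    n_drugs = len({ind.drugbank_id for ind in indications})
    n_diseases = len({ind.doid for ind in indications})
    return counts, n_drugs, n_diseases
```

If the release files were loaded into records of this shape, one would expect `summarize` to reproduce the category counts announced above (755 + 390 + 243 = 1,388 pairs).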

The catalog adheres to pathophysiological principles first. Therefore, it includes indications with a poor risk–benefit ratio that are rarely used in the modern clinic. Contributions are welcome, as we hope to expand and refine the catalog over time.

History & Methods

One of our priorities from the beginning of this project was to construct a catalog of efficacious pharmacotherapies. Since our approach learns how to repurpose drugs from the indications we feed it, a high-quality indication catalog was crucial.

Compilation and data integration

We began by looking for existing indication resources. In a discussion which generated 23 comments — the most of any Thinklab discussion to date — we received helpful suggestions from the community. Based on these suggestions and our research, we proceeded by integrating four resources:

  • MEDI-HPS [3] — indications from RxNorm, SIDER 2, MedlinePlus, and Wikipedia (discussed).
  • LabeledIn — indications extracted from drug labels by experts [4] and crowdsourced non-experts [5] (discussed).
  • ehrlink [6] — indications from electronic health records where physicians linked medications to problems (discussed).
  • PREDICT [7] — indications from UMLS relationships,, and drug labels (discussed).

We mapped these resources onto our slim sets of 137 diseases and 1,552 small molecule compounds. Taking the union of the four resources, we extracted 1,388 high-confidence indications.
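A minimal sketch of the union step, assuming each resource has already been mapped to DrugBank and Disease Ontology IDs (the mapping itself, and each resource's own high-confidence filtering, aren't shown). The function name and input shapes are hypothetical. Recording which resources report each pair, rather than keeping a bare set of pairs, is what makes per-resource breakdowns possible later.

```python
def union_indications(resources, compound_vocab, disease_vocab):
    """Union drug-disease pairs across resources.

    resources: dict of resource name -> iterable of (compound_id, disease_id)
        pairs, assumed to be already mapped to DrugBank / Disease Ontology IDs.
    compound_vocab, disease_vocab: sets of IDs in the slim vocabularies.

    Returns a dict of (compound_id, disease_id) -> set of resource names
    that report the pair. Pairs outside either vocabulary are dropped.
    """
    merged = {}
    for name, pairs in resources.items():
        for compound, disease in pairs:
            if compound in compound_vocab and disease in disease_vocab:
                merged.setdefault((compound, disease), set()).add(name)
    return merged


if __name__ == "__main__":
    # Toy inputs with placeholder IDs, not real catalog entries.
    resources = {
        "MEDI-HPS": [("DB00001", "DOID:1"), ("DB00002", "DOID:2")],
        "PREDICT": [("DB00001", "DOID:1"), ("DB09999", "DOID:1")],
    }
    merged = union_indications(resources, {"DB00001", "DB00002"}, {"DOID:1", "DOID:2"})
    for pair, names in sorted(merged.items()):
        print(pair, sorted(names))
```

In the toy run, the pair reported by both resources is kept once with both names attached, and the pair whose compound falls outside the slim vocabulary is dropped.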

Curation and categorization

Next, we decided that physician curation was needed to separate disease-modifying from symptomatic indications. We recruited two physician curators (@chrissyhessler & Ari J. Green) to perform a pilot on 50 randomly selected indications. Together, we then defined disease modifying as "a drug that therapeutically changes the underlying or downstream biology of the disease" and symptomatic as "a drug that treats a significant symptom of the disease."

The two curators then each reviewed all 1,388 indications and classified each one as disease modifying (DM), symptomatic (SYM), or a non-indication (NOT). The two initial curators disagreed on 444 indications. We therefore recruited a third curator (@pouyakhankhanian), who had access to the prior curations. The third curator developed a detailed methodology that helped us reach a consensus, at least for the time being.
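The bookkeeping behind this step can be sketched as follows, assuming each curator's work is a dict mapping a (drug, disease) pair to "DM", "SYM", or "NOT". The consensus rule here, deferring to the third curator on every disagreement, is a deliberate simplification and not the third curator's actual methodology. With 444 disagreements out of 1,388 indications, `percent_agreement` would return 944 / 1,388, or roughly 0.68.

```python
def percent_agreement(curator_a, curator_b):
    """Fraction of commonly reviewed indications given the same label."""
    shared = curator_a.keys() & curator_b.keys()
    agree = sum(curator_a[key] == curator_b[key] for key in shared)
    return agree / len(shared)


def disagreements(curator_a, curator_b):
    """Indications the two curators labeled differently, in sorted order."""
    shared = curator_a.keys() & curator_b.keys()
    return sorted(key for key in shared if curator_a[key] != curator_b[key])


def consensus(curator_a, curator_b, curator_c):
    """Keep labels the first two curators agree on and take the third
    curator's label wherever they disagree (a simplified tie-break rule).

    curator_c only needs labels for the disagreements.
    """
    result = {}
    for key in curator_a.keys() & curator_b.keys():
        if curator_a[key] == curator_b[key]:
            result[key] = curator_a[key]
        else:
            result[key] = curator_c[key]
    return result
```

Raw percent agreement ignores agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would be a natural alternative if one wanted to compare curator pairs.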

Future plans

We're receptive to feedback on how to improve PharmacotherapyDB. For future releases, we hope to curate the unpropagated indications, include additional sources, and expand our disease and drug vocabularies.

Category breakdown by resource

Using the consensus curation, we went back and calculated the category composition of the indications from each resource (notebook).

Resource     DM            SYM           NOT           Total
MEDI-HPS     532 (67.1%)   168 (21.2%)   93 (11.7%)    793 (100%)
PREDICT      346 (59.7%)   158 (27.2%)   76 (13.1%)    580 (100%)
EHRLink      205 (44.3%)   163 (35.2%)   95 (20.5%)    463 (100%)
LabeledIn    183 (66.1%)   72 (26.0%)    22 (7.9%)     277 (100%)

The table indicates that, of the 793 indications we extracted from MEDI-HPS, 532 (67.1%) were disease modifying. In short, MEDI-HPS and LabeledIn contained the highest percentages of disease-modifying indications (67.1% and 66.1%). EHRLink, which is based on electronic health records, contained the highest percentages of symptomatic indications (35.2%) and non-indications (20.5%).
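The percentages are row-wise shares of each resource's total. A small sketch that recomputes them from the raw counts, assuming rounding to one decimal place, shows where each figure comes from; `COUNTS` and `composition` are illustrative names, not part of the released notebook.

```python
# Category counts per resource, transcribed from the table above.
COUNTS = {
    "MEDI-HPS": {"DM": 532, "SYM": 168, "NOT": 93},
    "PREDICT": {"DM": 346, "SYM": 158, "NOT": 76},
    "EHRLink": {"DM": 205, "SYM": 163, "NOT": 95},
    "LabeledIn": {"DM": 183, "SYM": 72, "NOT": 22},
}


def composition(counts):
    """Return {category: (count, percent of row total)} and the row total."""
    total = sum(counts.values())
    shares = {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}
    return shares, total


if __name__ == "__main__":
    for resource, counts in COUNTS.items():
        shares, total = composition(counts)
        cells = "  ".join(f"{cat} {n} ({pct}%)" for cat, (n, pct) in shares.items())
        print(f"{resource:<10} {cells}  total {total}")
```

For example, MEDI-HPS gives 532 / 793 ≈ 0.671, the 67.1% quoted in the text.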

Category breakdown by number of resources

Next, we looked at the category composition based on the number of resources reporting each indication.

# of Resources   DM            SYM           NOT           Total
1                433 (47.4%)   271 (29.6%)   210 (23.0%)   914 (100%)
2                190 (66.2%)   74 (25.8%)    23 (8.0%)     287 (100%)
3                75 (61.0%)    38 (30.9%)    10 (8.1%)     123 (100%)
4                57 (89.1%)    7 (10.9%)     0 (0.0%)      64 (100%)

The more resources that reported an indication, the more likely it was to be disease modifying: indications found in only a single resource were disease modifying 47.4% of the time, whereas indications found in all four resources were disease modifying 89.1% of the time.
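A sketch of how this breakdown could be tabulated, assuming a mapping from each (drug, disease) pair to the set of resources reporting it, plus the consensus category for each pair. The function names are hypothetical; the actual analysis lives in the linked notebook.

```python
from collections import Counter, defaultdict


def breakdown_by_resource_count(sources, categories):
    """Tabulate curated categories by how many resources report each pair.

    sources: dict of (compound_id, disease_id) -> set of resource names
    categories: dict of (compound_id, disease_id) -> "DM", "SYM" or "NOT"
    Returns {number of resources: Counter of categories}.
    """
    table = defaultdict(Counter)
    for pair, names in sources.items():
        table[len(names)][categories[pair]] += 1
    return dict(table)


def dm_fraction(counter):
    """Share of a row that is disease modifying (0.0 for an empty row)."""
    total = sum(counter.values())
    return counter["DM"] / total if total else 0.0
```

As a consistency check on the two tables, weighting each row total by its number of resources (914 + 2 × 287 + 3 × 123 + 4 × 64 = 2,113) recovers the sum of the per-resource totals (793 + 580 + 463 + 277 = 2,113), since an indication reported by k resources is counted once in each of those k resources.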

Cite this as
Daniel Himmelstein (2016) Announcing PharmacotherapyDB: the Open Catalog of Drug Therapies for Disease. Thinklab. doi:10.15363/thinklab.d182

Creative Commons License