Project:
Rephetio: Repurposing drugs on a hetnet [rephetio]

Data nomenclature: naming and abbreviating our network types


We've created a preliminary network with 10 types of nodes (metanodes) and 27 types of edges (metaedges). Now an important detail is naming node and edge types appropriately.

For each metanode and metaedge, we also need abbreviations. We use the abbreviations to make writing out complete paths less cumbersome. For example, in our previous project, we abbreviated the Gene - interaction - Gene - expression - Tissue - localization - Disease path to GiGeTlD [1].
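
As a minimal sketch of how such abbreviations compose, the snippet below joins per-type abbreviations into a path abbreviation. The lookup tables and the function name `abbreviate_metapath` are hypothetical; only the Gene, Tissue, and Disease letters and the three edge letters are taken from the GiGeTlD example, and the uppercase-node, lowercase-edge pattern is inferred from that example.

```python
# Hypothetical lookup tables: entries are read off the GiGeTlD example only.
METANODE_ABBREV = {"Gene": "G", "Tissue": "T", "Disease": "D"}
METAEDGE_ABBREV = {"interaction": "i", "expression": "e", "localization": "l"}


def abbreviate_metapath(path):
    """Abbreviate a path given as alternating metanode and metaedge names."""
    if len(path) % 2 == 0:
        raise ValueError("a path must start and end with a metanode")
    parts = []
    for position, name in enumerate(path):
        # Even positions hold metanodes, odd positions hold metaedges.
        table = METANODE_ABBREV if position % 2 == 0 else METAEDGE_ABBREV
        parts.append(table[name])
    return "".join(parts)


path = ["Gene", "interaction", "Gene", "expression",
        "Tissue", "localization", "Disease"]
print(abbreviate_metapath(path))  # GiGeTlD
```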

We have several conventions for naming and abbreviations, but they haven't been publicly explained or discussed. This discussion is now home to these topics.

Naming according to parts of speech

According to Chen's rules of thumb, we should use parts of speech as follows [1]:

  • common nouns for node labels (types)
  • proper nouns for node names
  • transitive verbs for relationship (edge) types
  • intransitive verbs for property (attribute) types
  • adjectives for node properties
  • adverbs for relationship properties

I'm not convinced about the last three, since our properties (data attributes for nodes and relationships) are often highly technical. However, I think we should adhere to the first three rules when possible.

Our node labels are already common nouns. Our node names are already proper nouns. However, we were using common nouns for relationship types. Thus, I switched to transitive verbs for relationship types (commit). The table below shows the noun (old) and verb (new) relationship type.

Source    Target              Metaedge (noun)                Metaedge (verb)
compound  gene                binding                        binds
compound  side effect         causation                      causes
compound  gene                downregulation                 downregulates
compound  disease             indication                     palliates
compound  compound            similarity                     resembles
compound  disease             indication                     treats
compound  gene                upregulation                   upregulates
disease   gene                association                    associates
disease   gene                downregulation                 downregulates
disease   anatomy             localization                   localizes
disease   symptom             presence                       presents
disease   disease             similarity                     resembles
disease   gene                upregulation                   upregulates
gene      anatomy             downregulation                 downregulates
gene      gene                evolution                      evolves
gene      anatomy             expression                     expresses
gene      gene                interaction                    interacts
gene      biological process  participation                  participates
gene      cellular component  participation                  participates
gene      molecular function  participation                  participates
gene      pathway             participation                  participates
gene      perturbation        regulation                     regulates
gene      anatomy             upregulation                   upregulates
gene      gene                knockdown downregulation       knockdown downregulates
gene      gene                knockdown upregulation         knockdown upregulates
gene      gene                overexpression downregulation  overexpression downregulates
gene      gene                overexpression upregulation    overexpression upregulates

In several cases, switching from noun to verb cut out several characters, a welcome occurrence. Switching relationship types to verbs also makes sense as part of our migration to Neo4j. The Neo4j convention is to use verbs for relationship types. In fact, a Neo4j company explains relationships by saying:

Where nodes can be thought of as nouns, relationships can be thought of as verbs.

The compound-gene associations are not intuitive to me. I assume that when, for example, a compound downregulates a gene, it is supposed to mean that the compound inhibits the protein product encoded by the gene. However, if read at face value, it would mean that the compound binds to something else that through some signaling results in down-regulation of the gene (i.e. less transcription).

The gene-gene association "evolves" is a bit of a misnomer, I think. Unless you are looking at ancestral genes, one gene will not have evolved from another gene. Rather, two genes will share ancestry. In that case, the term "homology" would be much clearer. Also, you probably want to be able to distinguish between orthologs and paralogs in your network.

Are the gene-anatomy relationships not backwards? I can understand what it means that the liver "upregulates" a gene (I assume it means that the gene is more highly expressed in the liver than elsewhere). But I cannot comprehend what it would mean that a gene upregulates the liver.

Same goes for gene-perturbation relationships. I can understand that a perturbation regulates a gene, but how can a gene regulate a perturbation? And why is this type of association not divided into up- and down-regulation like everything else?

I am not entirely sure how useful the "knockdown downregulates" etc. types are. Usually "knockdown downregulates" would be interpreted to mean "upregulates" etc.

I assume that when, for example, a compound downregulates a gene, it is supposed to mean that the compound inhibits the protein product encoded by the gene. However, if read at face value, it would mean that the compound binds to something else that through some signaling results in down-regulation of the gene (i.e. less transcription).

Your face value interpretation is correct. Compound–downregulates–Gene means the compound decreases the transcriptional expression of the gene. We extracted these relationships from LINCS L1000.

The gene-gene association "evolves" is a bit of a misnomer

I agree, "evolves" is not good. This edge signifies evolutionary rate covariation [1]. It's a mouthful, and I don't know the best way to shorten and verbify it. Perhaps "covaries" is an improvement?

Are the gene-anatomy relationships not backwards? … Same goes for gene-perturbation relationships.

Great point. We should present these edges in subject-verb-object order. I have switched the default orientation of the confusing metaedges (commit). In practice the object-verb-subject order may still arise, for example when representing paths.

I am not entirely sure how useful the "knockdown downregulates" etc. types are. Usually "knockdown downregulates" would be interpreted to mean "upregulates" etc.

I will look into collapsing:

  • knockdown downregulates with overexpression upregulates to create an upregulates edge
  • knockdown upregulates with overexpression downregulates to create a downregulates edge
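
The proposed collapse can be sketched as a simple relabeling followed by deduplication. This is only an illustration under assumptions: the names `COLLAPSE` and `collapse_edges` and the placeholder gene identifiers are hypothetical, and the sketch does not decide what to do when one gene pair ends up with both an upregulates and a downregulates edge.

```python
# Hypothetical mapping from perturbation-specific metaedges to the
# collapsed metaedges proposed in the list above.
COLLAPSE = {
    "knockdown downregulates": "upregulates",
    "overexpression upregulates": "upregulates",
    "knockdown upregulates": "downregulates",
    "overexpression downregulates": "downregulates",
}


def collapse_edges(edges):
    """Relabel (source, metaedge, target) triples and drop duplicates."""
    collapsed = set()
    for source, metaedge, target in edges:
        # Metaedges outside the mapping pass through unchanged.
        collapsed.add((source, COLLAPSE.get(metaedge, metaedge), target))
    return sorted(collapsed)


# Placeholder gene names, not real data.
edges = [
    ("GENE_A", "knockdown downregulates", "GENE_B"),
    ("GENE_A", "overexpression upregulates", "GENE_B"),
    ("GENE_A", "knockdown upregulates", "GENE_C"),
]
for edge in collapse_edges(edges):
    print(edge)
```

In this toy input, the two GENE_A to GENE_B edges merge into a single upregulates edge, which is the intended effect of the collapse.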

Indication terminology

We've been referring to a drug treating a disease as an "indication". While readers with a medical background understand the term, others find "indication" confusing.

Now we've split our indications into two categories: disease-modifying and symptomatic. Additionally, we've switched to using verbs to describe relationships.

Given these factors, I chose "treats" for disease-modifying indications and "palliates" for symptomatic indications. This terminology aligns with a recent repurposing study [1], which refers to

distinguishing non-causative and palliative from causative and effective treatments

While readers may not be familiar with the term palliates, it has an applicable and precise definition (making lookup easier):

Make (a disease or its symptoms) less severe or unpleasant without removing the cause

@pouyakhankhanian, do you think the treats/palliates terminology makes sense?

I certainly agree with keeping the terminology consistent with prior studies. I think the terms "indication" and "palliates" are well defined as you describe. My only concern is the use of the word "treat" to mean "disease-modifying" as opposed to symptom management, especially since it is very common to use the phrase "treat symptoms".

If there are other prior studies that use alternate terminology, it might be best to align with those. Otherwise, I would think the two goals are (1) maintain previous terminology and (2) make sure to define our terminology very clearly.

I'm not sure the phrase "drug X treats symptom Y" is that problematic, since the sentence's object is symptom Y rather than a disease. I agree that we should maintain existing terminology, but I'm not finding much guidance in the literature.

Potential alternatives to "treats" for representing disease-modifying indications are: modifies, medicates, indicates, remedies, ameliorates, betters, improves, corrects, affects, alleviates, repairs, and cures. @pouyakhankhanian, do you prefer any of these verbs to "treats"?

And regardless of which term we pick, we'll make sure to define each relationship type.

Hetionet v1.0 type nomenclature

We've settled on a final type nomenclature for Hetionet v1.0 (our hetnet for this project). See the following tables:

  • Metanodes where metanode is the primary name, abbreviation is the 1–2 letter abbreviation, and label is the Neo4j node label.
  • Metaedges where metaedge is the primary name, unicode_metaedge is a styled version of the primary name, standard_metaedge is the primary edge orientation, and inverted indicates the non-primary edge orientation. The remaining columns are abbreviation, standard_abbreviation, source, and target.
  • Neo4j relationship types where metaedge is the primary name, rel_type is the Neo4j relationship type, and direction notes whether edges are bidirectional (both) or directed (forward or backward).

Neo4j type nomenclature

We conform to the Neo4j style of CamelCase labels and ALL_CAPS relationship types. In addition, Neo4j relationship types are appended with metaedge standard abbreviations. This adds source/target-metanode awareness to relationship types and enables optimized queries.
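
To make the convention concrete, here is a minimal sketch of how a label and a relationship type could be derived from the primary names. The function names are hypothetical, and both the underscore separator and the example abbreviations ("CtD" for Compound treats Disease, "CcSE" for Compound causes Side Effect) are assumptions for illustration; the tables linked above are the authoritative source.

```python
def neo4j_label(metanode):
    """Convert a metanode name such as 'side effect' to a CamelCase label."""
    return "".join(word.capitalize() for word in metanode.split())


def neo4j_rel_type(metaedge, standard_abbreviation):
    """Build an ALL_CAPS relationship type with the abbreviation appended.

    The underscore separator is an assumption of this sketch.
    """
    verb = "_".join(metaedge.upper().split())
    return f"{verb}_{standard_abbreviation}"


print(neo4j_label("biological process"))  # BiologicalProcess
print(neo4j_rel_type("treats", "CtD"))    # TREATS_CtD
```

Because the abbreviation encodes the source and target metanodes, two metaedges that share a verb (for example, the several "upregulates" edges) map to distinct relationship types.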

 
Cite this as
Daniel Himmelstein, Lars Juhl Jensen, Pouya Khankhanian (2016) Data nomenclature: naming and abbreviating our network types. Thinklab. doi:10.15363/thinklab.d162
License

Creative Commons License
