## Assessing the effectiveness of our hetnet permutations

We've previously discussed what hetnet permutation is and why we do it. To permute a hetnet, we go through each relationship type (metaedge) and repeatedly swap the target nodes of two random relationships (edges). This strategy is called XSwap [1].
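To make the swap concrete, here is a minimal Python sketch of a single XSwap attempt on one metaedge. It is not our actual implementation. The function name `xswap_once` is hypothetical, edges are assumed to be stored as (source, target) tuples in both a list and a set, and the metaedge is assumed to connect two different node types, so self-loops cannot arise.

```python
import random


def xswap_once(edges, edge_set, rng):
    """Attempt one XSwap in place; return True if the swap was performed.

    edges: list of (source, target) tuples for a single metaedge
    edge_set: set containing the same tuples, for fast membership tests
    rng: a random.Random instance
    """
    i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
    if i == j:
        return False  # the same edge was selected twice
    (a, b), (c, d) = edges[i], edges[j]
    new_edges = [(a, d), (c, b)]  # exchange the two target nodes
    if any(edge in edge_set for edge in new_edges):
        return False  # a proposed edge already exists
    edge_set.difference_update([edges[i], edges[j]])
    edge_set.update(new_edges)
    edges[i], edges[j] = new_edges
    return True
```

Because a swap only exchanges targets, every node keeps its degree; repeating the attempt many times randomizes which nodes are connected.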

We looked into performing the permutation in Neo4j using Cypher but decided to stick with our Python implementation, since Cypher's cost planner currently lacks the needed capabilities.

## Implementation specifics

We closely followed the parameters from our previous study [2] and did the following:

• We created 5 permuted hetnets. The first permuted hetnet was created from the unpermuted hetnet, the second from the first permuted hetnet, and so on until the fifth permuted hetnet was created from the fourth. This iterative strategy forms a Markov chain.
• To create each permuted hetnet, we permuted each metaedge separately. For a given metaedge, we attempted n XSwaps, where n equals four times the number of edges (multiplier = 4).
• XSwaps can be unsuccessful for several reasons. The same edge could be randomly selected twice (same_edge). One or both of the potential new edges may already exist (duplicate, or undirected_duplicate in cases where a bidirectional edge connects two nodes of the same type). One or both of the potential new edges may connect a node to itself (self_loop). In these instances, no swap is performed. In the future, we may switch to stopping a permutation after a certain number of successful swaps rather than a certain number of attempts.
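The procedure in the list above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not our production code: `permute_edges` and `markov_chain` are hypothetical names, edges are (source, target) tuples, and undirected metaedges between nodes of the same type are flagged with `undirected=True`, so that a reversed pair counts as an undirected_duplicate. Each link in the chain uses its own seed and is generated from the previous permuted edge list.

```python
import random
from collections import Counter


def permute_edges(edges, multiplier=4, undirected=False, seed=0):
    """Permute one metaedge's (source, target) pairs by XSwap.

    Attempts int(multiplier * len(edges)) swaps and tallies the outcome of
    each attempt. Returns the permuted edge list and a Counter of outcomes.
    """
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    outcomes = Counter()
    for _ in range(int(multiplier * len(edges))):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        if i == j:
            outcomes["same_edge"] += 1
            continue
        (a, b), (c, d) = edges[i], edges[j]
        proposed = [(a, d), (c, b)]  # swap the two target nodes
        reason = None
        for s, t in proposed:
            if s == t:
                reason = "self_loop"
            elif (s, t) in edge_set:
                reason = "duplicate"
            elif undirected and (t, s) in edge_set:
                reason = "undirected_duplicate"
            if reason:
                break
        if reason:
            outcomes[reason] += 1  # no swap is performed
            continue
        edge_set.difference_update([edges[i], edges[j]])
        edge_set.update(proposed)
        edges[i], edges[j] = proposed
        outcomes["success"] += 1
    return edges, outcomes


def markov_chain(edges, n_permutations=5, multiplier=4, undirected=False):
    """Build a chain of permutations, each generated from the previous one."""
    chain, current = [], edges
    for k in range(n_permutations):
        current, outcomes = permute_edges(current, multiplier, undirected, seed=k)
        chain.append((current, outcomes))
    return chain
```

Dividing each outcome count by the total number of attempts gives the breakdown of why swaps failed for that metaedge.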

## Assessing permutation effectiveness

For each permutation and each metaedge, we measured the progress of the randomization at 10 points during swapping (dataset). The measure we're primarily interested in is the percentage of edges that are unchanged after a permutation (unchanged).
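A sketch of how the unchanged percentage could be tracked while a metaedge is being permuted. The hypothetical `xswap_with_checkpoints` assumes the 10 measurement points are evenly spaced across the attempted swaps and that the metaedge connects two different node types; `percent_unchanged` reports the share of original edges that are still present.

```python
import random


def percent_unchanged(original, current):
    """Percentage of the original edges still present in the current edges."""
    original = set(original)
    return 100 * len(original & set(current)) / len(original)


def xswap_with_checkpoints(edges, multiplier=4, n_checkpoints=10, seed=0):
    """Run XSwap and record (attempt, percent unchanged) at each checkpoint."""
    rng = random.Random(seed)
    original = list(edges)
    edges = list(edges)
    edge_set = set(edges)
    n_attempts = int(multiplier * len(edges))
    # Attempt numbers at which to measure progress (assumed evenly spaced).
    checkpoints = {round(n_attempts * k / n_checkpoints)
                   for k in range(1, n_checkpoints + 1)}
    progress = []
    for attempt in range(1, n_attempts + 1):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        if i != j:
            (a, b), (c, d) = edges[i], edges[j]
            new = [(a, d), (c, b)]
            if not any(e in edge_set for e in new):  # skip duplicates
                edge_set.difference_update([edges[i], edges[j]])
                edge_set.update(new)
                edges[i], edges[j] = new
        if attempt in checkpoints:
            progress.append((attempt, percent_unchanged(original, edges)))
    return edges, progress
```

A curve that flattens well before the last checkpoint suggests that fewer attempts would randomize the metaedge just as thoroughly.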

We find that the percentage of unchanged edges varies by metaedge (notebook cell 4). It appears that we could safely reduce our multiplier from 4 to 2.5 and still generate permuted hetnets that are maximally diversified from their predecessors.

Of concern are metaedges where a high percentage of the edges do not change. This occurred when a high percentage of attempted swaps would have created edges that already exist (notebook cell 6). Particularly troublesome was the Anatomy–expresses–Gene metaedge, where almost all attempts yielded duplicate edges and only ~10% of edges changed during a permutation. I'm now inclined to revisit our previous observation that we're being too permissive about which expression edges we include.

Metaedges whose edges do not change under permutation have limited informativeness: such edges hold little information beyond their contribution to the degree of the nodes they connect. For our expression edges, the problem is visible in the node degree distribution: most anatomies express 0 genes, while a minority of anatomies express an extremely high number of genes (see the Anatomy–expresses–Gene panel on page 10).
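A toy example illustrates why saturated hubs defeat permutation. Assume two hypothetical anatomy nodes that each express every one of 40 genes (anatomies without edges never take part in a swap, so they are omitted). Any proposed swap then recreates an existing edge, so the permuted network is identical to the original and the edges are fully determined by degree. A sparse one-to-one metaedge is included for contrast. The helper `count_successful_swaps` is a hypothetical sketch, not our implementation.

```python
import random


def count_successful_swaps(edges, multiplier=4, seed=0):
    """Run XSwap on (source, target) pairs; return how many swaps succeeded."""
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    successes = 0
    for _ in range(int(multiplier * len(edges))):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        if i == j:
            continue  # same edge selected twice
        (a, b), (c, d) = edges[i], edges[j]
        new = [(a, d), (c, b)]
        if any(e in edge_set for e in new):
            continue  # duplicate: the swap would recreate an existing edge
        edge_set.difference_update([edges[i], edges[j]])
        edge_set.update(new)
        edges[i], edges[j] = new
        successes += 1
    return successes


genes = [f"gene{k}" for k in range(40)]
# Two hub anatomies that each express every gene.
hub_edges = [(f"anatomy{a}", g) for a in range(2) for g in genes]
# Contrast: a sparse metaedge in which every node has degree 1.
sparse_edges = [(f"anatomy{k}", g) for k, g in enumerate(genes)]

print(count_successful_swaps(hub_edges))  # 0: every attempt is a duplicate
# Every attempt that picks two different edges succeeds here.
print(count_successful_swaps(sparse_edges))
```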

Daniel Himmelstein Researcher

# Improved randomization of expression edges

We updated our method for extracting Anatomy–expresses–Gene relationships from Bgee. This update reduced the number of expression edges in our hetnet from 1,006,278 to 526,407. The number of genes with an expression edge went from 18,147 to 18,094. The number of anatomies with an expression edge went from 256 to 241.

Permutation of expression edges became more effective. Now ~25% of expression edges (as opposed to ~10% previously) change in a permuted hetnet. This improvement came despite fewer attempted swaps per permutation: I decreased the multiplier from 4 to 3 to reduce runtime.
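Since the number of attempted swaps equals the multiplier times the number of edges, the attempts per permutation of Anatomy–expresses–Gene before and after the update can be worked out from the edge counts above:

```python
# Attempted XSwaps per permutation = multiplier × number of edges
edges_before, multiplier_before = 1_006_278, 4
edges_after, multiplier_after = 526_407, 3

attempts_before = multiplier_before * edges_before  # 4,025,112
attempts_after = multiplier_after * edges_after     # 1,579,221

print(f"{attempts_before:,} -> {attempts_after:,} attempts "
      f"({attempts_after / attempts_before:.1%} of the previous number)")
# prints "4,025,112 -> 1,579,221 attempts (39.2% of the previous number)"
```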

Cite this as
Daniel Himmelstein (2016) Assessing the effectiveness of our hetnet permutations. Thinklab. doi:10.15363/thinklab.d178