Top_k_motifs : Returns (partially) wrong neighbors #12

tcrasset · 2020-02-20T09:28:44Z

Description

I didn't do anything per se, I rather implemented this algorithm in Java and I have some useful tweaks.

What I Observed

The implemented algorithm uses the numpy.argpartition() method.
As the documentation of this method states :

The k-th element will be in its final sorted position and all smaller elements will be moved before it
and all larger elements behind it. The order all elements in the partitions is undefined.

Concretely, we sort the distance_profile and we get the indices using
indices = np.argpartition(tmp, k)

However in the for loop, we go through every index, so it could be that the first k indices are trivial matches or get excluded in the subsequent iterations and the "motifs" that get picked as nearest neighbors are motifs that are in indices[k:] and thus the order of them is undefined i.e. unsorted. We thus end up with neighbors that are not really neighbors.
One fix for this is to just loop over indices[0:k] and notify the user that less that k neighbors were found if some are excluded.

Moreover, I think one should apply the exclusion zone after one adds an index to the found indices, so putting the code inside of the if condition.
Let's assume we have

exclusion_zone = 2
k = 5
indices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 ,15]
distance_profile = [Inf, Inf, Inf, x, x, x, ...]

The for loop will go through the indices array, see that index 1 is excluded, and then exclude in again. All good until now.
However, once the loop gets to index 2, as the exclusion zone gets applied on either side of the index, index 4 gets excluded.
Arriving at index 3, it is excluded, and thus is excludes it again and index 5 is then put into the exclusion zone.
This goes on until all indices get consumed and the found indices remains empty, whereas it should have returned at least the index 4 as a neighbor.
I know this is an artificial example, I didn't test it using your library. I didn't read the calling code, but this could be a non-issue if the query is already excluded from the distance profile (I think, but I am not sure). However, the easiest way would be to put the exclusion zone inside of the if statement.

The text was updated successfully, but these errors were encountered:

RexKing6 · 2020-09-23T11:56:56Z

I think top k here is not precise because of np.argpartition(). It may be replaced with np.sort() or other functions to get the precise neighbors. My idea is that we should check if top-2 in exclusion zone after appending top-1.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Top_k_motifs : Returns (partially) wrong neighbors #12

Top_k_motifs : Returns (partially) wrong neighbors #12

tcrasset commented Feb 20, 2020

RexKing6 commented Sep 23, 2020

Top_k_motifs : Returns (partially) wrong neighbors #12

Top_k_motifs : Returns (partially) wrong neighbors #12

Comments

tcrasset commented Feb 20, 2020

Description

What I Observed

RexKing6 commented Sep 23, 2020