[BUG] Does SSG handle ties appropriately? #977
Labels
bug
Something isn't working
not planned by core
Not planned to be worked on by core team, welcome to propose a plan of action
Expected Behavior
when computing a subgraph, ties are randomly broken.
Actual Behavior
The code is fairly opaque to me relative the "traditional" algorithms for solving signal subgraph, I'm sure they are great/way faster than how I would have implemented, but I can't totally follow exactly what's going on so I can't pinpoint specific lines. However, when ties are experienced in rankings, they should be randomly broken, not broken according to the metric that numpy uses (usually, lowest according to the datatype being ranked, to my knowledge). As far as I can tell, every sory/argsort/minimum/maximum called in the code uses the bare-bones numpy defaults for breaking ties, which is incorrect.
Suggested (potential) fix: write an implementation that uses scipy stats rankdata instead, use method to break ties as the average, and then use the number of tied entries for a particular duplicated value to obtain the "value range" that the tied entries would encompass. Then, just use numpy random choice to select from that value range without replacement and at random to reassign "randomly broken" ties. Alternatively, for every sort called, instead start by randomly reassigning positions, then do the sort/minimum/maximum/etc. (which will break ties in order of appearance, which is now random), and then find the mapping from initial appearance to randomly resorted appearance and then sort the initial array using the double mapping.
Any other information?
The procedures employed are not statistically principled.
The text was updated successfully, but these errors were encountered: