Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Does SSG handle ties appropriately? #977

Closed
ebridge2 opened this issue Jul 21, 2022 · 0 comments
Closed

[BUG] Does SSG handle ties appropriately? #977

ebridge2 opened this issue Jul 21, 2022 · 0 comments
Labels
bug Something isn't working not planned by core Not planned to be worked on by core team, welcome to propose a plan of action

Comments

@ebridge2
Copy link
Collaborator

ebridge2 commented Jul 21, 2022

Expected Behavior

when computing a subgraph, ties are randomly broken.

Actual Behavior

The code is fairly opaque to me relative the "traditional" algorithms for solving signal subgraph, I'm sure they are great/way faster than how I would have implemented, but I can't totally follow exactly what's going on so I can't pinpoint specific lines. However, when ties are experienced in rankings, they should be randomly broken, not broken according to the metric that numpy uses (usually, lowest according to the datatype being ranked, to my knowledge). As far as I can tell, every sory/argsort/minimum/maximum called in the code uses the bare-bones numpy defaults for breaking ties, which is incorrect.

Suggested (potential) fix: write an implementation that uses scipy stats rankdata instead, use method to break ties as the average, and then use the number of tied entries for a particular duplicated value to obtain the "value range" that the tied entries would encompass. Then, just use numpy random choice to select from that value range without replacement and at random to reassign "randomly broken" ties. Alternatively, for every sort called, instead start by randomly reassigning positions, then do the sort/minimum/maximum/etc. (which will break ties in order of appearance, which is now random), and then find the mapping from initial appearance to randomly resorted appearance and then sort the initial array using the double mapping.

Any other information?

The procedures employed are not statistically principled.

@ebridge2 ebridge2 added the bug Something isn't working label Jul 21, 2022
@bdpedigo bdpedigo added the not planned by core Not planned to be worked on by core team, welcome to propose a plan of action label Feb 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working not planned by core Not planned to be worked on by core team, welcome to propose a plan of action
Projects
None yet
Development

No branches or pull requests

2 participants