
PCmetrics values are not constant for multiple runs #75

Open
JRicardo24 opened this issue Sep 7, 2022 · 3 comments

Comments

@JRicardo24

Hello guys, is it normal that when we run the metrics module on the exact same dataset, the values for isolation_distance, l_ratio, d_prime, and the two nearest_neighbors metrics change between runs?

For some clusters the values are indeed pretty similar, but for others, like a cluster I have with 35k spikes, the isolation_distance varied from 361 to 556...

The biggest changes come from clusters with more spikes. Any thoughts about that? Is it normal?

Thank you

@jsiegle
Collaborator

jsiegle commented Sep 7, 2022

All of the PC metrics involve random subsampling of spikes to speed up the calculation.

The np.random module is initialized with the same seed value on each run, which should ensure that the results are the same each time. But it's possible the seeding is not working as expected.
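One common way the seeding fails is relying on global `np.random` state that was seeded once at import time: if any other code draws from that state in between, the subsamples drift. Seeding a fresh generator on every call avoids that failure mode. A minimal sketch (`subsample_spikes` is a hypothetical helper, not the module's actual code):

```python
import numpy as np

def subsample_spikes(spike_ids, max_spikes, seed=0):
    # Hypothetical helper: draw a fixed-size subsample reproducibly by
    # seeding a fresh Generator on every call, instead of depending on
    # global np.random state (which drifts if other code draws from it).
    rng = np.random.default_rng(seed)
    if len(spike_ids) <= max_spikes:
        return spike_ids
    return rng.choice(spike_ids, size=max_spikes, replace=False)

ids = np.arange(35000)
a = subsample_spikes(ids, 500)
b = subsample_spikes(ids, 500)
assert np.array_equal(a, b)  # identical subsample on every call
```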

isolation_distance in particular can be quite sensitive to the subsampled spikes, which is why we don't use it for any of our unit-level quality control. In fact, the only PC metric we've found to be generally useful is nearest_neighbors_hit_rate. Have you found that one to vary significantly between runs?
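To see why, here is a simplified sketch of the standard isolation distance computation (squared Mahalanobis distance of the n-th closest other-cluster spike, with n the cluster's spike count), recomputed on different random subsamples of synthetic PCs. The data, subsample sizes, and helper are all made up for illustration; the point is just that the value moves noticeably between subsamples:

```python
import numpy as np

def isolation_distance(pcs_this, pcs_other):
    # Squared Mahalanobis distance (w.r.t. this cluster's mean/covariance)
    # of the n-th closest other-cluster spike, n = this cluster's size.
    n = len(pcs_this)
    if len(pcs_other) < n:
        return np.nan
    mean = pcs_this.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(pcs_this, rowvar=False))
    diff = pcs_other - mean
    md2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    return np.sort(md2)[n - 1]

# Synthetic PCs for two well-separated clusters of 35k spikes each.
rng = np.random.default_rng(1)
this_pcs = rng.normal(0.0, 1.0, size=(35000, 3))
other_pcs = rng.normal(4.0, 1.0, size=(35000, 3))

# Recompute on different 500-spike subsamples: the estimate varies.
vals = []
for seed in range(5):
    r = np.random.default_rng(seed)
    sub_this = this_pcs[r.choice(35000, size=500, replace=False)]
    sub_other = other_pcs[r.choice(35000, size=1000, replace=False)]
    vals.append(isolation_distance(sub_this, sub_other))
print(f"range across subsamples: {min(vals):.1f} to {max(vals):.1f}")
```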

@JRicardo24
Author

I understand. Yes, you're right: of all the PC metrics, nearest_neighbors_hit_rate is the one with the most consistent values between runs.
The default number of spikes to subsample for computing PC metrics (max_spikes_for_unit) is 500. Maybe that explains the larger variations in isolation_distance and other metrics on units with significantly more spikes? If so, what would be recommended for a dataset with units ranging from a few dozen spikes all the way up to 35k? @jsiegle

@jsiegle
Collaborator

jsiegle commented Oct 20, 2022

You can try increasing max_spikes_per_unit to 2000 or higher. That will increase the computation time, but should make the values more stable.
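The intuition is the usual one for subsampled estimates: run-to-run spread shrinks roughly as 1/sqrt of the subsample size. A quick illustration (using the subsample mean of a synthetic 35k-spike feature as a stand-in for a PC metric, not the actual metric code):

```python
import numpy as np

rng = np.random.default_rng(0)
feature = rng.normal(0.0, 1.0, size=35000)  # synthetic per-spike feature

def spread(max_spikes, n_repeats=200):
    # Standard deviation of the subsampled estimate across repeated draws.
    vals = [np.mean(rng.choice(feature, max_spikes, replace=False))
            for _ in range(n_repeats)]
    return np.std(vals)

print(spread(500), spread(2000))  # the 2000-spike estimate is tighter
```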
