Making PhySortingExtractor more efficient (#2197)
base: main
Conversation
Right now the sorting loads all properties. I think by default we should not do that, and only load what's strictly necessary.
What about loading all by default, but giving the option to opt out? I think that the average user would want to read all properties, and you are not the average user ;)
I know ^^'
```diff
@@ -178,6 +179,13 @@ def __init__(

         self.add_sorting_segment(PhySortingSegment(spike_times_clean, spike_clusters_clean))

+        # Caching spike vector for faster computation.
```
You can use the `self.to_spike_vector` method with `cache=True` to avoid duplication.
If I call `to_spike_vector`, it's not going to compute it efficiently. Here I'm using the data that's already loaded in the files!
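The idea of reusing the already-loaded arrays can be sketched like this (a minimal illustration, not the actual PR code; the stand-in array contents and the structured-dtype field names follow SpikeInterface's spike-vector layout but should be treated as assumptions):

```python
import numpy as np

# Stand-ins for the arrays Phy already loads from spike_times.npy / spike_clusters.npy
spike_times_clean = np.array([10, 5, 42, 7], dtype=np.int64)    # sample indices
spike_clusters_clean = np.array([0, 1, 0, 1], dtype=np.int64)   # cluster id per spike

# Build a structured "spike vector" in one pass, sorted by spike time,
# instead of recomputing it later from the sorting segments.
order = np.argsort(spike_times_clean, kind="stable")
spike_vector = np.zeros(
    spike_times_clean.size,
    dtype=[("sample_index", "int64"), ("unit_index", "int64"), ("segment_index", "int64")],
)
spike_vector["sample_index"] = spike_times_clean[order]
spike_vector["unit_index"] = spike_clusters_clean[order]
spike_vector["segment_index"] = 0  # Phy output is single-segment

print(spike_vector["sample_index"])  # [ 5  7 10 42]
```

Since the two arrays are already in memory at `__init__`, the only extra work is one argsort, which is the trade-off the rest of this thread debates.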
Makes sense. Can you add a comment along those lines so nobody stumbles like I did in the future? Something like:

```python
# Using properties available at __init__ to build a cached spike_vector and avoid computation.
```
This PR, though, makes the Phy extractor less efficient at initialization by doing a computation. If you pickle the extractor, for example in a long chain like the ones you like to use, this will make loading slower. What are you improving? Have you profiled? Remember the rules of optimization:
I'm planning on removing some computation done beforehand, hence the fact that it's still a draft :)
Makes sense, the draft escaped me. Don't forget to profile. |
I benchmarked the fastest way to load a tsv file by trying the following:

```python
# Benchmark using timeit
pd.read_csv("cluster_group.tsv", delimiter='\t')                   # ~7.08 s for 10,000 iterations
dict(csv.reader(open('cluster_group.tsv', 'r'), delimiter='\t'))   # ~0.27 s for 10,000 iterations
list(csv.reader(open('cluster_group.tsv', 'r'), delimiter='\t'))   # ~0.25 s for 10,000 iterations
```

Plus the fact that importing …
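A self-contained version of the `csv`-based variant above (the file contents are made up for illustration; a real `cluster_group.tsv` is written by Phy and starts with a header row, which the one-liner `dict(...)` versions above keep as a spurious entry unless it is skipped):

```python
import csv
import io

# Fake cluster_group.tsv contents, standing in for a file written by Phy
tsv_text = "cluster_id\tgroup\n0\tgood\n1\tmua\n2\tnoise\n"

reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
header = next(reader)  # skip the header row explicitly
cluster_groups = {int(cluster_id): group for cluster_id, group in reader}

print(cluster_groups)  # {0: 'good', 1: 'mua', 2: 'noise'}
```

Beyond raw parse speed, this also avoids the one-time cost of importing pandas when it is not otherwise needed.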
There is a question on how to retrieve the … What do you think is best?
I use the … Is this a problem? If a unit is empty, is there a reason I would want to keep it around? Isn't this a weird KS artifact that happens sometimes?
You mean it masks those units, right? It's not that they are actually deleted, it's just that they are not analyzed in that session because they are not loaded. I would be curious about the exact time you encountered this. Since I use this extractor sometimes, I might want to go back and double-check myself.
I myself don't want to keep an empty unit around, but I want to make sure that everybody is on the same page!
They're not deleted from the phy folder, but they are not in the …
I made a tsv file but with the …
Cool, thanks for the response. I would also agree to not load the empty units (even though that was the old behavior), but if we want to stay consistent I could understand maintaining the old behavior.
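One way to identify the "empty" units being discussed is to compare the cluster ids listed in `cluster_group.tsv` against the ids that actually appear in `spike_clusters` (the array contents below are illustrative, not taken from a real sorting):

```python
import numpy as np

tsv_cluster_ids = np.array([0, 1, 2, 3])        # ids listed in cluster_group.tsv
spike_clusters = np.array([0, 1, 1, 3, 0, 3])   # one cluster id per spike

# Units listed in the tsv that never fire a spike are "empty"
empty_units = np.setdiff1d(tsv_cluster_ids, np.unique(spike_clusters))
print(empty_units)  # [2]
```

Whether to drop or keep such units is then a one-line policy decision on top of this set difference, which is what makes the backward-compatibility question above easy to expose as an opt-in flag.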