Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort input extensions #2705

Merged
merged 13 commits into from
May 21, 2024

Conversation

chrishalcrow
Copy link
Collaborator

Automatically sorts a list of extensions, for computation, so that all parents are “on the left” of their children. Improvement suggested by @alejoe91 .

For example, the SortingAnalyzer extension waveforms depends on random_spikes. If you run

sorting_analyzer.compute([“waveforms”, “random_spikes”])

this fails as the compute function scans left to right. It’s a bit annoying since the user has tried to include random_spikes. It’s also easy to image a user running sorting_analyzer.compute([“waveforms”]). si tells the user they need to have random_spikes and they naturally add it on the right of the list. So: worth improving!

This change would automatically sort the inputted list so that all parents are on the left of their children. The function itself doesn’t check if the whole list is valid (i.e. it doesn’t return an error if random_spikes isn’t there when calculating waveforms), it only sorts it. These checks happen downstream.

The function I’ve written is a bit awkward, because the compute parameters are dictionaries (if you input a list, this gets converted to a dictionary internally, so kwargs can be easily handled.). And dictionaries aren’t really meant to have an order: so insertions at given indices are tricky etc. Here, I turn the dict into two lists, do the sorting, then zip. If anyone thinks of a cleaner algorithm, let me know.

Also added tests.

@chrishalcrow chrishalcrow added the enhancement New feature or request label Apr 11, 2024
@h-mayorquin
Copy link
Collaborator

this fails as the compute function scans left to right. It’s a bit annoying since the user has tried to include random_spikes. It’s also easy to image a user running sorting_analyzer.compute([“waveforms”]). si tells the user they need to have random_spikes and they naturally add it on the right of the list. So: worth improving!

Isn't it easier to just tell the user to add the needed extension before the extension that generates the error?. I think this is easier to mantain and document. I also like it makes the dependency graph explicit on the list. All of this only matters if the dependency graph of the extensions is meant to become more complicated. If not, then either solution should work just fine.

If we go the way of parsing the dependency tree and re-order it as we do here we should be careful to not alter the list that the user passes as they might use it for something else and get an error because we modified their state. In concrete, we should copy at the beginning.

@chrishalcrow
Copy link
Collaborator Author

Isn't it easier to just tell the user to add the needed extension before the extension that generates the error?. I think this is easier to mantain and document. I also like it makes the dependency graph explicit on the list. All of this only matters if the dependency graph of the extensions is meant to become more complicated. If not, then either solution should work just fine.

You could be right. The error reporting also helps the user be aware of the parent/child structure which might help overall understanding. If we use an error reporting method, it could:

  1. Just report "please put parents to the left of children"
  2. Report the first error it finds "please put random_spikes to the left of waveforms"
  3. Find and report all the errors as a list! "please put random_spikes to the left of waveforms. Please put waveform to the left of templates"
    I like 3: helpful and educational. @alejoe91 , opinions?

If we go the way of parsing the dependency tree and re-order it as we do here we should be careful to not alter the list that the user passes as they might use it for something else and get an error because we modified their state. In concrete, we should copy at the beginning.

Yes, good point thanks! So I shouldn't have edited extensions - I'll change on the next update.

@alejoe91
Copy link
Member

I think it's not a mistake to pass ["templates", "random_spikes"] as input, so I don't think that the techicality of what depends o what is so educational that should trigger an error.

@chrishalcrow
Copy link
Collaborator Author

I've updated it so that extensions is not edited/updated. Thanks @h-mayorquin

So the other issue comes down to whether ['waveforms', 'random_spikes'] is an error or not. This decision is beyond my paygrade ;)

@alejoe91
Copy link
Member

I've updated it so that extensions is not edited/updated. Thanks @h-mayorquin

So the other issue comes down to whether ['waveforms', 'random_spikes'] is an error or not. This decision is beyond my paygrade ;)

I believe this should be allowed and it's not an error from the user! Especially when passing a dict as input to the compute, re-sorting the dict according to dependencies is unnecessarily complex

Comment on lines 281 to 282
assert sorted_extensions_4 == {"templates": {}, "random_spikes": {}, "waveforms": {}, "quality_metrics": {}}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have "random_spikes" at the first place no ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello. Yes, the tests were all useless since dict equality doesn't care about order. So, at the testing stage, I've made them OrderedDicts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict is now ordered by default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but equality doesn't care about the ordering, for backward compatibility. The following

a = {1: 3, 2: 4}
b = {2: 4, 1: 3}
a == b

outputs True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes of course. We should do
list(a.key()) == list(b.keys())
no ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, both seem to work, but if that's standard practice then let's go for keys. I've updated the tests, including getting rid of the "magic" sorted dictionaries.

@samuelgarcia
Copy link
Member

Hi Chris.
I am reading this very very late!!
I am sorry.
I did a few comments.

@samuelgarcia
Copy link
Member

merci!

@samuelgarcia samuelgarcia merged commit 26c145c into SpikeInterface:main May 21, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants