
Fix analyzer sampling frequency check #2606

Merged
merged 8 commits into SpikeInterface:main on May 21, 2024

Conversation

@DradeAW (Contributor) commented Mar 20, 2024

Because of floating-point rounding errors, the sampling frequency can be slightly different between the recording and the sorting,

e.g. 30000.305042016807 and 30000.31.

This fixes that issue (using the same check that was used in the WaveformExtractor).

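
The mismatch described above can be reproduced with a short sketch (a minimal illustration of `math.isclose` with the tolerances adopted in this PR, not the actual SpikeInterface code):

```python
import math

rec_fs = 30000.305042016807  # recording frequency from the PR description
sort_fs = 30000.31           # the same frequency after rounding by a sorter

# Strict equality rejects the pair even though the data match:
print(rec_fs == sort_fs)  # → False

# The tolerance-based check adopted in this PR accepts it:
print(math.isclose(rec_fs, sort_fs, abs_tol=1e-2, rel_tol=1e-5))  # → True
```
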
@samuelgarcia (Member) commented

Thanks for this.
There is a deeper problem here: the sorter can round the sampling rate, so in the analyzer we should take the recording's frequency, which is normally the most accurate, unless precision was lost in the preprocessing.

@alejoe91 added the "core" (Changes to core module) label on Mar 20, 2024
@h-mayorquin (Collaborator) left a comment

I am adding some comments about long-term synchronization because we have encountered similar problems in neuroconv and I am interested in understanding this better.

That said, I understand that:

  • This code was used in WaveformExtractor, so some sort of compatibility is important here.
  • Maybe it is good to let the user perform some analysis even if the hard criterion of long-term synchronization does not hold (it will never hold indefinitely!), especially if the recording and sorting are not very long.

@@ -210,7 +211,7 @@ def create(
         sparsity=None,
     ):
         # some checks
-        assert sorting.sampling_frequency == recording.sampling_frequency
+        assert math.isclose(sorting.sampling_frequency, recording.sampling_frequency, abs_tol=1e-2, rel_tol=1e-5)
Collaborator

You probably have thought more deeply about this @DradeAW, but what do you think of the following:

An abs_tol of 1e-2 means that the sampling frequencies can differ by 0.01 Hz.

In other words, there is one excess cycle in the faster clock every 100 seconds.

Now, for a 30_000 Hz process, each cycle lasts 1/30 of a millisecond (milliseconds_per_cycle = 1000 / 30_000 = 1/30). So the two clocks drift apart by 1/30 of a millisecond every 100 seconds. In other words, 1 millisecond every 3000 seconds, or 1 millisecond of difference every 50 minutes.

That is, an absolute tolerance of 0.01 Hz means roughly 1 millisecond of drift every 50 minutes.

Over a day-long recording they will differ by almost 30 milliseconds, which starts to look bad for things like template estimation, right?

Is there something wrong in my reasoning? I am trying to understand the justification for the degree of leniency in this absolute tolerance.

Should the criterion be that we don't want them to differ by more than a few milliseconds over a day-long recording? How do you think about it?
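
The arithmetic above can be checked with a tiny helper (a hypothetical function written for this discussion, not part of SpikeInterface):

```python
def drift_ms(freq_diff_hz: float, nominal_fs_hz: float, duration_s: float) -> float:
    """Time drift in milliseconds accumulated between two clocks whose rates
    differ by `freq_diff_hz`, for a process sampled at `nominal_fs_hz`."""
    excess_samples = freq_diff_hz * duration_s       # extra cycles of the faster clock
    return excess_samples / nominal_fs_hz * 1000.0   # one cycle lasts 1/fs seconds

print(drift_ms(0.01, 30_000, 3_000))   # ≈ 1 ms after 50 minutes
print(drift_ms(0.01, 30_000, 86_400))  # ≈ 28.8 ms over a day-long recording
```
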

Contributor Author

A small difference in sampling frequency is usually the result of floating-point rounding, or of a sorter that outputs a rounded sampling frequency, not an ACTUAL difference between the recording and sorting objects.

But as @samuelgarcia pointed out, it begs the question of what analyzer.sampling_frequency should return (we should return the most precise sampling frequency, the one that was not rounded).

Collaborator

My point is that with the current tolerance, some analyses will output incorrect results if the recording and the sorting are long enough (some hours). In other words, you need some degree of long-term synchronization between the recording and the sorting objects, and I am trying to think more systematically about it.

Collaborator

But to Heberto's point: if we have a 4-hour recording (with a 0.01 Hz difference due to rounding error), then we are off by 144 samples by the end of the recording. So the sorting and recording sample indices could be off by more than a waveform, and that would lead some spikes to look bad, no? I don't know the solution either, but it does make me wish the sampling_frequency could be handled as an int instead...
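
For concreteness, the 144-sample figure follows directly from the frequency mismatch times the duration (plain arithmetic, not SpikeInterface code):

```python
freq_diff_hz = 0.01    # the mismatch allowed by the new abs_tol
duration_s = 4 * 3600  # a 4-hour recording

# Extra samples accumulated by the faster clock over the whole recording:
excess_samples = freq_diff_hz * duration_s
print(excess_samples)  # ≈ 144 extra samples by the end
```
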

Contributor Author

I don't understand the problem.

Both the recording and the sorting use sample indices (not seconds), so there is no conversion problem and they match perfectly (unless you specify spike times in seconds, which nobody does).
I could manually set my recording's sampling frequency to 30,000 and my sorting's to 20,000, and the waveforms would still be perfect.
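
The argument above is that spikes are stored as frame (sample) indices, so slicing out waveforms never consults the sampling frequency. A toy illustration (all names made up for this sketch, not SpikeInterface's API):

```python
import numpy as np

traces = np.arange(1000, dtype=float)  # fake single-channel recording
spike_frame = 500                      # spike stored as a sample index
nbefore, nafter = 30, 60               # waveform window, in samples

# The waveform slice depends only on indices, never on sampling_frequency:
waveform = traces[spike_frame - nbefore : spike_frame + nafter]
print(waveform.shape)  # → (90,)
```
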

Collaborator

Actually, you're right. The indices should be fine.

Collaborator

If I understand well, you are saying that as long as the underlying data is fine, it does not matter what the sampling_frequency is, correct? This makes sense to me.

On top of that, you are thinking of the case where the recording and the sorting data actually match, but the sorter's frequency is different for some reason (like rounding). That makes sense to me as well.

Honest question:
Why do we have the check, then?
My understanding is that we are using the sampling_frequency as a proxy for synchronization of the underlying data. From that perspective, the criterion might be too lenient for the reasons described above. I do understand that the strictest criterion throws false positives for the cases that you described.

Maybe I am wrong about the purpose of the check. What do you think?

@samuelgarcia (Member) commented

I think that we should make a strict check, and if the frequencies differ we warn the user and force the sorting to have the exact same frequency as the recording. What do you think?
The most important thing is to warn the user.

@DradeAW (Contributor) commented Mar 21, 2024

I don't like warnings as a general rule, because in the case where it's not your fault and you're fine with it, it's annoying to see them pop up all the time ^^'
But I'm fine with it if you think it's best.

I'm completely fine with changing the sorting's sampling frequency, since it's a copy (shared memory) and doesn't affect the original sorting object.

@h-mayorquin (Collaborator) commented Mar 21, 2024

I think I agree with Sam here that the warning is the best option.
I also think that warnings should be actionable by the user who receives them (i.e. there should be something they can do about it).
For the case that @DradeAW describes, the action for the user (if they want to suppress the warning) is just to modify the sampling frequency of the sorting themselves, right? So that sounds like an easy thing they can do to remove the warning.

Unless the probability of differing sampling frequencies causing an error is really, really low, Sam's proposal seems like the safe way to go to me.

@alejoe91 (Member) commented

After discussing with @samuelgarcia, we agree that we can be more "relaxed" about this, but we need to add a warning, use the recording's sampling frequency (as it is more reliable), and change the sorting's in place (since the analyzer creates its own copy of the sorting object).

@zm711 (Collaborator) left a comment

A couple of quick thoughts on the warning/error messaging. I think these are pretty optional comments, though.

        else:
            raise ValueError(
                f"Sorting and Recording sampling frequencies are too different: "
                f"recording: {recording.sampling_frequency} - sorting: {sorting.sampling_frequency}"
Collaborator

To @h-mayorquin's usual point, shouldn't our error help the user fix the problem? They are too different, so the third line of this could be something like:

Ensure that you are associating the correct Recording and Sorting when creating a SortingAnalyzer

Totally optional, though.

        if sorting.sampling_frequency != recording.sampling_frequency:
            if math.isclose(sorting.sampling_frequency, recording.sampling_frequency, abs_tol=1e-2, rel_tol=1e-5):
                warnings.warn(
                    "Sorting and Recording have different sampling frequency. " "Using the one from the Recording"
Collaborator

The assumption here is that the difference is due to rounding of floats, so do we want to be clear that a small difference might be okay? If I got this specific warning, I would think I had done something wrong. Suggested wording:

warnings.warn(
"Sorting and Recording have a small difference in sampling frequency. This could be due to rounding of floats. Using the sampling frequency from the Recording."

@samuelgarcia merged commit 8317eb5 into SpikeInterface:main on May 21, 2024
11 checks passed