-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix analyzer sampling frequency check #2606
Conversation
Because of floating errors, the smapling frequency can be slightly different between recording and sorting e.g. 30000.305042016807 and 30000.31 This fixes that issue (same code that was used in the waveform extractor)
thanks for this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding some comments about long-term synchornization because we have encountered some similar problems in neuroconv and I am interested on understanding this better.
That said, I understand that:
- This code was using in WaveformExtractor so some sort of compatbility is important here.
- Maybe it is good to let the user perform some analysis even if the hard criteria of long-term synch is not hold (it will never hold undefinitly!). Specially if the recording and sorting are kind of not very long.
@@ -210,7 +211,7 @@ def create( | |||
sparsity=None, | |||
): | |||
# some checks | |||
assert sorting.sampling_frequency == recording.sampling_frequency | |||
assert math.isclose(sorting.sampling_frequency, recording.sampling_frequency, abs_tol=1e-2, rel_tol=1e-5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably have thought deeper about this @DradeAW but what do you think of the following:
abs_tolerance of 1e-2 means that the sampling frequencies can differ by 0.01 Hz.
In other words, there is one excess cycle in the faster clock every 100 seconds.
Now, for a 30_000 Hz process this means (approximating by base) that we have a difference of 1/3 of a millisecond every 100 seconds (milliseconds_per_cycle = 1000 / 30_000 = 1/3). In other words, 1 milliseconds every 300 seconds or 1 millisecond of difference every 5 minutes.
That is, a relative tolerance of 0.01 Hz means 1 millisecond drift every 5 minutes.
In an hour they will differ by 12 milliseconds which starts to look bad for things like template estimation, right?
Is there something wrong here in my reasoning? I am trying to think on what is the justification for the degree of leniency in this absolute tolerance.
Should the criteria be that we don't want them to differe by more than milliseconds over a day long recording? How do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small difference in sampling frequency is usually the result of rounding by floating-point values, or the sorter that outputs a rounded sampling frequency ; and not an ACTUAL difference between the recording and sorting objects.
But as @samuelgarcia pointed out, it begs the question of what analyzer.sampling_frequency
should return (as we should return the most precise sampling frequency, the one not rounded)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that with the current relative tolerance some analysis will output incorrect results if the recording and the sorting are long enough (some hours).. In other words, you need some degree of long-term synch between the recording and the sorter objects and I am trying to think more sytematically about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But to Heberto's point if we have a 4 hour recording (with a 0.01 tolerance difference due to rounding error), then we are off by 144 samples by the end of the recording. So the sorting and recording sample_indices could be off by more than a waveform and lead some spikes to look bad, no? I don't know the solution either, but it does make me wish the sample_frequency could be handled as an int
instead....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the problem,
Both the recording and sorting use time samples (not seconds), so there is no conversion problem and they match perfectly (unless you specify spike timings in seconds, which nobody does)
I could manually set my recording sampling frequency to 30,000 and sorting to 20,000, and waveforms would still be perfect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually you're right. The indices should be fine..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I understand well you are saying that as long as the underlying data is fine it does not matter what the sampling_frequency
, correct? This makes sense to me.
On top of that, you are thinking on the case where the recording and the sorter data actually match but the sorter is different for any reason (like rounding). That makes sense to me as well.
Honest question:
Why do we have the check then?
My understanding is that we are using the sampling_frequency as a proxy for synchronization of the underyling data. From that perspective, the criteria might be too lenient for the reasons described above. I do understand that the most strict criteria throws false positives for the cases that you described.
Maybe I am wrong about the purpose of the check. What do you think?
I think that we should make a strict check and if this differ we warn the user and we force the sorting to have the exact same frequency than the recording. What do you think ? |
I don't like warnings as a general rule, because in the case where it's not your fault and you're fine with it, it's annoying to see them pop up all the time ^^' I'm completely fine with changing the sorting sampling frequency, since it's a copy (shared mem) and doesn't affect the original sorting object. |
I think I agree with Sam here that the warning is the best option. Unless the probablity of difference sampling frequencies causing and error is really really low, Sam proposal for being safe, seems like the way to go to me. |
After discussing with @samuelgarcia we agree that we can be more "relaxed" about this, but we need to add a warning and use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple quick thoughts on the warning/error messaging. I think these are pretty optional comments though.
else: | ||
raise ValueError( | ||
f"Sorting and Recording sampling frequencies are too different: " | ||
f"recording: {recording.sampling_frequency} - sorting: {sorting.sampling_frequency}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To @h-mayorquin 's usual point shouldn't our error help the user fix the problem. They are too different so the 3rd line of this could be something like:
Ensure that you are associating the correct Recording and Sorting when creating a SortingAnalyzer
Totally optional, though.
if sorting.sampling_frequency != recording.sampling_frequency: | ||
if math.isclose(sorting.sampling_frequency, recording.sampling_frequency, abs_tol=1e-2, rel_tol=1e-5): | ||
warnings.warn( | ||
"Sorting and Recording have different sampling frequency. " "Using the one from the Recording" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assumption for this is that it is due to rounding of floats so do we want to be clear that a small difference might be okay. If I got this specific warning I would think I did something wrong.
warnings.warn(
"Sorting and Recording have a small difference in sampling frequency. This could be due to rounding of floats. Using the sampling frequency from the Recording."
Because of floating errors, the smapling frequency can be slightly different between recording and sorting
e.g. 30000.305042016807 and 30000.31
This fixes that issue (same code that was used in the waveform extractor)