
Fix analyzer sampling frequency check #2606

Merged
merged 8 commits into SpikeInterface:main on May 21, 2024

Conversation

@DradeAW (Contributor) commented Mar 20, 2024

Because of floating-point rounding errors, the sampling frequency can be slightly different between the recording and the sorting,

e.g. 30000.305042016807 and 30000.31.

This fixes that issue (using the same check that was used in the WaveformExtractor).

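
The mismatch described above can be reproduced with a short sketch (a minimal illustration of `math.isclose` with the tolerances adopted in this PR, not the actual SpikeInterface code):

```python
import math

rec_fs = 30000.305042016807  # recording frequency from the PR description
sort_fs = 30000.31           # the same frequency after rounding by a sorter

# Strict equality rejects the pair even though the data match:
print(rec_fs == sort_fs)  # → False

# The tolerance-based check adopted in this PR accepts it:
print(math.isclose(rec_fs, sort_fs, abs_tol=1e-2, rel_tol=1e-5))  # → True
```
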
@samuelgarcia (Member) commented

Thanks for this.
There is a deeper problem here: the sorter can round the sampling rate, so in the analyzer we should take the recording's frequency, which is normally the most accurate, unless precision was lost in the preprocessing.

@alejoe91 added the "core" (Changes to core module) label on Mar 20, 2024
@h-mayorquin (Collaborator) left a comment

I am adding some comments about long-term synchronization because we have encountered similar problems in neuroconv and I am interested in understanding this better.

That said, I understand that:

  • This code was used in WaveformExtractor, so some sort of compatibility is important here.
  • Maybe it is good to let the user perform some analysis even if the hard criterion of long-term synchronization does not hold (it will never hold indefinitely!), especially if the recording and sorting are not very long.

@@ -210,7 +211,7 @@ def create(
         sparsity=None,
     ):
         # some checks
-        assert sorting.sampling_frequency == recording.sampling_frequency
+        assert math.isclose(sorting.sampling_frequency, recording.sampling_frequency, abs_tol=1e-2, rel_tol=1e-5)
Collaborator

You probably have thought more deeply about this @DradeAW, but what do you think of the following:

An abs_tol of 1e-2 means that the sampling frequencies can differ by 0.01 Hz.

In other words, there is one excess cycle in the faster clock every 100 seconds.

Now, for a 30_000 Hz process, each cycle lasts 1/30 of a millisecond (milliseconds_per_cycle = 1000 / 30_000 = 1/30). So the two clocks drift apart by 1/30 of a millisecond every 100 seconds. In other words, 1 millisecond every 3000 seconds, or 1 millisecond of difference every 50 minutes.

That is, an absolute tolerance of 0.01 Hz means roughly 1 millisecond of drift every 50 minutes.

Over a day-long recording they will differ by almost 30 milliseconds, which starts to look bad for things like template estimation, right?

Is there something wrong in my reasoning? I am trying to understand the justification for the degree of leniency in this absolute tolerance.

Should the criterion be that we don't want them to differ by more than a few milliseconds over a day-long recording? How do you think about it?
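
The arithmetic above can be checked with a tiny helper (a hypothetical function written for this discussion, not part of SpikeInterface):

```python
def drift_ms(freq_diff_hz: float, nominal_fs_hz: float, duration_s: float) -> float:
    """Time drift in milliseconds accumulated between two clocks whose rates
    differ by `freq_diff_hz`, for a process sampled at `nominal_fs_hz`."""
    excess_samples = freq_diff_hz * duration_s       # extra cycles of the faster clock
    return excess_samples / nominal_fs_hz * 1000.0   # one cycle lasts 1/fs seconds

print(drift_ms(0.01, 30_000, 3_000))   # ≈ 1 ms after 50 minutes
print(drift_ms(0.01, 30_000, 86_400))  # ≈ 28.8 ms over a day-long recording
```
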

Contributor Author

A small difference in sampling frequency is usually the result of floating-point rounding, or of a sorter that outputs a rounded sampling frequency, not an ACTUAL difference between the recording and sorting objects.

But as @samuelgarcia pointed out, it begs the question of what analyzer.sampling_frequency should return (we should return the most precise sampling frequency, the one that was not rounded).

Collaborator

My point is that with the current tolerance, some analyses will output incorrect results if the recording and the sorting are long enough (some hours). In other words, you need some degree of long-term synchronization between the recording and the sorting objects, and I am trying to think more systematically about it.

Collaborator

But to Heberto's point: if we have a 4-hour recording (with a 0.01 Hz difference due to rounding error), then we are off by 144 samples by the end of the recording. So the sorting and recording sample indices could be off by more than a waveform, and that would lead some spikes to look bad, no? I don't know the solution either, but it does make me wish the sampling_frequency could be handled as an int instead...
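
For concreteness, the 144-sample figure follows directly from the frequency mismatch times the duration (plain arithmetic, not SpikeInterface code):

```python
freq_diff_hz = 0.01    # the mismatch allowed by the new abs_tol
duration_s = 4 * 3600  # a 4-hour recording

# Extra samples accumulated by the faster clock over the whole recording:
excess_samples = freq_diff_hz * duration_s
print(excess_samples)  # ≈ 144 extra samples by the end
```
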

Contributor Author

I don't understand the problem.

Both the recording and the sorting use sample indices (not seconds), so there is no conversion problem and they match perfectly (unless you specify spike times in seconds, which nobody does).
I could manually set my recording's sampling frequency to 30,000 and my sorting's to 20,000, and the waveforms would still be perfect.
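
The argument above is that spikes are stored as frame (sample) indices, so slicing out waveforms never consults the sampling frequency. A toy illustration (all names made up for this sketch, not SpikeInterface's API):

```python
import numpy as np

traces = np.arange(1000, dtype=float)  # fake single-channel recording
spike_frame = 500                      # spike stored as a sample index
nbefore, nafter = 30, 60               # waveform window, in samples

# The waveform slice depends only on indices, never on sampling_frequency:
waveform = traces[spike_frame - nbefore : spike_frame + nafter]
print(waveform.shape)  # → (90,)
```
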

Collaborator

Actually, you're right. The indices should be fine.

Collaborator

If I understand well, you are saying that as long as the underlying data is fine, it does not matter what the sampling_frequency is, correct? This makes sense to me.

On top of that, you are thinking of the case where the recording and the sorting data actually match, but the sorter's frequency is different for some reason (like rounding). That makes sense to me as well.

Honest question:
Why do we have the check, then?
My understanding is that we are using the sampling_frequency as a proxy for synchronization of the underlying data. From that perspective, the criterion might be too lenient for the reasons described above. I do understand that the strictest criterion throws false positives for the cases that you described.

Maybe I am wrong about the purpose of the check. What do you think?

@samuelgarcia (Member) commented

I think that we should make a strict check, and if the frequencies differ we warn the user and force the sorting to have the exact same frequency as the recording. What do you think?
The most important thing is to warn the user.

@DradeAW (Contributor) commented Mar 21, 2024

I don't like warnings as a general rule, because in the case where it's not your fault and you're fine with it, it's annoying to see them pop up all the time ^^'
But I'm fine with it if you think it's best.

I'm completely fine with changing the sorting's sampling frequency, since it's a copy (shared memory) and doesn't affect the original sorting object.

@h-mayorquin (Collaborator) commented Mar 21, 2024

I think I agree with Sam here that the warning is the best option.
I also think that warnings should be actionable by the user who receives them (i.e. there should be something they can do about it).
For the case that @DradeAW describes, the action for the user (if they want to suppress the warning) is just to modify the sampling frequency of the sorting themselves, right? So that sounds like an easy thing they can do to remove the warning.

Unless the probability of differing sampling frequencies causing an error is really, really low, Sam's proposal seems like the safe way to go to me.

@alejoe91 (Member) commented

After discussing with @samuelgarcia, we agree that we can be more "relaxed" about this, but we need to add a warning, use the recording's sampling frequency (as it is more reliable), and change the sorting's in place (since the analyzer creates its own copy of the sorting object).

@zm711 (Collaborator) left a comment

A couple of quick thoughts on the warning/error messaging. I think these are pretty optional comments, though.

        else:
            raise ValueError(
                f"Sorting and Recording sampling frequencies are too different: "
                f"recording: {recording.sampling_frequency} - sorting: {sorting.sampling_frequency}"
Collaborator

To @h-mayorquin's usual point, shouldn't our error help the user fix the problem? They are too different, so the third line of this could be something like:

Ensure that you are associating the correct Recording and Sorting when creating a SortingAnalyzer

Totally optional, though.

        if sorting.sampling_frequency != recording.sampling_frequency:
            if math.isclose(sorting.sampling_frequency, recording.sampling_frequency, abs_tol=1e-2, rel_tol=1e-5):
                warnings.warn(
                    "Sorting and Recording have different sampling frequency. " "Using the one from the Recording"
Collaborator

The assumption here is that the difference is due to rounding of floats, so do we want to be clear that a small difference might be okay? If I got this specific warning, I would think I had done something wrong. Suggested wording:

warnings.warn(
"Sorting and Recording have a small difference in sampling frequency. This could be due to rounding of floats. Using the sampling frequency from the Recording."

@samuelgarcia merged commit 8317eb5 into SpikeInterface:main on May 21, 2024
11 checks passed