Spike holes: Spikes within the batching edges are not detected #594

GaelleChapuis · 2024-02-29T11:57:47Z

As raw electrophysiology data is heavy, the spike sorting algorithm truncates the data into smaller portions (called batches, of size ~2.18 seconds for the latest versions of Kilosort) to work with it.

To ensure information continuity, the edges of these batches are processed slightly differently than the middle portion of the batch. We found that at the edge of the spike sorting batches, ~7ms of spikes are going undetected. We call this effect "spike holes".

Below we show an example of raw electrophysiology data with such "spike hole" highlighted in blue.

The voltage traces are displayed as grey scale; the y-axis represents the electrode channels, and the x-axis a short snippet (~80ms) of time. Spikes detected by Kilosort are marked with a green or red dots for good or bad units respectively. What you can see on this example is that spikes are not detected within this blue window, that corresponds precisely to the edge of a spike sorting batch.

Specifically, every 65 536 samples (~2.18 seconds at 30 kHz) spike detection is less likely for ~7 ms - and in many cases there is no spike detection at all during this period. This represents a loss of 0.3% of the data.

One way to detect those occurrences working on the spike trains is to compute the derivative of the spike times. We retain the samples that have a derivative > 7ms (i.e. there are no spikes detected for at least 7ms, equivalent to a “spike hole”), and compute the time difference between this sample and the next sample with such a derivative > 7ms.
To visualize the batch effect, we then plot the time difference between samples that are marked as having a 7ms hole, but divided by the batch size (~2.18 seconds), as a histogram.
Below is an example of such a plot for one recording, in blue. The decline in the height of the bars as the number of batches increases to the left is expected, as it is more likely to find “spike holes” separated by 1 batch. Indeed, a bar on this graph at e.g. 5 batches indicates that for 4 consecutive batches the neural signals were strong enough for a few spikes to be detected nonetheless within the batch edge.

The problem is more visible when plotting the same graph as above but across several recordings.
Below is a plot of >600 recordings (IBL, run with pyKilosort), where each recording is a vertical line (x-axis: PID number, y-axis: batch). The batch effect is seen as horizontal lines at integer values (1, 2, 3, 4).

The same effect is also visible on IBL recordings spike sorted using the Matlab version of the algorithm (see below). Note that here the effect is seen at every ½ batch size.

We used 91 datasets from Steinmetz and performed the same analysis using the original spike sorting output (likely version: Kilosort 1.0 - Matlab). Indeed, “spike holes” are visible, as per the graph below. Note the effect at ½ batch size.

We also saw this effect using the Allen dataset (visual coding; likely version: Kilosort 2.0):

Note that when re-sorting the Steinmetz dataset using pyKilosort, the issue is visible at the batch size:

This issue is therefore present on the earliest versions of the code, thereby affecting a wide range of datasets.

We are working on identifying the core CUDA function that would underlie this error.
For a minimal reproducible example, see this repo

The “spike hole” effect is an effect that occurs regularly, i.e. every ~2.18 seconds, for a duration of ~7ms. This effect is unlikely to drastically affect scientific results published with the error.

marius10p · 2024-02-29T21:07:44Z

This seems to be fixed in Kilosort4 which is live now. Please update and reopen this issue if you still have a problem.

GaelleChapuis · 2024-03-01T08:58:34Z

For reference, this is how we find such holes using the spike times, code from @oliche :

a = np.diff(spikes['times'][np.where(np.diff(spikes['times']) > (0.007))]) / 2.18689567
plt.hist(a, bins=500, range=[0, 10])

The 7ms is arbitrary. It may scale according to the batch size. For example, it would be interesting to try 3.5ms if the batch size is 1 second.

Please note that the absence of such hole with this method isn’t a conclusive diagnostic, as there could be a few spikes detected nonetheless within the batch edge - this is depending on the neural signal strength (and thereby how it was processed).

We would be curious to know how this is conclusively tested for in future Kilosort versions.

marius10p · 2024-03-01T09:10:03Z

I do the st%NT and histogram that. It shows clearly missing spikes for KS 2.5 and 3, but not 1, 2 and 4, though I haven't tested a lot of datasets yet. I found a possible place for the bug in the creation of the drift corrected file in 2.5 and 3.

GaelleChapuis · 2024-03-01T09:15:36Z

One note on how to interpret the graphs :

Please note that what we want from a spike sorting algorithm is to detect all the neural spiking signas that are present.

If your probe is in a brain region that is active enough, you should not see any gaps of spike for 7ms, i.e. the histogram we present above should be blank.

If you have such a recording (well detected, with a high enough firing rate so that there is no gap at 7ms), but do have spike holes due to the spike sorting algorithm, you will see the holes only at the batch edge time.
Example of recordings where this occurs are highlighted in the graph below, in green.

If on the other hand you do not detect well spikes (or if the region is not active enough), most of the histogram will be concentrated near 0 (as the time difference between gaps of 7ms will be small).
Example of recordings where this occurs are highlighted in the graph below, in red.

In other words, having strong lines at the batch size is a desired outcome in these examples, as it means the amount of spike detected is high and likely better captures the underlying raw signal.

marius10p · 2024-03-01T09:18:37Z

Sure, but I think the st%NT takes care of these confounds and gets more directly at the source of the problem. In the datasets I looked at, it was a lot more clear than when I tried your method.

GaelleChapuis · 2024-03-01T09:29:19Z

It also appears to affect different versions differently, i.e. it seems very strong in pykilosort, but less so in the Matlab versions.

Hello, the explanation was in response to the message above, as we would not want one to think the effect is more present in pyKilosort based on the histogram graphs above. If anything, it is a good thing to see these horizontal lines only at the batch size.

Using spike times modulo the batch size works also as a detection method for holes, but you have to be perfectly sure of your NT value so as to see a dip at 0. (please correct me if I misunderstand your approach !)

marius10p · 2024-03-01T09:38:52Z

Not sure I understand this argument because it is relatively indirect. It can't be a good thing to see the lines at the batch size, then you know for sure it's a problem. When you don't see the lines, you don't know if it's because we can't see the problem or there's no problem. Anyway, all the more reason to look at st%NT?

Unless someone explicitly changed the batch size, it should be the same as the default in config384.

agbondy · 2024-03-07T16:01:59Z

We also ran into this issue with Kilosort 2, but ONLY when non-default batch sizes are used. See screenshot for sorting results with batch size ~5s. I can confirm that using the default batch size in ks2 does not show this issue -- we looked very carefully. We only use the default batch size now, since we don't fully understand what batch sizes lead to this problem. Looking forward to trying KS4 soon -- thanks Marius et al. for this amazing resource!

marius10p · 2024-03-07T16:35:53Z

Ah, thank you so much Adrian, I think this might solve the mystery. I was indeed able to replicate the problem in KS2.5 with non-default batch size, and it's not there with default batch size. I will continue to investigate this in different version of Kilosort.

Correction: I replicated what Adrian is saying in Kilosort2, but not in Kilosort2.5, where I am still getting the holes with the default batch size.

marius10p · 2024-03-07T16:50:10Z

@GaelleChapuis can you please check whether the batch size was changed for the IBL runs? It should be 65,600 in Kilosort 2, 2.5 and 3. You mention 65,536 above.

marius10p · 2024-03-08T09:08:14Z

I found the main bug in Kilosort 2.5 and 3. The batch size was manually changed inside trackAndSort (another ntbuff was added). The patch0 releases fix this.

oliche · 2024-03-08T15:57:21Z

Thanks, we 'll try this out !
It's probably irrelevant now, but we use the standard batch size for IBL runs, Gaëlle was mentioning the pre-processing batch size that is unrelated to pykilosort.

marius10p · 2024-03-08T18:02:36Z

If it works for you with the Matlab version of Kilosort2.5, you can see the two locations where I made changes in the trackAndSort script and adapt that for pykilosort.

alejoe91 · 2024-03-20T14:41:25Z

@marius10p

zm711 and I noticed that the bug fix branches (at least for KS2.5) is fixing the spike holes, but there seem to be a misalignment with the spike times, so that the extracted waveforms don't look right. See this example: the auto/cross correlograms (bottom right) look correct, but the waveforms are just noise.

Any idea where the misalignment could take place?

zm711 · 2024-03-20T14:45:00Z

I've also tested with KS3 and see a similar set of noise waveforms in Phy. The same dataset analyzed pre-patch has nice waveforms.

marius10p · 2024-03-20T15:56:20Z

Can you please check if an offset of ntbuff samples realigns the spikes?

zm711 · 2024-03-20T16:57:30Z

Yeah it looks like if I shift by ntbuff samples the waveforms reappear in Phy. They are not exactly the same as pre-patch waveforms, but they look pretty close by eye.

marius10p · 2024-04-08T20:40:46Z

I have fixed both bugs that were causing the spike holes problem, one in a CUDA file and one in a Matlab file. I think the spike times are correct now, in versions 2.5.2 or 3.0.2 (new releases) or on the branches kilosort25 and kilosort3.

One of the two bugs also affects Kilosort1 and 2. I will try to fix them there as well.

alejoe91 · 2024-04-09T13:54:38Z

@marius10p I'm still getting the spike holes with v2.5.2 (commit - f2e570c)

Here are the spike counts histogram aligned with BATCH_SIZE:

marius10p · 2024-04-09T13:57:10Z

@alejoe91 thanks, did you change ops.NT to the new value? Also, ntbuff should be 64 and nt should be 61.

alejoe91 · 2024-04-09T13:59:23Z

I used default values:

  'ntbuff': 64,
  'NT': 65600,
  'wave_length': 61 # I guess this is nt, right?

marius10p · 2024-04-09T14:02:23Z

The new default value for NT is 64 * 1024 - ntbuff. I can make that more clear in the release notes.

alejoe91 · 2024-04-09T14:12:16Z

let me try that

alejoe91 · 2024-04-09T14:48:50Z

Spike holes are gone:

And waveforms look aligned :)

alejoe91 · 2024-04-09T15:03:34Z

Will test KS3 as well :)

alejoe91 · 2024-04-09T15:36:03Z

@marius10p

Here the results for v3.0.2:

There seem to be a drop in spike counts in the same "spike hole" window. Is it possible? Note that this is the same recording as #594 (comment)

Waveforms are good:

marius10p · 2024-04-09T15:40:08Z

I didn't see it in my own test and it shouldn't behave differently unless I forgot something, can you please run a longer/different recording to see if it replicates?

NT has to be the same new value, and ntbuff nt should be the defaults.

alejoe91 · 2024-04-09T15:57:32Z

I didn't see it in my own test and it shouldn't behave differently unless I forgot something, can you please run a longer/different recording to see if it replicates?

Sure I'll run on another longer session.

NT has to be the same new value, and ntbuff nt should be the defaults.

Yep, using new defaults!

alejoe91 · 2024-04-11T12:29:45Z

@marius10p I've run it on a different session (~30min long) and the drop in spike counts doesn't seem to be there, so all good!

Thanks for the fix!

KS2.5.2

KS3.0.2

zm711 · 2024-04-11T16:42:53Z

@marius10p,

Sorry for the delay on my end. We tested a 90min recording and also seemed fine. I've got a few people in my lab using KS2 for low channel count probes. Is the CUDA error in those fundamental in that they should update as soon as you update or is it super minor bug?

marius10p · 2024-04-11T16:48:24Z

@zm711 for 2.5 and 3 the holes are more extreme than for 2. But they are still there for 2.0, something like 30% reduced spikes in a window of ~4ms. I did updated the 2.0 as well, it's in the releases section.

zm711 · 2024-04-11T16:49:48Z

Thanks @marius10p, I'll let them know!

PaulMAnderson · 2024-04-16T09:31:30Z

The new default value for NT is 64 * 1024 - ntbuff. I can make that more clear in the release notes.

@zm711 for 2.5 and 3 the holes are more extreme than for 2. But they are still there for 2.0, something like 30% reduced spikes in a window of ~4ms. I did updated the 2.0 as well, it's in the releases section.

Can I get some clarification on this? On the Kilosort 2 patch1 it states this issue has been fixed and I see that the CUDA files have been modified. However, configFile384.m still has ops.NT defined as 64*1024 + ops.ntbuff i.e. 65600 not 65472.

Am I correct that this is wrong? That we need to subtract ops.ntbuff to avoid the 'holes' issue?

And if so what is the correct formula for calculating acceptable NT sizes? Is (n x 1024) - 64 safe as a general rule?

marius10p · 2024-04-16T12:12:40Z

Thanks for catching that, I now fixed ops.NT in configFile384. https://github.com/MouseLand/Kilosort/releases/tag/v2.0.2

The rule should be n x 1024 - ntbuff and 'ntbuff = 64, nt = 61', but you try it at your own risk as I have not tested other sizes.

P.S. Next time, try: can I please get some clarification on this.

PaulMAnderson · 2024-04-16T12:53:05Z

Thanks for the quick reply, I appreciate it.
And I'm sorry for being impolite, thanks again!

marius10p · 2024-04-16T13:12:05Z

No worries, let us know if anything else is unclear or if something does not work. We appreciate the feedback.

marius10p · 2024-04-16T13:12:37Z

I am going to keep this issue open so more people see it.

carsen-stringer mentioned this issue Feb 29, 2024

adding rtd and badges #595

Merged

carsen-stringer closed this as completed in #595 Feb 29, 2024

WeissShahaf mentioned this issue Mar 1, 2024

BuG: Re: spike holes #598

Closed

marius10p pushed a commit that referenced this issue Mar 8, 2024

fixes main bug for issue #594

10dce58

marius10p pushed a commit that referenced this issue Mar 8, 2024

fixes the main bug in issue #594, the spike holes problem

d2b5c27

JoeZiminski mentioned this issue Mar 11, 2024

KS2.5 / KS3 'spike holes' patches released SpikeInterface/spikeinterface#2562

Open

zm711 mentioned this issue Mar 20, 2024

BUG: filtered signal and prediction not aligned properly in v3.0.1 GUI #623

Closed

marius10p pushed a commit that referenced this issue Apr 8, 2024

Changes to fix the second bug that causes spike holes (issue #594).

f2e570c

marius10p pushed a commit that referenced this issue Apr 8, 2024

Fixes the second bug in the spike holes issue #594

c43f524

marius10p pushed a commit that referenced this issue Apr 8, 2024

fixes one of the bugs in the spike holes problem #594

8a74064

marius10p reopened this Apr 16, 2024

Spike holes: Spikes within the batching edges are not detected #594

Spike holes: Spikes within the batching edges are not detected #594

Comments

GaelleChapuis commented Feb 29, 2024

marius10p commented Feb 29, 2024

GaelleChapuis commented Mar 1, 2024 • edited

marius10p commented Mar 1, 2024

GaelleChapuis commented Mar 1, 2024

marius10p commented Mar 1, 2024

GaelleChapuis commented Mar 1, 2024

marius10p commented Mar 1, 2024

agbondy commented Mar 7, 2024 • edited

marius10p commented Mar 7, 2024 • edited

marius10p commented Mar 7, 2024 • edited

marius10p commented Mar 8, 2024

oliche commented Mar 8, 2024 • edited

marius10p commented Mar 8, 2024

alejoe91 commented Mar 20, 2024

zm711 commented Mar 20, 2024

marius10p commented Mar 20, 2024

zm711 commented Mar 20, 2024

marius10p commented Apr 8, 2024

alejoe91 commented Apr 9, 2024

marius10p commented Apr 9, 2024

alejoe91 commented Apr 9, 2024

marius10p commented Apr 9, 2024 • edited

alejoe91 commented Apr 9, 2024

alejoe91 commented Apr 9, 2024

alejoe91 commented Apr 9, 2024

alejoe91 commented Apr 9, 2024

marius10p commented Apr 9, 2024 • edited

alejoe91 commented Apr 9, 2024

alejoe91 commented Apr 11, 2024

zm711 commented Apr 11, 2024

marius10p commented Apr 11, 2024

zm711 commented Apr 11, 2024

PaulMAnderson commented Apr 16, 2024

marius10p commented Apr 16, 2024

PaulMAnderson commented Apr 16, 2024

marius10p commented Apr 16, 2024

marius10p commented Apr 16, 2024

GaelleChapuis commented Mar 1, 2024 •

edited

agbondy commented Mar 7, 2024 •

edited

marius10p commented Mar 7, 2024 •

edited

marius10p commented Mar 7, 2024 •

edited

oliche commented Mar 8, 2024 •

edited

marius10p commented Apr 9, 2024 •

edited

marius10p commented Apr 9, 2024 •

edited