
Epoch Fragmentation when predicting for short epochs (yasa in mice) #139

Open
matiasandina opened this issue Mar 15, 2023 · 3 comments

@matiasandina (Contributor) commented Mar 15, 2023

I have been making some strides with my PhD project and still intend to migrate my predictions from Accusleep to yasa (#72). Not sure if you remember, but I used a public dataset labeled with Accusleep to train a yasa classifier that would work for mice. I haven't gotten down to making SleepStaging accept mice data yet because I am trying to make sure that the predictions are actually useful for me. I have multiple electrodes that I use for comparison between brain sites (and for quality control).

Overall, yasa's predictions are quite good and fast, but I notice two issues. First, there's obvious disagreement between channels, which could potentially be solved by taking the mode across channels. Note that I'm predicting on 2-second epochs, so there are a lot of predictions and this jitter is somewhat expected. This is true for both yasa and Accusleep, but Accusleep shows less dramatic differences between electrodes. Second, and more importantly, yasa's predictions are not as continuous as I would like (compared with Accusleep, which I think is closer to what a human would score). They are a bit fragmented. I tried to implement a few functions to smooth out these events (a rolling mode, and a function that replaces short-duration bouts with the previous values). I think it would be powerful to implement a method for binding the predictions into more logical chunks (either using my methods or extensions of them, or using the probabilities from SleepStaging itself).
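To illustrate the rolling-mode idea, here is a simplified sketch (not the exact code in the repo): a centered mode over a small window of stage labels, with ties broken in favor of the original label so ambiguous windows never flip an epoch.

```python
from collections import Counter


def rolling_mode(stages, window=5):
    """Smooth a sequence of stage labels with a centered rolling mode.

    `window` should be odd. Ties are broken in favor of the original
    label, so an epoch only changes when a different stage is a strict
    majority within the window.
    """
    half = window // 2
    out = list(stages)
    for i in range(len(stages)):
        seg = stages[max(0, i - half): i + half + 1]
        counts = Counter(seg)
        best, n = counts.most_common(1)[0]
        # only overwrite when the window mode strictly beats the current label
        if n > counts[stages[i]]:
            out[i] = best
    return out
```

With a small kernel this removes isolated single-epoch flips while leaving longer bouts untouched.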

If you have a moment to check the examples in the notebook, I'd appreciate it!

You can find the output of the quarto notebook in this branch:

https://github.com/matiasandina/yasa_classifier/tree/yasa-accusleep-eval

Either this notebook:

https://github.com/matiasandina/yasa_classifier/blob/yasa-accusleep-eval/sync_plots.ipynb

or this HTML file, which is the rendered output of the notebook:

https://github.com/matiasandina/yasa_classifier/blob/yasa-accusleep-eval/sync_plots.html

I did not add the data, so it's not possible to re-run the notebook, but I did add the functions into src. That folder contains the implementation of the smoothing methods (likely inefficient and not fully tested, but it's a first approach).

@raphaelvallat raphaelvallat added this to To do in Automatic sleep staging via automation Mar 20, 2023
@raphaelvallat raphaelvallat added invalid 🚩 This doesn't seem right question 🙋 Further information is requested labels Mar 20, 2023
@raphaelvallat raphaelvallat self-assigned this Mar 20, 2023
@raphaelvallat (Owner)

Hi @matiasandina,

Apologies for the late response. First of all, let me copy-paste here what I sent in our related email conversation:

Regarding the fragmentation: I have noticed that YASA indeed outputs hypnograms that are way more fragmented than other (deep-learning) automatic algorithms. When comparing the fragmentation against ground-truth PSG (in humans), however, I find that YASA is either similarly fragmented or only slightly more fragmented than PSG. But you're right that a light smoothing function may help, especially for epochs with low prediction confidence.

Could you briefly explain the two smoothing methods that you have implemented (mode and gap)? And have you checked the accuracy against reference scoring? I just glanced at the code, and I think that the mode method might oversmooth short stage transitions or awakenings, but perhaps this is desired when you're looking at 2-sec windows. I think the post-classification smoothing method will have to differ between humans (30-sec epochs) and rodents (2-sec epochs).

Another approach to smoothing is to look at the model's predicted probabilities, in particular the confidence. When the model's confidence is low, and the given epoch differs from epochs n-1 and n+1 (e.g. "N2", "REM", "N2"), then you may want to smooth the low-confidence epoch (e.g. "N2", "N2", "N2").
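A sketch of that idea, assuming `stages` is the list of predicted labels and `proba` holds the per-epoch confidence (max class probability); names and threshold are illustrative, not an existing YASA API:

```python
def smooth_low_confidence(stages, proba, threshold=0.5):
    """Replace isolated low-confidence epochs with their neighbors' stage.

    An epoch is overwritten only when its confidence falls below
    `threshold` and both neighbors agree on a different stage,
    e.g. ("N2", "REM", "N2") -> ("N2", "N2", "N2").
    """
    out = list(stages)
    for i in range(1, len(stages) - 1):
        if (proba[i] < threshold
                and stages[i - 1] == stages[i + 1]
                and stages[i] != stages[i - 1]):
            out[i] = stages[i - 1]
    return out
```

Because only low-confidence epochs flanked by agreeing neighbors are touched, genuine short transitions that the model is confident about survive the smoothing.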

Regarding the disagreement between channels, could you provide more details here (that's easier than looking through the notebook)?

  1. Which channels are you using?
  2. What is the disagreement in % accuracy or kappa?
  3. Which channel is the most accurate compared to reference PSG?
  4. Is taking the mode more accurate than just taking the best EEG channel?

Thanks
Raphael

@matiasandina (Contributor, Author)

The mode method was just the fastest thing I could think of that would smooth the data. It's basically a rolling mode.
This method will oversmooth the short awakenings (aka microarousals), which may not be desired, although you can manually recode all microarousals to NREM anyway. I have not tested it extensively, though longer stages are somewhat robust to rolling-mode smoothing with small kernels.
The gap method checks bout lengths via run-length encoding and is a bit more complicated. The idea behind it was to have a filter for the minimum duration a stage should have (for example, in general fewer than 2 epochs of 2 seconds would be too little even for human-scored data). This smooths the data less, but still fixes clearly wrong predictions (e.g., a 2-4 second NREM bout).
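In spirit, the gap method boils down to something like this (a simplified sketch, not the actual code in src): run-length encode the hypnogram, then merge bouts shorter than the minimum into the preceding bout.

```python
from itertools import groupby


def enforce_min_bout(stages, min_epochs=2):
    """Merge bouts shorter than `min_epochs` into the preceding bout.

    The first bout is always kept as-is since it has no predecessor.
    """
    # run-length encode: [(label, run_length), ...]
    runs = [(label, sum(1 for _ in grp)) for label, grp in groupby(stages)]
    out = []
    for label, length in runs:
        if out and length < min_epochs:
            label = out[-1]  # absorb the short bout into the previous stage
        out.extend([label] * length)
    return out
```

So a lone 2-second "W" epoch surrounded by NREM gets relabeled as NREM, while bouts at or above the minimum length pass through unchanged.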

I agree that using the model's predicted probabilities might yield better results.

I think I see more disagreement between channels in yasa than in Accusleep because Accusleep has a calibration step that helps tune the parameters with a small amount of labelled data for each recording. I am using anterior and posterior channels, and both data quality and real physiological differences will affect the stage prediction. I would need to look more into new recordings to fully answer your questions. But in short, I believe picking the best channel might be more accurate than the mode if the mode includes too many bad channels. Some channels are much better for REM (posterior) and some channels show a very dominant delta (anterior). Those might be real differences, and we might expect some disagreement. I believe humans score on anterior/posterior channels depending on what they are mostly interested in (delta/spindles vs. REM).
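For completeness, the epoch-wise mode across channels I have in mind is simple (a hypothetical helper; ties go to whichever channel is listed first):

```python
from collections import Counter


def consensus_across_channels(per_channel_stages):
    """Epoch-wise majority vote across channels.

    `per_channel_stages` is a list of equal-length stage sequences, one
    per channel. Ties are resolved in favor of the first channel listed,
    so putting the most trusted channel first acts as a tie-breaker.
    """
    n_epochs = len(per_channel_stages[0])
    out = []
    for i in range(n_epochs):
        votes = [channel[i] for channel in per_channel_stages]
        out.append(Counter(votes).most_common(1)[0][0])
    return out
```

Of course, as discussed above, this only helps if most channels are good; a single reliable channel may beat a consensus that includes bad electrodes.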

@raphaelvallat (Owner)

Thanks for the detailed response!

> But in short, I believe picking the best channel might be more accurate than the mode if the mode includes too many bad channels.

Yeah, my sense is that this would be true in YASA even when scoring human data.

> Accusleep has a calibration step that helps tune the parameters with a small amount of labelled data for each recording

That's an interesting approach; I'm sure that it does significantly increase the accuracy.
