read_raw_xdf(): loading assumes even sampling #385
Thanks @DominiqueMakowski, you are correct that we currently assume regularly spaced samples per stream. Using pandas to handle interpolation is actually very clever, I wonder why I hadn't thought of it before 😆 (the tradeoff is of course that it is a rather large dependency, but maybe worth it even for MNELAB). Could you share the file with me so that I can play around with it to get a better grasp of the problem? Regarding your function, how can I avoid linear interpolation for such a long interruption? By setting …? Finally, I think it would be beneficial to get resampling directly into pyXDF. Did you check the implementation in xdf-modules/pyxdf#1 by any chance?
Dropped you an email
I added a
I agree...
So I set it to
It does the conversion above, so you just need to specify it in seconds, e.g. 0.5 or 0.1. And yes, None (the default) leaves all interruptions (normally).
If we are to have similar code in NK and mnelab, we might want to outsource some of it to pyxdf. What I could see is a
Agreed. Resampling should really be handled by pyXDF, and a proposed solution already exists (although I'm not sure how easy it will be to rebase and if it is still working). But this should be discussed directly with the pyXDF people (maybe in xdf-modules/pyxdf#1).
For future reference: that method might suffer from some loss of precision; in my small experiments, using the union of the existing and new indices gave the best results.
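A minimal sketch of that union-of-indices approach, assuming `timestamps` are sorted, unique times in seconds and `values` is a 1D signal. `resample_union` is a hypothetical helper, not the actual NeuroKit code: the new regular grid is merged with the original time points, pandas interpolates over the merged index using the time values, and only the grid points are kept.

```python
import numpy as np
import pandas as pd


def resample_union(timestamps, values, fs_new):
    """Resample an irregularly sampled signal onto a regular grid (sketch)."""
    original = pd.Series(values, index=pd.Index(timestamps, dtype=float))
    # Regular target grid from the first timestamp up to (not past) the last one
    new_index = pd.Index(
        np.arange(timestamps[0], timestamps[-1], 1 / fs_new), dtype=float
    )
    # Union keeps every original sample next to the new grid points
    union = original.index.union(new_index)
    # method="index" interpolates by the index values (time), not row position
    merged = original.reindex(union).interpolate(method="index")
    # Keep only the regular grid points
    return merged.reindex(new_index)


t = np.array([0.0, 0.1, 0.2, 0.35, 0.4])
print(resample_union(t, 2 * t, fs_new=20.0))
```

Keeping the original samples in the merged index means every new point is interpolated between its true neighbours rather than between values that were themselves already interpolated.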
@DominiqueMakowski is the description of the signals in your top post correct? I think you might have mixed up the colors. Just to be sure, the correct (expected) signal should contain a segment with missing data in the first second? When loading just stream 4 (with or without resampling), I get this time series: [figure: time series of stream 4]. So I'm wondering if the import worked, and the problem is maybe in the
One more observation: the MNELAB GUI doesn't let you choose sampling frequencies greater than the highest sampling frequency in the file, i.e. 1000 Hz in this example. Even then, the signals look exactly like in the screenshot, so maybe it's because you resample to 2000 Hz (I doubt it, but still worth checking)?
No, I think the whole recording is several minutes long, so it should be within the first minute or so (the time axis is messed up in my fig).
The upsampling is done to avoid aliasing when merging signals with uneven sampling rates, but it should have a fairly minimal impact.
So your three example plots do not actually show the problem? Sorry, I'm confused now; I don't understand what the problem with MNELAB is...
Can you zoom out in your fig to see all the signal horizontally?
can you share the code to reproduce this fig?
This is all done in MNELAB with GUI commands, but here is the corresponding code (available in View – History). For example, here's the code for loading all streams and resampling to 1000 Hz:

```python
from mnelab.io import read_raw

datasets = []  # list of loaded datasets, as kept by MNELAB
data = read_raw(
    "/Users/clemens/Data/biosignal-test-data/XDF/sub-01_ses-S001_task-HCT_run-001_eeg.xdf",
    stream_ids=[1, 2, 3, 4, 5],  # load all five streams
    fs_new=1000.0,  # resample all streams to a common 1000 Hz
    preload=True,
)
datasets.insert(0, data)
data.plot(n_channels=18)  # no events are defined in this snippet
```
Yes, we currently only look at the first and last timestamps when resampling. Even without resampling, we only look at the time of the first timestamp and then use the effective sampling rate for the remaining samples. I guess we need to consider all timestamps.
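To illustrate the difference, here is a small sketch; both function names are hypothetical. `naive_times` mimics reconstructing times from the first timestamp plus 1/fs per sample, which silently closes any gap, while `place_on_grid` uses every timestamp to put samples on a regular grid and leaves positions without a sample as NaN. It assumes sorted timestamps and jitter well below half a sample period.

```python
import numpy as np


def naive_times(timestamps, fs):
    # Only the first timestamp is used; every later sample is assumed
    # to follow exactly 1/fs after the previous one
    return timestamps[0] + np.arange(len(timestamps)) / fs


def place_on_grid(timestamps, values, fs):
    """Put each sample at the grid position implied by its own timestamp."""
    # Nearest grid index for every sample (assumes small jitter)
    idx = np.round((timestamps - timestamps[0]) * fs).astype(int)
    grid = np.full(idx[-1] + 1, np.nan)
    # Samples that round to the same index overwrite each other
    grid[idx] = values
    return grid


ts = np.array([0.0, 0.1, 0.2, 0.6, 0.7])  # 10 Hz stream with a gap after 0.2 s
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(naive_times(ts, 10.0))          # gap disappears: last sample lands at 0.4 s
print(place_on_grid(ts, vals, 10.0))  # gap kept as NaNs
```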
@DominiqueMakowski I wonder if interpolating missing data is the best solution. Would it not be better to use NaN values instead? Otherwise, it is difficult to determine if data collection (using a device with a given regular sampling frequency) worked, or if there was a gap where no data samples were recorded. After all, you don't want to process the interpolated data, right?
I opted for a user-defined duration, which allows keeping interruptions longer than a given time.
Ah, right! That's a good approach. So everything longer than that duration is filled in with NaNs, right?
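A minimal sketch of the user-defined duration idea, assuming the stream has already been placed on a regular grid with NaNs at missing samples. `interpolate_short_gaps` and `max_gap` are hypothetical names, not the NeuroKit implementation: runs of NaNs up to `max_gap` seconds are linearly interpolated, longer runs stay NaN.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(signal, fs, max_gap):
    """Interpolate NaN runs of at most max_gap seconds; keep longer ones as NaN."""
    x = np.asarray(signal, dtype=float)
    is_nan = np.isnan(x)
    # Give every run of consecutive NaN / non-NaN samples its own id
    run_id = np.cumsum(np.r_[True, is_nan[1:] != is_nan[:-1]])
    # Length (in samples) of the run each sample belongs to
    run_len = np.bincount(run_id)[run_id]
    # Convert the tolerated duration from seconds to samples
    max_samples = int(round(max_gap * fs))
    short_gap = is_nan & (run_len <= max_samples)
    # Linear interpolation between valid neighbours; leading/trailing NaNs
    # are not touched (limit_area="inside")
    filled = pd.Series(x).interpolate(method="linear", limit_area="inside").to_numpy()
    return np.where(short_gap, filled, x)


x = [0.0, 1.0, np.nan, 3.0, 4.0, np.nan, np.nan, np.nan, np.nan, 9.0]
print(interpolate_short_gaps(x, fs=10.0, max_gap=0.2))
```

Note that pandas' own `limit` argument would not do the same thing: it fills the first `limit` NaNs of every gap, including long ones, instead of skipping long gaps entirely.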
I should have read the thread again, you already mentioned this before! Sorry about the noise! |
no worries haha I'm very often guilty of that as well
Hi both,
thanks for addressing this issue!
Just wanted to share a thought. I wonder if the default should be 'none'. If I understand Dom's implementation correctly, at the moment, if the user does nothing, any interruption will be interpolated. This could lead to unwanted behaviour for users (perhaps less experienced ones, but I could see this happening in other cases too) who may not realise that there were interruptions in the signal in the first place (especially if they do automated processing downstream). For periods of more than a 'few' milliseconds, should we be interpolating non-stationary signals?
Cheers,
Panos
-----Original Message-----
From: Dominique ***@***.***>
To: cbrnr ***@***.***>
Cc: Panos ***@***.***>; Mention ***@***.***>
Date: Wednesday, 11 October 2023 2:52 PM CEST
Subject: Re: [cbrnr/mnelab] read_raw_xdf(): resampling assumes evenly sampling (Issue #385)
Yes, None is probably not the best, but then the right default depends on the signal: 1 second of EEG is probably too much, but for other signals like EDA it could be alright. Another option is not to set a default but to throw warnings if a break is detected. The reader in NeuroKit is also made with NeuroKit in mind, which doesn't deal very well with NaNs.
Good point. Every regularly sampled XDF stream has a nominal sampling frequency, so we could use it to define a default. Conservatively, we could choose everything > 1/fs to be filled with NaNs, but this is likely too small. Maybe > 2/fs is a better choice? It seems like a value depending on fs makes more sense than an absolute time interval.
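A sketch of such an fs-based check; `find_gaps` and the factor `k` are assumptions, with `k=2` matching the "> 2/fs" suggestion. It flags every interval between consecutive timestamps longer than k / fs, emits a warning, and leaves the decision about filling (NaNs or interpolation) to the caller.

```python
import warnings

import numpy as np


def find_gaps(timestamps, fs_nominal, k=2.0):
    """Return (time before gap, spacing) for every interval > k / fs_nominal."""
    dt = np.diff(timestamps)
    idx = np.flatnonzero(dt > k / fs_nominal)
    gaps = [(float(timestamps[i]), float(dt[i])) for i in idx]
    if gaps:
        warnings.warn(
            f"Found {len(gaps)} gap(s) longer than {k}/fs ({k / fs_nominal:.4f} s)"
        )
    return gaps


# 100 Hz stream with a 0.48 s spacing after t = 0.02 s
print(find_gaps(np.array([0.0, 0.01, 0.02, 0.5, 0.51]), fs_nominal=100.0))
```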
I have another question @DominiqueMakowski. You are using
I like the idea in principle, but going back to Dom's example, sometimes we measure skin conductance at 15 Hz or more, but would probably tolerate 1 second of interpolation because (parts of) the signal are slow; whereas for EEG the sampling rate can vary (128, 500, 2000 Hz) without changing our tolerance much.
What if we used the tag from the XDF? So if there are any discontinuities, stop reading and ask the user to set signal-specific thresholds, e.g. {EEG: 250, EDA: 1000}?
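A rough sketch of how that proposal could look. It assumes the thresholds are in milliseconds (the unit is not stated above) and reuses the "> 2/fs" rule to decide whether a stream has discontinuities at all; `max_gap_for_stream` is a hypothetical helper. The returned value could then serve as the maximum gap duration to interpolate.

```python
import numpy as np


def max_gap_for_stream(stream_type, timestamps, fs_nominal, thresholds_ms):
    """Return the tolerated gap (s) for a stream type, or None if there are no gaps.

    thresholds_ms maps a stream type such as "EEG" to a tolerance in ms
    (assumed unit). Reading stops if gaps exist but no threshold is given.
    """
    # Assumed discontinuity criterion: any spacing longer than 2 / fs
    has_gaps = np.any(np.diff(timestamps) > 2 / fs_nominal)
    if not has_gaps:
        return None  # nothing to interpolate
    if stream_type not in thresholds_ms:
        raise ValueError(
            f"Stream of type {stream_type!r} has discontinuities; "
            "please provide a threshold for it, e.g. {'EEG': 250, 'EDA': 1000}"
        )
    return thresholds_ms[stream_type] / 1000


ts = np.array([0.0, 0.004, 0.008, 0.5, 0.504])  # 250 Hz stream with a gap
print(max_gap_for_stream("EEG", ts, 250.0, {"EEG": 250, "EDA": 1000}))
```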
But the question still remains: how do you define a discontinuity based on the signal type? You'd have to use type-specific durations to determine it, or no? Technically, I think it's easiest to take the nominal fs to decide if there are gaps in the signal and then emit a warning. This relies only on the fs and not on the type and domain-specific interpretation of a signal (i.e. which gap is still acceptable).
tbh I wouldn't be able to explain exactly how pandas works here; indeed, their docs are a bit mysterious. All I can say is that, from my trial-and-error attempts, that was the approach that best preserved the original signal 🤷
I think that's fine, yeah. In general, slower signals will tend to have a lower nominal frequency (at least for some devices). I think we can be fairly conservative with warnings, so users can then explicitly specify more liberal rules.
Quick comment, this problem also occurs without resampling, i.e. when loading just one stream. MNELAB currently does not handle gaps. It assumes that all data points are available at all time points defined by the nominal sampling frequency. In fact, MNELAB just looks at the first timestamp, but completely ignores all other timestamps. So to fix this problem, I think we will need to resample (interpolate) all XDF streams, even if it's just one stream. Then we can take a look at how resampling two (or more) streams to a common sampling frequency behaves.
I was adding a wrapper around pyxdf to load and tidy up xdf data in Neurokit when I stumbled upon these issues:
xdf-modules/pyxdf#79
xdf-modules/pyxdf#1
So I decided to give mnelab a try to see if what I was doing was correct, but I found an issue that happens when there were interruptions in the stream:
Essentially I plotted:
There is an interruption at the beginning of the stream, but mnelab probably interpolates linearly, which distorts the whole thing.
I can send you that xdf file by email if you need :)
(also tagging my friend @pmavros as this might be relevant to our eeg processing)