ft_selectdata failure #2405

schoffelen · 2024-04-10T10:36:36Z

Describe the bug
ft_selectdata fails on raw data with an unluckily chosen cfg.latency, and in presence of a sampleinfo field

To Reproduce
Steps to reproduce the behavior:

data.time{1}=1:20;
data.time{2}= 11:20;
data.trial{1}=randn(1,20);
data.trial{2}=randn(1,10);
data.label={'chan01'};
data.fsample=1;

cfgsel.latency = [1 8];
data2=ft_selectdata(cfgsel,data); % runs OK, yet returns a data structure with an empty second trial

data.sampleinfo=[1 20;21 30];
data3=ft_selectdata(cfgsel,data); % crashes, when attempting to update the sampleinfo, based on an empty time axis for the second trial

Expected behavior
No crash

Environment and versions (please complete the following information):

Stone age CentOS
MATLAB version 2023b
FieldTrip version master branch

schoffelen · 2024-04-10T11:08:45Z

In case no sampleinfo field exists, I wonder what happens downstream in the analysis if some of the trial/time cells are empty.

Just a quick check on the data2 structure from the previous comment: it causes a crash in ft_timelockanalysis

So, I would be up for adjusting ft_selectdata to only return the non-empty trials, (and adjust the sampleinfo/trialinfo accordingly)

robertoostenveld · 2024-04-14T13:44:58Z

If anyone happens to be reading along here: I discussed this with Jan-Mathijs over a coffee. We identified that if a raw data structure has no sampleinfo field, then it is technically not a problem to represent an empty trial that is internally consistent like this:

data = [];
data.label = {'1', '2', '3'};
data.trial{1} = zeros(3,0);
data.time{1} = zeros(1,0);
data.trialinfo = [1];

The problem appears if sampleinfo is present, as we don't have a way of coding the sample indices of an empty trial. Solutions might be to code them as [nan nan], but [-1 -1] would IMHO be just as valid.

schoffelen · 2024-04-14T18:46:21Z

I don't mind either, unless there's a technical reason to choose one above the other, it sounds like it's to be a majority vote? Currently, the PR fills in with a nan. This can be easily swapped with a [-1 -1].

Yet, taking at face value that the first index is the begin, and the second index is the end, [-1:-1] will yield a one-element sample-vector, which might be inconsistent with the dimensionality of the trial/time (I didn't check what [nan:nan] would yield, but I suspect this will lead to an error.

robertoostenveld · 2024-04-15T06:52:33Z

The [-1 -1] was not thought through well, and might not be a good choice. And you are correct that it would indeed result in an (invalid) one-sample trial.

I think what we want downstream is graceful errors, rather than mishap at random places in the code that make debugging hard. What about first extending ft_datatype_raw to check on validity of the input, for as far we can define "valid" at the moment? Should we also do that for timelock and freq structures? And source and volume structures?

schoffelen added bug unconfirmed labels Apr 10, 2024

schoffelen linked a pull request Apr 11, 2024 that will close this issue

Selectdata robustness improvement #2406

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ft_selectdata failure #2405

ft_selectdata failure #2405

schoffelen commented Apr 10, 2024

schoffelen commented Apr 10, 2024

robertoostenveld commented Apr 14, 2024

schoffelen commented Apr 14, 2024 •

edited

robertoostenveld commented Apr 15, 2024

ft_selectdata failure #2405

ft_selectdata failure #2405

Comments

schoffelen commented Apr 10, 2024

schoffelen commented Apr 10, 2024

robertoostenveld commented Apr 14, 2024

schoffelen commented Apr 14, 2024 • edited

robertoostenveld commented Apr 15, 2024

schoffelen commented Apr 14, 2024 •

edited