Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically assigning metadata #1162

Open
alexrockhill opened this issue Aug 4, 2023 · 15 comments
Open

Automatically assigning metadata #1162

alexrockhill opened this issue Aug 4, 2023 · 15 comments

Comments

@alexrockhill
Copy link
Contributor

Describe the problem

Metadata must be added separately when creating epochs, it's not part of read_raw_bids

Describe your solution

How do you handle extra event columns in an events.tsv? You can read them in separately and assign them but it might be nice to either 1) make an mne-bids helper function to read metadata or 2) put them as metadata on a raw object (not currently an attribute) automatically during read_raw_bids and then add them to the epochs when that raw object is epoched. Option 2 might be super clean for abstracting bids stuff

Describe possible alternatives

Above

Additional context

No response

@agramfort
Copy link
Member

agramfort commented Aug 8, 2023 via email

@alexrockhill
Copy link
Contributor Author

I think it would just be added automatically sounds nice to me, within the existing read_raw_bids API

@alexrockhill
Copy link
Contributor Author

alexrockhill commented Aug 8, 2023

Well actually that could look like passing only raw to mne.Epochs with events=None. That's a pretty big API change but it kind of seems like it's very sensible since annotations were introduced and you don't have to preload epochs. I think you'd need a lot of opinions on that though as it's super central to pretty much any workflow. Then the metadata would passed with the annotations to the epochs. When you load in with mne_bids.read_raw_bids it could check that your annotations are the same length as your metadata and raise an error if not.

@hoechenberger
Copy link
Member

hoechenberger commented Aug 9, 2023

I'm probably missing something here but ... shouldn't we add full support for Raw metadata the MNE first? I mean like we have it for Epochs. Even if that means storing that data in a separate file.

@hoechenberger
Copy link
Member

Oh never mind, we can store that stuff as annotations, as you suggested. Sorry about the confusion.

@agramfort
Copy link
Member

agramfort commented Aug 9, 2023 via email

@alexrockhill
Copy link
Contributor Author

Sorry if that was unclear. The motivation is from working with this dataset https://openneuro.org/datasets/ds003708/versions/1.0.3 with stimulation sites per trial. As far as I know, that can't be stored in annotations and is metadata that is nice associate with epochs. If metadata were allowed to be assigned to raw and checked that it was the length of annotations (first PR) then when events=None in mne.Epochs (with event_id=None) the events and event_id would be taken from the annotations and the metadata that was already checked that it was compatible would be assigned to the epochs. That way you wouldn't have to worry about pulling events and event_id from annotations , since they are already an attribute of the raw object, that makes sense to me. Maybe this is too big of an API change but I'm not sure I see an easier way to get the metadata during read_raw_bids.

@larsoner
Copy link
Member

larsoner commented Aug 9, 2023

the plan should be well designed so we don't end up in a rabbit hole

If metadata were allowed to be assigned to raw and checked that it was the length of annotations (first PR)

Unfortunately this already seems like we're going to head down a rabbit hole. We have all sorts of ways of setting/appending/deleting annotations. So now you have to either prevent these operations or insert empty metadata for new ones, etc. An alternative would be to make the Annotations object itself have the option to add arbitrary metadata, I guess like we do for Epochs (some of the logic could be shared). But this in itself could become complicated and a bit of a rabbit hole.

So I think the best/simplest/cleanest solution is the one mentioned above:

  1. make an mne-bids helper function to read metadata

In mne-bids-pipeline we have something a bit along these lines:

https://github.com/mne-tools/mne-bids-pipeline/blob/a931cae9ed41c9acdfef8b8e725eb1b817f1e697/mne_bids_pipeline/_import_data.py#L68-L106

I think a MNE-BIDS helper that returns an ndarray events and a DataFrame metadata of the same length makes the most sense. Then the applications/uses become pretty straightforward and transparent -- no need for magic, adding events=None, attaching to Raw, or modifying Annotations required.

@alexrockhill
Copy link
Contributor Author

alexrockhill commented Aug 9, 2023

That sounds reasonable, the only thing I'm thinking is that we ran into an issue where the events were mislabeled in the annotations in a raw brainvision file and the dataset curator labeled them correctly in the events.tsv. This was confusing when setting epochs.metadata and then trying to select using fancy indexing by taking epochs['electrical_stimulation'] for instance which went off of the incorrect annotations instead of the corrected metadata. So, maybe something like mne_bids.read_raw_epochs that ignores everything in the raw file and goes off the events.tsv might be the solution to that.

@larsoner
Copy link
Member

larsoner commented Aug 9, 2023

we ran into an issue where the events were mislabeled in the annotations in a raw brainvision file and the dataset curator labeled them correctly in the events.tsv... So, maybe something like mne_bids.read_raw_epochs that ignores everything in the raw file and goes off the events.tsv might be the solution to that.

I don't think we should write a new function for this. If we try to write a helper function to fix every instance of a broken dataset, we're going to end up with infinite functions. A better idea I think is 1) tell them to fix their dataset (which they might ignore), and then 2) write code to overcome this particular use case in the meantime. Now if this starts happening for dozens of datasets, then I think there's an argument to be made for a helper. But I really hope this is a rare exception...

@alexrockhill
Copy link
Contributor Author

I have stuff like this in BIDS datasets that I have too. It often happens when you have a version of task or recording with event codings and then you change them in a later version. I think the idea with BIDS is that you leave the raw file be so that everything you put in BIDS is not modified and just correct it in the sidecar.

That doesn't mean you necessarily need a helper function or new helper functions for everything but I do think that the idea that the sidecars overwrite anything in the files should be facilitated by mne-bids and right now the events.tsv file is ignored as far as official API. That's why I was thinking of reading metadata in read_raw_bids and checking that it matches with annotations in the raw file. I think it's an edge case but I think it's a super common edge case.

@agramfort
Copy link
Member

agramfort commented Aug 12, 2023 via email

@marcpabst
Copy link

marcpabst commented Oct 2, 2023

we ran into an issue where the events were mislabeled in the annotations in a raw brainvision file and the dataset curator labeled them correctly in the events.tsv... So, maybe something like mne_bids.read_raw_epochs that ignores everything in the raw file and goes off the events.tsv might be the solution to that.

I don't think we should write a new function for this. If we try to write a helper function to fix every instance of a broken dataset, we're going to end up with infinite functions. A better idea I think is 1) tell them to fix their dataset (which they might ignore), and then 2) write code to overcome this particular use case in the meantime. Now if this starts happening for dozens of datasets, then I think there's an argument to be made for a helper. But I really hope this is a rare exception...

I just wanted to chime in and share my strong disagreement with this viewpoint. I don't believe we should consider a dataset that relies on events.tsv to encode events as flawed.

In reality, the problem here lies more in how we handle this type of data, which isn't always ideal. Historically, dealing with events in EEG recordings has been quite limited due to technical constraints. Most systems lacked the ability to attach complex metadata to events. But what if we're working with a complex experimental design or need to add post-hoc metadata? Sure, we could manually tweak the triggers in the EEG recording file, but that's error-prone and goes against the spirit of the BIDS format. As @alexrockhill pointed out, this is a common challenge, and I believe that, for a BIDS dataset, we should consider events.tsv as the primary source of truth.

Perhaps a practical solution could be allowing for arbitrary metadata to be added to the Annotations object, similar to what's already possible with the metadata dataframe in Epochs. Is there a an Issue/PR for that somehwere already?

Anyway, I do really like MNE, but I don't think event handling is one of its strong points at the moment.

@larsoner
Copy link
Member

larsoner commented Oct 2, 2023

I don't believe we should consider a dataset that relies on events.tsv to encode events as flawed... I believe that, for a BIDS dataset, we should consider events.tsv as the primary source of truth

Ahh I see -- rereading I think you're right and I was wrong about this (sassuming the other MNE-BIDS people agree). as it could be a way to fix a flawed recording. To not force this behavior, though, it seems like a mne_bids.annotations_from_events_tsv or so seems like it should get you everything you need to fix your dataset. You'll have the native raw with bad events and/or bad annotations, then read the presumably correct ones from tsv, and replace as necessary before epoching.

when events=None in mne.Epochs (with event_id=None) the events and event_id would be taken from the annotations

FWIW recently the Epochs(..., events=None) meaning "call annotations to events under the hood" approach (re)surfaced at the MNE sprint and could make sense as well, feel free to check for and then open an MNE issue about that. But I think we can do this without needing to attach any metadata to raw.

@sappelhoff
Copy link
Member

regarding:

I think the idea with BIDS is that you leave the raw file be so that everything you put in BIDS is not modified and just correct it in the sidecar.

and

for a BIDS dataset, we should consider events.tsv as the primary source of truth.

I personally agree, and this is how we handle things in mne-bids.

However, within BIDS, this is an unresolved issue:

To not force this behavior, though, it seems like a mne_bids.annotations_from_events_tsv or so seems like it should get you everything you need to fix your dataset. You'll have the native raw with bad events and/or bad annotations, then read the presumably correct ones from tsv, and replace as necessary before epoching.

this sounds good to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants