Example of the HED schema library for SCORE implementation #324

tpatpa · 2022-06-26T19:53:12Z

This PR adds examples of the HED schema library for the Standardized Computer-based Organized Reporting of EEG (SCORE), implemented in several EEG and invasive EEG datasets using both of the ways to annotate HED in BIDS.

The BIDS dataset was validated with bids-validator@1.9.3. The HED annotations were validated separately with the example Jupyter notebook and the hedtools, which will be incorporated into the bids-validator soon.

dorahermes · 2022-07-08T18:06:14Z

Before this example can be merged, the bids-specification pull request that adds the use of an HED schema library needs to be merged first. Then the dataset_description.json can be updated to include the SCORE library schema.

xeeg_hed_score/dataset_description.json

xeeg_hed_score/sub-eegSeizureTUH/ses-eeg01/eeg/sub-eegSeizureTUH_ses-eeg01_channels.tsv

sappelhoff

it's nice to see a dataset with the MEF data format. Could you please make sure that all binary data files are zeroed out, that is, that they have 0kb?

see also: https://github.com/bids-standard/bids-examples/blob/master/CONTRIBUTING.md#why-do-we-only-host-truncated-data-with-0kb-size

sappelhoff · 2022-07-22T05:58:06Z

crosslinking, this PR needs a spec PR and a following validator PR to be merged first:

spec: [ENH] Added the specification for using HED libraries in BIDS bids-specification#1106
validator: Implementation and basic unit tests for HEDVersion with libraries bids-validator#1496

sappelhoff · 2022-09-27T09:59:19Z

The PRs mentioned in #324 (comment) have meanwhile been merged, so I think you can address some of the reviewer comments above @tpatpa

tpatpa · 2022-09-27T22:30:10Z

I made minor edits to match the latest HED-SCORE library schema. Validation is still failing because the validator is validating against a previous version of the HED-SCORE library schema. We are in the final stages before releasing the schema's latest version. Once released this will be validated and could be merged as well.

dorahermes · 2023-02-08T21:42:04Z

We just released HED-SCORE V1.0.0. If you follow the link, make sure to click on "Show another version" to browse. The remaining question for this BIDS-HED-SCORE example is how to list the set of channels affected by an event. It seems like the discussion in the electrophysiology derivatives was going for a Pythonic list of strings, eg ['C3','F3'] or ['C3-F3','P3-C3'] in a separate channels column. Any preference for these examples? @sappelhoff @robertoostenveld @VisLab @guiomar @CPernet

sappelhoff · 2023-02-09T09:05:45Z

congrats Dora, Tal, et al!

> It seems like the discussion in the electrophysiology derivatives was going for a Pythonic list of strings

yes, this should be a link to the specific discussion thread: gdoc discussion

My preference is indeed the "pythonic list of strings" (clarified more in the above linked thread) but apart from Robert and me, nobody seems to have offered an opinion yet.

CPernet · 2023-02-09T09:32:54Z

+1 list of strings

VisLab · 2023-02-09T12:53:02Z

> It seems like the discussion in the electrophysiology derivatives was going for a Pythonic list of strings, eg ['C3','F3'] or ['C3-F3','P3-C3'] in a separate channels column.

Could you clarify what would be meant by: ['C3-F3','P3-C3']? If it means a group of channels as ordered by the channels.tsv, I don't think this should be allowed. Which channels are in the data and how they are ordered can change during computation, so this is particularly risky for derivatives. The same comment applies to channel numbers.

Also, given that quotes are usually not used in events files, would [C3, F3] be preferred?

One further option, which perhaps the HED working group should discuss separately, is how HED might be used to define named groups of channels which could then be referred to by name. This does not preclude the need for deciding on a format for a channels column in the events files and their derivatives.

sappelhoff · 2023-02-09T12:58:28Z

Let me copy the discussion thread from the Gdoc here, perhaps that clarifies some things:

@sappelhoff :

I think a "pythonic" way to specify channels would be nice.

A list of strings where each string is an affected channel

An empty list means no channel is affected

"n/a" means the interpretation of "channel" is not applicable or the data is not available

We could additionally think about a shorthand to indicate "all channels are affected". Perhaps the string "all" would work for that ... if by chance a channel is named "all" this would still not be ambiguous, because specifying individual channels needs to be in a list like ["all"]

@robertoostenveld :

I don't care too much, but think that we should not be restrictive in formatting.
And to note: channels in the annotations file do not have to correspond to channels in the data. E.g., how would you document an interictal spike that is observed in a bipolar re-referenced channel ("C3-F3"), whereas the data is stored and shared with a common reference ("C3-REF", or probably just "C3")? Annotations could also refer to electrodes rather than channels, such as "video inspection revealed that electrode C3 fell off at 10 minutes into the recording".

@sappelhoff:

Thanks for these valid examples. Regarding the first one, one could argue that if an interictal spike at "C3-F3" is observed, and the dataset curator wants to annotate this event, then they should include that channel in the data.

Regarding the "electrode" example: We could have an additional column called "electrode" that follows the same formatting as "channel" (the column we are discussing here), but strictly referring to electrodes.

Curators could then for each event specify any number of channels and/or electrodes that are related to the event.
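For illustration, the "pythonic list of strings" convention described in this thread could be read back roughly as in the sketch below. This is only a sketch under stated assumptions, not part of BIDS or of any validator: the function name parse_channels_cell and the choice to return None for "n/a" are made up here. It relies on ast.literal_eval from the Python standard library, which only accepts literals and does not execute code.

```python
import ast

NA = "n/a"   # "not applicable / not available", as proposed in the thread
ALL = "all"  # proposed shorthand for "all channels are affected"


def parse_channels_cell(cell):
    """Interpret one cell of a hypothetical channel column.

    Returns None for "n/a", the string "all" for the shorthand,
    and otherwise a (possibly empty) list of channel names.
    """
    text = cell.strip()
    if text == NA:
        return None
    if text == ALL:
        return ALL
    try:
        # literal_eval parses Python literals only; it never runs code
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as err:
        raise ValueError(f"not a list of strings: {cell!r}") from err
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"not a list of strings: {cell!r}")
    return value
```

With this reading, a channel that happens to be named "all" stays unambiguous: the cell ["all"] yields a one-element list, while the bare cell all yields the shorthand.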

VisLab · 2023-02-09T13:19:22Z

> @sappelhoff :
>
> I think a "pythonic" way to specify channels would be nice.
>
> A list of strings where each string is an affected channel
>
> An empty list means no channel is affected
>
> "n/a" means the interpretation of "channel" is not applicable or the data is not available
>
> We could additionally think about a shorthand to indicate "all channels are affected". Perhaps the string "all" would work for that ... if by chance a channel is named "all" this would still not be ambiguous, because specifying individual channels needs to be in a list like ["all"]

I am in favor of this format.

I realize that this discussion pertains to derivatives, however, it might also be relevant to events in the main BIDS specification.

Right now the BIDS specification of events.json indicates two options for describing column values (discounting HED): Levels and Units.

If Levels are given, the column is assumed to contain categorical values.

If Units are given, the column is assumed to contain numeric values expressed in the specified units.

Would this new format of column value require a new List type -- list of numbers or list of strings?
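To make the Levels/Units distinction concrete, here is a small sketch of an events.json sidecar built in Python. The column names (trial_type, response_time, channel) and their descriptions are hypothetical and chosen only for illustration; the "Levels" and "Units" keys are the two options mentioned above. The channel column carries only a Description, since how a list-valued column should be declared is the open question.

```python
import json

# Hypothetical sidecar content; column names and texts are illustrative only.
sidecar = {
    "trial_type": {
        "Description": "Category of the annotated event",
        # Levels: the column holds categorical values
        "Levels": {
            "seizure": "Seizure event",
            "artifact": "Artifact in the recording",
        },
    },
    "response_time": {
        "Description": "Time until the response",
        # Units: the column holds numeric values in this unit
        "Units": "s",
    },
    "channel": {
        # Neither Levels nor Units fits a list of channel names
        "Description": "Channels affected by the event (format under discussion)",
    },
}

print(json.dumps(sidecar, indent=2))
```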

tpatpa · 2023-02-15T15:21:50Z

> Right now the BIDS specification of events.json indicates two options for describing column values (discounting HED): Levels and Units.

Maybe we can use IntendedFor in the corresponding JSON file to link to the relevant channels file similar to how fieldmap data is linked to a specific scan?

sappelhoff · 2023-02-15T16:00:51Z

> One further option, which perhaps the HED working group should discuss separately, is how HED might be used to define named groups of channels which could then be referred to by name. This does not preclude the need for deciding on a format for a channels column in the events files and their derivatives.

I agree that developing an "HED way" to deal with this problem would be very nice in addition to a "straightforward - non HED" BIDS solution.

> Right now the BIDS specification of events.json indicates two options for describing column values (discounting HED): Levels and Units.

you are raising a very important point here, thanks -- I have not considered that my proposal would introduce a new "data type" for rows in TSV files apart from those defined by Levels and Units. It wouldn't make sense to consider the lists of strings (or the [], "all", and "n/a" values that'd be possible) as "Levels", because there'd simply be too many levels without providing a lot of insight.

We should:

- think about alternative solutions
- think whether introducing a List type makes sense in a more general sense (beyond our present problem), because I would be hesitant to introduce a concept that is only sensible / meaningful for one particular application
- beyond the potential List data type, the present proposal would also permit "all" as a shorthand for a full list of channels and the standard "n/a". While permitting "n/a" is easy and needed anyhow, ... what do you think of the "all" shorthand? Needed? Superfluous? ...

In any case, we need some more people to chime in :-)

> Maybe we can use IntendedFor in the corresponding JSON file to link to the relevant channels file similar to how fieldmap data is linked to a specific scan?

Not all JSON files have an IntendedFor metadata field, but I like the general direction here ...

VisLab · 2023-02-23T21:23:31Z

One other possibility would be just to use a string for the channel list rather than an actual list. Then BIDS doesn't have to be concerned about validating necessarily, although it could.

"[Cz7, O1, O2]"

Downstream tools could then do their own thing to interpret a channels column, parse, and make sure that these are real channels.
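As a rough sketch of what such downstream handling might look like, the snippet below parses an unquoted bracketed string like the one above and reports names that are not among a set of known channels. Both function names and the set of known channels are assumptions made for this example; a real tool would take the names from the dataset's channels file.

```python
def parse_bracket_string(cell):
    """Split a string such as "[Cz7, O1, O2]" into a list of names."""
    text = cell.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"expected a bracketed list, got {cell!r}")
    inner = text[1:-1].strip()
    if not inner:
        return []  # "[]" means no channels
    return [name.strip() for name in inner.split(",")]


def unknown_channels(names, known):
    """Return the names that do not appear in the known channel names."""
    return [name for name in names if name not in known]


# Hypothetical channel names, standing in for the name column of a channels file
known = {"Cz", "O1", "O2"}
names = parse_bracket_string("[Cz7, O1, O2]")
print(unknown_channels(names, known))
```

Note that this simple split cannot represent a channel name that itself contains a comma.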

dorahermes · 2023-02-28T20:47:38Z

> > Right now the BIDS specification of events.json indicates two options for describing column values (discounting HED): Levels and Units.
>
> you are raising a very important point here, thanks -- I have not considered that my proposal would introduce a new "data type" for rows in TSV files apart from those defined by Levels and Units. It wouldn't make sense to consider the lists of strings (or the [], "all", and "n/a" values that'd be possible) as "Levels", because there'd simply be too many levels without providing a lot of insight.
>
> We should:
>
> - think about alternative solutions
> - think whether introducing a List type makes sense in a more general sense (beyond our present problem), because I would be hesitant to introduce a concept that is only sensible / meaningful for one particular application
> - beyond the potential List data type, the present proposal would also permit "all" as a shorthand for a full list of channels and the standard "n/a". While permitting "n/a" is easy and needed anyhow, ... what do you think of the "all" shorthand? Needed? Superfluous? ...
>
> In any case, we need some more people to chime in :-)

I think a List data type would help. This may also be helpful beyond the present problem for other ephys cases where multiple channels or electrodes could be listed, and I can imagine other cases where multiple items may need to be indexed beyond channels or electrodes.

> > Maybe we can use IntendedFor in the corresponding JSON file to link to the relevant channels file similar to how fieldmap data is linked to a specific scan?
>
> Not all JSON files have an IntendedFor metadata field, but I like the general direction here ...

This may be similar or comparable to a case in the BIDS-connectivity-BEP, for channel-to-channel connectivity metrics where we also want to interpret a column with respect to a _channels.tsv or _electrodes.tsv file (discussing there whether this should be a sourceAtlas, but it is not an actual atlas...).

Andesha · 2023-03-03T18:47:08Z

For the format of this field a "pythonic list" has the downside of encouraging developers to do an eval which is not a safe practice. A comma separated list (or a single item) would work effectively as a representation. Brackets aren't strictly necessary as these files are a TSV.

CCing @christinerogers for EEGnet

sappelhoff · 2023-03-03T22:09:11Z

> For the format of this field a "pythonic list" has the downside of encouraging developers to do an eval which is not a safe practice. A comma separated list (or a single item) would work effectively as a representation. Brackets aren't strictly necessary as these files are a TSV.

I agree with you -- I guess I proposed it this way to distinguish "n/a" from a potential channel named "n/a", and to have a convenient shorthand "all" that is different from a channel potentially called "all". But we can probably come up with better ways.

christinerogers · 2023-03-09T16:53:05Z

Super, thanks @sappelhoff -- I'll raise this at the BIDS maintainers meeting, since @Andesha 's request for code-agnostic (consistent, comma-separated instead of pythonic) lists helps those making BIDS-supporting data platforms (like EEGNet which is webfacing) across modalities.

Has this been discussed elsewhere in the BIDS specs, i.e. is there an implementation cost concern? cc @CPernet

Does this make sense to you @arnodelorme @VisLab ?
thoughts, @dorahermes @tpatpa ?

rwblair · 2023-03-09T20:19:01Z

After discussing this with a few maintainers, we settled on allowing JSON arrays whose only values are strings as an acceptable solution. The JSON specification allows JSON documents to start with a square bracket, so any language's implementation of a JSON parser should be sufficient.

We should also be able to represent this in the schema well enough to tip the validators off that elements of the column may need to be parsed.

@effigies did I misrepresent anything here?

christinerogers · 2023-03-09T20:46:31Z

Hi @rwblair ,
My takeaway from this maintainers meeting was that it wasn't decided, and will be raised again next time after some community consultation (@sappelhoff).
There's definitely concern about the embedded JSON suggestion on our side -- e.g. within a TSV it's not only hard to read but also can't be validated easily by those preparing data files, so significantly more error-prone.

effigies · 2023-03-09T21:03:39Z

@rwblair nothing misrepresented, but here's a more extended version:

We discussed two possible approaches:

- Comma-separated values: a,b,c
- JSON arrays: ["a","b","c"]

Advantages to CSV:

- Simple. Just use split(val, ","), which will come in every language and is easy to write your own if forced.
- You can eyeball it.

Advantages to JSON arrays:

- Typed. The schema could declare a value as having type array[str] (actually something uglier, but you get it) and the validator can check it.
- Proximal. If a tool is working with BIDS data, it should have access to a JSON parser. No eval needed.
- Escapable. JSON supports unicode escapes, in case someone needs a tab or newline character inside their strings, which TSV with a CSV insert would interpret as an end-of-column or end-of-row. (I hope people don't do this, but in the extreme...)
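The two approaches compared in this comment can be sketched side by side in Python. The function names are hypothetical, and the type check is only a stand-in for what a schema-driven validator might do. The last lines illustrate the "escapable" point: json.dumps writes a tab inside a string as the two characters \t, so the encoded cell contains no literal tab that would break a TSV row.

```python
import json


def parse_csv_cell(cell):
    """Comma-separated values: a plain split, no typing, no escaping."""
    return cell.split(",") if cell else []


def parse_json_array_cell(cell):
    """JSON array of strings: parsed with a standard JSON parser, then type-checked."""
    value = json.loads(cell)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"expected a JSON array of strings, got {cell!r}")
    return value


print(parse_csv_cell("a,b,c"))
print(parse_json_array_cell('["a","b","c"]'))

# A name containing a tab survives a round trip through the JSON form,
# and the encoded text itself holds no literal tab character.
encoded = json.dumps(["a\tb", "c"])
assert "\t" not in encoded
assert parse_json_array_cell(encoded) == ["a\tb", "c"]
```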

rwblair · 2023-03-09T22:05:58Z

> Hi @rwblair , My takeaway from this maintainers meeting was that it wasn't decided, and will be raised again next time after some community consultation (@sappelhoff). There's definitely concern about the embedded JSON suggestion on our side -- e.g. within a TSV it's not only hard to read but also can't be validated easily by those preparing data files, so significantly more error-prone.

My apologies for the ambiguity and finality in tone of my post. This was from a chat after the maintainers meeting you presented at. I completely agree with you that it needs more discussion here and in the next maintainers meeting.

christinerogers · 2023-03-10T18:44:38Z

thanks @rwblair - let's continue this detailed point in a more general/accessible place than this channel. cc @sappelhoff happy to follow your lead here.

Adding 'SourceDatasets' per comment (bids-standard#324 (comment))

VisLab · 2023-08-07T11:35:27Z

Looks good....

VisLab

This looks good --- but you will need to release SCORE 1.1.1 before this will validate.

dorahermes · 2023-08-14T19:31:41Z

Looks good to me, ok after it is validated!

New example of the HED SCORE schema library used for event annotations in several EEG and invasive EEG datasets. Dataset validated with bids-validator@1.9.3 and PENDING validation of HED annotations with HED python tools.

Annotations validates with example notebook (https://github.com/hed-standard/hed-examples/blob/main/hedcode/jupyter_notebooks/bids_validate_dataset_with_libraries.ipynb) and HED python tools (https://pypi.org/project/hedtools/)

[ERR] The validation on this HED string returned an error. (code: 104 - HED_ERROR) -- HED validated separately

- dataset_description.json fixed and waiting for [bids-specification pull request](bids-standard/bids-specification#1106)
- units added to _channels.tsv for TUH subjects
- Files truncated via [terminal](https://github.com/bids-standard/bids-examples/blob/master/CONTRIBUTING.md#why-do-we-only-host-truncated-data-with-0kb-size)

Correct annotations to match the latest prerelease library schema.

Example validated following latest schema release (HED score_1.0.0) and tools update using HED-examples jupyter notebook (https://github.com/hed-standard/hed-examples/blob/main/hedcode/jupyter_notebooks/bids/bids_validate_dataset_with_libraries.ipynb).

Array data in .tsv cells discussion is converging towards a comma-separated list (possibly wrapped delimiters list), so the example was edited accordingly - combined rows representing the same event with a list of channels in the channel column. See full discussion here: bids-standard/bids-specification#1446

Channel column in _events.tsv should match the channel name so updated _channels.tsv to accommodate 2 different references.

Adding 'SourceDatasets' per comment (bids-standard#324 (comment))

Updated example to use the HED-SCORE schema in its partnered version (score_1.1.0).
- changed specified HEDVersion in dataset_description.json
- removed HED-SCORE prefix (sc:)
- changed column name from 'HED' to 'HED_annotations' since the use of 'HED' is only allowed as a second-level child in these files.

The example was validated with bids-validator and HED jupyter notebook - BIDS data set that uses partnered schema has no HED validation errors.

Removing Other-organized-rhythm/non-interesting events (bckg) annotations since the background is considered baseline. If we end up with empty files they are removed.

corrected bids-validator error

Trailing " removed

Corrected mix-up with the TUH dataset stop_time and duration.

Updated example to use the HED-SCORE schema with score_1.1.1
- changed specified HEDVersion in dataset_description.json
- changed event annotations to match labels used for annotating the Temple University Hospital EEG Corpus (TUEG) and their corresponding SCORE HED library schema annotations.

sappelhoff · 2023-10-27T08:37:36Z

@tpatpa I rebased your branch on bids-standard/bids-examples@master. You will have to run the following command locally before you continue to work on this: git fetch --all and then git reset --hard origin/master

NOTE if you have uncommitted local changes on your branch, do not run the above ☝️ ... just comment here in that case :-)

And in the future, it would be good if you could make a new git branch before submitting a pull request. Submitting pull requests directly from your master branch makes it difficult for you to stay up to date with the upstream repository (bids-standard/bids-examples@master).

sappelhoff

The CI fails with the following issue:

1: [ERR] The validation on this HED string returned an error. (code: 104 - HED_ERROR)
./dataset_description.json
Evidence: ERROR: [SCHEMA_LOAD_FAILED] Could not load HED schema "{"nickname":"","version":"1.1.1","library":"score","localPath":""}" from remote repository - "XMLHttpRequest is not defined". (For more information on this HED error, see https://hed-specification.readthedocs.io/en/latest/Appendix_B.html#schema-load-failed.)

Please visit https://neurostars.org/search?q=HED_ERROR for existing conversations about this issue.

cc @tpatpa @VisLab @dorahermes

Are you also sharing the full data of this example somewhere? For example on OSF or GIN?

Are the channel names such as FP1-F7 also how they occur in the .edf files? (in the full data, of course)

Furthermore: Why is this example called xeeg with an x, instead of just eeg?

This PR will also need to add the new example to our example index. I can do that for you, however, once we have resolved the above issue.

After that, I'd be happy to merge this.

VisLab · 2023-10-27T12:34:22Z

We're having a chicken and egg problem here... The HED version must be a released version, but SCORE 1.1.1 has not been released yet due to unresolved issues. @dorahermes can you review 1.1.1 and we can discuss the issues offline and move towards release.

tpatpa marked this pull request as draft June 26, 2022 19:53

tpatpa marked this pull request as ready for review June 26, 2022 19:53

tpatpa marked this pull request as draft June 26, 2022 19:54

tpatpa marked this pull request as ready for review June 26, 2022 19:56

dorahermes reviewed Jul 8, 2022


sappelhoff reviewed Jul 12, 2022


VisLab mentioned this pull request Aug 4, 2022

Implementation and basic unit tests for HEDVersion with libraries bids-standard/bids-validator#1496

Merged

tpatpa mentioned this pull request Jan 20, 2023

HED-SCORE library schema 1.0.0 release hed-standard/hed-schemas#54

Merged

tpatpa added a commit to tpatpa/bids-examples that referenced this pull request Jun 22, 2023

Update dataset_description.json

44cfb60

Adding 'SourceDatasets' per comment (bids-standard#324 (comment))

VisLab approved these changes Aug 10, 2023


tpatpa added 21 commits October 27, 2023 10:33

Added ne example: xeeg_hed_score

bfc6870

New example of the HED SCORE schema library used for event annotations in several EEG and invasive EEG datasets. Dataset validated with bids-validator@1.9.3 and PENDING validation of HED annotations with HED python tools.

fix readme

0f3b75d

correct json encoding

0a57ecd

added library prefix

b4fafb5

Fix Readme.md

080c734

Update README.md

f3f33bd

Adding schema prefix to sub-eegSeizureTUH

71ff1e7

Annotations validates with example notebook (https://github.com/hed-standard/hed-examples/blob/main/hedcode/jupyter_notebooks/bids_validate_dataset_with_libraries.ipynb) and HED python tools (https://pypi.org/project/hedtools/)

Validate with bids-validator@1.9.3

8c1c556

[ERR] The validation on this HED string returned an error. (code: 104 - HED_ERROR) -- HED validated separately

Match latest prerelease library schema.

2e60481

Correct annotations to match the latest prerelease library schema.

HED validation

dc0d830

Example validated following latest schema release (HED score_1.0.0) and tools update using HED-examples jupyter notebook (https://github.com/hed-standard/hed-examples/blob/main/hedcode/jupyter_notebooks/bids/bids_validate_dataset_with_libraries.ipynb).

Update _channels.tsv

34c7904

Channel column in _events.tsv should match the channel name so updated _channels.tsv to accommodate 2 different references.

Update dataset_description.json

1358471

Adding 'SourceDatasets' per comment (bids-standard#324 (comment))

Removed background annotations

3b6bf18

Removing Other-organized-rhythm/non-interesting events (bckg) annotations since the background is considered baseline. If we end up with empty files they are removed.

Update sub-eegSeizureTUH_ses-eeg01_task-rest_run-002_events.tsv

4eacc5d

corrected bids-validator error

Update sub-eegSeizureTUH_ses-eeg01_task-rest_run-002_events.tsv

e65c2ef

corrected bids-validator error

Update sub-eegSeizureTUH_ses-eeg01_task-rest_run-002_events.tsv

2288741

Trailing " removed

Corrected events.tsv durations

fa544ce

Corrected mix-up with the TUH dataset stop_time and duration.

sappelhoff force-pushed the master branch from 01241e5 to 0815a0e Compare October 27, 2023 08:34

sappelhoff reviewed Oct 27, 2023


dorahermes mentioned this pull request Nov 20, 2023

Annotations of bad segments of MEG/EEG/iEEG data bids-standard/bep021#7

Open
