Defining derivatives in a chain #8

sappelhoff · 2023-06-23T08:24:08Z

NOTE: This stems from the BIDS derivatives workshop in Copenhagen, 2023.

How to define derivatives in a "chain"?

Usually this is done via the Sources metadata. However, it might be nice to document the processing chain more explicitly, for example in the entities:

<source_entities>_desc-downsampled_<suffix>.<ext>
<source_entities>_desc-downsampled+filtered_<suffix>.<ext>

--> problem: this would result in very long filenames fairly quickly.

Alternatively, one could have much shorter labels for the preprocessing steps:

<source_entities>_desc-ds+filt_<suffix>.<ext>

--> problem: short labels like "ds" may be too general (i.e., take up a lot of "namespace")

Another alternative would be to have "generic" desc labels with an "inbuilt" index (label1, label2, etc.):

<source_entities>_desc-preproc1_<suffix>.<ext>
<source_entities>_desc-preproc2_<suffix>.<ext>
<source_entities>_desc-preproc3_<suffix>.<ext>

paired with a: <source_entities>_<suffix>.json

that is organized as:

{
    "PreprocessingChain": {
        "preproc1": "downsampling",
        "preproc2": "filtering",
        "preproc3": "..."
    }
}

--> Important note: order in a JSON object should have a meaning ... if it doesn't (to be clarified), we might need to use a JSON array.
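
On the ordering question: RFC 8259 describes a JSON object as an unordered collection of name/value pairs and an array as an ordered sequence, so only the array form guarantees the order. A minimal Python sketch of the array variant follows. It reuses the proposed key PreprocessingChain from above; the field names Label and Step are assumptions made up for this illustration, not agreed-upon names.

```python
import json

# Hypothetical sidecar content: the chain is a JSON array, whose order is
# defined by the JSON specification (object key order is not).
sidecar_text = """
{
    "PreprocessingChain": [
        {"Label": "preproc1", "Step": "downsampling"},
        {"Label": "preproc2", "Step": "filtering"}
    ]
}
"""

sidecar = json.loads(sidecar_text)

# The position in the array is the position in the processing chain.
ordered_steps = [entry["Step"] for entry in sidecar["PreprocessingChain"]]
print(ordered_steps)  # ['downsampling', 'filtering']
```

With this layout a reader does not have to rely on the numeric suffix in the label (preproc1, preproc2, ...) to recover the order.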

More complete example:

{
    "PreprocessingChain": {
        "preproc1": "downsampling",
        "preproc2": "filtering",
        "preproc3": "..."
    },
    "downsampling": {
        "description": "lorem ipsum",
        "Sources": "bids::<source_entities>_<suffix>.<ext>",
        "Anti-aliasing-filter": "<link to key-value pair in SoftwareFilters>",
        "Method": "downsampling method (e.g., decimation --> taking every nth sample; or something else)",
    	"SamplingRate": 300
    },
    "filtering": {
        ... 
    }   
}


christinerogers · 2023-06-23T09:03:51Z

short labels like "ds" may be too general (i.e., take up a lot of "namespace")

ds is probably best avoided given 30+ datasets in bids-examples are known by their ds number from openneuro etc

robertoostenveld · 2023-06-23T09:08:37Z

{
    "PreprocessingChain": [
    "downsampling": {
        "description": "lorem ipsum",
        "Sources": "bids::<source_entities>_<suffix>.<ext>",
        "Anti-aliasing-filter": "<link to key-value pair in SoftwareFilters>",
        "Method": "downsampling method (e.g., decimation --> taking every nth sample; or something else)",
    	"SamplingRate": 300
    },
    "filtering": {
        ... 
    }  
]
}

arnodelorme · 2023-06-23T09:10:45Z

no "custom fields"

{
    "PreprocessingChain": [
    {
        "Description": "downsampling 250 hz", [MANDATORY]
        "Sources": "bids::<source_entities>_<suffix>.<ext>", [MANDATORY]
        "<some fixed, defined name (tbd) indicating more info here>": { [OPTIONAL]
            "foo": "bar",
            "SamplingRate": 300
        }
    },
    {
        "Description": "HP filtering 1hz",
        ... 
    }  
]
}

dorahermes · 2023-06-23T09:21:32Z

Note - if the processing in the chain updates fields in the original _ieeg.json, these fields should be updated or removed.

robertoostenveld · 2023-06-23T09:28:24Z

    "GeneratedBy": [
    {
        "Name": "downsampling", [MANDATORY]
        "Description": "downsampling 250 hz", [OPTIONAL]
        "Sources": "bids::<source_entities>_<suffix>.<ext>", [OPTIONAL]
        "<some name indicating more info here>": { [OPTIONAL]
            "foo": "bar",
            "SamplingRate": 300
        }
    },
    {
        "Name": "filtering", [MANDATORY]
        "Description": "HP filtering 1hz", [OPTIONAL]
        ... 
    }  
   ]

arnodelorme · 2023-06-23T09:57:22Z

    "<ProcessingChain>": [
    {
        "Name": "downsampling", [OPTIONAL]
        "Description": "Downsampling data at 250hz", [OPTIONAL]
        "Version": "0.1", [OPTIONAL]
        "Container": { [OPTIONAL]
            "foo": "bar",
            "SamplingRate": 300
        },
        "Sources": ["bids:<raw>:sub-001/eeg/xxx_eeg.edf"]
    },
    {
        "Name": "Filtering", [MANDATORY]
        "Description": "HP filtering 1hz", [OPTIONAL]
        "Sources": ["bids::sub-001/eeg/xxx_desc-downsample_eeg.edf"]
    }  
   ]

arnodelorme · 2023-06-23T12:06:48Z

This file should be named xxx_desc-filtered+downsampled+ICA+epoch_eeg.prov.jsonld next to xxx_desc-filtered+downsampled+ICA+epoch_eeg.set

Our recommendation: that we use JSON and that the validator ensures this is a compatible file (that can be converted to a graph).

{
  "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
  "BIDSProvVersion": "dev",
  "records": {
    "prov:Agent": [
      {
        "label": "EEGLAB",
        "version": "v2023"
      }
    ],
    "prov:Activity": [
      {
        "@id": "xxxx1",
        "Label": "filtering the data at 0.1 Hz",
        "Used": "bids:<raw>:sub-001/eeg/xxx_eeg.edf"
      },
      {
        "@id": "xxxx2",
        "Label": "downsampling the data at 250 Hz",
        "Used": "bids::sub-001/eeg/xxx_desc-filtered_eeg.set"
      },
      {
        "@id": "xxxx3",
        "Label": "running ICA using Picard",
        "Used": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set"
      },
      {
        "@id": "xxxx4",
        "Label": "extracting epochs from -500 ms to 1000 ms",
        "Used": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set"
      }
    ],
    "prov:Entity": [
      {
        "AtLocation": "bids::sub-001/eeg/xxx_desc-filtered_eeg.set",
        "GeneratedBy": "xxxx1"
      },
      {
        "AtLocation": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set",
        "GeneratedBy": "xxxx2"
      },
      {
        "AtLocation": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set",
        "GeneratedBy": "xxxx3"
      },
      {
        "AtLocation": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA+epoch_eeg.set",
        "GeneratedBy": "xxxx4"
      }
    ]
  }
}

arnodelorme · 2023-06-23T12:21:37Z

After a discussion with Camille and Dora, this is what it could look like. There are mandatory fields (command, etc...) not included, so this is not even compliant with the beta BIDS provenance version.

{
  "@context": "https://purl.org/nidash/bidsprov/context.json",
  "BIDSProvVersion": "1.0.0",
  "@id": "bids:<raw>:sub-001/eeg/xxx_desc-filtered+downsampled+rereferenced_eeg.edf",
  "wasGeneratedBy": {
    "Label": "Average referencing the data"
  },
  "wasAssociatedWith": {
    "Label": "EEGLAB",
    "Version": 1
  },
  "used": {
    "@id": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.edf",
    "wasGeneratedBy": {
      "Label": "Downsampling the data at 250 Hz"
    },
    "used": {
      "@id": "bids::sub-001/eeg/xxx_desc-filtered_eeg.edf",
      "wasGeneratedBy": {
        "Label": "High pass filtering the data at 1 Hz"
      },
      "used": {
        "@id": "bids:<raw>:sub-001/eeg/xxx_eeg.edf"
      }
    }
  }
}
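
To illustrate how the order of steps can be read back from this nested layout, here is a small Python sketch. It assumes only the keys shown above (used, wasGeneratedBy, Label); the function name list_steps is hypothetical, and the record is a trimmed copy of the example without the context and agent fields.

```python
# Trimmed copy of the nested example above.
record = {
    "@id": "bids:<raw>:sub-001/eeg/xxx_desc-filtered+downsampled+rereferenced_eeg.edf",
    "wasGeneratedBy": {"Label": "Average referencing the data"},
    "used": {
        "@id": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.edf",
        "wasGeneratedBy": {"Label": "Downsampling the data at 250 Hz"},
        "used": {
            "@id": "bids::sub-001/eeg/xxx_desc-filtered_eeg.edf",
            "wasGeneratedBy": {"Label": "High pass filtering the data at 1 Hz"},
            "used": {"@id": "bids:<raw>:sub-001/eeg/xxx_eeg.edf"},
        },
    },
}


def list_steps(node):
    """Return activity labels ordered from the raw file to the final file."""
    labels = []
    while node is not None:
        activity = node.get("wasGeneratedBy")
        if activity is not None:
            labels.append(activity["Label"])
        # Descend one level towards the raw data; None ends the loop.
        node = node.get("used")
    # Labels were collected from the last step backwards, so reverse them.
    return labels[::-1]


for label in list_steps(record):
    print(label)
```

The nesting itself encodes the order here, so no separate index or array is needed, at the cost of a deeper structure for longer chains.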

sappelhoff · 2023-06-23T14:34:16Z

Trying to visualize this via: https://github.com/bids-standard/BEP028_BIDSprov/tree/31c53505a7ebd16ede936720a8f114cd117d24e3/bids_prov#notes

{
  "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
  "BIDSProvVersion": "dev",
  "records": {
    "prov:Agent": [
      {
        "@id": "bla",
        "label": "EEGLAB",
        "version": "v2023"
      }
    ],
    "prov:Activity": [
      {
        "@id": "xxxx1",
        "associatedWith": "bla",
        "label": "filtering the data at 0.1 Hz",
        "used": "sub-001/eeg/xxx_eeg.edf"
      },
      {
        "@id": "xxxx2",
        "label": "downsampling the data at 250 Hz",
        "used": "sub-001/eeg/xxx_desc-filtered_eeg.set"
      },
      {
        "@id": "xxxx3",
        "label": "running ICA using Picard",
        "used": "sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set"
      },
      {
        "@id": "xxxx4",
        "label": "extracting epochs from -500 ms to 1000 ms",
        "used": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set"
      }
    ],
    "prov:Entity": [
      {
        "@id": "bids::sub-001/eeg/xxx_desc-filtered_eeg.set",
        "generatedBy": "xxxx1"
      },
      {
        "@id": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set",
        "generatedBy": "xxxx2"
      },
      {
        "@id": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set",
        "generatedBy": "xxxx3"
      },
      {
        "@id": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA+epoch_eeg.set",
        "generatedBy": "xxxx4"
      }
    ]
  }
}

cmaumet · 2023-06-23T14:35:38Z

{
  "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
  "BIDSProvVersion": "dev",
  "records": {
    "prov:Agent": [
      {
        "@id": "bla",
        "label": "EEGLAB",
        "version": "v2023"
      }
    ],
    "prov:Activity": [
      {
        "@id": "xxxx1",
        "associatedWith": "bla",
        "label": "filtering the data at 0.1 Hz",
        "used": "sub-001/eeg/xxx_eeg.edf"
      },
      {
        "@id": "xxxx2",
        "label": "downsampling the data at 250 Hz",
        "used": "sub-001/eeg/xxx_desc-filtered_eeg.set"
      },
      {
        "@id": "xxxx3",
        "label": "running ICA using Picard",
        "used": "sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set"
      },
      {
        "@id": "xxxx4",
        "label": "extracting epochs from -500 ms to 1000 ms",
        "used": "sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set"
      }
    ],
    "prov:Entity": [
      {
        "@id": "sub-001/eeg/xxx_desc-filtered_eeg.set",
        "generatedBy": "xxxx1"
      },
      {
        "@id": "sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set",
        "generatedBy": "xxxx2"
      },
      {
        "@id": "sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set",
        "generatedBy": "xxxx3"
      },
      {
        "@id": "sub-001/eeg/xxx_desc-filtered+downsampled+ICA+epoch_eeg.set",
        "generatedBy": "xxxx4"
      }
    ]
  }
}
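
As a rough check that the flat records above are enough to recover the order of processing, the following Python sketch follows generatedBy and used links backwards from a derivative. It assumes each entity has exactly one generating activity and each activity uses exactly one input, as in the example; the function name chain_for is hypothetical, and the records are a two-step subset of the example, deliberately listed out of order.

```python
# Two-step subset of the records above; list order is intentionally shuffled.
records = {
    "prov:Activity": [
        {"@id": "xxxx2", "label": "downsampling the data at 250 Hz",
         "used": "sub-001/eeg/xxx_desc-filtered_eeg.set"},
        {"@id": "xxxx1", "label": "filtering the data at 0.1 Hz",
         "used": "sub-001/eeg/xxx_eeg.edf"},
    ],
    "prov:Entity": [
        {"@id": "sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set",
         "generatedBy": "xxxx2"},
        {"@id": "sub-001/eeg/xxx_desc-filtered_eeg.set",
         "generatedBy": "xxxx1"},
    ],
}


def chain_for(target, records):
    """Return activity labels, first step first, that lead to `target`."""
    activities = {a["@id"]: a for a in records["prov:Activity"]}
    producers = {e["@id"]: e["generatedBy"] for e in records["prov:Entity"]}
    labels = []
    seen = set()  # guards against accidental cycles in the records
    current = target
    while current in producers and current not in seen:
        seen.add(current)
        activity = activities[producers[current]]
        labels.append(activity["label"])
        current = activity["used"]
    return labels[::-1]


print(chain_for("sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set", records))
```

This only works when the used values match the entity @id values exactly, which is why the path prefixes need to be written consistently across activities and entities.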

CPernet · 2023-07-03T07:07:02Z

I'm working on the guidelines, and they mention that JSON-LD is not mandatory. Still, if it is not used, we need to document the chain in simple terms. The discussion was along the lines of

<source_entities>_desc-preproc_<suffix>.<ext>
and having a preproc.json. What happened to that? This does not document provenance, only the chain of events.

arnodelorme · 2023-07-03T07:12:24Z

I think it is still in flux mostly because BIDS-provenance is not finalized. The consensus was that we would work with BIDS-provenance people to make the format simpler to use. Arno

CPernet · 2023-07-03T07:13:49Z

so no more preproc.json?

CPernet · 2023-07-03T07:30:11Z

<source_entities>_desc-preproc_<suffix>.<ext>
preproc.json (not provenance, just the chain)
{
"step1": "downsampling at 250Hz",
"step2": "high pass filtering at 0.05Hz",
"step3": "..."
}

Anything more than that can use prov.

<source_entities>_desc-preproc_<suffix>.json will anyway contain all sorts of relevant info for re-usage.

arnodelorme · 2023-07-03T07:42:30Z

Yes, no more preproc.json, but Robert can comment.

robertoostenveld · 2023-07-03T07:49:07Z

There would indeed not be a preproc.json to go along with xxx_desc-preproc_eeg.json. Rather the human readable description goes in a descriptions.tsv file with (at least) two columns: "desc_id" and "description". That table can be at the subject/session/modality level, but due to inheritance can also be at the top level of the derivative dataset.

Machine readable details about the processing go in the prov.jsonld file.

CPernet · 2023-07-03T07:50:25Z

ah yes thx!

robertoostenveld · 2023-07-03T07:50:31Z

I have a FieldTrip example that shows what it would look like, although I did not add the prov.jsonld files yet. It is now uploading to the cloud, which takes some time since the hotel wifi is slow...

CPernet · 2023-07-03T07:51:26Z

i wish I was with you guys :-(

robertoostenveld · 2023-09-11T08:04:56Z

I have a FieldTrip example that shows how it would look like, although I did not add the prov.jsonld files yet.

The example is available from https://surfdrive.surf.nl/files/index.php/s/M9KiX2r9DcW7ujI

The bids_derivative folder contains a derived dataset, more or less following the pipeline that is documented in one of the FieldTrip tutorials.

It does not yet include the prov.jsonld files.

CPernet · 2023-09-11T08:53:12Z

In https://bids.neuroimaging.io/bep023 (PET) I used the same approach, but I also capture the chain, again in a way that is not full provenance, thanks to free text:

sub-X_desc-preproc_pet.nii  
sub-X_desc-proc_pet.nii

descriptions.tsv

desc_id	description
mc	Motion correction with MCFLIRT
sm	Smoothing at 8mm
pvc	Partial volume correction
preproc	Data were preprocessed in the following order: motion corrected, registered to the T1w MRI image and smoothed
proc	From the preproc image, partial volume correction was performed followed by kinetic modelling

CPernet · 2023-09-11T08:54:46Z

@robertoostenveld since one uses desc- I'm guessing 1st column should be desc-id and not description_id (as in your file)

robertoostenveld · 2023-09-11T09:31:45Z

@robertoostenveld since one uses desc- I'm guessing 1st column should be desc-id and not description_id (as in your file)

As it is participant_id with the participants.tsv https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#participants-file (and not sub_id), I thought we would use the same here. But that is a detail, and I am flexible in this respect.

For easy access to all, my descriptions.tsv contains

description_id	description
preproc	preprocessing with a pre-stimulus baseline correction
avg	averaging over trials, per condition
planar	planar gradient transformation
combined	combined planar gradient transformation

The actual order of the steps (preproc, avg, planar, combined) cannot yet be derived from the filenames or descriptions.tsv; that would require the provenance (or the Steps as you have them in the PET derivatives bep023 google doc). The provenance would detail that the output of the preproc step serves as input to the avg step.
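
For illustration, a minimal Python sketch of how a tool might look up the human-readable description for a file from descriptions.tsv. It assumes the column names description_id and description from the table above (still under discussion); the file name and the helper names load_descriptions and describe are hypothetical.

```python
import csv
import io
import re

# First rows of the descriptions.tsv shown above (tab-separated).
descriptions_tsv = (
    "description_id\tdescription\n"
    "preproc\tpreprocessing with a pre-stimulus baseline correction\n"
    "avg\taveraging over trials, per condition\n"
)


def load_descriptions(text):
    """Map each desc label to its free-text description."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["description_id"]: row["description"] for row in reader}


def describe(filename, descriptions):
    """Return the description for the desc-<label> entity in a file name."""
    match = re.search(r"_desc-([^_.]+)", filename)
    if match is None:
        return None  # the file name carries no desc entity
    return descriptions.get(match.group(1))


descriptions = load_descriptions(descriptions_tsv)
print(describe("sub-01_task-x_desc-avg_meg.mat", descriptions))
# averaging over trials, per condition
```

As noted above, this lookup gives the meaning of each label but not the order of the steps; that still has to come from the provenance.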

cmaumet · 2023-09-19T14:35:53Z

Note: as a follow-up to the BIDS-Prov examples we worked on together in Copenhagen, an updated version is now available in the BIDS-Prov repo: https://github.com/bids-standard/BEP028_BIDSprov/blob/master/examples/simple_example/simple_example.prov.jsonld

cmaumet mentioned this issue Jun 23, 2023

Follow up Copenhagen BIDS-Prov meeting bids-standard/BEP028_BIDSprov#109

Open

larsoner mentioned this issue Jul 12, 2023

ENH: Provenance mne-tools/mne-bids-pipeline#763

Open

cmaumet mentioned this issue Jul 25, 2023

Ephys example bids-standard/BEP028_BIDSprov#111

Open

This was referenced Sep 14, 2023

visualizer.py fails on chained MEG processing pipeline [update: solved] bids-standard/BEP028_BIDSprov#120

Closed

review of example that Arno shared #9

Open
