Defining derivatives in a chain #8

Open
sappelhoff opened this issue Jun 23, 2023 · 24 comments

Comments

@sappelhoff
Member

sappelhoff commented Jun 23, 2023

NOTE: This stems from the BIDS derivatives workshop in Copenhagen, 2023.


How to define derivatives in a "chain"?

Usually this is done via the Sources metadata. However, it might be nice to document the processing chain more explicitly, for example in the entities:

  1. <source_entities>_desc-downsampled_<suffix>.<ext>
  2. <source_entities>_desc-downsampled+filtered_<suffix>.<ext>

--> problem: this would result in very long filenames fairly quickly.

Alternatively, one could have much shorter labels for the preprocessing steps:

  1. <source_entities>_desc-ds+filt_<suffix>.<ext>

--> problem: short labels like "ds" may be too general (i.e., take up a lot of "namespace")
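To make the trade-off between the two naming options concrete, here is a minimal Python sketch of how chained desc labels could be composed. The helper name `chained_desc` and the `sub-001_task-rest` entities are hypothetical and only serve as illustration; nothing here is defined by BIDS.

```python
def chained_desc(source_entities: str, steps: list[str], suffix: str, ext: str) -> str:
    """Join processing-step labels with '+' into a single desc entity."""
    desc = "+".join(steps)
    return f"{source_entities}_desc-{desc}_{suffix}{ext}"


# Hypothetical source entities, used only for illustration.
long_name = chained_desc("sub-001_task-rest", ["downsampled", "filtered"], "eeg", ".edf")
short_name = chained_desc("sub-001_task-rest", ["ds", "filt"], "eeg", ".edf")

print(long_name, len(long_name))
print(short_name, len(short_name))
```

Every additional step adds at least its label plus one `+` to the filename, which is why the long form grows quickly and the short form pushes toward very generic labels.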

Another alternative would be to have "generic" desc labels with an "inbuilt" index (label1, label2, etc.):

  1. <source_entities>_desc-preproc1_<suffix>.<ext>
  2. <source_entities>_desc-preproc2_<suffix>.<ext>
  3. <source_entities>_desc-preproc3_<suffix>.<ext>

paired with a: <source_entities>_<suffix>.json

that is organized as:

{
    "PreprocessingChain": {
        "preproc1": "downsampling",
        "preproc2": "filtering",
        "preproc3": "..."
    }
}

--> Important note: order in a JSON object should have a meaning ... if it doesn't (to be clarified), we might need to use a JSON array.
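On the ordering question: RFC 8259 describes a JSON object as an unordered collection of name/value pairs, while the order of array elements is significant. The sketch below shows what the array variant might look like when read from Python. The keys `Label` and `Step` are assumptions made up for this illustration, not proposed field names.

```python
import json

# Hypothetical array-based layout; "Label" and "Step" are illustrative key names.
chain_as_array = json.loads("""
{
    "PreprocessingChain": [
        {"Label": "preproc1", "Step": "downsampling"},
        {"Label": "preproc2", "Step": "filtering"}
    ]
}
""")

# Array order is part of the JSON data model, so enumerate() yields the step order.
ordered_steps = [
    (index, entry["Label"], entry["Step"])
    for index, entry in enumerate(chain_as_array["PreprocessingChain"], start=1)
]
print(ordered_steps)
```

With an array, a reader does not depend on how a particular JSON library treats object key order.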

More complete example:

{
    "PreprocessingChain": {
        "preproc1": "downsampling",
        "preproc2": "filtering",
        "preproc3": "..."
    },
    "downsampling": {
        "description": "lorem ipsum",
        "Sources": "bids::<source_entities>_<suffix>.<ext>",
        "Anti-aliasing-filter": "<link to key-value pair in SoftwareFilters>",
        "Method": "downsampling method (e.g., decimation --> taking every nth sample; or something else)",
        "SamplingRate": 300
    },
    "filtering": {
        ...
    }
}

@christinerogers

short labels like "ds" may be too general (i.e., take up a lot of "namespace")

"ds" is probably best avoided, given that 30+ datasets in bids-examples are known by their ds number from OpenNeuro, etc.

@robertoostenveld
Collaborator

robertoostenveld commented Jun 23, 2023

{
    "PreprocessingChain": [
        {
            "downsampling": {
                "description": "lorem ipsum",
                "Sources": "bids::<source_entities>_<suffix>.<ext>",
                "Anti-aliasing-filter": "<link to key-value pair in SoftwareFilters>",
                "Method": "downsampling method (e.g., decimation --> taking every nth sample; or something else)",
                "SamplingRate": 300
            }
        },
        {
            "filtering": {
                ...
            }
        }
    ]
}

@arnodelorme
Collaborator

arnodelorme commented Jun 23, 2023

no "custom fields"

{
    "PreprocessingChain": [
        {
            "Description": "downsampling 250 hz", [MANDATORY]
            "Sources": "bids::<source_entities>_<suffix>.<ext>", [MANDATORY]
            "<some fixed, defined name (tbd) indicating more info here>": { [OPTIONAL]
                "foo": "bar",
                "SamplingRate": 300
            }
        },
        {
            "Description": "HP filtering 1hz",
            ...
        }
    ]
}

@dorahermes
Member

Note - if the processing in the chain updates fields in the original _ieeg.json, these fields should be updated or removed.

@robertoostenveld
Collaborator

robertoostenveld commented Jun 23, 2023

    "GeneratedBy": [
    {
        "Name": "downsampling", [MANDATORY]
        "Description": "downsampling 250 hz", [OPTIONAL]
        "Sources": "bids::<source_entities>_<suffix>.<ext>", [OPTIONAL]
        "<some name indicating more info here>": { [OPTIONAL]
            "foo": "bar",
            "SamplingRate": 300
        }
    },
    {
        "Name": "filtering", [MANDATORY]
        "Description": "HP filtering 1hz", [OPTIONAL]
        ... 
    }  
   ]

@arnodelorme
Collaborator

arnodelorme commented Jun 23, 2023

    "<ProcessingChain>": [
    {
        "Name": "downsampling", [OPTIONAL]
        "Description": "Downsampling data at 250hz", [OPTIONAL]
        "Version": "0.1", [OPTIONAL]
        "Container": { [OPTIONAL]
            "foo": "bar",
            "SamplingRate": 300
        },
        "Sources": ["bids:<raw>:sub-001/eeg/xxx_eeg.edf"]
    },
    {
        "Name": "Filtering", [MANDATORY]
        "Description": "HP filtering 1hz", [OPTIONAL]
        "Sources": ["bids::sub-001/eeg/xxx_desc-downsample_eeg.edf"]
    }  
   ]

@arnodelorme
Collaborator

arnodelorme commented Jun 23, 2023

This file should be named xxx_desc-filtered+downsampled+ICA+epoch_eeg.prov.jsonld next to xxx_desc-filtered+downsampled+ICA+epoch_eeg.set

Our recommendation: that we use JSON and that the validator ensures this is a compatible file (that can be converted to a graph).

{
  "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
  "BIDSProvVersion": "dev",
  "records": {
    "prov:Agent": [
      {
        "label": "EEGLAB",
        "version": "v2023"
      }
    ],
    "prov:Activity": [
      {
        "@id": "xxxx1",
        "Label": "filtering the data at 0.1 Hz",
        "Used": "bids:<raw>:sub-001/eeg/xxx_eeg.edf"
      },
      {
        "@id": "xxxx2",
        "Label": "downsampling the data at 250 Hz",
        "Used": "bids::sub-001/eeg/xxx_desc-filtered_eeg.set"
      },
      {
        "@id": "xxxx3",
        "Label": "running ICA using Picard",
        "Used": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set"
      },
      {
        "@id": "xxxx4",
        "Label": "extracting epochs from -500 ms to 1000 ms",
        "Used": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set"
      }
    ],
    "prov:Entity": [
      {
        "AtLocation": "bids::sub-001/eeg/xxx_desc-filtered_eeg.set",
        "GeneratedBy": "xxxx1"
      },
      {
        "AtLocation": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set",
        "GeneratedBy": "xxxx2"
      },
      {
        "AtLocation": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set",
        "GeneratedBy": "xxxx3"
      },
      {
        "AtLocation": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA+epoch_eeg.set",
        "GeneratedBy": "xxxx4"
      }
    ]
  }
}
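
As a rough illustration of what "can be converted to a graph" could mean in practice, here is a minimal Python sketch of a link check over records shaped like the ones above (`@id`, `Used`, `AtLocation`, `GeneratedBy`). The function name `dangling_references` is hypothetical, this is not the BIDS validator, and treating every `Used` value that does not start with `bids::` as an external input is an assumption. The second activity in the sample data deliberately points at a file name that no entity declares.

```python
def dangling_references(records: dict) -> list[str]:
    """Return human-readable problems found in a BIDS-Prov-like 'records' dict."""
    activities = {a["@id"] for a in records.get("prov:Activity", [])}
    entities = {e["AtLocation"] for e in records.get("prov:Entity", [])}
    problems = []
    for entity in records.get("prov:Entity", []):
        if entity["GeneratedBy"] not in activities:
            problems.append(
                f"{entity['AtLocation']}: unknown activity {entity['GeneratedBy']}"
            )
    for activity in records.get("prov:Activity", []):
        used = activity["Used"]
        # Assumption: inputs from another dataset ("bids:<raw>:...") are external.
        if used.startswith("bids::") and used not in entities:
            problems.append(f"{activity['@id']}: input {used} is not a declared entity")
    return problems


records = {
    "prov:Activity": [
        {"@id": "xxxx1", "Label": "filtering",
         "Used": "bids:<raw>:sub-001/eeg/xxx_eeg.edf"},
        # Deliberately wrong: "_eeg" is missing, so no entity matches this input.
        {"@id": "xxxx2", "Label": "downsampling",
         "Used": "bids::sub-001/eeg/xxx_desc-filtered.set"},
    ],
    "prov:Entity": [
        {"AtLocation": "bids::sub-001/eeg/xxx_desc-filtered_eeg.set",
         "GeneratedBy": "xxxx1"},
        {"AtLocation": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set",
         "GeneratedBy": "xxxx2"},
    ],
}

problems = dangling_references(records)
for problem in problems:
    print(problem)
```

A check of this kind would catch file names that differ by a single entity or suffix, which otherwise silently break the chain.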

@arnodelorme
Collaborator

arnodelorme commented Jun 23, 2023

After a discussion with Camille and Dora, this is what it could look like. Some mandatory fields (command, etc.) are not included, so this is not yet compliant even with the beta version of BIDS provenance.

{
  "@context": "https://purl.org/nidash/bidsprov/context.json",
  "BIDSProvVersion": "1.0.0",
  "@id": "bids:<raw>:sub-001/eeg/xxx_desc-filtered+downsampled+rereferenced_eeg.edf",
  "wasGeneratedBy": {
    "Label": "Average referencing the data"
  },
  "wasAssociatedWith": {
    "Label": "EEGLAB",
    "Version": 1
  },
  "used": {
    "@id": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.edf",
    "wasGeneratedBy": {
      "Label": "Downsampling the data at 250 Hz"
    },
    "used": {
      "@id": "bids::sub-001/eeg/xxx_desc-filtered_eeg.edf",
      "wasGeneratedBy": {
        "Label": "High pass filtering the data at 1 Hz"
      },
      "used": {
        "@id": "bids:<raw>:sub-001/eeg/xxx_eeg.edf"
      }
    }
  }
}
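
For readers who want the steps in execution order, the nested form can be unrolled with a short loop. This is a minimal Python sketch, assuming each record has at most one `used` entry that is itself a nested record (as in the example above); the function name `unroll_chain` is hypothetical.

```python
from typing import Optional


def unroll_chain(node: Optional[dict]) -> list[tuple[str, Optional[str]]]:
    """Flatten nested 'used' records into (file id, activity label) pairs, oldest first."""
    chain = []
    while node is not None:
        label = node.get("wasGeneratedBy", {}).get("Label")
        chain.append((node["@id"], label))
        node = node.get("used")  # assumption: a single nested record, not a list
    chain.reverse()
    return chain


record = {
    "@id": "bids:<raw>:sub-001/eeg/xxx_desc-filtered+downsampled+rereferenced_eeg.edf",
    "wasGeneratedBy": {"Label": "Average referencing the data"},
    "used": {
        "@id": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.edf",
        "wasGeneratedBy": {"Label": "Downsampling the data at 250 Hz"},
        "used": {
            "@id": "bids::sub-001/eeg/xxx_desc-filtered_eeg.edf",
            "wasGeneratedBy": {"Label": "High pass filtering the data at 1 Hz"},
            "used": {"@id": "bids:<raw>:sub-001/eeg/xxx_eeg.edf"},
        },
    },
}

# The raw file has no activity label, so it is skipped here.
steps = [label for _, label in unroll_chain(record) if label]
print(steps)
```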

@sappelhoff
Member Author

Trying to visualize this via: https://github.com/bids-standard/BEP028_BIDSprov/tree/31c53505a7ebd16ede936720a8f114cd117d24e3/bids_prov#notes

{
  "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
  "BIDSProvVersion": "dev",
  "records": {
    "prov:Agent": [
      {
        "@id": "bla",
        "label": "EEGLAB",
        "version": "v2023"
      }
    ],
    "prov:Activity": [
      {
        "@id": "xxxx1",
        "associatedWith": "bla",
        "label": "filtering the data at 0.1 Hz",
        "used": "sub-001/eeg/xxx_eeg.edf"
      },
      {
        "@id": "xxxx2",
        "label": "downsampling the data at 250 Hz",
        "used": "sub-001/eeg/xxx_desc-filtered.set"
      },
      {
        "@id": "xxxx3",
        "label": "running ICA using Picard",
        "used": "sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set"
      },
      {
        "@id": "xxxx4",
        "label": "extracting epochs from -500 ms to 1000 ms",
        "used": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set"
      }
    ],
    "prov:Entity": [
      {
        "@id": "bids::sub-001/eeg/xxx_desc-filtered_eeg.set",
        "generatedBy": "xxxx1"
      },
      {
        "@id": "bids::sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set",
        "generatedBy": "xxxx2"
      },
      {
        "@id": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set",
        "generatedBy": "xxxx3"
      },
      {
        "@id": "bids::sub-001/eeg/xxx_desc-filtered+downsampled+ICA+epoch_eeg.set",
        "generatedBy": "xxxx4"
      }
    ]
  }
}

[image: visualization of the provenance records above]

@cmaumet

cmaumet commented Jun 23, 2023

{
  "@context": "https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
  "BIDSProvVersion": "dev",
  "records": {
    "prov:Agent": [
      {
        "@id": "bla",
        "label": "EEGLAB",
        "version": "v2023"
      }
    ],
    "prov:Activity": [
      {
        "@id": "xxxx1",
        "associatedWith": "bla",
        "label": "filtering the data at 0.1 Hz",
        "used": "sub-001/eeg/xxx_eeg.edf"
      },
      {
        "@id": "xxxx2",
        "label": "downsampling the data at 250 Hz",
        "used": "sub-001/eeg/xxx_desc-filtered_eeg.set"
      },
      {
        "@id": "xxxx3",
        "label": "running ICA using Picard",
        "used": "sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set"
      },
      {
        "@id": "xxxx4",
        "label": "extracting epochs from -500 ms to 1000 ms",
        "used": "sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set"
      }
    ],
    "prov:Entity": [
      {
        "@id": "sub-001/eeg/xxx_desc-filtered_eeg.set",
        "generatedBy": "xxxx1"
      },
      {
        "@id": "sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set",
        "generatedBy": "xxxx2"
      },
      {
        "@id": "sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set",
        "generatedBy": "xxxx3"
      },
      {
        "@id": "sub-001/eeg/xxx_desc-filtered+downsampled+ICA+epoch_eeg.set",
        "generatedBy": "xxxx4"
      }
    ]
  }
}
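
With consistent identifiers, the order of the steps can be recovered from the flat records by following the `used` and `generatedBy` links. The following is a minimal Python sketch under the assumption of a strictly linear chain (one input per activity, one final output); the function name `processing_order` is hypothetical, and the sample data is a shortened copy of the records above.

```python
def processing_order(records: dict) -> list[str]:
    """Return activity labels in execution order by following used/generatedBy links."""
    activities = {a["@id"]: a for a in records["prov:Activity"]}
    # Map each file to the activity that produced it.
    producer = {e["@id"]: e["generatedBy"] for e in records["prov:Entity"]}
    # A file that is produced but never used as input is a final output.
    used_files = {a["used"] for a in activities.values()}
    finals = [f for f in producer if f not in used_files]
    if len(finals) != 1:
        raise ValueError("expected exactly one final output in a linear chain")
    labels = []
    seen = set()  # guards against accidental cycles
    current = finals[0]
    while current in producer and current not in seen:
        seen.add(current)
        activity = activities[producer[current]]
        labels.append(activity["label"])
        current = activity["used"]
    labels.reverse()
    return labels


records = {
    "prov:Activity": [
        {"@id": "xxxx1", "label": "filtering the data at 0.1 Hz",
         "used": "sub-001/eeg/xxx_eeg.edf"},
        {"@id": "xxxx2", "label": "downsampling the data at 250 Hz",
         "used": "sub-001/eeg/xxx_desc-filtered_eeg.set"},
        {"@id": "xxxx3", "label": "running ICA using Picard",
         "used": "sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set"},
    ],
    "prov:Entity": [
        {"@id": "sub-001/eeg/xxx_desc-filtered_eeg.set", "generatedBy": "xxxx1"},
        {"@id": "sub-001/eeg/xxx_desc-filtered+downsampled_eeg.set", "generatedBy": "xxxx2"},
        {"@id": "sub-001/eeg/xxx_desc-filtered+downsampled+ICA_eeg.set", "generatedBy": "xxxx3"},
    ],
}

order = processing_order(records)
print(order)
```

The traversal only works when the `used` value of each activity matches the `@id` of an entity exactly, including any `bids::` prefix.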

@CPernet
Collaborator

CPernet commented Jul 3, 2023

I'm working on the guidelines, and they mention that JSON-LD is not mandatory. Still, if it is not used, we need to document the chain in simple terms. The discussion was along the lines of

<source_entities>_desc-preproc_<suffix>.<ext>
and have a preproc.json. What happened to that? This does not document provenance, only the chain of events.

@arnodelorme
Collaborator

arnodelorme commented Jul 3, 2023 via email

@CPernet
Collaborator

CPernet commented Jul 3, 2023

so no more preproc.json?

@CPernet
Collaborator

CPernet commented Jul 3, 2023

<source_entities>_desc-preproc_<suffix>.<ext>
preproc.json (not provenance, just the chain)
{
    "step1": "downsampling at 250Hz",
    "step2": "high pass filtering at 0.05Hz",
    "step3": "..."
}

Anything more than that can use prov.

<source_entities>_desc-preproc_<suffix>.json will in any case contain all sorts of information relevant for reuse.

@arnodelorme
Collaborator

arnodelorme commented Jul 3, 2023 via email

@robertoostenveld
Collaborator

robertoostenveld commented Jul 3, 2023

There would indeed not be a preproc.json to go along with xxx_desc-preproc_eeg.json. Rather, the human-readable description goes in a descriptions.tsv file with (at least) two columns: "desc_id" and "description". That table can be at the subject/session/modality level, but thanks to inheritance it can also be at the top level of the derivative dataset.

Machine-readable details about the processing go in the prov.jsonld file.

@CPernet
Collaborator

CPernet commented Jul 3, 2023

ah yes thx!

@robertoostenveld
Collaborator

I have a FieldTrip example that shows what it would look like, although I did not add the prov.jsonld files yet. It is uploading to the cloud now, which takes some time since the hotel wifi is slow...

@CPernet
Collaborator

CPernet commented Jul 3, 2023

i wish I was with you guys :-(

@robertoostenveld
Collaborator

I have a FieldTrip example that shows what it would look like, although I did not add the prov.jsonld files yet.

The example is available from https://surfdrive.surf.nl/files/index.php/s/M9KiX2r9DcW7ujI

The bids_derivative folder contains a derived dataset, more or less following the pipeline that is documented in one of the FieldTrip tutorials.

It does not yet include the prov.jsonld files.

@CPernet
Collaborator

CPernet commented Sep 11, 2023

In https://bids.neuroimaging.io/bep023 (PET) I used the same approach, but I also capture the chain, again in a way that is not full provenance, thanks to free text:

sub-X_desc-preproc_pet.nii  
sub-X_desc-proc_pet.nii  

descriptions.tsv

desc_id	description
mc	Motion correction with MCFLIRT
sm	Smoothing at 8mm
pvc	Partial volume correction
preproc	Data were preprocessed in the following order: motion corrected, registered to the T1w MRI image and smoothed.
proc	From the preproc image, partial volume correction was performed followed by kinetic modelling

@CPernet
Collaborator

CPernet commented Sep 11, 2023

@robertoostenveld since one uses desc- I'm guessing 1st column should be desc-id and not description_id (as in your file)

@robertoostenveld
Collaborator

@robertoostenveld since one uses desc- I'm guessing 1st column should be desc-id and not description_id (as in your file)

As it is participant_id with the participants.tsv https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#participants-file (and not sub_id), I thought we would use the same here. But that is a detail, and I am flexible in this respect.

For easy access to all, my descriptions.tsv contains

description_id	description
preproc	preprocessing with a pre-stimulus baseline correction
avg	averaging over trials, per condition
planar	planar gradient transformation
combined	combined planar gradient transformation

The actual order of the steps (preproc, avg, planar, combined) cannot yet be derived from the filenames or descriptions.tsv; that would require the provenance (or the Steps as you have them in the PET derivatives BEP023 Google Doc). The provenance would detail that the output of the preproc step serves as input to the avg step.
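
As a minimal Python sketch of how a tool might resolve the desc label of a file against such a table: the function names, the regular expression, and the example filename `sub-01_task-visual_desc-avg_meg.mat` are assumptions for illustration, and the column name `description_id` follows the table above (it may end up being `desc_id`).

```python
import csv
import io
import re
from typing import Optional

# Tab-separated content mirroring the descriptions.tsv shown above.
DESCRIPTIONS_TSV = (
    "description_id\tdescription\n"
    "preproc\tpreprocessing with a pre-stimulus baseline correction\n"
    "avg\taveraging over trials, per condition\n"
    "planar\tplanar gradient transformation\n"
    "combined\tcombined planar gradient transformation\n"
)


def load_descriptions(tsv_text: str) -> dict[str, str]:
    """Parse descriptions.tsv content into a {label: description} mapping."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["description_id"]: row["description"] for row in reader}


def describe(filename: str, descriptions: dict[str, str]) -> Optional[str]:
    """Look up the human-readable description for the desc-<label> entity of a file."""
    match = re.search(r"_desc-([A-Za-z0-9+]+)_", filename)
    if match is None:
        return None
    return descriptions.get(match.group(1))


descriptions = load_descriptions(DESCRIPTIONS_TSV)
# Hypothetical filename, used only for illustration.
result = describe("sub-01_task-visual_desc-avg_meg.mat", descriptions)
print(result)
```

The lookup returns what a step does, not where it sits in the chain, which matches the limitation described above.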

@cmaumet

cmaumet commented Sep 19, 2023

Note: as a follow-up to the BIDS-Prov examples we worked on together in Copenhagen, an updated version is now available in the BIDS-Prov repo: https://github.com/bids-standard/BEP028_BIDSprov/blob/master/examples/simple_example/simple_example.prov.jsonld


7 participants