
Generate dummy BIDS datasets for performance evaluation of PyBIDS

This project was developed to generate dummy BIDS datasets for testing the performance of PyBIDS on large datasets.

In particular, it contains a dummy BIDS dataset with one subject and Python scripts to:

  • code/produce_dummy_bids_datasets.py: generate dummy BIDS datasets with a varying number of subjects
  • code/evaluate_pybids.py: evaluate pybids on these datasets
  • code/plot_results.py: plot the results and save the runtime and memory plots in HTML files

License

The repository contains code and data released under different licenses.

Please check the respective LICENSE files for more details.

Dependencies

The project requires Python >=3.10 and depends on the following Python packages:

  • pybids==0.16.4
  • pandas
  • tqdm
  • argparse
  • memory_profiler
  • plotly
  • kaleido

A requirements.txt file is provided for convenience; to install them, run the following in a terminal:

pip install -r requirements.txt
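To check that the pinned PyBIDS release was picked up (rather than a pre-existing installation), a quick sanity check is to print the installed version from Python:

# Sanity check: confirm that the expected PyBIDS version is importable
import bids
print(bids.__version__)  # expected: 0.16.4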

Results

Description

  • Machine: Ubuntu 22.04, 32 CPUs, 64 GB
  • Test set: 4 dummy BIDS datasets with 10, 100, 1000, and 10000 subjects respectively
  • Contexts: "BIDSLayout initialization", "BIDSLayout update"
  • Modes: "no-database-load", "database-load"
  • PyBIDS version: 0.16.4
Runtime and memory plots for each context:

  • BIDSLayout initialization: Runtime Plot - Init, Memory Plot - Init
  • BIDSLayout update: Runtime Plot - Update, Memory Plot - Update

Notes

In this project, we patched the vanilla implementation of BIDSLayoutIndexer with a new method (index_dir), which indexes only a specific directory and its sub-directories.

https://github.com/TranslationalML/pybids_performance_eval/blob/83192c69b6fe4743ff2e4a45700d474de41e448e/code/evaluate_pybids.py#L146-L181

Indeed, PyBIDS does not provide any public method to update a BIDSLayout object (referred to as incremental indexing; see the PyBIDS issue). If new files are added to the BIDS dataset, a new BIDSLayout has to be created, which re-indexes the whole content even when a database file is employed.
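For reference, here is a minimal sketch of the vanilla (unpatched) behaviour, assuming a dataset at a hypothetical path derivatives/dummy-10: even when a database file is used, picking up newly added files requires rebuilding the layout with reset_database=True, which re-indexes the whole dataset.

from bids import BIDSLayout

# Initial indexing, persisted to a SQLite database file.
layout = BIDSLayout("derivatives/dummy-10",
                    database_path="derivatives/dummy-10/.pybids_db")

# ... new subject directories are added to the dataset here ...

# Vanilla PyBIDS exposes no public incremental update: the layout must be
# rebuilt, and reset_database=True forces a full re-index despite the database.
layout = BIDSLayout("derivatives/dummy-10",
                    database_path="derivatives/dummy-10/.pybids_db",
                    reset_database=True)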

How to reproduce

How to run the whole workflow

For convenience, we provide the script run_eval_workflow.sh, which runs all the scripts described below. With this script, you only need to run the following command in a terminal:

sh run_eval_workflow.sh

This will generate the TSV results file and all the figures in the root directory of the project.

Note that you need Python >=3.10 with all the dependencies installed (see Dependencies).
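For reference, a rough Python equivalent of what such a wrapper does, namely chaining the three scripts in order (the exact arguments used by run_eval_workflow.sh may differ), would be:

import subprocess
import sys
from pathlib import Path

# Chain the three steps: generate datasets, profile PyBIDS, plot the results.
# Subject counts and output paths are illustrative.
code_dir = Path("code")
n_subjects = ["10", "100", "1000", "10000"]
subprocess.run([sys.executable, code_dir / "produce_dummy_bids_datasets.py",
                "--n_subjects", *n_subjects], check=True)
subprocess.run([sys.executable, code_dir / "evaluate_pybids.py",
                "--n_subjects", *n_subjects,
                "--output_tsv", "profiling_results.tsv"], check=True)
subprocess.run([sys.executable, code_dir / "plot_results.py",
                "--input_tsv", "profiling_results.tsv"], check=True)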

How to generate the dummy BIDS datasets

This is achieved by using the script code/produce_dummy_bids_datasets.py.

Usage

usage: produce_dummy_bids_datasets.py [-h] [--template_bids_dir TEMPLATE_BIDS_DIR]
                                     [--n_subjects N_SUBJECTS [N_SUBJECTS ...]] [--output_dir OUTPUT_DIR]

Produce dummy BIDS dataset(s) for testing.

options:
    -h, --help            show this help message and exit
    --template_bids_dir TEMPLATE_BIDS_DIR
                        Path to template BIDS dataset. If not specified, the template BIDS dataset will be the
                        root directory.
    --n_subjects N_SUBJECTS [N_SUBJECTS ...]
                        List of number of subjects to produce. Each item in the list will produce a dataset with
                        that number of subjects.
    --output_dir OUTPUT_DIR
                        Path to output directory. If not specified, the output directory will be derivatives/ of
                        the root directory.                     

Example

If you wish to generate 4 dummy datasets in the derivatives/ folder with 10, 100, 1000, and 10000 subjects respectively, you would run the following command:

python /path/to/produce_dummy_bids_datasets.py --n_subjects 10 100 1000 10000
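If you prefer to script this step yourself, a minimal sketch of the same idea, assuming the single-subject template described above and the dummy-<n_subjects> naming expected by evaluate_pybids.py (the template path, the template subject label sub-01, and the zero-padded subject labels are assumptions, not the script's actual conventions), could look like:

import shutil
from pathlib import Path

template = Path("/path/to/template_bids_dir")  # single-subject template dataset
output_root = Path("derivatives")

for n_subjects in (10, 100, 1000, 10000):
    dataset_dir = output_root / f"dummy-{n_subjects}"
    dataset_dir.mkdir(parents=True, exist_ok=True)
    # Copy dataset-level files (e.g. dataset_description.json) once.
    for f in template.iterdir():
        if f.is_file():
            shutil.copy(f, dataset_dir / f.name)
    # Replicate the template subject under new subject labels.
    for i in range(1, n_subjects + 1):
        shutil.copytree(template / "sub-01", dataset_dir / f"sub-{i:05d}",
                        dirs_exist_ok=True)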

How to run the performance evaluation

This is achieved by using the script code/evaluate_pybids.py.

Usage

usage: evaluate_pybids.py [-h] [--n_subjects N_SUBJECTS [N_SUBJECTS ...]] [--datasets_root DATASETS_ROOT]
                          [-o OUTPUT_TSV]

Evaluate the performance of PyBIDS on dummy BIDS datasets generated by `produce_dummy_bids_datasets.py`.

options:
    -h, --help            show this help message and exit
    --n_subjects N_SUBJECTS [N_SUBJECTS ...]
                            Number of subjects to include in the dataset
    --datasets_root DATASETS_ROOT
                            Path to the root directory containing the datasets, generated by the
                            `produce_dummy_bids_datasets.py` script. Note that the dataset directories are expected to be
                            named dummy-<n_subjects> where <n_subjects> is the number of subjects. Default: derivatives/ in
                            the root directory.
    -o OUTPUT_TSV, --output_tsv OUTPUT_TSV
                            Path to output TSV file. If not specified, the results will be printed to stdout.

Example

For instance, if you wish to evaluate pybids on the 4 dummy datasets previously created in the derivatives/ folder, you would run the following command:

python /path/to/evaluate_pybids.py \
    --n_subjects 10 100 1000 10000 \
    --output_tsv '/path/to/profiling_results.tsv'
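Under the hood, the evaluation times and memory-profiles BIDSLayout construction for each dataset size, context, and mode. A simplified sketch of that kind of measurement (the dataset path and the helper function are illustrative, not the script's actual code) is:

import time
from bids import BIDSLayout
from memory_profiler import memory_usage

def init_layout(dataset_dir, database_path=None):
    """Build a BIDSLayout, optionally backed by a database file."""
    return BIDSLayout(dataset_dir, database_path=database_path)

start = time.perf_counter()
# memory_usage runs the callable and samples its memory footprint (in MiB).
samples = memory_usage((init_layout, ("derivatives/dummy-10",), {}))
runtime = time.perf_counter() - start
print(f"runtime: {runtime:.2f} s, peak memory: {max(samples):.1f} MiB")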

How to generate the plots

This is achieved by using the script code/plot_results.py.

Usage

usage: plot_results.py [-h] --input_tsv INPUT_TSV

Plot the results of the PyBIDS evaluation and save them as HTML and PNG files. The output HTML files will be saved with
the same prefix as the input TSV file, but with a suffix and extension added to the end. For example, if the input TSV
file is named `profiling_results.tsv`, then the output HTML/PNG files will be named `profiling_results_time.html/png`
and `profiling_results_memory.html/png`.

options:
    -h, --help            show this help message and exit
    --input_tsv INPUT_TSV
                            Path to input TSV file. If not specified, the results will be read from stdin.

Example

For instance, if you wish to create the plots from the evaluation previously run, you would run the following command:

python /path/to/plot_results.py \
    --input_tsv '/path/to/profiling_results.tsv'
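If you would rather build the figures manually, a minimal sketch with pandas and Plotly (the column names n_subjects, context, mode, and time_s are assumptions about the TSV layout, not guaranteed to match the actual file) could be:

import pandas as pd
import plotly.express as px

df = pd.read_csv("profiling_results.tsv", sep="\t")

# Runtime as a function of dataset size, one trace per context/mode combination.
fig = px.line(df, x="n_subjects", y="time_s", color="context",
              line_dash="mode", log_x=True, markers=True)
fig.write_html("profiling_results_time.html")
fig.write_image("profiling_results_time.png")  # PNG export requires kaleido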