
Terminology change: from datasets > dataset > pipeline; pipelines > pipeline_group > pipeline #30

Open
2 tasks done
adlersantos opened this issue May 18, 2021 · 1 comment
Assignees
Labels
cleanup (Cleanup or refactor code), revision: readme (Improvements or additions to the README)

Comments

@adlersantos
Member

adlersantos commented May 18, 2021

Description

The term "dataset" is becoming overloaded. It can mean any of the following:

  • A BigQuery dataset, which is a collection of tables. This is the original definition the datasets folder was based on.
  • A collection of datasets, e.g. the Vizgen dataset, which includes the Mouse Brain Map dataset.
  • Conversely, a subset of one or more larger datasets, e.g. the Mouse Brain Map dataset, which is part of the Vizgen dataset.

In addition, we can expect future pipelines that need to onboard multiple datasets at once. Such pipelines are hard to fit into the current hierarchy, which places each pipeline under a single dataset.

Proposed

The proposal is to switch from the datasets/DATASET/PIPELINE hierarchy to the pipelines/PIPELINE_GROUP/PIPELINE hierarchy.

# CURRENT 
datasets/
    vizgen/                      (dataset)
        mouse_brain_map          (pipeline)
        some_genome_collection   (pipeline)
    covid19/                     (dataset)
        national_cases           (pipeline)
        racial_stats             (pipeline)        


# PROPOSED
pipelines/
    vizgen/                      (pipeline group)
        mouse_brain_map          (pipeline)
        some_genome_collection   (pipeline)
    covid19/                     (pipeline group)
        national_cases           (pipeline)
        racial_stats             (pipeline)        
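To make the comparison concrete, here is a minimal Python sketch, assuming a checkout where every pipeline folder sits directly under datasets/DATASET/. The helper names `migrate` and `list_pipelines` are hypothetical and not part of the repository. The sketch only illustrates that, in the two trees above, the change amounts to renaming the top-level folder: group and pipeline folders keep their names.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def migrate(repo: Path) -> Path:
    """Hypothetical helper: rename repo/datasets to repo/pipelines.

    Group folders (e.g. vizgen/) and pipeline folders (e.g. mouse_brain_map/)
    are moved along with it and keep their names.
    """
    src = repo / "datasets"
    dst = repo / "pipelines"
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    src.rename(dst)
    return dst


def list_pipelines(root: Path) -> List[Tuple[str, str]]:
    """Hypothetical helper: return sorted (pipeline_group, pipeline) pairs.

    Assumes the proposed layout root/PIPELINE_GROUP/PIPELINE: every direct
    subdirectory of root is a pipeline group, and every direct subdirectory
    of a group is a pipeline. Plain files are ignored.
    """
    pairs = []
    for group in sorted(p for p in root.iterdir() if p.is_dir()):
        for pipeline in sorted(p for p in group.iterdir() if p.is_dir()):
            pairs.append((group.name, pipeline.name))
    return pairs


if __name__ == "__main__":
    # Rebuild the CURRENT tree in a scratch directory, migrate it, and list it.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        for group, pipeline in [
            ("vizgen", "mouse_brain_map"),
            ("vizgen", "some_genome_collection"),
            ("covid19", "national_cases"),
            ("covid19", "racial_stats"),
        ]:
            (repo / "datasets" / group / pipeline).mkdir(parents=True)
        root = migrate(repo)
        for group, pipeline in list_pipelines(root):
            print(f"{group}/{pipeline}")
```

Under the proposed naming, a pipeline that onboards several datasets no longer has to be filed under one of them; it only needs a pipeline group.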

Checklist

  • I created this issue in accordance with the Code of Conduct.
  • This issue is appropriately labeled.
@adlersantos added the cleanup and revision: readme labels on May 18, 2021
@adlersantos changed the title from "Terminology change: from datasets => pipelines; dataset => pipeline_group" to "Terminology change: from datasets > dataset > pipeline; pipelines > pipeline_group > pipeline" on May 18, 2021
@adlersantos self-assigned this on May 26, 2021
@adlersantos
Member Author

@shanecglass I hope you can review whether this makes sense.

Projects
None yet
Development

No branches or pull requests

1 participant