Unifying base data object #49

Open
27 of 33 tasks
RemyLau opened this issue Nov 8, 2022 · 0 comments
Labels
Priority-P0 Top priority

Comments

@RemyLau
Collaborator

RemyLau commented Nov 8, 2022

Currently, there are several dataset objects, each specialized for a particular task and model (e.g., CellTypeDataset, ClusteringDataset). Each takes a variety of specialized arguments that are not directly related to the underlying data, such as the save path, processing scheme, and choice of tissue. This complexity makes the code base hard to maintain and makes it hard to implement new methods and datasets.

To improve this situation, we need to isolate raw dataset objects from transformation/processing methods.

  • Base data object
    • Take AnnData as an input and save it as a private attribute (read-only?).
    • Construct data loaders that load g, x, y, etc., to be passed to the model for training/evaluation.
  • Dataset object
    • Download option
    • Transformation option
    • Dataset from paper (preprocessed) -> used to benchmark the reproducibility of the reimplemented model
  • Transformation
    • Leverage functionality from scanpy (recall that the base data object now stores an AnnData object as a (private) attribute).
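
The separation proposed above could look roughly like the following. This is a minimal sketch, not the planned implementation: `BaseData`, `apply`, `batches`, and `scale_rows` are hypothetical names, and the toy input is a `SimpleNamespace` stand-in that exposes only the `X` and `obs` attributes of an AnnData object so the example runs without extra dependencies. Graph construction (`g`) is left out.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator, List, Tuple

# A transformation is any callable that takes the wrapped AnnData-like object
# and returns a (possibly new) one, e.g. a thin wrapper around a scanpy step.
Transform = Callable[[Any], Any]


class BaseData:
    """Hypothetical base data object wrapping an AnnData-like container."""

    def __init__(self, adata: Any) -> None:
        self._adata = adata  # private attribute; no public setter is defined

    @property
    def data(self) -> Any:
        """Read-only handle to the wrapped object (reassignment is blocked)."""
        return self._adata

    def apply(self, *transforms: Transform) -> "BaseData":
        """Return a new BaseData with the transforms applied in order."""
        adata = self._adata
        for transform in transforms:
            adata = transform(adata)
        return BaseData(adata)

    def batches(
        self, label_key: str, batch_size: int
    ) -> Iterator[Tuple[List[Any], List[Any]]]:
        """Yield (x, y) mini-batches: rows of X and labels from obs[label_key]."""
        x = self._adata.X
        y = list(self._adata.obs[label_key])
        for start in range(0, len(y), batch_size):
            stop = min(start + batch_size, len(y))
            yield [x[i] for i in range(start, stop)], y[start:stop]


def scale_rows(adata: Any) -> Any:
    """Toy transform (assumption): scale each row of X to sum to 1."""
    scaled = [[value / sum(row) for value in row] for row in adata.X]
    return SimpleNamespace(X=scaled, obs=adata.obs)


# Stand-in for an AnnData object: three cells, two genes, one label column.
toy = SimpleNamespace(
    X=[[1.0, 3.0], [2.0, 2.0], [4.0, 0.0]],
    obs={"cell_type": ["B", "T", "T"]},
)

raw = BaseData(toy)                 # raw dataset, no processing arguments
processed = raw.apply(scale_rows)   # processing lives outside the data object
first_x, first_y = next(processed.batches("cell_type", batch_size=2))
```

Because `apply` returns a new object, `raw` keeps the untransformed data, which is one way to keep a preprocessed "dataset from paper" variant and a freshly transformed variant side by side. The `data` property only prevents rebinding the attribute; it does not stop in-place mutation of the wrapped object, so the "read-only?" question above remains open in this sketch.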

To fix

Single modality

Spatial

Multi modality

  • examples/multi_modality/joint_embedding/scmvae.py
  • examples/multi_modality/joint_embedding/dcca.py
  • examples/multi_modality/joint_embedding/jae.py
  • examples/multi_modality/joint_embedding/scmogcnv2.py
  • examples/multi_modality/joint_embedding/scmogcn.py
  • examples/multi_modality/match_modality/cmae.py
  • examples/multi_modality/match_modality/scmm.py
  • examples/multi_modality/match_modality/scmogcn.py
  • examples/multi_modality/predict_modality/babel.py (update babel example script to use dance data object #89)
  • examples/multi_modality/predict_modality/cmae.py
  • examples/multi_modality/predict_modality/scmm.py
  • examples/multi_modality/predict_modality/scmogcn.py