Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding cycle consistency loss and VampPrior to scVI #2383

Open
Hrovatin opened this issue Jan 9, 2024 · 15 comments · May be fixed by #2421
Open

Adding cycle consistency loss and VampPrior to scVI #2383

Hrovatin opened this issue Jan 9, 2024 · 15 comments · May be fixed by #2421

Comments

@Hrovatin
Copy link

Hrovatin commented Jan 9, 2024

We were considering adding cycle-consistency loss and VampPrior as described in Integrating single-cell RNA-seq datasets with substantial batch effects | bioRxiv) to scvi-tools package, preferably directly into scVI.

The current implementation is based on a cVAE that works with normalized+log transformed data: https://github.com/theislab/cross_system_integration/blob/main/cross_system_integration/model/_xxjointmodel.py

Adding VampPrior would require changes in Model to initialise VampPrior pseudoinputs https://github.com/theislab/cross_system_integration/blob/322549224cddcd5d375f8b49cb9f0e0e77c2be1f/cross_system_integration/model/_xxjointmodel.py#L87C10-L87C10 and in module to replace prior: https://github.com/theislab/cross_system_integration/blob/322549224cddcd5d375f8b49cb9f0e0e77c2be1f/cross_system_integration/module/_xxjointmodule.py#L165C11-L165C11

Adding cycle consistency would require changes in adata setup to specify which batch covariate should be used as the "system" that is specifically corrected for with Lcyc - we specify systems and batches within systems (not corrected by Lcyc) for which batch_key and categorical/continous_covariate_keys could be used respectively. Besides, changes in the forward pass and an additional loss would need to be added, see this file: https://github.com/theislab/cross_system_integration/blob/multiple_sys_model/cross_system_integration/module/_xxjointmodule.py#L235 - This branch has slightly different implementation than in the paper, but I would suggest adding this one as it is more general.

@martinkim0
Copy link
Contributor

Hi, thanks for this suggestion. I think adding these changes would be a great addition to the codebase. However, for now I am opposed to adding them to the scVI model directly for two reasons:

  1. Backward compatibility: We can't change the defaults in the scVI model due to compatibility and reproducibility with previous versions of scvi-tools
  2. Visibility: If your goal is to increase visibility for the publication, then it's more beneficial to implement and announce this as a new model under scvi.external

What do you think? Are you interested in contributing code directly or wanted us to add the changes? Either works, but the latter might take a bit longer.

@Hrovatin
Copy link
Author

Hrovatin commented Jan 9, 2024

I was initially thinking that this could be an optional extension of scVI, making both not used by default (via new parameters with defaults being False). The users could then decide to enable it in their new code. I thought this would be much easier for maintenance than adding a new model and would also improve outreach as anyone using scVI could directly test it.

Alternatively, I could also add it as a new model - the current code is anyway based on scvi-tools although with quite some modifications. However, I am not sure that adding a new model would make sense as I will not be able to maintain it in the future and it is directly compatible with scVI anyways.

I can help contribute to some extent as I am familiar with the implementation, but support on your side would also be appreciated - we can discuss it.

@canergen
Copy link
Contributor

canergen commented Jan 9, 2024

Hi, I really enjoyed reading your preprint. It's interesting to see your good results with CycleConsistencyLoss as it was overintegrating in some examples in our hand (slightly different implementation and this might be the reason). I think CycleConsistencyLoss should be in an external model. It requires major changes to the model and will blow up the code. We currently make sure that external models are compatible with recent code changes. It's not excluded that in the future, external models that are not used after integration into scvi-tools might be depreciated. I don't see this coming before scvi-tools v2.0 though. For VAMPprior, we have MoG priors (similar to VAMP but logits are learnable) in scvi-tools (developed for MrVI) but used in several unpublished new scvi-tools models. Adding those things, doesn't really change the code and this could live in scVI IMO.
@martinkim0 A general wrapper for priors might be the best case to implement those and having prior_kwargs in the module code.

@Hrovatin
Copy link
Author

Hrovatin commented Jan 9, 2024

Agreed that cycle consistency adds quite an overhead. Will probably proceed with making it external then, making minimal adaptations to make it compatible with the latest scvi version. We can then later discuss if some code needs to be improved to reduce repetition etc.

For us using an appropriate distance in cycle consistency made an important impact - we tested a few. - What was different in your implementation?

For priors, I already have an implementation like you suggest (for standard normal, Vamp, and GMM), although some changes outside of this are also needed (e.g. pseudoinputs etc).

@Hrovatin
Copy link
Author

Hrovatin commented Jan 14, 2024

@canergen I now added the model here.
I still need to make a tutorial and make sure that the integration still looks ok after re-implementation (for now I only did basic tests that it runs).
After I add the tutorial should I just open a PR or is there you want me to already adapt before?

Also, for the tutorial if I have a small data object (38.2 MB). - Where should it be stored? In scvi/external/mypackage/data/?
And where in the tutorials folder would you suggest me to put the tutorial? In scRNA-seq?

@canergen
Copy link
Contributor

@PierreBoyeau used cycle consistency loss in our hands, do you see substantial differences to our trials?
In which cases do you recommend GMM over VAMP?
@martinkim0 Can you review the code? I would suggest to have the function to multiple priors in the CycleConsistency external model and then we can port it to the other scVI models from there. Does it sound reasonable to you?

@Hrovatin
Copy link
Author

@martinkim0 I was also asked to add existing scArches functionality (probably from ArchesMixin) into my model. Would this be currently even possible given that my covariate structure is so different?

@martinkim0
Copy link
Contributor

@Hrovatin For scArches, compatibility mostly depends on how you register covariates with AnnDataManager as well as the names of the base neural net components used in the model. Feel free to open a PR from your fork, and I can take a look/review. You can also add tests to preliminarily see if scArches works out-of-the-box with your model.

@canergen Sounds good, we can keep the priors code in external for now.

@Hrovatin
Copy link
Author

In that case I assume it will not be compatible - are you planing to extend scArches to be more flexible wrt covariates?

I now also made the PR and added a tutorial here: scverse/scvi-tutorials#212

@Hrovatin
Copy link
Author

We will be hopefully re-submitting our manuscript next week. Is it ok if we write that we added our model to scvi-tools external and it is still under review?

@martinkim0
Copy link
Contributor

Sorry for the late response. Totally fine with me, please go ahead!

@Hrovatin
Copy link
Author

Hrovatin commented Feb 5, 2024

I was wondering when the PR for the code & the tutorial could be merged? - I have a talk about this work at M2D2 series from Valence labs/MILA next Tuesday and it would be great if I could provide links to the scvi-tools repo rather than my fork.

@canergen
Copy link
Contributor

canergen commented Feb 5, 2024

Hi, to do some expectation management. The code will be part of a stable relese with version 1.2. We are currently finishing version 1.1. Timeline for version 1.2 is on the order of 4 months. This timeline is long as we for example work with precommits to make sure every change works within the scverse ecosystem.
I would recommend using the branch for reproducibility purposes or the original repository before version 1.2 is released even after the branch is merged into main. We will not merge it into main before version 1.1 is released.

@martinkim0
Copy link
Contributor

martinkim0 commented Feb 5, 2024

@Hrovatin Sorry for the delay in reviewing - I've taken another pass at it now. I think we'll need to add a couple more changes and trim down code where needed before merging into main. We can try our best to get it in before Tuesday, but no promises. In terms of linking to the scvi-tools repo, feel free to reference the PR.

@Hrovatin
Copy link
Author

Hrovatin commented Feb 6, 2024

Thank you for letting me know - I just wanted to make sure to share a link version that is not on my own fork which will not be maintained in the future. I will then add links to the PR if we cant merge into main beforehand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants