Improve runtime of kBET #68

Open
3 tasks
mbuttner opened this issue Apr 29, 2022 · 5 comments

Comments

@mbuttner
Collaborator

mbuttner commented Apr 29, 2022

kBET is slow, partly because it repeats many computations (for instance, to obtain stable statistics for the rejection rate).

  • ensure that neighbourhoods are computed at most once
  • revisit the subsampling implementation
  • use a more efficient kNN computation (FNN at the moment)
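
To illustrate the first two tasks, here is a minimal sketch in Python (not the kBET R code) of computing the neighbourhoods once and reusing them across repeated subsamples. All function names are hypothetical. The test is a plain Pearson chi-squared statistic compared against 3.841, the 5% critical value for one degree of freedom, so the sketch assumes exactly two batches; the brute-force neighbour search is only a stand-in for whatever kNN library is actually used.

```python
import math
import random
from collections import Counter


def knn_indices(points, k):
    """Brute-force k nearest neighbours (excluding the point itself) for every point."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def rejection_rate(neighbours, batch, tested, critical=3.841):
    """Fraction of tested cells whose neighbourhood batch mix differs from the global mix.

    Assumption: two batches, so the chi-squared statistic has one degree of
    freedom and 3.841 is the 5% critical value.
    """
    n = len(batch)
    global_freq = {b: c / n for b, c in Counter(batch).items()}
    rejected = 0
    for i in tested:
        k = len(neighbours[i])
        observed = Counter(batch[j] for j in neighbours[i])
        stat = sum(
            (observed.get(b, 0) - k * f) ** 2 / (k * f)
            for b, f in global_freq.items()
        )
        if stat > critical:
            rejected += 1
    return rejected / len(tested)


def repeated_rejection_rates(points, batch, k, n_repeats, sample_size, seed=0):
    """Compute the neighbour graph once, then reuse it for every subsample."""
    rng = random.Random(seed)
    neighbours = knn_indices(points, k)  # the expensive step, done a single time
    cells = range(len(points))
    return [
        rejection_rate(neighbours, batch, rng.sample(cells, sample_size))
        for _ in range(n_repeats)
    ]
```

With this layout each repeat only costs a pass over the sampled cells, so the number of repeats can grow without recomputing any neighbourhood.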
@liliay

liliay commented Nov 9, 2022

Hi there,

First of all, thank you for kBET; it is a very useful tool.
I am trying to use kBET to assess the integration quality of single-cell samples (processed with Seurat). First question: in this case, would the batch number be the number of cells in my integrated object, or the number of samples (i.e. "stimulated", "non stimulated")?
I used the recommended lines (separating the kNN computation from the kBET function), but unfortunately the running times are huge.
My object is 17k cells x 20k genes.
Would you advise me to randomly subset my data before running kBET?

Thank you in advance for your help,

Best,
Lilia

@mbuttner
Collaborator Author

Hi @liliay

thank you for trying kBET.

  1. I would use the batch label of the cells, not the condition.
  2. About the runtime: I recommend reducing the number of input dimensions. You can compute a PCA on the data and use only the first 50 PCs, or, if you have integrated the data with Seurat, take the embedding space as input. This should have a much lower dimension than the full gene space. Random subsampling might not be necessary.
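
As a rough illustration of the dimension reduction in point 2, the following Python sketch projects a cells x genes matrix onto its leading principal components with NumPy. The function name `top_pcs` and the random toy matrix are assumptions for illustration only; this is not how kBET or Seurat compute their PCA, and for a matrix of 17k x 20k a truncated solver would likely be a better fit than the full SVD used here.

```python
import numpy as np


def top_pcs(X, n_components=50):
    """Project the rows of X (cells x genes) onto the leading principal components."""
    Xc = X - X.mean(axis=0)  # centre each gene
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    n_components = min(n_components, S.shape[0])
    # Scores are the left singular vectors scaled by the singular values.
    return U[:, :n_components] * S[:n_components]


# Toy data standing in for an expression matrix (200 cells, 1000 genes).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
embedding = top_pcs(X, n_components=50)  # 200 x 50 input for the kNN search
```

The neighbour search then runs on 50 columns instead of thousands, which is where most of the saving comes from.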

Best,
Maren

@adamgayoso

Just wanted to plug our extremely fast Python version of kBET:

YosefLab/scib-metrics#60

It will be in this package soon. It does not have all the same functionality (no bootstrapping currently), but these things should not be difficult to add.

@mbuttner
Collaborator Author

@adamgayoso

Thanks for sharing this! Your code looks quite neat and it is fantastic to learn about the speed-up. Did you also include an estimate of the neighborhood size? I might have missed it in the code.

@adamgayoso

We did not, as it seemed the original scib package used a fixed k:

https://github.com/theislab/scib/blob/da9c39b89b95b2ec34b6f547445e931571120ba6/scib/metrics/kbet.py#L144-L151

3 participants