scRNA-seq from different donors as a genotype-vcf input for vireo #100

Open
mariafiruleva opened this issue Aug 14, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@mariafiruleva

mariafiruleva commented Aug 14, 2023

Hi!

First of all, thank you for the great tool.

I have single-cell RNA sequencing (Cell Ranger) data for, e.g., 3 donors (=> 3 BAM files, one per donor), as well as pooled scRNA-seq data for the same donors (=> 1 BAM file, the same 3 donors).

I want to call variants for the non-pooled scRNA-seq 3 bam files and then use them as donor-wise vcf inputs for vireo in order to demultiplex the pooled one. What is the best approach to do that?

Thank you very much!

Best,
Mariia

@hxj5
Collaborator

hxj5 commented Aug 15, 2023

Hi, thanks for the question.

You may combine the donor-wise VCF files with bcftools merge and then pass the merged VCF to vireo -d $DONOR_GT_FILE. See vireo issue 13 and issue 33 for a detailed discussion, and the vireo manual for the full list of parameters.
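A minimal sketch of that suggestion, assuming each donor VCF is bgzip-compressed and carries a distinct sample name. The helper names (`run`, `merge_donor_vcfs`, `demux_with_donor_gt`) and all file paths are hypothetical and not part of bcftools or vireo. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Dry-run helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Merge per-donor VCFs into one multi-sample VCF (hypothetical helper).
# Usage: merge_donor_vcfs merged.vcf.gz donor1.vcf.gz donor2.vcf.gz ...
merge_donor_vcfs() {
  out="$1"
  shift
  # Assumption: inputs are bgzip-compressed; bcftools merge also needs an index.
  for vcf in "$@"; do
    run bcftools index -f "$vcf"
  done
  run bcftools merge -Oz -o "$out" "$@"
}

# Demultiplex the pooled sample with the merged donor genotypes.
# Usage: demux_with_donor_gt POOLED_CELLSNP_DIR MERGED_VCF OUT_DIR
demux_with_donor_gt() {
  run vireo -c "$1" -d "$2" -o "$3"
}
```

For example, `merge_donor_vcfs donors.vcf.gz d1.vcf.gz d2.vcf.gz d3.vcf.gz` followed by `demux_with_donor_gt pooled_cellsnp donors.vcf.gz vireo_out`. Sites genotyped in only some donors come out of the merge with missing genotypes for the others, so it may be worth checking how many sites are shared.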

@mariafiruleva
Author

Thanks a lot for your feedback!

As far as I understand, in issue 33 cellsnp-lite was used on bulk RNA-seq data, which is not my case.

I ran cellsnp-lite using mode 1a (scRNA-seq data with input barcodes & BAM files & --genotype): genotype information (GT) is only available in the cellSNP.cells.vcf.gz file, at the single-cell level. I need this information at the donor level in order to use it for demultiplexing.

My question is: should I use a different mode, specify additional parameters for mode 1a, or manually extract GT information from cellSNP.cells.vcf.gz?

@hxj5
Collaborator

hxj5 commented Aug 16, 2023

EDIT: To genotype 10x scRNA-seq data in a pseudo-bulk manner with cellsnp-lite mode 1b (or mode 2b), it is recommended to subset the BAM file first, by extracting the alignment records with valid cell barcodes only. Here the valid cell barcodes are typically the cell barcodes stored in the cellranger output folder filtered_gene_bc_matrices, which are the cells with high-quality sequencing data.
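One way to do this subsetting is sketched below, assuming samtools 1.12 or later (which provides `-D`/`--tag-file`) and an uncompressed barcode list with one barcode per line, written exactly as it appears in the BAM `CB` tag. The helper names and file paths are hypothetical. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Dry-run helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Keep only alignments whose CB tag is listed in the barcode file.
# Usage: subset_bam_by_barcodes IN_BAM BARCODES_TXT OUT_BAM
subset_bam_by_barcodes() {
  in_bam="$1"
  barcodes="$2"
  out_bam="$3"
  # -D TAG:FILE keeps records whose TAG value appears in FILE.
  run samtools view -b -D "CB:$barcodes" -o "$out_bam" "$in_bam"
  # Index the subset so downstream tools can fetch by region.
  run samtools index "$out_bam"
}
```

If the Cell Ranger barcode file is gzip-compressed, decompress it to a plain text file first and pass that as `BARCODES_TXT`.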

We may update cellsnp-lite to enable genotyping specific cells from a 10x scRNA-seq BAM file in a pseudo-bulk manner without the need to subset (e.g., by simply adding "GT" and/or "PL" fields to the cellSNP.base.vcf file, or by adding a --bulk option to explicitly tell cellsnp-lite to genotype in a pseudo-bulk manner when -b is specified). (20230824)


original answer:

Thanks for the clarification.

You may try using cellsnp-lite to genotype each donor in a pseudo-bulk manner (e.g., with cellsnp-lite mode 1b & --genotype). The output cellSNP.cells.vcf.gz should contain GT and PL tags (note that GT, GP, PL are all valid values for vireo --genoTag while PL is the default).
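A hedged sketch of such a mode 1b call, treating each donor BAM as one pseudo-bulk sample. The function name, the thread count, and all paths are hypothetical; the candidate SNP list passed to `-R` is an assumption, since the thread does not say which list to use. Setting `DRY_RUN=1` prints the command instead of executing it.

```shell
# Dry-run helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Genotype each donor BAM as one pseudo-bulk sample (cellsnp-lite mode 1b).
# Usage: genotype_donors_pseudobulk OUT_DIR SNP_VCF BAM_LIST ID_LIST [UMI_TAG]
#   BAM_LIST and ID_LIST are comma-separated and in the same order.
genotype_donors_pseudobulk() {
  out_dir="$1"
  snp_vcf="$2"
  bams="$3"
  ids="$4"
  umi_tag="${5:-None}"  # None counts reads; a UMI tag such as UB counts UMIs
  run cellsnp-lite -s "$bams" -I "$ids" -O "$out_dir" -R "$snp_vcf" \
    --cellTAG None --UMItag "$umi_tag" --genotype --gzip -p 4
}
```

For example, `genotype_donors_pseudobulk donors_gt snps.vcf.gz d1.bam,d2.bam,d3.bam donor1,donor2,donor3`. The resulting `donors_gt/cellSNP.cells.vcf.gz` would then be the file passed to `vireo -d`, with `--genoTag` left at its default or set to `GT`.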

@mariafiruleva
Author

Thanks a lot again!

I'm wondering whether I should specify --UMItag in that case (mode 1b + --genotype). Do you have any concerns about that?

@hxj5
Collaborator

hxj5 commented Aug 17, 2023

Specifying --UMItag in mode 1b still works; it should then count UMIs instead of reads.

@mariafiruleva
Author

I ran cellsnp-lite in two modes: one without providing GT, and the other using mode 1b with --genotype (as you suggested).

The results are highly similar: more than 95% of the cells assigned to specific donors based on GT matched the cells assigned to the anonymously labelled donors without GT. Unassigned cells and doublets also overlapped heavily between the two modes. I was also happy to see that the cells left unassigned in both modes were low-quality cells, judging by their mitochondrial content, number of genes, and number of counts.
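A comparison like this can be sketched with standard Unix tools, assuming each vireo run wrote a tab-separated `donor_ids.tsv` with a header line, the cell barcode in column 1, and the assigned label in column 2. The function name and paths are hypothetical, and this is not necessarily how the comparison above was done.

```shell
# Cross-tabulate donor labels from two vireo runs on the same cells.
# Usage: crosstab_donor_ids RUN_A/donor_ids.tsv RUN_B/donor_ids.tsv
crosstab_donor_ids() {
  tab=$(printf '\t')
  a_sorted=$(mktemp)
  b_sorted=$(mktemp)
  # Keep barcode and label, drop the header, sort by barcode for join.
  tail -n +2 "$1" | cut -f1,2 | LC_ALL=C sort -t "$tab" -k1,1 > "$a_sorted"
  tail -n +2 "$2" | cut -f1,2 | LC_ALL=C sort -t "$tab" -k1,1 > "$b_sorted"
  # One output row per label pair, prefixed by the number of shared cells.
  LC_ALL=C join -t "$tab" "$a_sorted" "$b_sorted" | cut -f2,3 | LC_ALL=C sort | uniq -c
  rm -f "$a_sorted" "$b_sorted"
}
```

Each output row gives a count followed by the label from the first run and the label from the second run, so a clean one-to-one mapping between anonymous and named donors shows up as a few large counts and many small or absent ones.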

Thanks a lot for your help!

@hxj5 hxj5 added the enhancement New feature or request label Aug 24, 2023