Calling SNPs using multiple bam files with single cell data #65

Open
scachero opened this issue Jun 28, 2022 · 5 comments

@scachero

Hi,

I am trying to call SNPs by pooling data from multiple 10X cellranger bam files (e.g. SIGAA8, SIGAB8, SIGAC8). Each bam contains cells from 5 genotypes. From what I can see, this is a Mode 1a (droplet-based single cells) case, but that mode seems to accept only one bam at a time, and I want to pool them to have more cells for the SNP calling.

How would you recommend doing this?

I also thought of merging the bams but then the shared barcodes would look like a single cell to cellsnp-lite, wouldn't they?

Many thanks!

Seba

@hxj5
Collaborator

hxj5 commented Jun 29, 2022

Hi, you may use the -S option to specify multiple bam files in mode 1a.

Yes, shared barcodes in different bam files would be treated as one single cell. To distinguish them, you may add a suffix to the barcodes, e.g., -1 to the barcodes of the first bam and its input barcode list, -2 to those of the second.
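
For illustration, here is a minimal Python sketch of the suffix idea, applied to barcode lists only. The barcode strings and the helper name `add_suffix` are made up for the example; it only shows why a per-sample suffix removes the collision and is not part of cellsnp-lite.

```python
def add_suffix(barcodes, suffix):
    """Return a new list with `suffix` appended to every barcode."""
    return [bc + suffix for bc in barcodes]


# Hypothetical barcode lists from two samples; the first barcode is shared.
sample1 = ["ACTACTACTACT", "GGGTTTAAACCC"]
sample2 = ["ACTACTACTACT", "TTTGGGCCCAAA"]

shared_before = set(sample1) & set(sample2)

tagged1 = add_suffix(sample1, "-1")
tagged2 = add_suffix(sample2, "-2")

# After tagging, the two lists no longer share any barcode.
shared_after = set(tagged1) & set(tagged2)

print(sorted(shared_before))
print(sorted(shared_after))
```

Note that Cell Ranger barcodes usually already end in "-1", so appending a suffix as above would give strings such as "...-1-1" and "...-1-2". That is harmless as long as the same strings are used in the barcode list and in the bam files.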

@scachero
Author

scachero commented Jun 29, 2022

Thanks a lot for the answer! So I should use '-S bamfile1.bam bamfile2.bam' to specify multiple bams, and then make a barcode tsv file for each bam, adding -1 and -2. Which flag should I use to pass these multiple files? I assume the order of the barcode files should be the same as that of the bam files.

Thanks a lot!

Seba

@hxj5
Collaborator

hxj5 commented Jun 30, 2022

You could merge all barcode tsv files into one tsv file and specify it with the -b option. Once merged, the order of the barcode files does not matter, as cellsnp-lite can distinguish the barcodes if the suffixes have been added.

P.S. The -S option expects a file listing all bam files.
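
As a rough sketch of preparing the two input files, assuming plain-text (not gzipped) barcode files that already carry their suffixes, and assuming the bam list has one path per line: the file names and both helper functions below are hypothetical.

```python
import tempfile
from pathlib import Path


def merge_barcode_files(barcode_files, out_path):
    """Concatenate already-suffixed barcode files, failing on duplicates."""
    seen = set()
    merged = []
    for path in barcode_files:
        for line in Path(path).read_text().splitlines():
            bc = line.strip()
            if not bc:
                continue  # skip blank lines
            if bc in seen:
                raise ValueError(f"duplicate barcode {bc!r}; were suffixes added?")
            seen.add(bc)
            merged.append(bc)
    Path(out_path).write_text("\n".join(merged) + "\n")
    return merged


def write_bam_list(bam_paths, out_path):
    """Write one bam path per line (the layout assumed here for -S)."""
    Path(out_path).write_text("\n".join(str(p) for p in bam_paths) + "\n")


# Demo with hypothetical file names inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "barcodes1.tsv").write_text("ACTACTACTACT-1\n")
    (tmp / "barcodes2.tsv").write_text("ACTACTACTACT-2\n")
    merged = merge_barcode_files(
        [tmp / "barcodes1.tsv", tmp / "barcodes2.tsv"],
        tmp / "barcodes.merged.tsv",
    )
    write_bam_list(
        ["sample1.suffixed.bam", "sample2.suffixed.bam"], tmp / "bam_list.txt"
    )
    bam_list_text = (tmp / "bam_list.txt").read_text()

print(merged)
```

The merged barcode file would then be passed with -b and the bam list with -S. The duplicate check is an extra safeguard added for this sketch, so that a missing suffix is caught before the run.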

@scachero
Author

But how can cellsnp-lite assign reads when barcodes are duplicated between bams? My understanding is that the bam file has the barcode sequence but not the suffixes.

This would mean that while the single merged tsv file with all barcodes will not have duplications (actactactact-1 will be different from actactactact-2), the same identical barcodes will still be present in two different bams (bam1 will have actactactact and bam2 will also have actactactact).

This is why I thought that the correspondence between the barcodes and the bam file they come from had to be kept.

Can you clarify where I am getting this wrong, Xianjie?

Thanks a lot for your help!

Seba

@hxj5
Collaborator

hxj5 commented Jun 30, 2022

Hi, the barcodes in the bam files (e.g., the barcode string in the CB tag), together with the barcodes in the tsv files, are also expected to be modified to carry the suffixes (-1 for the first bam, -2 for the second). This could be done with a script (e.g., with pysam or a self-written file parser).
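
A minimal pysam sketch of such a script might look like the following. It is an untested illustration, not an official cellsnp-lite tool: the function names and file names are hypothetical, and it assumes the cell barcode is stored as a string in the CB tag. Reads without a CB tag are copied unchanged.

```python
def suffixed(barcode, suffix):
    """Return the barcode with the per-sample suffix appended."""
    return barcode + suffix


def add_suffix_to_bam(in_bam, out_bam, suffix, tag="CB"):
    """Copy in_bam to out_bam, appending `suffix` to every `tag` value."""
    # pysam is a third-party package; it is imported here so the helper
    # above can be used without it.
    import pysam

    with pysam.AlignmentFile(in_bam, "rb") as src, \
            pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        # until_eof=True reads every record, including unmapped reads,
        # and does not require an index on the input.
        for read in src.fetch(until_eof=True):
            if read.has_tag(tag):
                new_value = suffixed(read.get_tag(tag), suffix)
                read.set_tag(tag, new_value, value_type="Z")
            dst.write(read)


# Hypothetical usage, one call per sample:
# add_suffix_to_bam("sample1.bam", "sample1.suffixed.bam", "-1")
# add_suffix_to_bam("sample2.bam", "sample2.suffixed.bam", "-2")
```

Read order is preserved, so a coordinate-sorted input stays sorted, but the output bam is a new file and would likely need to be indexed again (e.g., with samtools index) before use.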
