Overlap of NUM.SNPS/NUM.READS between different mixed sample results #58

Open
JABioinf opened this issue Nov 8, 2022 · 2 comments

@JABioinf
JABioinf commented Nov 8, 2022

Thank you for developing this tool!
I've run both demuxlet and freemuxlet, using a 1000g-based common-variant VCF as suggested, including the optimizations suggested by https://github.com/aertslab/popscle_helper_tools. I've run them on several mixed samples (3-genotype mixes), each with a different combination of genotypes but all from the same tissue type. The demuxlet and freemuxlet results are often compatible, which makes me think the implementation works. However, I noticed that the combinations of NUM.SNPS and NUM.READS per BARCODE tend to overlap even between completely unrelated samples and popscle runs, which I did not expect.

For instance, when intersecting the freemuxlet results of two samples (17k cells and 15k cells), I observe more than 3k cells from the first sample whose NUM.SNPS and NUM.READS values are identical to values found in the second sample. This includes barcodes with more than a thousand SNPs considered. Here are the values found in both samples for the barcodes with the highest number of reads considered:
NUM.SNPS NUM.READS
1618 1797
1626 1765
1555 1764
1256 1455
1247 1419
1166 1329
1195 1289
1192 1280
1244 1258
...
In my opinion, this is unlikely to happen by chance between unrelated samples.
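The intersection described above can be sketched as follows. This is a minimal illustration, not part of popscle: it assumes each result table is tab-delimited with the `BARCODE`, `NUM.SNPS` and `NUM.READS` columns named in this issue, and the helper names `load_counts` and `shared_count_pairs` are made up for the example. It also separates matches that occur under the same barcode from matches that occur only under a different barcode, since the two cases may point to different causes.

```python
import csv
import gzip
from collections import defaultdict


def load_counts(path):
    """Read BARCODE, NUM.SNPS and NUM.READS from a tab-delimited table.

    Assumption: the file has a header row with exactly these column names.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return [
            (row["BARCODE"], int(row["NUM.SNPS"]), int(row["NUM.READS"]))
            for row in csv.DictReader(handle, delimiter="\t")
        ]


def shared_count_pairs(rows_a, rows_b):
    """Find rows of A whose (NUM.SNPS, NUM.READS) pair also occurs in B.

    Returns (same_barcode, other_barcode): matches where B holds the same
    pair under the same barcode, and matches found only under other barcodes.
    """
    barcodes_by_pair = defaultdict(set)
    for barcode, n_snps, n_reads in rows_b:
        barcodes_by_pair[(n_snps, n_reads)].add(barcode)

    same_barcode, other_barcode = [], []
    for barcode, n_snps, n_reads in rows_a:
        matches = barcodes_by_pair.get((n_snps, n_reads))
        if not matches:
            continue  # this pair does not occur in B at all
        target = same_barcode if barcode in matches else other_barcode
        target.append((barcode, n_snps, n_reads))
    return same_barcode, other_barcode


# Toy rows with made-up barcodes, only to show the call pattern.
sample_a = [("AAAC-1", 1618, 1797), ("AAAG-1", 12, 14), ("AAAT-1", 300, 320)]
sample_b = [("AAAC-1", 1618, 1797), ("CCCC-1", 12, 14), ("GGGG-1", 50, 61)]
same, other = shared_count_pairs(sample_a, sample_b)
print(len(same), len(other))
```

With real data, `load_counts` would be called once per sample on the per-barcode result table (the path is left to the reader), and the two returned lists can then be inspected separately.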

I am still investigating this observation to identify its cause, and whether it could come from the way I am running your software. In the meantime:

  • Have you observed this phenomenon before? Could the results overlap purely by chance, given how reads are distributed across droplets?
  • If it is not expected, do you have any suggestions on how to identify the source of this effect?
  • Can any of the demuxlet/freemuxlet outputs help me determine why droplets carry the same information in unrelated samples?
  • Could this be a popscle version issue?
  • Or could it be because I filter the VCF and BAM beforehand, which limits the number of reads and SNPs investigated?

Thank you for your help (or for help from anyone else who might have an explanation for this).

@hyunminkang
Contributor

I'm not sure why that would happen. What options did you use? Does it correctly use UB and CB tags?

@JABioinf
Author

Thanks for your reply. I've used default options through the following command:
popscle dsc-pileup --sam $bamloc --vcf $vcfloc --group-list $barcodeloc --out ${sample}.demux.pileup
where barcodeloc=outs/filtered_feature_bc_matrix/barcodes.tsv.gz is taken directly from the Cell Ranger output (10x Genomics sample), and the BAM file is filtered to contain only reads that overlap VCF positions and carry a cell barcode.
and:
popscle demuxlet --plp Demuxlet/pileupfiles/${sample}.demux.pileup --vcf $vcfloc --field PL --out ${sample}_demuxlet
or for freemuxlet:
popscle freemuxlet --plp Freemuxlet/pileupfiles/${sample}.pileup --nsample 3 --out ${sample}_freemuxlet
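On the CB/UB question, one way to check the filtered BAM is to count how many alignment records still carry each tag. The sketch below is a minimal illustration that assumes SAM text as input (for example, lines piped from `samtools view`); the function name `tag_summary` is made up for the example, and the records are toy data.

```python
def tag_summary(sam_lines, tags=("CB", "UB")):
    """Count alignment records that carry each optional tag in SAM text."""
    total = 0
    counts = {tag: 0 for tag in tags}
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        total += 1
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        present = {field[:2] for field in line.rstrip("\n").split("\t")[11:]}
        for tag in tags:
            if tag in present:
                counts[tag] += 1
    return total, counts


# Toy SAM records, not real data.
toy = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC-1\tUB:Z:GGTT",
    "r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC-1",
    "r3\t0\tchr1\t300\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1",
]
total, counts = tag_summary(toy)
print(total, counts)
```

In practice the function could be given `sys.stdin` instead of the toy list when the script receives the output of `samtools view` through a pipe. A large gap between the CB and UB counts would suggest that the filtering step dropped or altered one of the tags.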

Would you recommend changing any of the default parameters?

For now, I've confirmed the overall accuracy of the demuxlet and freemuxlet deconvolution using single-genotype samples and an artificial mixture of them (made by merging the FASTQ files in Cell Ranger), which suggests that the implementation and results are correct, but I'd still like to understand this observation.
Best,
