Overlap of NUM.SNPS/NUM.READS between different mixed sample results #58

JABioinf · 2022-11-08T01:12:34Z

Thank you for developing this tool!
I've run both demuxlet and freemuxlet (using a 1000 Genomes-based common-variant VCF as suggested, including the optimizations suggested by [https://github.com/aertslab/popscle_helper_tools]). I've used them on different mixed samples (mixes of 3 genotypes), each with a different combination of genotypes (but all from the same type of tissue). Results from demuxlet and freemuxlet are usually compatible, which makes me think the implementation works. However, I noticed that the combinations of NUM.SNPS and NUM.READS per barcode tend to overlap even between completely unrelated samples and separate popscle runs, which I would not expect.

For instance, when intersecting the freemuxlet results of 2 samples (17k cells and 15k cells), I observe more than 3k cells from the first sample with identical NUM.SNPS and NUM.READS values in the second sample. This includes barcodes with more than a thousand SNPs. Here are the values found in both samples for the barcodes with the highest number of reads:
NUM.SNPS NUM.READS
1618 1797
1626 1765
1555 1764
1256 1455
1247 1419
1166 1329
1195 1289
1192 1280
1244 1258
In my opinion, this is unlikely to happen by chance between unrelated samples.
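A minimal way to quantify this overlap, sketched in Python, is to collect the (NUM.SNPS, NUM.READS) pairs of one sample and look them up in the other. It assumes each results file is tab-separated with a header row containing BARCODE, NUM.SNPS and NUM.READS columns; the script and file names in the usage comment are placeholders. Also checking whether a matching pair belongs to the same BARCODE helps separate chance coincidences from the same droplets appearing in both result sets.

```python
import csv
import gzip
import sys


def read_counts(path):
    """Return {barcode: (num_snps, num_reads)} from a tab-separated results file with a header."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {
            row["BARCODE"]: (int(row["NUM.SNPS"]), int(row["NUM.READS"]))
            for row in reader
        }


def compare_samples(a, b):
    """Return (droplets in a whose pair occurs anywhere in b,
    droplets in a whose barcode AND pair both occur together in b)."""
    pairs_b = set(b.values())
    shared_pair = [bc for bc, pair in a.items() if pair in pairs_b]
    same_barcode = [bc for bc in shared_pair if b.get(bc) == a[bc]]
    return len(shared_pair), len(same_barcode)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names):
    #   python compare_counts.py sampleA_results.tsv.gz sampleB_results.tsv.gz
    sample_a = read_counts(sys.argv[1])
    sample_b = read_counts(sys.argv[2])
    n_pair, n_same = compare_samples(sample_a, sample_b)
    print(f"{n_pair} of {len(sample_a)} droplets share a (NUM.SNPS, NUM.READS) pair")
    print(f"{n_same} of those also share the barcode")
```

If most matching pairs also share the barcode, that would point towards the same droplets (or the same input) appearing in both runs rather than a statistical coincidence.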

I am still investigating this observation to identify its cause, including whether it could come from how I set up and ran your software, but:

Have you observed this phenomenon? Is it inherent to the distribution of reads in droplets, so that the result distributions overlap by chance?
If it is not expected, do you have suggestions on how to identify the source of this effect?
Can any of the demuxlet/freemuxlet output help me determine why droplets carry the same information across unrelated samples?
Could this be a version issue with popscle?
Or could it be because I filter the VCF and BAM beforehand, limiting the number of reads and SNPs investigated?
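One way to approach the "by chance" question is to stratify the overlap by read depth. Small counts such as (1, 1) or (2, 2) are common in any run, so many identical pairs among shallow droplets are expected; identical pairs among droplets with over a thousand reads are much harder to explain that way. The sketch below assumes the {barcode: (NUM.SNPS, NUM.READS)} dictionaries have already been loaded; the thresholds and toy values are arbitrary.

```python
def shared_fraction_by_depth(a, b, min_reads=(0, 10, 100, 1000)):
    """For each NUM.READS threshold, return (shared, total): how many droplets in a
    with at least that many reads have a (NUM.SNPS, NUM.READS) pair also seen in b."""
    pairs_b = set(b.values())
    result = {}
    for threshold in min_reads:
        deep = [pair for pair in a.values() if pair[1] >= threshold]
        shared = sum(1 for pair in deep if pair in pairs_b)
        result[threshold] = (shared, len(deep))
    return result


if __name__ == "__main__":
    # Toy data: barcode -> (NUM.SNPS, NUM.READS); values are made up for illustration.
    sample_a = {"A": (1500, 1700), "B": (2, 2), "C": (3, 3), "D": (1200, 1300)}
    sample_b = {"X": (2, 2), "Y": (3, 3), "Z": (1500, 1700)}
    for threshold, (shared, total) in shared_fraction_by_depth(sample_a, sample_b).items():
        print(f"NUM.READS >= {threshold}: {shared}/{total} droplets with a matching pair")
```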

Thank you for your help (and thanks to anyone else who might have an explanation for this).

hyunminkang · 2022-11-11T13:57:45Z

I'm not sure why that would happen. What options did you use? Does it correctly use the UB and CB tags?
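To check the CB and UB tags mentioned above, one rough sketch is to count how many alignments in the filtered BAM carry each tag, reading SAM text (for example from `samtools view`). It only checks that the tags are present, not that the barcodes match the barcode list passed to popscle; the script name in the usage comment is a placeholder.

```python
import sys
from collections import Counter


def tag_summary(sam_lines, tags=("CB", "UB")):
    """Count alignments and how many of them carry each requested SAM tag."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():  # skip header and empty lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        present = {field[:2] for field in fields[11:]}
        counts["alignments"] += 1
        for tag in tags:
            if tag in present:
                counts[tag] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: samtools view filtered.bam | head -n 100000 | python check_tags.py -
    handle = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    summary = tag_summary(handle)
    for key in ("alignments", "CB", "UB"):
        print(key, summary[key])
```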

JABioinf · 2022-11-16T16:31:07Z

Thanks for your reply. I used the default options with the following command:
popscle dsc-pileup --sam $bamloc --vcf $vcfloc --group-list $barcodeloc --out ${sample}.demux.pileup
with barcodeloc=outs/filtered_feature_bc_matrix/barcodes.tsv.gz taken directly from the Cell Ranger output (10x Genomics sample), and a filtered BAM file containing only reads that overlap VCF positions and carry a cell barcode.
Then, for demuxlet:
popscle demuxlet --plp Demuxlet/pileupfiles/${sample}.demux.pileup --vcf $vcfloc --field PL --out ${sample}_demuxlet
or for freemuxlet:
popscle freemuxlet --plp Freemuxlet/pileupfiles/${sample}.pileup --nsample 3 --out ${sample}_freemuxlet
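Because the demuxlet and freemuxlet commands above read pileups from different paths, one quick sanity check (a sketch, not part of popscle) is to fingerprint the inputs of each run, such as the filtered BAMs or the pileup files, and confirm that unrelated samples did not end up with identical inputs. Files ending in `.gz` are hashed on their decompressed content, so recompressing the same data does not hide a duplicate. The paths in the usage comment are placeholders.

```python
import gzip
import hashlib
import sys
from collections import defaultdict


def content_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's content; .gz files are hashed after decompression."""
    opener = gzip.open if path.endswith(".gz") else open
    digest = hashlib.sha256()
    with opener(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical(paths):
    """Group paths whose content digests match; return only groups of two or more."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[content_digest(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (placeholder paths): python find_identical.py sampleA_input.gz sampleB_input.gz
    for group in find_identical(sys.argv[1:]):
        print("identical content:", " ".join(group))
```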

Would you recommend changing any of the default parameters?

For now, I have confirmed the overall accuracy of the demuxlet and freemuxlet deconvolution using single-genotype samples and an artificial mixture of them (made by merging the FASTQ files in Cell Ranger), which suggests that my setup and results are correct, but I would still like to understand this observation.
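For an artificial mixture like this, where each barcode's source sample is known, accuracy can be summarized with a simple majority-vote concordance. The sketch below assumes two hypothetical dictionaries built beforehand, {barcode: known source sample} and {barcode: freemuxlet cluster label}; freemuxlet clusters are unlabeled, so each cluster is mapped to the source sample it most often contains.

```python
from collections import Counter, defaultdict


def majority_concordance(truth, assigned):
    """Map each cluster label to its most common known source sample and return
    (mapping, fraction of barcodes present in both dicts whose mapped label agrees)."""
    shared = [bc for bc in assigned if bc in truth]
    by_cluster = defaultdict(Counter)
    for bc in shared:
        by_cluster[assigned[bc]][truth[bc]] += 1
    mapping = {cluster: counts.most_common(1)[0][0] for cluster, counts in by_cluster.items()}
    if not shared:
        return mapping, float("nan")
    agree = sum(1 for bc in shared if mapping[assigned[bc]] == truth[bc])
    return mapping, agree / len(shared)


if __name__ == "__main__":
    # Toy labels (hypothetical): known origin vs. freemuxlet cluster per barcode.
    truth = {"a": "S1", "b": "S1", "c": "S2", "d": "S2", "e": "S3"}
    assigned = {"a": "0", "b": "0", "c": "1", "d": "0", "e": "2"}
    mapping, concordance = majority_concordance(truth, assigned)
    print(mapping, f"{concordance:.2f}")
```

Majority voting can map two clusters to the same sample; a stricter check would require a one-to-one matching between clusters and samples.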
Best,
