You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm using dsc-pileup and freemuxlet to identify donors/doublets of a two-donor mixture. I pre-filter the bam file using the popscle_helper_tools (https://github.com/aertslab/popscle_helper_tools) linked in other github issues. I find that for some of my samples, the output from freemuxlet (.best file) does not contain all the barcodes in the original barcodes.tsv file. I can see in the log file that all the barcodes are read into dsc-pileup, but slightly fewer are read into freemuxlet. I think they are being filtered in dsc-pileup, but can't find any info about why. Is this expected behavior and is there a way to track why these cells are removed?
Thanks!
Here are some relevant snippets of the log file:
The following parameters are available. Ones with "[]" are in effect:
Options for input SAM/BAM/CRAM : --sam [A_4dpi_MOCK_1_possorted_genome_filtered_bam.bam],
--tag-group [CB], --tag-UMI [UB]
Options for input VCF/BCF : --vcf [ucsc.hg38.liftover.out.nochr.c1_22.nohbb_renamed.vcf.gz],
--sm, --sm-list
Output Options : --out [A_4dpi_MOCK_1_pileup_filteredbam],
--sam-verbose [1000000],
--vcf-verbose [10000], --skip-umi
SNP-overlapping Read filtering Options : --cap-BQ [40], --min-BQ [13],
--min-MQ [20], --min-TD,
--excl-flag [3844]
Cell/droplet filtering options : --group-list [barcodes.tsv],
--min-total, --min-uniq, --min-snp
Run with --help for more detailed help messages of each argument.
NOTICE [2021/02/08 17:29:21] - Finished loading 16695 droplet/cell barcodes to consider
NOTICE [2021/02/08 17:29:21] - Initializing BCF reader..
NOTICE [2021/02/08 17:29:21] - Finished identifying 0 samples to load from VCF/BCF
NOTICE [2021/02/08 17:29:21] - Initializing SAM reader..
NOTICE [2021/02/08 17:29:21] - Reading 0 reads at chr1:1 and skipping 0
NOTICE [2021/02/08 17:30:36] - Reading 1000000 reads at chr1:6197681 and skipping 482659
NOTICE [2021/02/08 17:31:19] - WARNING: Cannot find UMI tag UB from 1981277-th read A00111:625:HT7HLDSXY:3:2666:19361:10160 at chr1:15734312-1573443
3. Treating all of them as a single UMI
NOTICE [2021/02/08 22:45:32] - Finished reading 216797 markers from the VCF file
NOTICE [2021/02/08 22:45:32] - Total number input reads : 395101280
NOTICE [2021/02/08 22:45:32] - Total number of read-QC-passed reads : 206083344
NOTICE [2021/02/08 22:45:32] - Total number of skipped reads with ignored barcodes : 0
NOTICE [2021/02/08 22:45:32] - Total number of non-skipped reads with considered barcodes : 206083344
NOTICE [2021/02/08 22:45:32] - Total number of gapped/noninformative reads : 106047105
NOTICE [2021/02/08 22:45:32] - Total number of base-QC-failed reads : 2583351
NOTICE [2021/02/08 22:45:32] - Total number of redundant reads : 6992853
NOTICE [2021/02/08 22:45:32] - Total number of pass-filtered reads : 90460035
NOTICE [2021/02/08 22:45:32] - Total number of pass-filtered reads overlapping with multiple SNPs : 15008085
NOTICE [2021/02/08 22:45:32] - Starting to prune out cells with too few reads...
NOTICE [2021/02/08 22:45:32] - Finishing pruning out 0 cells with too few reads...
NOTICE [2021/02/08 22:45:32] - Writing cell information
NOTICE [2021/02/08 22:45:32] - Finished writing cell information
NOTICE [2021/02/08 22:45:32] - Build indices for sparse matrix
NOTICE [2021/02/08 23:04:51] - Finished builing indices for sparse matrix
Finished dsc-pileup
and reading into freemuxlet. Can see that it is processing 16,616 droplets instead of the 16,695 that go into dsc-pileup.
The following parameters are available. Ones with "[]" are in effect:
Options for input pileup : --plp [A_4dpi_MOCK_1_pileup_filteredbam],
--init-cluster
Output Options : --out [A_4dpi_MOCK_1_freemuxlet],
--nsample [2], --aux-files,
--verbose [100]
Options for statistical inference : --doublet-prior [0.50],
--geno-error [0.10], --bf-thres [5.41],
--frac-init-clust [1.00],
--iter-init [10], --keep-init-missing,
--randomize-singlet-score, --seed
Read filtering Options : --cap-BQ [20], --min-BQ [13]
Cell/droplet filtering options : --group-list, --min-total, --min-umi,
--min-snp
Run with --help for more detailed help messages of each argument.
NOTICE [2021/02/08 23:13:21] - Loading pileup information with prefix A_4dpi_MOCK_1_pileup_filteredbam
NOTICE [2021/02/08 23:13:21] - Reading barcode information from A_4dpi_MOCK_1_pileup_filteredbam.cel.gz..
NOTICE [2021/02/08 23:13:21] - Finished loading 16616 droplets, skipping 0
NOTICE [2021/02/08 23:13:21] - Reading variant information from A_4dpi_MOCK_1_pileup_filteredbam.var.gz..
NOTICE [2021/02/08 23:13:21] - Finished loading 216797 variants..
NOTICE [2021/02/08 23:13:21] - Reading pileup information from A_4dpi_MOCK_1_pileup_filteredbam.plp.gz..
NOTICE [2021/02/08 23:15:31] - Finished loading 111986528 UMIs in total..
The text was updated successfully, but these errors were encountered:
Hi,
I'm using dsc-pileup and freemuxlet to identify donors/doublets of a two-donor mixture. I pre-filter the bam file using the popscle_helper_tools (https://github.com/aertslab/popscle_helper_tools) linked in other github issues. I find that for some of my samples, the output from freemuxlet (.best file) does not contain all the barcodes in the original barcodes.tsv file. I can see in the log file that all the barcodes are read into dsc-pileup, but slightly fewer are read into freemuxlet. I think they are being filtered in dsc-pileup, but can't find any info about why. Is this expected behavior and is there a way to track why these cells are removed?
Thanks!
Here are some relevant snippets of the log file:
and reading into freemuxlet. Can see that it is processing 16,616 droplets instead of the 16,695 that go into dsc-pileup.
The text was updated successfully, but these errors were encountered: