Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing barcodes in freemuxlet output #40

Open
lb15 opened this issue Feb 10, 2021 · 2 comments
Open

missing barcodes in freemuxlet output #40

lb15 opened this issue Feb 10, 2021 · 2 comments

Comments

@lb15
Copy link

lb15 commented Feb 10, 2021

Hi,

I'm using dsc-pileup and freemuxlet to identify donors/doublets of a two-donor mixture. I pre-filter the bam file using the popscle_helper_tools (https://github.com/aertslab/popscle_helper_tools) linked in other github issues. I find that for some of my samples, the output from freemuxlet (.best file) does not contain all the barcodes in the original barcodes.tsv file. I can see in the log file that all the barcodes are read into dsc-pileup, but slightly fewer are read into freemuxlet. I think they are being filtered in dsc-pileup, but can't find any info about why. Is this expected behavior and is there a way to track why these cells are removed?

Thanks!

Here are some relevant snippets of the log file:

The following parameters are available. Ones with "[]" are in effect:
           Options for input SAM/BAM/CRAM : --sam [A_4dpi_MOCK_1_possorted_genome_filtered_bam.bam],
                                            --tag-group [CB], --tag-UMI [UB]
                Options for input VCF/BCF : --vcf [ucsc.hg38.liftover.out.nochr.c1_22.nohbb_renamed.vcf.gz],
                                            --sm, --sm-list
                           Output Options : --out [A_4dpi_MOCK_1_pileup_filteredbam],
                                            --sam-verbose [1000000],
                                            --vcf-verbose [10000], --skip-umi
   SNP-overlapping Read filtering Options : --cap-BQ [40], --min-BQ [13],
                                            --min-MQ [20], --min-TD,
                                            --excl-flag [3844]
           Cell/droplet filtering options : --group-list [barcodes.tsv],
                                            --min-total, --min-uniq, --min-snp

Run with --help for more detailed help messages of each argument.

NOTICE [2021/02/08 17:29:21] - Finished loading 16695 droplet/cell barcodes to consider
NOTICE [2021/02/08 17:29:21] - Initializing BCF reader..
NOTICE [2021/02/08 17:29:21] - Finished identifying 0 samples to load from VCF/BCF
NOTICE [2021/02/08 17:29:21] - Initializing SAM reader..
NOTICE [2021/02/08 17:29:21] - Reading 0 reads at chr1:1 and skipping 0
NOTICE [2021/02/08 17:30:36] - Reading 1000000 reads at chr1:6197681 and skipping 482659
NOTICE [2021/02/08 17:31:19] - WARNING: Cannot find UMI tag UB from 1981277-th read A00111:625:HT7HLDSXY:3:2666:19361:10160 at chr1:15734312-1573443
3. Treating all of them as a single UMI


NOTICE [2021/02/08 22:45:32] - Finished reading 216797 markers from the VCF file
NOTICE [2021/02/08 22:45:32] - Total number input reads : 395101280
NOTICE [2021/02/08 22:45:32] - Total number of read-QC-passed reads : 206083344 
NOTICE [2021/02/08 22:45:32] - Total number of skipped reads with ignored barcodes : 0
NOTICE [2021/02/08 22:45:32] - Total number of non-skipped reads with considered barcodes : 206083344
NOTICE [2021/02/08 22:45:32] - Total number of gapped/noninformative reads : 106047105
NOTICE [2021/02/08 22:45:32] - Total number of base-QC-failed reads : 2583351
NOTICE [2021/02/08 22:45:32] - Total number of redundant reads : 6992853
NOTICE [2021/02/08 22:45:32] - Total number of pass-filtered reads : 90460035
NOTICE [2021/02/08 22:45:32] - Total number of pass-filtered reads overlapping with multiple SNPs : 15008085
NOTICE [2021/02/08 22:45:32] - Starting to prune out cells with too few reads...
NOTICE [2021/02/08 22:45:32] - Finishing pruning out 0 cells with too few reads...
NOTICE [2021/02/08 22:45:32] - Writing cell information
NOTICE [2021/02/08 22:45:32] - Finished writing cell information
NOTICE [2021/02/08 22:45:32] - Build indices for sparse matrix
NOTICE [2021/02/08 23:04:51] - Finished builing indices for sparse matrix
Finished dsc-pileup

and reading into freemuxlet. Can see that it is processing 16,616 droplets instead of the 16,695 that go into dsc-pileup.

The following parameters are available. Ones with "[]" are in effect:
            Options for input pileup : --plp [A_4dpi_MOCK_1_pileup_filteredbam],
                                       --init-cluster
                      Output Options : --out [A_4dpi_MOCK_1_freemuxlet],
                                       --nsample [2], --aux-files,
                                       --verbose [100]
   Options for statistical inference : --doublet-prior [0.50],
                                       --geno-error [0.10], --bf-thres [5.41],
                                       --frac-init-clust [1.00],
                                       --iter-init [10], --keep-init-missing,
                                       --randomize-singlet-score, --seed
              Read filtering Options : --cap-BQ [20], --min-BQ [13]
      Cell/droplet filtering options : --group-list, --min-total, --min-umi,
                                       --min-snp

Run with --help for more detailed help messages of each argument.

NOTICE [2021/02/08 23:13:21] - Loading pileup information with prefix A_4dpi_MOCK_1_pileup_filteredbam
NOTICE [2021/02/08 23:13:21] - Reading barcode information from A_4dpi_MOCK_1_pileup_filteredbam.cel.gz..
NOTICE [2021/02/08 23:13:21] - Finished loading 16616 droplets, skipping 0
NOTICE [2021/02/08 23:13:21] - Reading variant information from A_4dpi_MOCK_1_pileup_filteredbam.var.gz..
NOTICE [2021/02/08 23:13:21] - Finished loading 216797 variants..
NOTICE [2021/02/08 23:13:21] - Reading pileup information from A_4dpi_MOCK_1_pileup_filteredbam.plp.gz..
NOTICE [2021/02/08 23:15:31] - Finished loading 111986528 UMIs in total..
@ShaiberAlon
Copy link

I am running into the same issue.

@lb15 , did you ever get to the bottom of this?

@Griffan , @hyunminkang , thank you for developing this tool! Do you have any insight on this issue?

@hyunminkang
Copy link
Contributor

Maybe the droplets have no reads observed overlapping with SNPs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants