Popscle demuxlet vs freemuxlet output stability. #41

Open
xmignot opened this issue Feb 15, 2021 · 4 comments

Comments

@xmignot

xmignot commented Feb 15, 2021

Hi,
I'm trying to demultiplex the sequencing results of a series of 10x experiments (both 3' and 5' chemistry). I started with demuxlet (we have GWAS data available for the samples), but also ran freemuxlet using a 1000 Genomes VCF, filtered as described in the tutorial, as a reference. We additionally have multiseq results (a more involved demultiplexing protocol that I'm treating as ground truth) for the 3' data only. I'm a little concerned about the freemuxlet results, as its clusters map very noisily to the demuxlet/multiseq sample IDs.
I built a mapping of consensus SNG barcodes between each pair of protocols. Demuxlet maps very cleanly to the multiseq labels in the 3' data, but for both the 3' and the 5' data the freemuxlet clusters are spread across many sample IDs.
As an example, here are some rows from each mapping:

[demuxlet to multiseq]
109D12: ['109D12: 0.9238', '61C07: 0.0092', '119A02: 0.0074', '119A04: 0.0067', '113E02: 0.006']
...
[freemuxlet to multiseq]
6: ['61C04: 0.2754', '119A03: 0.2748', '119A02: 0.1642', '61D08: 0.1314', '119B12: 0.0609']
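
For reference, a mapping of this kind can be computed roughly as follows. This is only a minimal sketch, not the exact script behind the rows above: the function name `map_labels`, the toy barcodes, and the assumption that each input is a dict from singlet barcode to assigned label are all hypothetical.

```python
from collections import Counter, defaultdict


def map_labels(calls_a, calls_b, top_n=5):
    """For each label in calls_a, return the fraction of its shared
    barcodes that carry each label in calls_b, most common first.

    calls_a, calls_b: dict of barcode -> label (singlets only; assumed
    to have been filtered beforehand).
    """
    counts = defaultdict(Counter)
    for barcode, label_a in calls_a.items():
        label_b = calls_b.get(barcode)
        if label_b is not None:  # keep only barcodes called by both methods
            counts[label_a][label_b] += 1

    mapping = {}
    for label_a, counter in counts.items():
        total = sum(counter.values())
        mapping[label_a] = [
            (label_b, round(n / total, 4))
            for label_b, n in counter.most_common(top_n)
        ]
    return mapping


# Toy data (made up for illustration, not taken from the real run).
freemuxlet_calls = {
    "AAAC-1": "6", "AAAG-1": "6", "AACC-1": "6", "AAGT-1": "6", "ACGT-1": "2",
}
multiseq_calls = {
    "AAAC-1": "61C04", "AAAG-1": "61C04", "AACC-1": "61C04",
    "AAGT-1": "119A03", "ACGT-1": "109D12",
}

mapping = map_labels(freemuxlet_calls, multiseq_calls)
for label, fractions in mapping.items():
    print(label, fractions)
```

A clean mapping shows one dominant fraction per row, as in the demuxlet example; fractions spread over several sample IDs, as in the freemuxlet row, indicate that the cluster does not correspond to a single sample.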

Do you have any advice on how to debug this, or insights into what could be going on? I haven't tried passing the GWAS variant positions used for demuxlet to freemuxlet as a reference, but I imagine that would give more consistent results. However, I want to be able to use the 1000 Genomes variants, as that seems like another way to independently validate the demultiplexed barcodes. I've also been advised that they are probably more effective for freemuxlet.
Thanks!

@hyunminkang
Contributor

hyunminkang commented Feb 15, 2021 via email

@xmignot
Author

xmignot commented Feb 15, 2021

How would you recommend comparing those generated VCF files to our genotype data? If one of the clusters is ambient mRNA wouldn't you expect to see just one cluster mapping very noisily and then all of the others mapping fairly well to particular sample ids? Or maybe a much higher fraction of DBL assignments?
It's possible that this is the case, but I'm using the filtered_feature_matrix 10x output barcodes, so there should already be some degree of QC. Since this noisiness showed up in both the 5' and 3' data, I'm wondering whether the problem is more likely to be related to the reference VCF file?
Thanks for the prompt reply! I appreciate the help.
Xavier

@hyunminkang
Contributor

hyunminkang commented Feb 16, 2021 via email

@xmignot
Author

xmignot commented Feb 17, 2021

Alright, I will start with that! Thanks for the pointers.
