estimate_1n_coverage_1d_subsets averages over two distinct high peaks #123

Open
KamilSJaron opened this issue Aug 17, 2023 · 0 comments
Labels
0.3.0 Oriel (Upcoming version of smudgeplot), enhancement (New feature or request)

Comments

@KamilSJaron
Owner

About your genome

The tetraploid Saccharomyces (SRR3265401)

The AB/AABB, AAB and AAAB subsets have one major peak each, and they end up being estimated as

30.4 2236.5
20.2 1431.4
14.6 3077.4

The 20.2 is simply a mess-up, but 30.4 and 14.6 differ because of a wrong denominator: the AB subset takes the coverage of the AABB smudge and divides it as if the smudge were AB, which doubles the AB/AABB coverage estimate; 14.6 is close to the truth because the first peak is indeed AAAB.

The weighted mean ends up with an estimate of ~20, in between the two possible interpretations. That is a bit unfortunate; perhaps a weighted median would do better justice. But this should be tested on many, many genomes.
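
To make the comparison concrete, here is a minimal Python sketch of the two aggregations applied to the three rows above. It assumes the first column is the per-subset 1n coverage estimate and the second column is its weight; the issue does not label the columns, so that reading is a guess. The function names `weighted_mean` and `weighted_median` are hypothetical helpers for illustration, not smudgeplot's API.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean of values."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative
    weight reaches at least half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values must not be empty")


# Rows from the issue, in the order AB/AABB, AAB, AAAB.
# Assumption: column 1 is the coverage estimate, column 2 is its weight.
estimates = [30.4, 20.2, 14.6]
weights = [2236.5, 1431.4, 3077.4]

mean_estimate = weighted_mean(estimates, weights)
median_estimate = weighted_median(estimates, weights)

print(f"weighted mean:   {mean_estimate:.1f}")
print(f"weighted median: {median_estimate:.1f}")
```

Under that column assumption the weighted mean comes out at about 21.0, and the weighted median picks 20.2, the middle subset's value. With only three subsets the median always returns one of the three estimates, so whether it helps depends on where the weights fall, which is in line with the note that this needs testing on many genomes.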

@KamilSJaron KamilSJaron added enhancement New feature or request 0.3.0 Oriel Upcoming version of smudgeplot labels Aug 17, 2023