drop reads containing specific set of kmers #6

Open
yannickwurm opened this issue Mar 17, 2015 · 2 comments

Comments

@yannickwurm

Hey @lh3, this looks great.

But we do low-coverage (5x) sequencing of many (non-human) individuals, where removing rare kmers is a bad idea. So our ideal approach is to combine all the data into one big dataset (500-1000x total coverage), use that to identify bad kmers, and dump those kmers to a file. Then we would go through each individual low-coverage dataset and drop the reads containing any of those bad kmers. Could you add an option to bfc to help with this last step? Or is it already hidden somewhere?

Cheers,
Yannick
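The last step described above (scanning each low-coverage dataset and dropping reads that contain any k-mer from a precomputed list) amounts to a set-membership test over every k-mer window of each read. Below is a minimal Python sketch of that filter, not a bfc feature. It assumes the bad k-mers sit in a plain text file with one k-mer per line, that reads are uncompressed, well-formed 4-line FASTQ records, and that k-mers should match on either strand (hence the canonical form, the smaller of a k-mer and its reverse complement). All function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

# Complement table for DNA bases (assumption: only A/C/G/T/N appear).
_COMP = str.maketrans("ACGTN", "TGCAN")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)


def load_bad_kmers(lines: Iterable[str], k: int) -> Set[str]:
    """Hypothetical loader: one k-mer per line; lines of the wrong length are skipped."""
    bad = set()
    for line in lines:
        kmer = line.strip().upper()
        if len(kmer) == k:
            bad.add(canonical(kmer))
    return bad


def has_bad_kmer(seq: str, bad: Set[str], k: int) -> bool:
    """True if any length-k window of seq, in canonical form, is in the bad set."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        if canonical(seq[i:i + k]) in bad:
            return True
    return False


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def filter_reads(records: Iterable[Tuple[str, str, str]],
                 bad: Set[str], k: int) -> Iterator[Tuple[str, str, str]]:
    """Keep only the reads that contain none of the bad k-mers."""
    for header, seq, qual in records:
        if not has_bad_kmer(seq, bad, k):
            yield header, seq, qual


if __name__ == "__main__":
    # Toy example with k = 5: read r1 contains ACGTA and is dropped.
    k = 5
    bad = load_bad_kmers(["ACGTA\n"], k)
    fastq = ["@r1\n", "TTACGTATT\n", "+\n", "IIIIIIIII\n",
             "@r2\n", "GGGGGGGGG\n", "+\n", "IIIIIIIII\n"]
    kept = list(filter_reads(parse_fastq(fastq), bad, k))
    print([h for h, _, _ in kept])
```

A Python set of strings is simple but memory-hungry; with a list of many millions of k-mers, a 2-bit packed encoding or a dedicated k-mer counter would scale much better.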

@lh3
Owner

lh3 commented Mar 17, 2015

500X-1000X total coverage is too much for bfc to handle. You could consider KMC2, though I don't know how long it will take. You may also consider asking @jts and Thomas Kean from Sanger. They are/were doing similar things.

@yannickwurm
Author

OK, understood. Thanks.
