drop reads containing specific set of kmers #6

yannickwurm · 2015-03-17T11:20:42Z

Hey @lh3, this looks great.

But we do low-coverage (5x) sequencing of many (non-human) individuals, where removing rare kmers is a bad idea. So our ideal approach is to combine all data into one big dataset (500-1000x coverage total), use that to identify bad kmers, and dump those kmers to a file. Then we would go through each individual low-coverage dataset and drop the reads containing any kmer from that list. Can you add an option to bfc that can help with this last step? Or is it already hidden somewhere?

Cheers,
Yannick

lh3 · 2015-03-17T16:12:16Z

500X-1000X total coverage is too much for bfc to handle. You could consider KMC2, though I don't know how long it will take. You may also consider asking @jts and Thomas Kean from Sanger. They are/were doing similar things.

yannickwurm · 2015-03-19T10:09:40Z

OK, understood - thanks
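
The last step asked about in the opening comment (dropping reads that contain any kmer from a precomputed list) can be sketched outside bfc in plain Python. This is only an illustration under stated assumptions, not a bfc feature: the function names are hypothetical, the bad-kmer list is assumed to be one kmer per line, the FASTQ is assumed to use exactly four lines per record, and kmers are compared in canonical form (the lexicographically smaller of a kmer and its reverse complement). An in-memory set will not scale to very large kmer lists.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical helper names throughout; none of this is part of bfc or KMC2.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

Record = Tuple[str, str, str, str]  # (header, sequence, separator, qualities)


def canonical(kmer: str) -> str:
    """Return the smaller of a kmer and its reverse complement."""
    kmer = kmer.upper()
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def load_bad_kmers(lines: Iterable[str]) -> Set[str]:
    """Build a set of canonical kmers from lines holding one kmer each."""
    return {canonical(line.strip()) for line in lines if line.strip()}


def has_bad_kmer(seq: str, bad: Set[str], k: int) -> bool:
    """True if any length-k window of seq is in the bad-kmer set."""
    return any(canonical(seq[i:i + k]) in bad
               for i in range(len(seq) - k + 1))


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from an iterable of lines (assumes 4 lines per record)."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it, ""), next(it, ""), next(it, "")
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def filter_reads(records: Iterable[Record], bad: Set[str],
                 k: int) -> Iterator[Record]:
    """Yield only the records whose sequence contains no bad kmer."""
    for record in records:
        if not has_bad_kmer(record[1], bad, k):
            yield record
```

With an open file handle for each input, `filter_reads(read_fastq(fastq_handle), load_bad_kmers(kmer_handle), k)` streams the surviving records, so each low-coverage dataset can be processed one read at a time.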
