Add vsearch uchime_denovo for chimera removal (or flagging) #631

Open
andand opened this issue Aug 30, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

andand commented Aug 30, 2023

Description of feature

We have found that in deeply sequenced COI datasets (about 1 M reads per sample) many chimeras remain after running DADA2, including removeBimeraDenovo() with default settings: 0-15% of the ASVs are chimeric. Running uchime_denovo in vsearch seems to work well for removing the remaining chimeric ASVs in our data without removing "true" ASVs (it allows for some mismatches between parents and children, which removeBimeraDenovo does not by default). It would be nice to have this as an option in nf-core/ampliseq. @johnne knows more about this.
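As a rough illustration of what such an extra step involves, here is a minimal Python sketch that writes ASVs with `;size=N` abundance annotations (which uchime_denovo reads from the headers) and assembles a vsearch command line. The file names, the `batch` prefix, and the helper functions are hypothetical and chosen only for this example; the command is built but not executed, and no tuning options are shown.

```python
import shlex


def to_sized_fasta(asv_seqs, asv_totals):
    """Return FASTA text with ';size=N' annotations, most abundant ASV first."""
    order = sorted(asv_seqs, key=lambda asv: asv_totals[asv], reverse=True)
    return "".join(f">{asv};size={asv_totals[asv]}\n{asv_seqs[asv]}\n" for asv in order)


def uchime_denovo_cmd(fasta_in, prefix):
    """Build (but do not run) a vsearch de novo chimera detection command."""
    return [
        "vsearch",
        "--uchime_denovo", fasta_in,                    # size-annotated ASV FASTA
        "--nonchimeras", f"{prefix}.nonchimeras.fasta",  # ASVs not flagged
        "--chimeras", f"{prefix}.chimeras.fasta",        # ASVs flagged as chimeric
        "--uchimeout", f"{prefix}.uchimeout.tsv",        # per-ASV report incl. parents
    ]


# Toy input (made-up sequences and read totals, for illustration only).
seqs = {"asv1": "ACGTACGT", "asv2": "ACGTTTTT", "asv3": "GGGTACGT"}
totals = {"asv1": 120, "asv2": 900, "asv3": 15}

fasta_text = to_sized_fasta(seqs, totals)
cmd = uchime_denovo_cmd("asvs.sized.fasta", "batch")

print(fasta_text, end="")
print(shlex.join(cmd))
```

The command list could be handed to `subprocess.run` once the FASTA text has been written to the input path.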

andand added the enhancement label Aug 30, 2023
johnne (Contributor) commented Aug 30, 2023

Hi,
Yes, we implemented this additional chimera removal step via vsearch in this ASV-clustering workflow. See here in the Readme for an overview; the relevant rules are in workflow/rules/chimeras.smk. Briefly, chimera detection is run in either 'batchwise' or 'samplewise' mode. Batchwise mode runs the algorithm on the ASVs from all samples together, while samplewise mode first splits the ASV input into one file per sample (based on ASV presence as determined from a counts file) and then runs chimera detection on each of those files.
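The difference between the two modes can be sketched in a few lines of Python. This is a simplified illustration, not the code from the workflow: the nested-dict counts layout, the function names, and the `all_samples` key are assumptions made for the example.

```python
def split_asvs_by_sample(counts):
    """Map each sample to the ASVs present in it (read count > 0)."""
    per_sample = {}
    for asv, row in counts.items():
        for sample, n in row.items():
            if n > 0:
                per_sample.setdefault(sample, {})[asv] = n
    return per_sample


def plan_runs(counts, mode):
    """Return {run_name: {asv: abundance}}, one entry per chimera detection run."""
    if mode == "batchwise":
        # One run over all ASVs, using abundances summed across samples.
        totals = {asv: sum(row.values()) for asv, row in counts.items()}
        return {"all_samples": {asv: n for asv, n in totals.items() if n > 0}}
    if mode == "samplewise":
        # One run per sample, restricted to the ASVs observed in that sample.
        return split_asvs_by_sample(counts)
    raise ValueError(f"unknown mode: {mode!r}")


# Toy counts table: counts[asv][sample] = number of reads.
counts = {
    "asv1": {"s1": 10, "s2": 0},
    "asv2": {"s1": 3, "s2": 7},
}

batch_runs = plan_runs(counts, "batchwise")
sample_runs = plan_runs(counts, "samplewise")
print(batch_runs)
print(sample_runs)
```

In samplewise mode each run would get its own input file and its own set of chimera calls, so a single ASV can be flagged in some samples and not in others.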

The output from vsearch chimera detection is then used to filter out chimeric ASVs using different parameters such as

  • number/fraction of samples shared between chimera and parents
  • number of samples in which an ASV has to be marked as chimeric
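One way such filters could be combined is sketched below in Python. It is an illustration under stated assumptions, not the logic of the workflow: the input dictionaries, the thresholds `min_chimeric_samples` and `min_frac_shared`, and the rule that both criteria must hold are all hypothetical.

```python
def is_removed(asv, flagged_in, present_in, parents,
               min_chimeric_samples=1, min_frac_shared=0.5):
    """Decide whether an ASV is dropped as chimeric (hypothetical criteria)."""
    # Criterion 1: flagged as chimeric in enough samples.
    if asv not in parents:
        return False
    if len(flagged_in.get(asv, set())) < min_chimeric_samples:
        return False
    # Criterion 2: both parents co-occur with the ASV in a large enough
    # fraction of the samples where the ASV itself is present.
    own = present_in.get(asv, set())
    if not own:
        return False
    parent_a, parent_b = parents[asv]
    shared = own & present_in.get(parent_a, set()) & present_in.get(parent_b, set())
    return len(shared) / len(own) >= min_frac_shared


# Toy data: samples where each ASV occurs, samples where it was flagged,
# and the two putative parents reported for each flagged ASV.
present_in = {
    "p1": {"s1", "s2", "s3"},
    "p2": {"s1", "s2"},
    "chim": {"s1", "s2"},
    "rare": {"s1", "s2", "s3", "s4"},
}
flagged_in = {"chim": {"s1", "s2"}, "rare": {"s1"}}
parents = {"chim": ("p1", "p2"), "rare": ("p1", "p2")}

kept = sorted(
    asv for asv in present_in
    if not is_removed(asv, flagged_in, present_in, parents,
                      min_chimeric_samples=2, min_frac_shared=0.5)
)
print(kept)
```

With these toy thresholds, `chim` is dropped because it is flagged in two samples and always co-occurs with both parents, while `rare` is kept because it is flagged in only one sample.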

Let me know if you want to discuss this and how to implement it in ampliseq.

d4straub (Collaborator) commented

Hi!
Chimera removal is important, so I think this is indeed interesting.

> Running uchime_denovo in vsearch seems to work well in removing remaining chimeric ASVs in our data (this allows for some mismatches between parents and children, which default removeBimeraDenovo does not) without removing "true" ASVs.

Yes, by default removeBimeraDenovo indeed does not allow mismatches between chimera and parents, but in ampliseq one could change this behavior by overriding that line with a custom config, e.g. by passing -c chimera.config, where chimera.config contains:

process {
    withName: DADA2_RMCHIMERA {
        ext.args = 'method="consensus", minSampleFraction = 0.9, ignoreNNegatives = 1, minFoldParentOverAbundance = 2, minParentAbundance = 8, allowOneOff = TRUE, minOneOffParentDistance = 4, maxShift = 16'
    }
}

Would you be able to test whether that improves chimera removal for your case in a similar manner? (I just want to make sure the existing settings do not already cover this.)
