Allow to analyse 454 sequencing data #669

Open
d4straub opened this issue Dec 1, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@d4straub
Collaborator

d4straub commented Dec 1, 2023

Description of feature

Idea

454 sequencing is probably less widely used for amplicon sequencing than Illumina MiSeq nowadays, but it was popular some time ago and is still in use. It would be good to allow analysing 454 data with this pipeline as well.

Evaluation of requirements for analysing 454 data

Requirements

In order to allow standardized analysis of 454 data with nf-core/ampliseq, only minor additions would be needed. According to https://benjjneb.github.io/dada2/faq.html#can-i-use-dada2-with-my-454-or-ion-torrent-data, 454 sequencing data should be analysed with:

  • dada(..., HOMOPOLYMER_GAP_PENALTY=-1, BAND_SIZE=32)
  • filterAndTrim(..., maxLen=XXX) # XXX depends on the chemistry

Already available

This is quite close to the settings recommended for IonTorrent data, which are already implemented in nf-core/ampliseq via --iontorrent:

  • dada(..., HOMOPOLYMER_GAP_PENALTY=-1, BAND_SIZE=32)
  • filterAndTrim(..., trimLeft=15)

Using --iontorrent currently has the following effects:

  • single-end reads are expected
  • the forward and reverse primers are both expected to be present in the read, see here
  • filterAndTrim is run with trimLeft = 15, see here
  • denoising is run with BAND_SIZE = 32, HOMOPOLYMER_GAP_PENALTY = -1, see here
  • taxonomic classification is also attempted with the reverse complement, see here & here

I have never worked with 454 data, but a quick search suggests that it is single-end and that primers are typically expected at the beginning and end of the read, so that all seems fine. However, --iontorrent is similar but imperfect for 454, because it includes trimLeft=15, which isn't recommended for 454 data.

Short term solution for analysing 454 data

Warning: not tested, theoretical solution! Feedback needed!

In the current pipeline (v2.7.1), this imperfection of --iontorrent could easily be overridden with -c pyroseq.config, where the config file pyroseq.config contains:

process {
    withName: DADA2_FILTNTRIM {
        // trimLeft is omitted on purpose, so filterAndTrim falls back to its default (no left trimming)
        ext.args = [
            'maxN = 0, truncQ = 2, trimRight = 0, minQ = 0, rm.lowcomplex = 0, orient.fwd = NULL, matchIDs = FALSE, id.sep = "\\\\s", id.field = NULL, n = 1e+05, OMP = TRUE, qualityType = "Auto"',
            "maxEE = ${params.max_ee}",
            // maxLen falls back to Inf when --max_len is not set
            "minLen = ${params.min_len}, maxLen = ${params.max_len ?: 'Inf'}, rm.phix = TRUE"
        ].join(',').replaceAll('(,)*$', "")
        publishDir = [
            path: { "${params.outdir}/dada2/args" },
            mode: params.publish_dir_mode,
            pattern: "*.args.txt"
        ]
    }
}

In addition, --max_len should be set appropriately.

To conclude: currently, for 454 data, use --iontorrent -c pyroseq.config --max_len <int>, where the config file is as described above and <int> depends on the chemistry.

Implementation

An additional parameter such as --454 could be added that mirrors the --iontorrent settings except for filterAndTrim(..., trimLeft=15).

In addition, 454 test data would need to be added and the usage documentation updated.
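
As a rough, untested sketch of how such a parameter might be wired into the process configuration: the flag name `pyroseq` is hypothetical (it is not an existing pipeline parameter, and was chosen here only because a purely numeric name such as `454` is awkward to reference as a Groovy property). The fragment only covers the DADA2_FILTNTRIM step discussed above; the other effects of --iontorrent (single-end reads, primer handling, denoising settings, reverse-complement classification) would also have to respond to the new flag.

```groovy
// Hypothetical sketch, not the pipeline's actual configuration.
// `params.pyroseq` is an assumed flag name for 454 data.
process {
    withName: DADA2_FILTNTRIM {
        ext.args = [
            'maxN = 0, truncQ = 2, trimRight = 0, minQ = 0, rm.lowcomplex = 0, orient.fwd = NULL, matchIDs = FALSE, id.sep = "\\\\s", id.field = NULL, n = 1e+05, OMP = TRUE, qualityType = "Auto"',
            "maxEE = ${params.max_ee}",
            // IonTorrent keeps trimLeft = 15; 454 and all other data use no left trimming
            (params.iontorrent && !params.pyroseq) ? "trimLeft = 15" : "trimLeft = 0",
            // maxLen falls back to Inf when --max_len is not set
            "minLen = ${params.min_len}, maxLen = ${params.max_len ?: 'Inf'}, rm.phix = TRUE"
        ].join(',').replaceAll('(,)*$', "")
    }
}
```

With a flag like this, 454 users would no longer need a separate pyroseq.config, only the flag itself plus an appropriate --max_len.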

@d4straub d4straub added the enhancement New feature or request label Dec 1, 2023
@erikrikarddaniel
Member

As soon as we know that Tobias' group (or someone else) gets this to work, it would be great to add. If someone from the group would like to contribute, it could be a perfect beginner's task.

My only, very slight, comment is that perhaps params can't be all numbers?

@d4straub
Collaborator Author

d4straub commented Dec 1, 2023

perhaps params can't be all numbers?

Quite possible! Never tested :D
