Develop full suite of tests for manual execution #501

Open
5 of 15 tasks
jfy133 opened this issue Sep 1, 2023 · 4 comments
Labels: enhancement (New feature or request)

Comments

@jfy133
Member

jfy133 commented Sep 1, 2023

Description of feature

A major problem we currently have during development is that our CI tests are nowhere near comprehensive enough, because the pipeline uses extremely large database files that do not fit within GitHub Actions (GHA) resource allocations.

We should develop and document a suite of manual tests that developers run on their own infrastructure to ensure the pipeline is working as intended.

mag missing configs and tests

For Automated CI

For manual CI

Does not need a database
Databases on AWS
  • Config five (shared with below)
    • CAT
    • GTDB
Databases NOT on AWS
  • Config five (shared with above)
    • CheckM (in CI but not in a config)
    • GUNC
    • Metaeuk
@prototaxites
Contributor

Metaeuk

For MetaEuk, specifying params.metaeuk_mmseqs_db = "UniProtKB/Swiss-Prot" only entails downloading a small database. From a quick check, the FASTA it's based on is only 87 MB, so it should potentially be feasible to run this in automated CI?

@jfy133
Member Author

jfy133 commented Feb 16, 2024

@prototaxites

Yeah that definitely should be feasible! Is it a single file with a public URL?

@prototaxites
Contributor

> @prototaxites
>
> Yeah that definitely should be feasible! Is it a single file with a public URL?

"UniProtKB/Swiss-Prot" is the string passed to the mmseqs databases command, which downloads the latest release of the database AFAIK. Now that I think about it, I'm not sure there's a way to specify a version, unfortunately, which limits reproducibility.

An alternative would be to specify the URL of a FASTA file via --metaeuk_db. In the MetaEuk module test, I passed it the yeast .faa from the test-data repo (https://github.com/nf-core/modules/blob/master/tests/modules/nf-core/metaeuk/easypredict/main.nf), which seemed to work OK, but it might be better to find a prokaryotic file to use with the test data.
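
The two ways of supplying a MetaEuk database discussed above can be sketched as a small helper that builds the extra pipeline arguments for a manual test run. This is only an illustration under stated assumptions: the `--metaeuk_mmseqs_db` command-line form is assumed from `params.metaeuk_mmseqs_db` via Nextflow's usual `--name` to `params.name` mapping, and the pipeline name `nf-core/mag`, the `test,docker` profile, and the example FASTA URL are placeholders rather than settings confirmed in this thread.

```python
import shlex


def metaeuk_db_args(mmseqs_db=None, fasta_url=None):
    """Return extra CLI arguments selecting exactly one MetaEuk database source."""
    # Exactly one of the two options must be given.
    if (mmseqs_db is None) == (fasta_url is None):
        raise ValueError("give exactly one of mmseqs_db or fasta_url")
    if mmseqs_db is not None:
        # Name handed on to `mmseqs databases`; always fetches the latest release.
        return ["--metaeuk_mmseqs_db", mmseqs_db]
    # A fixed FASTA URL keeps the input stable between runs.
    return ["--metaeuk_db", fasta_url]


# Assumed base command (pipeline name and profile are placeholders).
BASE = ["nextflow", "run", "nf-core/mag", "-profile", "test,docker"]

latest = BASE + metaeuk_db_args(mmseqs_db="UniProtKB/Swiss-Prot")
# Hypothetical URL, standing in for a small protein FASTA in a test-data repo.
pinned = BASE + metaeuk_db_args(fasta_url="https://example.org/proteins.faa")

print(shlex.join(latest))
print(shlex.join(pinned))
```

The second form trades database breadth for a stable input, which is the reproducibility concern raised above.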

@jfy133
Member Author

jfy133 commented May 23, 2024

List of tools that need to be covered somehow, and the config in which each is currently covered:

| tool | config | comment |
|---|---|---|
| adapterremoval | test_adapterremoval | maybe could be moved into ancient-dna, as they are people who mostly use it? |
| aria2 | | |
| bbmap/bbnorm | test_bbnorm | Short test |
| bcftools | test_ancient_dna | |
| cat | | |
| checkm | | |
| centrifuge | test | |
| concoct | | |
| dastool | | |
| fastp | test | |
| fastqc | test | |
| freebayes | test_ancient_dna | |
| genomad | | |
| gtdbtk | | |
| gunc | | |
| gunzip | test | |
| krona | test | |
| maxbin | test | |
| metabat2 | test | |
| metaeuk | test_adapterremoval | |
| mmseqs | | |
| multiqc | test | |
| prodigal | test | |
| prokka | test | |
| pydamage | test_ancient_dna | |
| samtools | test_ancient_dna | |
| seqtk | test_bbnorm | Short test |
| tiara | test_adapterremoval | note has a special DASTOOL_FASTATOCONTIGBIN_TIARA process that doesn't actually run DASTOOL! |
| bowtie2 (phix) | test | |
| bowtie2 (host) | | |
| bowtie2 (assembly) | test | |
| busco | test | |
| CAT | | |
| filtlong | | |
| kraken2 | test | |
| megahit | test | |
| spades | test | |
| spadeshybrid | | |
| nanolyse | | |
| nanopore | | |
| porechop | | |
| quast | test | |
| tiara | | |
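
To see at a glance which tools still have no test config, the table above can be summarised with a short script. This is a minimal sketch, not part of the pipeline: the rows are transcribed by hand from the table as `tool=config` pairs (an empty config means the tool is not yet covered), and the names `ROWS`, `parse_rows`, and `summarise` are hypothetical.

```python
from collections import defaultdict

# Rows transcribed from the table above, in order, as "tool=config".
# An empty config means no test config currently exercises the tool.
ROWS = """
adapterremoval=test_adapterremoval; aria2=; bbmap/bbnorm=test_bbnorm; bcftools=test_ancient_dna
cat=; checkm=; centrifuge=test; concoct=; dastool=; fastp=test; fastqc=test
freebayes=test_ancient_dna; genomad=; gtdbtk=; gunc=; gunzip=test; krona=test
maxbin=test; metabat2=test; metaeuk=test_adapterremoval; mmseqs=; multiqc=test
prodigal=test; prokka=test; pydamage=test_ancient_dna; samtools=test_ancient_dna
seqtk=test_bbnorm; tiara=test_adapterremoval; bowtie2 (phix)=test; bowtie2 (host)=
bowtie2 (assembly)=test; busco=test; CAT=; filtlong=; kraken2=test; megahit=test
spades=test; spadeshybrid=; nanolyse=; nanopore=; porechop=; quast=test; tiara=
"""


def parse_rows(text):
    """Return a list of (tool, config) pairs; config is None when blank."""
    pairs = []
    for entry in text.replace("\n", ";").split(";"):
        entry = entry.strip()
        if not entry:
            continue
        tool, _, config = entry.partition("=")
        pairs.append((tool.strip(), config.strip() or None))
    return pairs


def summarise(pairs):
    """Group tools by config and collect the rows that have no config."""
    by_config = defaultdict(list)
    uncovered = []
    for tool, config in pairs:
        if config is None:
            uncovered.append(tool)
        else:
            by_config[config].append(tool)
    return dict(by_config), uncovered


pairs = parse_rows(ROWS)
by_config, uncovered = summarise(pairs)

print(f"{len(pairs) - len(uncovered)} of {len(pairs)} rows have a config")
print("Rows without a config:", ", ".join(uncovered))
for config, tools in sorted(by_config.items()):
    print(f"{config}: {', '.join(tools)}")
```

Rows are kept as a list rather than a dictionary because tiara appears twice in the table, once with a config and once without.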

Additional:

| context | config |
|---|---|
| samplesheet input | |
| assembly input | |

Got up to test_bbnorm, next is test_binrefinement
