The annotation_preprocessing new pipeline does not filter out the contigs less than 1000 nucleotides #90

Open
LucileSol opened this issue Mar 29, 2023 · 7 comments


@LucileSol
Contributor

The new annotation_preprocessing pipeline does not filter out contigs shorter than 1000 nucleotides.
The old pipeline did, so for now we have to filter manually whenever an assembly contains contigs shorter than 1000 nucleotides.
To be fixed eventually.

We can use https://github.com/NBISweden/GAAS/blob/master/bin/gaas_fasta_purify.pl for now (I think I need to test it first).
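As a stopgap while the purify step is being investigated, the manual filtering can be sketched with plain awk. This is not the GAAS implementation: the function name filter_fasta_min_length is made up for illustration, and keeping records whose length equals the threshold is an assumption, since the thread does not say how gaas_fasta_purify.pl treats that boundary.

```shell
# Hypothetical helper (not part of GAAS or the pipeline).
# Reads FASTA on stdin and writes only records whose sequence length
# is >= the minimum given as $1 (inclusive boundary is an assumption).
# Sequences are written on a single line, so line wrapping is not preserved.
filter_fasta_min_length() {
    awk -v min="$1" '
        # Print the buffered record if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min) {
                print header
                print seq
            }
        }
        # A new header closes the previous record and starts a new one.
        /^>/ { emit(); header = $0; seq = ""; next }
        # Sequence lines: drop whitespace and carriage returns, then append.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END { emit() }
    '
}
```

For example, `filter_fasta_min_length 1000 < genome_uppercase.fa > genome_filtered.fa`, where the output file name is arbitrary.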

@mahesh-panchal
Collaborator

Can you check the script written by Nextflow (.command.sh) to see whether it contains --size 1000?

@LucileSol
Contributor Author

Yes:

#!/bin/bash -ue
gaas_fasta_purify.pl \
    --size 1000 \
    --infile genome_uppercase.fa \
    --output genome_uppercase_purified

cat <<-END_VERSIONS > versions.yml
"ANNOTATION_PREPROCESSING:ASSEMBLY_PURIFY":
    gaas: 1.2.0
END_VERSIONS

@LucileSol
Contributor Author

LucileSol commented Mar 29, 2023

And gaas_fasta_purify.pl does not remove the contigs, or at least not anymore. I ran it separately and the contigs were still there.
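One way to make "the contigs were still there" concrete is to list every record in the output FASTA that falls under the threshold. The sketch below is an independent awk check, not something GAAS provides; list_short_contigs is a made-up name, and the record ID is assumed to be the first whitespace-delimited token after ">".

```shell
# Hypothetical helper (not part of GAAS or the pipeline).
# Reads FASTA on stdin and prints "<id><TAB><length>" for every record
# whose sequence is shorter than the minimum given as $1.
list_short_contigs() {
    awk -v min="$1" '
        # Report the buffered record if it is below the threshold.
        function report() {
            if (have && len < min) printf "%s\t%d\n", id, len
        }
        # Header line: close the previous record, take the ID from the first token.
        /^>/ { report(); id = substr($1, 2); len = 0; have = 1; next }
        # Sequence line: ignore whitespace and carriage returns when counting.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { report() }
    '
}
```

Running `list_short_contigs 1000 < purified.fa` (with purified.fa standing in for whichever file the purify step wrote) prints nothing if the filter did its job.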

@mahesh-panchal
Collaborator

Then check whether the --size option was renamed in a version update.

@Juke34
Collaborator

Juke34 commented Mar 29, 2023

Interesting. GAAS has been on the same release since 2020 (v1.2), so the script should still work the same way.

@mahesh-panchal
Collaborator

Is this still an issue? Can you provide some data I can use to replicate it?

@mahesh-panchal
Collaborator

The GAAS script works on its own, and the module works independently of the workflow.
I also tested the full workflow with a sample file:

>seq1
ACGTACGTACGT
>seq2
ACGTACGT
>seq3
ACGTACGTACGT

custom.config:

process {
    withName: 'ASSEMBLY_PURIFY' {
        ext.args = '--size 10'
    }
}

command:

nextflow run main.nf -profile test,docker,gitpod --subworkflow 'annotation_preprocessing' -c custom.config --genome sample.fasta

This also runs successfully.

purified file:

>seq1
ACGTACGTACGT
>seq3
ACGTACGTACGT
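To confirm which records a purify run dropped, the IDs in the input and purified files can be compared directly. This is a minimal sketch with a made-up function name (removed_ids); it assumes the ID is the first token after ">" and that the two paths are different files.

```shell
# Hypothetical helper (not part of GAAS or the pipeline).
# $1: original FASTA, $2: purified FASTA.
# Prints, in input order, the IDs present in $1 but absent from $2.
removed_ids() {
    awk '
        /^>/ {
            id = substr($1, 2)
            if (FILENAME == ARGV[1]) {
                order[++n] = id      # remember input IDs in order
            } else {
                kept[id] = 1         # IDs that survived purification
            }
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in kept)) print order[i]
        }
    ' "$1" "$2"
}
```

With the sample above as input and the purified file as output, `removed_ids sample.fasta purified.fa` should print only seq2, the single record shorter than the --size 10 threshold (purified.fa is a placeholder name).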

I'm not able to replicate the issue.
