Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bwameth index creation fails #352

Open
FutureFellwalker opened this issue Oct 23, 2023 · 7 comments
Open

bwameth index creation fails #352

FutureFellwalker opened this issue Oct 23, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@FutureFellwalker
Copy link

Description of the bug

Hi.

I'm encountering an error with the methylsig pipeline when using bwameth aligner. The pipeline runs fine with Bismark, but my mapping is low (~40%) and I want to compare against bwameth.

The error seems to be in preparing the reference genome index. I tried saving the reference genome locally and running offline, but encounter the same issue. Tried with both methylsig 2.4.0 and 2.5.0.

Command used and terminal output

Command input:

~/nextflow run /path/to/nf-core-methylseq_2.5.0/2_5_0/ \
--input samplesheet.csv \
--outdir nfcore-methylseq-2.5.0-bwameth \
--genome hg38 \
--save_reference \
--rrbs \
--pbat \
--aligner bwameth \
--methyl_kit \
-profile singularity \
--email user@email.com

Error:
ERROR ~ Error executing process > 'NFCORE_METHYLSEQ:METHYLSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX'

Caused by:
  Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/ngi-igenomes/igenomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa))


Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Relevant files

nextflow.log

System information

Nextflow version 23.04.3
HPC, LSF executor
nf-core/methylseq 2.4.0 and 2.5.0
Singularity container

@FutureFellwalker FutureFellwalker added the bug Something isn't working label Oct 23, 2023
@mz448
Copy link

mz448 commented Nov 6, 2023

Hey,
I have successfully run the pipeline using --aligner bismark

Issue description:

I am having the same issue reported here when I am trying to repeat the analysis using bwameth instead of bismark:

Test

The following test works fine

nextflow run nf-core/methylseq --aligner bwameth --outdir $RefDir -profile test,conda

Summary of my experimental conditions:

  • Reference .fasta genome
  • fasta index for --bwa_meth_index generated with bwameth.py index Genome.fasta
  • 4 samples (3 biological replicates each (Double paired-end reads for each replicate)). These are *.fastq.gz files
  • I can not provide a single --fasta_index, nor do I know how to feed multiple sample indexes into the pipeline. I assume this is something that the pipeline does by itself using the --input $SampleSheet.csv , but am I wrong?

Command used and error message:

Pipeline Info

N E X T F L O W ~ version 23.10.0
nf-core/methylseq v2.5.0-g66c6138
Conda

nextflow run nf-core/methylseq \
--input $SampleSheet.csv \
--fasta $Genome.fasta \
--outdir $RefDir \
--bwa_meth_index $DirContainingRefFastaGenome \
--email user@email.com \
--aligner bwameth \
--comprehensive \
--max_cpus 40 \
--max_memory 400.GB \
-profile conda

Error: 
-[nf-core/methylseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_METHYLSEQ:METHYLSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX'

Caused by:
  Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/local/storage/Projects/ppar_emseq/data/005_ppar_emseq_muscle/sub/P_parae_wMito+lambda+puc19.fasta))


Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

I have tried including and excluding the following flags, but no version has worked:

--save_align_intermeds \
--ignore_flags \
--save_reference \ 

I appreciate any help! :)

Maximiliano

@mz448
Copy link

mz448 commented Nov 9, 2023

Mmm, I guess I found the problem.
This is NOT a bug. It is the result of insufficient input to the pipeline with bwameth

You need to provide the 2 indexed versions of the genome

  1. --bwa_meth_index $DirContainingRefGenome versions \

Install bwameth and use bwameth.py --reference $RefGenome.fasta
This will produce a series of indexed converted genomes in a container folder (Later, you will use that container folder path as the argument in the --bwa_meth_index flag)

  1. --fasta_index $RefGenome.fai \

Use SamTools samtools faidx $RefGenome.fasta
It will produce a $RefGenome.fai
This is an index of the non-converted genome
See a good explanation here

@ewels
Copy link
Member

ewels commented Feb 3, 2024

This is NOT a bug. It is the result of insufficient input to the pipeline with bwameth

I disagree. You've found your way around the bug by providing more inputs, meaning that the pipeline did not need to create them. However, the pipeline should automatically generate the index files itself if only given a Fasta file.

This is what we need to fix:

  Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/local/storage/Projects/ppar_emseq/data/005_ppar_emseq_muscle/sub/P_parae_wMito+lambda+puc19.fasta))

Looks like there's something wrong with how the channels are being built / supplied.

@ewels ewels changed the title methylseq - bwameth index failure, pipeline does't run, but bismark does bwameth index creation fails Feb 3, 2024
@jkh00
Copy link

jkh00 commented Feb 26, 2024

Hi,

I also encountered the same issue when trying to run bwameth without supplying the indexes. But in my case, supplying --fasta_index was sufficient to work around the bug (my run succeeded), which narrowed down and suggested that the issue arises somewhere during the process of building index using samtools faidx.

@drothen15
Copy link

Has this been fixed? I'm still having the same error when trying to use bwameth as the aligner

@oliviapetrillo
Copy link

Has this been fixed? I'm still having the same error when trying to use bwameth as the aligner

@drothen15 Not sure if a fix has been included in a release yet, but I was able to fix this locally by modifying the SAMTOOLS_FAIDX process inputs.

In the prepare_genome.nf file, when calling the SAMTOOLS_FAIDX process if the params.fasta_index is not specified, there is an issue when trying to pass a value channel in as part of the tuple. I think it would work if you either 1) removed the tuple, or 2) switched the ch_fasta input to be just the file(params.fasta) instead of Channel.value(file(params.fasta)).

I resolved it by removing the tuple altogether, and then updating the main.nf containing the SAMTOOLS_FAIDX process to expect a val and path input, instead of a tuple with a val and path. If you just change the original process input as specified in prepare_genome.nf to be a file instead of the value channel, there should be no need to update the SAMTOOLS_FAIDX:main.nf.

One solution (that I used):

// inside prepare_genome.nf
SAMTOOLS_FAIDX([:], ch_fasta)

instead of

SAMTOOLS_FAIDX([[:], ch_fasta])

And then modifying SAMTOOLS_FAIDX:main.nf:

input: 
val(meta)
path(fasta)

instead of

input: 
tuple val(meta), path(fasta)

@LucaZanella15
Copy link

Mmm, I guess I found the problem. This is NOT a bug. It is the result of insufficient input to the pipeline with bwameth

You need to provide the 2 indexed versions of the genome

  1. --bwa_meth_index $DirContainingRefGenome versions \

Install bwameth and use bwameth.py --reference $RefGenome.fasta
This will produce a series of indexed converted genomes in a container folder (Later, you will use that container folder path as the argument in the --bwa_meth_index flag)

  1. --fasta_index $RefGenome.fai \

Use SamTools samtools faidx $RefGenome.fasta
It will produce a $RefGenome.fai
This is an index of the non-converted genome
See a good explanation here

Sorry, I have a doubt: is the input to --bwa_meth_index, i.e. $DirContainingRefGenome, the output of the bwameth.py --reference $RefGenome.fasta as mentioned in this reply or the output of bwameth.py index $RefGenome.fasta?

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants