Both '--read_length' and '--macs_gsize' not specified! Please specify either to infer MACS2 genome size for peak calling. #328

Open
ariadnaaterrades opened this issue Jan 10, 2023 · 1 comment
Labels
docs Documentation enhancement

Comments

@ariadnaaterrades

Description of the bug

Hi,

I tried to run the pipeline and hit an error that did not show up the last time I ran it:

./nextflow run nf-core/chipseq --single_end --input ./design.csv --outdir ./Results_chipseq --genome GRCh38 -profile singularity --narrow_peak

Both '--read_length' and '--macs_gsize' not specified! Please specify either to infer MACS2 genome size for peak calling.

According to the pipeline's documentation (https://nf-co.re/chipseq/2.0.0/parameters), --read_length is required for peak calling if --macs_gsize is not provided. However, the description of --macs_gsize says that it is not necessary to provide --macs_gsize as long as --genome has been provided.

Having said that, I was wondering whether the documentation has not been updated yet because the --read_length parameter was introduced in the latest release. If this is the case, I'd really appreciate it if you could indicate where we can find the --read_length and --macs_gsize information for the different genomes so that we can run the pipeline. If it is not the case, I'd really appreciate it if you could check the code to find out what is going on.

Thank you,
Ariadna

Command used and terminal output

`./nextflow run nf-core/chipseq --single_end --input ./design.csv --outdir ./Results_chipseq --genome GRCh38 -profile singularity --narrow_peak`

Both '--read_length' and '--macs_gsize' not specified! Please specify either to infer MACS2 genome size for peak calling.

Relevant files

No response

System information

N E X T F L O W ~ version 22.04.3
nf-core/chipseq v2.0.0

@ariadnaaterrades ariadnaaterrades added the bug Something isn't working label Jan 10, 2023
@JoseEspinosa
Member

Hi @ariadnaaterrades
This is the intended behavior. You need to either provide the MACS2 genome size explicitly with the --macs_gsize parameter or provide the length of your reads with the --read_length parameter. When --read_length is set together with a genome available in the igenomes config, the MACS2 gsize is retrieved from the corresponding map. The reason is that the genome size is different for different read lengths. If the genome is not available in the igenomes config, the pipeline calculates the MACS2 gsize using the unique-kmers.py script of khmer, but for this we again need to know the read length, which is set by --read_length. We discussed setting a default read length, but we were afraid that some users would then just use the default value without being aware of the behavior described above. Does it make sense to you now?
Anyway, we should probably improve the documentation regarding this behavior.
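
For readers who want the decision logic from the reply above in one place, here is a minimal Python sketch. It is not the pipeline's code (the pipeline is written in Nextflow); the function name `resolve_macs_gsize`, its arguments, and the idea of returning `None` to signal "estimate from the genome FASTA" are all assumptions made for illustration, and the genome sizes in the usage note are placeholders, not real values.

```python
from typing import Mapping, Optional

# Error text copied from the pipeline output quoted in this issue.
ERROR_MESSAGE = (
    "Both '--read_length' and '--macs_gsize' not specified! "
    "Please specify either to infer MACS2 genome size for peak calling."
)


def resolve_macs_gsize(
    macs_gsize: Optional[float],
    read_length: Optional[int],
    genome_gsize_map: Optional[Mapping[int, float]],
) -> Optional[float]:
    """Hypothetical helper mirroring the behavior described in the reply.

    genome_gsize_map maps read length -> genome size for one genome,
    or is None when the genome is not in the igenomes config.
    """
    # 1. An explicit --macs_gsize always wins.
    if macs_gsize is not None:
        return macs_gsize

    # 2. Without --macs_gsize, --read_length is mandatory.
    if read_length is None:
        raise ValueError(ERROR_MESSAGE)

    # 3. Known genome: look the size up by read length.
    if genome_gsize_map is not None and read_length in genome_gsize_map:
        return genome_gsize_map[read_length]

    # 4. Otherwise the size has to be estimated from the genome FASTA
    #    (the reply says khmer's unique-kmers.py is used for this);
    #    None signals that case in this sketch.
    return None
```

With a placeholder map such as `{50: 1000.0, 100: 1200.0}`, calling `resolve_macs_gsize(None, 50, {50: 1000.0, 100: 1200.0})` returns the entry for 50 bp reads, while calling it with both `macs_gsize` and `read_length` set to `None` raises the error reported in this issue.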

@JoseEspinosa JoseEspinosa added docs Documentation enhancement and removed bug Something isn't working labels Apr 17, 2023
@JoseEspinosa JoseEspinosa mentioned this issue Apr 19, 2023
5 tasks