
BUG: Workflow not running #681

Open
JihedC opened this issue Apr 13, 2021 · 20 comments
Labels
bug Something isn't working

Comments

@JihedC

JihedC commented Apr 13, 2021

Hi,

I have been trying to run the chip-seq workflow of seq2science. It starts but stops when 7% of the jobs are done.

seq2science --version
seq2science: v0.5.1

To Reproduce
Please include your config.yaml, your samples.tsv, and the complete/relevant output.

Both config.yaml and samples.tsv were generated with seq2science init chip-seq

  • config.yaml:
# tab-separated file of the samples
samples: samples.tsv

# pipeline file locations
result_dir: ./results  # where to store results
genome_dir: ./genomes  # where to look for or download the genomes
# fastq_dir: ./results/fastq  # where to look for or download the fastqs


# contact info for multiqc report and trackhub
email: yourmail@here.com

# produce a UCSC trackhub?
create_trackhub: true

# how to handle replicates
biological_replicates: fisher  # change to "keep" to not combine them
technical_replicates: merge    # change to "keep" to not combine them

# which trimmer to use
trimmer: fastp

# which aligner to use
aligner: bwa-mem2

# filtering after alignment
remove_blacklist: true
min_mapping_quality: 30
only_primary_align: true

# peak caller
peak_caller:
  macs2:
      --keep-dup 1 --buffer-size 10000

## differential gene expression analysis
#contrasts:
#  - 'descriptive_name_all_HEL'
  • samples.tsv:
# for help with filling out the samples.tsv:
# https://vanheeringen-lab.github.io/seq2science/content/workflows/chip_seq.html#filling-out-the-samples-tsv
# also make sure that you use tab as a delimiter
sample  assembly        descriptive_name
GSM4404624      hg38    HEL

I get several error messages; the complete log file is attached:
seq2science.2021-04-13T103059.065792.log

The log file seq2science/results/log/bwa-mem2_index/hg38.log contains:

Looking to launch executable "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/7fa92a1c/bin/bwa-mem2.avx", simd = .avx
Launching executable "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/7fa92a1c/bin/bwa-mem2.avx"
[bwa_index] Pack FASTA... 18.78 sec
* Entering FMI_search
init ticks = 204386466299
ref seq len = 6199501436
binary seq ticks = 136647971146

These are the files I have in the genome folder:

(seq2science) jchouaref@res-hpc-exe028:/exports/humgen/jihed/seq2science/genomes/hg38$ tree
.
├── hg38.annotation.bed.gz
├── hg38.annotation.gtf.gz
├── hg38.fa
├── hg38.fa.fai
├── hg38.fa.sizes
├── hg38.gaps.bed
├── index
├── README.txt
└── tmpevip0jtt

Do you think the problem comes from there?

@JihedC JihedC added the bug Something isn't working label Apr 13, 2021
@Maarten-vd-Sande
Member

Is it possible that a message is being printed by the shell itself? Those messages are not captured in stdout/stderr, so they won't end up in the log, but they will be printed in your terminal. Segfaults and memory issues are examples of this.

Maybe setting aligner: bwa-mem instead of bwa-mem2 helps in this case? bwa-mem2 is extremely memory hungry.
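When a tool is killed by a segfault or by the kernel's out-of-memory killer, it usually dies before it can write anything to its own log, which would fit the truncated bwa-mem2 log above. As a minimal sketch (the helper `describe_exit` is hypothetical, not part of seq2science or Snakemake), this is how a wrapper could turn a child process's exit status into a readable reason. On POSIX systems, Python's `subprocess` reports "killed by signal N" as a negative return code `-N`:

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a human-readable reason (POSIX)."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess encodes death-by-signal N as returncode == -N
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Simulate an OOM kill: the child sends SIGKILL to itself.
    child = subprocess.run(
        [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    )
    print(describe_exit(child.returncode))
```

If the tool is started through a shell instead, the shell reports such a death as status 128 + N (for example 137 for SIGKILL), so a wrapper would need to check for that form too.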

@JihedC
Author

JihedC commented Apr 13, 2021

The cluster was a bit busy today; I hope it will run during the night.

@JihedC
Author

JihedC commented Apr 14, 2021

Hi Maarten,

I have good news: I also tried the atac-seq workflow yesterday and it worked just fine. I'll try it today with my own samples.

Concerning the chip-seq workflow, I tried the modification you suggested (genome hg38, aligner: bwa-mem), and this time it produced the bwa index. So that's at least one thing we know now; I'll ask for more memory the next time I try bwa-mem2. But the jobs are still blocked at rule complement_blacklist, with this error:

Error: The genome file /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes has no valid entries (are you sure it's a 2-column bedtools genome file). Exiting.

The file hg38.fa.sizes is empty.

Maybe the problem comes from this:

[Tue Apr 13 18:48:33 2021]
localrule get_genome_support_files:
    input: /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa
    output: /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.fai, /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes, /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.gaps.bed
    jobid: 45
    wildcards: assembly=hg38

Warning: the following output files of rule get_genome_support_files were not present when the DAG was created:
{'/exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes'}

I tried to dig into the rules to find out how hg38.fa.sizes is created from hg38.fa, but I can't find it in the Python script.

Do you have any idea what the problem could be?
For now I will try to use this hg.fa.sizes, assuming that the file contains the chromosome sizes.

Here are attached:
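As the bedtools error message suggests, a `.fa.sizes` file is a two-column, tab-separated list of sequence names and lengths. Those two values are also the first two columns of the FASTA index (`.fa.fai`), which the same `get_genome_support_files` rule produces. A minimal stopgap sketch, assuming that deriving the sizes from the `.fai` is acceptable (this is not claimed to be how seq2science or genomepy generate the file; `fai_to_sizes` is a hypothetical helper):

```python
def fai_to_sizes(fai_path: str, sizes_path: str) -> int:
    """Write a 2-column sizes file (name<TAB>length) from a samtools-style .fai index.

    Returns the number of sequences written.
    """
    written = 0
    with open(fai_path) as fai, open(sizes_path, "w") as out:
        for line in fai:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # .fai columns: name, length, offset, linebases, linewidth
            name, length = fields[0], int(fields[1])
            out.write(f"{name}\t{length}\n")
            written += 1
    return written
```

For example, `fai_to_sizes("genomes/hg38/hg38.fa.fai", "genomes/hg38/hg38.fa.sizes")` would rebuild the sizes file in place, provided the `.fai` itself is complete.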

@Maarten-vd-Sande
Member

Good news, I am happy at least something is working for you!

We just got a "freshly" installed server this morning, and I, unfortunately, can not reproduce this error there 😞 ...

The warning is indeed suspicious, however it also happened on my successful run. I made an issue for this (#682), but I don't think it's causing the problem.

One thing I noticed in the terminal output is the line:

Chromosome "chr1" undefined in /exports/humgen/jihed/seq2science_rna/genomes/hg38/hg38.fa.sizes

as stdout/stderr that is not captured by our rule. However, that also just seems to indicate that the .fa.sizes file is empty.

The ATAC-seq workflow is practically a copy of the chip-seq workflow, except that some defaults are set differently, so this is quite surprising to me. 🤔 @siebrenf I remember we had some file-latency-ish error in the past with genomepy. The rule was registered as finished successfully, but it was still running in the background somehow. Are we sure this was "solved"?

Perhaps @JihedC you could try adding a long sleep (e.g. 1 minute) at the end of this script? https://github.com/vanheeringen-lab/seq2science/blob/master/seq2science/scripts/genome_support.py. Maybe the cluster somehow needs some time to sync updates to files?
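To confirm whether a sizes file is empty or malformed before a downstream bedtools step trips over it, a small check like the following could be run on it. This sketch assumes the genome-file format bedtools asks for (one tab-separated name and positive integer length per line) and is deliberately strict about the column count; `check_sizes_file` is a hypothetical helper, not a seq2science function:

```python
from typing import List


def check_sizes_file(path: str) -> List[str]:
    """Return a list of problems found in a 2-column genome sizes file."""
    problems: List[str] = []
    entries = 0
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                problems.append(
                    f"line {lineno}: expected 2 tab-separated columns, got {len(fields)}"
                )
                continue
            try:
                length = int(fields[1])
            except ValueError:
                length = 0
            if length <= 0:
                problems.append(f"line {lineno}: length {fields[1]!r} is not a positive integer")
                continue
            entries += 1
    if entries == 0:
        problems.append("no valid entries (file is empty or entirely malformed)")
    return problems
```

An empty list means the file looks usable; for the empty `hg38.fa.sizes` described above, it would report only the "no valid entries" problem.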

@Maarten-vd-Sande
Member

p.s. depending on whether or not you are used to conda/python packaging, adding the sleep might be extremely trivial, or quite complicated. Let me know if you don't know how to do it, I can type it out for you 😄

@JihedC
Author

JihedC commented Apr 15, 2021

Hi Maarten,

I don't know how to do it, could you help me?
I am using conda, and I can't find where the scripts are saved in the environment.

I found what I think is genome.py in /exports/humgen/jihed/miniconda3/envs/seq2science/bin, but it doesn't look like the one you mentioned:

#!/bin/sh
'''exec' /exports/humgen/jihed/miniconda3/envs/seq2science/bin/python "$0" "$@"
' '''
# -*- coding: utf-8 -*-
import re
import sys
from genomepy.cli import cli
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(cli())

@Maarten-vd-Sande
Member

It should be in /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/scripts/genome_support.py

Add

import time
time.sleep(60)  # wait a minute so the cluster's shared filesystem can sync the output files

at the bottom of the script.
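A fixed one-minute sleep is a blunt workaround. An alternative sketch, assuming the problem really is delayed visibility of files on a shared filesystem (only a hypothesis in this thread), is to poll until the expected outputs exist and are non-empty, with a timeout. `wait_for_files` is a hypothetical helper; inside a Snakemake `script:` the paths could come from `snakemake.output`:

```python
import os
import time
from typing import Iterable


def wait_for_files(paths: Iterable[str], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll until every path exists and is non-empty; return False on timeout."""
    paths = list(paths)
    deadline = time.monotonic() + timeout
    while True:
        if all(os.path.isfile(p) and os.path.getsize(p) > 0 for p in paths):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Raising an error when it returns `False` would make the failure visible instead of letting an empty file flow into later rules. Note that this treats a legitimately empty output (a gaps file for a genome without gaps, say) as a failure.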

@JihedC
Author

JihedC commented Apr 15, 2021

OK, great, thanks for the quick reply!

@siebrenf
Member

@siebrenf I remember we had some file-latency-ish error in the past with genomepy. The rule was registered as finished successfully, but it was still running in the background somehow. Are we sure this was "solved"?

It was a latency/communication issue with scripts in general, and it sounds like a plausible cause for this error!

@JihedC
Author

JihedC commented Apr 15, 2021

So I tried two things:

  • add the sleep time as discussed above
  • change the genome to mm10 and use my own samples.

And I got the same issue with the .fa.sizes file; it is also empty with another genome:

Error: The genome file /exports/humgen/jihed/seq2science_rna_Seq/genomes/mm10/mm10.fa.sizes has no valid entries (are you sure it's a 2-column bedtools genome file). Exiting.

I also got a similar error with the zebrafish genome. You said it worked fine on your computer? Maybe something goes wrong when our cluster downloads this file?

@Maarten-vd-Sande
Member

Yeah I honestly don't know what is going on, and it would be best if this can be fixed somehow...

One thing to try is to download the genome directly through genomepy, and see if you can use that .fa.sizes. Genomepy comes with seq2science, so you do not have to install anything.

genomepy install [genome name] -g [location]

Let's hope you can just copy the freshly downloaded .fa.sizes over the corrupt seq2science one, so that you can at least run the workflows from there...

@Maarten-vd-Sande
Member

Let me know if you get it working (or not)

@JihedC
Author

JihedC commented Apr 19, 2021

Yes I will update you as soon as I can. I had a little issue with memory space which slowed me a bit. Now it should be okay.

@JihedC
Author

JihedC commented Apr 20, 2021

Hi Maarten,

Here is what I did to try to make the chip-seq workflow run.
My plan was to try to align ChIP-seq SE data from mouse to mm10 using bowtie2. Here are the samples.tsv and the config.yaml:

# for help with filling out the samples.tsv:
# https://vanheeringen-lab.github.io/seq2science/content/workflows/chip_seq.html#filling-out-the-samples-tsv
# also make sure that you use tab as a delimiter
sample  assembly        descriptive_name
GSM1555120      mm10    Kap1_a

# tab-separated file of the samples
samples: samples.tsv

# pipeline file locations
result_dir: ./results  # where to store results
genome_dir: ./genomes  # where to look for or download the genomes
# fastq_dir: ./results/fastq  # where to look for or download the fastqs


# contact info for multiqc report and trackhub
email: j.chouaref@lumc.nl

# produce a UCSC trackhub?
create_trackhub: true

# how to handle replicates
biological_replicates: fisher  # change to "keep" to not combine them
technical_replicates: merge    # change to "keep" to not combine them

# which trimmer to use
trimmer: fastp

# which aligner to use
aligner: bowtie2

# filtering after alignment
remove_blacklist: true
min_mapping_quality: 30
only_primary_align: true

# peak caller
peak_caller:
  macs2:
      --keep-dup 1 --buffer-size 10000

## differential gene expression analysis
#contrasts:
#  - 'descriptive_name_all_HEL'

Since I got an empty mm10.fa.sizes file, I downloaded mm10 with genomepy (a great discovery btw 😊). I think I now have everything I need to run the pipeline:

drwx--S--- 2 jchouaref 5-A-SHARK_hg_bioinf          0 Apr 19 14:40 tmp883uf8d8
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 2730872818 Apr 19 14:41 mm10.fa
drwxr-sr-x 3 jchouaref 5-A-SHARK_hg_bioinf         25 Apr 19 15:00 index
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf       3082 Apr 19 15:02 mm10.fa.fai
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf          0 Apr 19 15:02 mm10.gaps.bed
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf        435 Apr 19 15:02 README.txt
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf   18410093 Apr 19 15:02 mm10.annotation.gtf.gz
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf    5658076 Apr 19 15:02 mm10.annotation.bed.gz
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf       1405 Apr 19 15:20 mm10.fa.sizes

Note that mm10.gaps.bed is empty.
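For context on why an empty `mm10.gaps.bed` is suspicious: a gaps BED file typically lists the runs of `N` bases in the assembly, and mm10 contains many such assembly gaps. A minimal sketch of computing such a file from a FASTA, assuming "gap" means any run of `N`/`n` (an assumption, not claimed to be genomepy's exact definition), in 0-based, half-open BED coordinates. `read_fasta` and `gap_bed` are hypothetical helpers:

```python
import re
from typing import Iterator, List, Optional, TextIO, Tuple

# Assumption: a "gap" is any run of N/n characters.
GAP = re.compile(r"[Nn]+")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the header up to the first whitespace."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_bed(handle: TextIO, min_len: int = 1) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) for each N-run, in 0-based half-open BED coordinates."""
    for name, seq in read_fasta(handle):
        for match in GAP.finditer(seq):
            if match.end() - match.start() >= min_len:
                yield name, match.start(), match.end()
```

Writing each tuple as a tab-separated line gives a BED file that can be compared with the empty `mm10.gaps.bed`. The sketch holds one chromosome in memory at a time, which is fine for mouse or human chromosomes.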

I ran the workflow on Slurm and got the following problem:

             ____  ____   __
            / ___)(  __) /  \
            \___ \ ) _) (  O )
            (____/(____) \__\)
                   ____
                  (___ \
                   / __/
                  (____)
   ____   ___  __  ____  __ _   ___  ____
  / ___) / __)(  )(  __)(  ( \ / __)(  __)
  \___ \( (__  )(  ) _) /    /( (__  ) _)
  (____/ \___)(__)(____)\_)__) \___)(____)

version: 0.5.1
docs: https://vanheeringen-lab.github.io/seq2science

Checking if seq2science was run already, if something in the configuration was changed, and if so, if seq2science needs to re-run any jobs.
Checking if samples are available online...
This can take some time.
Done!


CONFIGURATION VARIABLES:
samples                : /exports/humgen/jihed/seq2science/samples.tsv
bigwig_dir             : /exports/humgen/jihed/seq2science/results/bigwigs
counts_dir             : /exports/humgen/jihed/seq2science/results/counts
fastq_dir              : /exports/humgen/jihed/seq2science/results/fastq
final_bam_dir          : /exports/humgen/jihed/seq2science/results/final_bam
genome_dir             : /exports/humgen/jihed/seq2science/genomes
log_dir                : /exports/humgen/jihed/seq2science/results/log
qc_dir                 : /exports/humgen/jihed/seq2science/results/qc
result_dir             : /exports/humgen/jihed/seq2science/results
sra_dir                : /exports/humgen/jihed/seq2science/results/sra
trimmed_dir            : /exports/humgen/jihed/seq2science/results/fastq_trimmed
aligner                : bowtie2
cli_call               : ['/exports/humgen/jihed/miniconda3/envs/seq2science/bin/seq2science', 'run', 'chip-seq', '--cores', '20']
cores                  : 20
create_qc_report       : True
create_trackhub        : True
deeptools_flags        : --normalizeUsing BPM
deeptools_multibamsummary: --distanceBetweenBins 9000 --binSize 1000
deeptools_plotcorrelation: --colorMap RdYlBu_r --plotNumbers
deeptools_qc           : True
email                  : j.chouaref@lumc.nl
fqext                  : ['R1', 'R2']
fqsuffix               : fastq
logbase                : 2
markduplicates         : REMOVE_DUPLICATES=true -Xms4G -Xmx6G MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=999
min_mapping_quality    : 30
only_primary_align     : True
peak_caller            : {'macs2': '--keep-dup 1 --buffer-size 10000'}
peak_windowsize        : 100
remove_blacklist       : True
slop                   : 100
trimmer                : fastp
layout:                : {'GSM1555120': 'SINGLE'}



Building DAG of jobs...
Done. Now starting the real run.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Provided resources: parallel_downloads=3, deeptools_limit=16, R_scripts=1, mem_gb=94
Job counts:
        count   jobs
        1       bedgraph_bigwig
        1       bedtools_slop
        1       bowtie2_align
        1       bowtie2_index
        1       chipseeker
        1       combine_peaks
        1       combine_qc_files
        1       complement_blacklist
        1       computeMatrix
        1       coverage_table
        3       edgeR_normalization
        1       fastp_SE
        1       featureCounts
        1       get_genome_annotation
        1       get_genome_support_files
        4       log_normalization
        1       macs2_callpeak
        1       mark_duplicates
        4       mean_center
        1       mt_nuc_ratio_calculator
        1       multiqc
        1       multiqc_explain
        1       multiqc_header_info
        1       multiqc_rename_buttons
        1       multiqc_samplesconfig
        1       multiqc_schema
        1       onehot_peaks
        1       peak_bigpeak
        1       plotFingerprint
        1       plotProfile
        1       quantile_normalization
        1       run2sra
        1       runs2sample
        1       samtools_index
        1       samtools_presort
        2       samtools_stats
        1       seq2science
        1       setup_blacklist
        1       sieve_bam
        1       sra2fastq_SE
        1       trackhub
        1       unzip_annotation
        51

[Mon Apr 19 15:00:38 2021]
localrule multiqc_rename_buttons:
    output: /exports/humgen/jihed/seq2science/results/qc/sample_names_mm10.tsv
    jobid: 41
    wildcards: assembly=mm10

[Mon Apr 19 15:00:38 2021]
localrule get_genome_support_files:
    input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
    output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa.fai, /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa.sizes, /exports/humgen/jihed/seq2science/genomes/mm10/mm10.gaps.bed
    jobid: 39
    wildcards: assembly=mm10

[Mon Apr 19 15:00:39 2021]
localrule multiqc_schema:
    output: /exports/humgen/jihed/seq2science/results/qc/schema.yaml
    jobid: 42

[Mon Apr 19 15:00:39 2021]
localrule multiqc_header_info:
    output: /exports/humgen/jihed/seq2science/results/qc/header_info.yaml
    jobid: 40

[Mon Apr 19 15:00:39 2021]
localrule multiqc_samplesconfig:
    output: /exports/humgen/jihed/seq2science/results/qc/samplesconfig_mqc.html
    jobid: 43

[Mon Apr 19 15:00:39 2021]
localrule setup_blacklist:
    input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
    output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.customblacklist.bed
    jobid: 31
    wildcards: assembly=mm10

[Mon Apr 19 15:00:39 2021]
rule multiqc_explain:
    output: /exports/humgen/jihed/seq2science/results/log/workflow_explanation_mqc.html
    jobid: 45

[Mon Apr 19 15:00:39 2021]
rule get_genome_annotation:
    input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
    output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.annotation.gtf.gz, /exports/humgen/jihed/seq2science/genomes/mm10/mm10.annotation.bed.gz
    log: /exports/humgen/jihed/seq2science/results/log/get_annotation/mm10.genome.log
    jobid: 49
    benchmark: /exports/humgen/jihed/seq2science/results/benchmark/get_annotation/mm10.genome.benchmark.txt
    wildcards: raw_assembly=mm10
    priority: 1
    resources: parallel_downloads=1

[Mon Apr 19 15:00:39 2021]
rule run2sra:
    output: /exports/humgen/jihed/seq2science/results/sra/SRR2014796/SRR2014796/SRR2014796.sra
    log: /exports/humgen/jihed/seq2science/results/log/run2sra/SRR2014796.log
    jobid: 51
    benchmark: /exports/humgen/jihed/seq2science/results/benchmark/run2sra/SRR2014796.benchmark.txt
    wildcards: run=SRR2014796
    resources: parallel_downloads=1


[Mon Apr 19 15:00:39 2021]
rule bowtie2_index:
    input: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.fa
    output: /exports/humgen/jihed/seq2science/genomes/mm10/index/bowtie2/
    log: /exports/humgen/jihed/seq2science/results/log/bowtie2_index/mm10.log
    jobid: 14
    benchmark: /exports/humgen/jihed/seq2science/results/benchmark/bowtie2_index/mm10.benchmark.txt
    wildcards: assembly=mm10
    priority: 1
    threads: 4

[Mon Apr 19 15:01:04 2021]
Finished job 40.
1 of 51 steps (2%) done
[Mon Apr 19 15:01:04 2021]
Finished job 41.
2 of 51 steps (4%) done
[Mon Apr 19 15:01:04 2021]
Finished job 42.
3 of 51 steps (6%) done
[Mon Apr 19 15:01:29 2021]
Finished job 45.
4 of 51 steps (8%) done
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/2969b8b6
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/323808ca
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/b8363b14
Activating conda environment: /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/b8363b14
[Mon Apr 19 15:02:10 2021]
Finished job 43.
5 of 51 steps (10%) done
[Mon Apr 19 15:02:49 2021]
Finished job 39.
6 of 51 steps (12%) done
[Mon Apr 19 15:02:54 2021]
Finished job 49.
7 of 51 steps (14%) done
[Mon Apr 19 15:03:47 2021]
Finished job 51.
8 of 51 steps (16%) done

It was stuck at step 8 for 2 days and then stopped due to the time limit I set for the Slurm job. The problem was that the bowtie2 index files were incomplete, and for some reason this was not reported to me:

             ____  ____   __
            / ___)(  __) /  \
            \___ \ ) _) (  O )
            (____/(____) \__\)
                   ____
                  (___ \
                   / __/
                  (____)
   ____   ___  __  ____  __ _   ___  ____
  / ___) / __)(  )(  __)(  ( \ / __)(  __)
  \___ \( (__  )(  ) _) /    /( (__  ) _)
  (____/ \___)(__)(____)\_)__) \___)(____)

version: 0.5.1
docs: https://vanheeringen-lab.github.io/seq2science

Checking if seq2science was run already, if something in the configuration was changed, and if so, if seq2science needs to re-run any jobs.
Checking if samples are available online...
This can take some time.
Done!


CONFIGURATION VARIABLES:
samples                : /exports/humgen/jihed/seq2science/samples.tsv
bigwig_dir             : /exports/humgen/jihed/seq2science/results/bigwigs
counts_dir             : /exports/humgen/jihed/seq2science/results/counts
fastq_dir              : /exports/humgen/jihed/seq2science/results/fastq
final_bam_dir          : /exports/humgen/jihed/seq2science/results/final_bam
genome_dir             : /exports/humgen/jihed/seq2science/genomes
log_dir                : /exports/humgen/jihed/seq2science/results/log
qc_dir                 : /exports/humgen/jihed/seq2science/results/qc
result_dir             : /exports/humgen/jihed/seq2science/results
sra_dir                : /exports/humgen/jihed/seq2science/results/sra
trimmed_dir            : /exports/humgen/jihed/seq2science/results/fastq_trimmed
aligner                : bowtie2
cli_call               : ['/exports/humgen/jihed/miniconda3/envs/seq2science/bin/seq2science', 'run', 'chip-seq', '--cores', '20']
cores                  : 20
create_qc_report       : True
create_trackhub        : True
deeptools_flags        : --normalizeUsing BPM
deeptools_multibamsummary: --distanceBetweenBins 9000 --binSize 1000
deeptools_plotcorrelation: --colorMap RdYlBu_r --plotNumbers
deeptools_qc           : True
email                  : j.chouaref@lumc.nl
fqext                  : ['R1', 'R2']
fqsuffix               : fastq
logbase                : 2
markduplicates         : REMOVE_DUPLICATES=true -Xms4G -Xmx6G MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=999
min_mapping_quality    : 30
only_primary_align     : True
peak_caller            : {'macs2': '--keep-dup 1 --buffer-size 10000'}
peak_windowsize        : 100
remove_blacklist       : True
slop                   : 100
trimmer                : fastp
layout:                : {'GSM1555120': 'SINGLE'}



Building DAG of jobs...
IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with

    snakemake --cleanup-metadata <filenames>

To re-generate the files rerun your command with the --rerun-incomplete flag.
Incomplete files:
/exports/humgen/jihed/seq2science/genomes/mm10/index/bowtie2/

I am going to try with bwa again.
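Snakemake flagged the whole `index/bowtie2/` directory as incomplete because the indexing job never finished. To see what an interrupted run actually left behind, one can check for the files each indexer normally writes. A sketch, assuming the index prefix is the assembly name inside the index directory (for example `genomes/mm10/index/bowtie2/mm10`; the exact prefix seq2science uses is an assumption here) and using the standard suffixes written by `bowtie2-build` and `bwa index`. `missing_index_files` is a hypothetical helper:

```python
import os
from typing import Dict, List

# Files written by each indexer for a given prefix (keys mirror the `aligner` config values).
INDEX_SUFFIXES: Dict[str, List[str]] = {
    "bowtie2": [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"],
    "bwa-mem": [".amb", ".ann", ".bwt", ".pac", ".sa"],
}


def missing_index_files(prefix: str, aligner: str) -> List[str]:
    """Return the expected index files that are missing or empty for `prefix`."""
    missing = []
    for suffix in INDEX_SUFFIXES[aligner]:
        path = prefix + suffix
        candidates = [path]
        if suffix.endswith(".bt2"):
            # bowtie2 writes .bt2l ("large index") files for very large references
            candidates.append(path + "l")
        if not any(os.path.isfile(p) and os.path.getsize(p) > 0 for p in candidates):
            missing.append(path)
    return missing
```

An empty result only means every expected file is present and non-empty; it cannot prove the index is uncorrupted, so a truncated `.sa` or `.bt2` file would still pass.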

@JihedC
Author

JihedC commented Apr 20, 2021

The issue is still the same, this time at mm10.customblacklist.bed:

Error in rule setup_blacklist:
    jobid: 0
    output: /exports/humgen/jihed/seq2science/genomes/mm10/mm10.customblacklist.bed

RuleException:
FileNotFoundError in line 38 of /exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/bam_cleaning.smk:
[Errno 2] No such file or directory: '/exports/humgen/jihed/seq2science/genomes/mm10/mm10.blacklist.bed'
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2168, in run_wrapper
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/bam_cleaning.smk", line 38, in __rule_setup_blacklist
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/concurrent/futures/thread.py", line 57, in run
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/exports/humgen/jihed/miniconda3/envs/seq2science/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper
Exiting because a job execution failed. Look above for error message

Do you maybe have this file, or an example of its format?
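On the format question: a blacklist is an ordinary BED file, one region per line with at least chromosome, 0-based start and end, separated by tabs. The earlier `complement_blacklist` failure came with a bedtools genome-file error, which suggests that step computes the complement of the blacklist against the chromosome sizes. The sketch below illustrates that operation in plain Python (similar in spirit to `bedtools complement`, not a reproduction of its exact edge-case behaviour), and shows why an empty sizes file leaves nothing to work with. `complement` is a hypothetical helper:

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def complement(blacklist: List[Interval], sizes: Dict[str, int]) -> List[Interval]:
    """Return the regions of each chromosome in `sizes` not covered by `blacklist`."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))

    result: List[Interval] = []
    for chrom, length in sizes.items():  # an empty sizes file yields no output at all
        pos = 0
        for start, end in sorted(by_chrom.get(chrom, [])):
            start, end = min(start, length), min(end, length)  # clip to chromosome end
            if start > pos:
                result.append((chrom, pos, start))
            pos = max(pos, end)  # overlapping blacklist regions are merged implicitly
        if pos < length:
            result.append((chrom, pos, length))
    return result
```

Blacklist entries on chromosomes missing from `sizes` are simply ignored here; bedtools may warn or fail in that situation instead, as the "Chromosome "chr1" undefined" message quoted earlier suggests.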

@Maarten-vd-Sande
Member

Maarten-vd-Sande commented Apr 20, 2021

I made a mm10 folder for you.

http://ocimum.science.ru.nl/mm10/

When running seq2science for the first time with these files, I think you need to use something like:

seq2science run chip-seq --skip-rerun --cores 24 --snakemakeOptions touch=True

This is necessary because downloading the files messes up their timestamps, and otherwise snakemake/seq2science will try to re-create these files.
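The reason for `touch=True`: by default Snakemake treats an output as out of date when one of its inputs has a newer modification time, and downloading files in an arbitrary order scrambles those timestamps. The sketch below mimics only that basic modification-time rule (other rerun triggers are not modelled); `is_outdated` is a hypothetical helper:

```python
import os
from typing import Sequence


def is_outdated(inputs: Sequence[str], outputs: Sequence[str]) -> bool:
    """True if any output is missing or older than the newest input (mtime rule only)."""
    if not outputs:
        return False
    if not all(os.path.exists(p) for p in outputs):
        return True
    if not inputs:
        return False
    newest_input = max(os.path.getmtime(p) for p in inputs)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return newest_input > oldest_output
```

For example, if `mm10.fa` ends up with a later timestamp than `mm10.fa.fai` after downloading, every rule downstream of the FASTA looks stale. Touching the outputs without rerunning the jobs, which is what Snakemake's touch option does, makes them look up to date again.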

@JihedC
Author

JihedC commented Apr 21, 2021

Thank you so much for these files! I have added them to my genomes/mm10 folder.

Unfortunately it still does not work. Here are the log and the slurm output:

seq2science.2021-04-21T100027.917233.log
slurm-2426180.txt

The run goes so fast that I doubt it's doing anything. Here is the content of the bwa index directory:

jchouaref@res-hpc-lo01:/exports/humgen/jihed/seq2science/genomes/mm10/index/bwa-mem$ ls -ltr
total 5616152
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 2730871864 Apr 20 12:41 mm10.bwt
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf  682717945 Apr 20 12:41 mm10.pac
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf       2857 Apr 20 12:41 mm10.ann
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf      11032 Apr 20 12:41 mm10.amb
-rw-r--r-- 1 jchouaref 5-A-SHARK_hg_bioinf 1365435936 Apr 20 12:56 mm10.sa

So the index was created, but the results folder is almost empty:

jchouaref@res-hpc-lo01:/exports/humgen/jihed/seq2science/results$ ls -l
total 76
drwxr-sr-x  9 jchouaref 5-A-SHARK_hg_bioinf 203 Apr 20 13:03 benchmark
drwxr-sr-x  3 jchouaref 5-A-SHARK_hg_bioinf  59 Apr 20 13:01 fastq
drwxr-sr-x  2 jchouaref 5-A-SHARK_hg_bioinf   0 Apr 20 17:10 fastq_trimmed
drwxr-sr-x 28 jchouaref 5-A-SHARK_hg_bioinf 921 Apr 20 17:13 log
drwxr-sr-x  3 jchouaref 5-A-SHARK_hg_bioinf 168 Apr 20 13:03 qc
drwxr-sr-x  3 jchouaref 5-A-SHARK_hg_bioinf  28 Apr 19 15:00 sra

Do you think it's because I am using an sbatch command to distribute the job on the cluster, so that snakemake doesn't actually know which jobs are done?

@Maarten-vd-Sande
Member

I honestly have no clue what is going on here... Sorry, I don't think I can help you 😭

@siebrenf
Member

If the ATAC-seq run did work, but you get timeouts and unexplained errors, then maybe the issue is the server occupancy/load?
This is a long shot, but you could try to run the workflow with few cores (so it runs only 1 job at a time), and run it at a quiet moment.
If Maarten's genome folder works, it shouldn't re-index, so the RAM is spared.
seq2science run chip-seq --skip-rerun --cores 5

@JihedC
Author

JihedC commented Apr 21, 2021

No worries @Maarten-vd-Sande. I will try that, @siebrenf.
