[Usage]: running workflow on workstation with local flag #138

Closed · flefler opened this issue Aug 29, 2023 · 17 comments
Labels: question (Further information is requested)

flefler commented Aug 29, 2023

Hi Francisco,

I recently downloaded metaGEM following the manual installation guidelines. I'm trying to run fastp but keep getting this error, and I am unsure what could be causing it. I was able to run createFolders, downloadToy, organizeData, and check with no problems. I am using sample1 from the toy dataset.

(metagem) dail@swamp:~/metaGEM/workflow$ bash /home/dail/metaGEM/workflow/metaGEM.sh -t fastp -l

=================================================================================================================================
Developed by: Francisco Zorrilla, Kiran R. Patil, and Aleksej Zelezniak
Publication: doi.org/10.1101/2020.12.31.424982

[metaGEM ASCII art logo]

A Snakemake-based pipeline designed to predict metabolic interactions directly from metagenomics data using high performance computer clusters

Version: 1.0.5

Setting current directory to root in config.yaml file ...

Parsing Snakefile to target rule: fastp ...

Do you wish to continue with these parameters? (y/n)y
Proceeding with fastp job(s) ...

Please verify parameters set in the config.yaml file:

path:
    root: /home/dail/metaGEM/workflow
    scratch: $TMP
folder:
    data: dataset
    logs: logs
    assemblies: assemblies
    scripts: scripts
    crossMap: crossMap
    concoct: concoct
    maxbin: maxbin
    metabat: metabat
    refined: refined_bins
    reassembled: reassembled_bins
    classification: GTDBTk
    abundance: abundance
    GRiD: GRiD
    GEMs: GEMs
    SMETANA: SMETANA
    memote: memote
    qfiltered: qfiltered
    stats: stats
    proteinBins: protein_bins
    dnaBins: dna_bins
    pangenome: pangenome
    kallisto: kallisto
    kallistoIndex: kallistoIndex
    benchmarks: benchmarks
    prodigal: prodigal
    blastp: blastp
    blastp_db: blastp_db
scripts:
    kallisto2concoct: kallisto2concoct.py
    prepRoary: prepareRoaryInput.R
    binFilter: binFilter.py
    qfilterVis: qfilterVis.R
    assemblyVis: assemblyVis.R
    binningVis: binningVis.R
    modelVis: modelVis.R
    compositionVis: compositionVis.R
    taxonomyVis: taxonomyVis.R
    carveme: media_db.tsv
    toy: download_toydata.txt
    GTDBtkVis:
cores:
    fastp: 4
    megahit: 48
    crossMap: 48
    concoct: 48
    metabat: 48
    maxbin: 48
    refine: 48
    reassemble: 48
    classify: 2
    gtdbtk: 48
    abundance: 16
    carveme: 4
    smetana: 12
    memote: 4
    grid: 24
    prokka: 2
    roary: 12
    diamond: 12
params:
    cutfasta: 10000
    assemblyPreset: meta-sensitive
    assemblyMin: 1000
    concoct: 800
    metabatMin: 50000
    seed: 420
    minBin: 1500
    refineMem: 1600
    refineComp: 50
    refineCont: 10
    reassembleMem: 1600
    reassembleComp: 50
    reassembleCont: 10
    carveMedia: M8
    smetanaMedia: M1,M2,M3,M4,M5,M7,M8,M9,M10,M11,M13,M14,M15A,M15B,M16
    smetanaSolver: CPLEX
    roaryI: 90
    roaryCD: 90
envs:
    metagem: envs/metagem
    metawrap: envs/metawrap
    prokkaroary: envs/prokkaroary

Please pay close attention to make sure that your paths are properly configured!
Do you wish to proceed with this config.yaml file? (y/n)y

Unlocking snakemake ...
Unlocking working directory.

Dry-running snakemake jobs ...
Building DAG of jobs...
Job counts:
count jobs
1 all
1 qfilter
2

[Mon Aug 28 20:08:14 2023]
rule qfilter:
input: /home/dail/metaGEM/workflow/dataset/sample1/sample1_R1.fastq.gz, /home/dail/metaGEM/workflow/dataset/sample1/sample1_R2.fastq.gz
output: /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz, /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R2.fastq.gz
jobid: 1
wildcards: IDs=sample1

[Mon Aug 28 20:08:14 2023]
Job 0:
WARNING: Be very careful when adding/removing any lines above this message.
The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly,
therefore adding/removing any lines before this message will likely result in parser malfunction.

Job counts:
count jobs
1 all
1 qfilter
2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Do you wish to submit this batch of jobs on your local machine? (y/n)y
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
count jobs
1 all
1 qfilter
2
Select jobs to execute...

[Mon Aug 28 20:08:16 2023]
rule qfilter:
input: /home/dail/metaGEM/workflow/dataset/sample1/sample1_R1.fastq.gz, /home/dail/metaGEM/workflow/dataset/sample1/sample1_R2.fastq.gz
output: /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz, /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R2.fastq.gz
jobid: 1
wildcards: IDs=sample1

Activating envs/metagem conda environment ...
/usr/bin/bash: line 2: activate: No such file or directory
[Mon Aug 28 20:08:16 2023]
Error in rule qfilter:
jobid: 1
output: /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz, /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R2.fastq.gz
shell:

    # Activate metagem environment
    echo -e "Activating envs/metagem conda environment ... "
    set +u;source activate envs/metagem;set -u;

    # This is just to make sure that output folder exists
    mkdir -p $(dirname /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz)

    # Make job specific scratch dir
    idvar=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz))|sed 's/_R1.fastq.gz//g')
    echo -e "

Creating temporary directory $TMP/qfiltered/${idvar} ... "
mkdir -p $TMP/qfiltered/${idvar}

    # Move into scratch dir
    cd $TMP/qfiltered/${idvar}

    # Copy files
    echo -e "Copying /home/dail/metaGEM/workflow/dataset/sample1/sample1_R1.fastq.gz and /home/dail/metaGEM/workflow/dataset/sample1/sample1_R2.fastq.gz to $TMP/qfiltered/${idvar} ... "
    cp /home/dail/metaGEM/workflow/dataset/sample1/sample1_R1.fastq.gz /home/dail/metaGEM/workflow/dataset/sample1/sample1_R2.fastq.gz .

    echo -e "Appending .raw to temporary input files to avoid name conflict ... "
    for file in *.gz; do mv -- "$file" "${file}.raw.gz"; done

    # Run fastp
    echo -n "Running fastp ... "
    fastp --thread 4             -i *R1*raw.gz             -I *R2*raw.gz             -o $(basename /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz)             -O $(basename /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R2.fastq.gz)             -j $(dirname /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz)/$(echo $(basename $(dirname /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz))).json             -h $(dirname /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz)/$(echo $(basename $(dirname /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz))).html

    # Move output files to root dir
    echo -e "Moving output files $(basename /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz) and $(basename /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R2.fastq.gz) to $(dirname /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz)"
    mv $(basename /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz) $(basename /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R2.fastq.gz) $(dirname /home/dail/metaGEM/workflow/qfiltered/sample1/sample1_R1.fastq.gz)

    # Warning
    echo -e "Note that you must manually clean up these temporary directories if your scratch directory points to a static location instead of variable with a job specific location ... "

    # Done message
    echo -e "Done quality filtering sample ${idvar}"
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /home/dail/metaGEM/workflow/.snakemake/log/2023-08-28T200816.260324.snakemake.log

@flefler flefler added the bug Something isn't working label Aug 29, 2023
flefler commented Aug 29, 2023

I ran createFolders before this. After running (metagem) dail@swamp:~/metaGEM/workflow$ bash /home/dail/metaGEM/workflow/metaGEM.sh -t fastp -l the qfiltered folder disappears.

franciscozorrilla commented Aug 29, 2023

Hi Forrest,

Thanks for the extensive report. I think I have an idea of what might be going on; the important error message looks to be this:

/usr/bin/bash: line 2: activate: No such file or directory

This means you probably have the metagem environment set up, but it's likely not under the envs/ subdirectory in the root folder. Please run conda list to see the path to your metagem env, and then replace the default envs/metagem field in the config.yaml file so that it points to the appropriate environment path:

envs:
    metagem: envs/metagem
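
For example, a minimal sketch of locating the environment and updating the config (the miniconda path below is illustrative, not taken from your setup):

# Show all conda environments and their full paths
conda env list

# If the metagem env turns out to live at, say, /home/dail/miniconda3/envs/metagem,
# the config.yaml entry would become:
#
# envs:
#     metagem: /home/dail/miniconda3/envs/metagem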

Hope this helps and please let me know if you have further questions.

Best,
Francisco

P.S. If possible, it's recommended to run metaGEM on a high performance computing cluster rather than a local workstation.

P.P.S. This is expected behavior, by the way: since the qfilter rule fails, Snakemake deletes the output folder as a precaution.

I ran createFolders before this. After running (metagem) dail@swamp:~/metaGEM/workflow$ bash /home/dail/metaGEM/workflow/metaGEM.sh -t fastp -l the qfiltered folder disappears.

@franciscozorrilla franciscozorrilla added question Further information is requested and removed bug Something isn't working labels Aug 29, 2023
flefler commented Aug 30, 2023

Hi Francisco,

Thanks for getting back to me so quickly; this resolved my problems with fastp and crossMapSeries. However, I have a similar problem with the binRefine step, see the error below. I am using a server that will be able to handle my data (not that many samples), but it does not have Slurm set up, hence the -l usage. I tried to run without the -l flag to no avail.

I silenced the lines that activate the conda env, as suggested in a previous issue (#104 (comment)). Silenced or not, this step gives an error.

I did configure GTDB-tk and CheckM.

(metagem) dail@swamp:~/metaGEM/workflow$ bash metaGEM.sh -t binRefine -l

=================================================================================================================================
Developed by: Francisco Zorrilla, Kiran R. Patil, and Aleksej Zelezniak
Publication: doi.org/10.1101/2020.12.31.424982

[metaGEM ASCII art logo]

A Snakemake-based pipeline designed to predict metabolic interactions directly from metagenomics data using high performance computer clusters

Version: 1.0.5

Setting current directory to root in config.yaml file ...

Parsing Snakefile to target rule: binRefine ...

Do you wish to continue with these parameters? (y/n)y
Proceeding with binRefine job(s) ...

Please verify parameters set in the config.yaml file:

path:
    root: /home/dail/metaGEM/workflow
    scratch: /home/dail/metaGEM/workflow/tmp
folder:
    data: dataset
    logs: logs
    assemblies: assemblies
    scripts: scripts
    crossMap: crossMap
    concoct: concoct
    maxbin: maxbin
    metabat: metabat
    refined: refined_bins
    reassembled: reassembled_bins
    classification: GTDBTk
    abundance: abundance
    GRiD: GRiD
    GEMs: GEMs
    SMETANA: SMETANA
    memote: memote
    qfiltered: qfiltered
    stats: stats
    proteinBins: protein_bins
    dnaBins: dna_bins
    pangenome: pangenome
    kallisto: kallisto
    kallistoIndex: kallistoIndex
    benchmarks: benchmarks
    prodigal: prodigal
    blastp: blastp
    blastp_db: blastp_db
scripts:
    kallisto2concoct: kallisto2concoct.py
    prepRoary: prepareRoaryInput.R
    binFilter: binFilter.py
    qfilterVis: qfilterVis.R
    assemblyVis: assemblyVis.R
    binningVis: binningVis.R
    modelVis: modelVis.R
    compositionVis: compositionVis.R
    taxonomyVis: taxonomyVis.R
    carveme: media_db.tsv
    toy: download_toydata.txt
    GTDBtkVis:
cores:
    fastp: 4
    megahit: 12
    crossMap: 12
    concoct: 12
    metabat: 12
    maxbin: 12
    refine: 12
    reassemble: 12
    classify: 2
    gtdbtk: 12
    abundance: 12
    carveme: 4
    smetana: 12
    memote: 4
    grid: 12
    prokka: 2
    roary: 12
    diamond: 12
params:
    cutfasta: 10000
    assemblyPreset: meta-sensitive
    assemblyMin: 1000
    concoct: 800
    metabatMin: 50000
    seed: 420
    minBin: 1500
    refineMem: 1600
    refineComp: 50
    refineCont: 10
    reassembleMem: 1600
    reassembleComp: 50
    reassembleCont: 10
    carveMedia: M8
    smetanaMedia: M1,M2,M3,M4,M5,M7,M8,M9,M10,M11,M13,M14,M15A,M15B,M16
    smetanaSolver: CPLEX
    roaryI: 90
    roaryCD: 90
envs:
    metagem: /home/dail/metaGEM/workflow/envs/metagem
    metawrap: /home/dail/metaGEM/workflow/envs/metawrap
    prokkaroary: /home/dail/metaGEM/workflow/envs/prokkaroary

Please pay close attention to make sure that your paths are properly configured!
Do you wish to proceed with this config.yaml file? (y/n)y

Unlocking snakemake ...
Unlocking working directory.

Dry-running snakemake jobs ...
Building DAG of jobs...
Job counts:
count jobs
1 all
1 binRefine
1 concoct
1 maxbinCross
4

[Tue Aug 29 21:21:56 2023]
rule maxbinCross:
input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/maxbin/sample1/cov
output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
jobid: 7
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.maxbin.benchmark.txt
wildcards: IDs=sample1

[Tue Aug 29 21:21:56 2023]
rule concoct:
input: /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv, /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz
output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
jobid: 2
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.concoct.benchmark.txt
wildcards: IDs=sample1

[Tue Aug 29 21:21:56 2023]
rule binRefine:
input: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins, /home/dail/metaGEM/workflow/metabat/sample1/sample1.metabat-bins, /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
output: /home/dail/metaGEM/workflow/refined_bins/sample1
jobid: 1
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.binRefine.benchmark.txt
wildcards: IDs=sample1

[Tue Aug 29 21:21:56 2023]
Job 0:
WARNING: Be very careful when adding/removing any lines above this message.
The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly,
therefore adding/removing any lines before this message will likely result in parser malfunction.

Job counts:
count jobs
1 all
1 binRefine
1 concoct
1 maxbinCross
4
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Do you wish to submit this batch of jobs on your local machine? (y/n)y
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
count jobs
1 all
1 binRefine
1 concoct
1 maxbinCross
4
Select jobs to execute...
Failed to solve scheduling problem with ILP solver. Falling back to greedy solver. Run Snakemake with --verbose to see the full solver output for debugging the problem.

[Tue Aug 29 21:21:58 2023]
rule maxbinCross:
input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/maxbin/sample1/cov
output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
jobid: 7
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.maxbin.benchmark.txt
wildcards: IDs=sample1

[Tue Aug 29 21:21:58 2023]
rule concoct:
input: /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv, /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz
output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
jobid: 2
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.concoct.benchmark.txt
wildcards: IDs=sample1

Creating temporary directory /home/dail/metaGEM/workflow/tmp/maxbin/sample1 ...

Creating temporary directory /home/dail/metaGEM/workflow/tmp/concoct/sample1 ...

Unzipping assembly ...
Unzipping assembly ...
gzip: contigs.fasta already exists; not overwritten
gzip: contigs.fasta already exists; not overwritten
[Tue Aug 29 21:21:58 2023]
Error in rule maxbinCross:
jobid: 7
output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
shell:

    # Activate metagem environment
    #set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;

    # Create output folder
    mkdir -p $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)

    # Make job specific scratch dir
    fsampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
    echo -e "

Creating temporary directory /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID} ... "
mkdir -p /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}

    # Move into scratch dir
    cd /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}

    # Copy files to tmp
    cp -r /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/maxbin/sample1/cov/*.depth .

    echo -e "

Unzipping assembly ... "
gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)

    echo -e "

Generating list of depth files based on crossMapSeries rule output ... "
find . -name "*.depth" > abund.list

    echo -e "

Running maxbin2 ... "
run_MaxBin.pl -thread 12 -contig contigs.fasta -out $(basename $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)) -abund_list abund.list

    # Clean up un-needed files
    rm *.depth abund.list contigs.fasta

    # Move files into output dir
    mkdir -p $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
    while read bin;do mv $bin $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins);done< <(ls|grep fasta)
    mv * $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
[Tue Aug 29 21:21:58 2023]
Error in rule concoct:
jobid: 2
output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
shell:

    # Activate metagem environment
    #set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;

    # Create output folder
    mkdir -p $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)

    # Make job specific scratch dir
    sampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
    echo -e "

Creating temporary directory /home/dail/metaGEM/workflow/tmp/concoct/${sampleID} ... "
mkdir -p /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}

    # Move into scratch dir
    cd /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}

    # Copy files
    cp /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv .

    echo "Unzipping assembly ... "
    gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)

    echo -e "Done. 

Cutting up contigs (default 10kbp chunks) ... "
cut_up_fasta.py -c 10000 -o 0 -m $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') > assembly_c10k.fa

    echo -e "

Running CONCOCT ... "
concoct --coverage_file $(basename /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv) --composition_file assembly_c10k.fa -b $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)) -t 12 -c 800

    echo -e "

Merging clustering results into original contigs ... "
merge_cutup_clustering.py $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_gt1000.csv > $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv

    echo -e "

Extracting bins ... "
mkdir -p $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
extract_fasta_bins.py $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv --output_path $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)

    # Move final result files to output folder
    mv $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins) *.txt *.csv $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /home/dail/metaGEM/workflow/.snakemake/log/2023-08-29T212157.875193.snakemake.log
(metagem) dail@swamp:~/metaGEM/workflow$

@flefler flefler closed this as completed Aug 30, 2023
franciscozorrilla (Owner) commented:

Hi Forrest,

Based on your error messages it looks like concoct and maxbin fail before even getting to the binRefine stage.
It is possible that tmp files from a previous run are getting in the way:

gzip: contigs.fasta already exists; not overwritten
gzip: contigs.fasta already exists; not overwritten

Could you try deleting everything inside the tmp folder, and then re-running the jobs? e.g. rm -r tmp/*
Also, could you check the log files for those specific concoct/maxbin job runs to see if there is a more specific error message? Normally these would be generated inside your logs/ folder, but I am not sure if that is still the case with local flag usage.

Sorry you are having trouble with this. For some background: the workflow was tested and designed to run on slurm-based HPC infrastructures, with some users even expanding to qsub-based infrastructures. However, since I do not have access to workstations, I have not developed support for this kind of infrastructure. The local command/flag is mostly meant for troubleshooting, debugging, testing, etc. I am happy, however, to help you to the best of my ability to get metaGEM running on your workstation 💎

@franciscozorrilla franciscozorrilla changed the title [Bug]: error running fastp [Usage]: running workflow on workstation with local flag Aug 30, 2023
flefler commented Aug 30, 2023

Hi Francisco,

I really appreciate your help! I deleted all files and started over from the dataset. I ran bash metaGEM.sh -t fastp -l followed by bash metaGEM.sh -t crossMapSeries -l with no errors. Then with bash metaGEM.sh -t binRefine -l I get a different (maybe more helpful?) error, see below. Just FYI, running with -l does produce a log file, but it is much shorter than this output.

(metagem) dail@swamp:~/metaGEM/workflow$ bash metaGEM.sh -t binRefine -l

=================================================================================================================================
Developed by: Francisco Zorrilla, Kiran R. Patil, and Aleksej Zelezniak
Publication: doi.org/10.1101/2020.12.31.424982

[metaGEM ASCII art logo]

A Snakemake-based pipeline designed to predict metabolic interactions directly from metagenomics data using high performance computer clusters

Version: 1.0.5

Setting current directory to root in config.yaml file ...

Parsing Snakefile to target rule: binRefine ...

Do you wish to continue with these parameters? (y/n)y
Proceeding with binRefine job(s) ...

Please verify parameters set in the config.yaml file:

path:
    root: /home/dail/metaGEM/workflow
    scratch: /home/dail/metaGEM/workflow/tmp
folder:
    data: dataset
    logs: logs
    assemblies: assemblies
    scripts: scripts
    crossMap: crossMap
    concoct: concoct
    maxbin: maxbin
    metabat: metabat
    refined: refined_bins
    reassembled: reassembled_bins
    classification: GTDBTk
    abundance: abundance
    GRiD: GRiD
    GEMs: GEMs
    SMETANA: SMETANA
    memote: memote
    qfiltered: qfiltered
    stats: stats
    proteinBins: protein_bins
    dnaBins: dna_bins
    pangenome: pangenome
    kallisto: kallisto
    kallistoIndex: kallistoIndex
    benchmarks: benchmarks
    prodigal: prodigal
    blastp: blastp
    blastp_db: blastp_db
scripts:
    kallisto2concoct: kallisto2concoct.py
    prepRoary: prepareRoaryInput.R
    binFilter: binFilter.py
    qfilterVis: qfilterVis.R
    assemblyVis: assemblyVis.R
    binningVis: binningVis.R
    modelVis: modelVis.R
    compositionVis: compositionVis.R
    taxonomyVis: taxonomyVis.R
    carveme: media_db.tsv
    toy: download_toydata.txt
    GTDBtkVis:
cores:
    fastp: 4
    megahit: 12
    crossMap: 12
    concoct: 12
    metabat: 12
    maxbin: 12
    refine: 12
    reassemble: 12
    classify: 2
    gtdbtk: 12
    abundance: 12
    carveme: 4
    smetana: 12
    memote: 4
    grid: 12
    prokka: 2
    roary: 12
    diamond: 12
params:
    cutfasta: 10000
    assemblyPreset: meta-sensitive
    assemblyMin: 1000
    concoct: 800
    metabatMin: 50000
    seed: 420
    minBin: 1500
    refineMem: 1600
    refineComp: 50
    refineCont: 10
    reassembleMem: 1600
    reassembleComp: 50
    reassembleCont: 10
    carveMedia: M8
    smetanaMedia: M1,M2,M3,M4,M5,M7,M8,M9,M10,M11,M13,M14,M15A,M15B,M16
    smetanaSolver: CPLEX
    roaryI: 90
    roaryCD: 90
envs:
    metagem: /home/dail/metaGEM/workflow/envs/metagem
    metawrap: /home/dail/metaGEM/workflow/envs/metawrap
    prokkaroary: /home/dail/metaGEM/workflow/envs/prokkaroary

Please pay close attention to make sure that your paths are properly configured!
Do you wish to proceed with this config.yaml file? (y/n)y

Unlocking snakemake ...
Unlocking working directory.

Dry-running snakemake jobs ...
Building DAG of jobs...
Job counts:
count jobs
1 all
1 binRefine
1 concoct
1 maxbinCross
1 metabatCross
5

[Wed Aug 30 09:50:51 2023]
rule maxbinCross:
input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/maxbin/sample1/cov
output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
jobid: 7
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.maxbin.benchmark.txt
wildcards: IDs=sample1

[Wed Aug 30 09:50:51 2023]
rule concoct:
input: /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv, /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz
output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
jobid: 2
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.concoct.benchmark.txt
wildcards: IDs=sample1

[Wed Aug 30 09:50:51 2023]
rule metabatCross:
input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/metabat/sample1/cov
output: /home/dail/metaGEM/workflow/metabat/sample1/sample1.metabat-bins
jobid: 5
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.metabat.benchmark.txt
wildcards: IDs=sample1

[Wed Aug 30 09:50:51 2023]
rule binRefine:
input: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins, /home/dail/metaGEM/workflow/metabat/sample1/sample1.metabat-bins, /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
output: /home/dail/metaGEM/workflow/refined_bins/sample1
jobid: 1
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.binRefine.benchmark.txt
wildcards: IDs=sample1

[Wed Aug 30 09:50:51 2023]
Job 0:
WARNING: Be very careful when adding/removing any lines above this message.
The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly,
therefore adding/removing any lines before this message will likely result in parser malfunction.

Job counts:
count jobs
1 all
1 binRefine
1 concoct
1 maxbinCross
1 metabatCross
5
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Do you wish to submit this batch of jobs on your local machine? (y/n)y
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
count jobs
1 all
1 binRefine
1 concoct
1 maxbinCross
1 metabatCross
5
Select jobs to execute...
Failed to solve scheduling problem with ILP solver. Falling back to greedy solver. Run Snakemake with --verbose to see the full solver output for debugging the problem.

[Wed Aug 30 09:50:53 2023]
rule maxbinCross:
input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/maxbin/sample1/cov
output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
jobid: 7
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.maxbin.benchmark.txt
wildcards: IDs=sample1

[Wed Aug 30 09:50:53 2023]
rule concoct:
input: /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv, /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz
output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
jobid: 2
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.concoct.benchmark.txt
wildcards: IDs=sample1

[Wed Aug 30 09:50:53 2023]
rule metabatCross:
input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/metabat/sample1/cov
output: /home/dail/metaGEM/workflow/metabat/sample1/sample1.metabat-bins
jobid: 5
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.metabat.benchmark.txt
wildcards: IDs=sample1

Creating temporary directory /home/dail/metaGEM/workflow/tmp/maxbin/sample1 ...

Creating temporary directory /home/dail/metaGEM/workflow/tmp/concoct/sample1 ...

Creating temporary directory /home/dail/metaGEM/workflow/tmp/metabat/sample1 ...
Unzipping assembly ...

Unzipping assembly ...
Done.
Cutting up contigs (default 10kbp chunks) ...

Running metabat2 ...
MetaBAT 2 (2.15 (Bioconda)) using minContig 1500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, maxEdges 200 and minClsSize 50000. with random seed=420
[00:00:00] Executing with 1 threads
[00:00:00] Parsing abundance file
[00:00:00] Parsing assembly file

Generating list of depth files based on crossMapSeries rule output ...

Running maxbin2 ...
Can't locate LWP/Simple.pm in @inc (you may need to install the LWP::Simple module) (@inc contains: /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/site_perl/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/site_perl/5.36 /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/5.36 /home/linuxbrew/.linuxbrew/lib/perl5/site_perl/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/lib/perl5/site_perl/5.36) at /home/dail/metaGEM/workflow/envs/metagem/bin/run_MaxBin.pl line 4.
BEGIN failed--compilation aborted at /home/dail/metaGEM/workflow/envs/metagem/bin/run_MaxBin.pl line 4.
[00:00:00] Number of large contigs >= 1500 are 3667.
[00:00:00] Reading abundance file
[Wed Aug 30 09:50:53 2023]
Error in rule maxbinCross:
jobid: 7
output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
shell:

    # Activate metagem environment
    #set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;

    # Create output folder
    mkdir -p $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)

    # Make job specific scratch dir
    fsampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
    echo -e "

Creating temporary directory /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID} ... "
mkdir -p /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}

    # Move into scratch dir
    cd /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}

    # Copy files to tmp
    cp -r /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/maxbin/sample1/cov/*.depth .

    echo -e "

Unzipping assembly ... "
gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)

    echo -e "

Generating list of depth files based on crossMapSeries rule output ... "
find . -name "*.depth" > abund.list

    echo -e "

Running maxbin2 ... "
run_MaxBin.pl -thread 12 -contig contigs.fasta -out $(basename $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)) -abund_list abund.list

    # Clean up un-needed files
    rm *.depth abund.list contigs.fasta

    # Move files into output dir
    mkdir -p $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
    while read bin;do mv $bin $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins);done< <(ls|grep fasta)
    mv * $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
[00:00:00] Finished reading 7479 contigs and 1 coverages from sample1.all.depth
[00:00:00] Number of target contigs: 3667 of large (>= 1500) and 3812 of small ones (>=1000 & <1500).
[00:00:00] Start TNF calculation. nobs = 3667
[00:00:00] Finished TNF calculation.

Running CONCOCT ...
Up and running. Check /home/dail/metaGEM/workflow/tmp/concoct/sample1/sample1_log.txt for progress
[00:00:02] Preparing TNF Graph Building [pTNF = 99.9; 0 / 3667 (P = 0.00%) round 1] ... [00:00:03] Finished Preparing TNF Graph Building [pTNF = 69.20]
[00:00:03] Building TNF Graph 4.9% (178 of 3667), ETA 0:00:02 [63.7Gb / 125.5Gb] ... [00:00:04] Building TNF Graph 77.7% (2848 of 3667), ETA 0:00:01 [63.8Gb / 125.5Gb] Traceback (most recent call last):
File "/home/dail/metaGEM/workflow/envs/metagem/bin/concoct", line 90, in
results = main(args)
File "/home/dail/metaGEM/workflow/envs/metagem/bin/concoct", line 37, in main
transform_filter, pca = perform_pca(
File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/concoct/transform.py", line 5, in perform_pca
pca_object = PCA(n_components=nc, random_state=seed).fit(d)
File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/base.py", line 1151, in wrapper
return fit_method(estimator, *args, **kwargs)
File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 434, in fit
self._fit(X)
File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 483, in _fit
X = self._validate_data(
File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/base.py", line 579, in _validate_data
self._check_feature_names(X, reset=reset)
File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/base.py", line 440, in _check_feature_names
feature_names_in = _get_feature_names(X)
File "/home/dail/metaGEM/workflow/envs/metagem/lib/python3.8/site-packages/sklearn/utils/validation.py", line 2021, in _get_feature_names
raise TypeError(
TypeError: Feature names are only supported if all input features have string names, but your input has ['int', 'str'] as feature name / column name types. If you want feature names to be stored and validated, you must convert them all to strings, by using X.columns = X.columns.astype(str) for example. Otherwise you can remove feature / column names from your input data, or convert them all to a non-string data type.
[00:00:04] Building TNF Graph 82.5% (3026 of 3667), ETA 0:00:00 [63.8Gb / 125.5Gb] [Wed Aug 30 09:50:58 2023]
Error in rule concoct:
jobid: 2
output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
shell:

    # Activate metagem environment
    #set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;

    # Create output folder
    mkdir -p $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)

    # Make job specific scratch dir
    sampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
    echo -e "

Creating temporary directory /home/dail/metaGEM/workflow/tmp/concoct/${sampleID} ... "
mkdir -p /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}

    # Move into scratch dir
    cd /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}

    # Copy files
    cp /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv .

    echo "Unzipping assembly ... "
    gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)

    echo -e "Done. 

Cutting up contigs (default 10kbp chunks) ... "
cut_up_fasta.py -c 10000 -o 0 -m $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') > assembly_c10k.fa

    echo -e "

Running CONCOCT ... "
concoct --coverage_file $(basename /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv) --composition_file assembly_c10k.fa -b $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)) -t 12 -c 800

    echo -e "

Merging clustering results into original contigs ... "
merge_cutup_clustering.py $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_gt1000.csv > $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv

    echo -e "

Extracting bins ... "
mkdir -p $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
extract_fasta_bins.py $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv --output_path $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)

    # Move final result files to output folder
    mv $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins) *.txt *.csv $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
[00:00:05] Building TNF Graph 87.4% (3204 of 3667), ETA 0:00:00 [63.7Gb / 125.5Gb] ... [00:00:05] Finished Building TNF Graph (81835 edges) [63.7Gb / 125.5Gb]
[00:00:05] Applying coverage correlations to TNF graph with 81835 edges
[00:00:05] Traversing graph with 3667 nodes and 81835 edges
[00:00:05] Building SCR Graph and Binning (349 vertices and 964 edges) [P = 9.50%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (697 vertices and 2061 edges) [P = 19.00%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (1046 vertices and 2821 edges) [P = 28.50%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (1394 vertices and 3865 edges) [P = 38.00%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (1742 vertices and 5005 edges) [P = 47.50%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (2091 vertices and 6557 edges) [P = 57.00%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (2439 vertices and 8265 edges) [P = 66.50%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (2787 vertices and 11709 edges) [P = 76.00%; 63.7Gb / 125.5Gb]
[00:00:05] Building SCR Graph and Binning (2988 vertices and 16894 edges) [P = 85.50%; 63.7Gb / 125.5Gb]
[00:00:05] Rescuing singleton large contigs
[00:00:05] There are 16 bins already
[00:00:05] Outputting bins
[00:00:05] 79.68% (11949444 bases) of large (>=1500) and 0.00% (0 bases) of small (<1500) contigs were binned.
16 bins (11949444 bases in total) formed.
[00:00:05] Finished
[Wed Aug 30 09:50:59 2023]
Finished job 5.
1 of 5 steps (20%) done
Exiting because a job execution failed. Look above for error message
Complete log: /home/dail/metaGEM/workflow/.snakemake/log/2023-08-30T095053.149690.snakemake.log

flefler commented Aug 30, 2023

This is the log file:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
count jobs
1 all
1 binRefine
1 concoct
1 maxbinCross
1 metabatCross
5
Select jobs to execute...
Failed to solve scheduling problem with ILP solver. Falling back to greedy solver. Run Snakemake with --verbose to see the full solver output for debugging the problem.

[Wed Aug 30 09:50:53 2023]
rule maxbinCross:
input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/maxbin/sample1/cov
output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
jobid: 7
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.maxbin.benchmark.txt
wildcards: IDs=sample1

[Wed Aug 30 09:50:53 2023]
rule concoct:
input: /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv, /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz
output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
jobid: 2
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.concoct.benchmark.txt
wildcards: IDs=sample1

[Wed Aug 30 09:50:53 2023]
rule metabatCross:
input: /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz, /home/dail/metaGEM/workflow/metabat/sample1/cov
output: /home/dail/metaGEM/workflow/metabat/sample1/sample1.metabat-bins
jobid: 5
benchmark: /home/dail/metaGEM/workflow/benchmarks/sample1.metabat.benchmark.txt
wildcards: IDs=sample1

[Wed Aug 30 09:50:53 2023]
Error in rule maxbinCross:
jobid: 7
output: /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins
shell:

    # Activate metagem environment
    #set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;

    # Create output folder
    mkdir -p $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)

    # Make job specific scratch dir
    fsampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
    echo -e "

Creating temporary directory /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID} ... "
mkdir -p /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}

    # Move into scratch dir
    cd /home/dail/metaGEM/workflow/tmp/maxbin/${fsampleID}

    # Copy files to tmp
    cp -r /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/maxbin/sample1/cov/*.depth .

    echo -e "

Unzipping assembly ... "
gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)

    echo -e "

Generating list of depth files based on crossMapSeries rule output ... "
find . -name "*.depth" > abund.list

    echo -e "

Running maxbin2 ... "
run_MaxBin.pl -thread 12 -contig contigs.fasta -out $(basename $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)) -abund_list abund.list

    # Clean up un-needed files
    rm *.depth abund.list contigs.fasta

    # Move files into output dir
    mkdir -p $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
    while read bin;do mv $bin $(basename /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins);done< <(ls|grep fasta)
    mv * $(dirname /home/dail/metaGEM/workflow/maxbin/sample1/sample1.maxbin-bins)
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
[Wed Aug 30 09:50:58 2023]
Error in rule concoct:
jobid: 2
output: /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins
shell:

    # Activate metagem environment
    #set +u;source activate /home/dail/metaGEM/workflow/envs/metagem;set -u;

    # Create output folder
    mkdir -p $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)

    # Make job specific scratch dir
    sampleID=$(echo $(basename $(dirname /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)))
    echo -e "

Creating temporary directory /home/dail/metaGEM/workflow/tmp/concoct/${sampleID} ... "
mkdir -p /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}

    # Move into scratch dir
    cd /home/dail/metaGEM/workflow/tmp/concoct/${sampleID}

    # Copy files
    cp /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv .

    echo "Unzipping assembly ... "
    gunzip $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)

    echo -e "Done. 

Cutting up contigs (default 10kbp chunks) ... "
cut_up_fasta.py -c 10000 -o 0 -m $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') > assembly_c10k.fa

    echo -e "

Running CONCOCT ... "
concoct --coverage_file $(basename /home/dail/metaGEM/workflow/concoct/sample1/cov/coverage_table.tsv) --composition_file assembly_c10k.fa -b $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)) -t 12 -c 800

    echo -e "

Merging clustering results into original contigs ... "
merge_cutup_clustering.py $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_gt1000.csv > $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv

    echo -e "

Extracting bins ... "
mkdir -p $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
extract_fasta_bins.py $(echo $(basename /home/dail/metaGEM/workflow/assemblies/sample1/contigs.fasta.gz)|sed 's/.gz//') $(basename $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins))_clustering_merged.csv --output_path $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)

    # Move final result files to output folder
    mv $(basename /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins) *.txt *.csv $(dirname /home/dail/metaGEM/workflow/concoct/sample1/sample1.concoct-bins)
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Job failed, going on with independent jobs.
[Wed Aug 30 09:50:59 2023]
Finished job 5.
1 of 5 steps (20%) done
Exiting because a job execution failed. Look above for error message
Complete log: /home/dail/metaGEM/workflow/.snakemake/log/2023-08-30T095053.149690.snakemake.log

franciscozorrilla (Owner) commented:

Can't locate LWP/Simple.pm in @inc (you may need to install the LWP::Simple module) (@inc contains: /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/site_perl/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/site_perl/5.36 /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/opt/perl/lib/perl5/5.36 /home/linuxbrew/.linuxbrew/lib/perl5/site_perl/5.36/x86_64-linux-thread-multi /home/linuxbrew/.linuxbrew/lib/perl5/site_perl/5.36) at /home/dail/metaGEM/workflow/envs/metagem/bin/run_MaxBin.pl line 4.

Regarding the above maxbin error: it looks like you are missing a Perl library. You could try installing it manually. Others have reported this error message as well; you may want to read through this issue: metagenome-atlas/atlas#328
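
A minimal sketch of installing it (the bioconda package name is an assumption; note also that the @inc paths above point at a linuxbrew Perl rather than the env's Perl, so PATH order may be part of the problem):

# Install the missing module into the metagem env via bioconda (package name assumed)
conda install -p /home/dail/metaGEM/workflow/envs/metagem -c bioconda perl-lwp-simple

# Or install it for whichever Perl run_MaxBin.pl is actually picking up
cpan LWP::Simple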

raise TypeError(
TypeError: Feature names are only supported if all input features have string names, but your input has ['int', 'str'] as feature name / column name types.

Regarding the above concoct error: it looks like it has to do with the scikit-learn version; see these issues for more details on how to solve it: BinPro/CONCOCT#323 BinPro/CONCOCT#322
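
A common workaround discussed in those threads is pinning scikit-learn to an older release inside the metagem environment (the exact pin below is an assumption; check the linked issues for the version that matches your CONCOCT):

# Downgrade scikit-learn in the metagem env (pin is illustrative)
source activate /home/dail/metaGEM/workflow/envs/metagem
pip install "scikit-learn<1.2"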

Regarding metabat2: from the logs it looks like it successfully generated some draft bins.

After fixing the issues with concoct and maxbin you should be able to run the refinement and reassembly modules. Remember to clean up the tmp dir between runs to avoid problems caused by leftover intermediate files.
Let me know if this helps.

Best,
Francisco

wupeng1998 commented:

Hi, I'm using PBS to submit jobs to the server, but just as when running on the command line alone, I have to use the "--local" flag; otherwise I only get "This was a dry-run (flag -n). The order of jobs does not reflect the order of execution." and nothing actually runs.
Here is my PBS script; note that I have to specify the "--local" flag:
#PBS -N draft_bin
#PBS -o draft_bin.log
#PBS -j oe
#PBS -l walltime=10000:00:00
#PBS -l nodes=1:ppn=48
#PBS -l mem=500gb
#PBS -q high

projectname="metaGEM_comand"
project="/public/home/wangjj/WP/metaGEM/workflow"

cd $project

echo -e ""
echo -e ""
source activate metagem

yes | bash metaGEM.sh -t fastp -j 2 -c 120 -m 500 -h 24 --local
echo -e ""
echo -e " fastp Finish...!!! "
echo -e "
"
yes | bash metaGEM.sh -t megahit -j 2 -c 120 -m 500 -h 24 --local
echo -e ""
echo -e " megahit Finish...!!! "
echo -e "
"
yes | bash metaGEM.sh -t crossMapSeries -j 2 -c 120 -m 500 -h 24 --local
echo -e ""
echo -e " crossMapSeries Finish...!!! "
echo -e "
"
yes | bash metaGEM.sh -t concoct -j 2 -c 120 -m 500 -h 24 --local

wupeng1998 commented:

In addition, I found that when using "--local" the program only runs on one node, no matter how many nodes I request from the server, rather than running on multiple nodes in parallel. I got an out-of-memory error when I ran GTDB-Tk, even though I had requested 10 nodes with 24 cores and 96 GB each, so this error shouldn't have occurred:
[2023-10-17 15:35:05] INFO: Masked bacterial alignment from 41,084 to 5,037 AAs.
[2023-10-17 15:35:05] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2023-10-17 15:35:05] INFO: Creating concatenated alignment for 45,560 bacterial GTDB and user genomes.
[2023-10-17 15:35:06] INFO: Creating concatenated alignment for 5 bacterial user genomes.
[2023-10-17 15:35:06] INFO: Done.
[2023-10-17 15:35:06] WARNING: pplacer requires ~215 GB of RAM to fully load the bacterial tree into memory. However, 65.44 GB was detected. This may affect pplacer performance, or fail if there is insufficient swap space.
[2023-10-17 15:35:06] TASK: Placing 5 bacterial genomes into reference tree with pplacer using 48 CPUs (be patient).
[2023-10-17 15:35:06] INFO: pplacer version: v1.1.alpha19-0-g807f6f3

wupeng1998 commented:

@franciscozorrilla Do you know how to solve this problem?

wupeng1998 commented Oct 19, 2023

@franciscozorrilla I think it might be a problem with the conditional check here, but I don't know why I get this message when I don't set the "--local" flag: "This was a dry-run (flag -n). The order of jobs does not reflect the order of execution."

In metaGEM.sh:

# Parse snakefile for cluster/local jobs

elif [ $task == "fastp" ]; then
    string='expand(config["path"]["root"]+"/"+config["folder"]["qfiltered"]+"/{IDs}/{IDs}_R1.fastq.gz", IDs = IDs)'
    if [ $local == "true" ]; then
        submitLocal
    else
        submitCluster
    fi


franciscozorrilla commented Oct 19, 2023

Hey Wupeng,

but as with the command line alone, I have to use the "--local" flag

I do not understand the reasoning here. Why are you trying to use the local flag when submitting jobs to the cluster? In general, you should never run jobs locally on the cluster. The local flag is only for usage with workstations, where there is no scheduler or other users.

In addition, I found that when using "--local" the program only runs on one node, no matter how many nodes I request from the server, rather than running on multiple nodes in parallel. I got an out-of-memory error when I ran GTDB-Tk, even though I had requested 10 nodes with 24 cores and 96 GB each, so this error shouldn't have occurred.

This behavior is exactly as expected. To submit jobs to the cluster instead of running them locally, remove the --local flag. In fact, when you add the local flag, Snakemake launches the jobs directly on the node you are running from, which is why you are running out of memory.
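
As an aside: if memory on a single node remains limiting, GTDB-Tk itself can reduce pplacer's RAM usage by memory-mapping the reference tree to disk via its --scratch_dir option. A sketch with illustrative paths (in metaGEM you would add the flag to the gtdbtk call in the Snakefile's classification rule):

# Trade speed for lower memory during the pplacer step (paths illustrative)
gtdbtk classify_wf --genome_dir dna_bins/ --out_dir GTDBTk/ -x fa --cpus 24 --scratch_dir ./gtdbtk_scratch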

I'm using pbs to submit jobs to the server

If you are using a PBS cluster then have a look at this discussion and this fork. @fbartusch modified the metaGEM.sh wrapper file and the cluster config file to allow submission via qsub; if I were you I would look at those modifications and apply them to your files as well.
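
The underlying change is essentially swapping the sbatch command template that metaGEM hands to Snakemake's --cluster option for a qsub one; a minimal sketch (flags are illustrative, not the fork's exact command):

# PBS-style cluster submission; metaGEM's default wraps sbatch the same way
snakemake fastp -j 10 --cluster "qsub -l nodes=1:ppn={threads} -l walltime=04:00:00"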

Hope this helps and let me know if you have further questions!
Best,
Francisco

p.s. feel free to open up a new issue/discussion :)

wupeng1998 commented:

Thanks for your reply, I did find this problem too: when I removed "--local", the nohup.out file says that the "sbatch" command can't be found, so as you mentioned, it is indeed a qsub problem! I will take the next step as you suggested, thanks again!

wupeng1998 commented:

Hello, when I set up qsub like this, the following error occurred:
[Tue Oct 24 23:18:39 2023]
rule binReassemble:
input: /public/home/wangjj/WP/metaGEM/workflow/qfiltered/mergedYL/mergedYL_R1.fastq.gz, /public/home/wangjj/WP/metaGEM/workflow/qfiltered/mergedYL/mergedYL_R2.fastq.gz, /public/home/wangjj/WP/metaGEM/workflow/refined_bins/mergedYL
output: /public/home/wangjj/WP/metaGEM/workflow/reassembled_bins/mergedYL
jobid: 1
benchmark: /public/home/wangjj/WP/metaGEM/workflow/benchmarks/mergedYL.binReassemble.benchmark.txt
wildcards: IDs=mergedYL

RuleException in line 791 of /public/home/wangjj/WP/metaGEM/workflow/Snakefile:
IndexError: tuple index out of range
  File "/public/home/wangjj/WP/metaGEM/workflow/envs/metagem/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 136, in run_jobs
  File "/public/home/wangjj/WP/metaGEM/workflow/envs/metagem/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 969, in run

franciscozorrilla (Owner) commented:

Hi wupeng, it sounds like your issues are no longer related to the original post. Please feel free to open up a new issue and provide more details. Based on the error message, it seems like Snakemake is not properly communicating with your cluster. First make sure that Snakemake is working properly: you should be able to submit/run simple Snakemake jobs on your cluster before trying to use metaGEM. For reference, please see the Snakemake docs and tutorial.

wupeng1998 commented:

OK, I see what you mean. But Snakemake works well when I use the --local flag, so I don't think the problem is with Snakemake itself.

franciscozorrilla (Owner) commented:

Please, do not run jobs on the login node of your cluster using the --local flag. This is improper and harmful usage of the cluster, and you will probably get complaints from your HPC admins. Snakemake is supposed to communicate with your HPC job scheduler and submit jobs to compute nodes, as opposed to launching them on the login node which is what you are doing with the --local flag.
