#!/usr/bin/env python
__author__ = "Carl M. Kobel"
__version__ = "2.7.1"
"""
This is the launcher script for assemblycomparator2.
It has two functions.
1) It checks that the necessary environment variables have been set. If not, it uses reasonable defaults.
2) It then calls the snakemake pipeline using the command line arguments passed to this script.
Because all command line arguments are passed directly to the subprocess.run() snakemake call at the bottom of this script, you can use any snakemake-compatible command line argument with this script, e.g. `asscom2 --dry-run`.
This "binary" file replaces the alias that was used previously.
"""
import logging
import os
import sys
import subprocess
import shutil
# Create logger that prints to the terminal.
logger = logging.getLogger('asscom2_launcher')
logger.setLevel(logging.INFO)
console_handler = logging.StreamHandler()
#console_handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)
help_message = f"""
NAME
    Assemblycomparator2 (a.k.a. "asscom2") - Genomes to report pipeline

SYNOPSIS
    asscom2 [ --config KEY=VALUE [KEY2=VALUE]... ] [ --until RULE [RULE2]... ]
            [ --forcerun RULE [RULE2]... ]
            [ --dry-run ]
            [ --version ] [ --help ]

DESCRIPTION
    Analyses bacterial and archaeal genomes.
    Based on snakemake, so you can use the same command line arguments.
    Please check github.com/cmkobel/assemblycomparator2 for updates, help and to track issues.

OPTIONS
    --config KEY=VALUE [KEY2=VALUE]...
        Pass a parameter to the snakemake pipeline. The following keys are available, shown with their default values. (Longer explanation in parentheses.)
          - input_genomes="*.fna *.fa *.fasta *.fas" (Path to input genomes.)
          - annotator="prokka" (Choice of annotation tool. Alternatively "bakta".)
          - mlst_scheme="automatic" (Choice of mlst scheme. See tseemann/mlst documentation for options.)

    --until RULE [RULE2]...
        Run up to and including a specific rule in the rule graph. Available rules:
          abricate annotate assembly_stats bakta bakta_download busco busco_download checkm2 checkm2_download copy dbcan dbcan_download diamond_kegg eggnog eggnog_download fasttree gapseq gapseq_find gtdb_download gtdbtk interproscan iqtree kegg_pathway mashtree mlst prokka sequence_lengths snp_dists
        There are also a number of pseudo rules, effectively "shortcuts" to a list of rules.
          - downloads (Run rules that download and set up necessary databases.)
          - fast (Only rules that complete within a few seconds. Useful for testing.)
          - isolate (Only rules that are relevant for genomes of isolate origin.)
          - meta (Only rules that are relevant for genomes ("MAGs") of metagenomic origin.)
          - report (Re-renders the report.)

    --forcerun RULE [RULE2]...
        Force rerunning of one or more rules that have already been completed. This is generally necessary when changing running parameters in the config (see "--config" above).

    --dry-run
        Run a "dry run": Shows what will run without doing it.

    --version, -v
        Show current version: "v{__version__}"

    --help, -h
        Show this help and exit.

ENVIRONMENT VARIABLES
    No environment variables are strictly necessary to set, but the following might be useful:
      ASSCOM2_PROFILE (default "profile/apptainer/local" if apptainer is installed, otherwise "profile/conda/local") specifies which of the Snakemake profiles to use. This can be useful for running Assemblycomparator2 on an HPC or using specific settings on a large workstation. Check out the bundled profiles in path profile/* (possibly in $CONDA_PREFIX/assemblycomparator2/profile/*).
      ASSCOM2_DATABASES (default "databases/") specifies a database location. Useful when sharing a database installation between various users on the same workstation or HPC.

OUTPUT
    Creates a directory named "results_ac2/" that contains all of the analysis results that are computed.

EXAMPLES
    # Run all analyses with specified input genomes.
    asscom2 --config input_genomes="path/to/genomes_*.fna"

    # Run a "dry run".
    asscom2 --config input_genomes="path/to/genomes_*.fna" --dry-run

    # Specify annotator.
    asscom2 --config input_genomes="path/to/genomes_*.fna" annotator="prokka"

    # Run only the "fast" rules.
    asscom2 --config input_genomes="path/to/genomes_*.fna" annotator="prokka" --until fast

    # Run panaroo as well.
    asscom2 --config input_genomes="path/to/genomes_*.fna" annotator="prokka" --until fast panaroo

LICENSE:
    assemblycomparator2 "asscom2" v{__version__} genomes to report pipeline. Copyright (C) 2019-2024 Carl M. Kobel and contributors GNU GPL v3
"""
# Read environment variables and fall back to defaults where necessary.
# ASSCOM2_BASE defaults to the directory of this script file, which should sit next to the correct snakefile.
try:
    ASSCOM2_BASE = os.environ['ASSCOM2_BASE']
    logger.debug(f"ASSCOM2_BASE was already set to \"{ASSCOM2_BASE}\".")
except KeyError:
    # realpath() resolves symlinks, so this is the physical path of the script.
    # Update: Maybe that is why the apptainer environment can't find the cwd?
    ASSCOM2_BASE = os.path.dirname(os.path.realpath(__file__))
    os.environ["ASSCOM2_BASE"] = ASSCOM2_BASE
    logger.debug(f"ASSCOM2_BASE was not set and has been defaulted to \"{ASSCOM2_BASE}\".")
# Define ASSCOM2_PROFILE relative to ASSCOM2_BASE if it is not set already.
# Apptainer is detected and preferred when it exists.
try:
    ASSCOM2_PROFILE = os.environ['ASSCOM2_PROFILE']
    logger.debug(f"ASSCOM2_PROFILE was already set to \"{ASSCOM2_PROFILE}\".")
except KeyError:  # The profile has not been set.
    # Check whether apptainer is present. If it is, use it; if not, use conda.
    if shutil.which("apptainer") is not None:  # Apptainer exists, use it.
        ASSCOM2_PROFILE = f"{ASSCOM2_BASE}/profile/apptainer/local"
        logger.debug("Using apptainer.")
    else:
        ASSCOM2_PROFILE = f"{ASSCOM2_BASE}/profile/conda/local"
        logger.debug("Using conda.")
    os.environ["ASSCOM2_PROFILE"] = ASSCOM2_PROFILE
    logger.debug(f"ASSCOM2_PROFILE was not set and has been defaulted to \"{ASSCOM2_PROFILE}\".")
# Define ASSCOM2_DATABASES relative to ASSCOM2_BASE if it is not set already.
try:
    ASSCOM2_DATABASES = os.environ['ASSCOM2_DATABASES']
    logger.debug(f"ASSCOM2_DATABASES was already set to \"{ASSCOM2_DATABASES}\".")
except KeyError:
    ASSCOM2_DATABASES = f"{ASSCOM2_BASE}/databases"
    os.environ["ASSCOM2_DATABASES"] = ASSCOM2_DATABASES
    logger.debug(f"ASSCOM2_DATABASES was not set and has been defaulted to \"{ASSCOM2_DATABASES}\".")
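# Illustrative sketch only (not called by the launcher): the environment
# variable handling above follows one pattern, "keep the value if it is set,
# otherwise export a default". Assuming the debug logging is not needed, that
# pattern could be condensed with os.environ.setdefault(). The helper name
# `env_or_default` is hypothetical and not part of assemblycomparator2.
import os


def env_or_default(name, default):
    """Return os.environ[name]; if it is unset, export `default` and return it."""
    return os.environ.setdefault(name, default)


# Possible usage (left commented out so the launcher's behaviour is unchanged):
# ASSCOM2_DATABASES = env_or_default("ASSCOM2_DATABASES", f"{ASSCOM2_BASE}/databases")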
# Then call the pipeline using the variables that have just been set, together with the command line parameters that were passed to this script. The new environment variables have already been saved in os.environ, so the subprocess inherits them as is.
# To be checked: whether this works for both conda and apptainer based installations. It should. The test on bioconda should use conda (not apptainer).
# Concatenate the trailing command line arguments to add to the snakemake command.
trailing_arguments = sys.argv[1:]
trailing_arguments_concatenated = " ".join(trailing_arguments)
logger.debug(f"The concatenated trailing arguments are {trailing_arguments_concatenated}")
command_main = f"snakemake --snakefile \"{ASSCOM2_BASE}\"/workflow/Snakefile --profile \"{ASSCOM2_PROFILE}\" --configfile \"{ASSCOM2_BASE}\"/config/config.yaml " + trailing_arguments_concatenated
logger.debug(f"Command to run is\n{command_main}")
command_report = f"""snakemake --snakefile "$ASSCOM2_BASE/dynamic_report/workflow/Snakefile" --profile "$ASSCOM2_PROFILE" --configfile=.report_config.yaml"""
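# Illustrative sketch only (an assumption, not what the launcher does): the
# " ".join() above drops the quoting applied by the calling shell, so an
# argument containing spaces is split into several words again when the
# command runs with shell=True. shlex.join() (Python 3.8+) re-quotes each
# argument instead. Note that quoting also stops the shell from expanding
# glob characters. The helper name `quote_trailing_arguments` is hypothetical.
import shlex


def quote_trailing_arguments(arguments):
    """Return the argument list as a single shell-safe string."""
    return shlex.join(arguments)


# Example:
# quote_trailing_arguments(["--config", "input_genomes=a.fna b.fna"])
# returns "--config 'input_genomes=a.fna b.fna'"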
# Finally, run the pipeline.
if "--version" in trailing_arguments or "-v" in trailing_arguments:
    print(f"assemblycomparator2 v{__version__}")
elif "--help" in trailing_arguments or "-h" in trailing_arguments:
    print(help_message)
else:
    # Run the assemblycomparator2 main pipeline.
    process_main = subprocess.run(command_main, shell=True)
    returncode_main = process_main.returncode
    # Run the dynamic report pipeline.
    # Only run the report if there is metadata and something is worth rendering.
    # if os.path.isfile("{output_directory}/metadata.tsv") and os.path.isfile("{output_directory}/.asscom2_void_report.flag"): # Should be moved to the report pipeline itself.
    if os.path.isfile(".report_config.yaml"):
        process_report = subprocess.run(command_report, shell=True)
        returncode_report = process_report.returncode
    else:
        # This happens when doing a dry-run in a new dir.
        print("Pipeline: .report_config.yaml has not yet been created.")
    # Exit with the return code of the main pipeline.
    sys.exit(returncode_main)