Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--default-resources combined with partial resource specification causes TypeError #1550

Closed
pvandyken opened this issue Mar 31, 2022 · 3 comments · Fixed by #1552
Closed

--default-resources combined with partial resource specification causes TypeError #1550

pvandyken opened this issue Mar 31, 2022 · 3 comments · Fixed by #1552
Assignees
Labels
bug Something isn't working

Comments

@pvandyken
Copy link
Contributor

Snakemake version

7.3.5+0.g4f2258ff.dirty

Describe the bug

When running snakemake using --default-resources, but without specifying mem_mb so that the Snakemake-defined default of mem_mb=max(2*input.size_mb, 1000) kicks in, along with the --resources flag with mem_mb, if a root rule specifies a mem_mb requirement under its resources directive and a downstream rule doesn't, the downstream rule will error out with

TypeError: '>' not supported between instances of 'TBDString' and 'int'

Logs

Minimal example

# Snakefile
rule all:
    input:
        "a.out"

rule a:
    output:
        "a.out"
    resources:
        mem_mb=2000, # Note that resources are specified here, but not in all.
    shell:
        "touch {output}"
# CLI call
snakemake --default-resources --resources mem_mb=4000 -c1 all

Additional context
The last version the above code will work is v6.4.1. The first version the error appears in is v6.8.0. Weirdly, in v6.7,6,5, I get completely different errors, but perhaps those issues have been resolved?

# 6.6,7
TypeError: unsupported operand type(s) for +=: 'int' and 'str'

# 6.5
AttributeError: 'list' object has no attribute 'items'

I'm currently looking into the source if the error, although it obviously stems from the fact that, in the example above, all does not know the size of it's input, so it's mem_mb gets automatically set to "<TBD>". Obviously in previous versions that would get reset when all actually ran, but this isn't happening any more.

@pvandyken pvandyken added the bug Something isn't working label Mar 31, 2022
@pvandyken
Copy link
Contributor Author

Here's the complete trace:

Traceback (most recent call last):
  File "/path/to/snakemake/lib/python3.9/site-packages/snakemake/__init__.py", line 722, in snakemake
    success = workflow.execute(
  File "/path/to/snakemake/lib/python3.9/site-packages/snakemake/workflow.py", line 1099, in execute
    success = self.scheduler.schedule()
  File "/path/to/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 484, in schedule
    run = self.job_selector(needrun)
  File "/path/to/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 638, in job_selector_ilp
    return self.job_selector_greedy(jobs)
  File "/path/to/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 823, in job_selector_greedy
    a = list(map(self.job_weight, jobs))  # resource usage of jobs
  File "/path/to/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 897, in job_weight
    return [
  File "/path/to/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 898, in <listcomp>
    self.calc_resource(name, res.get(name, 0)) for name in self.global_resources
  File "/path/to/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 877, in calc_resource
    if value > gres:
TypeError: '>' not supported between instances of 'TBDString' and 'int'

@pvandyken
Copy link
Contributor Author

pvandyken commented Mar 31, 2022

An immediate fix that DOES work is to modify rules.resources so that every time it's called, it checks for a "<TBD>" in rules._resources. If found, it reevaluates the resources by calling self.rule.expand_resources(...) again.

Happy to make a PR, but this may be too much the "band-aid" fix for a deeper problem. Let me know what you'd like.

EDIT:
With the recent change to always using all Default resources when in Cluster mode (introduced in #1491), this will create significant problems for cluster jobs that use mem_mb. The only current workaround would be to manually set the --default-resources manually to a constant number, at least for mem_mb. So I think fixing this is pretty high priority.

pvandyken added a commit to pvandyken/snakemake that referenced this issue Mar 31, 2022
The new auto `--default-resources` was causing test failures, so we
manually set the default mem_mb to 0.

Ran formatting with black
@johanneskoester
Copy link
Contributor

Thanks a lot, good catch! I have fixed this in above PR. Basically the call to update the resources before scheduling came too late.

@johanneskoester johanneskoester self-assigned this Apr 1, 2022
pvandyken added a commit to pvandyken/snakemake that referenced this issue Jun 3, 2022
The new auto `--default-resources` was causing test failures, so we
manually set the default mem_mb to 0.

Ran formatting with black
pvandyken added a commit to pvandyken/snakemake that referenced this issue Jun 15, 2022
The new auto `--default-resources` was causing test failures, so we
manually set the default mem_mb to 0.

Ran formatting with black
pvandyken added a commit to pvandyken/snakemake that referenced this issue Jun 22, 2022
The new auto `--default-resources` was causing test failures, so we
manually set the default mem_mb to 0.

Ran formatting with black
pvandyken added a commit to pvandyken/snakemake that referenced this issue Jul 6, 2022
The new auto `--default-resources` was causing test failures, so we
manually set the default mem_mb to 0.

Ran formatting with black
pvandyken added a commit to pvandyken/snakemake that referenced this issue Jul 22, 2022
The new auto `--default-resources` was causing test failures, so we
manually set the default mem_mb to 0.

Ran formatting with black
pvandyken added a commit to pvandyken/snakemake that referenced this issue Jul 25, 2022
The new auto `--default-resources` was causing test failures, so we
manually set the default mem_mb to 0.

Ran formatting with black
pvandyken added a commit to pvandyken/snakemake that referenced this issue Jul 26, 2022
The new auto `--default-resources` was causing test failures, so we
manually set the default mem_mb to 0.

Ran formatting with black
johanneskoester pushed a commit that referenced this issue Jul 27, 2022
* Add .python-version to .gitignore

.python-version is used by pyenv to manage local python versions

* Track what specific pipe group each job belongs to

* Adds a pipe_group property to jobs that tracks which pipe group the job belongs to, if any. None if not involved with pipes, otherwise, it equals the CandidateGroup generated when grouping the pipe.

* If the job was not part of a user-defined group, the ordinary group property will still be set to CandidateGroup, as before.

* Toposort puts  pipe jobs at same level

* Jobs connected by a pipe now do not depend on each other, but inherit each other's dependencies. This properly ensures toposort() puts them at the same level.

* Ensure item is correct type for CandidateGroup eq

* __eq__ in CandidateGroup failed when comparing with a different type of object. This corrects this by inserting a type check

* Group resource constraint calculations

Computes group-resource requirements on the cluster by taking
user-provided constraints into account. Jobs that could be run in
parallel will now be run in series if not enough resources (e.g cores or
memory) are available. Runtime is now treated as a special resource, and
will be accurately calculated for parallel jobs via max() or for series
jobs via sum()

Logic is implemented in a series of private methods in GroupJob

Pipe groups are treated as a single unit when calculating resources. Runtime is
summed over pipe groups instead of "maxed". Cores are merged via max()

* Fix sorting for pipe job scheduling.

The toposorting within GroupJob was updated to put all pipe jobs
at the same toposort level so that accurate resource requirements
could be calculated. This broke the scheduler, however, which
relied on pipe jobs within the same group being in an hierarchical
order. This adds an additional toposort to GroupJob.__iter__
specifically for pipe jobs.

This also refactors the toposorting code to a less complex form

* Stop constraining resources for submitted jobs

Add a new property "local_resources" to Job and GroupJob and change all references to
job.resources in the scheduler to job.local_resources. local_resources
is normally identical to resources, unless job.is_local == False (i.e
the job is submitted to a cluster), in which case local_resources will
only include "_cores" and "_nodes".

GroupJob.resources now outputs Resources object instead of dict.

* Add resource constraints to cluster job submission

Include the --resources flag on the snakemake call made in the child
job.

Does not send any runtime information, as this falsly constrains jobs
within the cluster.

Cores is still set to "all" for regular cluster submission, leaving the
cluster to manage how many cores the job is entitled to

* Add workaround for #1550 in tests

The new auto `--default-resources` was causing test failures, so we
manually set the default mem_mb to 0.

Ran formatting with black

* Implement resource scopes for remote jobs

Resources can now be set as local or global, either using the Snakefile
directive "resource_scope" or, with higher priority, the CLI argument
"--set-resource-scopes". The scheduler shall only take local resources
into account on a local job, i.e. a job scheduled on a cluster. Global
resources shall always be taken into account. Group jobs that would
exceed global resource allocations shall not be scheduled until other
jobs have finished.

Note that the global/local distinction only has effect in remote jobs,
i.e. cluster, compute nodes, etc. In local executions, all jobs are
local.

* Update documentation for resource scopes

Add resource scope info and clean up/organize resource section

* Unify toposort functions

Toposort was seperately implemented both in dag and in jobs. This moves
the jobs version into dag. Pipe group jobs are immediately sorted in
dependency order when toposort is calculated, instead of each time
__iter__() is called on a group job

* Ensure resources passed to subcmds are digits

Previously, the check was merely if the resource was an int, but some
int subclasses are not represented as digits, especially bool, which is
represented by the strings True and False. This is corrected by checking
if the stringified int looks like an int (as opposed to `True` or
`False`)

* Remove special merging rules for pipejobs

Also refactors _simple_merge into a more generic format

* Refactor group job calculation into new class

Moves all relevant code into the GroupResources class in the resources
module.

Remove the hardcoding of "runtime" from the logic and isolate it into
two variables: additive_resources, which determines which resources are
summed across layers, and sortby, which determines sorting priority when
layering.

GroupResources is changed a UserDict containing a class method
'basic_layered'. This opens the possibility for new group resource
merging methods to be added later.

* Refactor resource scopes into self-contained class

Moves the logic for resource scope management into a single class

Adds an 'excluded' scope for resources not be sent child snakemake
processes. This undocumented feature will currently be used to prevent
runtime from being sent to cluster jobs

* Pass on resource limits to run-directive jobs

* Make too-much-group-resource error more generic

Remove hardcoded mention of runtime, instead referring to the
additive_resources

Move some of the error logic into it's own function

* Fix group_job_pipe test failures on low core envs

Some environments (like gh actions) don't have enough cores to run the
subprocess in test_group_job_resources_with_pipe. We fix this by
patching RealExecutor.get_job_args, which instructs the executor how
many cores to request

* Refactor basic_layered and add docstring

Common operations have been refactored into seperate methods to simplify
the logic.

* Clarify snakemake version scope feature started

* Add link to scopes explanation in cli help

* Test approp inclusion of resources in submission

Ensure that excluded resources like runtime are not included in the job
submit script
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants