Hitting vCPU limit #267

Open
Finesim97 opened this issue Feb 16, 2020 · 4 comments

Comments

@Finesim97

Hi,
While running Tibanna with Snakemake, I keep hitting my vCPU limit. Snakemake supports a job limit for cluster executions (-j), but it doesn't apply that limit to Tibanna. Might this be something that Tibanna should handle instead?

@SooLee
Member

SooLee commented Feb 16, 2020

At what stage are you hitting the vCPU limit? Is it on the cloud or locally? Tibanna doesn't use local CPUs to run jobs; it uses them only to submit jobs to the cloud (i.e. 1 CPU would be enough locally). On the cloud, it should have enough CPUs if the right instance type was launched.

@Finesim97
Author

The first lambda (RunTaskAwsem) in the step function fails.

{
  "errorMessage": "failed to launch instance for job J2sUQPneqq2f: An error occurred (VcpuLimitExceeded) when calling the RunInstances operation: You have requested more vCPU capacity than your current vCPU limit of 64 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.",
  "errorType": "Exception",
  "stackTrace": [
    [
      "/var/task/service.py",
      20,
      "handler",
      "return run_task(event)"
    ],
    [
      "/var/task/tibanna/run_task.py",
      64,
      "run_task",
      "execution.launch()"
    ],
    [
      "/var/task/tibanna/ec2_utils.py",
      338,
      "launch",
      "self.instance_id = self.launch_and_get_instance_id()"
    ],
    [
      "/var/task/tibanna/ec2_utils.py",
      480,
      "launch_and_get_instance_id",
      "res = self.ec2_exception_coordinator(self.run_instances)(ec2)"
    ],
    [
      "/var/task/tibanna/ec2_utils.py",
      525,
      "inner",
      "raise Exception(\"failed to launch instance for job %s: %s\" % (self.jobid, str(e)))"
    ]
  ]
}

AWS increased my limit almost instantly after I requested it, but for a Snakemake workflow with a large number of tasks ready to run at the same time, this will become a problem again.
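
One way to keep many simultaneously ready tasks under an account-level vCPU quota is to gate submissions on a running vCPU budget. The following is only a minimal sketch under stated assumptions, not part of Tibanna or Snakemake: the `VcpuBudget` class and `split_submittable` function are hypothetical names, the quota of 64 is taken from the error message above, and the job names and per-job vCPU counts are made up for illustration.

```python
import threading


class VcpuBudget:
    """Hypothetical helper: tracks vCPUs in use against a fixed quota."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0
        self._lock = threading.Lock()

    def try_acquire(self, vcpus):
        """Reserve vcpus if they fit under the quota; return True on success."""
        with self._lock:
            if self.in_use + vcpus > self.limit:
                return False
            self.in_use += vcpus
            return True

    def release(self, vcpus):
        """Return vcpus to the budget once the job's instance has terminated."""
        with self._lock:
            self.in_use = max(0, self.in_use - vcpus)


def split_submittable(jobs, budget):
    """Split (name, vcpus) pairs into jobs to submit now and jobs to defer."""
    submit, defer = [], []
    for name, vcpus in jobs:
        (submit if budget.try_acquire(vcpus) else defer).append(name)
    return submit, defer


budget = VcpuBudget(limit=64)  # 64 is the quota quoted in the error message
jobs = [("align_a", 32), ("align_b", 32), ("call_variants", 16)]  # illustrative
submit_now, deferred = split_submittable(jobs, budget)
```

Deferred jobs would be retried after `release` is called for a finished job. A local counter like this cannot see instances launched outside the workflow, so it complements rather than replaces handling the error itself.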

@SooLee
Member

SooLee commented Feb 16, 2020

Ah, I see. It was the AWS limit. I will try to add some kind of error handling (e.g. waiting) over the next few days. Thanks again for reporting.
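
The waiting approach mentioned here could look roughly like the following. This is a sketch under assumptions, not Tibanna's actual fix: `launch_with_wait`, `is_vcpu_limit_error`, and their parameters are hypothetical, and the check relies only on the `VcpuLimitExceeded` code appearing in the exception message, as it does in the error reported earlier in this thread.

```python
import time


def is_vcpu_limit_error(exc):
    """Assumption: the AWS error code is embedded in the exception message."""
    return "VcpuLimitExceeded" in str(exc)


def launch_with_wait(launch, max_attempts=5, base_delay=30.0, sleep=time.sleep):
    """Call launch(); on a vCPU-limit error, wait and retry with backoff.

    Any other exception, or a limit error on the final attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return launch()
        except Exception as exc:
            if not is_vcpu_limit_error(exc) or attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the function can be exercised without real delays. Inside a Lambda function, long sleeps would run into the function timeout, so the waiting might instead be expressed as a retry policy in the step function itself.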

@Finesim97
Author

No problem, and thank you again for your work.
