Tibanna instance type error with snakemake #382

Bioinf-usr · 2023-01-27T21:58:21Z

Hi,

I am trying to use Tibanna to launch Snakemake workflows on AWS. However, I keep running into the following error (as shown in CloudWatch):

{ "error": "ClientError", "cause": { "errorMessage": "An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [m1.medium, m3.medium, t4g.medium.search]", "errorType": "ClientError", "stackTrace": [ " File \"/var/task/service.py\", line 20, in handler\n return run_task(event)\n", " File \"/var/task/tibanna/run_task.py\", line 63, in run_task\n execution = Execution(input_json)\n", " File \"/var/task/tibanna/ec2_utils.py\", line 374, in __init__\n self.create_instance_type_list()\n", " File \"/var/task/tibanna/ec2_utils.py\", line 426, in create_instance_type_list\n results = ec2.describe_instance_types(\n", " File \"/var/task/botocore/client.py\", line 530, in _api_call\n return self._make_api_call(operation_name, kwargs)\n", " File \"/var/task/botocore/client.py\", line 960, in _make_api_call\n raise error_class(parsed_response, operation_name)\n" ] } }

Here are my versions:
snakemake version: 7.20.0
tibanna version: 3.1.0
Python version: 3.7.12

Below is the command used:

snakemake --tibanna --tibanna-config spot_instance=true behavior_on_capacity_limit=retry_without_spot instance_type=t4g.medium.search availability_zone=ap-south-1 --default-remote-prefix=<bucketname> -s test.yaml --jobs 1

Could you please let me know if I'm missing something?

Thank you.

alexander-veit · 2023-01-27T22:12:14Z

It looks like you are trying to launch one of these instance types: [m1.medium, m3.medium, t4g.medium.search]. Are you sure these are valid? I can't find them here. Could you just try t4g.medium?

Bioinf-usr · 2023-01-27T22:21:11Z

Thanks for the reply.

Here is what I tried now

snakemake --tibanna --tibanna-config spot_instance=true behavior_on_capacity_limit=retry_without_spot instance_type=t4g.medium availability_zone=ap-south-1 --default-remote-prefix=<bucketname> -s test.yaml --jobs 1

The error

"errorMessage": "An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [m1.medium, m3.medium]"
Could it be that "m1.medium, m3.medium" are hardcoded somewhere?

alexander-veit · 2023-01-27T22:41:51Z

Hm... not in Tibanna. What does your test.yaml look like?

Bioinf-usr · 2023-01-27T22:44:38Z

Here is what I have

rule a:
    output:
        "test.pdf"
    shell:
        "https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}"

alexander-veit · 2023-01-27T22:50:27Z

Looking at the Snakemake docs, do you have a config.json in your folder or anything that specifies instance_type?

Bioinf-usr · 2023-01-27T23:04:59Z

No, nothing on my side. I don't have a config file.

Bioinf-usr · 2023-01-28T01:17:43Z

Hi,

As a follow-up, when I tried to use the API, I got the following error.

Traceback (most recent call last):
  File "/home/ec2-user/.local/bin/tibanna", line 8, in <module>
    sys.exit(main())
  File "/home/ec2-user/.local/lib/python3.7/site-packages/tibanna/__main__.py", line 580, in main
    subcommandf(*sc_args)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/tibanna/__main__.py", line 449, in log
    top=top, top_latest=top_latest))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 844-845: ordinal not in range(256)

Here is my Snakefile

rule a:
    output:
        "test.pdf"
    retries: 3
    shell:
        "https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}"

Here is my config json

{
  "args": {
    "language": "snakemake",
    "container_image": "snakemake/snakemake",
    "command": "snakemake",
    "snakemake_main_filename": "Snakefile",
    "snakemake_directory_local":"/home/ec2-user",
    "output_S3_bucket": "dummy"
  },
  "config": {
    "instance_type": "t3.micro",
    "ebs_size": 10,
    "EBS_optimized": true,
    "log_bucket": "dummy"
  }

}

Please let me know if it helps.

Thank you.

Bioinf-usr · 2023-01-31T23:19:35Z

Hi,

Just wondering if you had a chance to take a look at the errors. Please let me know if you need any further information.

Thank you.

SooLee · 2023-01-31T23:56:30Z

Hi @Bioinf-usr, your shell command doesn't look executable: https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}. You might have missed a binary or something in the command?

Bioinf-usr · 2023-02-01T00:40:02Z

Hi,

Thanks, you are right, that was an issue. But even after fixing it, I am running into another error while using the API. I used the following command to get the error log.

tibanna log --job-id=4Gh64ovZcaXq

Below is the error.

Error: you need to specify the maximum number of CPU cores to be used at the same time. If you want to use N cores, say --cores N or -cN. For all cores on your system (be sure that this is appropriate) use --cores all. For no parallelization use --cores 1 or -c1. <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>

Here is my command

API().run_workflow(input_json="test.json")

My test.json

{
  "args": {
    "language": "snakemake",
    "container_image": "snakemake/snakemake",
    "command": "snakemake",
    "snakemake_main_filename": "Snakefile",
    "snakemake_directory_local":"/home/ec2-user",
    "output_S3_bucket": "dummy"
  },
  "config": {
    "instance_type": "t3.micro",
    "ebs_size": 10,
    "EBS_optimized": true,
    "log_bucket": "dummy",
    "cores": 1
  }

}

Here is my snakefile

rule a:
    output:
        "test.pdf"
    retries: 3
    shell:
        "wget https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}"

Hope it helps. Please note that this is only for the api. The problem with using snakemake as a standalone is still the same.

Thank you.

SooLee · 2023-02-01T01:15:20Z

Try putting an old tag to "container_image": "snakemake/snakemake", e.g. "container_image": "snakemake/snakemake:v6.1.0" (or some other version) - this may be related to a newer version of snakemake.
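For anyone who wants to try this without hand-editing the file, here is a minimal sketch that pins the image tag in an input JSON shaped like the test.json above. The helper name `pin_container_image` is made up for illustration and is not part of Tibanna. Whether an older image actually avoids the `--cores` error is only the suggestion above, not something this snippet verifies.

```python
import copy


def pin_container_image(input_json: dict, tag: str) -> dict:
    """Return a copy of the input JSON with args.container_image pinned to `tag`.

    Hypothetical helper, not a Tibanna function. Assumes the image name has no
    registry port (e.g. "host:5000/image"), since it splits on the first colon.
    """
    pinned = copy.deepcopy(input_json)
    repo = pinned["args"]["container_image"].split(":", 1)[0]  # drop any existing tag
    pinned["args"]["container_image"] = f"{repo}:{tag}"
    return pinned


example = {
    "args": {"language": "snakemake", "container_image": "snakemake/snakemake"},
    "config": {"instance_type": "t3.micro"},
}
print(pin_container_image(example, "v6.1.0")["args"]["container_image"])
```

The original dictionary is left untouched, so the pinned copy can be written to a separate file and passed to `API().run_workflow(input_json=...)` in place of the original.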

nhartwic · 2023-02-27T20:42:38Z

I'm encountering a similar error. It seems like snakemake/tibanna is trying to use an instance_type that doesn't exist. It isn't obvious where that specific instance_type is coming from though. Relevant messages from cloudwatch logs below

[tibanna.ec2_utils] DEBUG: 23-02-27 20:01:16 - self.cfg.as_dict() = {
    "run_name": "snakemake-job-frADkYJMPBna-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2a",
    "use_benchmark": False,
    "instance_type": "",
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}

[DEBUG]	2023-02-27T20:01:16.522Z	342b9645-c2d3-48d4-82f9-0e0e47bd99da	self.cfg.as_dict() = {
    "run_name": "snakemake-job-frADkYJMPBna-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2a",
    "use_benchmark": False,
    "instance_type": "",
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}

[ERROR] ClientError: An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [cr1.8xlarge]
Traceback (most recent call last):
  File "/var/task/service.py", line 20, in handler
    return run_task(event)
  File "/var/task/tibanna/run_task.py", line 63, in run_task
    execution = Execution(input_json)
  File "/var/task/tibanna/ec2_utils.py", line 374, in __init__
    self.create_instance_type_list()
  File "/var/task/tibanna/ec2_utils.py", line 426, in create_instance_type_list
    results = ec2.describe_instance_types(
  File "/var/task/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)

...It sure doesn't seem like "cr1.8xlarge" is coming from snakemake, as it isn't in any of the configs, but I can't see where it would be coming from within Tibanna either.

nhartwic · 2023-02-27T20:54:53Z

Attempting to manually set instance_type didn't work. Relevant cloudwatch log messages below....

[tibanna.ec2_utils] DEBUG: 23-02-27 20:52:23 - self.cfg.as_dict() = {
    "run_name": "snakemake-job-nEyH0pfXEE8l-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2",
    "instance_type": "m5a.4xlarge",
    "use_benchmark": False,
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}

[DEBUG]	2023-02-27T20:52:23.936Z	25305e82-9267-4340-9d7d-ec3cc9769018	self.cfg.as_dict() = {
    "run_name": "snakemake-job-nEyH0pfXEE8l-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2",
    "instance_type": "m5a.4xlarge",
    "use_benchmark": False,
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}

[ERROR] ClientError: An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [cr1.8xlarge]
Traceback (most recent call last):
  File "/var/task/service.py", line 20, in handler
    return run_task(event)
  File "/var/task/tibanna/run_task.py", line 63, in run_task
    execution = Execution(input_json)
  File "/var/task/tibanna/ec2_utils.py", line 374, in __init__
    self.create_instance_type_list()
  File "/var/task/tibanna/ec2_utils.py", line 426, in create_instance_type_list
    results = ec2.describe_instance_types(
  File "/var/task/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)

alexander-veit · 2023-02-27T21:33:02Z

Hm... Could you remove

"mem": 234.375,
"cpu": 32,

from the input and only specify the instance type? Also, make sure the instance type you choose actually exists in us-west-2 (not all instance types are available in all regions).

nhartwic · 2023-02-27T22:24:08Z

The config is being automatically generated by snakemake. I can't remove those fields.

Mostly I want to know where "cr1.8xlarge" is coming from. Is it snakemake or tibanna that is trying to match ec2 type based on resources?

alexander-veit · 2023-02-27T22:40:28Z

I see. It's probably coming from here.

Since mem and cpu are set, a suitable instance type is determined by our Benchmark package. It hasn't been updated in quite a while, and it probably does not take into account the region where you are trying to run the job. Therefore, it might suggest instances that are not available in your region, which would cause the error you are seeing.
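To illustrate the failure mode (this is not the Benchmark package's actual code, just a rough sketch of resource-based selection), consider picking every type from a static table that meets the requested cpu and mem. The table `CATALOG` and the function `candidates_for` are hypothetical names, and the three rows are only a tiny sample.

```python
from typing import List, NamedTuple


class InstanceSpec(NamedTuple):
    name: str
    vcpus: int
    mem_gib: float


# Illustrative sample of a static lookup table (assumption: a real one is much larger).
CATALOG = [
    InstanceSpec("m5a.4xlarge", 16, 64.0),
    InstanceSpec("cr1.8xlarge", 32, 244.0),  # previous-generation type
    InstanceSpec("r5.8xlarge", 32, 256.0),
]


def candidates_for(cpu: int, mem_gib: float, catalog=CATALOG) -> List[str]:
    """Return every catalog entry with at least the requested vCPUs and memory."""
    return [s.name for s in catalog if s.vcpus >= cpu and s.mem_gib >= mem_gib]


# The values from the log above: "cpu": 32, "mem": 234.375
print(candidates_for(cpu=32, mem_gib=234.375))
```

Because the selection only looks at the table, a retired or region-restricted type stays in the candidate list as long as its row exists, and the later DescribeInstanceTypes call rejects the whole request.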

I will bring this up internally. This is certainly something we need to look at.

nhartwic · 2023-02-27T22:47:04Z

It may be worth clarifying that "cr1.8xlarge" does not appear to be an instance type anymore. It doesn't exist in any of the regions I've checked and Amazon now lists it as a "previous generation instance"

After poking around, looks like this is the csv that needs to be updated...

https://github.com/SooLee/Benchmark/blob/master/Benchmark/aws/Amazon%20EC2%20Instance%20Comparison.csv

alexander-veit · 2023-02-27T22:59:48Z

Yeah, I briefly looked at it. Definitely needs an update. Thanks for bringing the issue to our attention.

Bioinf-usr · 2023-02-27T23:05:42Z

Awesome!! great to see this issue being addressed. Would be happy to do some debugging if needed.

Thanks!!

alexander-veit · 2023-03-02T14:57:37Z

Please do not use version 3.0.0 or 3.1.0. We identified a critical bug that can cause inflated costs when running spot. We are working on a solution.

alexander-veit · 2023-03-03T16:20:53Z

Please use v3.2.1 (or higher) from now on. We are looking at the outdated instance types next.

nhartwic · 2023-03-09T19:21:16Z

Can I also recommend that when instance_type is set, tibanna should skip trying to automatically determine instance type? This current behavior strikes me as counterintuitive and undesirable.

alexander-veit · 2023-03-23T15:27:07Z

Version 3.3.0 should fix this issue. Furthermore, when instance_type is set, Tibanna will use only that instance type for the workflow.

nhartwic · 2023-03-24T23:28:00Z

I've upgraded to 3.3 and think the update has resolved all outstanding issues. I'm not the person who opened this issue, but I'd call it closed.

trahsemaj · 2023-10-04T19:15:33Z

I am running on 4.0.0 and running into the same issue described above: an outdated CSV (even in the latest benchmark release, Benchmark-4dn-0.5.23). Using snakemake 7.3.1, with the --tibanna option.
My pipeline has pretty diverse resource needs for different rules, so setting a single instance_type for all steps is not a viable option.
My current workaround is to manually play with the mem_gb and threads until a valid instance type is selected, but that doesn't seem ideal.

alexander-veit · 2023-10-04T19:21:01Z

Which instance type is causing issues?

trahsemaj · 2023-10-04T19:47:07Z

from the run_task_awsem_* cloudwatch logs:

[ERROR] ClientError: An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [r6a.xlarge, r6id.xlarge]
Traceback (most recent call last):
  File "/var/task/service.py", line 20, in handler
    return run_task(event)
  File "/var/task/tibanna/run_task.py", line 63, in run_task
    execution = Execution(input_json)
  File "/var/task/tibanna/ec2_utils.py", line 375, in __init__
    self.create_instance_type_list()
  File "/var/task/tibanna/ec2_utils.py", line 427, in create_instance_type_list
    results = ec2.describe_instance_types(
  File "/var/task/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)

fwiw, the file on the benchmark github (https://github.com/SooLee/Benchmark/blob/master/Benchmark/aws/Amazon%20EC2%20Instance%20Comparison.csv) does seem to be up-to-date (at least seems to contain valid instance types), but the package version with Benchmark-4dn-0.5.23 contains an outdated version. Unsure how/if this is tweakable without manually uploading my own lambda function with a corrected version of this file.
I see r6g* and r6i* instance types available, but not r6a* or r6id* instances in my region.

alexander-veit · 2023-10-04T20:47:26Z

I think the Benchmark-4dn-0.5.23 list is fine, but the problem is that this list is not region specific. It returns instance types that are valid in us-east-1. Currently, Tibanna does not cross-check what's actually available in the active region and just takes the list from Benchmark. This certainly needs to be improved. I will add it to my todo list.
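A minimal sketch of what such a cross-check could look like, assuming a boto3 EC2 client for the target region is passed in. The function name `filter_to_region` is hypothetical and this is not Tibanna's implementation. It uses the EC2 `DescribeInstanceTypeOfferings` call, fetches all offerings for the region, and intersects locally.

```python
from typing import Iterable, List


def filter_to_region(ec2_client, candidates: Iterable[str]) -> List[str]:
    """Keep only the candidates that EC2 reports as offered in the client's region.

    Hypothetical helper. `ec2_client` is expected to behave like
    boto3.client("ec2", region_name=...).
    """
    offered = set()
    kwargs = {"LocationType": "region"}
    while True:
        page = ec2_client.describe_instance_type_offerings(**kwargs)
        offered.update(o["InstanceType"] for o in page.get("InstanceTypeOfferings", []))
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # follow pagination until exhausted
    # Preserve the caller's ordering of candidates.
    return [t for t in candidates if t in offered]


# Usage (requires AWS credentials; not executed here):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   usable = filter_to_region(ec2, ["r6a.xlarge", "r6i.xlarge", "r6id.xlarge"])
```

Dropping unavailable types before they reach `describe_instance_types` would avoid the InvalidInstanceType error, at the cost of one extra paginated API call per run.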

trahsemaj · 2023-10-04T22:44:32Z

ah, great to know, might consider migrating to us-east-1 if that list will be kept up-to-date. Maybe this is an issue better raised in Benchmark, but could imagine a fix might take some adjustments to both.

nhartwic mentioned this issue Feb 28, 2023

Tibanna instance type error with snakemake snakemake/snakemake#2083

Open

alexander-veit mentioned this issue Mar 21, 2023

Improved fleet error handling + smaller fixes #388

Merged
