Tibanna instance type error with snakemake #382

Open
Bioinf-usr opened this issue Jan 27, 2023 · 29 comments

@Bioinf-usr

Bioinf-usr commented Jan 27, 2023

Hi,

I am trying to use Tibanna to launch Snakemake workflows on AWS. However, I keep running into the following error (as seen in CloudWatch):

{ "error": "ClientError", "cause": { "errorMessage": "An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [m1.medium, m3.medium, t4g.medium.search]", "errorType": "ClientError", "stackTrace": [ " File \"/var/task/service.py\", line 20, in handler\n return run_task(event)\n", " File \"/var/task/tibanna/run_task.py\", line 63, in run_task\n execution = Execution(input_json)\n", " File \"/var/task/tibanna/ec2_utils.py\", line 374, in __init__\n self.create_instance_type_list()\n", " File \"/var/task/tibanna/ec2_utils.py\", line 426, in create_instance_type_list\n results = ec2.describe_instance_types(\n", " File \"/var/task/botocore/client.py\", line 530, in _api_call\n return self._make_api_call(operation_name, kwargs)\n", " File \"/var/task/botocore/client.py\", line 960, in _make_api_call\n raise error_class(parsed_response, operation_name)\n" ] } }

Here are my versions:
snakemake version: 7.20.0
tibanna version: 3.1.0
Python version: 3.7.12

Below is the command used:

snakemake --tibanna --tibanna-config spot_instance=true behavior_on_capacity_limit=retry_without_spot instance_type=t4g.medium.search availability_zone=ap-south-1 --default-remote-prefix=<bucketname> -s test.yaml --jobs 1

Could you please let me know if I'm missing something?

Thank you.

@alexander-veit
Member

It looks like you are trying to launch one of these instance types: [m1.medium, m3.medium, t4g.medium.search]. Are you sure these are valid? I can't find them here. Could you just try t4g.medium?

@Bioinf-usr
Author

Thanks for the reply.

Here is what I tried now

snakemake --tibanna --tibanna-config spot_instance=true behavior_on_capacity_limit=retry_without_spot instance_type=t4g.medium availability_zone=ap-south-1 --default-remote-prefix=<bucketname> -s test.yaml --jobs 1

The error

"errorMessage": "An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [m1.medium, m3.medium]"
Could it be that "m1.medium, m3.medium" are hardcoded somewhere?
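
One way to see which rejected names were never requested is to parse the two error messages quoted above and compare them with the values passed on the command line. This is a minimal sketch; the helper name `rejected_instance_types` is made up for illustration and is not part of Tibanna or boto3.

```python
import re


def rejected_instance_types(error_message):
    """Return the instance types listed in an InvalidInstanceType message."""
    match = re.search(r"do not exist: \[([^\]]*)\]", error_message)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# The two messages quoted in this thread.
first = ("An error occurred (InvalidInstanceType) when calling the "
         "DescribeInstanceTypes operation: The following supplied instance "
         "types do not exist: [m1.medium, m3.medium, t4g.medium.search]")
second = ("An error occurred (InvalidInstanceType) when calling the "
          "DescribeInstanceTypes operation: The following supplied instance "
          "types do not exist: [m1.medium, m3.medium]")

# Values passed via instance_type= in the two snakemake commands.
requested = {"t4g.medium.search", "t4g.medium"}

unrequested = [t for t in rejected_instance_types(second) if t not in requested]
print(unrequested)  # ['m1.medium', 'm3.medium']
```

After switching to t4g.medium, the only names still rejected are ones the command never mentioned, which suggests they are being added somewhere between Snakemake and the EC2 call.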

@alexander-veit
Member

alexander-veit commented Jan 27, 2023

Hm... not in Tibanna. What does your test.yaml look like?

@Bioinf-usr
Author

Bioinf-usr commented Jan 27, 2023

Here is what I have

rule a:
    output:
        "test.pdf"
    shell:
        "https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}"

@alexander-veit
Member

Looking at the Snakemake docs, do you have a config.json in your folder or anything that specifies instance_type?

@Bioinf-usr
Author

No, nothing on my side. I don't have a config file.

@Bioinf-usr
Author

Hi,

As a follow-up, when I tried to use the API, I got the following error.

Traceback (most recent call last):
  File "/home/ec2-user/.local/bin/tibanna", line 8, in <module>
    sys.exit(main())
  File "/home/ec2-user/.local/lib/python3.7/site-packages/tibanna/__main__.py", line 580, in main
    subcommandf(*sc_args)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/tibanna/__main__.py", line 449, in log
    top=top, top_latest=top_latest))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 844-845: ordinal not in range(256)

Here is my Snakefile

rule a:
    output:
        "test.pdf"
    retries: 3
    shell:
        "https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}"

Here is my config json

{
  "args": {
    "language": "snakemake",
    "container_image": "snakemake/snakemake",
    "command": "snakemake",
    "snakemake_main_filename": "Snakefile",
    "snakemake_directory_local":"/home/ec2-user",
    "output_S3_bucket": "dummy"
  },
  "config": {
    "instance_type": "t3.micro",
    "ebs_size": 10,
    "EBS_optimized": true,
    "log_bucket": "dummy"
  }

}

Please let me know if it helps.

Thank you.
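
The UnicodeEncodeError in the traceback above is raised while `tibanna log` prints the log text, and the `'latin-1'` codec in the message suggests (this is an assumption, not something the thread confirms) that standard output on that machine uses a Latin-1 encoding. The sketch below reproduces the same class of failure with a hypothetical log line and shows a lossy fallback.

```python
# Hypothetical log line containing a character outside Latin-1 (a check mark).
log_text = "job finished \u2713"


def encodable(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(encodable(log_text, "latin-1"))  # False
print(encodable(log_text, "utf-8"))    # True

# Lossy fallback: unencodable characters become "?".
safe = log_text.encode("latin-1", errors="replace").decode("latin-1")
print(safe)  # job finished ?
```

Setting the standard Python environment variable `PYTHONIOENCODING=utf-8` before running `tibanna log` is a possible workaround under the same assumption; it has not been tested against this job.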

@Bioinf-usr
Author

Hi,

Just wondering if you had a chance to take a look at the errors. Please let me know if you need any further information.

Thank you.

@SooLee
Member

SooLee commented Jan 31, 2023

Hi @Bioinf-usr, your shell command doesn't look executable: https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}. You might have missed a binary or something in the command?

@Bioinf-usr
Author

Hi,

Thanks, you are right, that was an issue. But even after fixing it, I am running into another error while using the API. I used the following command to get the error log.

tibanna log --job-id=4Gh64ovZcaXq

Below is the error.

Error: you need to specify the maximum number of CPU cores to be used at the same time. If you want to use N cores, say --cores N or -cN. For all cores on your system (be sure that this is appropriate) use --cores all. For no parallelization use --cores 1 or -c1. <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>

Here is my command

API().run_workflow(input_json="test.json")

My test.json

{
  "args": {
    "language": "snakemake",
    "container_image": "snakemake/snakemake",
    "command": "snakemake",
    "snakemake_main_filename": "Snakefile",
    "snakemake_directory_local":"/home/ec2-user",
    "output_S3_bucket": "dummy"
  },
  "config": {
    "instance_type": "t3.micro",
    "ebs_size": 10,
    "EBS_optimized": true,
    "log_bucket": "dummy",
    "cores": 1
  }

}

Here is my snakefile

rule a:
    output:
        "test.pdf"
    retries: 3
    shell:
        "wget https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}"

Hope it helps. Please note that this is only for the api. The problem with using snakemake as a standalone is still the same.

Thank you.
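
The error text comes from Snakemake itself and asks for `--cores` on the command line. A minimal sketch of one possible fix, assuming the string in `args.command` is what gets executed inside the container: append `--cores 1` to that string rather than relying on a `cores` key under `config` (nothing in this thread confirms that Tibanna reads such a key). The helper `with_cores` is hypothetical.

```python
import json


def with_cores(input_json, cores):
    """Return a copy of a Tibanna input dict whose Snakemake command sets --cores."""
    updated = json.loads(json.dumps(input_json))  # deep copy via a JSON round trip
    command = updated["args"]["command"]
    if "--cores" not in command.split():
        updated["args"]["command"] = f"{command} --cores {cores}"
    return updated


# Same content as the test.json posted above.
test_json = {
    "args": {
        "language": "snakemake",
        "container_image": "snakemake/snakemake",
        "command": "snakemake",
        "snakemake_main_filename": "Snakefile",
        "snakemake_directory_local": "/home/ec2-user",
        "output_S3_bucket": "dummy",
    },
    "config": {
        "instance_type": "t3.micro",
        "ebs_size": 10,
        "EBS_optimized": True,
        "log_bucket": "dummy",
        "cores": 1,
    },
}

fixed = with_cores(test_json, 1)
print(fixed["args"]["command"])  # snakemake --cores 1
```

The resulting dict can be written back with `json.dump` and passed to `run_workflow` as before. Separately, note that wget's lowercase `-o` writes the log to the named file, while uppercase `-O` sets the downloaded document's name, so the rule above probably wants `-O {output}`.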

@SooLee
Member

SooLee commented Feb 1, 2023

Try pinning an older tag for "container_image": "snakemake/snakemake", e.g. "container_image": "snakemake/snakemake:v6.1.0" (or some other version). The error may be related to a newer version of Snakemake.

@nhartwic

I'm encountering a similar error. It seems like snakemake/tibanna is trying to use an instance_type that doesn't exist. It isn't obvious where that specific instance_type is coming from, though. Relevant messages from the CloudWatch logs are below.

[tibanna.ec2_utils] DEBUG: 23-02-27 20:01:16 - self.cfg.as_dict() = {
    "run_name": "snakemake-job-frADkYJMPBna-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2a",
    "use_benchmark": False,
    "instance_type": "",
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}
[DEBUG]	2023-02-27T20:01:16.522Z	342b9645-c2d3-48d4-82f9-0e0e47bd99da	self.cfg.as_dict() = {
    "run_name": "snakemake-job-frADkYJMPBna-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2a",
    "use_benchmark": False,
    "instance_type": "",
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}
[ERROR] ClientError: An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [cr1.8xlarge]
Traceback (most recent call last):
  File "/var/task/service.py", line 20, in handler
    return run_task(event)
  File "/var/task/tibanna/run_task.py", line 63, in run_task
    execution = Execution(input_json)
  File "/var/task/tibanna/ec2_utils.py", line 374, in __init__
    self.create_instance_type_list()
  File "/var/task/tibanna/ec2_utils.py", line 426, in create_instance_type_list
    results = ec2.describe_instance_types(
  File "/var/task/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)

...It sure doesn't seem like "cr1.8xlarge" is coming from snakemake, as it isn't in any of the configs, but I can't see where it would be coming from within Tibanna either.

@nhartwic

Attempting to manually set instance_type didn't work. Relevant cloudwatch log messages below....

[tibanna.ec2_utils] DEBUG: 23-02-27 20:52:23 - self.cfg.as_dict() = {
    "run_name": "snakemake-job-nEyH0pfXEE8l-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2",
    "instance_type": "m5a.4xlarge",
    "use_benchmark": False,
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}
[DEBUG]	2023-02-27T20:52:23.936Z	25305e82-9267-4340-9d7d-ec3cc9769018	self.cfg.as_dict() = {
    "run_name": "snakemake-job-nEyH0pfXEE8l-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2",
    "instance_type": "m5a.4xlarge",
    "use_benchmark": False,
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}
[ERROR] ClientError: An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [cr1.8xlarge]
Traceback (most recent call last):
  File "/var/task/service.py", line 20, in handler
    return run_task(event)
  File "/var/task/tibanna/run_task.py", line 63, in run_task
    execution = Execution(input_json)
  File "/var/task/tibanna/ec2_utils.py", line 374, in __init__
    self.create_instance_type_list()
  File "/var/task/tibanna/ec2_utils.py", line 426, in create_instance_type_list
    results = ec2.describe_instance_types(
  File "/var/task/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)

@alexander-veit
Member

Hm... Could you remove

"mem": 234.375,
"cpu": 32,

from the input and only specify the instance type? Also, make sure the instance type you choose actually exists in us-west-2 (not all instance types are available in all regions).

@nhartwic

The config is being automatically generated by snakemake. I can't remove those fields.

Mostly I want to know where "cr1.8xlarge" is coming from. Is it snakemake or tibanna that is trying to match ec2 type based on resources?

@alexander-veit
Member

alexander-veit commented Feb 27, 2023

I see. It's probably coming from here.

Since mem and cpu are set, a suitable instance type is determined by our Benchmark package. It hasn't been updated in quite a while, and it probably does not take into account the region where you are trying to run the job. Therefore, it might suggest instances that are not available in your region, which would cause the error you are seeing.

I will bring this up internally. This is certainly something we need to look at.
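
To illustrate how a static lookup can surface a retired type, here is a toy sketch of choosing instances by CPU and memory from a fixed table. The table and the `candidates` function are illustrative only; they are not the Benchmark package's data or API.

```python
# Illustrative rows only: (instance type, vCPUs, memory in GiB).
# This is NOT the Benchmark package's table.
CATALOG = [
    ("m5a.4xlarge", 16, 64.0),
    ("cr1.8xlarge", 32, 244.0),
    ("r5.8xlarge", 32, 256.0),
]


def candidates(cpu, mem_gb, catalog=CATALOG):
    """Return instance types that satisfy both requirements, smallest memory first."""
    fits = [row for row in catalog if row[1] >= cpu and row[2] >= mem_gb]
    return [name for name, _, _ in sorted(fits, key=lambda row: (row[2], row[1]))]


# The cpu and mem values from the logged config above.
print(candidates(cpu=32, mem_gb=234.375))  # ['cr1.8xlarge', 'r5.8xlarge']
```

Because the table knows nothing about regions or retirements, the tightest fit for 32 CPUs and 234.375 GiB is a type EC2 no longer accepts, and validating the whole candidate list fails.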

@nhartwic

nhartwic commented Feb 27, 2023

It may be worth clarifying that "cr1.8xlarge" does not appear to be an instance type anymore. It doesn't exist in any of the regions I've checked, and Amazon now lists it as a "previous generation instance".

After poking around, looks like this is the csv that needs to be updated...

https://github.com/SooLee/Benchmark/blob/master/Benchmark/aws/Amazon%20EC2%20Instance%20Comparison.csv

@alexander-veit
Member

Yeah, I briefly looked at it. Definitely needs an update. Thanks for bringing the issue to our attention.

@Bioinf-usr
Author

Awesome!! great to see this issue being addressed. Would be happy to do some debugging if needed.

Thanks!!

@alexander-veit
Member

Please do not use version 3.0.0 or 3.1.0. We identified a critical bug that can cause inflated costs when running spot. We are working on a solution.

@alexander-veit
Member

Please use v3.2.1 (or higher) from now on. We are looking at the outdated instance types next.

@nhartwic

nhartwic commented Mar 9, 2023

Can I also recommend that when instance_type is set, tibanna should skip trying to automatically determine instance type? This current behavior strikes me as counterintuitive and undesirable.

@alexander-veit
Member

Version 3.3.0 should fix this issue. Furthermore, when instance_type is set, Tibanna will use only that instance type for the workflow.
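
The behavior described here can be pictured as a simple precedence rule. This is only a sketch of the idea, not Tibanna's actual code; `resolve_instance_types` and the `lookup` callable are hypothetical names.

```python
def resolve_instance_types(cfg, lookup):
    """Return the instance types to try for a job.

    cfg: dict shaped like the self.cfg.as_dict() output logged earlier in the thread.
    lookup: hypothetical callable (cpu, mem) -> list of candidate instance types.
    """
    explicit = cfg.get("instance_type", "")
    if explicit:
        # An explicit choice wins; the resource-based lookup is skipped entirely.
        return [explicit]
    return lookup(cfg["cpu"], cfg["mem"])


def stale_lookup(cpu, mem):
    """Stand-in for a table-based lookup that still returns a retired type."""
    return ["cr1.8xlarge"]


print(resolve_instance_types(
    {"instance_type": "m5a.4xlarge", "cpu": 32, "mem": 234.375}, stale_lookup))
# ['m5a.4xlarge']
print(resolve_instance_types(
    {"instance_type": "", "cpu": 32, "mem": 234.375}, stale_lookup))
# ['cr1.8xlarge']
```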

@nhartwic

I've upgraded to 3.3 and think the update has resolved all outstanding issues. I'm not the person who opened this issue, but I'd call it closed.

@trahsemaj

I am running on 4.0.0 and running into the same issue described above: an outdated CSV (even in the latest Benchmark release, Benchmark-4dn-0.5.23). I am using snakemake 7.3.1 with the --tibanna option.
My pipeline has pretty diverse resource needs for different rules, so setting a single instance_type for all steps is not a viable option.
My current workaround is to manually play with the mem_gb and threads until a valid instance type is selected, but that doesn't seem ideal.

@alexander-veit
Member

Which instance type is causing issues?

@trahsemaj

trahsemaj commented Oct 4, 2023

from the run_task_awsem_* cloudwatch logs:

[ERROR] ClientError: An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [r6a.xlarge, r6id.xlarge]
Traceback (most recent call last):
  File "/var/task/service.py", line 20, in handler
    return run_task(event)
  File "/var/task/tibanna/run_task.py", line 63, in run_task
    execution = Execution(input_json)
  File "/var/task/tibanna/ec2_utils.py", line 375, in __init__
    self.create_instance_type_list()
  File "/var/task/tibanna/ec2_utils.py", line 427, in create_instance_type_list
    results = ec2.describe_instance_types(
  File "/var/task/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)

fwiw, the file on the benchmark github (https://github.com/SooLee/Benchmark/blob/master/Benchmark/aws/Amazon%20EC2%20Instance%20Comparison.csv) does seem to be up-to-date (at least seems to contain valid instance types), but the package version with Benchmark-4dn-0.5.23 contains an outdated version. Unsure how/if this is tweakable without manually uploading my own lambda function with a corrected version of this file.
I see r6g* and r6i* instance types available, but not r6a* or r6id* instances in my region.

@alexander-veit
Member

I think the Benchmark-4dn-0.5.23 list is fine, but the problem is that this list is not region-specific. It returns instance types that are valid in us-east-1. Currently, Tibanna does not cross-check what's actually available in the active region and just takes the list from Benchmark. This certainly needs to be improved. I will add it to my todo list.
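
A cross-check of this kind could look roughly like the sketch below. It relies on the EC2 `describe_instance_type_offerings` call (a real boto3 client method) and takes the client as a parameter; the helper names and the `FakeEC2` stand-in, whose contents are made up so the example runs offline, are assumptions and not Tibanna code.

```python
def offered_in_region(ec2_client):
    """Collect every instance type offered in the client's region."""
    offered = set()
    kwargs = {"LocationType": "region"}
    while True:
        page = ec2_client.describe_instance_type_offerings(**kwargs)
        offered.update(o["InstanceType"] for o in page["InstanceTypeOfferings"])
        token = page.get("NextToken")
        if not token:
            return offered
        kwargs["NextToken"] = token


def split_by_availability(candidates, offered):
    """Split candidates into (available, missing), preserving their order."""
    available = [c for c in candidates if c in offered]
    missing = [c for c in candidates if c not in offered]
    return available, missing


class FakeEC2:
    """Offline stand-in for boto3.client("ec2", region_name=...); pages are hypothetical."""

    def __init__(self, pages):
        self._pages = pages

    def describe_instance_type_offerings(self, **kwargs):
        index = int(kwargs.get("NextToken", 0))
        page = {"InstanceTypeOfferings": [{"InstanceType": t} for t in self._pages[index]]}
        if index + 1 < len(self._pages):
            page["NextToken"] = str(index + 1)
        return page


fake = FakeEC2([["r6g.xlarge", "r6i.xlarge"], ["m5a.4xlarge"]])
available, missing = split_by_availability(
    ["r6a.xlarge", "r6id.xlarge", "r6i.xlarge"], offered_in_region(fake))
print(available)  # ['r6i.xlarge']
print(missing)    # ['r6a.xlarge', 'r6id.xlarge']
```

With a real client, only the `available` list would be passed on for validation, so types that are missing in the active region would be dropped instead of failing the whole request.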

@trahsemaj

ah, great to know, might consider migrating to us-east-1 if that list will be kept up-to-date. Maybe this is an issue better raised in Benchmark, but could imagine a fix might take some adjustments to both.
