AWS Lambda invoker's performance depends on the Python interpreter #1219

Open
gfinol opened this issue Dec 15, 2023 · 10 comments

gfinol (Contributor) commented Dec 15, 2023

I've noticed a performance issue when invoking AWS Lambda functions: the invocation performance of cloud functions changes depending on the Python interpreter used.

For example, when using the system Python 3.10 interpreter of a VM in AWS EC2 running Ubuntu 22.04, the start of some AWS Lambda functions is delayed between 5 and 10 seconds, as can be seen in this plot:

python31012-system1702566715_timeline

But using the same Python version (3.10.12) from Conda on the same VM, with the same OS and the same AWS account, I obtained much better performance:
python31012-conda1702567642_timeline

Despite the performance improvement when using Conda, almost 50% of the functions still take 1 second longer to start, even in a warmed-up state (see the last two map stages in the previous plot). The behavior is the same for Python 3.8, 3.9, 3.10 and 3.11.

Python 3.8 plot (using conda)

python38-conda1702566938_timeline

Python 3.9 plot (using conda)

python39-conda1702567023_timeline

Python 3.10 plot (using conda)

python31013-conda1702567068_timeline

Python 3.11 plot (using conda)

python311-conda1702567151_timeline

But with Python 3.7 the performance is what one would expect (almost perfect):
python37-conda1702566853_timeline

All the previous plots were generated by running 3 maps of 100 functions that sleep for 5 seconds, executed from a t2.large VM with Ubuntu 22.04 in us-east-1, using the default Lithops configuration except for invoke_pool_threads, which was set to 128. I have also used the same VM with Amazon Linux 2023, and the results are similar to those obtained with the Conda interpreter (I can upload the plots if requested). I used the current master branch of Lithops for this test, but the issue can be reproduced with versions 3.0.0, 3.0.1, 2.9, and also 2.7.1.

Here is the code used:

import time
import lithops

def count_cold_starts(futures):
    # Count how many activations were cold vs. warm, using the
    # 'worker_cold_start' flag reported in each future's stats.
    cold = 0
    warm = 0
    for future in futures:
        stats = future.stats
        if stats['worker_cold_start']:
            cold += 1
        else:
            warm += 1
    return cold, warm

futures = []
fexec = lithops.FunctionExecutor()
for _ in range(3):
    num_fun = 100

    def my_sleep(x):
        time.sleep(x)
        return num_fun

    # Each map stage invokes 100 functions that sleep for 5 seconds.
    f = fexec.map(my_sleep, [5 for _ in range(num_fun)])
    fexec.get_result()
    futures.append(f)

    cold, warm = count_cold_starts(f)

    print(f"cold: {cold}, warm: {warm}")

fexec.plot()
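
For reference, the invoke_pool_threads override mentioned above can also be passed programmatically as a config dict. This is a minimal sketch: the exact config section this key belongs to may differ between Lithops versions, and credentials/region settings are omitted.

import lithops

# Assumed config layout: default settings plus a larger invoker thread pool.
config = {
    'lithops': {'backend': 'aws_lambda', 'storage': 'aws_s3'},
    'aws_lambda': {'invoke_pool_threads': 128},
}

fexec = lithops.FunctionExecutor(config=config)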

aitorarjona (Contributor) commented Dec 15, 2023

Hi @gfinol, just to check whether this is an issue with Lithops itself or with its dependencies, could you try the following?

  • Use different Python versions in fresh virtual environments
  • Install only Lithops using pip install -U --no-cache-dir lithops
  • Then do pip freeze and send the results

Thanks

gfinol (Contributor, Author) commented Dec 15, 2023

gfinol (Contributor, Author) commented Dec 15, 2023

Also, notice that "Boto3 and Botocore ended support for Python 3.7 on December 13, 2023". So the best performance is achieved with a Python version that is no longer supported.

aitorarjona (Contributor) commented Dec 15, 2023

Just to make sure, maybe you could create a 3.11 venv and do pip install -U --no-cache-dir -r conda_py37.txt so that it has the same versions as the 3.7 venv. But it mostly seems that there is something regarding the Python threads used by Lithops or boto3/botocore/urllib3 that changed from 3.8 onwards.

gfinol (Contributor, Author) commented Dec 18, 2023

@aitorarjona I tried what you suggested with a 3.11 env, but it failed due to some incompatibilities between the library versions and the Python version.

But I managed to get it working with 3.10. The results look like the previous ones:

1702902863_timeline

(Note that the certifi requirement in conda_py37.txt points to a local file; that line was removed to install the requirements in Python 3.10.)

I agree with you that, at first glance, this looks like a problem with the thread pool used. I'm not sure how that could be confirmed...

JosepSampe (Member) commented

I remember that some years ago I changed the invoke method of the Lambda backend in order to improve the invocation performance. It was working well then (I think I did it with Python 3.6), but maybe that solution is no longer working properly with newer versions of Python (or boto3).

In aws_lambda.py, can you try commenting out lines 630-653 and uncommenting lines 655-670? This way we will see how the boto3 lib performs when invoking functions, and whether this is the cause of the issue you are experiencing.
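
For reference, the plain boto3 invocation path is roughly the following (a minimal sketch, not the exact block referenced above; the function name and variables are illustrative):

import json
import boto3

lambda_client = boto3.client('lambda', region_name='us-east-1')

def invoke_async(function_name, payload):
    # 'Event' invocations return immediately (HTTP 202) without waiting
    # for the Lambda function to finish.
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType='Event',
        Payload=json.dumps(payload),
    )
    return response['ResponseMetadata']['HTTPStatusCode']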

gfinol (Contributor, Author) commented Jan 8, 2024

@JosepSampe, I've been doing the tests that you suggested. I executed the tests twice because the results are worse. Here are the resulting plots:

With the system Python 3.10 of Ubuntu 22.04 from the official AMI in AWS EC2:

pythons-3 10-sys-1704709941_timeline

Using the interpreter from conda, Python 3.10:

conda-3 10-1704710366_timeline

And using Python 3.7 with conda:

conda-3 7-1704710116_timeline

In general, the performance is worse. For example, looking at the invocations using Python 3.7: in this recent plot, the invocations in the second and third maps are delayed by 1 to 1.5 seconds, whereas in the original plots the invocations were almost perfect.

Here are the plots for the other Python versions with Conda:

Python 3.8 conda

conda-3 8-1704710307_timeline

Python 3.9 conda

conda-3 9-1704710215_timeline

Python 3.11 conda

conda-3 11-1704710461_timeline

JosepSampe (Member) commented Jan 12, 2024

So, in summary, is this something related to Lithops? Or is it more related to Python, or to AWS Lambda?

gfinol (Contributor, Author) commented Jan 12, 2024 via email

ZikBurns commented Jan 29, 2024

Python Interpreter

I'm currently using the Python 3.11 interpreter of a VM in AWS EC2 with Ubuntu 22.04, and I'm working on a modified runtime of aws_lambda.
Lithops originally serializes the code, dependencies and parameters and uploads them to S3; the function then downloads them from S3 and deserializes them. I did some experiments to avoid the round trip through S3: my invoke just calls the function, passing the parameters as the payload.
This is part of my aws_lambda.py:

# boto3 Lambda client with a large connection pool and long timeouts,
# so that many synchronous invocations can stay open in parallel.
self.lambda_client = self.aws_session.client(
    'lambda', region_name=self.region_name,
    config=botocore.client.Config(
        max_pool_connections=5000,
        read_timeout=900,
        connect_timeout=900,
        user_agent_extra=self.user_agent
    )
)
...
def invoke(self, runtime_name, runtime_memory, payload):
    # Synchronous (RequestResponse) invocation: the parameters travel
    # directly in the payload instead of going through S3.
    # function_name is resolved elsewhere in the backend (not shown here).
    response = self.lambda_client.invoke(
        FunctionName=function_name,
        Payload=json.dumps(payload, default=str)
    )
    return json.loads(response['Payload'].read().decode('utf-8'))

And this is how I use invoke:

def invocator(payload, number):
    # Time a single synchronous invocation; starttimes/endtimes are
    # pre-allocated lists indexed by the invocation number.
    start = time.time()
    result = self.compute_handler.invoke(payload)
    end = time.time()
    starttimes[number] = start
    endtimes[number] = end
    return result

def general_executor(payloads):
    # payloads is a list of (payload, number) pairs; one thread per invocation.
    with ThreadPoolExecutor(max_workers=len(payloads)) as executor:
        results = list(executor.map(lambda p: invocator(*p), payloads))
    return results
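
For context, a hedged sketch of how the surrounding state could be wired up for 100 invocations (the names and payload contents here are assumptions for illustration, not the actual code):

NUM_INVOCATIONS = 100
starttimes = [None] * NUM_INVOCATIONS
endtimes = [None] * NUM_INVOCATIONS
# Pair each payload with its index so invocator knows where to store its times.
payloads = [({'call_id': i}, i) for i in range(NUM_INVOCATIONS)]
results = general_executor(payloads)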

With this code, which differs from the way Lithops originally works, I get the same problem described in this issue. This is why I think it is not related to Lithops.

I have a containerized runtime with many dependencies. For this experiment, every Lambda just returns the string "Hello World":

return {
    'statusCode': 200,
    'body': "Hello World"
}
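
In other words, the whole handler is essentially a no-op. A sketch using the standard AWS handler signature (the containerized runtime itself is omitted):

def handler(event, context):
    # Ignore the event and return a constant response immediately,
    # so the measured latency is dominated by the invocation itself.
    return {
        'statusCode': 200,
        'body': "Hello World"
    }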

As you can see in the invocator code, I measure the start and end times of every invocation. I invoked 100 functions in both cold and warm state. With those times I can build a plot.

(plots of the invocation start and end times, cold and warm)

As you can see, there is barely any difference between cold and warm. This is because of the added delay described in this thread.

Conda Python Interpreter

If I install Miniconda, create a Python 3.11 env on my AWS EC2 VM with Ubuntu 22.04, and execute the same code, I get:

(plots of the invocation start and end times with the Conda environment, cold and warm)

The behavior using the Conda environment looks more like what Lithops would do: warm functions take less than 1 second, and cold functions take half the time they used to.

I don't know why Conda solved the problem...
