(aws-lambda-python): cache Docker layer with dependencies #23829
Comments
Thanks for your idea. I am marking this as a p2 feature request and will raise awareness with the team. |
I looked more at this issue. Changing the command order will not help here because those commands do not create new Docker layers. They are all executed with docker run here (or, to be precise, here), something like:
docker run --rm build_python_image bash -c "cp source/ /asset-input && cd /asset-input && pip install -r requirements.txt && ..."
I see several potential solutions.
Option 1 - build second Docker image
To utilize caching from Docker layers, we would need to build an image from a Dockerfile, where those commands are executed one by one. We could dynamically create a Dockerfile, put only the (possibly generated) requirements.txt file in it, and install dependencies. Then the bundling commands would only copy the rest of the files. So instead of:
aws-cdk/packages/@aws-cdk/aws-lambda-python/lib/bundling.ts
Lines 100 to 106 in 29bdd6c
something like this:
const baseImage = image ?? DockerImage.fromBuild(path.join(__dirname, '../lib'), {
buildArgs: {
...props.buildArgs,
IMAGE: runtime.bundlingImage.image,
},
platform: architecture.dockerPlatform,
});
const tempDockerfile = makeTempFile();
fs.writeFileSync(tempDockerfile, `
FROM ${baseImage.image}
WORKDIR ${outputPath}
COPY requirements.txt .
RUN python -m pip install -r ${DependenciesFile.PIP} -t ${outputPath}
`);
this.image = DockerImage.fromBuild(tempDockerfile);
Then remove commands for installing dependencies from
A variation of this would be using docker commit to create new layers.
Option 2 - build Lambda Layer for dependencies
Build a Lambda Layer with dependencies only and attach it to the Lambda. Similar to the above, to not rebuild them every time (fetching dependencies from the internet), we need caching. We can either generate and build a Dockerfile to utilize Docker layers (like in option 1), or maybe cache the lockfile in
Option 3 - cache dependencies
If we would install dependencies to a separate directory in the bundle, like
However, I'm not sure if accessing the previous bundle is possible (we would need to know the bundle hash). Alternatively, maybe we could copy libs dir to our cache after bundling? So next time we take it from our cache dir. Additional paths for Python libraries can be added with
Option 4 - copy files from Python virtual env
The whole
For pipenv and Poetry, we could get the virtual environment location (
Then we could copy dependencies from the virtual environment location instead of downloading them from the internet. Consequently, you would need to install dependencies manually before running
This could be a flag, so the current mode is the default (as a more reliable one), but you could opt-in to copy already installed dependencies from disk instead of fetching them from the internet each time.
I'm happy to work on this, but there is a need for input from the CDK team first to find the best solution. |
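To make the caching idea in Option 3 more concrete, here is a minimal sketch of how a cache directory could be keyed on the dependency list, so that installed libraries are reused until the requirements change. The helper name `dependencyCacheDir`, the `python-deps-` prefix, and the normalization rules are all hypothetical and not part of the CDK API.

```typescript
import { createHash } from 'crypto';
import * as path from 'path';

/**
 * Hypothetical helper (not CDK API): derive a cache directory for installed
 * dependencies from the contents of the (possibly generated) requirements.txt.
 */
function dependencyCacheDir(cacheRoot: string, requirements: string): string {
  // Drop blank lines and comment lines and normalize line endings,
  // so cosmetic edits do not invalidate the cache.
  const normalized = requirements
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0 && !line.startsWith('#'))
    .join('\n');

  // Same dependency list => same digest => same cache directory.
  const digest = createHash('sha256').update(normalized).digest('hex');
  return path.join(cacheRoot, `python-deps-${digest.slice(0, 16)}`);
}
```

With a key like this, bundling could check whether the directory already exists and copy from it instead of running pip again. Whether the bundling container can see such a directory is a separate question that this sketch does not address.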
These all sound like good ideas. I actually expected the lambda layer method to be how the python layers worked. I’d love to see this implemented. |
I ran into this this week while developing a CDK app using the Python Lambda construct. Running the project locally with SAM is really slow: every time the code changes, bundling installs all the packages again, even when the dependencies didn't change. |
I'm seeing a similar problem, including when the lambda code is not changed at all. For example, running multiple tests against a stack that contains a PythonFunction causes repeated bundling (identical to this issue for aws-lambda-nodejs). Yes, the first Docker layers are cached, but as mentioned by others, the dependency download isn't, and takes a long time. For the time being, I've set every Python lambda function that doesn't have dependencies to use a standard Function implementation over PythonFunction. However, using a PythonFunction for the lambda functions with dependencies has extended a currently small test suite from approx 1 min to over 10 min due to the repeated bundling and dependency installations on each test. A crude benchmark is showing the following differences for test runs bundling the same python function (with cached Docker layers):
It would be fantastic to have an implementation to fix this. Those 40 seconds stack up when you have multiple tests. |
I've been looking for a workaround in the meantime, with the following Dockerfile (only covering requirements.txt):
Using this Dockerfile as my image in the CDK code (in Java):
This allows the use of the cache layers for the install, but note the specified command in the
The problem is that if Hence, the Dockerfile needs to bundle into a separate directory and then I'm wondering if there will be a similar issue with the solutions described by @m-radzikowski, notably with option 1 and option 2. I'm hopeful there's something I've overlooked here, as I'd like to be able to bundle the asset directly into Curious if anyone has other findings. |
A solution I have found is to use the SAM build image. It also only seems to work with requirements.txt, but a docker build is not triggered unless I make changes to the source. Edit to add examples:
new python.PythonLayerVersion(this, 'MyLayer', {
compatibleRuntimes: [
lambda.Runtime.PYTHON_3_9,
],
bundling: {
image: lambda.Runtime.PYTHON_3_9.bundlingImage,
}
})
Works well for x86. If you're running a lambda on ARM though, this image will still try to build for x86 so you have to specify the ARM image.
new python.PythonFunction(this, 'MyFunction', {
architecture: lambda.Architecture.ARM_64,
runtime: lambda.Runtime.PYTHON_3_11,
bundling: {
image: cdk.DockerImage.fromRegistry('public.ecr.aws/sam/build-python3.11:latest-arm64')
}
})
Images can be found here https://gallery.ecr.aws/sam |
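To avoid hard-coding the image URI per function, the architecture-specific tag could be derived with a small helper. This is only a sketch: `samBuildImageUri` is a hypothetical name, and the `latest-x86_64` tag is assumed by analogy with the `latest-arm64` tag used above, so check the gallery before relying on it.

```typescript
type LambdaArch = 'x86_64' | 'arm64';

/**
 * Hypothetical helper: build the SAM build image URI for a Python version
 * and target architecture, following the naming pattern shown above.
 */
function samBuildImageUri(pythonVersion: string, arch: LambdaArch): string {
  // Accept only versions shaped like "3.11" to avoid producing nonsense URIs.
  if (!/^3\.\d+$/.test(pythonVersion)) {
    throw new Error(`Unexpected Python version: ${pythonVersion}`);
  }
  return `public.ecr.aws/sam/build-python${pythonVersion}:latest-${arch}`;
}
```

The result could then be passed to `cdk.DockerImage.fromRegistry(...)` in the `bundling.image` option, as in the ARM example above.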
This seems like a bit of a hacky solution, but I was able to reduce deploy time with it. Hopefully it helps someone. Going off what others said about using layers, I used a The folder structure
I then ignored the
const MODULE_PATH = 'path to python project';
function generateHashFromPoetryLock() {
const fs = require('fs');
const crypto = require('crypto');
const lockFile = fs.readFileSync(`${MODULE_PATH}/poetry.lock`);
const hash = crypto.createHash('sha256');
hash.update(lockFile);
return hash.digest('hex');
}
const myFunction = new lambda.Function(this, 'MyFunction', {
runtime: lambda.Runtime.PYTHON_3_7,
code: lambda.Code.fromAsset(`${MODULE_PATH}/lambda`),
handler: 'index.handler',
layers: [
new python.PythonLayerVersion(this, 'DependencyLayer', {
entry: MODULE_PATH,
bundling: {
assetExcludes: ['lambda'],
assetHashType: cdk.AssetHashType.CUSTOM,
assetHash: generateHashFromPoetryLock(),
},
}),
],
}); |
This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue. |
Any progress here? I'm looking to build a new app using Python CDK and a Python Lambda function. Should I wait until this lib comes out of alpha, or is it production ready? I don't want to have to update a ton of my Python Lambda function code if this isn't ready yet. I'd rather use TypeScript since that is already in prod. |
I would have thought that in many cases, Docker can be avoided entirely. This is because many Python libraries often provide pre-compiled binaries for various platforms. Pip can download the appropriate binaries for a specific platform with something like; Docker would still be needed as a fallback if the binaries aren't available, but otherwise this could just be done in the host environment. Pip also manages its own local cache. I might experiment with this and open it as a separate issue though. |
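The exact command from the comment above is not preserved. As a rough sketch of the idea, the helper below assembles arguments for a host-side `python -m pip install` that requests pre-built wheels for a given target platform. The function and option names are hypothetical; the pip flags used (`--platform`, `--implementation`, `--python-version`, `--only-binary`, `--target`) are existing `pip install` options, but the platform tag values are examples that would need checking against the Lambda runtime.

```typescript
interface PlatformInstallOptions {
  requirementsFile: string; // e.g. "requirements.txt"
  targetDir: string;        // directory that ends up in the Lambda bundle
  pythonVersion: string;    // e.g. "3.11"
  platform: string;         // e.g. "manylinux2014_x86_64" or "manylinux2014_aarch64"
}

/**
 * Hypothetical helper: arguments for `python <args>` that install
 * platform-specific wheels into targetDir without Docker.
 */
function pipPlatformInstallArgs(opts: PlatformInstallOptions): string[] {
  return [
    '-m', 'pip', 'install',
    '--requirement', opts.requirementsFile,
    '--target', opts.targetDir,
    '--platform', opts.platform,
    '--implementation', 'cp',
    '--python-version', opts.pythonVersion,
    // Refuse source distributions: they would be built for the host, not the target.
    '--only-binary=:all:',
  ];
}
```

If pip cannot find a wheel for some package, this install fails, which is the point where a Docker fallback would still be needed.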
Also experiencing this. Added my own hash and still no luck. |
@m-radzikowski Thanks for this detailed write up and clear explanation on this issue. There are now 28 people 👍 for this issue and we've escalated it as a |
@GavinZZ the simplest and fastest would be option 2, with a Lambda Layer for dependencies. We reuse existing Option 1 would not create an extra Lambda Layer and use Docker layers for caching, but bundling would be more complex. Other options are more complex/tricky to implement. We still need to install dependencies in Docker in case pip/pipenv/poetry executable is not found in |
Option 2 is what I use already and it works very well. That said, I’d prefer to manage my own layers so as long as that’s still an option, that would be best. Docker should still be used to build it, because it provides support for other architectures. (Although that seems to have broken with recent docker updates) |
@m-radzikowski Thanks for sharing your thoughts. I agree that Option 2 seems like the most straightforward and feasible solution among all. In the issue description, you expressed interest in implementing this feature request. Would you still be willing to create a PR so that our team can deep dive into the solution and review from there? |
@GavinZZ great 👍 Yeah, I can try to make it work, although my timeline is "upcoming weeks". |
This is a highly important feature. Our workflows take 40 minutes to synthesize because we use Poetry with PythonFunction, with 10+ stacks in some of our CDK apps and a lot of Lambdas in them. This would save up to 90% of our deployment time. We tried computing the assetHash based on poetry.lock, but it didn't work. We also tried using Lambda layers, but the dependencies still get reinstalled on every deployment (synth phase), so we end up downloading all of them even if there is no real diff for the run. Just to understand better: does option 2 mean that if a Lambda layer with the dependencies is built and attached, there won't be a need to install the dependencies during the bundling phase as part of Docker's dynamic CMD? Is the lambda layer going to be built again if Would love to contribute to this issue. |
To add to @orshemtov's question, will option 2 also prevent lambda layers being installed during synth even if there are no code or dependency changes? |
If I understand correctly, these are the changes proposed above: in In
super(scope, id, {
...props,
runtime,
// This would probably need to change
code: Bundling.bundle({
entry,
runtime,
skip: !Stack.of(scope).bundlingRequired,
// define architecture based on the target architecture of the function, possibly overridden in bundling options
architecture: props.architecture,
...props.bundling,
}),
handler: resolvedHandler,
// New code
layers: [
new PythonLayerVersion(scope, "DependenciesLayer", {
entry,
compatibleRuntimes: [runtime],
compatibleArchitectures: [props.architecture ?? Architecture.X86_64],
bundling: {
// Bundling options
},
}),
],
});
In
private createBundlingCommand(options: BundlingCommandOptions): string[] {
// const packaging = Packaging.fromEntry(options.entry, options.poetryIncludeHashes, options.poetryWithoutUrls);
let bundlingCommands: string[] = [];
bundlingCommands.push(...options.commandHooks?.beforeBundling(options.inputDir, options.outputDir) ?? []);
const exclusionStr = options.assetExcludes?.map(item => `--exclude='${item}'`).join(' ');
bundlingCommands.push([
'rsync', '-rLv', exclusionStr ?? '', `${options.inputDir}/`, options.outputDir,
].filter(item => item).join(' '));
bundlingCommands.push(`cd ${options.outputDir}`);
// New code
// This would be removed/changed?
// bundlingCommands.push(packaging.exportCommand ?? '');
// if (packaging.dependenciesFile) {
// bundlingCommands.push(`python -m pip install -r ${DependenciesFile.PIP} -t ${options.outputDir}`);
// }
bundlingCommands.push(...options.commandHooks?.afterBundling(options.inputDir, options.outputDir) ?? []);
return bundlingCommands;
} |
Describe the feature
Bundling Python Lambdas that contain requirements.txt, Pipfile, or poetry.lock file happens in a Docker container. Firstly, the requirements.txt file is generated (for pipenv and Poetry), and then dependencies are installed.
However, this happens each time any code change is made. Each time you change Lambda code (not its dependencies), the Docker build performs the above steps, downloading libraries from the internet.
This is the responsible code:
aws-cdk/packages/@aws-cdk/aws-lambda-python/lib/bundling.ts
Lines 109 to 122 in 4613886
Dependencies change far less often than the code. Best practices for building in Docker are to firstly download dependencies and only then copy the code. This allows the dependencies layer to be cached, and on consecutive runs, only the code is updated while the dependencies layer is cached.
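As an illustration of that ordering, the sketch below renders a Dockerfile in which the dependency install comes before the code copy, so Docker can reuse the dependency layer when only the code changes. The function name `renderBundlingDockerfile` and the exact instructions are assumptions for illustration, not the current aws-lambda-python implementation.

```typescript
/**
 * Hypothetical helper: render a Dockerfile that installs dependencies
 * before copying the function code.
 */
function renderBundlingDockerfile(baseImage: string, outputDir: string): string {
  return [
    `FROM ${baseImage}`,
    `WORKDIR ${outputDir}`,
    // Layer 1: depends only on requirements.txt, so it stays cached
    // until the dependency list itself changes.
    'COPY requirements.txt .',
    `RUN python -m pip install -r requirements.txt -t ${outputDir}`,
    // Layer 2: the code, which changes often and is cheap to copy.
    'COPY . .',
    '',
  ].join('\n');
}
```

Because Docker invalidates a `COPY` layer only when the copied files change, editing a handler would rebuild just the final layer in this layout.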
Use Case
This will greatly reduce consecutive Lambda bundling times, as dependencies will be fetched from the internet only when they change. When only the code changes, a cached Docker layer with dependencies will be used.
Proposed Solution
In short, bundling Python Lambda should be changed from:
requirements.txt
(line 115)to:
poetry.lock
)requirements.txt
Other Information
I am willing to implement it after greenlighting by the CDK team.
Acknowledgements
CDK version used
2.61.1
Environment details (OS name and version, etc.)
any