Automatically create invalidation for CloudFront distribution post-deploy #40

Open
lrholmes opened this issue Mar 9, 2018 · 20 comments

Comments

@lrholmes

lrholmes commented Mar 9, 2018

Since integrating with CloudFront (CF) is a common use case when hosting a website on S3, it would be great to automate the invalidation that is required to update a CF distribution after deploying the website to S3.

@linusmarco
Contributor

Hey @lrholmes! This should be fairly easy to accomplish with a createInvalidation request through the AWS SDK. You can do this within this plugin with an operation that looks something like this:

// create 'params' based on options specified by user

this.aws.request('CloudFront', 'createInvalidation', params, this.stage, this.region);

More detail on what needs to be included in params can be found in the AWS SDK docs here:
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudFront.html#createInvalidation-property
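
As a rough sketch of what params could look like, assuming the request shape in the SDK docs linked above (a DistributionId plus an InvalidationBatch holding a unique CallerReference and a Paths list): the helper name buildInvalidationParams and the timestamp-based CallerReference are illustrative assumptions, not part of this plugin.

```javascript
// Hypothetical helper: builds the params object for CloudFront's
// createInvalidation call from a distribution ID and a list of paths.
function buildInvalidationParams(distributionId, paths, callerReference) {
  if (!distributionId) {
    throw new Error('A CloudFront distribution ID is required');
  }
  if (!Array.isArray(paths) || paths.length === 0) {
    throw new Error('At least one path to invalidate is required');
  }
  return {
    DistributionId: distributionId,
    InvalidationBatch: {
      // Must be unique per request; a timestamp is one simple choice (assumption).
      CallerReference: callerReference || String(Date.now()),
      Paths: {
        Quantity: paths.length,
        Items: paths,
      },
    },
  };
}
```

The resulting object could then be passed as params to the request call shown above.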

Since not all deployments will need invalidation requests, we'll probably want to make this an optional parameter in the serverless.yml configuration; something like below, but please reorganize as you see fit upon implementation:

custom:
  client:
    ...
    invalidateCF:
      cfDistribution: [CF distribution ID]
      objects:
        - [objects]
        - [to]
        - [invalidate]
    ...

You can check out the configureBucket function in our codebase for a simple example of how config parameters from serverless.yml get turned into calls to the AWS SDK.
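
A possible way to read the proposed block, assuming serverless.yml has already been parsed into a plain object. The function name readInvalidateConfig is hypothetical, the cfDistribution and objects keys come from the proposal above rather than a released option, and falling back to the whole distribution when no objects are listed is an assumption.

```javascript
// Hypothetical: extracts the proposed invalidateCF settings from the parsed
// `custom` section of serverless.yml. Returns null when invalidation is not
// configured, so deployments without it behave as before.
function readInvalidateConfig(custom) {
  const settings = custom && custom.client && custom.client.invalidateCF;
  if (!settings) {
    return null;
  }
  if (!settings.cfDistribution) {
    throw new Error('invalidateCF.cfDistribution is required');
  }
  // Assumption: invalidate everything when no objects are listed.
  const objects =
    settings.objects && settings.objects.length > 0 ? settings.objects : ['/*'];
  return {
    distributionId: settings.cfDistribution,
    // CloudFront invalidation paths start with a leading slash.
    paths: objects.map((p) => (p.startsWith('/') ? p : '/' + p)),
  };
}
```

Returning null for a missing block keeps the feature optional, which matches the intent that not every deployment needs an invalidation.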

@fernando-mc
Owner

@lrholmes great suggestion! Feel free to reach out if you have questions about how you can help us add this in.

@traviscollins

Using invalidations to defeat the cache on new deployments is not an appropriate use. Invalidations don't take effect uniformly across geographies or at the same time, and they are generally a heavy-lift operation. See this doc for more details.

https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html

Instead the deployment should include a version number in the resource directory or file name (many sites use a deployment number or date in a parent folder name). Typically in CloudFront applications only images, js, css, etc are cached. html pages are never cached. With that scheme, the end user can simply hit refresh on a web page and they'll get the absolute latest version of the code, even if it was just deployed.

@constb

constb commented Mar 12, 2018

Yeah, well, even versioned assets need to be referenced somewhere, like index.html maybe? I mean if what you are making is SPA + lambdas for backend, you're probably gonna need to invalidate at least something anyway…

@traviscollins

Correct, your build process must integrate the version number into the HTML file (or somewhere). But...

A) if you do not cache the HTML file, you don’t have to invalidate anything because it will simply reference the new assets

B) If you do cache the HTML file, users will see the new version once the cached old version times out. But no matter which version they receive, it will reference all of its matching assets as a single unit (no dependency version mismatches)

C) If you deploy without versioned assets and use caching of any sort, even with an invalidation step (which is a long-running process), your users will experience runtime errors caused by dependency version mismatches.
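
The versioning scheme discussed above could be sketched like this, assuming the build step knows the list of asset paths and a version string (a deployment number or date). versionedKey and rewriteReferences are hypothetical names, and the replacement is plain string substitution rather than real HTML parsing.

```javascript
// Hypothetical: places an asset under a version-named parent folder,
// e.g. versionedKey('42', 'js/main.js') gives 'v42/js/main.js'.
function versionedKey(version, assetPath) {
  return 'v' + version + '/' + assetPath.replace(/^\/+/, '');
}

// Rewrites every root-relative reference to a known asset in the HTML so that
// it points at the versioned copy. Literal replacement only; asset paths that
// are suffixes of one another are not handled.
function rewriteReferences(html, version, assetPaths) {
  return assetPaths.reduce((out, assetPath) => {
    const clean = assetPath.replace(/^\/+/, '');
    return out.split('/' + clean).join('/' + versionedKey(version, clean));
  }, html);
}
```

Because each deployment writes to a new folder, the old HTML keeps pointing at the old assets and the new HTML at the new ones, which is the "single unit" property described in B).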

@fernando-mc
Owner

@lrholmes I'm going to assign this to you to lead the charge on here. Let us know if we can help. After today there should be a fairly significant refactor that may make this a bit easier to implement in a clean way.

@romainquellec

@lrholmes! Any update on this?

@fernando-mc
Owner

@romainquellec While I assigned the issue to @lrholmes they don't appear to have replied back to accept that responsibility yet. You are also welcome to open a PR to add this functionality - I just request that if you plan on doing that you post here and mention you're working on it.

@lrholmes
Author

Hey, really sorry about being quiet on this one. As much as I'd have loved to contribute, I didn't have a lot of time and was put off the idea slightly by the counter arguments put forward here, so decided to use another service for my static-site deployment.

Good luck with progression on this issue!

@fernando-mc
Owner

@lrholmes Thanks for submitting the issue. I'm still inclined to support adding this so if you feel like submitting a PR in the future I'd be happy to review it.

CloudFront cache invalidation seems like a reasonable feature to have for static sites that frequently rely on CloudFront. There may be additional considerations to make but I think for the majority of users the easiest solution is just a blanket invalidation. This does impose some cost considerations and performance quirks but it appears better than the alternatives.

The issues that @traviscollins and @constb were discussing are real for many caching systems. But from what I've been able to tell (both from research and from personal experience invalidating my own blog), AWS invalidations are fairly quick: I wouldn't expect any mismatch issues to last for more than 5-30 seconds post invalidation. I've never come across them myself, so maybe they're more of an issue than I'm familiar with?

From what I can see, there are a couple of options here:

  1. Do nothing - users have to figure this out on their own and write a custom invalidation script or add in asset versioning such as @traviscollins mentioned. This plugin can still deploy static assets to S3 and the user can set up whatever CloudFront settings they want to.
  2. Add some simple support for CloudFront invalidations - e.g. take another configuration parameter or two such as CloudFront Distribution ID and a path parameter and automatically invalidate that entire distribution or a specific path within the distribution.

I'm personally inclined to do number two but I'm open to additional arguments against it.

fernando-mc assigned fernando-mc and unassigned lrholmes Jul 10, 2018
@traviscollins

traviscollins commented Jul 10, 2018 via email

@fernando-mc
Owner

@traviscollins Definitely appreciate your feedback here. I think the missing context is that the current strategies available to appropriately pursue the best practices here require non-negligible amounts of work.

Take a typical static site generator like Hugo. There is currently no built-in support for versioned assets. This is an area I'm somewhat ignorant in, but as far as I'm aware the majority of frontend frameworks would also require significant additional configuration and build tools to support versioned assets.

@traviscollins so if a developer wants to rely on caching for a site to improve performance but doesn't have the time to set up an asset versioning pipeline is there an alternate solution?

Here are a few ideas I came up with but I'm curious if you have other suggestions.

  1. Set low TTLs on CloudFront and let them expire on their own
  2. Don't use caching at all - Just make requests directly to an S3 origin

In case anyone wants to review some relevant material, there's a good article here that clarifies some of the strategies @traviscollins is referencing.

Additionally, this specific portion of the AWS documentation references replacing objects.

@zhammer

zhammer commented Jul 19, 2018

i think @traviscollins' objections don't apply to most SPA web clients, which i imagine a bunch of serverless-finch users are deploying.

a basic SPA index.html would look something like this: (this is from a create-react-app client)

<!DOCTYPE html>
<html lang="en">
    <head>
        ...
        <title>Finch Website</title>
    </head>
    <body>
        <div id="root"></div>
        <script type="text/javascript" src="/static/js/main.123.js"></script>
    </body>
</html>

let's say we have a live website, finch.com, for which index.html and static/js/main.123.js are cached on a cloudfront distribution. (for the sake of this example, i'm just considering one cloudfront endpoint)

without any new deployments / invalidations

  1. a user visits finch.com. their browser fetches index.html which is supplied by the cloudfront cache.
  2. index.html instructs browser to fetch static/js/main.123.js, which is supplied by the cache.

author deploys a new version of the client

  1. index.html is overwritten in s3 bucket
  2. new versioned static/js/main.456.js is added to the s3 bucket.

until the index.html cache entry expires, a user visiting finch.com will receive the cloudfront-cached index.html, which instructs browser to fetch static/js/main.123.js (which is supplied by the cache) and the user gets the old version of the page.

author invalidates /index.html

  1. an invalidation for /index.html is started in the cloudfront cache.
  2. a user visits the page while the invalidation is ongoing. that user gets the old index.html, which tells the browser to fetch static/js/main.123.js. static/js/main.123.js is fetched from the cloudfront cache and the user loads the old website.
  3. the invalidation completes.
  4. another user visits the page after the invalidation is complete. that user gets the new index.html (from the s3 bucket), which tells the browser to fetch static/js/main.456.js. again, there's a cache miss, and cloudfront fetches the file from the s3 bucket. the user gets the new website.
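
The walk-through above can be mimicked with a toy model. This is only a sketch: EdgeCache is a hypothetical class standing in for a single CloudFront endpoint, a Map stands in for the S3 bucket, and TTL expiry and the in-progress phase of an invalidation are deliberately left out.

```javascript
// Toy model of a single edge cache in front of an origin.
// `origin` is a Map of path -> content standing in for the S3 bucket.
class EdgeCache {
  constructor(origin) {
    this.origin = origin;
    this.cache = new Map();
  }

  // Serve from cache when possible; otherwise fetch from origin and cache it.
  get(path) {
    if (!this.cache.has(path)) {
      this.cache.set(path, this.origin.get(path));
    }
    return this.cache.get(path);
  }

  // Drop a cached entry so the next request goes back to the origin.
  invalidate(path) {
    this.cache.delete(path);
  }
}

const bucket = new Map([
  ['/index.html', 'loads /static/js/main.123.js'],
  ['/static/js/main.123.js', 'old app'],
]);
const edge = new EdgeCache(bucket);
edge.get('/index.html'); // first visit warms the cache

// New deployment: index.html is overwritten and main.456.js is added.
bucket.set('/index.html', 'loads /static/js/main.456.js');
bucket.set('/static/js/main.456.js', 'new app');

const beforeInvalidation = edge.get('/index.html'); // still the cached old page
edge.invalidate('/index.html');
const afterInvalidation = edge.get('/index.html'); // fetched from the bucket again
```

In this model only /index.html ever needs invalidating, because the versioned script name changes with each deployment and is a cache miss the first time it is requested.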

@traviscollins: am i missing something here? obviously you could just not cache index.html, but isn't caching one of the main values of cloudfront?

also re:

Typically in CloudFront applications only images, js, css, etc are cached. html pages are never cached.

as far as i can tell i get the old version of my index.html on a cloudfront distribution until i invalidate /index.html, but i could be wrong here.

@zhammer

zhammer commented Jul 19, 2018

@fernando-mc: if the above is true, then i definitely think adding an invalidation option is worth it, since i imagine deploying SPA pages to s3 buckets / cloudfront distributions is a pretty common use case for serverless-finch.

(this is the invalidation request i send using the aws cli after a new deployment:
aws cloudfront create-invalidation --distribution-id DISTRIBUTION_ID --paths /index.html)

@traviscollins

traviscollins commented Jul 19, 2018 via email

@zhammer

zhammer commented Jul 19, 2018

thanks for the follow up. what you're saying makes sense.

  1. caching HTML pages does not in practice make a loading time difference (unless you have freakishly large HTML). S3 has no issue sending out static HTML files at scale.

yeah that did cross my mind. index.html is a pretty tiny file.

  1. You’re very likely deploying API updates along with html assets. The API will likely be updated quickly. If you cache HTML files it could be 10+ minutes before the matching JS code makes its way into users browsers.

👍

@traviscollins
is this the proper way to disable caching for index.html? https://stackoverflow.com/a/45734248.

@zhammer

zhammer commented Jul 19, 2018

also @traviscollins, re:

Typically in CloudFront applications only images, js, css, etc are cached. html pages are never cached.

are you sure about this? seems like index.html is, by default, cached. in which case, disabling caching on certain assets could be a useful feature @fernando-mc

@linusmarco
Contributor

linusmarco commented Jul 19, 2018

@zhammer we already have the functionality to set Cache-Control headers on files 😃

objectHeaders:
  index.html:
    - name: Cache-Control
      value: no-cache

This is what I use for all my SPAs hosted on S3. As discussed, this + versioned assets makes updates a breeze.

And as of v2.2.0, glob patterns are supported in the objectHeaders section of the config, so if you don't want to cache any html files, just do:

objectHeaders:
  '*.html':
    - name: Cache-Control
      value: no-cache
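
To illustrate how a glob key such as '*.html' might be matched against file keys, here is a minimal sketch. It is not the plugin's actual implementation: globToRegExp and headersFor are hypothetical helpers, and the assumption that '*' also matches across '/' may differ from the glob library the plugin really uses.

```javascript
// Hypothetical: converts a simple glob into an anchored regular expression.
// Assumption: '*' matches any run of characters, including '/'.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Collects the headers of every objectHeaders entry whose pattern matches
// the given file key.
function headersFor(fileKey, objectHeaders) {
  return Object.keys(objectHeaders)
    .filter((pattern) => globToRegExp(pattern).test(fileKey))
    .reduce((all, pattern) => all.concat(objectHeaders[pattern]), []);
}
```

With the config above, headersFor('index.html', ...) would pick up the Cache-Control entry while a key such as 'static/js/main.123.js' would get no extra headers.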

@zhammer

zhammer commented Jul 19, 2018

@linusmarco ah, awesome. best type of issues are ones that have already been solved. i'll add this to my project.

@fernando-mc
Owner

@traviscollins @zhammer @linusmarco

Hello again! Did we ever come to a conclusion here? It sounds like we brought up a few development patterns with the cache control object headers and updating the javascript references in something like a non-cached index.html.

But (correct me if I'm wrong here @traviscollins) it still seems like there is a use case for invalidating at least some objects in CloudFront? For that I think @linusmarco's implementation strategy in one of the first comments looks like a good route forward?

Or am I completely wrong and this issue should be closed and sealed away forever?


7 participants