
Failed dynamic import should not always be cached #6768

Open
sodatea opened this issue Jun 15, 2021 · 21 comments · May be fixed by #10327

Comments

@sodatea

sodatea commented Jun 15, 2021

Given feedback from https://bugs.chromium.org/p/chromium/issues/detail?id=1195405#c9

In my opinion, the HTML spec should also stop requiring failed dynamic imports to be cached.


The rationale of this proposal is well discussed in tc39/proposal-dynamic-import#80
The ECMAScript spec has adopted the change long ago: tc39/ecma262#1645


Lack of retry-ability is stopping SPA developers from using native dynamic imports, because they would have to reload the whole web app just to recover from an otherwise insignificant network failure while fetching a lazy-loaded js chunk.
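For illustration, here is the kind of generic retry wrapper SPA developers would like to rely on (a hedged sketch; all names are illustrative). Under the current HTML spec it is futile for a plain `import(specifier)`: after the first failure, every retry rejects immediately from the module map without a new network request.

```javascript
// Sketch of a generic retry-with-backoff helper (illustrative only).
// Passing () => import('./chunk.js') here does not help under the
// current spec, because the failed result is cached in the module map.
async function retry(fn, attempts = 3, baseDelayMs = 100) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff between attempts (skip after the last one).
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```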

@sodatea
Author

sodatea commented Jun 15, 2021

cc/ @hiroshige-g @domenic

While Safari isn't conforming to the spec in this case, I think their behavior is more intuitive and reasonable to web developers. We should fix the spec, not the browser 🙂

@annevk
Member

annevk commented Jun 17, 2021

cc @whatwg/modules

@annevk
Member

annevk commented Jun 22, 2021

If we end up deciding to change behavior here, https://bugzilla.mozilla.org/show_bug.cgi?id=1716019 can be reopened and repurposed.

@hiroshige-g
Contributor

Some random thoughts (I'm neutral on whether this change should be made; here I'm focusing on its feasibility):

Should we retry { fetch errors | parse errors | instantiation errors | evaluation errors }?
I prefer retrying (at most) fetch errors only, due to the complexity of the others (I assume fetch errors < parse errors <<<< instantiation errors << evaluation errors).

Should we retry in {dynamic imports | static imports | <script src>}?
Perhaps all according to tc39/proposal-dynamic-import#80 (comment).

How should it be specified?
Primarily we should change
https://html.spec.whatwg.org/multipage/webappapis.html#fetch-a-single-module-script
to allow retrying the fetch when the previous result is null (either by clearing moduleMap[url] instead of setting it to null in Step 9, or by only returning early in Step 3 when moduleMap[url] exists AND is not null), but we also have to check other uses of module maps, especially if we want to retry parse errors.
In this way, fetch errors are never retried within a single import() (thanks to the visited set around https://html.spec.whatwg.org/multipage/webappapis.html#internal-module-script-graph-fetching-procedure), may be retried between multiple concurrent import()s, and are always retried when starting another import() after a previous import() was rejected (if there are no other module fetches in flight).

@leobalter

Thanks @hiroshige-g!

I believe these are similar thoughts from @domenic's comment (from 2019):

(...) Tentatively we have the following thoughts:

  • This should behave the same for static and dynamic imports. For example we should just not cache failed fetches (we currently do). This is important if, for example, you do import("./a") where ./a statically imports ./b and ./b has a transient network error. You will want to just retry import("./a").
  • We are unsure how to treat parse errors. Should we continue caching them? Or should we allow retrying failed parses?
  • How this impacts multiple parallel module graph fetches seems complicated.

One thing that I'm imagining here:

to allow retrying the fetch when the previous result is null (either by clearing moduleMap[url] instead of setting it to null in Step 9, or by only returning early in Step 3 when moduleMap[url] exists AND is not null)

I believe that if Step 3 just says something like If moduleMap[url] exists and is not null, it would also require a change in Step 2, as that step would need to loop back if moduleMap[url] goes back to "fetching". Maybe this is:

  1. If moduleMap[url] is "fetching", wait in parallel until that entry's value changes, then queue a task on the networking task source to proceed with repeating this step.
  2. If moduleMap[url] exists and is not null, asynchronously complete this algorithm with moduleMap[url], and return.

The new Step 3 would have an immediate follow up to change a null moduleMap[url] to "fetching".

Step 2 works today because there is no repeated "fetching".

I hope this helps move this issue forward toward an effective resolution!

@sodatea
Author

sodatea commented Mar 1, 2022

Is there anything I can do to move this issue forward?

@07akioni

Any progress?

@davidwallacejackson

I'm guessing this hasn't made much progress because there aren't a lot of people clamoring for a solution, so I'm just here to be an additional squeaky wheel. I'm about to "fix" this in my own project by implementing a service worker that retries only requests for JavaScript files, which strikes me as an extremely drastic measure to have to take to guard against a condition as common as flaky connectivity.

The user experience when this fails is also really, really bad -- parts of the app just entirely fail to load, and as far as I can tell the only mitigation short of the service worker approach would be to report that an error happened and encourage/force the user to reload the page. It's a bad look for the app developer, and I don't think most engineers know that this is a danger when implementing code-splitting, which is part of all the major SPA frameworks and a standard recommendation for large apps. I'm really surprised I haven't heard about this biting more people, and I wonder if a lot of teams either don't know this is happening to them or have given up on solving it.
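An illustrative sketch of such a service worker (all names are assumptions, not the commenter's actual code): retry failed network fetches for .js files a few times before giving up. Note that this only intercepts actual network requests; once a failure is cached in the module map, subsequent import() calls for the same specifier never reach the worker.

```javascript
// Hedged sketch: a service worker that retries failed fetches for
// JavaScript files. The typeof guard only exists so fetchWithRetry can
// also be exercised outside a worker context.
if (typeof self !== 'undefined' && typeof self.addEventListener === 'function') {
  self.addEventListener('fetch', (event) => {
    const url = new URL(event.request.url);
    if (url.pathname.endsWith('.js')) {
      event.respondWith(fetchWithRetry(event.request, 3));
    }
  });
}

async function fetchWithRetry(request, attempts) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      // Clone so the request can be re-sent on a later attempt.
      return await fetch(request.clone());
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```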

@LabEG

LabEG commented Dec 7, 2022

Same problem here. I use microfrontends with chunks, and I'm trying to protect against chunk-loading errors when deploying a new microfrontend version, because the old chunks are gone. I try to re-import the microfrontend, but the failed result is cached.

@AlonMiz

AlonMiz commented Jan 5, 2023

I was struggling with this issue and found a workaround, which I even had the chance to write about here:
It shows the workaround with React.lazy, but the same mechanism can be applied in any framework or vanilla JS.
Retry dynamic imports with “React.lazy”
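A minimal framework-free sketch of that cache-busting mechanism (hedged; all names are illustrative, and the injectable loader parameter is an assumption so the logic can run outside a browser). A changing query string makes each retry bypass the poisoned module-map entry:

```javascript
// Illustrative sketch: retry a dynamic import with a cache-busting
// query parameter. In a browser you would rely on the default loader,
// (s) => import(s); the parameter exists only for testability.
function importWithRetry(specifier, attempts = 3, loader = (s) => import(s)) {
  return loader(specifier).catch((err) => {
    if (attempts <= 1) throw err;
    // A changed specifier is treated as a brand-new module by the browser.
    const sep = specifier.includes('?') ? '&' : '?';
    return importWithRetry(`${specifier}${sep}t=${Date.now()}`, attempts - 1, loader);
  });
}
```

As noted elsewhere in this thread, this only helps when the failure is in the top-level chunk itself, not in one of its statically imported descendants, and every retried chunk is downloaded afresh rather than served from cache.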

@SquareByte

Caching the result of a failed import() is unintuitive, to say the least. Please change the spec.

infinite-persistence added a commit to OdyseeTeam/odysee-frontend that referenced this issue Sep 29, 2023
## Issue
ChunkLoadError.

There is anecdotal evidence that Chromium will cache failed responses (e.g. timeouts), so users could end up perpetually in a bad state until the cache is manually cleared or the site moves on to new js chunks.

whatwg/html#6768

## Approach
While there is no way to programmatically clear a user's cache, doing a separate `fetch` with `no-cache` indirectly replaces the cache with a fresh download.
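The approach above could be sketched as follows (a hedged illustration, not the commit's actual code; the injectable loader/fetcher parameters are assumptions so the logic can run outside a browser). After an import() failure, a no-cache fetch forces a revalidated download into the HTTP cache, and a changed specifier bypasses the poisoned module-map entry:

```javascript
// Illustrative sketch of the "re-fetch with no-cache, then re-import"
// workaround. All names here are assumptions for illustration.
async function importFresh(url, { loader = (u) => import(u), fetcher = (...a) => fetch(...a) } = {}) {
  try {
    return await loader(url);
  } catch (err) {
    // Replace any stale/failed HTTP cache entry with a fresh download.
    await fetcher(url, { cache: 'no-cache' });
    // Use a changed specifier so the module map treats it as new.
    const sep = url.includes('?') ? '&' : '?';
    return loader(`${url}${sep}fresh=${Date.now()}`);
  }
}
```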
@zcrittendon

I'd like to reiterate the importance of addressing this specification issue. ES6 modules should be the preferred choice for building a large SPA website, but currently result in a poor experience for users with unreliable network connections.

Our use case: We utilize Vite and Rollup to produce ES6 modules for a large-scale website. However, we've faced difficulties with the workarounds provided here:

  1. Cache-busting query parameters: This approach only works when the network failure occurs on the dynamically imported file itself. It doesn't work when the network request succeeds for the dynamic import but fails for one of its descendant imports. This scenario is very common in SPAs, where the SPA router dynamically loads the page component code for the current route, but the page component uses static imports for its descendant components.
  2. Service worker fetch: The network request only occurs the first time the import() is invoked. Subsequent import() calls use the module map, bypassing any potential fetch handling by a service worker.
  3. Reloading the page upon import() error: This technically "solves" the problem, but is not a good user experience.

I genuinely hope this can be prioritized, as it significantly impacts SPA applications using ES6 modules.

@dreambo8563

We have the same problem in our projects. We use a different CDN as a backup for the "unstable network" and "DNS cache pollution" scenarios.

  • For js/css/image resources imported with link/script/img tags, we use a package like https://github.com/huajiejin/url-fallback to retry against the backup CDN resource. But that solution doesn't work for lazy-loaded modules.

@nathnolt

Dirty hack to solve it:

import('my-script.js?r=' + Date.now())

@aethr

aethr commented Apr 1, 2024

@nathnolt this was already covered in:

  1. Cache-busting query parameters: This approach only works when the network failure occurs on the dynamically imported file itself. It doesn't work when the network request succeeds for the dynamic import but fails for one of its descendant imports. This scenario is very common in SPAs, where the SPA router dynamically loads the page component code for the current route, but the page component uses static imports for its descendant components.

@nathnolt

nathnolt commented Apr 2, 2024

I see. It's strange that something as common as import() is so fragile under imperfect network conditions.

@zcrittendon

I see. It's strange that something as common as import() is so fragile under imperfect network conditions.

It is strange. Dynamic Import is ~7 years old, this issue is ~3 years old, and the same issue was resolved in the ECMAScript specification ~5 years ago.

But it also makes sense! Most of the internet runs on older bundlers like Webpack UMD. As @davidwallacejackson wrote, there aren't enough squeaky wheels.

The good news is that in 2023 Webpack growth began to decline and ESBuild / Rollup / Vite usage is rising exponentially. With luck this issue will soon have enough squeaky wheels to demand action.

@zcrittendon

zcrittendon commented May 6, 2024

@domenic and @hiroshige-g and @annevk it's been a few years since you commented on this issue, but I think it warrants a second look as ESM usage is increasing exponentially. In your previous investigation, you identified that Chrome and Firefox are following this problematic WHATWG specification but Safari is not:
https://issues.chromium.org/issues/40759006#comment11

For a real-world example of how this impacts the Chrome and Firefox user experience, I will share some data from my company's website. We have ~750 JavaScript modules, and approximately 10% of imports are done using import(). We log import() errors, and they reveal a huge difference between Safari vs. Chrome and Firefox:

  • Android Chrome:
    • 0.5% of all unique users experience an import() error.
    • 459 import() errors per 1000 unique users.
    • Average of 90 errors per unique user who experiences an import() error.
  • iPhone Safari:
    • 1.3% of all unique users experience an import() error.
    • 52 import() errors per 1000 unique users.
    • Average of 4.2 errors per unique user who experiences an import() error.
  • Firefox (all devices excluding bot user agents):
    • 2.1% of all unique users experience an import() error.
    • 3340 import() errors per 1000 unique users.
    • Average of 160 errors per unique user who experiences an import() error.

From this data I conclude that import() errors are inevitable in real-world network conditions. When the inevitable errors occur, Chrome users get stuck in a repeating error loop even after their network connectivity improves, whereas Safari users are able to recover because Safari is NOT following the current specification.

Edited to include Firefox data. A larger time range was necessary due to lower Firefox traffic on our site, so I also updated Safari and Chrome with data from same time range. The new numbers vary a bit from the original numbers, but clearly show the same trend.

@lucacasonato lucacasonato linked a pull request May 7, 2024 that will close this issue
@annevk annevk added the agenda+ To be discussed at a triage meeting label May 7, 2024
@annevk
Member

annevk commented May 7, 2024

@past I marked this agenda+ mainly to seek feedback on @zcrittendon's request above. @lucacasonato kindly wrote up #10327 so at this point the main question is whether Chromium and Gecko are interested in making a change here. (I won't attend this week due to a holiday, but I don't think I'm needed for this. If you feel otherwise feel free to postpone.)

@zcrittendon

zcrittendon commented May 7, 2024

Amazing! Thanks so much @annevk and @lucacasonato -- really appreciate your attention to this! Hopefully Chromium and Gecko will support this change as it will significantly improve the browsing experience for their users.

@smaug----
Collaborator

@jonco3 @codehag

@past past removed the agenda+ To be discussed at a triage meeting label May 15, 2024