Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement service worker and add PWA manifest #372

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bsidhom
Copy link

@bsidhom bsidhom commented Feb 22, 2024

This is the first step in making Numbat fully offline-capable and installable as a PWA (see #371). It's a bit brittle right now because it relies on static assets being manually tracked and added to the service worker lists.

There's also a hole where the Google Fonts are not explicitly cached by the service worker; however the browser should be able to fall back to the system fonts or browser-level cache in this case. I tested this on my machine and it rendered properly when I was fully offline.

I was not able to test the ECB exchange rate fetching due to the lack of CORS policy headers on the Numbat proxy, since I was running this at localhost.

This is the first step in making Numbat fully offline-capable and
installable as a PWA. It's a bit brittle right now because it relies on
static assets being manually tracked and added to the service worker
lists.

There's also a hole where the Google Fonts are not explicitly
cached by the service worker; however the browser should be able to fall
back to the system fonts or browser-level cache in this case. I tested
this on my machine and it rendered properly when I was fully offline.

I was not able to test the ECB exchange rate fetching due to the lack of
CORS policy headers on the Numbat proxy, since I was running this at
localhost.
@@ -110,11 +110,31 @@ function setup() {
});
}

// TODO: I would prefer to use const lambdas rather than `function` in order to
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't plan to merge this comment as-is but to remove it or update this file based on feedback. I'm not confident enough in my JavaScript abilities to use function in general unless I'm implementing generators, where it's strictly required. 🙂

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I can help here, sorry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I offered a suggestion in my review.

const APP_PREFIX = "numbat_";
// This should be updated any time cached assets change. It allows the entire
// set of precached files to be refreshed atomically on SW update.
// TODO: It would be nice if we did this automatically whenever a commit affects
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this could be done in the deploy or build script, but it's not clear under what conditions exactly we should generate a new version. I think in that case we would want a single source of truth for all self-hosted artifacts (and remote URLs), such as a local JSON file. Then the local script could check the file hashes for changes and the service worker could fetch that JSON to determine the lists of files.

Copy link
Owner

@sharkdp sharkdp Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean.. unless we change something completely unrelated like the docs, which are currently not part of this PWA, almost everything would require a VERSION_UUID change, right?

Could we simply update it every time we use the deploy.sh script to push a new version/state to numbat.dev?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally thinking that we could put this into the deploy script, but the more I think about it the more sense it makes to use deterministic hashing of all built artifacts (i.e., anything that would get pushed via the deploy script). See my other comment in the main thread.

@bsidhom
Copy link
Author

bsidhom commented Feb 22, 2024

I highly recommend this article about the ServiceWorker lifecycle if you aren’t familiar with it.

Addresses part of sharkdp#371.
var numbat;
var combined_input = "";

async function main() {
await init();
await Promise.all([init(), registerSw()]);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth as to whether this should be done in-line at init() time or during the load event as is often suggested. My initial thought was to register() and claim() first in order to take advantage of ServiceWorker-cached artifacts. However, it looks like the SW "prefetching" can actually fall back to the browser cache if we allow the page to load first. This makes me think we should maybe wait until both the load event has fired and the Numbat terminal has initialized before registering. On the other hand, this is a pretty minor detail given how large the WASM payload is at the moment and the fact that most users will be on modern devices with a good connection.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mainly concerned about further delaying the time until a user can type something into the terminal. Due to the large WASM size, this is already a long wait time, unfortunately — you're right. If registerSw is negligible in comparison, I'm okay with either way.

@bsidhom bsidhom changed the title Implement service worker to serve site when offline Implement service worker and add PWA manifest Feb 22, 2024
@bsidhom
Copy link
Author

bsidhom commented Feb 23, 2024

This is staged at https://bsidhom.github.io/numbat/. You should be able to install the app and use it offline.

@bsidhom
Copy link
Author

bsidhom commented Feb 23, 2024

Note that the staged version is using my own Cloudflare Worker-based CORS proxy, but is otherwise identical to this PR.

@sharkdp
Copy link
Owner

sharkdp commented Feb 25, 2024

Thank you very much for your contribution!

This is staged at https://bsidhom.github.io/numbat/. You should be able to install the app and use it offline.

It works great!

Before I take a closer look: My web development knowledge is a bit outdated, and I have never worked with PWAs. I'm also not a frequent user. Can you think of any downsides or side effects for site visitors (that do not want to install the PWA) resulting from this change?

@bsidhom
Copy link
Author

bsidhom commented Feb 26, 2024

The only downside that I see is extra retained storage space since the service worker auto-registers. As a result, it will pre-cache all static assets for the site and keep them "indefinitely". By "indefinitely", I mean the following:

  • Cached files do not obey cache-control headers because we're manually saving them via the Caches API.
  • The current policy in the service worker is to retain these files until the next service worker update (signaled by any byte-level changes in the source code or its inputs (not by changes to the referenced cache files). This is why I included the version UUID, which allows us to signal an explicit update to the SW itself and serves as a cache key. We explicitly drop all outdated caches on update.
  • Each browser has its own storage policy for PWAs (actually, "offline" site storage in general; this includes CacheStorage, as well as IndexedDB, local storage, and others). All offline data gets lumped into a bucket based on origin and browsers garbage collect based on LRU or other policies. Some browsers allow explicit user overrides to retain this data, but otherwise everything is eligible for eviction. As far as I'm aware, Safari has the strictest eviction policy and storage limits, but IIRC, it's still around 500 MiB--1 GiB, and the eviction policy is waived (in a confusing way) for "installed" apps.

This should not affect loading time appreciably because any assets that were fetched by the page before the SW will hit the browser cache when fetched. I did leave a note about potentially delaying SW registration to address this, but in my experience, the WASM loading is so slow that there's no noticeable difference in page load time.

Beyond the page load behavior, this should result in no differences when using the site. It does, however, give us flexibility and additional features when installed. For example, keyboard shortcuts that are normally intercepted by the browser can be handled by the PWA if it's "installed". This should help, for example, provide a full-featured terminal emulator experience if we end up switching to xterm.js. It also allows other stuff such as notifications, etc., which are less relevant to Numbat.

@bsidhom
Copy link
Author

bsidhom commented Feb 26, 2024

Implicit in what I said above is that we can remove auto registration and make users explicitly click some "register service worker" button to install the service worker. Note that this is different from actually installation as a PWA, which has to happen after SW registration and must go through the browser UI (the steps are different depending on the browser or platform, but on mobile, this is the "add to home" feature).

@bsidhom
Copy link
Author

bsidhom commented Feb 27, 2024

My preference would be to keep the auto-registration as is, thanks to the browser retention policies. But we might also want to move registration to post-initial-load. Let me know if that's better.

"pkg/package.json", // Does this need to be cached?
"terminal.css",
];
// Files cached on-demand (at first load), but not refreshed. This should mostly
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing these comments, but I'm still not sure I understand the difference between PRECACHED and CACHED.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PRECACHED files are downloaded deterministically before the first request to the service worker, since we know they will be used. CACHED files are cached on-demand at first request but then not updated thereafter (until the next app release, via service worker update). This helps save space because certain user agents will download different subset of optional assets, such as icons. This would be more pronounced if we have more icon sizes, but I've tested it in different browsers and they do in fact request different icon sizes and formats. There's no way to tell in advance which subset a given client might want.

I wasn't sure how to name this. It's not the same as "dynamically cached", since that refers to files whose contents are expected to change with every request (currently the exchange rate data).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

If we're only talking about downloading a few icons or not, I think I'd go with a simplified approach where everything is PRECACHED, if that makes sense?

I'm mainly thinking about the maintenance in the future. When new assets are added by contributors, they would not have to understand this distinction.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Especially in the case of auto-generation, this is simpler and more robust if everything is precached.

@sharkdp
Copy link
Owner

sharkdp commented Feb 27, 2024

The only downside that I see is extra retained storage space since the service worker auto-registers. As a result, it will pre-cache all static assets for the site and keep them "indefinitely". By "indefinitely", I mean the following:

Cached files do not obey cache-control headers because we're manually saving them via the Caches API.

This scares me a bit, to be honest :-).

Probably completely unrelated question: why do I see two service workers? (Firefox)

image

@bsidhom
Copy link
Author

bsidhom commented Feb 27, 2024

I'm not sure when you first loaded the page, but it's possible that I updated the SW between you first and second load and the new (second) version hasn't transitioned to the "activated" state yet. In that case, it will wait until you've closed all browser tabs with the PWA or navigated those tabs to different pages. This ensures that the SW update happens atomically and while there are no live clients. I haven't updated it in a few days; IIRC, the last change I made was to update the CORS proxy. If you've had tabs open since then, that could explain it. Otherwise, it's possible one of those SWs is just a dummy. That behavior might vary by browser. See this for an explanation of the lifecycle. It can be a bit difficult to understand exactly what's going on until you've played around with updates a bit, but I can try to explain any specific questions you have.

The only downside that I see is extra retained storage space since the service worker auto-registers. As a result, it will pre-cache all static assets for the site and keep them "indefinitely". By "indefinitely", I mean the following:

Cached files do not obey cache-control headers because we're manually saving them via the Caches API.

This scares me a bit, to be honest :-).

Can you be more specific about what scares you? If I didn't make it clear above, it's not actually indefinite: old caches are cleared when we upload new SW versions (which is trivially done any time the SW contents change). Similarly, browsers will delete storage data based on their own policies or based on user-specified policies. This varies by browser, so I can't make any blanket statements. But if you can describe a pathological situation you're envisioning we can try to walk through what might happen in that case.

@sharkdp
Copy link
Owner

sharkdp commented Feb 29, 2024

Can you be more specific about what scares you? If I didn't make it clear above, it's not actually indefinite: old caches are cleared when we upload new SW versions (which is trivially done any time the SW contents change). Similarly, browsers will delete storage data based on their own policies or based on user-specified policies. This varies by browser, so I can't make any blanket statements. But if you can describe a pathological situation you're envisioning we can try to walk through what might happen in that case.

I'm mainly thinking about the future. What happens if we add new assets that we forgot to add to this list? What happens if we delete assets and they are still on this list? Do we need to enforce some synchronization? If so, how? Also, how can we automate the process of updating the UUID?

The main thing that concerns me is that we're also changing things for users that do not install the PWA. And for them, I don't see too many benefits (their browers would have cached the resources anyway — unless they're offline). But I do see potential problems.

But you know what... I'm also okay with trying this out. I don't want to be overly pessimistic. I really like your contribution and it's definitely cool to provide Numbat as a PWA for users who are into that.

@bsidhom
Copy link
Author

bsidhom commented Feb 29, 2024

I'm mainly thinking about the future. What happens if we add new assets that we forgot to add to this list? What happens if we delete assets and they are still on this list? Do we need to enforce some synchronization? If so, how? Also, how can we automate the process of updating the UUID?

Yeah, the UUID and asset automation are my main concerns as well. Those are both addressable, but I'm not sure how that's best done. Random UUID generation itself is trivial, but we'd need to somehow use templating or a JS build system to get it into the SW itself. It would also be wasteful to generate this on every commit since it really only needs to change if any of the static resources have changed. That basically ties the UUID and asset list problems into the same thing. We'd want some automation to basically scrape the set of built artifacts, hash all the contents, and then decide from there how/whether to update. In fact, since we're computing this deterministically from the set of assets, it probably makes more sense to just make the version identifier a hash value. Then the updater script can be stateless.

If you want to avoid bringing in new build-time dependencies (such as Node/NPM, etc.), which I'd prefer to do as well, then this can probably be done in something like a build.rs file (or just a separate Rust binary that we add). I would have included the skeleton for this stuff in this change, but I wasn't sure how to do it robustly. Specifically, I don't want to change all of the JS (even just in the service worker) into a template which gets filled out by the Rust build, mostly because this is pretty hostile to editors. The asset discovery and hashing is (in principle) straightforward. I guess we could split the file out into separate parts, each of which are valid JS: the base SW logic itself (without the version id or asset list) and the generated content (version hash and asset list). Then the build binary can generate the asset-related contents and form the output SW file by concatenating or something. Or the base SW file can make an explicit fetch against the generated data.

Note that even if we go the route of auto-generation, it still makes sense to give careful thought to which resources need to be dynamically cached (at runtime) vs precached. If we don't care about downloading unnecessary stuff, then we can just precache all discovered files.

Comment on lines +113 to +132
// TODO: I would prefer to use const lambdas rather than `function` in order to
// avoid unexpected `this` resolution. Not sure if there's a reason this is
// currently being used along with `var`. Those "old school" techniques do
// hoist, but this can easily be worked around by reordering declarations.
async function registerSw() {
if ("serviceWorker" in navigator) {
try {
// NOTE: The service worker URL should be stable between releases or this
// could lead to unexpected behavior.
await navigator.serviceWorker.register("./service-worker.js");
} catch (error) {
// NOTE: We don't allow the error to propagate because the app should
// still work without offline capabilities.
console.error("Failed to register Service Worker:", error);
}
} else {
console.warn("Service Workers not available in this browser");
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO: I would prefer to use const lambdas rather than `function` in order to
// avoid unexpected `this` resolution. Not sure if there's a reason this is
// currently being used along with `var`. Those "old school" techniques do
// hoist, but this can easily be worked around by reordering declarations.
async function registerSw() {
if ("serviceWorker" in navigator) {
try {
// NOTE: The service worker URL should be stable between releases or this
// could lead to unexpected behavior.
await navigator.serviceWorker.register("./service-worker.js");
} catch (error) {
// NOTE: We don't allow the error to propagate because the app should
// still work without offline capabilities.
console.error("Failed to register Service Worker:", error);
}
} else {
console.warn("Service Workers not available in this browser");
}
}
const registerSw = async () => {
if ("serviceWorker" in navigator) {
try {
// NOTE: The service worker URL should be stable between releases or this
// could lead to unexpected behavior.
await navigator.serviceWorker.register("./service-worker.js");
} catch (error) {
// NOTE: We don't allow the error to propagate because the app should
// still work without offline capabilities.
console.error("Failed to register Service Worker:", error);
}
} else {
console.warn("Service Workers are not available in this browser.");
}
}

@sharkdp sharkdp mentioned this pull request Mar 5, 2024
7 tasks
@sharkdp
Copy link
Owner

sharkdp commented Mar 5, 2024

If you want to avoid bringing in new build-time dependencies (such as Node/NPM, etc.), which I'd prefer to do as well, then this can probably be done in something like a build.rs file (or just a separate Rust binary that we add). I would have included the skeleton for this stuff in this change, but I wasn't sure how to do it robustly. Specifically, I don't want to change all of the JS (even just in the service worker) into a template which gets filled out by the Rust build, mostly because this is pretty hostile to editors

Valid concern, yes.

How about the following: could we use a dummy hash like <replace this> for the development version inside the repository, and then simply search & replace that with the actual hash across all assets when we deploy to "production"? The disadvantage would be that developers would have to properly reset all cache when working on the web version(?), because the hash would never change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants