Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LibWeb: Add support for blob URL in web workers #24220

Merged
merged 8 commits into from May 12, 2024

Conversation

shannonbooth
Copy link
Member

@shannonbooth shannonbooth commented May 5, 2024

Not really sure on approach of this going forward, but it does at least make more stuff work.

Further progress towards: #23632

@shannonbooth shannonbooth marked this pull request as ready for review May 5, 2024 10:20
@github-actions github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label May 5, 2024
Copy link
Member

@kennethmyhra kennethmyhra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw I think this is worth a merge.

In the chat on discord I was referring to Transferable objects as maybe a mechanism to transfer over IPC. A note though Blobs are not Transferable, only the ReadableStream attached to the Blob is, so you lose the type.

Userland/Libraries/LibURL/URL.h Show resolved Hide resolved
@ADKaster ADKaster requested a review from Lubrsi May 5, 2024 13:41
@ADKaster
Copy link
Member

ADKaster commented May 6, 2024

So I'm a bit worried about the design here. Correct me if I'm wrong about this :D

It looks like the current approach is to store the Blob URL's "data" alongside the URL object. So my question is:

  • Where is the canonical location for an origin's blob store?
  • How do we adjudicate whether a Blob URL is allowed to be vended to a particular realm/worker/other webcontent process?

In previous conversations with @Lubrsi I had thought the consensus on the 'ideal' design (rather than the 'whee it works :marge:' design, which is still a valid design :) ) was something like this:

  • The Blob url acts as a an opaque token to a canonical store of blob URL data. SQLServer, UI Process etc.
  • When creating or retrieving a Blob URL's data, an IPC call is made to the blob store to retrieve the information associated with that token.
  • The data is only available through the get/set APIs, and so to JS it really is an opaque token

Is there an obvious transition from this design to the 'ideal' design? Or do I have this in my head wrong?

Another question is what the usage patterns for these things are. Are most pages creating 10-100 Blob URLs? Or thousands?

@shannonbooth
Copy link
Member Author

shannonbooth commented May 11, 2024

Where is the canonical location for an origin's blob store?

All origins are stored in the same Blob URL store in the spec. See: https://w3c.github.io/FileAPI/#originOfBlobURL.

In our current implementation every process has its own BlobURLStore. That should technically be moved into a singleton process to allow for example one worker to also be able to revoke a blob for another worker. I think that would be my next step (where FileAPI::resolve_a_blob_url as an example becomes an IPC call, and is the mentioned step 1 below).

The Blob url acts as a an opaque token to a canonical store of blob URL data. SQLServer, UI Process etc.
Is there an obvious transition from this design to the 'ideal' design? Or do I have this in my head wrong?

So, I think the big difference on the design here is on the whole topic of making the blob URL a token. The downside of the current design which follows spec is that it eagerly copies blob data from the registry to the URL instead of when the request for that blob through the token is made. So the performance would be worse (we would be inherently slower anyhow with the multi-process design I imagine). In the CloudFlare turnstile case (which is the only place that I've seen this used in the wild so far), it's not a very big deal, since the blobs are small (its only passing across some JS to run), and not many of them (one per worker). I am sure it may become a problem for larger usecases, like blob URLs for an image or something. Not sure what sites use that.

So I think there are two steps to moving towards the 'ideal' design the way I think about it:

  1. Making BlobURLRegistry live in a singleton process (an actual functional improvement)
  2. Making the Blob URL purely a token, and deferring copying the blob from the registry to until the request is made. (performance improvement)

Implement this function to the spec, and use the full blown URL parser
that handles blob URLs, instead of the basic-url-parser.

Also clean up a FIXME that does not seem relevant any more.
Which performs a lookup to find a blob (if any) corresponding to the
given blob URL.
This is no longer needed now that ByteString does not have a null state.
On a non-basic URL parse, we are meant to perform a lookup on the blob
URL registry and attach any blob to that URL if present. Now - we do
that :^)
Performing a lookup in the blob URL registry does not work in the case
of a web worker - as the registry is not shared between processes.
However - the URL itself passed to a worker has the blob attached to it,
which we can pull out of the URL on a fetch.
@ADKaster ADKaster merged commit 5a0f7c5 into SerenityOS:master May 12, 2024
12 checks passed
@github-actions github-actions bot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label May 12, 2024
@shannonbooth shannonbooth deleted the bloburl branch May 12, 2024 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants