LibWeb: Add support for blob URL in web workers #24220

shannonbooth · 2024-05-05T09:00:32Z

Not really sure on approach of this going forward, but it does at least make more stuff work.

Further progress towards: #23632

kennethmyhra

fwiw I think this is worth a merge.

In the chat on discord I was referring to Transferable objects as maybe a mechanism to transfer over IPC. A note though Blobs are not Transferable, only the ReadableStream attached to the Blob is, so you lose the type.

Userland/Libraries/LibURL/URL.h

ADKaster · 2024-05-06T18:17:45Z

So I'm a bit worried about the design here. Correct me if I'm wrong about this :D

It looks like the current approach is to store the Blob URL's "data" alongside the URL object. So my question is:

Where is the canonical location for an origin's blob store?
How do we adjudicate whether a Blob URL is allowed to be vended to a particular realm/worker/other webcontent process?

In previous conversations with @Lubrsi I had thought the consensus on the 'ideal' design (rather than the 'whee it works :marge:' design, which is still a valid design :) ) was something like this:

The Blob url acts as a an opaque token to a canonical store of blob URL data. SQLServer, UI Process etc.
When creating or retrieving a Blob URL's data, an IPC call is made to the blob store to retrieve the information associated with that token.
The data is only available through the get/set APIs, and so to JS it really is an opaque token

Is there an obvious transition from this design to the 'ideal' design? Or do I have this in my head wrong?

Another question is what the usage patterns for these things are. Are most pages creating 10-100 Blob URLs? Or thousands?

shannonbooth · 2024-05-11T10:26:07Z

Where is the canonical location for an origin's blob store?

All origins are stored in the same Blob URL store in the spec. See: https://w3c.github.io/FileAPI/#originOfBlobURL.

In our current implementation every process has its own BlobURLStore. That should technically be moved into a singleton process to allow for example one worker to also be able to revoke a blob for another worker. I think that would be my next step (where FileAPI::resolve_a_blob_url as an example becomes an IPC call, and is the mentioned step 1 below).

The Blob url acts as a an opaque token to a canonical store of blob URL data. SQLServer, UI Process etc.
Is there an obvious transition from this design to the 'ideal' design? Or do I have this in my head wrong?

So, I think the big difference on the design here is on the whole topic of making the blob URL a token. The downside of the current design which follows spec is that it eagerly copies blob data from the registry to the URL instead of when the request for that blob through the token is made. So the performance would be worse (we would be inherently slower anyhow with the multi-process design I imagine). In the CloudFlare turnstile case (which is the only place that I've seen this used in the wild so far), it's not a very big deal, since the blobs are small (its only passing across some JS to run), and not many of them (one per worker). I am sure it may become a problem for larger usecases, like blob URLs for an image or something. Not sure what sites use that.

So I think there are two steps to moving towards the 'ideal' design the way I think about it:

Making BlobURLRegistry live in a singleton process (an actual functional improvement)
Making the Blob URL purely a token, and deferring copying the blob from the registry to until the request is made. (performance improvement)

Implement this function to the spec, and use the full blown URL parser that handles blob URLs, instead of the basic-url-parser. Also clean up a FIXME that does not seem relevant any more.

Which performs a lookup to find a blob (if any) corresponding to the given blob URL.

This is no longer needed now that ByteString does not have a null state.

On a non-basic URL parse, we are meant to perform a lookup on the blob URL registry and attach any blob to that URL if present. Now - we do that :^)

Performing a lookup in the blob URL registry does not work in the case of a web worker - as the registry is not shared between processes. However - the URL itself passed to a worker has the blob attached to it, which we can pull out of the URL on a fetch.

shannonbooth marked this pull request as ready for review May 5, 2024 10:20

github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label May 5, 2024

kennethmyhra reviewed May 5, 2024

View reviewed changes

Userland/Libraries/LibURL/URL.h Show resolved Hide resolved

ADKaster requested a review from Lubrsi May 5, 2024 13:41

shannonbooth added 8 commits May 12, 2024 11:46

LibWeb: Handle blob URLs in Document::parse_url

29155d0

Implement this function to the spec, and use the full blown URL parser that handles blob URLs, instead of the basic-url-parser. Also clean up a FIXME that does not seem relevant any more.

LibWeb: Implement "Resolve a Blob URL"

975e318

Which performs a lookup to find a blob (if any) corresponding to the given blob URL.

LibURL: Add BlobURLEntry to URL

08cf71b

LibIPC: Also encode URL::blob_url_entry over IPC

3364e3b

LibIPC: Remove uneeded NumericLimit<u32>::max check for ByteString

c60b33f

This is no longer needed now that ByteString does not have a null state.

LibWeb: Attach blob to URL on DOMURL::parse

5a0f710

On a non-basic URL parse, we are meant to perform a lookup on the blob URL registry and attach any blob to that URL if present. Now - we do that :^)

LibWeb/Tests: Add a basic test for a blob URL given to a Worker

87f4c27

shannonbooth force-pushed the bloburl branch from e5433bc to 87f4c27 Compare May 12, 2024 00:26

shannonbooth requested a review from ADKaster May 12, 2024 00:27

ADKaster merged commit 5a0f7c5 into SerenityOS:master May 12, 2024
12 checks passed

github-actions bot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label May 12, 2024

shannonbooth deleted the bloburl branch May 12, 2024 21:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LibWeb: Add support for blob URL in web workers #24220

LibWeb: Add support for blob URL in web workers #24220

shannonbooth commented May 5, 2024 •

edited

kennethmyhra left a comment

ADKaster commented May 6, 2024

shannonbooth commented May 11, 2024 •

edited

LibWeb: Add support for blob URL in web workers #24220

LibWeb: Add support for blob URL in web workers #24220

Conversation

shannonbooth commented May 5, 2024 • edited

kennethmyhra left a comment

Choose a reason for hiding this comment

ADKaster commented May 6, 2024

shannonbooth commented May 11, 2024 • edited

shannonbooth commented May 5, 2024 •

edited

shannonbooth commented May 11, 2024 •

edited