New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LibWeb: Add support for blob URL in web workers #24220
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw I think this is worth a merge.
In the chat on discord I was referring to Transferable objects as maybe a mechanism to transfer over IPC. A note though Blob
s are not Transferable
, only the ReadableStream
attached to the Blob
is, so you lose the type
.
So I'm a bit worried about the design here. Correct me if I'm wrong about this :D It looks like the current approach is to store the Blob URL's "data" alongside the URL object. So my question is:
In previous conversations with @Lubrsi I had thought the consensus on the 'ideal' design (rather than the 'whee it works :marge:' design, which is still a valid design :) ) was something like this:
Is there an obvious transition from this design to the 'ideal' design? Or do I have this in my head wrong? Another question is what the usage patterns for these things are. Are most pages creating 10-100 Blob URLs? Or thousands? |
All origins are stored in the same Blob URL store in the spec. See: https://w3c.github.io/FileAPI/#originOfBlobURL. In our current implementation every process has its own BlobURLStore. That should technically be moved into a singleton process to allow for example one worker to also be able to revoke a blob for another worker. I think that would be my next step (where
So, I think the big difference on the design here is on the whole topic of making the blob URL a token. The downside of the current design which follows spec is that it eagerly copies blob data from the registry to the URL instead of when the request for that blob through the token is made. So the performance would be worse (we would be inherently slower anyhow with the multi-process design I imagine). In the CloudFlare turnstile case (which is the only place that I've seen this used in the wild so far), it's not a very big deal, since the blobs are small (its only passing across some JS to run), and not many of them (one per worker). I am sure it may become a problem for larger usecases, like blob URLs for an image or something. Not sure what sites use that. So I think there are two steps to moving towards the 'ideal' design the way I think about it:
|
Implement this function to the spec, and use the full blown URL parser that handles blob URLs, instead of the basic-url-parser. Also clean up a FIXME that does not seem relevant any more.
Which performs a lookup to find a blob (if any) corresponding to the given blob URL.
This is no longer needed now that ByteString does not have a null state.
On a non-basic URL parse, we are meant to perform a lookup on the blob URL registry and attach any blob to that URL if present. Now - we do that :^)
Performing a lookup in the blob URL registry does not work in the case of a web worker - as the registry is not shared between processes. However - the URL itself passed to a worker has the blob attached to it, which we can pull out of the URL on a fetch.
Not really sure on approach of this going forward, but it does at least make more stuff work.
Further progress towards: #23632