Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a static decode and encode method to TextEncoder and TextDecoder #267

Open
lucacasonato opened this issue Jun 29, 2021 · 10 comments · May be fixed by #284
Open

Add a static decode and encode method to TextEncoder and TextDecoder #267

lucacasonato opened this issue Jun 29, 2021 · 10 comments · May be fixed by #284
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: api

Comments

@lucacasonato
Copy link
Member

lucacasonato commented Jun 29, 2021

The large majority of users of TextEncoder and TextDecoder do not make use of streaming capabilities. Instead they just need to decode a single chunk, or encode a single chunk. We should consider adding a static utility method to TextEncoder / TextDecoder to cater to this use case.

// current
const data = new TextDecoder().decode(new Uint8Array(/** data here */));

// proposed
const data = TextDecoder.decode(new Uint8Array(/** data here */));

I propose the following API be added:

dictionary TextDecoderOptionsWithLabel : TextDecoderOptions {
  DOMString label = "utf-8";
};

interface TextDecoder {
  static USVString decode(optional [AllowShared] BufferSource input, optional TextDecoderOptionsWithLabel options = {});
}

interface TextEncoder {
  static Uint8Array encode(optional USVString input = "");
  static TextEncoderEncodeIntoResult encodeInto(USVString source, [AllowShared] Uint8Array destination);
}

The exact name of the method should be something different though - the static method and prototype method would name clash in this example which is confusing.

Internally this would behave exactly like a single use non streaming encoder.

@andreubotella andreubotella added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: api labels Jun 29, 2021
@domenic
Copy link
Member

domenic commented Jun 29, 2021

What about encodeInto()?

The exact name of the method should be something different though - the static method and prototype method would name clash in this example which is confusing.

Why would that be confusing?

@lucacasonato
Copy link
Member Author

What about encodeInto()?

Added - forgot it existed.

Why would that be confusing?

Just checked spec - wouldn't just be confusing but also against webidl spec:

"The identifier of a static operation also must not be the same as the identifier of a regular operation defined on the same interface." - https://heycam.github.io/webidl/#idl-operations

As to why it would be confusing: "just call TextDecoder.decode" would have two meanings now. Also what would for example the MDN link be? https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/decode is already taken for the prototype method.

@domenic
Copy link
Member

domenic commented Jun 29, 2021

We can fix Web IDL.

As for MDN's problem, they've had a big issue with bad documentation for prototype methods for a long time. They need to fix it. TextEncoder.decode() has always meant a static method (which hasn't existed); TextEncoder.prototype.decode() or textEncoder.decode() are how you write the instance methods.

@inexorabletash
Copy link
Member

It's unclear what the benefit of the proposed change is. It doesn't save significant typing (just new); in cases where you want to avoid allocating a new instance, the encoder/decoder can be cached and used statelessly.

Given a time machine to redesign the API now that streams are a thing, this makes more sense for non-streaming encodes/decodes, but is it worth adding to the platform?

@hsivonen
Copy link
Member

hsivonen commented Jul 5, 2021

If we add API surface that doesn't work in old browsers, I'd like to know if users of the potential API surface care about non-UTF-8 encodings. If not, I'd prefer adding new API surface to JavaScript String itself instead of TextDecoder, since the WebIDL layer is a perf bottle neck.

  • Constructing a String from Uint8Array assuming UTF-8 semantics.
  • Constructing a String from Uint16Array assuming WTF-16 semantics.
  • Method on String that writes UTF-8 into a Uint8Array with the encodeInto semantics but returns an array of numbers instead of a WebIDL type.
  • Method on String that writes WTF-16 into a Uint16Array.

@annevk
Copy link
Member

annevk commented Jul 19, 2021

cc @bakkot @littledan

@lucacasonato
Copy link
Member Author

lucacasonato commented Mar 4, 2022

whatwg/webidl#1098 should technically unblock this. Are any browser vendors objected to this? If not, I'd like to start work on this.

@lucacasonato
Copy link
Member Author

lucacasonato commented Mar 4, 2022

since the WebIDL layer is a perf bottle neck

@hsivonen How so? In Deno we have an optimized path in our WebIDL bindings for this specific API that bypasses the slow USVString conversion completely. This should be possible in other implementations too.

During encode you can convert the first argument to a DOMString (which is fast), and then just perform a lenient UTF-8 encoding on that string, which will already perform lone surrogate conversion. https://github.com/denoland/deno/blob/main/ext/web/08_text_encoding.js#L166-L175

@lucacasonato lucacasonato linked a pull request Apr 22, 2022 that will close this issue
3 tasks
@whatwg whatwg deleted a comment from sukha1996 May 21, 2022
@hsivonen
Copy link
Member

since the WebIDL layer is a perf bottle neck

@hsivonen How so? In Deno we have an optimized path in our WebIDL bindings for this specific API that bypasses the slow USVString conversion completely. This should be possible in other implementations too.

Gecko doesn't actually use USVString in TextEncoder WebIDL, either. However, the WebIDL layer still has enough overhead that e.g. wasm-bindgen conditionally avoids going through the WebIDL layer.

@lucacasonato
Copy link
Member Author

Gecko doesn't actually use USVString in TextEncoder WebIDL, either. However, the WebIDL layer still has enough overhead that e.g. wasm-bindgen conditionally avoids going through the WebIDL layer.

Mh, that seems very weird to me. I very much doubt that WebIDL inherently needs to be a slowdown here. Currently the overhead likely stems from most engines slowing down when calling from JS to native code. This is a performance issue that can be fixed. In V8 based runtimes, TextEncoder.encodeInto can use V8's "fast API" for better JIT inlined performance for example.

The only WebIDL conversion taking place here is the DOMString conversion, which really is just a call to ToString which is a 0 cost op if the call is already a string. A native API in ES would also need to do some runtime type checking (likely the exact same checks), so I really don't see why performance should be a reason to have a native JS API.

TLDR: I don't think spec decisions should be driven by current performance issues in engines, as I don't think that there are any technical reasons for why TextEncoder#encodeInto must be slower than String#encodeIntoUTF8. It's just a matter of optimizing the codepaths as far as I can tell.

Please correct me if I missed something obvious here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: api
Development

Successfully merging a pull request may close this issue.

6 participants