Skip to content

Latest commit

 

History

History
266 lines (158 loc) · 14.7 KB

CG-11-21.md

File metadata and controls

266 lines (158 loc) · 14.7 KB

WebAssembly logo

Agenda for the November 21st video call of WebAssembly's Community Group

  • Where: zoom.us
  • When: November 21st, 5pm-6pm UTC (November 21st, 9am-10am Pacific Time)
  • Location: link on calendar invite

Registration

None required if you've attended before. Send an email to the acting WebAssembly CG chair to sign up if it's your first time. The meeting is open to CG members only.

Logistics

The meeting will be on a zoom.us video conference. Installation is required, see the calendar invite.

Agenda items

  1. Opening, welcome and roll call
    1. Opening of the meeting
    2. Introduction of attendees
  2. Find volunteers for note taking (acting chair to volunteer)
  3. Proposals and discussions
    1. Shared-everything threads [Thomas Lively, 40 mins]
    2. Rounding mode phase 2 discussion & vote [whirlicote & Kloud Koder, 20 mins]
  4. Closure

Agenda items for future meetings

None

Schedule constraints

None

Meeting Notes

Attendees

  • Deepti Gandluri
  • Derek Schuff
  • Thomas Lively
  • Pauldennis
  • Kloud Koder
  • Abrown
  • Michael Ficarra
  • Conrad Watt
  • Yuri Delendik
  • Mingqiu Sun
  • Jeff Charles
  • Alon Zakai
  • Alex Chrichton
  • Slava Kuzmich
  • Ilya Rezvov
  • Emily Ruppel
  • Paolo Severini
  • Zalim Bashorov
  • Till Schneidereit
  • Nuno Pereira
  • Ryan Hunt
  • Yuri Iozzelli
  • Emanuel Ziegler
  • Heejin Ahn
  • Chris Woods
  • Daniel Hillerström
  • Jakob Kummerow
  • Adam Klein
  • Matthew Yacobucci
  • Bailey Hayes
  • Andreas Rossberg
  • Keith Winstein
  • Julien Pages
  • Ashley Nelson
  • Luke Wagner
  • Armando Faz Hernandez
  • Saul Cabrera
  • Brendan Dahl
  • Francis McCabe
  • Jakob Kummerow
  • Sergey Rubanov

Proposals and Discussions

Shared-everything threads [Thomas Lively, 40 mins] (slides)

TL: We’ve been exploring what a full threads proposal would look like, and talking a lot with Andrew around what he’s been proposing. So we have what we call an omnibus threads proposal that has what several different groups have been asking for. Some mostly for web, some mostly for off-web and some intersection.

Shared attributes: Imagine 2 separate JS contexts, and there’s some wasm that wants to be multithreaded. We can’t have a JS object referring to objects on another thread, even via wasm. So we need to statically disallow all references from shared to non-shared.

CW: clarification: you’ve focused a lot on wasmGC, but the issue of shared -> nonshared also applies to MVP constructs and instances.

TL: You could replace the GC structs with globals referring to references held in other globals, or tables with references to globals, anything where one thing refers to another can exhibit this problem.

PP: the function is shared, but the position in the function is different across threads, but there will be some extra state, you’ll still need thread-local state?

TL: I don’t have a diagram for the implementation, the idea is..

CW: broadly, it should be the same way that TLS variable work in native implementations but the runtime will be managing it under the hood.

TL: More detail there, in the instance per thread model, we had 2 different instantiations of this function - under the cover they would still share code in the engine, the engine will pin a register pointing to the current insatncae, when we’re running the different functions, the pinned registers will know which registers to refer to, so the global will already be referenced from the pinned register, so there is already a redirection

AR: 2 questions: what kind of attribute is thread-local, where does it live?

CW: Exclusive with shared is the idea. Only for Global variables right now.

AR: In the same space as shared, the other thing is, you didn’t explain how the different copies come to be? Is it over postMessage?

TL: About to explain how this interacts with JS, we’ll mostly cover that question in a second here. Please ask any follow-ups.

CW: this seems to be combining 2 sets of mechanisms from our early version of the doc. Is this an evolution of what we talked about? Or are you just skipping details?

TL: Not going to all the details, its not intended to be an evolution of the edsign

CW: You can’t put a non-shared function into the shared global

TL: yeah this is skipped detail.

CW: I wouldn’t expect globals would need the bind mechanism, you can just get/set like you need for function?

TL: yes, correct

AR: Can you explain what the difference is?

TL: under the assumption that we'll have shared continuations in the future, we can’t let TLS globals contain non-shared references. Because when you global.get from a shared function, then you have a non-shared value in the activated frame, and shared continuations don’t work. So we constrain the TLS globals to only contain shared references. But we want console.log and it’s not a shared reference, its JS.

On the JSAPI level, we still need these thread local functions where you create wasm functions, and you mark it local, that create this shared local, So you can’t put raw console.log in there, you need to wrap it up in some shared thing

CW: In principle you could import the wrapped function without importing it, you can rebind it without going through the thread local global

TL: yeah and in that case you could import it as a shared function.

CW: Are there separate slides explaining what the thread local functions are?

TL: There are not

AR: my intuition was that the whole point of thread local is that you could work around this restriction that share can only point to shared? Because now you have thread local and thread local can point to… what, is what we’re trying to answer. So i can see that we coil push TLS through the type system like shared. But don’t you at some point need to point to something that is truly unshared?

CW: The thread local global is not the mechanism for this, the slide doesn’t introduce the mechanism for this, which is the thread local function

AR: Guess I’m missing some detail here then.

TL: the thread local globals are not intended to allow you to point from shared to unshared, somewhat counterintuitive. But they are really just for getting the base of thread-local data, linear stack, etc. and you can use them to call out to JS. [Slide: open questions]

FM: how do you know who is responsible for identifying when you postMessage, who identifies which variables are supposed to be reset?

TL: That’ll be up to the language toolchain, if emscripten was using tables, it would post message the tables, and would know how to set up those tables

CW: I would also add from a runtime POV, whether something is thread local is statically annotated, so the engine can see which are really shared things.

TL: Anything else about TLS?

CW: even though thread local functions aren't explained here: if a runtime can handle thread local globals as shown here, I think functions will work fine too

PP: Not a TLS question, implementation strategy where you take an instance and pin a pointer to the instance in a register, is that one way to do it, or is that something that’s required?

TL: That’s one way, you don’t necessarily need to pin the register: but somehow it has to go back to something that’s thread local in the hardware. So a pinned register or a native pthread implementation, or something. So as long as the underlying system supports TLS somehow it can be made to work. Pinning a register is just one way.

[slide: waiter queues for wasmGC]

PP: It’s quite complex, you have to introduce a bunch of new stuff, it stems from the fact that JS is single threaded, the host language on the web, and what Wasm does is somewhat at odds, do you see challenges on introducing this to JS?

TL: no, actually. There’s a proposal on the JS side with Shu (who came up with the watier queue idea) called shared structs. So the technology for shared GC is already in V8 because of the JS work.

CW: If we’re not pursuing thread.spawn, that’s the most controversial instruction from the web platform end, the rest of the spec should be fairly non controversial

KW: it really would be nice to retain the simplicity of the core spec. To accommodate the web adds a lot of complexity, it’s hard for new entrants. Is there a way to layer this on the top to keep the core spec approachable?

AB: Profiles

TL: Interesting question. Profiles let you delete it out of the core spec easily but they are still in there.

CW: A future idea is to have a syntactic version of the spec, with different profiles enabled. For shared attributes specifically, that lives in the core spec, the way it interacts with the validation algorithm, it needs to be in the core spec, we could do something simpler or split it out, but it’s unlikely to be too simple

TL: any objections for renaming the repo to shared-everything-threads, and add Thomas as a co-champion (along with Conrad and Andrew)

Rounding mode phase 2 discussion & vote [whirlicote & Kloud Koder, 20 mins]

KK: started prototyping floating point rounding mode instructions last year. Paul Dennis prototyped all the instructions and wanted help testing. So he had C++ code with a mix of wasm2c/wabt and implemented the proposed instructions and I prototyped in x87. So we went through a list of corner cases and instruction inputs and combined them, to test the implementations against each other. In all cases that didn’t involve a NaN matched between the hardware and the C++. we may have to adhere to the NaN canonicalization proposal. I’ll send a link to the doc with the opcode map, it has 4 pages, each corresponding to one of the rounding modes.

Opcode map: https://github.com/WebAssembly/rounding-mode-control/blob/main/proposals/rounding-mode-control/Overview.md

WebAssembly/rounding-mode-control#2

There’s a symmetry in the geometry of all the pages, its better to be symmetric than to be opcode efficient - in the initial implementation, when you add two numbers together and add a rounding mode, we’ll statically add the mode. In the future, there is a lot of performance to be had for integer arithmetic. In theory you could get all of that performance, but that would be a future proposal. <TODO: Kloud Koder to validate, fill in>

We’re proposing phase 2 because we have the overview doc and test cases, and Paul has a repository with the code.

CW: All of the rounding modes are static in the opcode? Each instruction will know whether it’s rounding up or down

KK: yes. Also we originally proposed a separate opcode byte, but ended up walking that back.

CW: Have you thought about how performant this will be on architectures where the rounding mode has to be set different?

KK: you mean if you have e.g. a global rounding mode register? Yes, so thats easy to implement but inefficient. We might want a follow on proposal e.g. so the toolchain could assume different behavior e.g. chopping.

PP: Is there a way that this would interact with simd now or later?

KK: initially I had proposed some SIMD equivalents, but it was pulled out. It’s not quite as straightforward. But at some point it gets weird if you have different rounding modes for those. So we might want a follow-on.

Initially I was thinking of making integer arithmetic primitives, so you could do the same thing that SIMD was doing, but that was too high level, and not efficiently lowerable. For the record, there ended up being 21000 test cases, the 16000 that did not involve NaNs verified correctly

CW: have any engines signalled that they’re interested in prototyping this?

KK: It’s in the different discussion threads, its references in the discussion (2) or three different projects that want to implement it.

PD: haven’t done work for talking to the runtime engine, but implemented the instructions in the reference interpreter, and also a wasm module that implements the instructions.

CW: that’s probably good for phase 2, but to go further we’ll probably need other engines/runtime as well.

AR: I skimmed through the list and most of them have obvious meaning, but the overview doesn't explain the binary/ternary sign operators.

PD: Floating point numbers, ven NaNs have a sign bit. Usually it’s the first one. If you have the binary sign extractor you extract the first bit and it’s either 0 or 1. Also for numericdal analysis for algorithm you want to check if the sign is 0 or -0 or +0 or negative number or positive number. These are two different things when you want to [??].

AR: so it’s basically the sign function and abs?

PD: Two different sign functions. In C++, one is called sign bit and the other one does not have a C++ std library. Usually there is a formula for a comparison and subtraction. Then you get the 0 or -1.

AR: but thats the signum function essentially

PD: Yes it is. In one proposal iteration, I wrote sigNum or something. I think binary or ternary sign are ??? as well.

DG: wondering if there was going to be a little more detail in the performance section about what the expected performance characteristics would be on different hardware and workloads, and what we should expect about that.

KK: Assume you mean perf delta between having it and not having it.

DG: yes, and we talked about hardware that uses a global flag. The part agbout avoiding different modes seems a little ambiguous, I’d like to see more detail there. Also, for V8: I don’t think we’re going to be actively looking at this but happy to partner with anyone interested in working on an implementation.

KK: would you recommend approaching the V8 community for phase 2?

DG: for now I’d recommend fleshing out the performance section and we can talk more about approaching V8 later.

CW: I would say that implementer interest isn’t strictly a requirement for phase 2, but since there were some questions, i wonder if we should move the poll

PD: Just the instruction is quite easy, like 10 lines of code and you have the instruction. You have to copy the already existing abstractions and put 10 different lines in it. The complexity is very low. More of a problem organizing a new version and version editor and everything I assume.

DS: If we’re punting the vote, do we have clear guidelines on what is next, just the performance section?

DG: Doesn’t seem like a blocker for phase 2, just wanted to know what engines should expect when trying to implement it.

Poll: SF: 1 F: 4 N: 15 A: 0 SA: 0

DS: This doesn’t look like consensus in favor of advancing the proposal.

CW: It might also help if we get some library users who are interested in using this and get some public feedback from them

DS: The chairs will follow up with the champions offline and we’ll try to get some concrete next steps to try to help.