This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Feature detection #356

Closed
tlively opened this issue Sep 22, 2020 · 42 comments

Comments

@tlively
Member

tlively commented Sep 22, 2020

There is currently no runtime feature detection mechanism, which means that libraries/applications will require separate builds for every supported feature set. We previously meant to solve this problem with a general mechanism in the conditional sections proposal, but that proposal is blocked on the [module linking] proposal, so it is not going to be shipped any time soon. This is creating a lot of pressure to fit every potentially-useful instruction into this MVP proposal, which in turn is preventing us from making progress towards shipping the MVP. This issue proposes a new SIMD-specific feature detection mechanism that would relieve some of that pressure and make it much easier for our users to take advantage of follow-on SIMD proposals. Implementing feature detection would be quite a bit of work, so it wouldn't necessarily let us ship any sooner, but it would add significant user value to the SIMD proposal.

Feature detection: simd.has_features

The first new item is a new instruction simd.has_features (name subject to bikeshedding) that takes an immediate bitmask identifying a SIMD feature set and returns a 1 if the current engine supports that feature set and a 0 otherwise. The immediate bitmask will be a uleb128 to allow it to scale to an arbitrary number of features. This MVP proposal would correspond to bit 0 in the bitmask, and future follow-on proposals will correspond to successively higher bits.
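As a rough sketch of the intent (Python; `decode_has_features` and the feature-bit constants are invented names, not part of the proposal), an engine could resolve simd.has_features at decode time to a constant:

```python
# Hypothetical sketch: an engine lowering simd.has_features to a constant.
# Bit 0 = this SIMD MVP; higher bits = future follow-on proposals.
SIMD_MVP = 1 << 0
SIMD_V2 = 1 << 1   # assumed bit for some future follow-on proposal


def decode_has_features(required_bits: int, supported_bits: int) -> int:
    """Return 1 if every requested feature bit is supported, else 0.

    In a real engine this runs at decode time, so the instruction
    simply becomes i32.const 1 or i32.const 0.
    """
    return 1 if (required_bits & ~supported_bits) == 0 else 0


# An engine that supports only the SIMD MVP:
assert decode_has_features(SIMD_MVP, SIMD_MVP) == 1
assert decode_has_features(SIMD_MVP | SIMD_V2, SIMD_MVP) == 0
```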

Forward compatibility: feature_block

While simd.has_features allows supported features to be detected at runtime, we still need a way for new instructions to pass validation on older engines on which they aren't supported. To do this, we introduce a new block-like construct, feature_block (name subject to bikeshedding). Its binary format syntax is

feature_block blocktype feature_bitvec byte_len instr* end

feature_bitvec is a uleb128 encoding the same kind of feature bitmask used in simd.has_features and byte_len is the byte length of instr*. During decoding, if the engine supports all the features in feature_bitvec, the feature_block is decoded as a normal block, i.e. block blocktype instr* end. Otherwise, the feature_block is decoded as an unreachable, using byte_len to skip its contents entirely without decoding them.
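A hedged decoder sketch of the decode-or-skip behavior (Python; the single-byte blocktype handling and the return values are simplifications for illustration, not the spec's decoding algorithm):

```python
import io

# Hypothetical sketch of feature_block decoding.
# Wire format per the proposal: feature_block blocktype feature_bitvec byte_len instr* end


def read_uleb128(stream) -> int:
    """Minimal uleb128 reader."""
    result = shift = 0
    while True:
        b = stream.read(1)[0]
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result
        shift += 7


def decode_feature_block(stream, supported_bits: int):
    blocktype = stream.read(1)[0]           # simplified: single-byte blocktype
    feature_bitvec = read_uleb128(stream)
    byte_len = read_uleb128(stream)
    if feature_bitvec & ~supported_bits:
        stream.seek(byte_len, io.SEEK_CUR)  # skip body without decoding it
        return ("unreachable",)
    body = stream.read(byte_len)            # decode as a normal block
    return ("block", blocktype, body)


# feature_block (empty blocktype 0x40) requiring bit 1, with a 2-byte body:
raw = bytes([0x40, 0x02, 0x02, 0xAA, 0xBB])
assert decode_feature_block(io.BytesIO(raw), supported_bits=0b01) == ("unreachable",)
assert decode_feature_block(io.BytesIO(raw), supported_bits=0b11) == ("block", 0x40, b"\xaa\xbb")
```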

Because it is a decoding-time feature, there is no need for feature_block to appear in WebAssembly's syntax, semantics, or validation algorithm. Similarly, simd.has_features can be specified as decoding to i32.const 0 or i32.const 1 and does not need to appear in the spec outside of the binary format.

Usage

When an object is compiled from source, any instruction available in the baseline feature set enabled for that compilation would be used normally. Any additional features could be conditionally enabled via function multiversioning, which would compile down to a dispatch function using simd.has_features to determine which lowered version of the multiversioned function to call. The various lowered versions of the function would have their bodies wrapped in a feature_block specifying the features they use.
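Illustrative only: a scalar Python stand-in for the kind of dispatcher a toolchain might generate (the function names and the FMA feature bit are invented; `has_features` stands in for the wasm-level simd.has_features):

```python
# Sketch of compiler-generated dispatch among lowered versions of a
# multiversioned function. In real output, each lowered body would sit
# inside a feature_block naming the features it uses.
FMA_FEATURE = 1 << 1  # assumed feature bit for a hypothetical FMA follow-on


def make_dispatcher(engine_features: int):
    def has_features(bits: int) -> bool:
        # Stand-in for simd.has_features: all requested bits must be supported.
        return (bits & ~engine_features) == 0

    def muladd_baseline(a, b, c):
        return a * b + c  # lowered with baseline instructions only

    def muladd_fma(a, b, c):
        return a * b + c  # lowered with the (hypothetical) fused instruction

    # The dispatch function: the most capable supported version wins.
    if has_features(FMA_FEATURE):
        return muladd_fma
    return muladd_baseline


assert make_dispatcher(0b01).__name__ == "muladd_baseline"
assert make_dispatcher(0b11).__name__ == "muladd_fma"
```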

Instruction alias-based design [Obsolete]

The second item is the scheme for allowing unsupported future instructions to pass validation. This scheme has to be sufficiently general to allow any future SIMD instruction to pass parsing and validation no matter what opcode, type, or immediates they may have. At the same time, we don't want producers to have to pay in code size or complexity for this engine forward-compatibility scheme if they don't need to emit backward-compatible binaries.

The original design for this is posted below in the "Previous design" section. The new design here reflects @lars-t-hansen's suggestions as of this comment.

We propose introducing new simd.compat instructions in the single-byte SIMD opcode range. Each simd.compat instruction will take as its first immediate the opcode of another SIMD instruction and will adopt the semantics of that instruction if the engine knows about it. If the engine does not know of an instruction corresponding to the provided opcode, the simd.compat will take on the semantics of unreachable instead. This allows future SIMD instructions to pass validation on old engines as long as they are "wrapped" in a simd.compat instruction.

To be able to parse arbitrary unknown SIMD instructions, the engine has to know what kind of immediates the unknown instructions expect. To this end there will be a separate simd.compat instruction for each possible shape of immediates that SIMD instructions can take, as well as an additional variant that takes an uninterpreted vector of bytes for use with future SIMD instructions with uncommon immediate patterns or immediate patterns we haven't anticipated.

| Instruction | Immediates |
| --- | --- |
| simd.compat | `<target opcode>` |
| simd.compat.immbyte | `<target opcode> <ImmByte>` |
| simd.compat.memarg | `<target opcode> <memarg>` |
| simd.compat.bytes | `<target opcode> <vec of ImmByte>` |
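A sketch of how one of these wrappers might be decoded (Python; the opcode values and the `KNOWN_OPCODES` set are invented for illustration). Because the wrapper fixes the immediate shape, an unknown target opcode can still be skipped and validated as unreachable:

```python
import io

# Hypothetical decoder for simd.compat.bytes: <target opcode> <vec of ImmByte>,
# where the vec is count-prefixed per the standard's definition of vector.
KNOWN_OPCODES = {0x0E}  # pretend the engine knows only i8x16.swizzle (0xfd 0x0e)


def read_uleb128(stream) -> int:
    result = shift = 0
    while True:
        b = stream.read(1)[0]
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result
        shift += 7


def decode_compat_bytes(stream):
    target = read_uleb128(stream)
    count = read_uleb128(stream)
    imms = stream.read(count)            # uninterpreted immediate bytes
    if target in KNOWN_OPCODES:
        return ("instr", target, imms)   # adopt the target's semantics
    return ("unreachable",)              # unknown target: behaves as unreachable


assert decode_compat_bytes(io.BytesIO(bytes([0x0E, 0x02, 0x01, 0x02]))) == ("instr", 0x0E, b"\x01\x02")
assert decode_compat_bytes(io.BytesIO(bytes([0x7F, 0x00]))) == ("unreachable",)
```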

Text format

The wrapped version of each SIMD instruction should re-use the name of the wrapped instruction in some manner for readability. I propose that the wrapped version of each instruction have the same name as the normal version of the instruction, prefixed with compat. (subject to bikeshedding). This may be controversial because it introduces a distinct text format name for each of the simd.compat instructions.

Original prefix-based design [Obsolete]

SIMD opcodes are all currently prefixed with the "SIMD" prefix byte 0xfd. We propose introducing an alias for every defined SIMD instruction with the same opcode but prefixed with a new "Forward-compatible SIMD" prefix byte 0x7a (exact prefix subject to change to avoid collisions). Every instruction in the 0x7a prefix space takes an initial byte immediate that describes the additional immediate arguments the instruction takes.

| Immediate descriptor | Meaning |
| --- | --- |
| 0x00 | no additional immediates |
| 0x01 | ImmByte immediate |
| 0x02 | memarg immediate |
| 0x03 | vector of uninterpreted ImmByte immediates |

This initial immediate descriptor immediate makes every instruction in the 0x7a prefix space self-describing, so every instruction in that space can be parsed even if the engine does not know about it. To pass validation, every opcode in the 0x7a prefix space that does not correspond to a concrete SIMD instruction (that the engine knows about) has the same type and semantics as the unreachable instruction.

When we introduce a follow-on proposal, all the new instructions in the 0xfd prefix space will be mirrored into the 0x7a prefix space with prescribed immediate descriptor immediates, replacing opcodes that previously were specified to take any immediate descriptors and have unreachable semantics. Technically, this will not be a backwards-compatible change, but that's working as intended because this is a forwards-compatibility feature. This back-compat hazard should be explicitly called out in the spec.

Thanks to @binji for the helpful discussions about how best to design this.

@ngzhian
Member

ngzhian commented Sep 22, 2020

Nice write-up, thanks Thomas!

Nit: SIMD opcodes are prefixed with 0xfd, the v128 type is 0x7b in binary format.

Also, is your intention to have feature detection be part of the existing SIMD proposal? I.e. the current proposal will only ship with feature detection implemented. Or can SIMD proposal progress as it is with the caveat that we prioritize feature-detection for SIMD before we look at other stuff (like 128-bit SIMD v2, or fast/long SIMD)?

0x03 vector of uninterpreted ImmByte immediates

This will require some sort of length immediate, otherwise we don't know how many bytes follow.

I think this scheme is pretty neat. If we can require engines to know how to parse these instructions, not necessarily implement them, then we can maintain forwards-compatibility without having to introduce this scheme. The advantage is you save a single byte for each new opcode. And it is not hard for engines to do because the "patterns" of instructions won't change much. However it does require them to know how to "parse" each instruction.

When we introduce a follow-on proposal, all the new instructions in the 0x7b prefix space will be mirrored into the 0x7a prefix space with prescribed immediate descriptor immediates, replacing opcodes that previously were specified to take any immediate descriptors and have unreachable semantics.

Can you elaborate a bit more? You're saying that new SIMD instructions will be introduced with the same prefix, e.g. 0x7b 0xef, then we make a copy, 0x7a 0xef, which replaces opcodes that were previously defined? Why is there a 0x7a 0xef that was previously defined, where did it come from?

@tlively
Member Author

tlively commented Sep 22, 2020

Nice write-up, thanks Thomas!

🎉

Nit: SIMD opcodes are prefixed with 0xfd, the v128 type is 0x7b in binary format.

Oops, thanks for catching that. Fixed inline above.

Also, is your intention to have feature detection be part of the existing SIMD proposal? I.e. the current proposal will only ship with feature detection implemented. Or can SIMD proposal progress as it is with the caveat that we prioritize feature-detection for SIMD before we look at other stuff (like 128-bit SIMD v2, or fast/long SIMD)?

I was thinking that we should consider shipping it as part of MVP, but thinking about it more, I'm not sure doing that would improve the situation much. Whether or not we ship feature detection as a part of MVP, there will be a lengthy period of time where users have to do separate SIMD and MVP builds before all engines support feature detection. Whether we do feature detection as part of MVP or as the first follow-up, all other follow-up SIMD proposals will be able to be backwards compatible with any engine that has implemented feature detection. The only real difference would be in the toolchain, where shipping them separately would mean having to pass an extra feature flag (e.g. -msimd-feature-detect) to be able to use the function multiversioning functionality, rather than having that automatically available with -msimd128. But that's not such a big difference, so maybe it makes sense to leave this to be the first follow-up proposal after all.

0x03 vector of uninterpreted ImmByte immediates

This will require some sort of length immediate, otherwise we don't know how many bytes follow.

Yes, I meant the definition of vector given in the standard, which prefixes the list of elements with their count.

I think this scheme is pretty neat. If we can require engines to know how to parse these instructions, not necessarily implement them, then we can maintain forwards-compatibility without having to introduce this scheme. The advantage is you save a single byte for each new opcode. And it is not hard for engines to do because the "patterns" of instructions won't change much. However it does require them to know how to "parse" each instruction.

Right, most of the complexity of this scheme comes from the requirement that the engine be able to figure out how to parse instructions that it doesn't know anything about, including their immediates. The only way we thought of to accomplish that was with the "immediate descriptor immediate" that makes the structure of each instruction self-describing. But we don't want to pay the cost of that extra one-byte immediate on every SIMD instruction when we're not trying to do the function multiversioning thing, which is why we kept the normal instructions in the 0xfd prefix space unchanged and only added the immediate descriptor immediate to copies of the instructions in a new prefix space. We could avoid the complexity of duplicating every instruction, but we would have to take a one-byte size hit for every SIMD instruction in every program. We should probably measure to see how bad that would be.

When we introduce a follow-on proposal, all the new instructions in the 0x7b prefix space will be mirrored into the 0x7a prefix space with prescribed immediate descriptor immediates, replacing opcodes that previously were specified to take any immediate descriptors and have unreachable semantics.

Can you elaborate a bit more? You're saying that new SIMD instructions will be introduced with the same prefix, e.g. [0xfd] 0xef, then we make a copy, 0x7a 0xef, which replaces opcodes that were previously defined? Why is there a 0x7a 0xef that was previously defined, where did it come from?

Under this scheme, every single opcode in the 0x7a prefix is by default akin to an unreachable instruction. In other words, this scheme defines an infinite class of new unreachables. Once a SIMD follow-up proposal makes use of an opcode, the 0x7a-prefixed version of it is redefined from being an unreachable to being some particular SIMD instruction. The 0xfd-prefixed version is defined as normal to be the same instruction and does not replace anything.

@lemaitre

I was thinking that we should consider shipping it as part of MVP, but thinking about it more, I'm not sure doing that would improve the situation much. Whether or not we ship feature detection as a part of MVP, there will be a lengthy period of time where users have to do separate SIMD and MVP builds before all engines support feature detection. Whether we do feature detection as part of MVP or as the first follow-up, all other follow-up SIMD proposals will be able to be backwards compatible with any engine that has implemented feature detection.

I would say that the mechanism to detect features should be here from the very start, even if it is not used for any MVP features.
Otherwise, people will have to detect (how?) if feature detection is available before using it.
For me, the sooner feature detection is available (even if no feature uses it), the better.

@lars-t-hansen
Contributor

I think it's a good idea to look for a SIMD-specific solution here, since the conditional compilation proposal's main use case is SIMD, while the proposal itself is controversial.

It seems to me that instead of the prefixed instruction space (we'll run out of prefixes soon...) you only need a single instruction, 0xfd k nn where k is some representation of the maximum simd version required by the subsequent instructions, and nn is the number of bytes occupied by those instructions, with the semantics of unreachable if the version is not supported. Why is it desirable to prefix each instruction? Is that so that we guarantee that only SIMD instructions are covered by this mechanism? Or is there another reason?

(Would still need simd.has_feature of course.)

@tlively
Member Author

tlively commented Sep 22, 2020

It seems to me that... you only need a single instruction, 0xfd k nn where k is some representation of the maximum simd version required by the subsequent instructions, and nn is the number of bytes occupied by those instructions, with the semantics of unreachable if the version is not supported.

Oh nice, we didn't quite think of that solution, although we considered a similar idea of having some sort of general try_validate block around the possibly-uninterpreted instructions. We decided not to pursue that idea because we wanted a more obviously SIMD-specific solution, which your suggestion clearly is.

The only downside of your proposed solution is that the size in bytes of an instruction sequence is not robust to changes in uleb128 encoding. This would be a real complication in practice because tools like wasm-ld intentionally use overly large uleb128s where relocations are patched, and downstream tools like wasm-opt may rewrite those encodings. The alternative to specifying the number of bytes to skip is to make unknown instructions parsable by making them self-describing via a mechanism like the proposed "immediate descriptor immediate."
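To make the fragility concrete: uleb128 deliberately admits multiple encodings of the same value, and a padded 5-byte encoding of a small u32 is exactly what linkers emit at relocation sites. A small sketch (the `pad_to` parameter is an invented illustration, not wasm-ld's actual interface):

```python
# Sketch: the same u32 has many valid uleb128 encodings. A tool that
# re-encodes canonically changes byte counts, invalidating any stored
# "number of bytes to skip" that spans the re-encoded region.
def uleb128(value: int, pad_to: int = 1) -> bytes:
    out = bytearray()
    while True:
        b = value & 0x7F
        value >>= 7
        if value or len(out) + 1 < pad_to:
            out.append(b | 0x80)  # continuation bit set
        else:
            out.append(b)
            return bytes(out)


assert uleb128(3) == b"\x03"                             # canonical: 1 byte
assert uleb128(3, pad_to=5) == b"\x83\x80\x80\x80\x00"   # padded: 5 bytes, same value
```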

The solution in the opening post could be changed to use just a single 0xfd-prefixed instruction that can "wrap" and take on the semantics of any other SIMD instruction by taking its opcode as an immediate. This would look something like

simd.compat <immediate descriptor immediate> <target opcode> <target immediates...>

That would save the additional prefix, but would increase code size by another byte per instruction. Overall, I don't think spending another prefix on this would be so bad. We always have the 0xff prefix reserved as an escape hatch to prevent us from running out of prefixes.

@binji
Member

binji commented Sep 22, 2020

The only downside of your proposed solution is that the size in bytes of an instruction sequence is not robust to changes in uleb128 encoding.

I'm wondering if it is more work for an engine too -- there would have to be a check to make sure the instructions in that block match the SIMD version requested. e.g. imagine your engine supports SIMD v1, v2, v3, and the block condition is only for v2, you should trap on v3 instructions. This means a streaming compiler would either have to scan the instructions twice (once to see if they're valid, next to generate code) or scan forward assuming it's OK and backpatch a trap if it fails, right? Maybe that's not too bad.

Also, would we allow these to be nested? I don't see much of a use, so maybe best to disallow it. Though that would create another mode when parsing the instructions.

@lars-t-hansen
Contributor

@tlively

The only downside of your proposed solution is that the size in bytes of an instruction sequence is not robust to changes in uleb128 encoding. This would be a real complication in practice because tools like wasm-ld intentionally use overly large uleb128s where relocations are patched, and downstream tools like wasm-opt may rewrite those encodings. The alternative to specifying the number of bytes to skip is to make unknown instructions parsable by making them self-describing via a mechanism like the proposed "immediate descriptor immediate."

Acknowledged.

The solution in the opening post could be changed to use just a single 0xfd-prefixed instruction that can "wrap" and take on the semantics of any other SIMD instruction by taking its opcode as an immediate. This would look something like

simd.compat <immediate descriptor immediate> <target opcode> <target immediates...>

That would save the additional prefix, but would increase code size by another byte per instruction.

In that case you could have multiple simd.compat instructions, one for each common size, and then you'd get that byte back. We'd want these compat opcodes to be in the 1-byte range of the current 0xfd encoding, clearly. We could then additionally have a general escape that is a little more expensive, if we don't want to cover all the sizes with single-byte prefixes.

Overall, I don't think spending another prefix on this would be so bad. We always have the 0xff prefix reserved as an escape hatch to prevent us from running out of prefixes.

I disagree with that, because the single-byte opcodes are imo quite valuable and should be used sparingly, but it's probably mostly a matter of taste.

@tlively
Member Author

tlively commented Sep 23, 2020

In that case you could have multiple simd.compat instructions, one for each common size, and then you'd get that byte back.

I like this solution a lot. Will update the opening post to use this idea instead.

@ngzhian
Member

ngzhian commented Sep 24, 2020

Is this pseudocode accurate in terms of using the suggested instructions?

if simd.has_features FMA_BITMASK:
  simd.compat f32x4.fma
else:
  f32x4.add
  f32x4.mul

What happens when there is a bug like so:

if simd.has_features ANOTHER_BITMASK:
  simd.compat f32x4.fma
else:
  f32x4.add
  f32x4.mul

The bitmask and the instruction used don't match. Should it be a validation error? I don't think there is precedent for such "conditional validation", and things will get more complicated with nesting of these simd.has_features. To be clear, both branches need to be validated, but the validation of simd.compat f32x4.fma needs to check the control stack in some way?

Also, this is quite interesting because the simd.has_features check is a wasm-compile-time constant for the engine, and we possibly don't need to generate code for one of the branches.

Each simd.compat instruction will take as its first immediate the opcode of another SIMD instruction

It will be helpful to list out some examples. From what I can tell, the "opcode" of all future SIMD opcodes will continue to be a leb-encoded u32, so suppose simd.compat is 0xfd 0x00, and we have i8x16.swizzle, currently 0xfd 0x0e, will the compat version then be 0xfd 0x00 0x0e?

I'm starting to see this feature detection proposal as a sort of clever partition of opcode space to maintain compatibility, another suggestion:

  1. 0xfd 0:u32 ... for memarg follows, e.g. v128.load = 0xfd 0:u32 0:u32 m:memarg
  2. 0xfd 1:u32 ... for 1 byte, e.g. i32x4.extract_lane = 0xfd 1:u32 0:32 l:laneidx
  3. 0xfd 2:u32 ... for vector of bytes i8x16.shuffle = 0xfd 2:u32 0:32 16:u32 ...
  4. 0xfd 3:u32 ... reserve for new patterns if they emerge?
  5. ... more reservations? We can reserve up to 0xfd 34:u32, then all larger opcodes remain unchanged from existing proposal (no need for renaming/recoding!)
  6. 0xfd ... no immediate bytes, i32x4.splat = 0xfd 17:u32.

This has a small advantage of not changing a lot of the existing opcodes.

@tlively
Member Author

tlively commented Sep 24, 2020

Is this pseudocode accurate in terms of using the suggested instructions?

Yes!

What happens when there is a bug like so... Should it be a validation error? I don't think there is precedent for such "conditional validation", and things will get more complicated with nesting of these simd.has_features. To be clear, both branches need to be validated, but the validation of simd.compat f32x4.fma needs to check the control stack in some way?

No, this would still validate. If the simd.compat f32x4.fma is executed (because ANOTHER_BITMASK is a supported feature) and the engine does not support fma, then it will behave like an unreachable.

Also, this is quite interesting because the simd.has_features check is a wasm-compile-time constant for the engine, and we possibly don't need to generate code for one of the branches.

Yes, and I hope engines take advantage of that.

Each simd.compat instruction will take as its first immediate the opcode of another SIMD instruction

It will be helpful to list out some examples. From what I can tell, the "opcode" of all future SIMD opcodes will continue to be a leb-encoded u32, so suppose simd.compat is 0xfd 0x00, and we have i8x16.swizzle, currently 0xfd 0x0e, will the compat version then be 0xfd 0x00 0x0e?

Yes, that's the intention. I will add some examples.

I'm starting to see this feature detection proposal as a sort of clever partition of opcode space to maintain compatibility, another suggestion:

  1. 0xfd 0:u32 ... for memarg follows, e.g. v128.load = 0xfd 0:u32 0:u32 m:memarg
  2. 0xfd 1:u32 ... for 1 byte, e.g. i32x4.extract_lane = 0xfd 1:u32 0:32 l:laneidx
  3. 0xfd 2:u32 ... for vector of bytes i8x16.shuffle = 0xfd 2:u32 0:32 16:u32 ...
  4. 0xfd 3:u32 ... reserve for new patterns if they emerge?
  5. ... more reservations? We can reserve up to 0xfd 34:u32, then all larger opcodes remain unchanged from existing proposal (no need for renaming/recoding!)
  6. 0xfd ... no immediate bytes, i32x4.splat = 0xfd 17:u32.

This has a small advantage of not changing a lot of the existing opcodes.

I don't quite understand what you're proposing here. Do you mean that for small-opcode instructions we could access them solely through the compat wrappers? I also don't think we'll need to support more than a few shapes of immediates. Technically, we only really need the version with a size-prefixed vector of bytes, since that can describe any immediates at all.

@ngzhian
Member

ngzhian commented Sep 24, 2020

No, this would still validate.

Got it, thanks! It can be surprising if the engine supports both ANOTHER_BITMASK and FMA and the code is still buggy in that the wrong bitmask is checked: it will incorrectly do the right thing.

I don't quite understand what you're proposing here.

As I understand it, this feature detection issue will only apply the compat wrappers around new SIMD instructions, right? All existing SIMD instructions retain their current binary encodings.

What I'm proposing is, let's adopt the compat wrappers for all current SIMD instructions as well, with 4 variants (no change here)

  • simd.compat.memarg will be 0xfd 0x00 <opcode> memarg
  • simd.compat.immbyte will be 0xfd 0x01 <opcode> byte
  • simd.compat.bytes will be 0xfd 0x02 <opcode> <vec bytes>
  • All other 0xfd instructions will be followed by a leb-encoded u32 opcode. 0xfd is essentially simd.compat.

Contrasting with what you're suggesting (IIUC), which is to have 0xfd 0x03 for simd.compat (instructions with no immediates).

Technically, we only really need the version with a size-prefixed vector of bytes, since that can describe any immediates at all.

Yes, although it will be confusing since a vec of 8 immediate bytes != memarg, you will have to do some decoding.

@tlively
Member Author

tlively commented Sep 24, 2020

As I understand it, this feature detection issue will only apply the compat wrappers around new SIMD instructions, right?

I was thinking that all SIMD instructions, both current and future, would have "normal" encodings, but also be accessible via compat wrappers. That would allow engines that don't want to support any SIMD at all to gracefully fall back to MVP code, as long as they support the compat wrappers and the v128 type.

IIUC what you're saying, we could avoid a distinction between the "compat wrapper" encoding and the "normal" encoding for most instructions by having the simd.compat behavior be the default for all SIMD instructions. So concretely for this proposal we would allow any unallocated 0xfd-prefixed opcode to validate with zero immediates and have unreachable behavior. So any instruction without immediates is backward compatible by default and only instructions with immediates have to be wrapped to become backwards compatible.

That would save even more code size, indeed. I'm not really sure how valuable it is to have a clear distinction between the "normal" encodings and the "compat wrapper" encodings. I'd be interested to hear what other folks think.

@binji
Member

binji commented Sep 24, 2020

That would save even more code size, indeed. I'm not really sure how valuable it is to have a clear distinction between the "normal" encodings and the "compat wrapper" encodings. I'd be interested to hear what other folks think.

It would save code size for SIMD instructions without immediates, but would increase code size for SIMD instructions with immediates. That's probably still a win, though. And it is nice that there is no special simd.compat instruction -- that means we don't have to worry about a mismatch between the simd.compat instruction using immediates that don't match the "normal" instruction that they are forwarding to. The only drawback is that we end up with another renumbering, but perhaps that was inevitable.

@tlively
Member Author

tlively commented Sep 24, 2020

There's a spectrum here:

  1. No instruction has compat semantics by default. We have compat wrappers for every immediate shape.

    • Largest compat-mode code size because every instruction needs a compat wrapper. Smallest normal-mode code size because there are no additional immediates in normal mode. Simple because compat "specialness" is encapsulated in compat wrapper instructions rather than the entire 0xfd prefix space. (Lars suggested this)
  2. Zero-immediate instructions have compat semantics by default. We have compat wrappers for every immediate shape except for the shape of no immediates.

    • Smaller compat-mode code size because zero-immediate instructions do not need wrappers. Smallest normal-mode code size because there are no additional immediates in normal mode. Most complex because compat "specialness" leaks into the entire 0xfd prefix space, but some instructions still need compat wrappers. (Zhi suggested this)
  3. Zero-immediate instructions have compat semantics by default and instructions with immediates are always wrapped.

    • Like (2) but simpler because there is no difference between compat-mode and normal-mode (similar to 4). Smaller normal-mode code size than (4) because there are no IDIs.
  4. All instructions have compat semantics by default. There are no compat wrappers.

    • Smaller compat-mode code size because there are no wrappers, but instructions now need IDIs. Largest normal-mode code size because all instructions need an additional immediate descriptor immediate. Slightly complex because compat "specialness" leaks into the entire 0xfd prefix space, but there is no difference between compat-mode and normal-mode. (Ben [accidentally?] suggested this)

@lars-t-hansen
Contributor

That's a fine summary.

An additional benefit of (1) is that you must opt in to quiet convert-to-unreachable: only instructions that you've asked to be turned into unreachable can be; for everything else, there's a validation error if the instruction is not supported. Solutions (2)-(4) will turn all single-byte junk into unreachable.

(This came to me because I started wondering whether any of these solutions would extend naturally to the entire instruction set. Were we to do that, we might prefer to have more stringent checks than compat-by-default gives us. And on that note, were we to push this solution for the full instruction set, we could realize further size reductions by dispensing with the 0xfd prefix for compat instructions: we could allocate a small number of single-byte opcodes for compat0, compat1, ... encodings.)

@Maratyszcza
Contributor

Thanks Thomas for putting this together. Having forward compatibility mechanism would be very valuable for in-the-wild deployment. I have several questions about this proposal:

  1. Does it need to be restricted to SIMD? I see value in covering at least the atomic instructions in addition to SIMD.
  2. Would this schema work for SIMD instructions with multiple outputs? There are none in the current proposal (and IIUC it is still not possible for a WAsm instruction to have multiple outputs), but going forward SIMD instructions with multiple outputs would be useful, and they allow for more efficient expression of some idioms (e.g. load 3 vectors of RGB data and deinterleave).
  3. Have you considered having a section with description of all non-baseline instructions instead of a prefix before each instruction?

As for the developer interface, I have strong preference to not rely on gcc-style function multiversioning and rather specify the minimal instruction set via linker option:

  1. Function multiversioning is supported only on gcc and clang, and only on a limited set of architectures. For this reason, portable libraries (e.g. XNNPACK) don’t rely on compiler-supported multiversioning, but rather put implementations targeting different instruction sets into different source files. Then the wrapper calls into the right implementation depending on which ISA features are supported. I’d like the same model to work for WAsm SIMD.
  2. Function multiversioning hardcodes the baseline instruction set in the source files. But developers of C/C++ libraries generally don’t know which baseline ISA they should target: it is determined by the logic in JavaScript code that would load the WAsm module, and this code is typically a part of a Web app, outside of control of library developers.

@zeux
Copy link
Contributor

zeux commented Oct 1, 2020

General observation: this seems scary. The complexity around variable byte encoding and multiple encoding variants seems significant.

Is it possible to adopt a solution that is more closely aligned with the LLVM model where per each function you need to know the target features that are available? The way this could work is:

  • For each function, there's a way to identify the length of the entire function without having to parse opcodes (the fact that this isn't how Wasm works is generally unfun and results in tooling issues; we could incorporate that in an optional way [I think] by defining a length-prefixed block and requiring all functions that carry SIMD instructions to use that encoding for the function block)
  • The length-prefixed block also contains the list of SIMD variants (more generally, instruction encodings) that is required for validating this block
  • There's an instruction to check whether a given instruction feature is available (this instruction can be executed by any Wasm function, though of course only if the extension itself is supported, since the Wasm MVP doesn't include it).
  • The semantics of calling a function when some features aren't supported is that it traps at runtime

This would allow pretty much direct compilation in gcc/clang model, both from separate translation units (linker just merges functions with different instructions) and from a single translation unit (when target feature attributes of gcc/clang are used).

This would not require any extra per-instruction compat encoding, the only change is that the function body must be enclosed in a block that carries the length so it can be skipped, and the features; the block can be restricted to only appear at the function level.

This seems trivial to implement from the perspective of existing implementations - again, no extra encoding, just need to parse the new function block and generate unreachable if some features aren't supported.

This doesn't have to be SIMD specific. Unsure if this is good (yay, generality) or bad (we'll never ship this).

@zeux
Copy link
Contributor

zeux commented Oct 1, 2020

... maybe this is what feature detection proposal should be, which might unblock it at the cost of setting it far back ;)

@tlively
Copy link
Member Author

tlively commented Oct 1, 2020

  1. Does it need to be restricted to SIMD? I see value in covering at least the atomic instructions in addition to SIMD.

I see the limited scope of this proposal as a feature. If we tried to generalize it beyond SIMD, then there would be significant pressure to come up with a fully general mechanism that could handle arbitrary conditions and arbitrary conditional module content. Then we'd be stuck in the same place as the conditional sections proposal.

If we are ok with shipping feature detection as a follow-on proposal, this is something we could look into more, but we'd have to go into it with a well-defined scope.

  1. Would this schema work for SIMD instructions with multiple outputs?

Yes. No changes would be required.

  1. Have you considered having a section with description of all non-baseline instructions instead of a prefix before each instruction?

Making parsing depend on the contents of a new section sounds like it would be more contentious in the CG, but is also a good idea that we could pursue if we are ok shipping feature detection as a follow-on proposal.

As for the developer interface, I have a strong preference not to rely on gcc-style function multiversioning but rather to specify the minimal instruction set via a linker option:

  1. Function multiversioning is supported only on gcc and clang, and only on a limited set of architectures. For this reason, portable libraries (e.g. XNNPACK) don’t rely on compiler-supported multiversioning, but rather put implementations targeting different instruction sets into different source files. Then the wrapper calls into the right implementation depending on which ISA features are supported. I’d like the same model to work for WAsm SIMD.

How is this different from what you can already do today? Can you share a small example of how this would look?

  1. Function multiversioning hardcodes the baseline instruction set in the source files. But developers of C/C++ libraries generally don’t know which baseline ISA they should target: it is determined by the logic in JavaScript code that would load the WAsm module, and this code is typically a part of a Web app, outside of control of library developers.

How is this compatible with your previous point? The library author can't choose which implementation to include in their library unless they know the features up front and they can't include multiple versions in their library without something like function multiversioning.

General observation: this seems scary. The complexity around variable byte encoding and multiple encoding variants seems significant.

It's certainly less straightforward than the rest of the SIMD proposal. That being said, the complexity is well-contained. There aren't any complex interactions between this proposal and the rest of WebAssembly.

Is it possible to adopt a solution that is more closely aligned with the LLVM model where per each function you need to know the target features that are available? The way this could work is...

@binji and I actually considered an idea very similar to this that also used conditional blocks that stored the number of bytes to skip if some features weren't supported. The reason we decided to go with the present idea instead is that the byte length of instruction sequences is really hard to keep track of as tools change LEB encodings, which happens in practice because lld emits overly-large LEBs wherever it resolves a relocation.

@zeux
Copy link
Contributor

zeux commented Oct 1, 2020

significant pressure to come up with a fully general mechanism that could handle arbitrary conditions and arbitrary conditional module content. Then we'd be stuck in the same place as the conditional sections proposal.

Is that set in stone though? My understanding of the existing feature detection proposal is that it couples the validity rules with the runtime selection rules. A proposal like this one decouples them, providing a means to specify the encoding of an unfamiliar extension in a way that a non-supporting implementation can work with, and a means for the code to perform the relevant conditional logic for selecting the optimal implementation. What I'm suggesting is that this proposal can be made simpler and more general without coupling these back together. (I understand the reluctance to go back down this path, and I'm sorry to bring this up since I wasn't involved in the feature detection proposal and so am not aware of the pain points. But even if we make something SIMD-specific, if it could be done in a way that can then be adopted for other instruction set extensions, that would be great, and I think there's a way to do that as sketched out above.)

The reason we decided to go with the present idea instead is that the byte length of instruction sequences is really hard to keep track of as tools change LEB encodings

As long as the linker works on a function by function basis, what I'm proposing still seems simple enough to implement. My understanding otherwise is that in this proposal the linker still needs to be aware of the full encoding, including compact forms, to be able to do relocations.

In general I feel like the inability to decode individual functions separately is an issue. Case in point (perhaps - maybe I'm conflating) is that today you can't actually strip - as far as I can tell - a module containing SIMD instructions with any official tools; I tried doing it with wasm-strip or -s argument of wasm-ld, and both work on MVP binaries but don't work on SIMD binaries.

@Maratyszcza
Copy link
Contributor

Yes. No changes would be required.

Wouldn't the WAsm engine need to know how many outputs an instruction produces to properly track the stack?

@zeux
Copy link
Contributor

zeux commented Oct 2, 2020

Beyond the implementation complexity on the decoding side, the proposal as described introduces complexity on the user side: having to reason about two types of encoding (backwards and forwards compatible), potential issues around this on the tooling side, code size tradeoffs (a combined scalar+SIMD .wasm now needs to inflate all SIMD instructions, so if it's 90% SIMD this would actually regress combined code size relative to two separate .wasm blobs), etc.

If we don't do that and instead use per-function attributes, the size impact could be minimal (a few extra bytes per every SIMD function, these tend to be decently sized) and we wouldn't even need control over this long-term...

It would still be the case that we need all tooling to be aware of the full set of features to be able to, for example, perform relocation (because you need to be able to parse the function body). But as far as I can tell this proposal requires that anyhow, because it introduces a way to specify a compatible encoding for future extensions, and presumably future extensions would come in two forms.

Or is the idea here is to reserve future code space for new SIMD instructions and only provide the encoding via the compat form for any extensions of the SIMD spec? (and only support the non-compat encoding for existing instructions)

@tlively
Copy link
Member Author

tlively commented Oct 2, 2020

@zeux, it's a good point that we might be able to come up with a more general feature detection solution that we could still ship relatively quickly if we pursue a narrower scope than the conditional sections proposal. However, a more limited feature detection mechanism may not actually be useful beyond SIMD. There are not many Wasm features that can be used with changes only to the instructions in function bodies - most require using new sections (threads), new types in function signatures (reference types), new things in the type section (gc), new compilation schemes (exception handling), or some other change that would not be covered by something like a conditional block, forward-compat prefix, or function attribute. Given the limited value of a more general solution, it seems better to double down on a simpler SIMD-specific solution.

As long as the linker works on a function by function basis, what I'm proposing still seems simple enough to implement. My understanding otherwise is that in this proposal the linker still needs to be aware of the full encoding, including compact forms, to be able to do relocations.

The linker actually doesn't know anything about instruction encodings. It just has a list of locations in the object files that need to be patched as relocations are resolved. That means it depends on all LEB relocations having 5 bytes in both the input object files and the output binary. The problem is that Binaryen does not keep track of LEB sizes when it parses, so it loses track of instruction sequence sizes. I suppose it could just measure the size again when it emits the binary, though, so maybe that's not as bad as I thought. Definitely worth considering further. Actually, the more I think about it, the more I prefer that solution.

@Maratyszcza

Wouldn't the WAsm engine need to know how many outputs an instruction produce to properly track the stack?

No, the idea is that unknown instructions have the same semantics as unreachable, which is polymorphic in the values it consumes and produces. So any module that validates in the future with the real stack signatures will also validate in the past via these polymorphic unreachable signatures. This is definitely a complicated part of WebAssembly typing, but it's not new in this proposal.

@lukewagner
Copy link
Member

A design that provides forward compatibility is certainly powerful, but it comes with all the risks and complexities that are mentioned above. If we weaken the requirements from "forward compatibility" to "trivial to implement not-supported SIMD features on all hosts", then I think we get a lot of really nice properties.

To be a bit more specific:

  • Assume each SIMD instruction and type is specified to be gated by a well-defined set of SIMD feature bits.
  • When a SIMD instruction's gating feature isn't supported, the instruction is specified to be immediately preceded by an unreachable instruction.
  • When a SIMD type's gating feature isn't supported, any instruction that makes use of that type would similarly be specified to be preceded by unreachable (so, e.g., call/call_indirect with a SIMD-containing signature would trap).

What's nice is that this approach would leverage the existing dead-code validation logic that engines already have to implement (and even provide some retroactive justification for why wasm even requires special dead-code validation in the first place ;-). Thus, I think this might be the lowest-effort solution (especially if we assume @conrad-watt's relaxed dead code validation).

Also, this design avoids some of the "testing matrix" concerns that you get with (iiuc) all the forward-compatibility solutions: validating a wasm module in all-features-enabled mode (e.g., using wabt or any other "implement all the things early" tool) implies the validity of the same module for all feature-availability subsets.

While this doesn't give us forward-compatibility, I think it's worth asking what precisely is the problem we're trying to solve. If the problem is "some hosts can't or won't ever implement certain (or all) SIMD features and we don't want to be producing multiple modules, just to use SIMD, forever", then I think the above solution is sufficient.

@tlively
Copy link
Member Author

tlively commented Oct 13, 2020

In my view, forward compatibility is an important problem to solve, but it's a good point that we haven't considered yet that there is still value to be had here without the complexities of forward compatibility. The two approaches are also complementary; a forward-compatibility mechanism can easily be layered on top of non-forward compatible feature detection. If we can't get consensus on a forward-compatible feature detection mechanism for this initial SIMD proposal, I would be happy if we could at least include the non-compatible version you describe.

@rossberg
Copy link
Member

@lukewagner, our solution for optional features should be consistent, i.e., whatever we do for SIMD should equally apply to threads, GC, etc. I don't find it convincing that the situation for SIMD is so fundamentally different that it justifies multiple mechanisms.

For example, I expect there to be environments that fundamentally cannot implement SIMD nor does their application domain have any use for it, e.g., embedded space. An engine specialised for such an environment should perhaps not be forced to invest in tracking this huge and fast-growing instruction subset just to be allowed to call itself conforming.

Or vice versa, if we think it's okay that they have to, then we can make the same argument for any other optional proposal, which all have a much smaller surface, so would be much less work.

So we could go both ways, but should pick one.

@tlively
Copy link
Member Author

tlively commented Oct 14, 2020

@lukewagner, our solution for optional features should be consistent, i.e., whatever we do for SIMD should equally apply to threads, GC, etc.

FWIW, the optionality scheme @lukewagner suggested generalizes beyond SIMD (just replace "SIMD" in his explanation with any other proposal). That being said, I think SIMD is different enough from other types of proposals that it makes more sense for that scheme to be applied to SIMD proposals only.

  1. It makes sense to support multiple levels of SIMD in a single module, and toolchains and developers are already able to do this for native targets. This is in contrast to e.g. threads, which developers do not typically write fallback code paths for, and to e.g. GC, which requires an entirely different compilation scheme and so does not benefit from having fallback code available in the same module.

  2. SIMD is especially painful for engines to support in environments where the underlying machine instructions are not available. First of all, it's a large number of instructions to implement lowering for, and second of all, a non-SIMD lowering of many SIMD instructions would yield slower results than lowering equivalent non-SIMD code, so having fallback code paths available is especially useful in these environments. This is in contrast to e.g. threads, for which the lowering to non-atomic operations is straightforward and useful in environments where multiple threads are not supported.

  3. In the limit, we can expect an unbounded number of SIMD proposals as we want to expose more underlying hardware performance features to WebAssembly. This is in contrast to other features which are more one-and-done.

I don't find it convincing that the situation for SIMD is so fundamentally different that it justifies multiple mechanisms.

I absolutely agree that it would be unfortunate to have multiple mechanisms, but part of the thesis here is that we won't need any mechanisms besides the SIMD mechanism for the reasons given above. For engines that do not want to support e.g. GC, providing a way to gracefully fall back to a non-GC implementation is simply not useful. I confidently predict that no toolchain will want to emit both a GC and non-GC implementation of the source code into the same module, so GC modules will simply not be portable to such engines without a second compilation step to non-GC WebAssembly. This is not a problem per se, but it does mean that our feature detection mechanism can completely ignore GC and any other proposal that meaningfully changes compilation schemes.

An engine specialised for such an environment should perhaps not be forced to invest in tracking this huge and fast-growing instruction subset just to be allowed to call itself conforming.

I agree, which is why I would like to solve the forward compatibility problem as well as the optionality problem for SIMD. For non-SIMD proposals that such an engine does not want to implement, I believe that the engine should simply not implement them, as outlined above.

@lukewagner
Copy link
Member

Very well said @tlively. In that argument, "SIMD" could perhaps be generalized to the class of "locally-acting instructions meant to tap into hardware that may not be present on all hosts and, when absent, is so difficult or inefficient to implement that it's better for the instruction to simply be absent". Should any future instructions emerge outside of SIMD that fit that description, I think the same scheme used for SIMD could be extended uniformly.

An engine specialised for such an environment should perhaps not be forced to invest in tracking this huge and fast-growing instruction subset just to be allowed to call itself conforming.

Even imagining the most aggressive SIMD feature release schedule, if the only thing a non-supporting engine has to do is to add instruction decoding + immediate validation, the actual engineering work required to "keep up" seems like it would be minimal. Add to that the high degree of wasm engine reuse these days and I think the effort here is negligible and thus not a design constraint.

@zeux
Copy link
Contributor

zeux commented Oct 14, 2020

I feel like optionality for SIMD is a bit of a strawman; both v8 and Firefox support SIMD now; the only other major engine is JSC/Safari, and it's not clear to me if the timelines there will be helped by making it possible to do a no-op implementation. And regardless of that, realistically all consumers of SIMD will need to provide a non-SIMD fallback for the foreseeable future, which doesn't seem possible to overcome.

So I feel like we should focus strictly on forward compatibility, with two simple angles:

  • Can we provide a way to incorporate future optional instructions into this instruction set
  • Can we provide a way to incorporate future optional types into this instruction set, or is this outside of the scope of the proposal

For point 2, the likely candidates are v256, v512, and mask16. v512 and mask16 are realistically Intel AVX-512-only, which means the real timeline for them to be widely usable across multiple architectures is very long (AMD Zen 3 came out this month and doesn't support AVX-512). So I'm tempted to ask whether it makes sense to standardize v256 with zero instructions targeting it; then the problem simply boils down to how we specify the instruction encoding for future unknown instructions, with possibilities along the lines of a general-purpose compat instruction (proposed in the original post), blocks with explicit length (proposed by me in a comment above), and possibly one more variant: a generalized instruction prefix (similar to how some hardware vendors like Intel extended their ISA, we could have a simd.prefix BYTE and then specify new instructions in terms of prefixes to existing instructions with a similar shape).

Then it's just a matter of drawing up pros & cons for these and picking one. The only real alternative to me seems to be not doing anything here, which isn't great.

@tlively
Copy link
Member Author

tlively commented Oct 14, 2020

SIMD optionality may not be that useful on the Web, but I expect it to be useful for portability in the broader WebAssembly ecosystem. I agree that on the Web, forward compatibility is much more useful. Forward compatibility also subsumes optionality, so strategically we should angle for a forward compatibility solution first, but if the CG rejects that, we should fall back to at least trying to get basic feature detection and optionality into this proposal. That way we will have solved the simpler problem and laid the building blocks necessary to pursue forward compatibility as a follow-on proposal. I will write a fresh proposal using the forward-compatibility block idea later today.

@ngzhian
Copy link
Member

ngzhian commented Oct 14, 2020

Then it's just a matter of drawing up pros & cons for these and picking one. The only real alternative to me seems to be not doing anything here, which isn't great.

It seems most people on this thread would like to see feature detection, I think we should explore it, and I want to try and raise some points (hopefully) in favor of not doing anything, just to get something balanced:

  1. Feature detection will allow us to ship small, incremental additions to the SIMD instruction set. But any new instructions would have to go through the CG process, which is likely to take longer. This process will likely be more straightforward if we are merely adding instructions, but how small do we want the additions to be? I think we should consider grouping instructions into a bigger, more meaningful set for Simd v2 (and so forth).

  2. The biggest benefit with feature detection is to "solve" separate builds. I argue that having detection in SIMD will not get rid of any need to have separate builds, because of the nature of proposals - projects that target different versions/engines will need to support separate builds. Furthermore, we can require that any engines implementing SIMD v2 will have to implement all SIMD proposals prior, thus a follow-up SIMD v2 proposal will add a row/column to the build matrix, and not a new dimension, which I think reduces the separate builds problem.

  3. Should engines that don't implement SIMD be called spec-compliant? I would say no. Imagine we have this feature detection, an engine can do nothing for SIMD, and still be a WebAssembly runtime. This can be confusing for someone trying to run SIMD code on said engine.

@zeux
Copy link
Contributor

zeux commented Oct 14, 2020

If we go with a block idea and require the block around SIMD-carrying functions, then it doesn't cost us much in terms of specification complexity to allocate an extra feature bit for "is SIMD functional, period", although this would require changing implementations to do feature detection by loading and executing a module with SIMD instructions in it (and changing the code that emits instructions accordingly). But it would be great to rule out producing a given function both in a way that's compatible with implementations that support SIMD optionally and in a way compatible with ones that don't, just to make sure that we have crisp rules about what it means to have SIMD supported.

Although maybe that's in general an orthogonal question, which is "do we want to make it possible for the engines to support SIMD without actually implementing SIMD, to make it easier long-term for the users of SIMD"

@zeux
Copy link
Contributor

zeux commented Oct 14, 2020

Maybe we can discuss and reach consensus on a few high level points either this Friday or next time:

  • Do we think we should make it possible to declare support for SIMD without implementing SIMD?

  • Do we need to leave the door open to new types and design a flexible extension mechanism with that in mind, or do we not need any new types, or do we know what types we might need and just standardize them now

  • Do we think of future extensions as orthogonal extra subsets (e.g. crypto, fast-math, neural-networks), or as incremental versioning (v2, v3, etc.)

  • Do we think we need large groups (e.g. fast-math), smaller groups (e.g. approximate-transcendentals, fused-multiplication, etc.) or single instructions (e.g. fmadd, fmsub)

  • Do we anticipate custom instruction encodings (e.g. 8-byte immediate for some shuffle variants proposed on this repository a few years ago, or 2-address memory instructions), or do we think existing instructions capture the space well.

@omnisip
Copy link

omnisip commented Oct 28, 2020

This thread is super fascinating, and the questions regarding types, instructions, and versioning only add to it.

To add to @zeux 's questions:

  • Is there a way to implement this without getting into 'feature sets' or 'versions'? For instance, we have to do op-code detection anyway, otherwise, the WebAssembly won't load or compile. If there is, how would we propose supporting polyfill of the op-codes that are missing? Mind that these two questions skip over the fact that an implementation or its implementation details may not be finalized.
  • Does the mechanism require a separate feature detection set for types specifically or would that fall under the umbrella that already exists for op-codes?

@tlively
Copy link
Member Author

tlively commented Oct 28, 2020

Is there a way to implement this without getting into 'feature sets' or 'versions'? For instance, we have to do op-code detection anyway, otherwise, the WebAssembly won't load or compile. If there is, how would we propose supporting polyfill of the op-codes that are missing? Mind that these two questions skip over the fact that an implementation or its implementation details may not be finalized.

Yes, it's a matter of granularity. In the most fine-grained scheme, each instruction may be supported independently of any other instruction, so there is no need for versions or feature sets. On the other hand, usability suffers under such a scheme because it causes a combinatorial explosion in the number of possible configurations. Each module would have to individually check whether each instruction it wants to use is available.

In practice, users of WebAssembly already have to reason about versions and feature sets due to different proposals shipping at different times, so it's not too much of a stretch to have these concepts codified in the spec.

  • Does the mechanism require a separate feature detection set for types specifically or would that fall under the umbrella that already exists for op-codes?

Types generally appear in a WebAssembly module in more places than instructions do, for example in function signatures, so if we wanted to handle forward compatibility for types in those locations we would need a separate mechanism. However, I would propose that we restrict our forward compatibility mechanism to instructions and types inside the code section to keep it simpler.

@omnisip
Copy link

omnisip commented Oct 29, 2020

Is there a way to implement this without getting into 'feature sets' or 'versions'? For instance, we have to do op-code detection anyway, otherwise, the WebAssembly won't load or compile. If there is, how would we propose supporting polyfill of the op-codes that are missing? Mind that these two questions skip over the fact that an implementation or its implementation details may not be finalized.

Yes, it's a matter of granularity. In the most fine-grained scheme, each instruction may be supported independently of any other instruction, so there is no need for versions or feature sets. On the other hand, usability suffers under such a scheme because it causes a combinatorial explosion in the number of possible configurations. Each module would have to individually check whether each instruction it wants to use is available.

Yeah, that was my fear. Even with compile-time collection, binary compression, and a set intersection, this could become cumbersome and tedious.

In practice, users of WebAssembly already have to reason about versions and feature sets due to different proposals shipping at different times, so it's not too much of a stretch to have these concepts codified in the spec.

Excellent point for a number of reasons. If we were to look at a pie chart for WebAssembly and WebAssembly SIMD users, what would the pie chart look like with respect to browsers capable of automatic updates, browsers without such support, and 'server-side' functionality like Node.JS?

  • Does the mechanism require a separate feature detection set for types specifically or would that fall under the umbrella that already exists for op-codes?

Types generally appear in a WebAssembly module in more places than instructions do, for example in function signatures, so if we wanted to handle forward compatibility for types in those locations we would need a separate mechanism. However, I would propose that we restrict our forward compatibility mechanism to instructions and types inside the code section to keep it simpler.

What does it mean to be typed only inside the code section? What restrictions does that entail?

@tlively
Copy link
Member Author

tlively commented Oct 29, 2020

If we were to look at a pie chart for WebAssembly and WebAssembly SIMD users, what would the pie chart look like with respect to browsers capable of automatic updates, browsers without such support, and 'server-side' functionality like Node.JS?

It's hard to say what the exact breakdown of users is between different platforms, but we can make some useful generalizations. All modern browsers are routinely updated (although on varying schedules), but there can be big gaps between when new features ship on some browser and when they ship on all browsers. Browsers that are not regularly updated are not likely to support WebAssembly SIMD at all (and possibly not MVP WebAssembly, either). My impression is that Node users are generally happy to use the latest LTS, which gets all of Chrome's features, albeit with a lag time of up to a few years for LTS releases. Other server-side environments are new enough that they are being actively developed and frequently updated, but due to resource constraints will implement new features at different times.

What does it mean to be typed only inside the code section? What restrictions does that entail?

The most important restriction is that SIMD types wouldn't be able to be used in function signatures in a forward-compatible manner. I don't think this will be a problem in practice because any function with a non-SIMD fallback wouldn't be able to use SIMD types in its non-SIMD signature anyway.

@omnisip

omnisip commented Oct 29, 2020

What does it mean to be typed only inside the code section? What restrictions does that entail?

The most important restriction is that SIMD types wouldn't be able to be used in function signatures in a forward-compatible manner. I don't think this will be a problem in practice because any function with a non-SIMD fallback wouldn't be able to use SIMD types in its non-SIMD signature anyway.

Let's say we have an entry point function with memory pointer parameters and length that calls multiple SIMD functions if they exist. Those SIMD functions take v128s as parameters or provide v128s as return values. And for grins, let's say 1 SIMD function calls another.

Does this work? Is this allowable and/or functional in your proposal?

@tlively
Member Author

tlively commented Oct 29, 2020

I've updated the opening post to describe the new block-based design, incorporating feedback from @zeux and others above. I'll also present on it briefly at our meeting tomorrow so that everyone can participate in discussion about it without necessarily having read the updated post.


Let's say we have an entry point function with memory pointer parameters and length that calls multiple SIMD functions if they exist. Those SIMD functions take v128s as parameters or provide v128s as return values. And for grins, let's say 1 SIMD function calls another.

Does this work? Is this allowable and/or functional in your proposal?

This works if the baseline feature set for the compilation includes v128. If the baseline feature set was MVP WebAssembly without any SIMD at all, this wouldn't work because v128 would not be recognized as a valid type to have in function signatures. In that case, the helper functions would also have to take and return pointers rather than v128s.
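(To illustrate the shape this takes in practice: one common pattern is to ship two builds of the same library and pick one at load time, with both builds exporting identical MVP-typed entry points, so `v128` appears only inside the SIMD build and never in a shared signature. A hedged sketch; the module names and `process(ptr, len)` export are made up for illustration:)

```javascript
// Hypothetical two-build dispatch. Both builds export the same
// entry point with an MVP signature (i32 pointer, i32 length);
// v128 values are passed only between functions *inside* the
// SIMD build, never across the shared interface.
function chooseModuleUrl(simdSupported) {
  return simdSupported ? 'lib.simd.wasm' : 'lib.scalar.wasm';
}

// Usage, assuming some detectSimd() probe (e.g. WebAssembly.validate
// on a small SIMD test module):
//
//   const url = chooseModuleUrl(detectSimd());
//   const { instance } = await WebAssembly.instantiateStreaming(fetch(url));
//   instance.exports.process(ptr, len); // (i32, i32) in both builds
```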

@binji
Member

binji commented Oct 29, 2020

Some additional thoughts about this:

  • Originally I thought that scoping this to SIMD would be better, to make it clear that it is not a general-purpose feature. But there is value in allowing new instructions that do not add new sections/change section layout/add new types/etc. @Maratyszcza mentioned atomics above, but we've also discussed new integer instructions like iNN.nez or div/rem combined instructions.
  • I'm a little concerned about mismatch between the feature bits and the actual instructions used in the feature_block. If the block contains instructions that are not covered by the feature bits, should the decoding fail? If not, it seems very likely that we'll end up with broken modules (I suppose tools can help here, though).

@tlively
Member Author

tlively commented Oct 30, 2020

I agree that we should feel free to use this feature for non-SIMD proposals that introduce new instructions (but not new sections or other constructs). For that reason this could easily be split off into a separate proposal, but I also don't think it's unreasonable to include it in the SIMD proposal.

I'm a little concerned about mismatch between the feature bits and the actual instructions used in the feature_block. If the block contains instructions that are not covered by the feature bits, should the decoding fail? If not, it seems very likely that we'll end up with broken modules (I suppose tools can help here, though).

This is a good design question. WebGPU, for example, only allows extension features to be used when explicitly requested in order to avoid this problem. However, I don't think this is as important for WebAssembly because we expect all WebAssembly to be tool-generated. We can expect tools to be less buggy than humans here.
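(To make the trade-off concrete, here is a hypothetical sketch of how an engine might handle a skippable block; this is not the actual proposed encoding, which is still under discussion. If any required feature bit is unsupported, the engine skips the body by its byte length without decoding it, which is exactly why a body whose instructions don't match the declared bits would go unnoticed on that engine:)

```javascript
// Hypothetical layout, for illustration only:
//   [features: uleb128] [bodyLen: uleb128] [body: bodyLen bytes]

function readUleb128(bytes, pos) {
  let result = 0;
  let shift = 0;
  for (;;) {
    const b = bytes[pos++];
    result |= (b & 0x7f) << shift;
    if ((b & 0x80) === 0) return [result, pos];
    shift += 7;
  }
}

function decodeFeatureBlock(bytes, pos, supportedFeatures) {
  let features, bodyLen;
  [features, pos] = readUleb128(bytes, pos);
  [bodyLen, pos] = readUleb128(bytes, pos);
  if ((features & ~supportedFeatures) !== 0) {
    // Unsupported feature bits: skip the body wholesale. The engine
    // never inspects the skipped bytes, so instructions that don't
    // match the declared bits are not detected here -- the mismatch
    // concern raised above.
    return { executed: false, next: pos + bodyLen };
  }
  // Supported: the engine would decode and validate
  // bytes[pos .. pos + bodyLen) as usual.
  return { executed: true, next: pos + bodyLen };
}
```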

@tlively
Member Author

tlively commented Nov 16, 2020

Given the freeze on new instruction proposals we agreed on in #389, feature detection would no longer help us ship SIMD any sooner. It has been moved out to its own proposal and discussion should continue on https://github.com/WebAssembly/feature-detection, so I will close this issue. Thank you all for your input and feedback!

@tlively tlively closed this as completed Nov 16, 2020