Treat 'PRECONDITION_FAILED' as 'PASS' for interop scoring purposes #178

Open · wants to merge 2 commits into main

Conversation

@nt1m (Member) commented Jul 8, 2023

'PRECONDITION_FAILED' means that the condition tested by `assert_implements_optional` is `false`.

For instance, a test might exercise the same feature multiple times with different optional video/audio codecs. If a codec is not supported, 'PRECONDITION_FAILED' is returned as the status.

https://web-platform-tests.org/writing-tests/testharness-api.html#optional-features

This is different from the API not being supported; in that case, `assert_implements_optional` would fail with an exception (since it couldn't evaluate the condition), and the status would be 'ERROR'.
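As a rough illustration (a hypothetical sketch, not a test from the actual suite; `isOptionalCodecSupported` is a made-up placeholder for whatever support check a real test performs):

```js
// Hypothetical sketch of a setup step using assert_implements_optional.
promise_setup(async () => {
  // Placeholder helper: resolves to false when the optional codec is
  // unsupported, and throws if the API needed for the check is missing.
  const supported = await isOptionalCodecSupported('video/optional-codec');

  // false here => the harness reports PRECONDITION_FAILED for the file
  // and the subtests do not run.
  assert_implements_optional(supported, 'optional codec is not supported');
});
```

If the underlying API is missing entirely, the support check itself throws before the assertion runs, and the harness reports ERROR instead.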
@nt1m requested review from jgraham and gsnedders on July 8, 2023 at 06:15
@DanielRyanSmith (Contributor) commented:

This change treats every PRECONDITION_FAILED status encountered as a passing status. Is that how this should always be treated? The documentation linked above mentions that "A failing assert_implements_optional during setup is reported as a status of PRECONDITION_FAILED for the test, and the subtests will not run."

Is there merit in making a fix to aggregate this differently for the WebCodecs category? I'd just like to be certain that there aren't scenarios where PRECONDITION_FAILED could also mean a test should be considered failing (not to mention the confusing wording of "FAILED" here 😅).

@nt1m (Member, Author) commented Jul 11, 2023

Usually we recommend `assert_implements_optional` for testing optional features, but we haven't been doing that for most of Interop 2023, and have for the most part resorted to splitting tests instead.

@jgraham commented Jul 11, 2023

Just treating these like passes makes me nervous; it seems very misleading to claim that not supporting a feature amounts to passing tests for that feature (in particular for something like webcodecs, if you only support, say, one media format out of four, you could get a 75% score without supporting the feature at all).

We could just ignore those subtests with a precondition failed result, which would make the per-browser score work better (you'd score out of the features that you do support rather than from the features you don't). However it would "break" the overall interop score (since that's computed based on the total number of subtests).

For webcodecs in particular, I wonder if the best solution would be to have an "interop" mode for the tests which picks a supported video format and then uses that for all the subtests (this is approximately what authors are expected to do). We could keep the per-format variants in the main wpt suite in case there are format-specific problems.
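For concreteness, a hedged sketch (not the actual scoring code) of how the proposal in this PR and the "ignore" option would differ when scoring a set of results:

```js
// Hedged sketch only; the real scoring pipeline is more involved.
// mode 'as-pass': PRECONDITION_FAILED counts as PASS (this PR's proposal).
// mode 'ignore':  PRECONDITION_FAILED results are dropped from the denominator.
function scoreResults(results, mode) {
  const isPF = r => r.status === 'PRECONDITION_FAILED';
  const counted = mode === 'ignore' ? results.filter(r => !isPF(r)) : results;
  const passes = counted.filter(
      r => r.status === 'PASS' || (mode === 'as-pass' && isPF(r))).length;
  // 'ignore' shrinks the per-browser denominator, which is why it would
  // "break" an overall score computed from the total number of subtests.
  return counted.length === 0 ? 0 : passes / counted.length;
}
```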

@nt1m (Member, Author) commented Jul 11, 2023

> Just treating these like passes makes me nervous; it seems very misleading to claim that not supporting a feature amounts to passing tests for that feature (in particular for something like webcodecs, if you only support, say, one media format out of four, you could get a 75% score without supporting the feature at all).

In the case of WebCodecs, the precondition contains calls to the WebCodecs API, so if the browser does not support WebCodecs, you'd get an error instead of PRECONDITION_FAILED.
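For illustration, the kind of precondition described above might look roughly like this (the exact codec configuration is invented here):

```js
// Illustrative sketch; the codec configuration is made up.
promise_setup(async () => {
  // Without WebCodecs support, `VideoDecoder` is undefined and this line
  // throws, so the harness status is ERROR rather than PRECONDITION_FAILED.
  const support = await VideoDecoder.isConfigSupported(
      {codec: 'vp09.00.10.08', codedWidth: 640, codedHeight: 480});

  // With WebCodecs present but this particular codec unsupported, the
  // assertion below produces PRECONDITION_FAILED.
  assert_implements_optional(support.supported, 'VP9 decoding supported');
});
```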

> We could just ignore those subtests with a precondition failed result, which would make the per-browser score work better (you'd score out of the features that you do support rather than from the features you don't). However it would "break" the overall interop score (since that's computed based on the total number of subtests).

I'm also fine with this if this is an easy solution (but it sounds like it isn't).

> For webcodecs in particular, I wonder if the best solution would be to have an "interop" mode for the tests which picks a supported video format and then uses that for all the subtests (this is approximately what authors are expected to do). We could keep the per-format variants in the main wpt suite in case there are format-specific problems.

This sounds similar to just excluding from Interop the variants of tests for codecs that are not widely supported across all browsers (which is somewhat what I'm suggesting in #375). This solution is also fine with me.

@jgraham commented Jul 11, 2023

> In the case of WebCodecs, the precondition contains calls to the WebCodecs API, so if the browser does not support WebCodecs, you'd get an error instead of PRECONDITION_FAILED.

We also don't correctly handle tests that return ERROR :/

> This sounds similar to just excluding from Interop the variants of tests for codecs that are not widely supported across all browsers (which is somewhat what I'm suggesting in #375). This solution is also fine with me.

If there's a single format we expect all participating engines to support, I think only including that format in Interop is the right way forward here.

@nt1m (Member, Author) commented Jul 11, 2023

> If there's a single format we expect all participating engines to support, I think only including that format in Interop is the right way forward here.

I'm also fine with this, though I have a slight preference for keeping all of the formats that are expected to be supported widely, instead of just one. Reason being that different formats sometimes have slightly different decoding/encoding rules that are reflected in the test, so it would be nice to have that test coverage.

I honestly have no strong opinion though; I would also be fine with just keeping one format.

@jgraham commented Jul 11, 2023

Yes, sorry, s/one/one or more/. I just mean that if the intersection of formats that all participants will implement is not empty, we should restrict the Interop-included tests to that intersection, rather than trying to figure out how to handle the cases where some implementations intentionally don't implement a format.

@gsnedders (Member) commented:

> Just treating these like passes makes me nervous; it seems very misleading to claim that not supporting a feature amounts to passing tests for that feature (in particular for something like webcodecs, if you only support, say, one media format out of four, you could get a 75% score without supporting the feature at all).

How can you get that without supporting the feature at all? If you don't support the feature at all, you'll get 0%, provided the tests fail for non-support of Web Codecs but precondition-fail for non-support of a given codec.

Or are you considering the media format as part of the feature?

> For webcodecs in particular, I wonder if the best solution would be to have an "interop" mode for the tests which picks a supported video format and then uses that for all the subtests (this is approximately what authors are expected to do). We could keep the per-format variants in the main wpt suite in case there are format-specific problems.

I think, yes, that would be ideal—but that also requires more work than altering how the scoring works (and we still have this problem in the codebase—we need to deal with PRECONDITION_FAILED somehow).

@jgraham commented Jul 20, 2023

> How can you get that without supporting the feature at all? If you don't support the feature at all, you'll get 0%, provided the tests fail for non-support of Web Codecs but precondition-fail for non-support of a given codec.

Hypothetically, one could just implement the API for checking codec support, always return unsupported, and then it would look like you were passing every test. I think, done deliberately, that would be considered bad-faith behaviour, but one can imagine similar situations arising if we unconditionally treat PRECONDITION_FAILED as PASS.

If there's a set of codecs that all of Blink/Gecko/WebKit support, I think the easiest solution here would just be to remove tests for other codecs from the Interop set, since it's apparent that we're not going to get Interop for codecs that not every browser implements. Unless that leaves us with no tests I think it's likely to be easier than the solution of creating a set of tests that pick a supported codec in each browser and use that.

@foolip (Member) commented Sep 15, 2023

It seems important to resolve this if it's affecting Interop 2023 scores in an unreasonable way, but I don't see a clear conclusion in web-platform-tests/interop#383.

Did y'all agree on something that would work here that we can go and implement?

Of the options I skimmed, I think updating/splitting the tests or filtering out PRECONDITION_FAILED seems the most practical. But that would leave the question of what wpt.fyi should show, which should stay in sync with the interop scoring scripts.

@dalecurtis commented:

I felt the conclusion that we parameterize/separate tests with optional features and reach consensus on what's included in Interop seemed reasonable, since Interop is already a project of consensus.

@jgraham commented Sep 18, 2023

Yes. I don't think we can include tests for optional features in Interop without getting additional agreement on what all participants will actually implement, and targeting the tests specifically at that agreement (e.g. we might have agreement that no one will implement the optional feature, or everyone will, or that there will be two behaviours allowed by the tests in Interop).
