
Document test change proposal principles #604

Open
foolip opened this issue Nov 10, 2023 · 3 comments
Labels
meta Process and/or repo issues

Comments

@foolip
Member

foolip commented Nov 10, 2023

In #599 we had a few discussions about test change proposals, and @chrishtr suggested that we document the underlying principles for how we make such decisions.

What we have been doing so far:

  • Each focus area is defined first and foremost by a set of tests, reviewed carefully before each year's effort is announced.
  • Functional changes to the tests should be reviewed, and the default is to leave them unchanged.
  • When tests are changed without review, we may review changes retroactively, and the default is to revert changes. (The tests can be split into other files that aren't labeled for interop, not simply deleted.)

Test changes that are motivated by spec changes need special handling. It doesn't make sense to revert the change and keep tests that no longer match the spec. We haven't had this situation often, but I believe our default should be to drop the tests from interop if there is no better solution everyone agrees on.

We've also had one case of spec clarifications that didn't result in any tests changing, but where the spec and tests now align. This is a point of discussion, but my personal view is that our default should be to keep the tests, as the tests are what we reviewed, and the failures are what we plan our work around.

@foolip foolip added the meta Process and/or repo issues label Nov 10, 2023
@foolip
Member Author

foolip commented Nov 10, 2023

cc @nt1m

@nt1m
Member

nt1m commented Nov 10, 2023

> When tests are changed without review, we may review changes retroactively, and the default is to revert changes. (The tests can be split into other files that aren't labeled for interop, not simply deleted.)

Having watched test changes throughout the year, I can say that most test changes that go through without review (from everyone) are uncontroversial:

  1. The majority of changes are updates to match the spec or a recent resolution. I think we should do our best to assess those against the project goals, i.e. how important the resolution is to developers. Dropping may be a reasonable default, though.

  2. Some other test changes are meant to make a test pass in more browsers when it was failing for reasons unrelated to what the test is testing: using Ahem to remove anti-aliasing, adding fuzzy-match data, removing certain dependencies, etc. We usually haven't had issues with those, though there was one case where a change to disable BFCache ended up regressing Firefox scores. For this case, I think we should first look at other ways to remove the dependency (test splitting or such), and drop the test only in the worst case.

  3. Some other test changes just fix obvious mistakes (I fixed many math-functions tests throughout the year that expected the wrong types).

  4. A minority of test changes without review expand test coverage for things that were Interop bugs but weren't covered by the original tests (many math-functions subtests have been added here and there). I think test splitting is fine in some cases, though it feels overkill/awkward for very small subtests like the math functions.

> We've also had one case of spec clarifications that didn't result in any tests changing, but where the spec and tests now align. This is a point of discussion, but my personal view is that our default should be to keep the tests, as the tests are what we reviewed, and the failures are what we plan our work around.

If style containment is the case you're referring to, the spec had some examples that were plain wrong, and some of the related tests did end up changing:
web-platform-tests/wpt@dcb79ca
web-platform-tests/wpt@c3a09eb

In general, I would prefer not to apply the test change proposal process to every single test change. The recent discussions concern only a relatively small subset of the test changes that go unreviewed, and introducing unnecessary churn is counterproductive IMO. It is OK, however, to file a test change proposal to question recent changes.

@jgraham
Contributor

jgraham commented Nov 10, 2023

We do have enough data in CI to tell that (a) a test is an Interop test and (b) results got worse in some browsers. Therefore a policy we could enforce in code would be to require additional review for any test change that regresses a test included in Interop in any participating browser engine.
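A minimal sketch of the check @jgraham describes, assuming we have per-engine pass counts for each test before and after a change (the function name, data shapes, and labels here are illustrative, not real wpt CI APIs):

```python
# Hypothetical policy check: flag any changed test that is labeled for
# Interop AND whose results got worse in at least one participating engine.
# The inputs are assumptions about what the CI data could look like.

def needs_extra_review(changed_tests, interop_tests, before, after, engines):
    """Return the changed Interop tests that regressed in some engine.

    before/after: {engine: {test_path: passing_subtest_count}}
    """
    flagged = set()
    for test in changed_tests:
        if test not in interop_tests:  # only Interop-labeled tests matter
            continue
        for engine in engines:
            old = before.get(engine, {}).get(test, 0)
            new = after.get(engine, {}).get(test, 0)
            if new < old:  # results got worse in this engine
                flagged.add(test)
                break
    return flagged
```

A CI job could run this over the results of the changed files in a PR and require an extra approval whenever the returned set is non-empty; unflagged changes would merge under the normal review rules.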


3 participants