Interop can be scored for partially aligned runs #216

Open
gsnedders opened this issue Mar 14, 2024 · 5 comments

@gsnedders
Member

There's nothing technically stopping us from scoring Interop for browsers when we don't have aligned runs for all browsers on a given day.

This would lessen the impact of a browser not having results on wpt.fyi for a prolonged period (cf. web-platform-tests/wpt#44366), though it does make the cross-browser Interop score hard to update.

We have several options here:

  1. Find the aligned run with the largest number of browsers on a given day,
  2. If we don't have an aligned run with all browsers, fall back to the latest run (if any) of each browser on that day,
  3. (The status quo:) Don't update Interop at all when we don't have an aligned run with all browsers.
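
A minimal sketch of how the three options differ when picking runs for a single day. The `Run` record, the strategy names, and the tie-break (prefer the most recent revision) are assumptions for illustration, not the existing results-analysis code:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Run:
    # Hypothetical record; field names are not taken from wpt.fyi.
    browser: str
    sha: str  # wpt revision the run was made against
    finished: datetime


def largest_aligned_set(runs):
    """Return {browser: run} for the revision covered by the most browsers."""
    by_sha = defaultdict(dict)
    for run in runs:
        prev = by_sha[run.sha].get(run.browser)
        # Keep only the latest run per browser at each revision.
        if prev is None or run.finished > prev.finished:
            by_sha[run.sha][run.browser] = run
    if not by_sha:
        return {}
    # Assumed tie-break: among equally large sets, prefer the most recent one.
    return max(
        by_sha.values(),
        key=lambda group: (len(group), max(r.finished for r in group.values())),
    )


def select_runs(runs, browsers, strategy):
    """Pick at most one run per browser from a single day's runs."""
    runs = [r for r in runs if r.browser in browsers]
    best = largest_aligned_set(runs)
    if len(best) == len(browsers):
        return best  # fully aligned: every option agrees
    if strategy == "largest-aligned":  # option 1
        return best
    if strategy == "latest-per-browser":  # option 2
        latest = {}
        for run in runs:
            prev = latest.get(run.browser)
            if prev is None or run.finished > prev.finished:
                latest[run.browser] = run
        return latest
    if strategy == "status-quo":  # option 3: nothing to score today
        return {}
    raise ValueError(f"unknown strategy: {strategy}")
```

Under option 2 the selected runs may come from different revisions, which is where the risk of scoring different sets of tests comes from.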

The biggest risk here is that the Interop dashboard ends up showing scores based on different sets of tests for different browsers (depending on how many tests have changed since the last fully aligned run). Under the status quo, however, browsers get no reward for shipping features and bug fixes that advance Interop.

@foolip
Member

foolip commented Mar 14, 2024

@jgraham IIRC your Rust rewrite isn't limited to aligned runs. How did you handle the problem there?

I think something like this would work:

  • Fetch all runs for all browsers, not just aligned runs.
  • Find aligned runs and score those, similar to today.
  • For any dates that did not have aligned runs, score each browser individually, and record no interop score at all.
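
The steps above could be sketched roughly as follows. The scoring here is a toy stand-in (plain pass rate per browser, and the share of tests passing in every browser as the interop score), not the real Interop formula; `DayScore` and the function names are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DayScore:
    date: str
    browser_scores: Dict[str, float]
    interop: Optional[float]  # None when the day had no aligned run


def pass_rate(results):
    """Toy per-browser score: fraction of tests that passed."""
    return sum(results.values()) / len(results) if results else 0.0


def interop_rate(per_browser):
    """Toy interop score: fraction of tests passing in every browser."""
    tests = set().union(*(set(r) for r in per_browser.values()))
    if not tests:
        return 0.0
    passing = sum(
        all(results.get(test, False) for results in per_browser.values())
        for test in tests
    )
    return passing / len(tests)


def score_day(date, per_browser, aligned):
    """per_browser maps browser name -> {test name: passed}."""
    scores = {browser: pass_rate(r) for browser, r in per_browser.items()}
    # Only aligned runs share a revision, so only they get an interop score.
    interop = interop_rate(per_browser) if aligned else None
    return DayScore(date, scores, interop)
```

Keeping `interop` as `None` rather than reusing an older value leaves it to the consumer to decide how to present missing data.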

The frontend then shows the data we have. The latest score of each type is used, which might not come from an aligned run.
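
As a rough illustration of "the latest score of each type", assuming the history is a list of `(date, browser_scores, interop_or_None)` tuples with ISO dates (a made-up shape, not the current CSV format):

```python
def latest_scores(history):
    """Return {name: (date, score)} with the newest value of each score type."""
    latest = {}
    # ISO date strings sort chronologically, so later entries overwrite earlier ones.
    for date, browser_scores, interop in sorted(history, key=lambda e: e[0]):
        for browser, score in browser_scores.items():
            latest[browser] = (date, score)
        if interop is not None:
            # Only aligned days carry an interop score, so this may lag behind.
            latest["interop"] = (date, interop)
    return latest
```

Returning the date alongside each score would let the frontend indicate when the per-browser scores and the interop score come from different days.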

We could do this retroactively.

@foolip
Member

foolip commented Mar 14, 2024

What I've described is @gsnedders's option 2, I think. If option 1 turns out to be easy when "find aligned runs" is implemented locally and not on the server, that would work too.

@jgraham

jgraham commented Mar 15, 2024

https://github.com/jgraham/interop-results/tree/main/2024/results/revisions just has results per revision for every browser (in the product set we care about) for that revision. They are generated once, i.e. it doesn't ever rescore a past revision if the metadata changes. No interop score is calculated.

https://github.com/jgraham/interop-results/tree/main/2024/latest/aligned has both "current" (i.e. with rescoring) and "historic" versions of aligned runs. The -daily variants only have the last aligned run in a given day, which is the same as we have today.
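
For illustration only, reducing a list of aligned runs to the last one per day might look like the sketch below. The `(finished, sha)` tuple shape and the use of the timestamp's own calendar date (no timezone handling) are assumptions, not a description of the Rust code:

```python
from datetime import datetime


def last_per_day(aligned_runs):
    """aligned_runs: iterable of (finished: datetime, sha: str) tuples."""
    daily = {}
    # Sorting by timestamp means a later run replaces an earlier one on the same day.
    for finished, sha in sorted(aligned_runs):
        daily[finished.date()] = (finished, sha)
    return [daily[day] for day in sorted(daily)]
```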

I was imagining the frontend allowing two things: a toggle between "current" and "historic" mode, which affects the graph, and the ability to select a specific SHA and see the (historic) scores for that run (but without an "Interop" score unless it happens to be an aligned run).

@foolip
Member

foolip commented Mar 15, 2024

I see. Do you want to switch this whole code base to Rust in the near term, or should we try to fix the problem in the current JS code?

@jgraham

jgraham commented Mar 15, 2024

In theory the Rust-generated CSV files should be usable as a drop-in replacement for the current data, independent of new features. So I'd propose starting with that. There is one bug I know about in the Rust code (we're not correctly recording the metadata revision used to generate each historic entry), but since that's not in the current data I don't think it would affect that transition.

Obviously we should also validate that we're really getting the same results from both systems (and I think @gsnedders would have preferred a different implementation based on https://github.com/gsnedders/results-analysis/tree/rust, but I hope that's not a blocker).
