Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance custom metric is_lcp_statically_discoverable is incorrectly reporting false #60

Open
kevinfarrugia opened this issue Dec 28, 2022 · 6 comments

Comments

@kevinfarrugia
Copy link
Contributor

kevinfarrugia commented Dec 28, 2022

Test URL: https://webpagetest.httparchive.org/result/221228_GY_1/1/details/#waterfall_view_step1

The LCP image is https://marieefleurir.com/html/template/default/img/top/firstviw01.jpg and is included in the initial HTML document sent by the server. I understand this should be recorded as is_lcp_statically_discoverable as true.

The root cause seems to be because raw_html: {}, which in turn is caused because WPT_BODIES[0].response_body: null.

See test: https://webpagetest.httparchive.org/result/221228_9S_4/1/details/#waterfall_view_step1 which includes custom metric:

[01_test]
const response_bodies = $WPT_BODIES[0];

return ({
  response_bodies
});

Is this a known issue? Why is response_body body null?

@pmeenan
Copy link
Member

pmeenan commented Dec 28, 2022

WPT will only store/process the bodies if they can be decoded as utf-8 in Python.

Even though the response headers say it is utf-8, Python is throwing a decode error when trying to decode it:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 453869: invalid start byte

I can probably modify the agent to ignore decode errors explicitly for responses with text content types which will help fill in missing responses but they will be filtered in that case (better than not present I guess).

@pmeenan
Copy link
Member

pmeenan commented Dec 28, 2022

PR has been sent for the agent. The HA crawl uses a fork that I control though so the change has been rolled out for the HA agent and will be picked up in the January crawl and should be live on the HA test instance within an hour.

I'll verify the fix on the HA test instance in an hour or so to make sure it also fixed the custom metric as a result.

@pmeenan
Copy link
Member

pmeenan commented Dec 28, 2022

Verified that it is working for that page now: https://webpagetest.httparchive.org/result/221228_YV_A/

Coverage in the January crawl should be improved but I still won't guarantee 100% coverage on response bodies (though finding edge cases like this helps improve it).

We can keep this open until you verify that it is fixed in the actual crawl.

@kevinfarrugia
Copy link
Contributor Author

Thank you. Looks great. 🙏

@kevinfarrugia
Copy link
Contributor Author

I think we are able to measure the impact of the fix after the January crawl using

#standardSQL

CREATE TEMPORARY FUNCTION getRawHtml(payload STRING)
RETURNS INT64
LANGUAGE js AS '''
try {
  var $ = JSON.parse(payload);
  var wpt_bodies = JSON.parse($._wpt_bodies);
  if (wpt_bodies) {
    return Object.keys(wpt_bodies.raw_html).length;
  }

  return 0;
} catch (e) {
  return 0;
}
''';

SELECT
  device,
  COUNTIF(raw_html = 0) AS no_raw_html,
  COUNT(0) AS total,
FROM (
  SELECT
    _TABLE_SUFFIX AS device,
    url AS page,
    getRawHtml(payload) AS raw_html,
  FROM
    `httparchive.pages.2022_10_01_*`
)
GROUP BY
  device

Currently:

+---------+-------------+----------+
| device  | no_raw_html |  total   |
+---------+-------------+----------+
| desktop |      417465 | 10346759 |
| mobile  |      594275 | 15583948 |
+---------+-------------+----------+

@pmeenan
Copy link
Member

pmeenan commented Dec 28, 2022

As an upper bound anyway. We may still not get to 100% but good to know it's ~4% of origins.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants