Missing font_details #239

Open
rviscomi opened this issue Apr 19, 2021 · 6 comments

@rviscomi
Member

In the March 2021 requests table, about 7% of fonts are missing the _font_details property.

SELECT
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NULL) AS null_font_details,
  COUNT(0) AS all_fonts,
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NULL) / COUNT(0) AS pct_null_font_details
FROM
  `httparchive.requests.2021_03_01_desktop`
WHERE
  JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
null_font_details  all_fonts   pct_null_font_details
1,978,495          27,740,971  7.13%

The expected behavior is for all fonts to have a _font_details property.

cc @rsheeter

@rviscomi
Member Author

For comparison, in August 2020, 13% of font requests were missing _font_details, so this isn't necessarily a new issue, and it may even be getting better.

@rsheeter

The case that seems particularly odd is where the status is 200 but there are no font details. I tried a few URLs (for Google Fonts, fonts.gstatic.com) that had status 200 and no font details; they downloaded fine and seemed to load into fonttools without problems.

select 
    status,
    countif(has_font_details) has_font,
    countif(not has_font_details) no_font,
    countif(has_font_details) / count(0) pct_has_font
from (
    select
        JSON_EXTRACT_SCALAR(payload, '$.response.status') status,
        JSON_EXTRACT(payload, '$._font_details') is not null has_font_details
    from
        `httparchive.requests.2021_03_01_desktop`
    where
        JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
) t
group by status
;

This query reports that 93.5% of font requests with a 200 status have font details.

If I add the condition "and net.host(url) = 'fonts.gstatic.com'" to filter to Google Fonts, then 98.6% of font requests with a 200 status have font details.
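For anyone who wants to sanity-check the grouping on a small local sample rather than the full table, here is a rough Python sketch of the same logic. It assumes rows have been exported as (url, payload JSON string) pairs; the function name, the export format, and the use of urlsplit as a stand-in for net.host are all assumptions for illustration, not part of the HTTP Archive pipeline.

```python
import json
from collections import defaultdict
from urllib.parse import urlsplit


def font_details_by_status(rows, host=None):
    """Count font requests with and without _font_details, grouped by status.

    rows: iterable of (url, payload_json) pairs (a hypothetical local export).
    host: optional hostname to filter on, e.g. "fonts.gstatic.com".
    """
    counts = defaultdict(lambda: {"has_font": 0, "no_font": 0})
    for url, payload_json in rows:
        payload = json.loads(payload_json)
        # Same filter as the WHERE clause: only font requests.
        if payload.get("_request_type") != "Font":
            continue
        # Rough local analogue of net.host(url) = '<host>'.
        if host is not None and urlsplit(url).hostname != host:
            continue
        raw_status = (payload.get("response") or {}).get("status")
        status = None if raw_status is None else str(raw_status)
        has_details = payload.get("_font_details") is not None
        counts[status]["has_font" if has_details else "no_font"] += 1
    # Add the share of requests that have font details for each status.
    return {
        status: {**c, "pct_has_font": c["has_font"] / (c["has_font"] + c["no_font"])}
        for status, c in counts.items()
    }
```

Calling font_details_by_status(rows) corresponds to the unfiltered query, and font_details_by_status(rows, host="fonts.gstatic.com") to the Google Fonts variant.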

@rsheeter

cc @drott

@pmeenan
Member

pmeenan commented Apr 19, 2021

Yeah, to set expectations, it will never be 100%. The font details can only be extracted for requests where WPT managed to get the font body (directly out of Chrome) and for URLs that it knows about without having to look at the netlog. The second condition rules out, for example, any fonts that are pushed with HTTP/2 push but never used by the page (I assume that is infrequent, but I could be surprised).

It would also help if you could share a few page URLs where font details were expected but weren't available.

@pmeenan
Member

pmeenan commented Apr 19, 2021

For clarification, I don't think the issue is with processing the font files themselves with fonttools but rather that we don't have the raw font file at the time of the analysis.

@rsheeter

If it's normal to occasionally read the metadata successfully, observe a 200 status in it, and yet not be able to get the font file, that could explain what we're seeing. Out of curiosity, why do we find ourselves in that situation? Hm, also, could we mark those records so we can readily filter them out?
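To make the "mark those records" idea concrete, something like the following could tag each request on the analysis side. This is only a sketch: the label names and the classify_font_request helper are hypothetical, and nothing here reflects an existing field in the payload or a decision about where such a marker would actually be written.

```python
import json


def classify_font_request(payload_json):
    """Return a hypothetical label describing a request's font-details state."""
    payload = json.loads(payload_json)
    if payload.get("_request_type") != "Font":
        return "not_a_font"
    if payload.get("_font_details") is not None:
        return "has_font_details"
    # Missing details: separate the surprising 200 case from everything else.
    status = (payload.get("response") or {}).get("status")
    if str(status) == "200":
        return "status_200_missing_details"
    return "other_status_missing_details"
```

With a label like this, the "status_200_missing_details" rows could be excluded from font analyses or pulled out on their own to look for page URLs worth investigating.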
