Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query to access third-party render-blocking script #30

Open
mukeshpanchal27 opened this issue Jan 9, 2023 · 4 comments · May be fixed by #33
Open

Query to access third-party render-blocking script #30

mukeshpanchal27 opened this issue Jan 9, 2023 · 4 comments · May be fixed by #33

Comments

@mukeshpanchal27
Copy link
Collaborator

mukeshpanchal27 commented Jan 9, 2023

  • Create an HTTP Archive query to find out what % of WordPress sites use third-party render-blocking script.

For the query use below references:

@mukeshpanchal27
Copy link
Collaborator Author

mukeshpanchal27 commented Jan 9, 2023

Initial draft for the query.

SELECT
  percentile,
  client,
  host,
  APPROX_QUANTILES(requests, 1000)[
OFFSET
  (percentile * 10)] AS num_requests
FROM (
  SELECT
    client,
    page,
  IF
    (NET.HOST(requests.url) IN (
      SELECT
        domain
      FROM
        `httparchive.almanac.third_parties`
      WHERE
        date = '2022-06-01'
        AND category != 'hosting' ), 'third party', 'first party') AS host,
    COUNT(0) AS requests
  FROM
    `httparchive.almanac.requests` AS requests
  JOIN
    `httparchive.pages.2022_06_01_*` AS pages
  ON
    pages.url = requests.page
  WHERE
    date = '2022-06-01'
    AND type = 'script'
    AND JSON_EXTRACT(pages.payload, '$._detected_apps.WordPress') IS NOT NULL
    AND CAST(JSON_EXTRACT( pages.payload, '$._renderBlockingJS') AS INT64) > 0
  GROUP BY
    client,
    page,
    host),
  UNNEST([10, 25, 50, 75, 90, 100]) AS percentile
GROUP BY
  percentile,
  client,
  host
ORDER BY
  client,
  percentile,
  host

cc. @felixarntz

@felixarntz
Copy link
Collaborator

@mukeshpanchal27 It looks like this query is based on https://almanac.httparchive.org/en/2022/javascript#requests? Please provide such sources for reference :)

I'm not familiar with the nature of the CMS Almanac-specific tables httparchive.almanac.*, so I'm not sure we can rely on those here. Also, please review the dates, as parts of your query are for October 2022, while others are for June 2022, so that would definitely give us incorrect data.

Last but not least, I see you formatted the query via the BigQuery formatter, which generally is good, however it sometimes formats some parts in a weird way, so in such cases we should manually fix this. For example, this is unnecessarily hard to read:

  APPROX_QUANTILES(requests, 1000)[
OFFSET
  (percentile * 10)] AS num_requests

Would be simpler as:

  APPROX_QUANTILES(requests, 1000)[OFFSET(percentile * 10)] AS num_requests

Feel free to make those changes and open a draft pull request. Better to review in there, not in the issue.

@mukeshpanchal27 mukeshpanchal27 linked a pull request Jan 10, 2023 that will close this issue
@mukeshpanchal27
Copy link
Collaborator Author

Thanks @felixarntz, for the comment.

The httparchive.almanac.third_parties has data till June 2022, so I have updated my query to use June 2022 data for httparchive.pages.2022_06_01_*

The query format was also corrected.

@kt-12
Copy link
Collaborator

kt-12 commented Jan 11, 2023

@felixarntz @mukeshpanchal27 there is a slight issue with this approach-

The goal here is to study third-party scripts that are render-blocking.
Issue with initial query is:

  • httparchive.almanac.third_parties in combination with httparchive.almanac.requests do provide us with information on third-party scripts. However it doesn't record the render-blocking nature of the script
  • httparchive.pages.2022_06_01_* gives us the count of render-blocking scripts, but we can't tell if the third-party script is one among them.
  • The current query gives a count of third-party scripts from sites having at least one render-blocking script, but it can't say if the third-party script is one among them.

Eg.
Take a site where it has 2 third-party scripts in the footer. The theme's own JS file (call it theme.js) is render-blocking.
Then AND CAST(JSON_EXTRACT( pages.payload, '$._renderBlockingJS') AS INT64) > 0 would return true, sub query will return 2 third party, 1 first party. Whereas, the expected output here was 1 first party, 0 third party as none of the 3p were render-blocking.

We can get both the information from the lighthouse report but traversing it is a bit expensive.

I have come up with a draft query below:

CREATE TEMP FUNCTION
  getCountRenderBlockingTPJS(lighthouseItems STRING,
    fcp FLOAT64,
    pageUrl STRING)
  RETURNS INT64
  LANGUAGE js AS '''
try {
const networkItems = JSON.parse(lighthouseItems);
let countRenderBlockingTPJS = 0
for(const item of networkItems ) {
  // network-items are arranges in order of start time.
  // render-blocking resources start before fcp
  // network can be very large, faster we break faster is the query
  if(item.startTime > fcp) break;
  if( item.resourceType == "Script" &&
      !item.url.startsWith(pageUrl) && // to determine third-party
      (item.priority == "VeryHigh" || item.priority == "High") //render-blocking
    )
  {
    countRenderBlockingTPJS++;
  }
}
return countRenderBlockingTPJS
} catch (e) {
  return -1;
}
''';
SELECT
  client,
  total_wp_sites,
  with_3pJS_render_blocking,
  (total_wp_sites-with_3pJS_render_blocking) AS without_3pJS_render_blocking,
  CONCAT(ROUND(with_3pJS_render_blocking*100/total_wp_sites, 3),' %') AS opportunity
FROM (
  SELECT
    lh._TABLE_SUFFIX AS client,
    COUNT(lh.url) AS total_wp_sites,
    COUNTIF( getCountRenderBlockingTPJS( JSON_EXTRACT(lh.report, '$.audits.network-requests.details.items'),
        CAST( JSON_EXTRACT_SCALAR(lh.report, '$.audits.first-contentful-paint.numericValue') AS FLOAT64 ),
        lh.url) > 0 ) AS with_3pJS_render_blocking
  FROM
    `httparchive.technologies.2022_12_01_*` AS tech
  JOIN
    `httparchive.lighthouse.2022_12_01_*` AS lh
  ON
    tech.url = lh.url
  WHERE
    app = 'WordPress'
    AND category = 'CMS'
    AND lh._TABLE_SUFFIX = tech._TABLE_SUFFIX
  GROUP BY
    lh._TABLE_SUFFIX )

Result obtained (it took ~20 mins for the result):

Row client total_wp_sites with_3pJS_render_blocking without_3pJS_render_blocking opportunity
1 desktop 4340020 1188953 3151067 27.395 %
2 mobile 5662739 1563492 4099247 27.61 %

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants