all.requests
Found this out while looking at the combined pipeline issues.
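The `all` pipeline has the following issues:

- The summary is `null` for 404s and other errors, even though these have summary data in the legacy `summary_requests` table.
- It does not set `firstReq` and `firstHtml` correctly (they are always set to `true`).
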
This is because we call the summary code per request here:
data-pipeline/modules/import_all.py
Lines 341 to 346 in d047906
That code was intended to be called in one go, since it does this:
data-pipeline/modules/transformation.py
Lines 406 to 425 in d047906
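
To illustrate why calling batch-oriented code once per request could produce the `firstReq` symptom, here is a minimal sketch. It is not the pipeline's code: `mark_first_request` is a hypothetical stand-in, and it assumes the flag is derived from a request's position in the list passed in. The same reasoning would apply to `firstHtml` if it is derived in a similar way.

```python
def mark_first_request(requests):
    """Hypothetical stand-in: flag the first request in a batch as firstReq."""
    return [{**req, "firstReq": i == 0} for i, req in enumerate(requests)]


page_requests = [
    {"url": "https://example.com/"},
    {"url": "https://example.com/app.js"},
    {"url": "https://example.com/style.css"},
]

# Called in one go: only the first request of the page is flagged.
batch = mark_first_request(page_requests)

# Called once per request: every one-element batch has its own "first"
# item, so every request ends up flagged.
per_request = [mark_first_request([req])[0] for req in page_requests]

print([r["firstReq"] for r in batch])
print([r["firstReq"] for r in per_request])
```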
You basically need to generate the whole page and all of its requests, and then look up the matching entry in this `summary_requests` array for each request:
```python
try:
    _, requests = HarJsonToSummary.generate_pages(file_name, har)
except Exception:
    logging.exception(
        f"Unable to unpack HAR, check previous logs for detailed errors. "
        f"{file_name=}, {har=}"
    )
    return None

summary_requests = []

for request in requests:
    try:
        wanted_summary_fields = [
            field["name"]
            for field in constants.BIGQUERY["schemas"]["summary_requests"]["fields"]
        ]
        request = utils.dict_subset(request, wanted_summary_fields)
    except Exception:
        logging.exception(
            f"Unable to unpack HAR, check previous logs for detailed errors. "
            f"{file_name=}, {har=}"
        )
        continue

    if request:
        summary_requests.append(request)
```
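
The "generate once, then look up per request" approach could be sketched as follows. This is only an illustration under stated assumptions: `index_summaries` and `attach_summaries` are hypothetical helpers, not functions from the pipeline, and `"url"` is a placeholder join key. URLs can repeat within a page, so a real implementation would need a key that is unique per request.

```python
def index_summaries(summary_requests, key="url"):
    """Build a lookup table from the page-level summary list (hypothetical helper).

    If two summaries share the same key, the later one wins.
    """
    return {s[key]: s for s in summary_requests if key in s}


def attach_summaries(raw_requests, summary_requests, key="url"):
    """Pair each raw request with its summary, or None if no summary exists."""
    index = index_summaries(summary_requests, key)
    return [
        {"request": req, "summary": index.get(req.get(key))}
        for req in raw_requests
    ]


# Summaries generated once for the whole page (values are made up).
summaries = [
    {"url": "https://example.com/", "firstReq": True},
    {"url": "https://example.com/missing.png", "firstReq": False},
]
raw = [
    {"url": "https://example.com/"},
    {"url": "https://example.com/missing.png"},
    {"url": "https://example.com/not-summarised.js"},
]

rows = attach_summaries(raw, summaries)
```

Looking up by key rather than by position matters here because the loop above skips requests whose summary cannot be built, so the two lists are not guaranteed to line up index by index.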
This should be fixed in the streaming writes from the agent for the next crawl: HTTPArchive/wptagent@53189db