QA Runs Initial Backend Implementation (#1586)
Supports running QA Runs via the QA API!

Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes
#1498

Also requires the latest Browsertrix Crawler 1.1.0+ (from the
webrecorder/browsertrix-crawler#469 branch)

Notable changes:
- `QARun` objects contain info about QA runs, which are crawls
performed on data loaded from existing crawls (see the model sketch after this list).

- Various crawl DB operations can be performed on either the crawl or the
`qa` object, and core crawl fields have been moved to `CoreCrawlable`.

- While running, `QARun` data is stored in a single `qa` object, while
finished QA runs are added to the `qaFinished` dictionary on the Crawl. The
QA list API returns data from the finished list, sorted by most recent
first.

- Includes additional type fixes and type safety, especially around
`BaseCrawl` / `Crawl` / `UploadedCrawl` functionality, also creating specific
`get_upload()`, `get_basecrawl()`, `get_crawl()` getters for internal use and
`get_crawl_out()` for the API.

- Supports filtering and sorting pages via `qaFilterBy` (`screenshotMatch`, `textMatch`)
along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results (example request below).
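
The pieces above fit together roughly as follows. This is a minimal Pydantic sketch, not the actual models from this commit: only the names `CoreCrawlable`, `QARun`, `qa`, and `qaFinished` come from this change; the remaining fields and the dictionary key are assumptions for illustration.

```python
# Minimal sketch, assuming field names beyond CoreCrawlable/QARun/qa/qaFinished.
from datetime import datetime
from typing import Dict, Optional

from pydantic import BaseModel, Field


class CoreCrawlable(BaseModel):
    """Core crawl fields shared by regular crawls and QA runs (contents assumed)."""

    id: str
    state: str
    started: Optional[datetime] = None
    finished: Optional[datetime] = None


class QARun(CoreCrawlable):
    """A QA run: a crawl performed on data loaded from an existing crawl."""


class Crawl(CoreCrawlable):
    """A crawl plus its QA state."""

    # The currently running QA run, if any, lives in the single `qa` object...
    qa: Optional[QARun] = None
    # ...while finished QA runs are collected in `qaFinished`
    # (assumed here to be keyed by QA run id).
    qaFinished: Dict[str, QARun] = Field(default_factory=dict)
```

Because `Crawl` and `QARun` share `CoreCrawlable`, the same crawl DB operations can target either the crawl itself or its `qa` object.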
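
For the page filtering in the last item, a request could look roughly like this. The host, endpoint path, and token are placeholders; only the `qaFilterBy` and `gte`/`lte`/`gt`/`lt` query parameters reflect the description above.

```python
# Hypothetical request: URL path and auth header are illustrative placeholders;
# only the query parameters come from the filtering described above.
import requests

resp = requests.get(
    "https://example.com/api/orgs/ORG_ID/crawls/CRAWL_ID/qa/QA_RUN_ID/pages",
    headers={"Authorization": "Bearer ACCESS_TOKEN"},
    params={
        "qaFilterBy": "screenshotMatch",  # or "textMatch"
        "gte": 0.9,  # only pages with a screenshot match score >= 0.9
    },
)
pages = resp.json()
```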

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
ikreymer committed Mar 21, 2024
1 parent 05e03e0 commit 4f676e4
Showing 31 changed files with 1,396 additions and 484 deletions.
10 changes: 5 additions & 5 deletions backend/btrixcloud/background_jobs.py
@@ -403,11 +403,11 @@ async def get_replica_job_file(
                 profile = await self.profile_ops.get_profile(UUID(job.object_id), org)
                 return BaseFile(**profile.resource.dict())
 
-            item_res = await self.base_crawl_ops.get_crawl_raw(job.object_id, org)
-            matching_file = [
-                f for f in item_res.get("files", []) if f["filename"] == job.file_path
-            ][0]
-            return BaseFile(**matching_file)
+            item_res = await self.base_crawl_ops.get_base_crawl(job.object_id, org)
+            matching_file = [f for f in item_res.files if f.filename == job.file_path][
+                0
+            ]
+            return matching_file
         # pylint: disable=broad-exception-caught, raise-missing-from
         except Exception:
             raise HTTPException(status_code=404, detail="file_not_found")
