QA Runs Initial Backend Implementation (#1586)
Supports running QA Runs via the QA API!

Builds on top of the `issue-1498-crawl-qa-backend-support` branch, fixes
#1498

Also requires the latest Browsertrix Crawler 1.1.0+ (from the
webrecorder/browsertrix-crawler#469 branch)

Notable changes:
- `QARun` objects contain info about QA runs, which are crawls
performed on data loaded from existing crawls (see the model sketch after this list).

- Various crawl DB operations can be performed on either the crawl or the
`qa` object, and core crawl fields have been moved to `CoreCrawlable`.

- While running, `QARun` data is stored in a single `qa` object, while
finished QA runs are added to the `qaFinished` dictionary on the Crawl. The
QA list API returns data from the finished list, sorted by most recent
first.

- Includes additional type fixes and type safety, especially around
`BaseCrawl` / `Crawl` / `UploadedCrawl` functionality, also creating specific
`get_upload()`, `get_basecrawl()`, `get_crawl()` getters for internal use and
`get_crawl_out()` for the API.

- Supports filtering and sorting pages via `qaFilterBy` (`screenshotMatch`, `textMatch`)
along with `gt`, `lt`, `gte`, `lte` params to return pages based on QA results (example request below).
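
The pieces above fit together roughly as follows. This is a minimal Pydantic sketch, not the actual models from this commit: only the names `CoreCrawlable`, `QARun`, `qa`, and `qaFinished` come from this change; the remaining fields and the dictionary key are assumptions for illustration.

```python
# Minimal sketch, assuming field names beyond CoreCrawlable/QARun/qa/qaFinished.
from datetime import datetime
from typing import Dict, Optional

from pydantic import BaseModel, Field


class CoreCrawlable(BaseModel):
    """Core crawl fields shared by regular crawls and QA runs (contents assumed)."""

    id: str
    state: str
    started: Optional[datetime] = None
    finished: Optional[datetime] = None


class QARun(CoreCrawlable):
    """A QA run: a crawl performed on data loaded from an existing crawl."""


class Crawl(CoreCrawlable):
    """A crawl plus its QA state."""

    # The currently running QA run, if any, lives in the single `qa` object...
    qa: Optional[QARun] = None
    # ...while finished QA runs are collected in `qaFinished`
    # (assumed here to be keyed by QA run id).
    qaFinished: Dict[str, QARun] = Field(default_factory=dict)
```

Because `Crawl` and `QARun` share `CoreCrawlable`, the same crawl DB operations can target either the crawl itself or its `qa` object.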
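
For the page filtering in the last item, a request could look roughly like this. The host, endpoint path, and token are placeholders; only the `qaFilterBy` and `gte`/`lte`/`gt`/`lt` query parameters reflect the description above.

```python
# Hypothetical request: URL path and auth header are illustrative placeholders;
# only the query parameters come from the filtering described above.
import requests

resp = requests.get(
    "https://example.com/api/orgs/ORG_ID/crawls/CRAWL_ID/qa/QA_RUN_ID/pages",
    headers={"Authorization": "Bearer ACCESS_TOKEN"},
    params={
        "qaFilterBy": "screenshotMatch",  # or "textMatch"
        "gte": 0.9,  # only pages with a screenshot match score >= 0.9
    },
)
pages = resp.json()
```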

---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
ikreymer committed Mar 21, 2024
1 parent 05e03e0 commit 4f676e4
Showing 31 changed files with 1,396 additions and 484 deletions.
10 changes: 5 additions & 5 deletions backend/btrixcloud/background_jobs.py
@@ -403,11 +403,11 @@ async def get_replica_job_file(
                 profile = await self.profile_ops.get_profile(UUID(job.object_id), org)
                 return BaseFile(**profile.resource.dict())
 
-            item_res = await self.base_crawl_ops.get_crawl_raw(job.object_id, org)
-            matching_file = [
-                f for f in item_res.get("files", []) if f["filename"] == job.file_path
-            ][0]
-            return BaseFile(**matching_file)
+            item_res = await self.base_crawl_ops.get_base_crawl(job.object_id, org)
+            matching_file = [f for f in item_res.files if f.filename == job.file_path][
+                0
+            ]
+            return matching_file
         # pylint: disable=broad-exception-caught, raise-missing-from
         except Exception:
             raise HTTPException(status_code=404, detail="file_not_found")
