Update b1.0 #3607
Merged
This combines a few issues: first, I've wanted to filter based on the unpacked tarball size, but some tarballs are beyond the range of the SQL `INTEGER` type and cause SQL cast errors -- change the interpretation of the `int` filter and sort type to `BigInteger`. Also cleans up the logging around retried Sync transaction errors, only logging warnings when it can't determine that the error is a PostgreSQL serialization error. (I hope: this is hard to provoke in casual testing.) Finally, clean up the logging of cached unpacked size by avoiding two separate logs (without dataset name) on unpack, and adding a log of the final unpacked size when we compute it.
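To make the overflow concrete, here is a minimal sketch (not the server's code) of why a large unpacked size fails a cast to a 4-byte SQL `INTEGER` but fits an 8-byte `BIGINT`. The helper names and the sample size are hypothetical; only the type ranges are taken from PostgreSQL's documented limits.

```python
# Illustrative only: signed ranges of a 4-byte INTEGER and an 8-byte BIGINT.
INTEGER_MIN, INTEGER_MAX = -(2**31), 2**31 - 1
BIGINT_MIN, BIGINT_MAX = -(2**63), 2**63 - 1


def fits_integer(value: int) -> bool:
    """Return True if value can be stored in a 4-byte SQL INTEGER."""
    return INTEGER_MIN <= value <= INTEGER_MAX


def fits_bigint(value: int) -> bool:
    """Return True if value can be stored in an 8-byte SQL BIGINT."""
    return BIGINT_MIN <= value <= BIGINT_MAX


# Hypothetical unpacked size of roughly 110 GB, expressed in bytes.
unpacked_size = 110_500_000_000
print(fits_integer(unpacked_size), fits_bigint(unpacked_size))
```

Anything above 2,147,483,647 bytes (about 2.1 GB) is out of range for `INTEGER`, so filtering or sorting on unpacked size needs the wider type.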
Sort datasets by uploaded time
* PBENCH-1307 End time column update
Unable to update date
* PBENCH-1300 Visualization Page Pagination
Display metadata modal is empty
Overview page displays Public datasets
…s#3595) * Another tweak to intake metadata problems Make sure we can't end up with undefined `metadata`. Record details of `metadata.log` access to `run.controller` without adding a ton of separate messages.
* PBENCH-1216 TOC page update
* Minor logging cleanup Minimize cache logging: details were useful when cache management first went in, but are now disruptive during ops review.
Visualization page not loading
* Move nginx cache into /srv/pbench PBENCH-1316 Our deployed containerized server maps `/var/lib` (the default NGINX cache location) to `/home`, which has only 26Gb free. Instead, point NGINX cache to our large Pbench volume at `/srv/pbench/nginx` in order to be able to transfer larger datasets.
PBENCH-1318 The reclaimer defaulted to 20%, which is inappropriate for an unpack reclaim where we want to free just enough for the unpacked dataset size. Also, to help diagnose, add the last referenced cache date to the reclaim log message.
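As a rough illustration of the difference, the sketch below contrasts a percentage-based reclaim goal with a "just enough" goal. Both functions and their parameters are hypothetical, and the way the 20% default is interpreted here (as a target fraction of total space to keep free) is an assumption, not the reclaimer's actual logic.

```python
def percent_goal(total_bytes: int, free_bytes: int, percent: float = 20.0) -> int:
    """Bytes to reclaim so that `percent` of the volume is free (assumed meaning)."""
    target_free = int(total_bytes * percent / 100)
    return max(0, target_free - free_bytes)


def unpack_goal(needed_bytes: int, free_bytes: int) -> int:
    """Bytes to reclaim so that a dataset of `needed_bytes` can be unpacked."""
    return max(0, needed_bytes - free_bytes)


# Hypothetical volume: 1000 units total, 150 free, and a dataset needing 50.
print(percent_goal(1000, 150))  # reclaims even though the dataset already fits
print(unpack_goal(50, 150))     # nothing to reclaim
```

With the percentage goal, an unpack can evict cached datasets that did not need to go; sizing the goal to the dataset avoids that.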
* Protect the cache lock better PBENCH-1317 We found a case where a cache lock could "leak" when an error occurs reading a file in the visualize and compare APIs. The file read has now been repackaged with a `finally` to be sure the stream is closed and unlocked on error.
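The general pattern is sketched below, assuming a stream object that holds a lock until it is closed. `LockedStream` and `read_locked` are hypothetical stand-ins for illustration, not the server's classes.

```python
import io
import threading


class LockedStream:
    """Hypothetical stand-in: a byte stream that holds a lock until closed."""

    def __init__(self, data: bytes, lock: threading.Lock):
        self.lock = lock
        self.lock.acquire()
        self.stream = io.BytesIO(data)
        self.closed = False

    def read(self) -> bytes:
        return self.stream.read()

    def close(self) -> None:
        # Safe to call more than once; the lock is released exactly once.
        if not self.closed:
            self.closed = True
            self.stream.close()
            self.lock.release()


def read_locked(data: bytes, lock: threading.Lock, fail: bool = False) -> bytes:
    """Read the stream, guaranteeing close and unlock even if the read fails."""
    stream = LockedStream(data, lock)
    try:
        if fail:
            raise OSError("simulated read error")
        return stream.read()
    finally:
        stream.close()
```

Without the `finally`, the simulated `OSError` would leave the lock held, which is the kind of leak described above.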
* Add simple report generator This will report on the state of the ARCHIVE, BACKUP, and CACHE on-disk trees in addition to the state of the SQL database. (I'm going to leave analyzing and reporting on the Opensearch database for another time, since this is "off books" weekend upstream work!) This packages the ad hoc SQL queries I've been doing to monitor the server as a CLI utility, plus some more. Here's the output of `pbench-report-generator --all` on the production server:
```
Archive report:
  117,446 tarballs consuming 21.7 TB
  The smallest tarball is 1.0 kB, pbench-user-benchmark__2020.04.03T11.05.44
  The biggest tarball is 41.1 GB, uperf_Azure_RHEL-8.10.0-20240116.45_x86_64_gen2_pci_netvsc_quick_D240125T014727_2024.01.25T01.47.28
Backup report:
  117,447 tarballs consuming 21.7 TB
Cache report:
  97,464 datasets consuming 45.6 TB
  4 datasets have never been unpacked, 0 are missing reference timestamps, 0 have bad size metadata
  The smallest cache is 24.6 kB, pbench-user-benchmark__2020.04.03T11.05.44
  The biggest cache is 110.5 GB, trafficgen_RHOSP16.2-RHEL8.3-nrt-OVS-OFFLOAD-PVP-LossTests_tg:trex_r:none_fs:64,128,256,512,1024,1500_nf:1024_fm:si_td:bi_ml:0.002,0.0005,0.0001_tt:bs__2020-12-26T03:16:38
  The least recently used cache was referenced Dec 11, specjbb2005__2023.09.22T00.22.28
  The most recently used cache was referenced today, uperf_rhel84_4.18.0.277_kernel_10gb_jumbo_2021.01.26T09.51.18
SQL storage report:
  Table                Rows       Storage
  -------------------- ---------- ----------
  alembic_version      1          57.3 kB
  audit                683,922    224.7 MB
  datasets             117,449    34.3 MB
  templates            12         221.2 kB
  server_settings      0          24.6 kB
  users                11         81.9 kB
  dataset_metadata     352,344    217.9 MB
  dataset_operations   340,986    29.1 MB
  api_keys             5          81.9 kB
  indexmaps            291,510    79.7 GB
Operational states:
  UPLOAD states:
    OK 117,449
  TOOLINDEX states:
    READY 106,112
  INDEX states:
    OK 106,112
    FAILED 494
      CODE 7: 365 Bad metadata.log file encountered
      CODE 1: 128 Operational error while indexing
      CODE 12: 1 Unexpected error encountered
    READY 10,819
```
dbutenhof
previously approved these changes
Feb 1, 2024
* Remove IndexMap document list PBENCH-1315 The production server, with "only" 108,728 indexed datasets (many more still haven't been migrated from the passthrough server), currently claims 84.1Gb of PostgreSQL storage just for the `IndexMap` table. Most of this consists of a list of each Opensearch document ID in order to allow using bulk update and delete operations to manage the index. This is straining the capacity of our RDU2 PostgreSQL server. As an alternative, this PR removes the document list and instead of the bulk update and delete operations uses `_delete_by_query` and `_update_by_query` searching for documents in the appropriate indices (which we still store in the `IndexMap`) by parent dataset resource ID. Along the way, I noticed that (oops) we were missing the `"authorization"` subdocument in some of our Elasticsearch documents, which would impact the authenticated search API behaviors. And I acted on a deprecation warning for a camelCase template keyword by replacing it with a snake_case alternative.
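A minimal sketch of what such by-query requests could look like, assuming the documents carry the parent dataset resource ID in a field named `run.id` and that the update sets `authorization.access`. Both field names, the function names, and the index names are hypothetical; only the `_delete_by_query` and `_update_by_query` endpoints and the `query`/`script` body layout come from the Opensearch/Elasticsearch APIs.

```python
import json
from typing import Dict, List, Tuple


def by_parent_query(resource_id: str, field: str = "run.id") -> Dict:
    """Match every document belonging to one dataset (field name is assumed)."""
    return {"query": {"term": {field: resource_id}}}


def delete_request(indices: List[str], resource_id: str) -> Tuple[str, Dict]:
    """Build the path and body for a _delete_by_query over the given indices."""
    path = f"/{','.join(indices)}/_delete_by_query"
    return path, by_parent_query(resource_id)


def update_request(indices: List[str], resource_id: str, access: str) -> Tuple[str, Dict]:
    """Build the path and body for an _update_by_query that rewrites one field."""
    path = f"/{','.join(indices)}/_update_by_query"
    body = by_parent_query(resource_id)
    body["script"] = {
        "lang": "painless",
        # Hypothetical target field; adjust to the real document layout.
        "source": "ctx._source.authorization.access = params.access",
        "params": {"access": access},
    }
    return path, body


path, body = delete_request(["idx-a", "idx-b"], "abc123")
print(path)
print(json.dumps(body))
```

The trade-off is that the server no longer needs a per-document ID list in SQL, at the cost of a search inside each affected index for every update or delete.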
# Conflicts: # dashboard/src/modules/components/ComparisonComponent/PanelContent.jsx
dbutenhof
approved these changes
Feb 5, 2024
This is the next update for the Pbench Server. Despite how GitHub displays it, this picks up changes to `main` only since 22 January ("PBENCH-1309"). (The others are already in `b1.0`.)