Optimizations to reduce MySQL writer DB load #18880

Merged
getvictor merged 9 commits into main from victor/18838-master-db-read-reduction
May 15, 2024

Conversation

getvictor
Member

@getvictor commented May 9, 2024

#18838 and #18986
Optimized master DB accesses during host software ingestion.

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.
  • Added/updated tests
  • Manual QA for all new/changed functionality

@getvictor force-pushed the victor/18838-master-db-read-reduction branch from 5b9d1c3 to dc3d4d8 May 14, 2024 15:15
@getvictor marked this pull request as ready for review May 14, 2024 19:30
@getvictor requested a review from a team as a code owner May 14, 2024 19:30
Contributor

@mostlikelee left a comment

looks good, a couple questions

// The calculation must match the one in softwareChecksumComputedColumn
func computeRawChecksum(sw fleet.Software) ([]byte, error) {
	h := md5.New() //nolint:gosec // This hash is used as a DB optimization for software row lookup, not security
	cols := []string{sw.Name, sw.Version, sw.Source, sw.BundleIdentifier, sw.Release, sw.Arch, sw.Vendor, sw.Browser, sw.ExtensionID}
Contributor

Is the intention here to generate the same hash as the current implementation? I ask because the columns appear to be in a different order here.

Member Author

I don't follow. They look like the same order to me:

			CONCAT_WS(CHAR(0),
				%sname,
				%[1]sversion,
				%[1]ssource,
				COALESCE(%[1]sbundle_identifier, ''),
				`+"%[1]s`release`"+`,
				%[1]sarch,
				%[1]svendor,
				%[1]sbrowser,
				%[1]sextension_id
			)
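
The ordering question matters because the Go side and the SQL side only agree if both hash the same bytes. A minimal sketch of the idea, assuming the Go function joins the columns with a NUL byte and takes an MD5 digest, which is what CONCAT_WS(CHAR(0), ...) suggests on the SQL side. The body of computeRawChecksum is not shown in this thread, so the softwareFields type and checksumHex helper below are hypothetical stand-ins, not Fleet code:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// softwareFields is a hypothetical stand-in for the columns named in the
// diff; it is not the real fleet.Software type.
type softwareFields struct {
	Name, Version, Source, BundleIdentifier, Release string
	Arch, Vendor, Browser, ExtensionID              string
}

// checksumHex joins the fields with a NUL separator, in the same order as the
// CONCAT_WS(CHAR(0), ...) expression quoted above, and returns the MD5 digest
// as a hex string. MD5 is used as a lookup key here, not for security.
func checksumHex(sw softwareFields) string {
	cols := []string{
		sw.Name, sw.Version, sw.Source, sw.BundleIdentifier, sw.Release,
		sw.Arch, sw.Vendor, sw.Browser, sw.ExtensionID,
	}
	sum := md5.Sum([]byte(strings.Join(cols, "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := softwareFields{Name: "zoom", Version: "5.0", Source: "apps"}
	b := softwareFields{Name: "5.0", Version: "zoom", Source: "apps"}

	fmt.Println(checksumHex(a))
	// Swapping two fields changes the hashed bytes, so the digests differ.
	fmt.Println(checksumHex(a) == checksumHex(b))
}
```

One difference worth keeping in mind: CONCAT_WS skips NULL arguments entirely, while strings.Join keeps an empty string and its separator. That is presumably why the SQL wraps bundle_identifier in COALESCE(..., ''), so a NULL there hashes the same way as an empty Go string.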

server/datastore/mysql/software.go
level.Debug(ds.logger).Log(
"msg", "software item not found or created", "name", software.Name, "version", software.Version, "source", software.Source,
"bundle_identifier", software.BundleIdentifier, "checksum", uuidString,
)
Contributor

Might also want to consider batching the inserts below, to account for a large number of new rows during host enrollment.

Member Author

Yes, let's leave this optimization until after the release. We should also batch the UPDATE statements. I added a TODO to issue #18838.
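
For reference, a rough sketch of what batching the inserts could look like: build one multi-row INSERT per chunk instead of one statement per row. This is not the code from this PR. The batchInsertSQL and batchSizes helpers, the table and column names, and the batch size are all illustrative assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// batchInsertSQL builds a single multi-row INSERT with n rows of "?"
// placeholders. Table and column names come from the caller; nothing here is
// taken from Fleet's schema. n must be at least 1 for the SQL to be valid.
func batchInsertSQL(table string, cols []string, n int) string {
	row := "(" + strings.TrimSuffix(strings.Repeat("?,", len(cols)), ",") + ")"
	rows := strings.TrimSuffix(strings.Repeat(row+",", n), ",")
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES %s", table, strings.Join(cols, ","), rows)
}

// batchSizes splits total rows into chunks of at most size rows each.
// It returns nil when there is nothing to insert or size is not positive.
func batchSizes(total, size int) []int {
	if size <= 0 {
		return nil
	}
	var out []int
	for total > 0 {
		n := size
		if total < size {
			n = total
		}
		out = append(out, n)
		total -= n
	}
	return out
}

func main() {
	// Hypothetical example: 5 new rows written in batches of at most 2.
	cols := []string{"name", "version"}
	for _, n := range batchSizes(5, 2) {
		fmt.Println(batchInsertSQL("software", cols, n))
	}
}
```

Each generated statement would be executed with a flattened argument slice of n × len(cols) values. The same chunking could apply to the UPDATE statements, though those usually need a different statement shape.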

@lucasmrod self-assigned this May 15, 2024

codecov bot commented May 15, 2024

Codecov Report

Attention: Patch coverage is 60.30534%, with 52 lines in your changes missing coverage. Please review.

Project coverage is 66.71%. Comparing base (15ba5f3) to head (a594c42).
Report is 19 commits behind head on main.

❗ Current head a594c42 differs from pull request most recent head b44631b. Consider uploading reports for the commit b44631b to get more accurate results

Files                                Patch %   Lines
server/datastore/mysql/software.go   60.30%    39 Missing and 13 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18880      +/-   ##
==========================================
- Coverage   66.75%   66.71%   -0.04%     
==========================================
  Files         887      887              
  Lines      108471   108523      +52     
==========================================
- Hits        72409    72406       -3     
- Misses      30181    30219      +38     
- Partials     5881     5898      +17     
Flag      Coverage Δ
backend   66.71% <60.30%> (-0.04%) ⬇️

)
for checksum, software := range softwareChecksums {
uuidString := ""
checksumAsUUID, err := uuid.FromBytes([]byte(checksum))
Member

Curious about why we need to convert to UUID (non-printable chars?)

Member Author

This is just for debugging. In MySQL, we can use BIN_TO_UUID to compare to the printed value.

Member Author

I don't expect these print statements to be hit.
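
To illustrate the debugging trick: a 16-byte MD5 digest has the same length as a UUID, so it can be printed in the familiar 8-4-4-4-12 hex layout. The diff uses uuid.FromBytes for this; the sketch below formats by hand with the standard library only, so it stays dependency-free. The checksumToUUIDString helper is a hypothetical name, not something from the PR:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// checksumToUUIDString renders a 16-byte checksum as lowercase hex in the
// 8-4-4-4-12 UUID layout. It only reformats the bytes; the result is not a
// valid versioned UUID, just a printable form of the digest.
func checksumToUUIDString(sum []byte) (string, error) {
	if len(sum) != 16 {
		return "", fmt.Errorf("expected 16 bytes, got %d", len(sum))
	}
	h := hex.EncodeToString(sum)
	return h[0:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:32], nil
}

func main() {
	// MD5 of empty input, used only as a fixed demo value.
	sum := md5.Sum(nil)
	s, err := checksumToUUIDString(sum[:])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // d41d8cd9-8f00-b204-e980-0998ecf8427e
}
```

On the MySQL side, something like SELECT BIN_TO_UUID(checksum) on the binary column should produce a string in the same layout, which makes the logged value easy to compare by eye. The column name here is an assumption.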

Member

@lucasmrod left a comment

Looks good! We should deploy to dogfood to smoke test the changes.

@getvictor merged commit 825e785 into main May 15, 2024
16 checks passed
@getvictor deleted the victor/18838-master-db-read-reduction branch May 15, 2024 15:34
4 participants