Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query sampling for checking #7

Open
wants to merge 1 commit into
base: db_obs_m3coord_cache
Choose a base branch
from

Conversation

andrewma2
Copy link

What changes are proposed in this pull request?

Implementing a method to periodically check results against the expected M3DB result and record metrics (+refactor of config).

How is this tested?

Deployed to dev-aws-us-west-1, verified from emitted metrics that checks are being performed and are accurate (also verified that mismatches are registered). Other performance metrics aren't affected.

How is this feature monitored?

Code Review

For information about the code review process, e.g. how to find a reviewer or how to ping non-responsive reviewers, check the contents of go/code-review Confluence page.

Approvals

Other than the mandatory approvers enforced by the OWNER file framework (http://go/owners), this PR
requires at least one approval from another engineer.

[NEW] Shiproom

Platform & Compute Fabric:
Changes should be tracked by an approved "material change." Multiple PRs may be tracked by a single material change.

  • Change modifies code owned or released by a Platform or Compute Fabric team

See http://go/platshiproomwiki for instructions and use http://go/lightcm-template to evaluate risk. Ask questions in #platform-shiproom.

Runtime changes:

  • Change targets a runtime maintenance release (i.e., targets a maintenance dbr-branch-x.x branch or has a maintenance dbr-branch-x.x label)
    • Change is NOT a “material change”
    • Change IS a “material change” of low / medium risk
    • Change IS a “material change” of high risk needing Shiproom review

Please refer Runtime Shiproom Wiki: http://go/runtimeshiproomwiki

Security implications

This section is intended for the reviewers of this PR
Please, make sure you consider the content of "What are the responsibilities of code reviewers?" section of go/pr-security-review

@andrewma2 andrewma2 marked this pull request as ready for review August 12, 2022 17:48
// Ratio of queries we make a check for
DefaultCheckSampleRate float64 = 0.0
// Threshold in % to determine if there's difference in results (1 means 1% diff)
DefaultComparePercentThreshold float64 = 1.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1% difference cross all buckets?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it checks the final aggregated result, so it's a 1% difference in the final result

)

// Compares results a, b to the specified percent threshold
// Results should be vectors
func compareResults(a, b *promql.Result, threshold float64) bool {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What may cause the value different?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there can be some slight floating point precision errors especially with comparison, so I thought a % threshold would be best

@@ -177,6 +251,27 @@ func (h *readHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
return
}

// Rulemanager results are vector values (list of metric + value)
// Take a random number and check if under rate so we check a proportion of the queries
if rand.Float64() < float64(h.queryCheckConfig.CheckSampleRate) && res.Value.Type() == parser.ValueTypeVector {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big deal, but do we need float64, why not float32?

if result.Err != nil {
h.logger.Error("Comparison query failed to execute")
} else {
if result != nil && !compareResults(res, result, h.queryCheckConfig.ComparePercentThreshold) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know this res if queried from cache instead of m3db if cache miss the hit?

}
defer query.Close()
// Set context so we can default to M3DB later on
result := query.Exec(context.WithValue(ctx, "UseM3DB", true))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename name to m3dbQueryResult to avoid confusion between res and result

@@ -177,6 +251,27 @@ func (h *readHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
return
}

// Rulemanager results are vector values (list of metric + value)
// Take a random number and check if under rate so we check a proportion of the queries
if rand.Float64() < float64(h.queryCheckConfig.CheckSampleRate) && res.Value.Type() == parser.ValueTypeVector {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this check into redis_cache.go? Put this a this section code into root HTTP method is exposing the storage/cache knowledge to upper layer which is not good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants