-
Notifications
You must be signed in to change notification settings - Fork 20
Using to buckets to suppress queue length increase action #465
Conversation
2ad2a9b
to
37ead39
Compare
Codecov Report
@@ Coverage Diff @@
## master #465 +/- ##
============================================
- Coverage 72.22% 72.09% -0.13%
- Complexity 2638 2709 +71
============================================
Files 328 337 +9
Lines 14865 15215 +350
Branches 1233 1269 +36
============================================
+ Hits 10736 10970 +234
- Misses 3740 3838 +98
- Partials 389 407 +18
Continue to review full report at Codecov.
|
|
if (canUseMoreHeap(esNode)) { | ||
action = getAction(ModifyCacheMaxSizeAction.NAME, esNode, cacheType, true); | ||
} else { | ||
PerformanceAnalyzerApp.RCA_RUNTIME_METRICS_AGGREGATOR.updateStat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we should also add a log statement here in addition to the metric. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to emit the log continuously and in a cluster with jvm pressure, we will emit it quite a few times.
protected boolean canUseMoreHeap(NodeKey esNode) { | ||
// we add action only if heap is under-utilized or healthy and yet more can be consumed. | ||
for (ResourceFlowUnit<HotClusterSummary> clusterSummary : highHeapUsageClusterRca.getFlowUnits()) { | ||
if (clusterSummary.hasResourceSummary()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the check instead be clusterSummary.getSummary()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasSummary
is that check. It checks if the summary is null
.
protected boolean canUseMoreHeap(NodeKey esNode) { | ||
// we add action only if heap is under-utilized or healthy and yet more can be consumed. | ||
for (ResourceFlowUnit<HotClusterSummary> clusterSummary : highHeapUsageClusterRca.getFlowUnits()) { | ||
if (clusterSummary.hasResourceSummary()) { | ||
for (HotNodeSummary nodeSummary : clusterSummary.getSummary().getHotNodeSummaryList()) { | ||
NodeKey thisNode = new NodeKey(nodeSummary.getNodeID(), nodeSummary.getHostAddress()); | ||
if (thisNode.equals(esNode)) { | ||
for (HotResourceSummary hotResourceSummary : nodeSummary.getHotResourceSummaryList()) { | ||
Resource resource = hotResourceSummary.getResource(); | ||
if (resource.getResourceEnum() == DECIDING_HEAP_RESOURCE_TYPE) { | ||
double oldGenUsedRatio = hotResourceSummary.getValue(); | ||
double oldGenUsedPercent = oldGenUsedRatio * 100; | ||
BucketCalculator bucketCalculator; | ||
try { | ||
bucketCalculator = rcaConf.getBucketizationSettings(OLD_GEN_TUNABLE_KEY); | ||
} catch (Exception jsonEx) { | ||
bucketCalculator = new BasicBucketCalculator(DEFAULT_HEAP_USAGE_THRESHOLDS); | ||
LOG.debug("rca.conf does not have bucketization limits specified. Using default map."); | ||
} | ||
UsageBucket bucket = bucketCalculator.compute(oldGenUsedPercent); | ||
LOG.debug("Value ({}) bucketized to {}, using {}", oldGenUsedPercent, bucket.toString(), bucketCalculator.toString()); | ||
if (bucket == UsageBucket.UNDER_UTILIZED || bucket == UsageBucket.HEALTHY_WITH_BUFFER) { | ||
return true; | ||
} else { | ||
return false; | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple level of nesting here with for-if-for-if
, can we break down this function into two or more or use lambda ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have broken the function into two. I will keep thinking about making the loop nicer to look at if I find one. We may not block on that though.
getAction(actionName, esNode, threadPool, true); | ||
if (action != null) { | ||
break; | ||
if (canUseMoreHeap(esNode)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation looks wrong here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private Action computeBestAction(NodeKey esNode, ResourceEnum threadPool) {
Action action = null;
if (canUseMoreHeap(esNode)) {
for (String actionName : actionsByUserPriority) {
action = getAction(actionName, esNode, threadPool, true);
if (action != null) {
break;
}
}
} else {
PerformanceAnalyzerApp.RCA_RUNTIME_METRICS_AGGREGATOR.updateStat(
RcaRuntimeMetrics.NO_INCREASE_ACTION_SUGGESTED, NAME + ":" + esNode.getHostAddress(), 1);
}
return action;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the deletion and addition makes it look like it in the diff. This is code as it copied here. Can you check if the indentation looks right to you ?
.../opendistro/elasticsearch/performanceanalyzer/decisionmaker/deciders/QueueHealthDecider.java
Show resolved
Hide resolved
...on/opendistro/elasticsearch/performanceanalyzer/decisionmaker/deciders/HeapBasedDecider.java
Outdated
Show resolved
Hide resolved
* Using to buckets to suppress queue length increase action * adding bucketization for cache deciders * Adding a default bucketization values if not in rca.conf * checkstyle fix * Adding the ignore for RCAIt log errors * Added a unit test * rename a variable * Log error message ignore * Extracted some part into another method for readability
Fixes #: #467
Description of changes: The Queue And Cache Deciders generates the actions to increase the Queue or Cache size if there are no errors. But this can have consequences if the Jvm Heap is already contended. This patch adds a check to make sure that the old gen heap occupancy after GC is underutilized or healthy and more of it can be consumed. The buckets to determine the usage limits are obtained from the
rca.conf
. If rca.conf does not contain them, then the defaults in theHeapBasedDecider::DEFAULT_HEAP_USAGE_THRESHOLDS
.Tests:
canUseMoreHeap()
works as expected.If new tests are added, how long do the new ones take to complete
Test testNoCapacityIncreaseOnUnHealthy PASSED (1m 59s)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.