Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Using to buckets to suppress queue length increase action #465

Merged
merged 9 commits into from
Oct 14, 2020
Merged

Conversation

yojs
Copy link
Contributor

@yojs yojs commented Oct 13, 2020

Fixes #: #467

Description of changes: The Queue And Cache Deciders generates the actions to increase the Queue or Cache size if there are no errors. But this can have consequences if the Jvm Heap is already contended. This patch adds a check to make sure that the old gen heap occupancy after GC is underutilized or healthy and more of it can be consumed. The buckets to determine the usage limits are obtained from the rca.conf. If rca.conf does not contain them, then the defaults in the HeapBasedDecider::DEFAULT_HEAP_USAGE_THRESHOLDS.

Tests:

  • Added unit tests to check the canUseMoreHeap() works as expected.
  • Added an integration test.

If new tests are added, how long do the new ones take to complete

Test testNoCapacityIncreaseOnUnHealthy PASSED (1m 59s)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@yojs yojs force-pushed the use-buckets branch 2 times, most recently from 2ad2a9b to 37ead39 Compare October 13, 2020 19:57
@codecov
Copy link

codecov bot commented Oct 13, 2020

Codecov Report

Merging #465 into master will decrease coverage by 0.12%.
The diff coverage is 68.09%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #465      +/-   ##
============================================
- Coverage     72.22%   72.09%   -0.13%     
- Complexity     2638     2709      +71     
============================================
  Files           328      337       +9     
  Lines         14865    15215     +350     
  Branches       1233     1269      +36     
============================================
+ Hits          10736    10970     +234     
- Misses         3740     3838      +98     
- Partials        389      407      +18     
Impacted Files Coverage Δ Complexity Δ
...ceanalyzer/collectors/MountedPartitionMetrics.java 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...r/collectors/MountedPartitionMetricsCollector.java 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...ollectors/PerformanceAnalyzerMetricsCollector.java 30.00% <0.00%> (ø) 2.00 <0.00> (ø)
...cisionmaker/actions/configs/CacheActionConfig.java 97.77% <ø> (+1.94%) 8.00 <0.00> (-2.00) ⬆️
...cisionmaker/actions/configs/QueueActionConfig.java 95.12% <ø> (+1.94%) 8.00 <0.00> (-2.00) ⬆️
...eanalyzer/decisionmaker/deciders/AlarmMonitor.java 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...h/performanceanalyzer/hwnet/MountedPartitions.java 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...ceanalyzer/metrics/PerformanceAnalyzerMetrics.java 26.66% <ø> (ø) 4.00 <0.00> (ø)
...r/linux/LinuxMountedPartitionMetricsGenerator.java 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...trics_generator/linux/LinuxOSMetricsGenerator.java 38.46% <0.00%> (-3.21%) 4.00 <0.00> (ø)
... and 41 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 036df07...15f62b5. Read the comment docs.

@yojs
Copy link
Contributor Author

yojs commented Oct 14, 2020

Test nodeRoleChange FAILED (2s)

FAILURE: Executed 455 tests in 42m 36s (1 failed)

@yojs yojs requested review from adityaj1107 and khushbr and removed request for vigyasharma and rguo-aws October 14, 2020 19:00
if (canUseMoreHeap(esNode)) {
action = getAction(ModifyCacheMaxSizeAction.NAME, esNode, cacheType, true);
} else {
PerformanceAnalyzerApp.RCA_RUNTIME_METRICS_AGGREGATOR.updateStat(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we should also add a log statement here in addition to the metric. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to emit the log continuously and in a cluster with jvm pressure, we will emit it quite a few times.

protected boolean canUseMoreHeap(NodeKey esNode) {
// we add action only if heap is under-utilized or healthy and yet more can be consumed.
for (ResourceFlowUnit<HotClusterSummary> clusterSummary : highHeapUsageClusterRca.getFlowUnits()) {
if (clusterSummary.hasResourceSummary()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the check instead be clusterSummary.getSummary() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasSummary is that check. It checks if the summary is null.

Comment on lines 43 to 75
protected boolean canUseMoreHeap(NodeKey esNode) {
// we add action only if heap is under-utilized or healthy and yet more can be consumed.
for (ResourceFlowUnit<HotClusterSummary> clusterSummary : highHeapUsageClusterRca.getFlowUnits()) {
if (clusterSummary.hasResourceSummary()) {
for (HotNodeSummary nodeSummary : clusterSummary.getSummary().getHotNodeSummaryList()) {
NodeKey thisNode = new NodeKey(nodeSummary.getNodeID(), nodeSummary.getHostAddress());
if (thisNode.equals(esNode)) {
for (HotResourceSummary hotResourceSummary : nodeSummary.getHotResourceSummaryList()) {
Resource resource = hotResourceSummary.getResource();
if (resource.getResourceEnum() == DECIDING_HEAP_RESOURCE_TYPE) {
double oldGenUsedRatio = hotResourceSummary.getValue();
double oldGenUsedPercent = oldGenUsedRatio * 100;
BucketCalculator bucketCalculator;
try {
bucketCalculator = rcaConf.getBucketizationSettings(OLD_GEN_TUNABLE_KEY);
} catch (Exception jsonEx) {
bucketCalculator = new BasicBucketCalculator(DEFAULT_HEAP_USAGE_THRESHOLDS);
LOG.debug("rca.conf does not have bucketization limits specified. Using default map.");
}
UsageBucket bucket = bucketCalculator.compute(oldGenUsedPercent);
LOG.debug("Value ({}) bucketized to {}, using {}", oldGenUsedPercent, bucket.toString(), bucketCalculator.toString());
if (bucket == UsageBucket.UNDER_UTILIZED || bucket == UsageBucket.HEALTHY_WITH_BUFFER) {
return true;
} else {
return false;
}
}
}
}
}
}
}
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple level of nesting here with for-if-for-if, can we break down this function into two or more or use lambda ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have broken the function into two. I will keep thinking about making the loop nicer to look at if I find one. We may not block on that though.

getAction(actionName, esNode, threadPool, true);
if (action != null) {
break;
if (canUseMoreHeap(esNode)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation looks wrong here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  private Action computeBestAction(NodeKey esNode, ResourceEnum threadPool) {
    Action action = null;
    if (canUseMoreHeap(esNode)) {
      for (String actionName : actionsByUserPriority) {
        action = getAction(actionName, esNode, threadPool, true);
        if (action != null) {
          break;
        }
      }
    } else {
      PerformanceAnalyzerApp.RCA_RUNTIME_METRICS_AGGREGATOR.updateStat(
          RcaRuntimeMetrics.NO_INCREASE_ACTION_SUGGESTED, NAME + ":" + esNode.getHostAddress(), 1);
    }
    return action;
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the deletion and addition makes it look like it in the diff. This is code as it copied here. Can you check if the indentation looks right to you ?

@yojs yojs merged commit 69bee54 into master Oct 14, 2020
@yojs yojs deleted the use-buckets branch October 14, 2020 23:25
adityaj1107 pushed a commit that referenced this pull request Oct 14, 2020
* Using to buckets to suppress queue length increase action

* adding bucketization for cache deciders

* Adding a default bucketization values if not in rca.conf

* checkstyle fix

* Adding the ignore for RCAIt log errors

* Added a unit test

* rename a variable

* Log error message ignore

* Extracted some part into another method for readability
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants