-
Notifications
You must be signed in to change notification settings - Fork 475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-10862. Reduce time spent on initializing metrics during OM start #6682
HDDS-10862. Reduce time spent on initializing metrics during OM start #6682
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mango-li for the patch.
if (getMetricsStorageFile().exists()) { | ||
OmMetricsInfo metricsInfo = READER.readValue(getMetricsStorageFile()); | ||
metrics.setNumKeys(metricsInfo.getNumKeys()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This sets
metrics.numKeys
from the file. ScheduleOMMetricsWriteTask
writesmetrics.numKeys
to the file in theTimer
thread started below.
Making the initial read from file async introduces a race condition: the two threads may now run in the wrong order, in which case OM may write 0 to the file before reading the previous valid value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also move the ScheduleOMMetricsWriteTask
scheduling code to this runAsync
, So the order of the metrics.numKeys
init would be the same as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ScheduleOMMetricsWriteTask
is not a one-time task, it is run periodically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This sets
metrics.numKeys
from the file.ScheduleOMMetricsWriteTask
writesmetrics.numKeys
to the file in theTimer
thread started below.Making the initial read from file async introduces a race condition: the two threads may now run in the wrong order, in which case OM may write 0 to the file before reading the previous valid value.
Thank you for the review. I have moved the metrics.numKeys
into the ScheduleOMMetricsWriteTask
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mango-li Thanks for your work on this,a few comments you can refer
if (getMetricsStorageFile().exists()) { | ||
OmMetricsInfo metricsInfo = READER.readValue(getMetricsStorageFile()); | ||
metrics.setNumKeys(metricsInfo.getNumKeys()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also move the ScheduleOMMetricsWriteTask
scheduling code to this runAsync
, So the order of the metrics.numKeys
init would be the same as before.
} catch (IOException e) { | ||
LOG.warn("Async set om metrics fail.", e); | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be necessary to continue to throw the Exception here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be necessary to continue to throw the Exception here.
Thank you for the review. I have updated the code to throw the Exception here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a construction method for ScheduleOMMetricsWriteTask
private class ScheduleOMMetricsWriteTask extends TimerTask {
public ScheduleOMMetricsWriteTask() throws IOException {
final File metricsStorageFile = getMetricsStorageFile();
if (metricsStorageFile.exists()) {
OmMetricsInfo metricsInfo = READER.readValue(metricsStorageFile);
metrics.setNumKeys(metricsInfo.getNumKeys());
}
}
@mango-li , thanks on working on this. I have one question, have we measured that how much time can be saved by this async set? Since this metric set looks like a simple operation. |
Thank you for the review. We have 6000 buckets, and they both have link in different volumes, so the async set can save about five seconds of OM startup time. |
Does it mean the time saved is actually from "metadataManager.countRowsInTable(metadataManager.getBucketTable()", instead of om metrics set? |
Yes. The time saved is mainly from |
Volume and bucket tables are fully cached, so we could get row count from cache size by calling: ozone/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/TypedTable.java Lines 453 to 456 in 6f30f2f
This could save time by avoiding iteration and key/value conversion. |
Can asynchronous metrics initialization also interfere with regular operations? E.g. if clients start creating or deleting buckets while async task is still iterating buckets, could |
The volume table and bucket table use FullTableCache, CountEstimatedRowsInTable is actually the exact number of rows in the table. Let's keep it synchronized.
|
Good idea! I have changed |
Thank you for the review. I have updated the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mango-li for updating the patch.
There is similar logic in reloadOMState
and restart
, might want to update there, too (or even extract common metrics init to avoid duplication).
@mango-li can you please let us know your ASF JIRA user name? |
What changes were proposed in this pull request?
In the previous code, the metric was updated synchronously when OM started. There should be changed to asynchronous to reduce time consumption.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10862
How was this patch tested?
exist UT.