Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-10862. Reduce time spent on initializing metrics during OM start #6682

Merged

Conversation

mango-li
Copy link
Contributor

What changes were proposed in this pull request?

In the previous code, the metric was updated synchronously when OM started. There should be changed to asynchronous to reduce time consumption.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10862

How was this patch tested?

exist UT.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mango-li for the patch.

Comment on lines 1680 to 1683
if (getMetricsStorageFile().exists()) {
OmMetricsInfo metricsInfo = READER.readValue(getMetricsStorageFile());
metrics.setNumKeys(metricsInfo.getNumKeys());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This sets metrics.numKeys from the file.
  • ScheduleOMMetricsWriteTask writes metrics.numKeys to the file in the Timer thread started below.

Making the initial read from file async introduces a race condition: the two threads may now run in the wrong order, in which case OM may write 0 to the file before reading the previous valid value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also move the ScheduleOMMetricsWriteTask scheduling code to this runAsync , So the order of the metrics.numKeys init would be the same as before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScheduleOMMetricsWriteTask is not a one-time task, it is run periodically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This sets metrics.numKeys from the file.
  • ScheduleOMMetricsWriteTask writes metrics.numKeys to the file in the Timer thread started below.

Making the initial read from file async introduces a race condition: the two threads may now run in the wrong order, in which case OM may write 0 to the file before reading the previous valid value.

Thank you for the review. I have moved the metrics.numKeys into the ScheduleOMMetricsWriteTask.

Copy link
Contributor

@xichen01 xichen01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mango-li Thanks for your work on this,a few comments you can refer

Comment on lines 1680 to 1683
if (getMetricsStorageFile().exists()) {
OmMetricsInfo metricsInfo = READER.readValue(getMetricsStorageFile());
metrics.setNumKeys(metricsInfo.getNumKeys());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also move the ScheduleOMMetricsWriteTask scheduling code to this runAsync , So the order of the metrics.numKeys init would be the same as before.

} catch (IOException e) {
LOG.warn("Async set om metrics fail.", e);
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be necessary to continue to throw the Exception here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be necessary to continue to throw the Exception here.

Thank you for the review. I have updated the code to throw the Exception here.

Copy link
Contributor

@guohao-rosicky guohao-rosicky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a construction method for ScheduleOMMetricsWriteTask

  private class ScheduleOMMetricsWriteTask extends TimerTask {
    public ScheduleOMMetricsWriteTask() throws IOException {
      final File metricsStorageFile = getMetricsStorageFile();
      if (metricsStorageFile.exists()) {
        OmMetricsInfo metricsInfo = READER.readValue(metricsStorageFile);
        metrics.setNumKeys(metricsInfo.getNumKeys());
      }
    }

@ChenSammi
Copy link
Contributor

ChenSammi commented May 17, 2024

@mango-li , thanks on working on this. I have one question, have we measured that how much time can be saved by this async set? Since this metric set looks like a simple operation.

@mango-li
Copy link
Contributor Author

@mango-li , thanks on working on this. I have one question, have we measured that how much time can be saved by this async set? Since this metric set looks like a simple operation.

Thank you for the review. We have 6000 buckets, and they both have link in different volumes, so the async set can save about five seconds of OM startup time.

@ChenSammi
Copy link
Contributor

@mango-li , thanks on working on this. I have one question, have we measured that how much time can be saved by this async set? Since this metric set looks like a simple operation.

Thank you for the review. We have 6000 buckets, and they both have link in different volumes, so the async set can save about five seconds of OM startup time.

Does it mean the time saved is actually from "metadataManager.countRowsInTable(metadataManager.getBucketTable()", instead of om metrics set?

@mango-li
Copy link
Contributor Author

@mango-li , thanks on working on this. I have one question, have we measured that how much time can be saved by this async set? Since this metric set looks like a simple operation.

Thank you for the review. We have 6000 buckets, and they both have link in different volumes, so the async set can save about five seconds of OM startup time.

Does it mean the time saved is actually from "metadataManager.countRowsInTable(metadataManager.getBucketTable()", instead of om metrics set?

Yes. The time saved is mainly from metadataManager.countRowsInTable(metadataManager.getVolumeTable()) and metadataManager.countRowsInTable(metadataManager.getBucketTable()), so I changed om metrics set to async, including metrics.numVolumes and metrics.numBuckets.

@adoroszlai
Copy link
Contributor

The time saved is mainly from metadataManager.countRowsInTable(metadataManager.getVolumeTable()) and metadataManager.countRowsInTable(metadataManager.getBucketTable())

Volume and bucket tables are fully cached, so we could get row count from cache size by calling:

public long getEstimatedKeyCount() throws IOException {
if (cache.getCacheType() == CacheType.FULL_CACHE) {
return cache.size();
}

This could save time by avoiding iteration and key/value conversion.

@adoroszlai adoroszlai changed the title HDDS-10862. Async set om metirc when om start HDDS-10862. Reduce time spent on initializing metrics during OM start May 23, 2024
@adoroszlai
Copy link
Contributor

Can asynchronous metrics initialization also interfere with regular operations? E.g. if clients start creating or deleting buckets while async task is still iterating buckets, could numBuckets value be wrong?

@guohao-rosicky
Copy link
Contributor

The volume table and bucket table use FullTableCache, CountEstimatedRowsInTable is actually the exact number of rows in the table. Let's keep it synchronized.

Metrics.setNumVolumes(metadataManager.countEstimatedRowsInTable(metadataManager.getvolumeTable()));
Metrics.setNumBuckets(metadataManager.countEstimatedRowsInTable(metadataManager.getBucketTable()));

@mango-li
Copy link
Contributor Author

The time saved is mainly from metadataManager.countRowsInTable(metadataManager.getVolumeTable()) and metadataManager.countRowsInTable(metadataManager.getBucketTable())

Volume and bucket tables are fully cached, so we could get row count from cache size by calling:

public long getEstimatedKeyCount() throws IOException {
if (cache.getCacheType() == CacheType.FULL_CACHE) {
return cache.size();
}

This could save time by avoiding iteration and key/value conversion.

Good idea! I have changed metrics.numVolumes and metrics.numBuckets using metadataManager.countEstimatedRowsInTable. And still keep om metrics set synchronized.

@mango-li
Copy link
Contributor Author

The volume table and bucket table use FullTableCache, CountEstimatedRowsInTable is actually the exact number of rows in the table. Let's keep it synchronized.

Metrics.setNumVolumes(metadataManager.countEstimatedRowsInTable(metadataManager.getvolumeTable()));
Metrics.setNumBuckets(metadataManager.countEstimatedRowsInTable(metadataManager.getBucketTable()));

Thank you for the review. I have updated the code.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mango-li for updating the patch.

There is similar logic in reloadOMState and restart, might want to update there, too (or even extract common metrics init to avoid duplication).

@guohao-rosicky guohao-rosicky merged commit 879c6ca into apache:master May 27, 2024
39 checks passed
@adoroszlai
Copy link
Contributor

@mango-li can you please let us know your ASF JIRA user name?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants