Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix capacity issue for category aware #16391

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

AlbericByte
Copy link
Contributor

@AlbericByte AlbericByte commented May 5, 2024

Fixes #15847.

Description

Fix the bug of capacity api which is not category aware
The proposal solution is following:

  1. If current WorkerSelectStrategy is category based strategy(FillCapacityWithCategorySpecWorkerSelectStrategy and EqualDistributionWithCategorySpecWorkerSelectStrategy), we will calculate the capacity group by category in getWorkerCategoryCapacity method.
  2. if current WorkerSelectStrategy is not category based strategy, the getWorkerCategoryCapacity will return null.
  3. In OverlordResource#getTotalWorkerCapacity method, get the capacity group by category from WorkerSelectStrategy#getWorkerCategoryCapacity
  4. In CoordinatorDutyUtils#getTotalWorkerCapacity if overlordClient.getTotalWorkerCapacity() return category based capacity, return Math.min(totalCapacity, (int) (totalWorkerCapacity * taskSlotRatio)); here totalCapacity is category capacity.
    Summary: according to the WorkerSelectStrategy(category based or not), set the category capacity in TotalWorkerCapacityResponse, in client side, compare the category capacity and totalWorkerCapacity * taskSlotRatio, take the smaller one

Fixed the bug 15847

Release note

Fix the bug of getCompactionTaskCapacity which is not category aware
Update the getTotalWorkerCapacity of OverlordResource

if WorkerSelectStrategy is category based strategy(FillCapacityWithCategorySpecWorkerSelectStrategy and EqualDistributionWithCategorySpecWorkerSelectStrategy), TotalWorkerCapacityResponse will contains the capacity group by category.


Key changed/added classes in this PR
  • CategoryCapacityInfo
  • OverlordResource
  • TotalWorkerCapacityResponse
  • EqualDistributionWithCategorySpecWorkerSelectStrategy
  • FillCapacityWithCategorySpecWorkerSelectStrategy
  • WorkerSelectStrategy
  • WorkerSelectUtils
  • OverlordResourceTest
  • EqualDistributionWithCategorySpecWorkerSelectStrategyTest
  • FillCapacityWithCategorySpecWorkerSelectStrategyTest
  • IndexingCategoryCapacityInfo
  • IndexingTotalWorkerCapacityInfo
  • IndexingWorker
  • CompactSegments
  • CoordinatorDutyUtils
  • KillUnusedSegments
  • OverlordClientImplTest
  • CompactSegmentsTest
  • KillUnusedSegmentsTest

This PR has:

  • been self-reviewed.
  • using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz
Copy link
Contributor

kfaraz commented May 6, 2024

@AlbericByte , thanks for the changes. Please add an explanation of the approach you have taken in the PR description and also include a release note for the new API changes.

@AlbericByte
Copy link
Contributor Author

AlbericByte commented May 6, 2024

@AlbericByte , thanks for the changes. Please add an explanation of the approach you have taken in the PR description and also include a release note for the new API changes.

@kfaraz i added the explanation and updated the release note section. for release note section, it is my first time, i am not sure it is correct format, if not, could you help to share the right format, i will update asap.

And I will fix the test

@kfaraz kfaraz self-requested a review May 9, 2024 09:40
@AlbericByte
Copy link
Contributor Author

@kfaraz @abhishekagarwal87 and @gianm
for the failing test case, i have try sever times in my local environment using the latest code, it can successfully. do i need to increase the java memory for test, welcome for any suggestion?

org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTaskResourceTest.testAPIs
Error:    Run 1: ParallelIndexSupervisorTaskResourceTest.testAPIs:214 » TestTimedOut test timed out after 20000 milliseconds
Error:    Run 2: ParallelIndexSupervisorTaskResourceTest.testAPIs:214 » TestTimedOut test timed out after 20000 milliseconds

@AlbericByte
Copy link
Contributor Author

hi @kfaraz @abhishekagarwal87 and @gianm
Could you help to review this pr when you are not busy, i hope i can contribute more with your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

getCompactionTaskCapacity is not worker category aware.
2 participants