New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[controller] Fix multiple AdminExecutionTasks working on the same store at the same time. #918
Conversation
ddaf559
to
04657bd
Compare
04657bd
to
a853b62
Compare
...roller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminConsumptionTask.java
Outdated
Show resolved
Hide resolved
.../java/com/linkedin/venice/controller/kafka/consumer/AdminConsumptionTaskIntegrationTest.java
Show resolved
Hide resolved
I have a few questions:
We face a similar situation when implementing async gauge execution, and we adopted the following strategy, which seems to work well: cc @xunyin8 |
Regarding your questions: Line 18 in f07d0c1
But thanks for suggested solutions, let me figure out if it works. |
41df7e8
to
3cbb787
Compare
...roller/src/main/java/com/linkedin/venice/controller/kafka/consumer/AdminConsumptionTask.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, just some comments on the integration part.
.../java/com/linkedin/venice/controller/kafka/consumer/AdminConsumptionTaskIntegrationTest.java
Outdated
Show resolved
Hide resolved
.../java/com/linkedin/venice/controller/kafka/consumer/AdminConsumptionTaskIntegrationTest.java
Show resolved
Hide resolved
@haoxu07, LGTM and thanks for the fix. |
Summary, imperative, start upper case, don't end with a period
Current state: when we create
AdminExecutionTask
, we did not check if there isAdminExecutionTask
for one store on the fly, so if oneAdminExecutionTask
is waiting for a lock (e.g. updating store operation waiting for the store level write lock and that lock is currently unavailable), it is possible thatAdminConsumptionTask
will create multipleAdminExecutionTask
s for a single store until they occupy all threads fromExecutorService
's thread pool, they are all waiting for the same store-level lock.Besides,
executorService.invokeAll(tasks, processingCycleTimeoutInMs, TimeUnit.MILLISECONDS)
will cancel every invoked tasks, even the task is emitted by not getting thread from pool to execute.
ExecutorService
will try to cancel theAdminExecutionTask
waiting for a lock after timeout, but that will not terminate that thread (as the acquiring locking operation byAutoCloseableLock
is not interruptible). Then thatAdminExecutionTask
will keep occupying one thread fromExecutorService
's thread pool until the lock is released.This PR has the following changes to address the issue:
AdminExecutionTask
is running for a store.How was this PR tested?
CI
Does this PR introduce any user-facing changes?