Severe JobRunr Exception - Deleting a job that's in process #516
I have the same error. However, in our case the reason is known. Basically, if we have a running job that does not respond properly to "DELETE" requests, we start seeing this error. For example: let's say I submit a job. The job starts PROCESSING. After a while, I DELETE the job. The job thread is interrupted. But if the job does not respond properly to the
The problem that I have is that this eventually kills the background job server. And if spring-boot actuator is also enabled, it takes down the whole app server, since the liveness probes start to fail. I tried to look into the code. There is this code in
So, basically, 5 of these errors and the
What do you recommend for this? I can't guarantee that my jobs will be stopped as soon as an
Indeed, @cefalo-partha describes the exact reason. This is a duplicate of #467. Normally, your job should be interruptible. If it's part of an external library, this is of course not ideal; IO operations like those in MyBatis especially should be interruptible, in my eyes. If you're doing some CPU-intensive work yourself, you can check whether the Thread is interrupted using
This will be solved in JobRunr 6, which will be released in Q3.
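For readers landing here: the inline snippet from the reply above did not survive, so here is a minimal sketch of what "checking whether the Thread is interrupted" can look like for CPU-bound work. It uses only the standard `Thread.currentThread().isInterrupted()` call. The class `InterruptibleWork` and the method `process` are hypothetical names made up for this illustration, not part of JobRunr, and the loop body is a placeholder for real work.

```java
import java.util.concurrent.atomic.AtomicLong;

public class InterruptibleWork {

    // Hypothetical CPU-bound job body (not a JobRunr API): processes up to
    // maxItems units of work, but stops early once the executing thread has
    // been interrupted. Returns the number of units actually processed.
    public static long process(long maxItems) {
        long done = 0;
        while (done < maxItems) {
            if (Thread.currentThread().isInterrupted()) {
                // Stop cooperatively. isInterrupted() does not clear the flag,
                // so the caller can still see that an interrupt happened.
                break;
            }
            done++; // placeholder for one unit of real work
        }
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong processed = new AtomicLong(-1);
        Thread worker = new Thread(() -> processed.set(process(Long.MAX_VALUE)));
        worker.start();
        Thread.sleep(100);
        worker.interrupt(); // stands in for the interrupt a deleted job's thread receives
        worker.join(5_000);
        System.out.println("alive=" + worker.isAlive() + ", processed=" + processed.get());
    }
}
```

The check is cheap, so it can sit at the top of each loop iteration. Blocking calls inside third-party libraries are a different matter: whether they react to an interrupt depends on the library, which is the case discussed above.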
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I see a similar issue as well. We are using a cluster with 2 nodes.
Logs from node 1 where you can see how job 401cc7c0-52e3-499e-b501-07e736521540 failed to be updated.
Logs from node 2 at the time when the errors happened on node 1:
It seems that this bug is a bit more serious than it might look initially, because jobs get run multiple times, which in some cases is worse than not running them at all. It looks like JobRunr is updating the 'updatedAt' field on all running jobs, and if that coincides in time with some job transitioning to the SUCCEEDED or FAILED state, the transactions conflict. As a result of this bug, the job is not marked as finished and gets run another time, which I do see in the logs:
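To make the suspected race concrete, here is a minimal sketch of a version-checked (optimistic locking) save, under the assumption that the conflict works roughly like this. It is a simplified in-memory model, not JobRunr's actual storage code: `OptimisticLockDemo`, `JobRecord`, `Store` and `StaleVersionException` are all hypothetical names, and the version number is arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

public class OptimisticLockDemo {

    // Hypothetical job row: a state plus a version that grows on every save.
    public static final class JobRecord {
        public final String id;
        public final String state;
        public final int version;

        public JobRecord(String id, String state, int version) {
            this.id = id;
            this.state = state;
            this.version = version;
        }
    }

    // Hypothetical exception raised when the caller's copy is out of date.
    public static final class StaleVersionException extends RuntimeException {
        public StaleVersionException(String message) {
            super(message);
        }
    }

    // Hypothetical in-memory store standing in for the real database.
    public static final class Store {
        private final Map<String, JobRecord> rows = new HashMap<>();

        public synchronized void insert(JobRecord job) {
            rows.put(job.id, job);
        }

        public synchronized JobRecord load(String id) {
            return rows.get(id);
        }

        // The save succeeds only if the caller's copy still has the stored version.
        public synchronized JobRecord save(JobRecord local, String newState) {
            JobRecord stored = rows.get(local.id);
            if (stored.version != local.version) {
                throw new StaleVersionException(
                        "local version " + local.version + ", storage version " + stored.version);
            }
            JobRecord updated = new JobRecord(local.id, newState, local.version + 1);
            rows.put(updated.id, updated);
            return updated;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.insert(new JobRecord("job-1", "PROCESSING", 1));

        JobRecord workerCopy = store.load("job-1");    // held by the thread running the job
        JobRecord heartbeatCopy = store.load("job-1"); // held by the periodic 'updatedAt' update

        store.save(heartbeatCopy, "PROCESSING");       // heartbeat wins the race

        try {
            store.save(workerCopy, "SUCCEEDED");       // worker now holds a stale version
        } catch (StaleVersionException e) {
            System.out.println("conflict: " + e.getMessage());
        }
        System.out.println("stored state: " + store.load("job-1").state);
    }
}
```

In this model the final state transition is the write that loses, so the stored row stays in PROCESSING. That matches the symptom described above, where a job that already finished is picked up and run again.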
@rdehuyss How much $ would it take to fix this? Just asking.
Hi @AlexanderBartash: let's perhaps connect, as I'm not completely following. I've reached out to you via LinkedIn.
Could you also attach the automatically generated report from JobRunr? Normally, if such an error occurs, JobRunr lets you generate a GitHub issue automatically from the dashboard. This info helps in diagnosing root causes.
SevereJobRunrException occurred in BackgroundJobServer efc46e4f-e598-4d47-a966-6ade15455ed7: Could not resolve ConcurrentJobModificationException
Runtime information
Background Job Servers
(workerPoolSize: 32, pollIntervalInSeconds: 15, firstHeartbeat: 2022-07-12T12:11:08.355Z, lastHeartbeat: 2022-07-12T12:31:36.146Z)
Diagnostics from exception
Concurrent modified jobs:
Job id: 28ad9534-aec7-3c06-98a5-f3a5ec0bc81d
Local version: 71; Storage version: 72
Local state: DELETED (at 2022-07-12T12:30:08.670Z) ← PROCESSING (at 2022-07-12T12:30:08.626Z on BackgroundJobServer efc46e4f-e598-4d47-a966-6ade15455ed7) ← ENQUEUED (at 2022-07-12T12:23:04.050Z)
Storage state: DELETED (at 2022-07-12T12:29:54.260Z) ← PROCESSING (at 2022-07-12T12:29:53.540Z on BackgroundJobServer efc46e4f-e598-4d47-a966-6ade15455ed7) ← ENQUEUED (at 2022-07-12T12:23:04.050Z)
Exception