New jobs are not getting created when multiple jobs are running on a frequent schedule #1379

Open
YashwanthaGowd opened this issue Sep 6, 2023 · 1 comment

YashwanthaGowd commented Sep 6, 2023

Describe the bug
Only about 90 jobs scheduled to run "every 2 seconds" get created. Any further request (create, delete, or toggle) crashes the server, and even after the server is started again, toggle and delete from the UI do not work.

I observed the same problem (jobs not created, toggle and delete from the UI not working) when the job count reaches around 2000. In that case the jobs are scheduled to run at a specific time (e.g. "@at 2023-08-18T18:30:00Z").

Is this behavior expected with Dkron?
Is there any limit on the number of jobs that can be created, or on the number of jobs that can run in parallel?

Here are the logs from when the issue happened:

Node 1

dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: calculated votes needed: needed=3 term=2432"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: vote granted: from=00793ce59a5f term=2432 tally=1"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.149Z [DEBUG] raft: lost leadership because received a requestVote with a newer term"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.152Z [WARN] raft: rejecting vote request since our last index is greater: candidate=172.23.0.4:6868 last-index=24362 last-candidate-index=23424"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.152Z [INFO] raft: entering follower state: follower="Node at 172.23.0.6:6868 [Follower]" leader-address= leader-id="
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.393Z [ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2430"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.393Z [ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2432"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.393Z [ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2429"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Stream connection from=172.23.0.2:39268"
dkron-dkron-agent-1 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Initiating push/pull sync with: c27d0ddfcf54 172.23.0.6:8946"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Stream connection from=172.23.0.4:45642"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Initiating push/pull sync with: c27d0ddfcf54 172.23.0.6:8946"
dkron-dkron-agent-1 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Stream connection from=172.23.0.3:36776"
dkron-dkron-server-1 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Initiating push/pull sync with: 9cbda2b54669 172.23.0.2:8946"
dkron-dkron-server-3 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Initiating push/pull sync with: 4c2e4be74dfc 172.23.0.3:8946"
dkron-dkron-server-1 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Stream connection from=172.23.0.6:37072"
dkron-dkron-1 | time="2023-09-06T14:01:25Z" level=info msg="2023-09-06T14:01:25.289Z [WARN] raft: Election timeout reached, restarting election"
dkron-dkron-1 | time="2023-09-06T14:01:25Z" level=info msg="2023-09-06T14:01:25.289Z [INFO] raft: entering candidate state: node="Node at 172.23.0.4:6868 [Candidate]" term=2433"
dkron-dkron-1 | time="2023-09-06T14:01:25Z" level=info msg="2023-09-06T14:01:25.292Z [DEBUG] raft: voting for self: term=2433 id=00793ce59a5f"
dkron-dkron-1 | time="2023-09-06T14:01:25Z" level=info msg="2023-09-06T14:01:25.300Z [DEBUG] raft: asking for vote: term=2433 from=b69644af381a address=172.23.0.5:6868"
dkron-dkron-1 | time="2023-09-06T14:01:25Z" level=info msg="2023-09-06T14:01:25.300Z [DEBUG] raft: asking for vote: term=2433 from=c27d0ddfcf54 address=172.23.0.6:6868"
dkron-dkron-1 | time="2023-09-06T14:01:25Z" level=info msg="2023-09-06T14:01:25.300Z [DEBUG] raft: asking for vote: term=2433 from=4c2e4be74dfc address=172.23.0.3:6868"

Node 2

dkron-dkron-1 | time="2023-09-06T14:01:23Z" level=info msg="2023-09-06T14:01:23.634Z [ERROR] raft: failed to make requestVote RPC: target="{Voter 4c2e4be74dfc 172.23.0.3:6868}" error="read tcp 172.23.0.4:58978->172.23.0.3:6868: i/o timeout" term=2409"
dkron-dkron-server-1 | time="2023-09-06T14:01:23Z" level=info msg="2023/09/06 14:01:23 [INFO] serf: attempting reconnect to b69644af381a 172.23.0.5:8946"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.134Z [WARN] raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id="
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.134Z [INFO] raft: entering candidate state: node="Node at 172.23.0.4:6868 [Candidate]" term=2432"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.139Z [DEBUG] raft: voting for self: term=2432 id=00793ce59a5f"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: asking for vote: term=2432 from=b69644af381a address=172.23.0.5:6868"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: asking for vote: term=2432 from=c27d0ddfcf54 address=172.23.0.6:6868"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: asking for vote: term=2432 from=4c2e4be74dfc address=172.23.0.3:6868"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: calculated votes needed: needed=3 term=2432"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: vote granted: from=00793ce59a5f term=2432 tally=1"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.149Z [DEBUG] raft: lost leadership because received a requestVote with a newer term"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.152Z [WARN] raft: rejecting vote request since our last index is greater: candidate=172.23.0.4:6868 last-index=24362 last-candidate-index=23424"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.152Z [INFO] raft: entering follower state: follower="Node at 172.23.0.6:6868 [Follower]" leader-address= leader-id="
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.393Z [ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2430"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.393Z [ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2432"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.393Z [ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2429"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Stream connection from=172.23.0.2:39268"
dkron-dkron-agent-1 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Initiating push/pull sync with: c27d0ddfcf54 172.23.0.6:8946"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Stream connection from=172.23.0.4:45642"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Initiating push/pull sync with: c27d0ddfcf54 172.23.0.6:8946"
dkron-dkron-agent-1 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Stream connection from=172.23.0.3:36776"
dkron-dkron-server-1 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Initiating push/pull sync with: 9cbda2b54669 172.23.0.2:8946"
dkron-dkron-server-3 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Initiating push/pull sync with: 4c2e4be74dfc 172.23.0.3:8946"
dkron-dkron-server-1 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Stream connection from=172.23.0.6:37072"
dkron-dkron-1 | time="2023-09-06T14:01:25Z" level=info msg="2023-09-06T14:01:25.289Z [WARN] raft: Election timeout reached, restarting election"

Node 3

dkron-dkron-server-1 | time="2023-09-06T14:01:23Z" level=info msg="2023/09/06 14:01:23 [INFO] serf: attempting reconnect to b69644af381a 172.23.0.5:8946"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.134Z [WARN] raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id="
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.134Z [INFO] raft: entering candidate state: node="Node at 172.23.0.4:6868 [Candidate]" term=2432"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.139Z [DEBUG] raft: voting for self: term=2432 id=00793ce59a5f"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: asking for vote: term=2432 from=b69644af381a address=172.23.0.5:6868"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: asking for vote: term=2432 from=c27d0ddfcf54 address=172.23.0.6:6868"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: asking for vote: term=2432 from=4c2e4be74dfc address=172.23.0.3:6868"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: calculated votes needed: needed=3 term=2432"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.148Z [DEBUG] raft: vote granted: from=00793ce59a5f term=2432 tally=1"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.149Z [DEBUG] raft: lost leadership because received a requestVote with a newer term"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.152Z [WARN] raft: rejecting vote request since our last index is greater: candidate=172.23.0.4:6868 last-index=24362 last-candidate-index=23424"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.152Z [INFO] raft: entering follower state: follower="Node at 172.23.0.6:6868 [Follower]" leader-address= leader-id="
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.393Z [ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2430"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.393Z [ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2432"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.393Z [ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2429"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Stream connection from=172.23.0.2:39268"
dkron-dkron-agent-1 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Initiating push/pull sync with: c27d0ddfcf54 172.23.0.6:8946"
dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Initiating push/pull sync with: c27d0ddfcf54 172.23.0.6:8946"
dkron-dkron-server-3 | time="2023-09-06T14:01:24Z" level=info msg="2023/09/06 14:01:24 [DEBUG] memberlist: Stream connection from=172.23.0.4:45642"
dkron-dkron-agent-1 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Stream connection from=172.23.0.3:36776"
dkron-dkron-server-1 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Initiating push/pull sync with: 9cbda2b54669 172.23.0.2:8946"
dkron-dkron-server-1 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Stream connection from=172.23.0.6:37072"
dkron-dkron-server-3 | time="2023-09-06T14:01:25Z" level=info msg="2023/09/06 14:01:25 [DEBUG] memberlist: Initiating push/pull sync with: 4c2e4be74dfc 172.23.0.3:8946"
dkron-dkron-1 | time="2023-09-06T14:01:25Z" level=info msg="2023-09-06T14:01:25.289Z [WARN] raft: Election timeout reached, restarting election"
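
The election lines in the logs above can be read mechanically. What follows is a minimal sketch, not part of Dkron, that pulls out the peers a candidate asked for votes and the peers it could not reach, then compares that with the Raft majority. `quorum` and `summarize` are hypothetical helper names. The sample lines are copied from the excerpt with the container and timestamp prefixes trimmed, and the sketch assumes the voter set is the candidate plus every peer it asks.

```python
import re

# Lines copied from the log excerpt above, with prefixes trimmed for brevity.
SAMPLE_LOG = """\
[DEBUG] raft: asking for vote: term=2432 from=b69644af381a address=172.23.0.5:6868
[DEBUG] raft: asking for vote: term=2432 from=c27d0ddfcf54 address=172.23.0.6:6868
[DEBUG] raft: asking for vote: term=2432 from=4c2e4be74dfc address=172.23.0.3:6868
[DEBUG] raft: calculated votes needed: needed=3 term=2432
[DEBUG] raft: vote granted: from=00793ce59a5f term=2432 tally=1
[ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2432
"""

ASK_RE = re.compile(r"asking for vote: term=\d+ from=\w+ address=(\S+)")
FAIL_RE = re.compile(r'failed to make requestVote RPC: target="\{Voter \w+ (\S+?)\}"')


def quorum(voters: int) -> int:
    """Votes needed for a majority among `voters` Raft voters."""
    return voters // 2 + 1


def summarize(log_text: str) -> dict:
    """Collect asked and unreachable peer addresses from raft log lines."""
    asked, failed = set(), set()
    for line in log_text.splitlines():
        match = ASK_RE.search(line)
        if match:
            asked.add(match.group(1))
        match = FAIL_RE.search(line)
        if match:
            failed.add(match.group(1))
    voters = len(asked) + 1  # assumption: peers asked plus the candidate itself
    return {
        "voters": voters,
        "needed": quorum(voters),
        "unreachable": sorted(failed),
        # The candidate's own vote plus every peer that is still reachable.
        "max_possible_votes": 1 + len(asked - failed),
    }
```

On these sample lines the summary reports 4 voters, 3 votes needed, and 172.23.0.5:6868 unreachable, so the candidate would need both remaining peers to grant their votes. The same excerpt shows the node at 172.23.0.6 rejecting the request because its last index is greater, which would be consistent with the repeated election restarts.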

To Reproduce
Steps to reproduce the behavior:

  1. Run the following commands to start Dkron on 3 nodes:
    docker-compose up
    docker-compose up --scale dkron-server=2
    docker-compose up --scale dkron-agent=3
  2. Start creating jobs that use the http executor and run every 2 seconds.
  3. After around 90 jobs are created, any new request brings the server down.
  4. The responses during job creation are as follows:
    201 Created
    201 Created
    201 Created
    Post "http://localhost:8080/v1/jobs": EOF
    Post "http://localhost:8080/v1/jobs": EOF
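
The job-creation step above could be scripted roughly as follows. This is a sketch under assumptions, not the script used for this report. The endpoint comes from the responses listed above, while the job names, the target URL, and the `executor_config` values are placeholders. The payload fields (`name`, `schedule`, `executor`, `executor_config`) and the `@every 2s` schedule string follow Dkron's job format as I understand it and should be checked against the documentation for the version in use.

```python
import json
import urllib.request

DKRON_API = "http://localhost:8080/v1/jobs"  # endpoint seen in the responses above


def build_job(index: int) -> dict:
    """Build one job payload; the name and target URL are placeholders."""
    return {
        "name": f"load-test-job-{index}",
        "schedule": "@every 2s",
        "executor": "http",
        "executor_config": {
            "method": "GET",
            "url": "http://example.com/",
            "expectCode": "200",
        },
    }


def create_job(job: dict, timeout: float = 5.0) -> int:
    """POST one job to the Dkron API and return the HTTP status code."""
    request = urllib.request.Request(
        DKRON_API,
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def create_jobs(count: int, post=create_job) -> list:
    """Create jobs one by one and stop at the first transport error."""
    results = []
    for index in range(count):
        job = build_job(index)
        try:
            results.append((job["name"], post(job)))
        except OSError as exc:  # URLError and dropped connections are OSErrors
            results.append((job["name"], repr(exc)))
            break
    return results
```

Calling `create_jobs(100)` against a running cluster records the status of each request and the first failure, which makes it easier to report exactly how many jobs were accepted before the EOF errors began.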

Expected behavior
Jobs should be created, and other APIs such as toggle and delete should keep working.

Screenshots
This is the screenshot from Docker when the issue happened:
image

Specifications

  • OS: [e.g. linux]
  • Version [e.g. 2.0.1]


@cobolbaby (Contributor)

dkron-dkron-1 | time="2023-09-06T14:01:24Z" level=info msg="2023-09-06T14:01:24.393Z [ERROR] raft: failed to make requestVote RPC: target="{Voter b69644af381a 172.23.0.5:6868}" error="dial tcp 172.23.0.5:6868: connect: no route to host" term=2430"

Something is wrong with your environment.
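
One way to check the environment is to test whether each Raft peer address accepts TCP connections from inside the Docker network. The sketch below is only an illustration: `can_connect` and `check_peers` are hypothetical helpers, and the peer list is taken from the addresses in the quoted logs, so it needs adjusting to the actual cluster.

```python
import socket

# Raft addresses seen in the logs of this issue; adjust to your own cluster.
RAFT_PEERS = [("172.23.0.3", 6868), ("172.23.0.5", 6868), ("172.23.0.6", 6868)]


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and no route to host
        return False


def check_peers(peers, timeout: float = 2.0) -> dict:
    """Map each 'host:port' string to its reachability."""
    return {f"{host}:{port}": can_connect(host, port, timeout) for host, port in peers}
```

Running `check_peers(RAFT_PEERS)` from inside one of the server containers shows which peers are reachable. A `False` for 172.23.0.5:6868 would match the "no route to host" error quoted above.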
