shell cron job with more than 2 hours of execution looses the jobid #1118

Open
fabiofelici-nexi opened this issue May 24, 2022 · 5 comments

@fabiofelici-nexi

Describe the bug
Hi, I have 3 nodes with the following parameters:

tag         value
dc          dc1
expect      3
grp_ambari  ambari_mon
grp_kafka   kafka_mon
port        6869
region      global
role        dkron
rpc_addr    xx.xx.xx.xx:6869
server      true
version     3.1.10

When a shell job runs for more than 2 hours, it loses the job ID:

Cannot find jobid 30731

Specifications:

  • OS: Red Hat Enterprise Linux Server
  • Version: 7.7 (Maipo)

Additional context
Here is the error from the server journal logs when we encounter the problem:

May 24 10:18:09 node1 dkron[27567]: time="2022-05-24T10:18:09+02:00" level=error msg="grpc_agent: command error output" error="exit status 1" job=myjob node=node1 plugin="&{0xc0005442c0 0xc0010042d0}"
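
For reading lines like this one, here is a minimal sketch that splits the key=value part of the message into fields. It assumes the line follows the usual logfmt-style quoting; `parse_logfmt` is a hypothetical helper name, not part of Dkron. The `error="exit status 1"` field suggests that the command itself exited with a non-zero status.

```python
import shlex


def parse_logfmt(line):
    """Split a logfmt-style line into a dict of key -> value."""
    fields = {}
    # shlex.split keeps quoted values together and strips the quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that contain no "="
            fields[key] = value
    return fields


# The message part of the journal line above, with the syslog prefix removed.
line = (
    'time="2022-05-24T10:18:09+02:00" level=error '
    'msg="grpc_agent: command error output" error="exit status 1" '
    'job=myjob node=node1 plugin="&{0xc0005442c0 0xc0010042d0}"'
)
fields = parse_logfmt(line)
print(fields["job"], fields["node"], fields["error"])
```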
@fabiofelici-nexi changed the title from "shell cron job with more than 2 hours of execution lostes the jobid" to "shell cron job with more than 2 hours of execution looses the jobid" on May 24, 2022
@vcastellm
Member

Could it be that the job was deleted after starting execution?

@fabiofelici-nexi
Author

Hi,
Here is an example for you: this is the night/weekend run of the daily job.

image

When the same job runs for less than 2 hours, it finishes correctly. When it runs for longer than 2 hours, it reports a failure with this message at the end of the output: "Cannot find jobid "

image

It is as if the reference to the job is lost after two hours.

@vcastellm
Member

Can you paste the job json? What executor are you using?

@fabiofelici-nexi
Author

Hi, I'm pasting the job here.

{
  "id": "id",
  "name": "job1",
  "displayname": "job1",
  "timezone": "",
  "schedule": "@at 2050-01-02T15:00:00Z",
  "owner": "",
  "owner_email": "",
  "success_count": 0,
  "error_count": 23,
  "last_success": null,
  "last_error": "2022-06-05T00:13:32.09237517Z",
  "disabled": false,
  "tags": {
    "grp_kafka": "kafka_mon:1"
  },
  "metadata": null,
  "retries": 0,
  "dependent_jobs": null,
  "parent_job": "",
  "processors": {},
  "concurrency": "forbid",
  "executor": "shell",
  "executor_config": {
    "command": "ssh root@node1 /opt/myscript.sh 11 ",
    "shell": "true"
  },
  "status": "failed",
  "next": "2050-01-02T15:00:00Z",
  "ephemeral": false,
  "expires_at": null
}

I'm using the shell executor to run the script over SSH on node1.

@vcastellm
Member

The error message seems to come from your script /opt/myscript.sh, not from Dkron. Can you confirm?
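
One way to check this is to search the script for the literal message. Below is a minimal sketch; `find_message` is a hypothetical helper, and the sample script contents (including `lookup_job`) are invented for illustration, since the real /opt/myscript.sh is not shown in this thread.

```python
def find_message(script_text, message):
    """Return the 1-based line numbers of script_text that contain message."""
    return [
        number
        for number, line in enumerate(script_text.splitlines(), start=1)
        if message in line
    ]


# Hypothetical script contents, for illustration only.
sample_script = """#!/bin/sh
jobid="$1"
status=$(lookup_job "$jobid") || {
    echo "Cannot find jobid $jobid"
    exit 1
}
"""
matches = find_message(sample_script, "Cannot find jobid")
print(matches)
```

Against the real script, the text would be read from the file on node1 instead of the sample string. A non-empty result would point to the script, or something it calls, as the source of the message.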
