Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document job event binding and output reference kwargs. #353

Open
subdavis opened this issue Feb 19, 2020 · 7 comments
Open

Document job event binding and output reference kwargs. #353

subdavis opened this issue Feb 19, 2020 · 7 comments

Comments

@subdavis
Copy link
Contributor

subdavis commented Feb 19, 2020

I've now used girder worker on 3 projects, and I've wanted this feature on all of them.

What do people think about a new type of docker result hook that is a simple webhook? It would look something like this:

docker_run.delay(
    'my_docker_image:latest', container_args=[
        ...
    ],
    girder_result_hooks=[
        GirderUploadVolumePathJobArtifact(...)
    ],
    girder_status_hooks=[
        GirderSimpleWebhook('https://domain.com/api/v1/hook')
    ],
)

I don't think the return value of the webhook should impact the status of the job. If a job completes successfully and the webhook gives a 500, that should be totally fine, just like how CI works. Most of the time this hook would be to an endpoint in a girder plugin, but it could just as easily be an external system.

I actually think "ignore failures" is what people imagine when they think about hooks, so the kwarg girder_result_hook is IMO a bit misleading. Those aren't hooks, but transforms that are part of the job and affect the outcome.

We can't rename girder_result_hooks, so these two kwargs would inevitably cause confusion.

@manthey
Copy link
Member

manthey commented Feb 19, 2020

I assume we want to pass some small information to the endpoint, such as the job's completion status and the job ID, but not large data, like an output file. Further, I assume the hook could have some auth token to send to the endpoint to show that it is a trusted source.

Do all status hooks get triggered after all result hooks, as the success of processing result hooks is part of the final status of the job? Or would you want a flag indicating where in the job's life cycle the status hook should be called?

@subdavis
Copy link
Contributor Author

subdavis commented Feb 19, 2020

I assume we want to pass some small information to the endpoint, such as the job's completion status and the job ID, but not large data, like an output file

Yes.

Further, I assume the hook could have some auth token to send to the endpoint to show that it is a trusted source.

Also yes.

Do all status hooks get triggered after all result hooks.

That's how I imagine it would work.

Or would you want a flag indicating where in the job's life cycle the status hook should be called?

I don't know. I've never had a use case where I cared about a status other than completion. We could definitely have hooks that trigger for specific statuses, like queued, started, completed, but I've never needed any of that, so I think I'd prefer to do option 1 I really don't know.

@subdavis subdavis changed the title Docker plugin completion webhook Docker plugin completion (or status update) webhook Feb 19, 2020
@zachmullen
Copy link
Member

The way I've always handled this is to attach special metadata fields to the job, then hook into the job update event on the server and check if it's a status update, and if so take action on whatever piece of data I linked into the job.

# create some job
job = ...
# link it to some piece of data
job['relatedThingId'] = item['_id']
Job().save(job)
def load(self, info):
    events.bind('jobs.job.update', 'my_plugin', jobUpdated)

Here's an example

Maybe you already knew about that and were making a case that the webhook approach is superior, however unless we actually have a use case to talk to something other than Girder I'm not convinced. What makes this preferable to using the existing job update endpoint to listen to job updates? If we just want more specificity of events, we could add those as more granular server-side events, such as jobs.jobs.update.status instead of just jobs.job.update, for example.

@subdavis
Copy link
Contributor Author

Maybe you already knew about that and were making a case that the webhook approach is superior.

No, I didn't know this was possible. It would be great to add this to the docs. I'll try it out. Thanks!

@zachmullen
Copy link
Member

Cool. FWIW, another common need is to come from the other direction when linking data -- that is, linking file(s) produced by a worker job back to some existing data in Girder. The way to accomplish that is using the reference feature on the output, like so.

@subdavis
Copy link
Contributor Author

Converting this issue to a docs update issue. I'll make a PR sometime after I've used it a bit more.

@subdavis subdavis changed the title Docker plugin completion (or status update) webhook Document job event binding and output reference kwargs. Feb 19, 2020
@subdavis subdavis added docs and removed question labels Feb 19, 2020
@subdavis subdavis self-assigned this Feb 19, 2020
@zachmullen
Copy link
Member

Thanks. Our docs for the new girder_worker architecture are abysmal. We've heretofore done everything via tribal knowledge and reading source code.

@subdavis subdavis removed their assignment Jul 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants