Add a code "map" section to the developer documentation #965

Open
wants to merge 5 commits into
base: develop
from

Conversation

abergeron
Collaborator

This should be merged after #964 to make sure all the links work correctly, which is why I am publishing it as a draft.

Otherwise it is mostly ready if you want to read and comment.

Checklist

Tests

The new pre-commit hook fails for unrelated files.

Documentation

  • I have updated the relevant documentation related to my changes

Quality

  • I have read the CONTRIBUTING doc
  • My commit messages follow this format
  • My code follows the style guidelines ($ tox -e lint)

@abergeron abergeron marked this pull request as ready for review July 13, 2022 14:26
@abergeron abergeron closed this Jul 13, 2022
@abergeron abergeron reopened this Jul 13, 2022
@bouthilx
Member

Thanks a lot for documenting the flow of execution during a call to orion hunt! I think it is quite complete. The only things I would add are a short mention of how the experiment version check and the experiment creation race condition are handled, and a last section on termination criteria and how runners may exit (e.g. broken experiment, completed experiment, reached max trials per worker).

@abergeron
Collaborator Author

Sorry it took a lot of time, but I think I've addressed the comments now.

@abergeron
Collaborator Author

I don't know what is wrong with tests/unittests/client/test_runner.py::test_runner_inside_dask. I'm reasonably certain I didn't touch anything related to it, yet it keeps failing, but only on Python 3.8.

they fail to start, crash, are killed (for example by an external job
scheduler), or take too much time to complete. This is checked in
:py:meth:`orion.client.runner.Runner.gather` with
:py:attr:`orion.client.runner.Runner.is_broken`.
Member

There are two levels of max_trials/max_broken. The first is at the level of the experiment: if we reach either max_trials or max_broken there, all Runners will stop. The second is at the level of the Runner (under the config name worker, which is a bit confusing since the introduction of the Runner, which now controls multiple workers). If max_trials or max_broken is reached within the execution of this Runner, it will stop, but the other Runners working on the same experiment may continue.

See for instance in the docs:
https://orion.readthedocs.io/en/stable/user/config.html#max-trials
vs
https://orion.readthedocs.io/en/stable/user/config.html#config-worker-max-trials
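
To make the two levels concrete, here is a minimal sketch of the stopping decision as described in this comment. It is not Orion's implementation: `Limits`, `Counts`, `limit_reached` and `stop_reason` are hypothetical names invented for illustration, and the use of `>=` for both comparisons is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    """Hypothetical pair of limits (not an Orion class)."""

    max_trials: int
    max_broken: int


@dataclass
class Counts:
    """Hypothetical counters for completed and broken trials."""

    completed: int = 0
    broken: int = 0


def limit_reached(counts: Counts, limits: Limits) -> bool:
    # Assumption: a limit counts as reached once the counter equals or exceeds it.
    return counts.completed >= limits.max_trials or counts.broken >= limits.max_broken


def stop_reason(
    experiment_counts: Counts,
    experiment_limits: Limits,
    runner_counts: Counts,
    worker_limits: Limits,
) -> Optional[str]:
    """Return which level stops this Runner, or None if it may continue."""
    # Experiment level: counters are shared, so every Runner sees this and stops.
    if limit_reached(experiment_counts, experiment_limits):
        return "experiment"
    # Runner level ("worker" in the config): only this Runner stops.
    if limit_reached(runner_counts, worker_limits):
        return "runner"
    return None


# Two Runners working on the same experiment, with illustrative numbers.
experiment_limits = Limits(max_trials=100, max_broken=10)
worker_limits = Limits(max_trials=20, max_broken=3)
shared = Counts(completed=40, broken=1)

runner_a = Counts(completed=20, broken=0)  # reached its own max_trials
runner_b = Counts(completed=5, broken=1)   # still under both limits

print(stop_reason(shared, experiment_limits, runner_a, worker_limits))
print(stop_reason(shared, experiment_limits, runner_b, worker_limits))
```

In this sketch the first Runner stops because of its own limit while the second keeps going, which is the behaviour the worker-level settings are meant to allow.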

abergeron and others added 2 commits August 2, 2022 14:07
Co-authored-by: Xavier Bouthillier <xavier.bouthillier@gmail.com>
Co-authored-by: Xavier Bouthillier <xavier.bouthillier@gmail.com>
2 participants