New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HMA: Enable gunicorn task scheduling #1565
HMA: Enable gunicorn task scheduling #1565
Conversation
Hi @jagraff! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may start the scheduler in every sub-process, which could be a problem for the curator tasks, but may be desirable for the indexing classes.
Multiple curator tasks running (fetcher, build index, etc) can lead to problems, and I don't think I was sufficiently defensive in those tasks to detect and prevent it.
Since you are the first person that is likely to run into this, take care to manage your omm_config and production deployment to avoid ending up with multiple curator tasks!
# We only run apscheduler in the "outer" reloader process, else we'll | ||
# have multiple executions of the the scheduler in debug mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may consider adding a note about gunicorn production code, and the dangers/desirability of running multiple schedulers there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the CI changes might be related to these changes, can you confirm?
Saw your updates, let me know when you are ready to review |
3e06fcf
to
e27be65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it's heading a step in the wrong direction. Having the defaults be a combination of file and environment variable inputs makes it more difficult for me to understand the expectations of the docker file, the compose, the omm_config, and what customizations will need to occur inside of each deployment.
I could use some help understanding the expectations we should set of the docker file that is in the base directory for your needs in deployment. Let's chat offline.
afba477
to
90e28a8
Compare
804eba6
to
f58f820
Compare
add warning comment about running multiple indexer/fetcher workers
0093b18
to
ce62f28
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable, we can iterate on detecting misconfiguration or improvement to server bootstrapping as needed.
Fixes #1566
In order to prevent the apscheduler from executing twice in debug mode, hasher-matcher-actioner checks whether WERKZEUG_RUN_MAIN=="true" before starting the scheduler, because in debug mode the Flask service will initialize twice - werkzeug will set that environment variable only for the "outer" process, meaning that if werkzeug is the WSGI server this is a reliable way to only run the scheduler once in debug mode. However, the "production" version of HMA uses gunicorn as the WSGI server. Since gunicorn doesn't set that enviornment variable, the production version of HMA will incorrectly not trigger the scheduler. This PR fixes this by adding a check for debug mode before checking for WERKZEUG_RUN_MAIN in the environment.