HMA: Enable gunicorn task scheduling #1565

jagraff · 2024-03-21T13:15:02Z

To prevent apscheduler from executing twice in debug mode, hasher-matcher-actioner checks whether WERKZEUG_RUN_MAIN=="true" before starting the scheduler. In debug mode the Flask service initializes twice, and werkzeug sets that environment variable only for the "outer" process, so when werkzeug is the WSGI server this is a reliable way to run the scheduler only once. However, the "production" version of HMA uses gunicorn as the WSGI server. Since gunicorn doesn't set that environment variable, the production version of HMA never starts the scheduler. This PR fixes this by checking for debug mode before checking for WERKZEUG_RUN_MAIN in the environment.
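
For readers following along, the check described above could look roughly like the sketch below. It is not the code from app.py: the helper name should_start_scheduler and its parameters are hypothetical, and only the WERKZEUG_RUN_MAIN variable and the "check debug mode first" ordering come from the description.

```python
import os
from typing import Mapping, Optional


def should_start_scheduler(
    debug: bool, environ: Optional[Mapping[str, str]] = None
) -> bool:
    """Decide whether this process should start the scheduler.

    Hypothetical helper for illustration; not part of HMA.
    """
    env = os.environ if environ is None else environ
    if not debug:
        # Outside debug mode (for example under gunicorn) there is no
        # werkzeug reloader, so WERKZEUG_RUN_MAIN is not consulted at all.
        return True
    # In debug mode, start only in the process where werkzeug set the flag.
    return env.get("WERKZEUG_RUN_MAIN") == "true"
```

In a Flask app this might be called as should_start_scheduler(app.debug), assuming the scheduler is started from the same place the app is created.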

facebook-github-bot · 2024-03-21T13:15:07Z

Hi @jagraff!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

Dcallies

I think this may start the scheduler in every sub-process, which could be a problem for the curator tasks, but may be desirable for the indexing classes.

Multiple curator tasks running (fetcher, build index, etc) can lead to problems, and I don't think I was sufficiently defensive in those tasks to detect and prevent it.

Since you are the first person that is likely to run into this, take care to manage your omm_config and production deployment to avoid ending up with multiple curator tasks!
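
One possible way to be more defensive here, sketched under assumptions and not something this PR implements: each worker tries to take a non-blocking exclusive file lock, and only the worker that wins starts the curator tasks. The function name try_acquire_scheduler_lock and the lock path are hypothetical, and fcntl is Unix-only.

```python
import fcntl
from typing import IO, Optional


def try_acquire_scheduler_lock(lock_path: str) -> Optional[IO[str]]:
    """Return an open, locked file handle if this caller won the lock, else None.

    Hypothetical helper for illustration; not part of HMA.
    """
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting when
        # another open handle already holds the exclusive lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # The caller must keep this handle open; closing it releases the lock.
    return handle
```

A worker would start the scheduler only when the call returns a handle, and keep that handle alive for the lifetime of the process. This only coordinates workers that share a filesystem, so it does not help across separate hosts or containers without a shared volume.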

Dcallies · 2024-03-21T15:30:52Z

hasher-matcher-actioner/src/OpenMediaMatch/app.py

        # We only run apscheduler in the "outer" reloader process, else we'll
        # have multiple executions of the scheduler in debug mode


You may consider adding a note about gunicorn production code, and the dangers/desirability of running multiple schedulers there.

Dcallies

It looks like the CI changes might be related to these changes, can you confirm?

Dcallies · 2024-03-26T19:52:52Z

Saw your updates. Let me know when you are ready for review.

Dcallies

This feels like it's heading a step in the wrong direction. Having the defaults be a combination of file and environment variable inputs makes it more difficult for me to understand the expectations of the docker file, the compose, the omm_config, and what customizations will need to occur inside of each deployment.

I could use some help understanding the expectations we should set of the docker file that is in the base directory for your needs in deployment. Let's chat offline.

Dcallies

Seems reasonable, we can iterate on detecting misconfiguration or improvement to server bootstrapping as needed.

Dcallies approved these changes Mar 21, 2024

facebook-github-bot added the CLA Signed label Mar 21, 2024

Dcallies requested changes Mar 22, 2024

jagraff force-pushed the giphy/improve-debug-mode-checking branch from 3e06fcf to e27be65 · April 8, 2024 14:32

jagraff requested a review from Dcallies April 8, 2024 14:43

Dcallies requested changes Apr 9, 2024

hasher-matcher-actioner/src/OpenMediaMatch/app.py (outdated, resolved)

hasher-matcher-actioner/src/OpenMediaMatch/app.py (outdated, resolved)

hasher-matcher-actioner/docker-compose.prod.yaml (outdated, resolved)

jagraff force-pushed the giphy/improve-debug-mode-checking branch from afba477 to 90e28a8 · May 2, 2024 16:34

jagraff requested a review from Dcallies May 2, 2024 16:34

jagraff force-pushed the giphy/improve-debug-mode-checking branch 2 times, most recently from 804eba6 to f58f820 · May 2, 2024 16:36

jagraff changed the title ~~HMA: Improve debug mode checking~~ HMA: Enable gunicorn task scheduling May 2, 2024

Enable gunicorn server to schedule indexer and fetcher tasks;

ce62f28

add warning comment about running multiple indexer/fetcher workers

jagraff force-pushed the giphy/improve-debug-mode-checking branch from 0093b18 to ce62f28 · May 2, 2024 16:58

black

a0e068f

Dcallies approved these changes May 2, 2024

Dcallies merged commit 434bc54 into facebook:main May 2, 2024
5 checks passed
