Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openlineage: use ProcessPoolExecutor over ThreadPoolExecutor. #39235

Merged

Conversation

JDarDagran
Copy link
Contributor

@JDarDagran JDarDagran commented Apr 24, 2024


This may possibly fix #39232. I tested the solution many times locally and deployed in Astro Cloud (so it gives me quite certainty it works).

I don't really know how to write tests for the change (load test is the thing but do we do this in regular tests?). Some guidance would be very much helpful.

@JDarDagran JDarDagran force-pushed the openlineage/use-process-pool-executor branch from e8edb6a to 582d9b1 Compare April 30, 2024 12:00
@JDarDagran JDarDagran force-pushed the openlineage/use-process-pool-executor branch 2 times, most recently from 7670638 to 05261ed Compare May 7, 2024 14:10
@JDarDagran
Copy link
Contributor Author

Needed a bit more time to confirm in various environment that ProcessPoolExecutor fixes the issue and doesn't break anything else.

Converting to ready for review since I've added configurable max_workers and simple test.

@JDarDagran JDarDagran marked this pull request as ready for review May 7, 2024 14:11
@JDarDagran JDarDagran requested review from ashb and Taragolis May 14, 2024 13:28
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Make `max_workers` configurable.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
@JDarDagran JDarDagran force-pushed the openlineage/use-process-pool-executor branch from 05261ed to b9bb056 Compare May 14, 2024 13:39
@mobuchowski
Copy link
Contributor

Internal tests did not find any problems when this approach is enabled.

Copy link
Contributor

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, @JDarDagran , @rahul342 confirmed things worked with our internal benchmarks - it will be great to see this change reaching Airflow + OL users.

@mobuchowski mobuchowski merged commit d529ec8 into apache:main May 15, 2024
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

openlineage, celery: scheduler hanging when emitting lots of OL events via HTTP
5 participants