variables_as_arguments results in a large number of variable queries #168

Open
cpboyd opened this issue Jun 8, 2023 · 0 comments

cpboyd commented Jun 8, 2023

I've been tracing an issue where our Airflow instances query our secrets backends thousands of times per minute. Most of the requests appear to be for the variables listed under variables_as_arguments.

From https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html:

In top-level code, variables using jinja templates do not produce a request until a task is running, whereas, Variable.get() produces a request every time the dag file is parsed by the scheduler. Using Variable.get() will lead to suboptimal performance in the dag file processing. In some cases this can cause the dag file to timeout before it is fully parsed.
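
To illustrate the distinction the docs draw, here is a minimal sketch of the templated alternative. It only builds the Jinja string; nothing is fetched until Airflow renders it at task run time. The helper name `deferred_variable` is hypothetical, and the sketch assumes the target attribute is one of the operator's templated fields and that the variable name is a plain identifier.

```python
def deferred_variable(variable_name: str) -> str:
    """Build a Jinja expression for an Airflow Variable.

    Hypothetical helper, not part of Airflow. The returned string is only
    resolved when Airflow renders a templated operator field at task run
    time, so constructing it during DAG parsing makes no backend request.
    """
    # Doubled braces in the f-string emit literal "{{" and "}}".
    return f"{{{{ var.value.{variable_name} }}}}"


# Example: the value placed in the task arguments is just a string.
template = deferred_variable("db_host")
print(template)
```

Attributes that are not templated by the operator would receive the literal string unchanged, so this is not a drop-in replacement in every case.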

variables_as_arguments queries each variable with Variable.get() twice (if the variable exists):

    for variable in variables:
        if Variable.get(variable["variable"], default_var=None) is not None:
            task_params[variable["attribute"]] = Variable.get(
                variable["variable"], default_var=None
            )
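
For comparison, a minimal sketch of the same loop with a single lookup per variable. The function `resolve_variables` and the `fake_get` stand-in are hypothetical and exist only so the example runs without Airflow; in a real DAG the callable would be something like `lambda key: Variable.get(key, default_var=None)`.

```python
from typing import Any, Callable, Dict, List, Optional


def resolve_variables(
    variables: List[Dict[str, str]],
    get_variable: Callable[[str], Optional[Any]],
) -> Dict[str, Any]:
    """Look each variable up once and keep only the ones that exist."""
    task_params: Dict[str, Any] = {}
    for variable in variables:
        # Store the result so the backend is queried once, not twice.
        value = get_variable(variable["variable"])
        if value is not None:
            task_params[variable["attribute"]] = value
    return task_params


# Stand-in for Variable.get (assumption): records every lookup it receives.
calls: List[str] = []
store = {"db_host": "localhost"}


def fake_get(key: str) -> Optional[str]:
    calls.append(key)
    return store.get(key)


params = resolve_variables(
    [
        {"variable": "db_host", "attribute": "host"},
        {"variable": "missing", "attribute": "port"},
    ],
    fake_get,
)
```

This only halves the lookups for variables that exist. Each variable is still requested on every parse of the DAG file, so it would not remove the parse-time load described in the best-practices note.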

Are there any alternatives to using variables_as_arguments?
