TaskInstances: "get instance batch" times out after 1 minute with status code 504 (~1000 records) #95

Open
litalkat opened this issue Sep 22, 2023 · 4 comments
Labels
question Further information is requested


@litalkat

litalkat commented Sep 22, 2023

Is there a way to increase this timeout?
Does pulling data for other Airflow instances affect it?
I tried passing _request_timeout=180, but it didn't help.
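A minimal sketch of why a larger client-side timeout may not change anything, assuming some hop between the client and Airflow (the webserver, or a proxy in front of it) enforces its own fixed limit, such as the one-minute cutoff reported in this thread: the call fails at whichever limit is reached first. The effective_timeout helper and the 60-second figure are hypothetical illustrations, not part of airflow_client.

```python
from typing import Iterable


def effective_timeout(client_timeout: float, upstream_timeouts: Iterable[float]) -> float:
    """Return the earliest point (in seconds) at which some hop gives up.

    client_timeout: what the caller asks for (e.g. via _request_timeout).
    upstream_timeouts: hypothetical fixed limits enforced by proxies or
    the webserver between the client and Airflow.
    """
    return min([client_timeout, *upstream_timeouts])


# A 180 s client timeout cannot outlast a hypothetical 60 s upstream limit.
print(effective_timeout(180, [60]))
```

In other words, the client-side value only matters when it is the smallest limit on the path; any fix for a fixed upstream cutoff has to happen on the server or proxy side, or by making each request cheaper.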

@ferruzzi

Hey @litalkat, we're likely going to need a little more info for this. Is there any stack trace with the error or any indication of a line number we can use to track it down?

@ferruzzi ferruzzi added the question Further information is requested label Sep 22, 2023
@litalkat
Author

litalkat commented Sep 23, 2023

@ferruzzi Sure.
The request targets a specific Airflow deployment. In general, that deployment runs ~100K Airflow tasks per day.
The process has already run at least 5,000 times in the last 3 months.
I have run into this "504 Service Exception" response before when making an API call over a long date range, but I saw that the endpoint can handle 7,000 to 9,000 records in a response.
In the last two days I get this 504 exception for almost every call, even for a 3-minute range (a response of ~100 tasks).
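Since shorter ranges have succeeded where longer ones failed, one possible workaround is to split a long date range into fixed-size windows and issue one request per window. The split_range helper below is a hypothetical sketch that only builds the windows; it does not call the Airflow client.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_range(start: datetime, end: datetime, step: timedelta) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows no longer than `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # last window may be shorter
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


# Example: a 1-hour range split into 15-minute windows.
t0 = datetime(2023, 9, 22, 12, 0)
for lo, hi in split_range(t0, t0 + timedelta(hours=1), timedelta(minutes=15)):
    print(lo.isoformat(), "->", hi.isoformat())
```

Each window's bounds could then go into the date filters of a ListTaskInstancesForm (for example start_date_gte / start_date_lte; check those field names against the client docs). If the upper-bound filter is inclusive, a task that falls exactly on a window boundary may be returned twice, so results may need de-duplication.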

It's important to mention that I am monitoring a lot of other Airflow instances at the same time, but I didn't see anything about a rate limit.

The exception is returned EXACTLY 1 minute after the API call is sent.
The traceback looks like:
(504)
Reason: Gateway Timeout
HTTP response headers: HTTPHeaderDict({'x-powered-by': 'Express', .....
HTTP response body: Error accured while trying to proxy: some private env name/api/v1/dags//dagRuns//taskInstances/list

The request uses airflow_client.get_task_instances_batch(ListTaskInstancesForm(...), _check_return_type=False)
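Rather than picking a window size up front, a caller could shrink the range only when a call actually times out. The sketch below assumes a hypothetical fetch(start, end) callable wrapping the batch request and a hypothetical GatewayTimeout exception; with the generated client, the 504 would presumably surface as an ApiException whose status is 504, but that mapping is an assumption. On a timeout it halves the range and retries each half, giving up once the range is below min_span.

```python
from datetime import datetime, timedelta
from typing import Callable, List


class GatewayTimeout(Exception):
    """Hypothetical stand-in for the client's 504 error."""


def fetch_adaptive(
    fetch: Callable[[datetime, datetime], List[dict]],
    start: datetime,
    end: datetime,
    min_span: timedelta = timedelta(seconds=30),
) -> List[dict]:
    """Fetch records for [start, end); on a 504, split the range in half and retry each half."""
    try:
        return fetch(start, end)
    except GatewayTimeout:
        if end - start <= min_span:
            raise  # range is already small; give up instead of recursing forever
        mid = start + (end - start) / 2
        return fetch_adaptive(fetch, start, mid, min_span) + fetch_adaptive(fetch, mid, end, min_span)


# Fake fetcher for illustration: "times out" on any range longer than 10 minutes.
def fake_fetch(lo: datetime, hi: datetime) -> List[dict]:
    if hi - lo > timedelta(minutes=10):
        raise GatewayTimeout()
    return [{"window": (lo, hi)}]


t0 = datetime(2023, 9, 22, 12, 0)
records = fetch_adaptive(fake_fetch, t0, t0 + timedelta(minutes=40))
print(len(records))
```

Halving keeps the number of requests low when most ranges succeed, but as with fixed windows, records on a split boundary may be returned by both halves if the API's date filters are inclusive.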

@Yaadto

Yaadto commented Sep 27, 2023

@ferruzzi
Any updates, or ideas on how to solve it?

@pierrejeambrun
Member

pierrejeambrun commented Oct 8, 2023

This is due to the webserver timeout. More info here:
https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#web-server-master-timeout.

Maybe we need to improve the performance of this particular endpoint, but if you have a really huge task instance table, this is kind of expected, I believe.
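Airflow can read any configuration option from an environment variable named AIRFLOW__{SECTION}__{KEY}, so the option linked above could be raised on the webserver host without editing airflow.cfg. The airflow_env_var helper below is a hypothetical illustration that only builds that name. Whether web_server_master_timeout (rather than, say, web_server_worker_timeout, or a timeout on a proxy in front of the webserver, which the x-powered-by: Express header above may hint at) is what enforces the one-minute cutoff here is not confirmed in this thread.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment-variable name Airflow uses to override [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# The printed name would be set on the webserver host, with a value in seconds.
print(airflow_env_var("webserver", "web_server_master_timeout"))
```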


4 participants