Set parallelism log messages to warning level for better visibility #39298

Open · BasPH wants to merge 2 commits into main

Conversation

@BasPH (Contributor) commented Apr 28, 2024

This PR raises several log messages related to parallelism to warning level. IMO it's currently not evident when Airflow reaches a parallelism limit, so the purpose of this PR is to make it more obvious that a scalability limit has been reached.
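
For context, a minimal sketch of the kind of change proposed here (illustrative only, not the actual Airflow executor code; the function and variable names are placeholders): a message that previously surfaced only at debug level when no parallelism slots are free would be emitted at warning level instead.

```python
# Illustrative sketch only -- not the actual Airflow source. It shows the shape
# of the change: when the executor runs out of parallelism slots, log at
# WARNING instead of DEBUG so the condition is visible with default log levels.
import logging

log = logging.getLogger(__name__)


def report_parallelism_limit(open_slots: int, queued_tasks: int) -> None:
    if open_slots <= 0 and queued_tasks > 0:
        # Before this change the equivalent message was only visible at DEBUG level.
        log.warning(
            "Parallelism limit reached (core.parallelism); %d task(s) still queued",
            queued_tasks,
        )
```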


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg bot added the area:Executors-core and area:Scheduler labels Apr 28, 2024
@hussein-awala (Member) left a comment

-0.9 for this change; reaching the parallelism limits is a normal and expected thing, and we should not warn the user when it happens.

@BasPH (Contributor, Author) commented Apr 28, 2024

Why should we not warn the user?

@hussein-awala (Member) commented

I think the question is: why should we warn the user when the parallelism limits are reached?

@BasPH (Contributor, Author) commented Apr 28, 2024

To inform the user that a software limit has been reached. For example, unless you enable debug logging there's no way to tell from the logs that AIRFLOW__CORE__PARALLELISM was reached. The default limits are rather low in Airflow so this is a common situation and it's currently not self-evident.
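
For illustration, the effective limit can be inspected at runtime through Airflow's configuration API (a sketch assuming a stock Airflow 2.x install; the default of 32 is an assumption about that default setup):

```python
# Sketch: inspect the effective core parallelism limit. The environment
# variable AIRFLOW__CORE__PARALLELISM overrides [core] parallelism in
# airflow.cfg; a stock install is assumed to default to 32.
from airflow.configuration import conf

parallelism = conf.getint("core", "parallelism")
print(f"The scheduler will not run more than {parallelism} task instances at once")
```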

@eladkal (Contributor) commented Apr 28, 2024

To inform the user that a software limit has been reached. For example, unless you enable debug logging there's no way to tell from the logs that AIRFLOW__CORE__PARALLELISM was reached. The default limits are rather low in Airflow so this is a common situation and it's currently not self-evident.

I think it is more suitable for the cluster activity dashboard to expose this in an actionable manner. Logs are a bit problematic because what you get is a notification that parallelism is reached, but you have no information on whether the user set this specific value by design or not. If the user chose the specific value, then logging warnings is not very user friendly and in fact not actionable. This can result in further user requests to silence the messages. From my point of view, if we can find a way to present the information in the cluster activity dashboard, that would be preferred.

@RNHTTR (Collaborator) commented Apr 29, 2024

-0.9 for this change; reaching the parallelism limits is a normal and expected thing, and we should not warn the user when it happens.

This is only true for users whose Airflow environment is somewhat optimized. There are a lot of configs to consider (parallelism, worker concurrency, pools) in order to limit task throughput -- surfacing more info to the user could only be helpful IMO.

I think it is more suitable for the cluster activity dashboard to expose this in an actionable manner. Logs are a bit problematic because what you get is a notification that parallelism is reached, but you have no information on whether the user set this specific value by design or not. If the user chose the specific value, then logging warnings is not very user friendly and in fact not actionable. This can result in further user requests to silence the messages. From my point of view, if we can find a way to present the information in the cluster activity dashboard, that would be preferred.

Why not both? I personally spend a lot more time debugging Airflow logs than I do using the cluster activity dashboard. If parallelism is reached unexpectedly such that it causes a problem, I'd definitely find it faster in the logs.

@jedcunningham (Member) commented

My 2c, I think info is appropriate.

@hussein-awala (Member) commented

This is only true for users whose Airflow environment is somewhat optimized. There are a lot of configs to consider (parallelism, worker concurrency, pools) in order to limit task throughput -- surfacing more info to the user could only be helpful IMO.

Exactly, in this case:

  1. It would be useless to warn the user when the pool limit or DAG/run/task max concurrency is reached, because these limits are usually set according to the capacity of an external system.
  2. Some users (including me) will find some of these warnings annoying, especially since we can use the OTEL or statsd metrics to monitor concurrency and create alerts (see the sketch below).
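
As a rough illustration of the metrics-based approach: a minimal sketch of an alert condition built on the executor gauges Airflow emits when statsd metrics are enabled ([metrics] statsd_on = True, default "airflow" prefix); the fetch_gauge helper is hypothetical and stands in for whatever your metrics backend (Prometheus, Datadog, etc.) exposes.

```python
# Minimal sketch of an alert condition on top of Airflow's executor gauges
# (emitted when statsd/OTEL metrics are enabled). Metric names assume the
# default "airflow" statsd prefix; fetch_gauge() is a hypothetical stand-in
# for a query against your metrics backend.

def fetch_gauge(name: str) -> float:
    """Hypothetical helper: return the latest value of a gauge from the backend."""
    raise NotImplementedError


def parallelism_saturated() -> bool:
    open_slots = fetch_gauge("airflow.executor.open_slots")
    queued = fetch_gauge("airflow.executor.queued_tasks")
    # Alert when the executor has no free slots while work is still queued.
    return open_slots <= 0 and queued > 0
```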

@RNHTTR (Collaborator) commented Apr 30, 2024

This is only true for users whose Airflow environment is somewhat optimized. There are a lot of configs to consider (parallelism, worker concurrency, pools) in order to limit task throughput -- surfacing more info to the user could only be helpful IMO.

Exactly, in this case:

  1. It would be useless to warn the user when the pool limit or DAG/run/task max concurrency is reached, because these limits are usually set according to the capacity of an external system.
  2. Some users (including me) will find some of these warnings annoying, especially since we can use the OTEL or statsd metrics to monitor concurrency and create alerts.

Yeah that's fair. Do you agree that at least an info log is reasonable?

@BasPH (Contributor, Author) commented May 2, 2024

Okay, I reverted a few logging statements; that leaves an info log when AIRFLOW__CORE__PARALLELISM is reached, which should help users understand why Airflow isn't running any more tasks.

@hussein-awala Regarding:

It would be useless to warn the user when the pool limit or DAG/run/task max concurrency is reached, because these limits are usually set according to the capacity of an external system.

Disagree with this statement. Airflow has lots of knobs to tweak for parallelism which can be a blessing and a curse. You might have a perfect setup with OTEL integration, dashboards, and alerts, but I'm sure that knowledge and tooling wasn't developed overnight. My experience is that Airflow's default settings quite often limit the user before they start reaching hardware limits. This is a frustrating problem because as of today visibility (via UI/logs) on such limits is low. I'll investigate alternative options such as dashboards, but stating that it's "useless to warn the user" is not my experience since 9 out of 10 times I see people hitting a default limit instead of a user-defined limit.

Sidenote: we currently have one warning-level log regarding pool limits; should we downgrade that to INFO level? https://github.com/apache/airflow/blob/main/airflow/jobs/scheduler_job_runner.py#L450

Labels: area:Executors-core (LocalExecutor & SequentialExecutor), area:Scheduler (Scheduler or dag parsing Issues)

5 participants