Airflow statsd stops sending metrics during maximum dagrun #39664
Replies: 4 comments
-
Can you confirm whether this is happening consistently when you run 200+ tasks in parallel?
-
@rawwar, yes, I can confirm that the issue occurs intermittently while we run 250+ tasks in parallel.
-
All signs here indicate that this is an issue with
-
I have upgraded statsd to the latest version and blocked a few metrics in Airflow for performance. I will observe for a while and update here.
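For reference, blocking metrics can be sketched in airflow.cfg roughly like this (a minimal example, assuming Airflow 2.6+ where the option is named `metrics_block_list` under the `[metrics]` section; the metric names in the block list are illustrative placeholders, not the ones the commenter actually blocked):

```ini
[metrics]
statsd_on = True
statsd_host = statsd
statsd_port = 8125
statsd_prefix = airflow
# Comma-separated metric-name prefixes to drop before they reach statsd.
# These two names are placeholders for whichever high-volume metrics you block.
metrics_block_list = scheduler.critical_section,pool.queued_slots
```

The same options can also be set via environment variables (e.g. AIRFLOW__METRICS__METRICS_BLOCK_LIST) in a Helm-chart deployment.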
-
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.8.3
What happened?
Statsd stopped sending metrics while we were running more than 200 tasks in parallel across multiple DAGs. Restarting the statsd pod resolved the issue and metrics were exposed again. No logs were found in the statsd pod, and no spike in CPU or memory was observed on the statsd pod.
What you think should happen instead?
Statsd should not stop sending metrics when we run more than 200 tasks in parallel across multiple DAGs.
How to reproduce
Run more than 200 tasks in parallel across multiple DAGs.
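A rough way to approximate the metric volume of 200+ parallel tasks without a full Airflow deployment is to flood the statsd endpoint directly over UDP. This is a hedged, stdlib-only sketch, not the reporter's actual reproduction: the host, port, and metric names are assumptions (Airflow's defaults are UDP port 8125 with an `airflow` prefix), and each simulated "task" just emits a couple of counters in the StatsD line format.

```python
# Synthetic StatsD load generator: sends counter packets from many threads
# to mimic the metric burst of 200+ concurrently running tasks.
# Host/port and metric names below are assumptions, not taken from the issue.
import socket
import threading

STATSD_HOST = "127.0.0.1"  # assumed statsd service address
STATSD_PORT = 8125         # StatsD default UDP port

def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a StatsD counter line, e.g. b'airflow.ti.start:1|c'."""
    return f"{name}:{value}|c".encode()

def emit_task_metrics(sock: socket.socket) -> None:
    # Each simulated task sends a start and a finish counter, roughly as a
    # running task instance would.
    for metric in ("airflow.ti.start", "airflow.ti.finish"):
        sock.sendto(format_counter(metric), (STATSD_HOST, STATSD_PORT))

def flood(num_tasks: int = 250) -> None:
    """Fire metrics from num_tasks threads at once (UDP, fire-and-forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    threads = [threading.Thread(target=emit_task_metrics, args=(sock,))
               for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    flood()
```

Watching the statsd exporter's output while this runs can help tell whether the stall is on the statsd side (packets arrive but metrics stop being exposed) or on the Airflow side (packets stop arriving).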
Operating System
Amazon Linux 2
Versions of Apache Airflow Providers
pytest>=6.2.5
docker>=5.0.0
crypto>=1.4.1
cryptography>=3.4.7
pyOpenSSL>=20.0.1
ndg-httpsclient>=0.5.1
boto3>=1.34.0
sqlalchemy
redis>=3.5.3
requests>=2.26.0
pysftp>=0.2.9
werkzeug>=1.0.1
apache-airflow-providers-cncf-kubernetes==8.0.0
apache-airflow-providers-amazon>=8.13.0
psycopg2>=2.8.5
grpcio>=1.37.1
grpcio-tools>=1.37.1
protobuf>=3.15.8,<=3.21
python-dateutil>=2.8.2
jira>=3.1.1
confluent_kafka>=1.7.0
pyarrow>=10.0.1,<10.1.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
Official helm chart deployment.
Anything else?
No response
Are you willing to submit PR?
Code of Conduct