Replies: 2 comments 2 replies
-
Thank you for the research and very detailed description! In my opinion we should implement validation for this use case and raise an appropriate exception so users could see the actual reason and fix the config. Perhaps, we could add validation here, but I'm not sure if it is a right place. |
Beta Was this translation helpful? Give feedback.
0 replies
-
Hi @eladkal @potiuk @hussein-awala @Taragolis |
Beta Was this translation helpful? Give feedback.
2 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Hi,
I would like to take the community's opinion about a bug. We have some ideas about why it is happening and how we can fix it, but it is tightly related with the airflow core. So we would like to start a discussion here and decide on a solution and hopefully implement it.
Details
I have two different DAGs called
basic_dag
andbasic_dag_triggerer
. Thebasic_dag
can be as simple as the following:and the
basic_dag_triggerer
is only responsible for triggering thebasic_dag
via theairflow.api.common.trigger_dag.trigger_dag
function but we are passing a circular configuration by mistake.When we run the
basic_dag_triggerer
in the UI (or any other method like using the CLI:airflow dags trigger basic_dag_triggerer
), the task and dag run for thebasic_dag_triggerer
can successfully finish and everything looks OK.But if you try to navigate the details page for the
basic_dag
, you will be getting some errors on the UI and the grid data will not be loading:The reason for this, we are trying to get the grid data from the backend as JSON to show it in the UI and the backend tries to serialize (and encode) the dag run config to return to the UI by calling the
json.dumps
. It is failing due to the circular reference. Here is the webserver log for this:Also, in the UI, if we click the Run button on the DAG list page for the
basic_dag
, we are seeing the generictraceback.html
page like the following:The same thing happens for this.
json.dumps
related exception. Here is the webserver log for this:If we run
basic_dag_triggerer
there is no error or if we want to navigate the details page forbasic_dag_triggerer
everything looks normal.Apart from this, the scheduler is also failing but we only have the following warning
The scheduler exception is like the following:
While execution is happening for
basic_dag_triggerer
, we are inserting new DAG run to the DB and when the scheduler is trying to read the config forbasic_dag
and start to execute, we are having thisRecursionError
.Since the scheduler is failing with this DAG run, we cannot start the scheduler. It is because every time we try to start the scheduler, it will try to read the last not executed DAG run and try to execute it. If the DAG run does not have any timeout, then the scheduler cannot start at all (we just need to remove the DAG run from DB).
How to reproduce
basic_dag.py
basic_dag_triggerer.py
basic_dag
, and wait a little bit to see it is working correctly and you can also test the Run buttonbasic_dag_triggerer
and see if it is working correctly.basic_dag
and you will see the error on the UI and also in the webserver logs.Possible Solutions
For the page rendering side, we can include some
try-except
to give a meaningful message to the user. It is relatively easy to do but for the scheduler problem it is tricky.We have two different opinions on how we can resolve this:
This is like the error box for DAG imports: if there is a funky thing about DAG, we are currently flashing a message to the user that you have DAG import errors.
We can give some kind of error to say you have a problem in your DAG and it cannot be run.
If you have any other ideas for this, please feel free to share the ideas.
Footnotes
To be able to do this, we might create a DAG Run on the fly while importing the DAG and try to serialize it. If we are seeing an error then it means we cannot execute this DAG because it might cause problems. ↩
Beta Was this translation helpful? Give feedback.
All reactions