Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added a new condition before launching the self._scan_stale_dags() fu… #39159

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

tchakib
Copy link

@tchakib tchakib commented Apr 22, 2024

…nction

this condition will allow us to have the choice of purging expired dags or keeping them


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

…nction

this condition will allow us to have the choice of purging expired dags or keeping them
@boring-cyborg boring-cyborg bot added the area:Scheduler Scheduler or dag parsing Issues label Apr 22, 2024
Copy link

boring-cyborg bot commented Apr 22, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@potiuk
Copy link
Member

potiuk commented Apr 22, 2024

In order for that change would be considered for merging - it needs to have more explanation, documentation, configuration parameter added and described in the templates and unit tests.

@tchakib tchakib requested a review from potiuk as a code owner April 22, 2024 13:29
Copy link
Contributor

@Taragolis Taragolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

airflow/dag_processing/manager.py Outdated Show resolved Hide resolved
tchakib and others added 2 commits April 22, 2024 16:21
Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
@tchakib
Copy link
Author

tchakib commented Apr 22, 2024

Pourriez-vous également ajouter des cas de test pour celui-ci dans le https://github.com/apache/airflow/blob/main/tests/dag_processing/test_processor.py

Ok and thanks for the two suggestions :)

@Taragolis Taragolis added this to the Airflow 2.10.0 milestone Apr 22, 2024
Copy link
Contributor

@amoghrajesh amoghrajesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work on your first PR. Have some comments, and also unit tests are needed

airflow/config_templates/config.yml Outdated Show resolved Hide resolved
@@ -500,6 +500,13 @@ core:
type: integer
example: ~
default: "4096"
disalble_scan_stale_dags:
description: |
Stale dags are deleted by default, [core] disalble_scan_stale_dags is True if you want to keep them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct this in context of above comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -59,6 +59,8 @@ unit_test_mode = True
killed_task_cleanup_time = 5
# We only allow our own classes to be deserialized in tests
allowed_deserialization_classes = airflow.* tests.*
# expired dags are deleted by default, you can put true in this variable if you want to keep them.
disalble_scan_stale_dags = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In a versioned DAG context, there may be a need to run two versions of DAGs in parallel in two versions of workers.
It can also be useful to keep the allowed DAGs if they are still in progress in the worker (n -1).
To keep the allowed DAGs, you can change the value of the variable ``AIRFLOW__CORE__DISABLE_SCAN_STALE_DAGS`` to ``True``. By default, it is set to ``False``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention that we can also set in config.yml

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@uranusjr
Copy link
Member

There are a couple of exceptions, but I think the configs are generally positive, i.e. this should use scan_stale_dags (or enable_...) and default to True.

@tchakib
Copy link
Author

tchakib commented Apr 23, 2024

There are a couple of exceptions, but I think the configs are generally positive, i.e. this should use scan_stale_dags (or enable_...) and default to True.

Indeed, if it suits everyone we can go there :)

@tchakib
Copy link
Author

tchakib commented Apr 24, 2024

it also needs actual unit tests to check if this works.

Ok, it's in progress 👍

@@ -924,3 +924,10 @@ if it fails for ``N`` number of times consecutively.
we can also provide and override these configuration from DAG argument:

- ``max_consecutive_failed_dag_runs``: Overrides :ref:`config:core__max_consecutive_failed_dag_runs_per_dag`.

Disable deletion of stale DAGs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want to clarify this. DAGs are always “deleted” (from the DAG file) to become stale, so we should try to explain here about the concept of DAGs being stale, and what keeping stale DAGs actually means.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler Scheduler or dag parsing Issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants