Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: check and cache the task definition, variables, status #1401

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

aminya
Copy link
Contributor

@aminya aminya commented Nov 15, 2023

This adds a new checker for Task definitions, status, and any other variable that affect a task.

The implementation hashes the task structure and writes it to .task/definition. In later runs, this is checked to make sure the definition is up-to-date. If the definition has changed (e.g. one of the variables has changed), the task will run again.

The approach is configurable through check_definition. Now, by default ("auto"), the definition is considered when the sources are defined and not the status checks. If desirable, check_definition can be set to "always" to always consider the definition of the task. It can be also disabled via "never" altogether regardless of other conditions.

Fixes #548
Fixes #455

@aminya aminya force-pushed the check-variables branch 2 times, most recently from bfe40d2 to e2449fe Compare November 16, 2023 09:36
@aminya
Copy link
Contributor Author

aminya commented Dec 5, 2023

@andreynering Would you take a look at this pull request?
This fixes the Task false-cache hit issues when the variables change. Without this, we cannot use Task caching altogether as it incorrectly skips running some of the tasks.

@andreynering
Copy link
Member

Hi @aminya!

First of all, thanks for your contribution!

After a quick look, the code itself seems to go in the right direction.

As said in comments like #548 (comment), #455 (comment) and #455 (comment), enabling this could add some unintended side-effects. Also, it would be a breaking change (although a relatively small one).

In other words: this needs a bit more thought. I think this could maybe be enabled via a global setting in the Taskfile.

/cc @pd93 (just in case you have any thoughts)

@pd93
Copy link
Member

pd93 commented Dec 17, 2023

Hey both, just adding some quick thoughts while I have a bit of time:

  1. I agree with @andreynering regarding the code. I like the general direction it's going in and that it follows the same style as the other fingerprinting checker interfaces that we refactored earlier this year.

  2. I still stand by my previous comments in Reset task status if definition has changed #455.

More specifically, it appears that you went with the second option that I outlined which I think has some flaws. Mainly that I don't think changing a Task's desc or summary should invalid its cache. We should probably only stick to properties that affect a task's outcome. i.e. cmds and deps.

Additionally, I'm not convinced that enabling this new behaviour by default is the correct decision. Sometimes running a task can be destructive and this is why we write status checks - to ensure that a task will not run depending on the state of the system. Overwriting this behaviour when a task's definition changes will absolutely trip someone up at some point. Especially given that a task's command could change entirely and still have the same outcome as before.

One way or another, we have to accept one of two fates. Either:

  1. A task will occasionally not run when a user actually wanted it to (safe, current behaviour)
  2. A task will occasionally run when a user did not want it to (unsafe)

We have a responsibility to ensure that Task users have confidence in the outcome of their commands when they run them and I think the second option goes against this. Particularly when you consider that the first problem can be quickly solved by intuitively adding the --force flag to your command.

If there is a desire from the community/other maintainers to continue with this change then I would like to entertain some ways that we can make it clearer to the user that a task's cache is invalidated when the definition changes. A couple of ideas that I haven't given much thought:

  • A global/task-level key to enable the behaviour, alongside a warning/disclaimer in our documentation about the dangers of enabling it.
  • Showing a prompt before the task runs saying that it is only running because the definition has changed and not because of the status check and asking the user if they wish to continue.

@aminya
Copy link
Contributor Author

aminya commented Dec 17, 2023

@pd93 I think instead of making things complicated and keeping the current buggy behaviour, we can do this:

  • When the user has added status checks, do not check the definition of the task and only rely on the status check to run the test. This would mean the addition of status checks.

  • In other cases, check the task definition like GNU Make and cache the results according to sources/generates.

Doing this will solve your concerns regarding destructive tasks, but will also fix the current buggy behaviour where sometimes tasks do not run although the environment variables have changed.

I also want to mention that the current buggy behaviour is also dangerous. For example, I expect the task to run again when a variable changes, but currently task skips those in the subsequent calls, which can potentially result in incorrect/dangerous results:

- find_user_home a with variable user = A
- task uninstall_tool_for_user with tool = coffee   # removes ~/A/coffee
- find_user_home  with variable user = B        # currently skipped incorrectly so user stays as A!!
- task uninstall_tool_for_user with tool = tea  # it incorrectly removes tea for user A, while we wanted to remove tea for user B

This pull request fixes this case, which is the main pain point of using caching with Task

If you want this to be configurable, we can add a check_definition flag with these possible values:

  • auto (Default): check definition when status check is not specified for a task.
  • always: always check the definition
  • never: ignore the definition

@aminya
Copy link
Contributor Author

aminya commented Feb 7, 2024

@andreynering @pd93

I have updated the approach to be configurable. Now, by default ("auto"), it will not check the definitions if status checks are defined for a task, but it considers the definition when sources are defined.

If desirable, check_definition can be set to "always" to always consider the definition of the task.

Rebased the branch and added tests to check and support the addition.

@aminya
Copy link
Contributor Author

aminya commented Mar 23, 2024

Is there any reason why this was not merged? Fixing conflicts takes time every time. I appreciate some feedback @andreynering

@andreynering andreynering self-requested a review March 23, 2024 09:50
@aminya
Copy link
Contributor Author

aminya commented Apr 25, 2024

I have rebased this branch again. Could you take a look? @andreynering @pd93

@aminya
Copy link
Contributor Author

aminya commented May 4, 2024

For those waiting on this essential feature:

I ended up forking task and publishing the binaries here:
https://github.com/aminya/task/releases/tag/v3.37.0

In GitHub Actions, I have something like this to patch task via these binaries:

Details
      - name: Setup Cpp
        uses: aminya/setup-cpp@v1
        with:
          task: true

      - name: Patch Task
        run: |
          cd ~/task
          curl -LJO https://github.com/aminya/task/releases/download/v3.18.0/task_linux_amd64.tar.gz
          tar -xvf task_linux_amd64.tar.gz
          chmod +x ./task
          rm task_linux_amd64.tar.gz
          task --version

@andreynering
Copy link
Member

Hi @aminya,

I'm sorry for the wait! I know many months have passed. The thing is just that this project demands more work than we actually have to dedicate to it, so we advance in the speed we can. Thanks for your patience!

Hopefully I'll be able to review this soon. I want to try this feature to have a clearer opinion on how this should behave.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interpolated variables should be factored into checksum Reset task status if definition has changed
3 participants