Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DVC and syrupy. .dvc tracking files deleted by "pytest --snapshot-update" #694

Open
drjasonharrison opened this issue Feb 3, 2023 · 7 comments

Comments

@drjasonharrison
Copy link

Describe the bug

DVC is a tool that combines git and external storage to track large files that shouldn't be added to the git repository but that need to be tracked and stored. For a PNG file like __snapshots__/test_image/test_image.jpeg running

```
dvc add __snapshots__/test_image/test_image.jpeg
```

will

  1. add a .gitignore entry for the file __snapshots__/test_image/test_image.jpeg
  2. create a file __snapshots__/test_image/test_image.jpeg.dvc to track the file
  3. execute git add __snapshots__/test_image/test_image.jpeg.dvc .gitignore

Then you use git commit; dvc push; git push to copy everything to the storage and repositories.

So the snapshots created by syrupy are still in the working copy of the repository, but now there are .dvc files next to them. A new git clone will not include the syrupy snapshots until dvc pull is executed.

When I ran pytest, syrupy reported

```
9 snapshots passed. 7 snapshots unused.

Re-run pytest with --snapshot-update to delete unused snapshots.
```

So I did pytest --snapshot-update and the .dvc files under the __snapshots__ directory
were deleted. git restore brought them back.

To reproduce

Note: This uses bash commands to create a .dvc file that will be deleted
by running pytest --snapshot-update. This simplifies the reproduction steps.

  1. mkdir dvc-and-syrupy

  2. cd dvc-and-syrupy

  3. pipenv --python 3.7

  4. pipenv shell

  5. git init

  6. wget https://raw.githubusercontent.com/tophat/syrupy/main/tests/examples/test_custom_image_extension.py

  7. python -m pip install pytest syrupy

  8. pytest --snapshot-update

  9. git status

    On branch master
    
    No commits yet
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
            Pipfile
            __snapshots__/
            dvc-and-syrupy.md
            test_custom_image_extension.py
    
    nothing added to commit but untracked files present (use "git add" to track)
  10. echo "this is a test dvc file" > snapshots/test_custom_image_extension/test_jpeg_image.jpg.dvc

  11. echo "/test_jpeg_image.jpg" >> snapshots/test_custom_image_extension/.gitignore

  12. git add snapshots/ test_custom_image_extension.py

  13. git status

    On branch master
    
    No commits yet
    
    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)
            new file:   __snapshots__/test_custom_image_extension/.gitignore
            new file:   __snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc
            new file:   test_custom_image_extension.py
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
            Pipfile
  14. git commit -m "dvc tracking of snapshots"

    [master (root-commit) 90fb787] dvc tracking of snapshots
     3 files changed, 44 insertions(+)
     create mode 100644 __snapshots__/test_custom_image_extension/.gitignore
     create mode 100644 __snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc
     create mode 100644 test_custom_image_extension.py
  15. pytest --snapshot-update

    =============================================================== test session starts ===============================================================
    platform linux -- Python 3.7.13, pytest-7.2.1, pluggy-1.0.0
    rootdir: /home/jason/mm/dvc-and-syrupy
    plugins: syrupy-3.0.6
    collected 1 item
    
    test_custom_image_extension.py .                                                                                                            [100%]
    
    ------------------------------------------------------------- snapshot report summary -------------------------------------------------------------
    1 snapshot passed. 1 unused snapshot deleted.
    
    Deleted unknown snapshot fossil (__snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc)
    ================================================================ 1 passed in 0.02s ================================================================
  16. git status

    On branch master
    Changes not staged for commit:
      (use "git add/rm <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
            deleted:    __snapshots__/test_custom_image_extension/test_jpeg_image.jpg.dvc
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
            Pipfile
    
    no changes added to commit (use "git add" and/or "git commit -a")

Expected behavior

It would be nice to have a way to tell syrupy that
certain files, or file patterns, under the snapshot folder should be ignored.

Screenshots

N/A

Environment (please complete the following information):

  • OS: Ubuntu 18.04 but it's not an OS issue
  • Syrupy Version: 2.
  • Python Version: 3.7.13

Additional context

Right now I have in my repos, .gitignore, .dockerignore, .dvcignore, .jshintignore,
.npmignore, .prettierignore, .eslintignore..... Perhaps it's a sign that a tool has
reached a certain maturity when it accepts guidance on how to ignore unexpected files in
it's sandbox.

@drjasonharrison
Copy link
Author

Additional note, with the unused snapshots (the .dvc files) pytest exits with an error code. So while every test can pass, the presence of the .dvc files results in a error which will require adjustments to CI/CD pipelines. One such adjustment is to add the --snapshot-warn-unused flag to the pytest command.

@tolgaeren
Copy link
Contributor

Hi Jason,

I saw your issue after submitting this #703 . I think the issues might be related since the dvc file has the following extension .jpg.dvc. We ran into a similar issue with .png.zip. I am thinking basically having an multiple. in the extension causes syrupy to behave unexpectedly.

@hoffch
Copy link

hoffch commented Feb 9, 2023

We use exactly the same setup as @drjasonharrison for managing our snapshot files with DVC (very convenient!). We hence run into the same issues: when running pytest with syrupy, it tells us X snapshots unused. and returns a non-zero status code, even if no tests failed. Consequentially, the related CI jobs fails.

Would be great one could specify a "file ignore list" for syrupy.

Otherwise, it's a fantastic tool! Keep up the good work!

@drjasonharrison-vp-eio
Copy link

@hoffch add the --snapshot-warn-unused flag to the pytest command.

@tolgaeren I doubt that the problem is multiple dots in the filename. Syrupy is assuming that all files in the snapshots folder are from running pytest --update-snapshots. When pyest runs syrupy tracks which files in the snapshots folder are accessed and errors or warns (see above) about unused files.

Possible solutions include a "file ignore list" such as a .syrupyignore file, or a better understanding of other tools. The first is general but lacks insight that each snapshot file will also have a separate tracking file. The second would require code changes for each additional tool that might make files in the snapshots folder.

Alternatively, DVC could be used to track the root snapshots folder rather than the individual snapshot files or test folders.

So that's two workarounds.

@noahnu
Copy link
Collaborator

noahnu commented Feb 13, 2023

I'd prefer not to have tool specific logic in syrupy if it can be helped.

For that reason I like the "ignore" file/config approach.

For some context, the reason syrupy considers all file extensions as possible snapshots is because it's possible to write custom extensions with arbitrary extensions. The single file extension makes this quite easy.

@drjasonharrison
Copy link
Author

Another option would be a method to tell syrupy that for every snapshot file ignore the additional file with appended end. This would allow dvc files that no longer track snapshots to be flagged as removable.

@noahnu
Copy link
Collaborator

noahnu commented Feb 13, 2023

Unused snapshot detection logic for reference:

unused_collections = SnapshotCollections()

and the default "discover_snapshots" method of the extension class:

def discover_snapshots(

The ignore config is the simplest approach since we could just pass an ignore list to walk_snapshot_dir.

Another option would be a method to tell syrupy that for every snapshot file ignore the additional file with appended end.

It's an interesting idea, however if someone is playing around with a custom extension and they rename the file extension, that may cause issues. It also somewhat conflicts with #703.

@noahnu noahnu added the good first issue Good for newcomers label Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants