Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VSS error: async operation pending #3142

Closed
JPPhoto opened this issue Dec 1, 2020 · 9 comments
Closed

VSS error: async operation pending #3142

JPPhoto opened this issue Dec 1, 2020 · 9 comments

Comments

@JPPhoto
Copy link

JPPhoto commented Dec 1, 2020

Output of restic version

restic 0.11.0 compiled with go1.15.3 on windows/amd64

How did you run restic exactly?

.\restic.exe backup --use-fs-snapshot --cache-dir \\fileserver\shared\.cache --exclude-caches -r rest:http://[...] --exclude C:\swapfile.sys --exclude C:\hiberfil.sys --exclude C:\pagefile.sys C:\

My local machine is running Windows 20H2. I encountered the problem while backing up a 1 TB HDD and again when I swapped that for a 1 TB SSD (thinking that the slow HDD was to blame).

What backend/server/service did you use to store the repository?

I'm using a REST backend on a remote machine.

Expected behavior

Restic should successfully make a filesystem snapshot before running normally.

Actual behavior

repository 94fe90ee opened successfully, password is correct
creating VSS snapshot for [c:\]
error: failed to create snapshot for [c:\]: VSS error: async operation pending

Restic continues working after that error (should it be fatal?) and backs up all files that it can access, but obviously those in use are inaccessible because it doesn't believe it has a working shadow copy. However, the shadow copy was created by the OS and is available during restic's run.

Steps to reproduce the behavior

I was able to reproduce this by running the backup command again, whether or not I first cleared out all shadow copies. I've rebooted several times since and can no longer reproduce the problem.

Do you have any idea what may have caused this?

No clue! It seems like there might be some timeout or race condition waiting for VSS to create the shadow copy.

Do you have an idea how to solve the issue?

Unfortunately not.

Did restic help you today? Did it make you happy in any way?

I love restic and use it to back up about 15TB on a regular basis. I'm a huge fan of its deduplication.

@rawtaz
Copy link
Contributor

rawtaz commented Dec 1, 2020

@fgma Ping, is this something you can look at? If there's a need for reproducability and @JPPhoto can't reproduce it, let's close it for now and re-open it later if need be.

@DRON-666
Copy link
Contributor

DRON-666 commented Dec 1, 2020

@JPPhoto Restic VSS implementation hardcoded 120sec as snapshot timeout. If you have a lot of files on your disks, it may take over 2 minutes.

@JPPhoto
Copy link
Author

JPPhoto commented Dec 1, 2020

@rawtaz @fgma It looks from the code that NewVssSnapshot is called with a timeout of two minutes, and for some reason that wasn't sufficient on my system. Is VSS_S_ASYNC_PENDING actually an error condition or should the code wait and somehow see if the snapshot becomes available?

@MichaelEischer
Copy link
Member

Doesn't the VSS service itself enforce certain timelimits? Maybe we could use that as timeout?

@DRON-666
Copy link
Contributor

DRON-666 commented Dec 2, 2020

Doesn't the VSS service itself enforce certain timelimits?

Yes, there is some timelimits, but I don't think IdleTimeout is applicable in this situation (read "Additional Information" in this KB article), and by default it's only 3 minutes.

@fgma
Copy link
Contributor

fgma commented Dec 3, 2020

I've initially used 2 minutes randomly. All my systems snapshot way faster than that. I'm lacking access to system with really big drives or heavy load to test this. VSS_S_ASYNC_PENDING means it is still working (see https://docs.microsoft.com/en-us/windows/win32/api/vss/nf-vss-ivssasync-querystatus for microsofts description).

@JPPhoto Can you test creating a snapshot manually using vssadmin? https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/cc788055(v=ws.11) Might be interesting how long it takes on your system. We should probably extend the default timeout based on this knowledge. I think @DRON-666 also prepared a PR to configure the timeout.

@JPPhoto
Copy link
Author

JPPhoto commented Dec 3, 2020

@fgma Running the command wmic shadowcopy call create Volume='C:\' took 35.81 seconds when I just did it. Obviously something was up when I tried it before and it took longer than two minutes, but IMHO the behavior should be to wait until the asynchronous operation either succeeds or fails or simply report the VSS_S_ASYNC_PENDING error and terminate; I don't see how continuing to back up from a volume with a lot of open files while a shadow copy is underway is a good failure mode for restic. Perhaps failing completely with a suggestion to add to the timeout via a command-line argument sounds like something good for @DRON-666's PR.

@DRON-666
Copy link
Contributor

DRON-666 commented Dec 3, 2020

I don't see how continuing to back up from a volume with a lot of open files while a shadow copy is underway is a good failure mode for restic.

It's better to backup something than nothing, and you can detect partial backups by exit code 3 in you scripts.

@DRON-666
Copy link
Contributor

DRON-666 commented Dec 3, 2020

Maybe I need to write small (first three commits) "timeout only" version of #3067?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants