Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in FileSystemCache on Windows #854

Closed
ecpost opened this issue Jan 20, 2016 · 5 comments
Closed

Race condition in FileSystemCache on Windows #854

ecpost opened this issue Jan 20, 2016 · 5 comments

Comments

@ecpost
Copy link

ecpost commented Jan 20, 2016

Python 2.7.2
Werkzeug 0.11.2

In FileSystemCache, the set() method creates a temporary file of the new cache entry and then replaces the old file with it. Under Windows, if get() is called from another thread at the same time as set's file replacement (implemented by werkzeug.posixemulation.rename), a "permission denied" IOError can occur. It looks like werkzeug.posixemulation.rename does its best to try to implement an atomic-like rename/replace for Windows, but apparently there's still a short moment where this can happen.

Maybe it's preferable (if technically feasible) to make werkzeug.posixemulation.rename 100% atomic. But for what it's worth, I made the following change for myself as a workaround, in werkzeug/contrib/cache.py, line 737 (in the latest version). This is the original version:

def get(self, key):
    filename = self._get_filename(key)
    try:
        with open(filename, 'rb') as f:
            pickle_time = pickle.load(f)
            if pickle_time == 0 or pickle_time >= time():
                return pickle.load(f)
            else:
                os.remove(filename)
                return None
    except (IOError, OSError, pickle.PickleError):
        return None

In my workaround I wrapped it in a loop that catches IOError exceptions and just pauses a moment (needs sleep() imported from time):

def get(self, key):
    filename = self._get_filename(key)
    try:
        for retry in range(20):
            try:
                with open(filename, 'rb') as f:
                    pickle_time = pickle.load(f)
                    if pickle_time == 0 or pickle_time >= time():
                        return pickle.load(f)
                    else:
                        os.remove(filename)
                        return None
            except IOError:
                sleep(0.001)
        return None
    except (OSError, pickle.PickleError):
        return None

Whether or not this is a good approach, if you need tests or anything else, I can help with that later.

I'm not sure what a good number of max retries would be. I arbitrarily used 20 (with a 1ms pause). When the error occurs, it usually only needs 1 retry, and the most I've seen (in my limited testing) was 2.

Side note: I noticed that in set(), it sets the new file's mode after renaming. So at first I thought just setting the file mode of the temp file before the rename/replace would fix all this. But that didn't work. So it does seem to be a problem with rename() itself.

@untitaker
Copy link
Contributor

werkzeug.posixemulation.rename IS atomic. Wrapping get in a loop is not a solution. TBH I don't see the race here.

@ecpost
Copy link
Author

ecpost commented Feb 5, 2016

I agree that my workaround (as I called it) is not a good solution. I was reluctant to even mention it. I'm just not familiar enough with the Windows filesystem to say if a proper solution is feasible.

It can be difficult to see race conditions just looking at code (especially if some of that code is internal to Windows). I think the script I just left HERE demonstrates that werkzeug.posixemulation.rename is not atomic. It repeatedly calls rename in one thread, and reads from the rename target file in the other thread. It intermittently outputs "Permission denied" exceptions from the thread doing the reading.

I've run it on 3 different boxes using python 2.7.10 on 64 bit Windows Server 2012, 2.7.1 on 32 bit Windows Server 2003, and 2.7.2 on 64 bit Windows Server 2008, all with the same intermittent exceptions.

@untitaker
Copy link
Contributor

Sorry, I somehow missed that you're running on Windows. I can reproduce this problem on Windows 7.

rename is MoveFileEx, which is described as "usually atomic, but don't rely on it": https://msdn.microsoft.com/en-us/library/windows/desktop/aa365240%28v=vs.85%29.aspx

ISTM that Windows is just acquiring a lock on the target name instead of actually making an atomic operation. That would explain the "permission denied" error: Acquisition of the lock failed. However, this is just pure speculation.

Unfortunately MoveFileEx seems to be the best way to get atomic file renaming on Windows. There are more system calls out there that are supposed to do similar things, but every atomic file library/implementation I've seen uses MoveFileEx under the hood.

We could include your workaround in Filesystemcache.get, but completely disable it on POSIX, and only conservatively enable it for Windows. However, we're talking about a caching interface here, and this race condition just results in a cache miss. It shouldn't crash your system in any way. Is it really worth fixing?

@ecpost
Copy link
Author

ecpost commented Feb 5, 2016

Yeah, it results in a cache miss, not a crash. And ultimately, this caused a problem in the Flask-Session extension. Maybe I should bring it up over there instead. Its 'filesystem' session type is backed by FileSystemCache. (If two threads from two browser connections pull up the session at the same time, and one has a cache miss, that empty session gets re-saved, logging the user out.)

Not to say that sessions should have guaranteed persistence. But getting logged out randomly isn't good.

Anyway, I can close this if you don't think it's worth addressing here at the cache level.

@lepture
Copy link
Contributor

lepture commented Feb 4, 2018

close it via #1249

@lepture lepture closed this as completed Feb 4, 2018
@pallets pallets deleted a comment from usingnamespaceSystem Jan 24, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants