You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The JSONFileCache class __getitem__ accessor tries to load JSON from a file, but does not handle syntax errors in that file, raising an exception that bubbles up to the client application.
Given this is a cache, a failure to read from the cache should be treated as a cache miss, not an error.
Expected Behavior
In botocore/credentials.py, in the CachedCredentialFetcher class, I expected _load_from_cache() to return None if the cache was invalid, causing credentials to be re-fetched and the cache to be rebuilt.
Current Behavior
I found this while using awscli, which consumes botocore. When a credentials cache file was malformed, every command using that particular AWS account failed super cryptically, printing the bad file's cache key to STDERR:
$ python -m venv json-cache-repro
$ . json-cache-repro/bin/activate
(json-cache-repro) $ pip install botocore
(json-cache-repro) $ python3 json-cache-repro.py
Traceback (most recent call last):
File "/Users/me/json-cache-repro/lib/python3.11/site-packages/botocore/utils.py", line 3518, in __getitem__
return json.load(f)
^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 293, in load
returnloads(fp.read(),
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/me/json-cache-repro.py", line 17, in<module>fetcher.fetch_credentials()
File "/Users/me/json-cache-repro/lib/python3.11/site-packages/botocore/credentials.py", line 685, in fetch_credentials
returnself._get_cached_credentials()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/me/json-cache-repro/lib/python3.11/site-packages/botocore/credentials.py", line 693, in _get_cached_credentials
response = self._load_from_cache()
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/me/json-cache-repro/lib/python3.11/site-packages/botocore/credentials.py", line 711, in _load_from_cache
creds = deepcopy(self._cache[self._cache_key])
~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/Users/me/json-cache-repro/lib/python3.11/site-packages/botocore/utils.py", line 3520, in __getitem__
raise KeyError(cache_key)
KeyError: 'foo-botocore-test'
Possible Solution
Workaround
Delete the affected cache file, or delete all files in the cache dir (e.g. ~/.aws/cli/cache/).
Fix
This is tricky. JSONFileCache doesn't know what it's storing, and CachedCredentialFetcher doesn't know what kind of cache it's using. CachedCredentialFetcher assumes that if a key exists in the cache, it's valid to read, which is incorrect for JSONFileCache.
Suggestion:
In JSONFileCache#__getitem__, catch (OSError, ValueError, json.decoder.JSONDecodeError) and re-raise any of them as a new CacheMissError.
In CachedCredentialFetcher#_load_from_cache, catch CacheMissError and return None. This will hopefully trigger a credential refresh that will overwrite the invalid cache file.
Triage
If the maintainers prefer to raise exceptions here (maybe there's a lot of extra work to change behavior), in the meantime, the exception message in JSONFileCache#__getitem__ could be improved to suggest manual cleanup:
try:
withopen(actual_key) asf:
returnjson.load(f)
except (OSError, ValueError, json.decoder.JSONDecodeError):
raiseKeyError(f'Cache file {actual_key} may be corrupt, please delete it.')
Additional Information/Context
It took me most of a day to track this down when it started causing awscli errors. I initially thought it was an AWS permissions issue, or a problem with my local credentials, or corrupted awscli configuration, or my environment, or something a reboot might fix. None of my teammates could reproduce it.
I was disappointed to find out it was a cache error!
Thanks for your consideration.
SDK version used
v1.34.28 (also observed in v2.0.0dev155, installed as awscli v2.15.13 dependency via homebrew)
Environment details (OS name and version, etc.)
MacOS 14.2.1 (arm64)
The text was updated successfully, but these errors were encountered:
Thinking about it last night, I also believe the current code will never heal a corrupted cache file on its own. With the current strategy, a cache entry is only updated if it doesn't exist yet (cache miss) or the data can be parsed and evaluated for expiration. A cache file that can't be read or parsed will break the library forever unless there's manual intervention to clear the cache on disk, and that intervention is less likely because the exceptions raised by the cache class don't contain enough information to suggest a solution.
Thanks for reaching out and for your patience here. I could reproduce the errors you referenced using your code snippet. I will plan to bring this up with the team for discussion.
tim-finnigan
added
needs-review
This issue or pull request needs review from a core team member.
p2
This is a standard priority issue
and removed
investigating
This issue is being investigated and/or work is in progress to resolve the issue.
needs-triage
This issue or PR still needs to be triaged.
labels
May 20, 2024
After discussing with the team, we think that there may be some opportunities to improve the behavior here. I created a PR with the error message you suggested for the team to consider: #3183. Further investigation may still be needed in terms of when the error occurs in Botocore/the AWS CLI.
Describe the bug
The
JSONFileCache
class__getitem__
accessor tries to load JSON from a file, but does not handle syntax errors in that file, raising an exception that bubbles up to the client application.Given this is a cache, a failure to read from the cache should be treated as a cache miss, not an error.
Expected Behavior
In
botocore/credentials.py
, in theCachedCredentialFetcher
class, I expected_load_from_cache()
to returnNone
if the cache was invalid, causing credentials to be re-fetched and the cache to be rebuilt.Current Behavior
I found this while using
awscli
, which consumesbotocore
. When a credentials cache file was malformed, every command using that particular AWS account failed super cryptically, printing the bad file's cache key to STDERR:Using a debugger, I located the failure at
utils.py:3518
, whenbotocore
failed to parse a cache file.Cache file contents (prettified here):
(The bit at the bottom,
: true, "RetryAttempts": 2}}
invalidates the JSON.)Reproduction Steps
Here's a minimal reproduction:
Example run:
Possible Solution
Workaround
Delete the affected cache file, or delete all files in the cache dir (e.g.
~/.aws/cli/cache/
).Fix
This is tricky.
JSONFileCache
doesn't know what it's storing, andCachedCredentialFetcher
doesn't know what kind of cache it's using.CachedCredentialFetcher
assumes that if a key exists in the cache, it's valid to read, which is incorrect forJSONFileCache
.Suggestion:
JSONFileCache#__getitem__
, catch(OSError, ValueError, json.decoder.JSONDecodeError)
and re-raise any of them as a newCacheMissError
.CachedCredentialFetcher#_load_from_cache
, catchCacheMissError
and returnNone
. This will hopefully trigger a credential refresh that will overwrite the invalid cache file.Triage
If the maintainers prefer to raise exceptions here (maybe there's a lot of extra work to change behavior), in the meantime, the exception message in
JSONFileCache#__getitem__
could be improved to suggest manual cleanup:Additional Information/Context
It took me most of a day to track this down when it started causing
awscli
errors. I initially thought it was an AWS permissions issue, or a problem with my local credentials, or corruptedawscli
configuration, or my environment, or something a reboot might fix. None of my teammates could reproduce it.I was disappointed to find out it was a cache error!
Thanks for your consideration.
SDK version used
v1.34.28 (also observed in v2.0.0dev155, installed as awscli v2.15.13 dependency via homebrew)
Environment details (OS name and version, etc.)
MacOS 14.2.1 (arm64)
The text was updated successfully, but these errors were encountered: