Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

response.json() fails when while(1); is in response body #4454

Closed
jessemcbride opened this issue Jan 8, 2018 · 4 comments
Closed

response.json() fails when while(1); is in response body #4454

jessemcbride opened this issue Jan 8, 2018 · 4 comments

Comments

@jessemcbride
Copy link

jessemcbride commented Jan 8, 2018

In an effort to prevent JSON hijacking, some providers append while(1); before their JSON response. This doesn't appear to be filtered out by requests automatically, and the end result is a ValueError.

Expected Result

A JSON response of this form:

while(1);{"some": "json"}

should properly translate to a Python dictionary when calling response.json():

{
     "some": "json"
}

Actual Result

response.json() results in an exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 892, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

Reproduction Steps

import requests

broken = requests.get('https://gist.githubusercontent.com/jessemcbride/3306f72e92f08deff1e32c3df5c7b7c8/raw/621e5e94040486004b3b8e22157f2fb5d73a3686/test')

broken.json() # throws an exception

working = requests.get('https://gist.githubusercontent.com/jessemcbride/cc091ea99560b9b5c13632d19fefaa75/raw/1fc92d404c6f9113d83d6fdee00e4a59410a7032/test2')

working.json() # renders properly

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.6"
  },
  "implementation": {
    "name": "CPython",
    "version": "2.7.13"
  },
  "platform": {
    "release": "16.7.0",
    "system": "Darwin"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.18.4"
  },
  "system_ssl": {
    "version": "100020ef"
  },
  "urllib3": {
    "version": "1.22"
  },
  "using_pyopenssl": false
}

FWIW, I'm not totally convinced that requests should be responsible for fixing this. I'd like to open that up for discussion, though, as we were surprised by the behavior.

@sethmlarson
Copy link
Member

There's a lot of different anti-hijacking methods for instance I've heard of for(;;) being used as well.

@nateprewitt
Copy link
Member

Hey @jessemcbride, thanks for starting the conversation on this. I was under the impression we’d discussed something along these lines before but can’t find a ticket for reference. My general thoughts are that json() is intended to only be used on json response bodies. When while(1); is added, it’s no longer valid json and if it claims otherwise that’s incorrect.

That said, if this is a common case for you, it’s pretty easy to handle. You can do something like:

r = requests.get(url)
json_content = json.loads(r.content.strip(‘while(1);’))

Since this is easily implemented by the user, and as @SethMichaelLarson noted there’s more than just while(1); to worry about, I think it’s unlikely we’ll do anything in Requests to handle this. Hopefully that answers your questions about the behavior here.

@jessemcbride
Copy link
Author

jessemcbride commented Jan 9, 2018

@nateprewitt I think this might be the one you're looking for? #4291 (comment)

In any case, I agree with your assessment. There are too many edge cases. It's a bummer that JavaScript presents such a vulnerability in the first place :(.

Thank you for your feedback. We'll go ahead and manually patch our code to handle this. Before closing this, though, I wonder if there might be some way to document this as the intended behavior? A section on manually parsing tricky responses might do it. I'd be happy to submit a PR.

@nateprewitt
Copy link
Member

@jessemcbride It looks like our intro docs and the method's docstring specify that json() will yield a ValueError on responses without a json body.

From some quick digging, it looks like Google and Facebook implement the stops as )]}' and for(;;) respectively. They both however return the content as text/javascript, which means even if it were a valid JSON document, it shouldn't be treated that way. At least not initially.

Unless another maintainer feels strongly this would be a needed addition, this is probably a better topic for a blog or simply a comment in this issue. Thanks again for getting the discussion going though!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants