New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MechanicalSoup needs an interface to set raw cookie strings #360
Comments
Great question, thanks for asking! The Requests session has a few methods for cookiejar manipulation, and I could definitely envision adding a |
If only they were sufficient.
They would not. Both of those take an already-parsed broken-down cookie and add it to the cookie jar: session.cookies.set("COOKIE_NAME", "the cookie works", domain="example.com") But that's not what I have. I have an un-parsed raw cookie string, because that's what the Javascript Requests does not appear to have an interface for such cookies — it leverages Python's urllib and urllib and that's why I am calling their methods. But I definitely do not love what I came up with above. If someone can point me at a better solution, ideally one that does not involve creating a fake request, that would be great! |
Thanks for the clarification. Hmm, what about something like this? from http.cookies import SimpleCookie
cookie_jar = browser.get_cookiejar()
for raw_cookie in raw_cookies_from_js:
cookie = SimpleCookie(raw_cookie)
cookie_jar.set_cookie(cookie) |
cookie = SimpleCookie(raw_cookie) If only! A similar problem. The I spent several hours trying to figure out how these interfaces (SimpleCookie, Morsels, http.cookies, etc., etc.) are supposed to work and failed to come up with anything useful. It seems needlessly complex and yet also unusable. Here's how that plays out: >>> raw_cookie="blah=value; path=/; domain=.foobar.com"
>>> from http.cookies import SimpleCookie
>>> cookie_jar = browser.get_cookiejar()
>>> cookie = SimpleCookie(raw_cookie)
>>> cookie
<SimpleCookie: blah='value'>
>>> cookie_jar.set_cookie(cookie)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.9/site-packages/requests/cookies.py", line 344, in set_cookie
if hasattr(cookie.value, 'startswith') and cookie.value.startswith('"') and cookie.value.endswith('"'):
AttributeError: 'SimpleCookie' object has no attribute 'value'
>>> cookie_jar
<RequestsCookieJar[]>
>>> |
Thanks for your follow-up. I guess I was naive in thinking that such a straightforward solution would be possible! It does seem frustrating, but I think that having a hack to make a fake request in MechanicalSoup would be equally unappealing. I don't have the expertise to know if this is a fundamental limitation of how cookies work, or if it's just a poor implementation in urllib. Maybe you could get more visibility of the issue by escalating it to Requests? |
Having an ugly hack in MechanicalSoup is unappealing, but not as unappealing as having to have it in every users' code that has to deal with this problem, I think. I am definitely not convinced that my solution is Correct or even the cleanest. It was the best I could come up with. If I don't get anywhere I'll ask with Requests. I suspect, though, that I will get a response suggesting that it's not a good idea to encourage the use of raw cookie strings, since effectively bypass controls on who can write what cookies, and you wouldn't want Site X to be able to control Site Y's cookies. The rejoinder here is that of course, when matching the brower's JS API, this is the hand we are dealt. But it may be that it's the kind of thing one doesn't encounter for users of Requests who are not dealing with MechanicalSoup. It belatedly occurs to me to look at what Mechanize does: cookiejar = self._ua_handlers["_cookies"].cookiejar
response = self.response() # copy
headers = response.info()
headers["Set-cookie"] = cookie_string
cookiejar.extract_cookies(response, self.request) Hrmm. That does seem cleaner, assuming it translates properly. This time it involves a fake response rather than a fake request. |
Hello everyone!
|
@ism, that ends up no differently than above in #360 (comment), where we got just as far, setting and examining the
So, once we've done that: >>> cookie_jar = browser.get_cookiejar()
>>> cookie_jar.set_cookie(cookie)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.9/site-packages/requests/cookies.py", line 344, in set_cookie
if hasattr(cookie.value, 'startswith') and cookie.value.startswith('"') and cookie.value.endswith('"'):
AttributeError: 'SimpleCookie' object has no attribute 'value' If you keep reading the accepted answer to that SO post, after the part you quoted, it gets a bit hokey: # Even though SimpleCookie is dictionary-like, it internally uses a Morsel object
# which is incompatible with requests. Manually construct a dictionary instead.
cookies = {}
for key, morsel in cookie.items():
cookies[key] = morsel.value That, of course, gives you a dictionary that is missing much of the parameters from our test example: >>> cookies
{'blah': 'value'} So, yeah, you could use it: >>> browser.set_cookiejar(cookies)
>>> (setting aside the fact that doing so in that fashion would lose any existing cookies in the cookie jar), but the whole point of using So, in sum: no, that doesn't solve the problem. |
Sorry, it was my bad, i misread you couldn't parse raw cookie. Now, back to topic:
I don't understand why do MechanicalSoup have to keep and parse all that info? :) After all what a client is sending to webserver is name/value tuple, removing everything else. From my experience, if additional cookie is needed i just pass it to request as additional header
As another workaround is to pass your own request Session to browser constructor and have access to all underlying transport variables you can modify. (like you may add a cookie to session.cookies without losing the rest) To recap, i don't think MechanicalSoup need any extra interface to handle such specific usecase as parsing extra cookie information and pass it to requests. Cheers! |
The additional information adds constraints. It adds, for instance, a time after which you should not send the cookie. It adds path constraints, so if the cookie is not sent to all endpoints on a given web server. It adds domain constraints, so if you're dealing with multiple servers, there is a way to determine which cookies should go to which servers. And perhaps importantly, it adds the All of these things are important to mimicing what a first-class browser does. And where cookies are used for authentication, as they often are, it's important to get them right.
See above. These are controls that determine when it is appropriate to send the cookie.
I don't understand what problem this is trying to solve. Another workaround for what, precisely?
Well, if you need to parse raw cookie strings, you need such an interface. |
Mechanize has an interface to create cookies from raw cookie strings,
browser.set_cookie(cookie_string)
, https://mechanize.readthedocs.io/en/latest/browser_api.html#mechanize.Browser.set_cookie, and MechanicalSoup does not.Not having the interface makes it really painful to do this.
My situation is I have to deal with JavaScript that uses
document.cookie="blah=value; path=/; domain=.foobar.com"
.With Mechanize I just extracted the cookie strings from the JS with a regexp and called
browser.set_cookie()
on each one.By contrast, this is the cleanest I could come up with using MechanicalSoup, and it involves using an undocumented private interface to
http.cookiejar
and is still far hairier than I would like. But manual parsing of the cookie string to cons up a "cookie" object to pass tocookie_jar.set_cookie()
would be even worse.Here's my code, with the javascript parsing omitted:
The text was updated successfully, but these errors were encountered: