Browser.add_soup method forces to load chunked response into memory #288

Open
spider69 opened this issue Jul 3, 2019 · 3 comments

Comments


spider69 commented Jul 3, 2019

I want to make a GET request, receive a chunked response, and parse each chunk's body to extract information from it. But at the end of the GET request the following chain is invoked: add_soup(response) -> Browser.__looks_like_html(response) -> response.text, which forces all chunks to be loaded into memory. Is the add_soup call required when the request is made with "stream=True"?

Collaborator

moy commented Jul 3, 2019

We never really tested this use case, so you should expect issues if you need to stream content that does not fit in memory. Clearly, you won't be able to use BeautifulSoup-related features if you can't load the page into memory, but other potential issues should be fixable, so patches are welcome. I won't have time to implement this myself any time soon, but I can help if you want to work on a patch.
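
To make the problem concrete: the chain described above ends in response.text, which reads the whole body. One possible direction for a patch, shown here only as a rough sketch, is to decide whether to build a soup from the Content-Type header alone. This is not MechanicalSoup's actual code or an agreed design; `should_parse_as_html` and `HeaderOnlyResponse` are hypothetical names introduced for illustration.

```python
def should_parse_as_html(response):
    """Decide from the headers alone whether the response looks like HTML.

    Hypothetical helper: it never touches response.text or
    response.content, so a streamed body is left unread.
    """
    content_type = response.headers.get("Content-Type", "")
    return "text/html" in content_type.lower()


class HeaderOnlyResponse:
    """Stand-in for a streamed response; reading the body is an error."""

    def __init__(self, headers):
        self.headers = headers

    @property
    def text(self):
        raise AssertionError("body was read")


html = HeaderOnlyResponse({"Content-Type": "text/html; charset=utf-8"})
binary = HeaderOnlyResponse({"Content-Type": "application/octet-stream"})
print(should_parse_as_html(html), should_parse_as_html(binary))
```

The trade-off is that an HTML page served without a suitable Content-Type header would no longer be detected, so a real patch would probably apply a check like this only when the caller asked for stream=True.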


yrro commented Nov 14, 2019

I think I've run into this while trying to POST a form, follow the 302 redirect and then GET the result.

It's not too bad to dip into Browser's internals with:

form = browser.get_current_form().form
# Here we would like to call browser.submit_selected(update_state=False, stream=True),
# but MechanicalSoup does not let us stream the response.
# <https://github.com/MechanicalSoup/MechanicalSoup/issues/288>
response = browser._request(form, browser.get_url(), stream=True)

... and then streaming the content by accessing response.iter_content.
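
A minimal sketch of that last step, assuming the response object exposes the requests-style iter_content(chunk_size=...) method. `save_stream` and `FakeResponse` are hypothetical names; `FakeResponse` only stands in for a real streamed response so the snippet runs without a network connection.

```python
import io


def save_stream(response, out, chunk_size=8192):
    """Copy a streamed response body to a binary file-like object.

    Only one chunk is held in memory at a time.
    Returns the number of bytes written.
    """
    total = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty keep-alive chunks
            out.write(chunk)
            total += len(chunk)
    return total


class FakeResponse:
    """Stand-in that mimics iter_content() on a streamed response."""

    def __init__(self, body):
        self._body = body

    def iter_content(self, chunk_size=1):
        for start in range(0, len(self._body), chunk_size):
            yield self._body[start:start + chunk_size]


buffer = io.BytesIO()
written = save_stream(FakeResponse(b"abcdefghij"), buffer, chunk_size=4)
print(written, buffer.getvalue())
```

With the workaround above, the same helper could be called as save_stream(response, f) inside a `with open(path, "wb") as f:` block, where `path` is whatever destination you choose.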

Contributor

sbraz commented Jan 29, 2021

Thanks for the workaround. It would be really helpful if this were documented somewhere. I think there are many cases where submitting a form returns a file to download that we do not want to load into memory.
