Auto-resumed failed downloads #2006
Conversation
In proxy v2, all requests go through the proxy, even in development mode, so there's only one code path to implement. When downloading files, we set … (see securedrop-client/client/securedrop_client/sdk/__init__.py, lines 127 to 186, and securedrop-client/proxy/src/main.rs, lines 93 to 111, both at a93ca77).
Yep, that seems like a good idea. Test data is currently generated by the loaddata.py script, so adding it would go there.
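For illustration, a helper along these lines could be added (hypothetical; the actual loaddata.py structure may differ):

```python
import os

def make_large_submission(path: str, size: int = 200 * 1024) -> None:
    """Write `size` bytes of random binary data to use as a test upload."""
    with open(path, "wb") as f:
        f.write(os.urandom(size))

# A ~200KB binary file is large enough to span multiple 1024-byte chunks.
make_large_submission("/tmp/test-submission.bin")
```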
I've rebased from the proxy-rusting branch (#1718), so the commits are starting to look messy. I've started to refactor the API code to support streaming, but I'm hitting an issue with using the new VCRAPI. Specifically, in order for streaming responses to actually work, I've changed the type:

```python
from dataclasses import dataclass
from io import BufferedReader

@dataclass(frozen=True)
class StreamedResponse:
    """Container for streamed data along with the filename and ETag checksum sent by the server."""

    contents: BufferedReader
    sha256sum: str
    filename: str
    content_length: int
```

And I also made it so that when … The problem I'm hitting with VCRAPI is ultimately:
the VCR is trying to pickle a StreamedResponse, but since it contains the stdout of a running process, it just can't be done. I'm not sure of the best way to proceed. Since I'll be reviewing #1718 soon, and that's where VCRAPI was implemented, I think I may point this out there instead and possibly request changes.
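To see why this can't work, here's a minimal reproduction (not code from the PR): pipes attached to a live subprocess are OS-level handles, and pickle refuses them outright.

```python
import pickle
import subprocess

proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
try:
    pickle.dumps(proc.stdout)
except TypeError as e:
    print(e)  # cannot pickle '_io.BufferedReader' object
finally:
    proc.kill()
```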
The idea is that both StreamedResponse and JSONResponse are serializable, so we can save them in the YAML files for VCR. Do we need to stream the bytes through …?
Ahh okay, this makes a lot of sense. I'll try this out.
Actually, there's an issue with …
Does the client ever download multiple submissions simultaneously? I could see multiple 500MB submissions causing a DoS on SDW by exhausting memory. It might be a good idea to talk this through and figure out exactly how it should work.
I think downloads happen in series, but I agree with your general point that we should buffer downloads. And we have two use cases: we want to serialize the raw bytes, but also keep them out of memory and on disk. Maybe we could do something like: …

What do you think?
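One shape the buffering could take, as a minimal illustrative sketch (not a proposal from this thread): write chunks straight to a temporary file so a 500MB submission never has to fit in memory.

```python
import tempfile
from typing import BinaryIO, Iterable

def buffer_to_disk(chunks: Iterable[bytes]) -> BinaryIO:
    """Spool an iterable of byte chunks to a temp file, rewound for reading."""
    fobj = tempfile.NamedTemporaryFile(mode="w+b")  # deleted when closed
    for chunk in chunks:
        fobj.write(chunk)
    fobj.seek(0)
    return fobj
```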
Force-pushed from c1ce17f to d6f2fd1.
My earlier rebase was extremely messy, so since proxy v2 was merged into main, I cleaned up all of my commits (including writing detailed commit messages) and force pushed. This PR is now ready for review. Here are some things to consider:

I get … As we discussed above, the streaming and retrying now happen directly in …, and for now the streamed contents are still read back into memory:

```python
# FIXME: For now, store the contents as bytes
fobj.seek(0)
contents = fobj.read()
fobj.close()
```

I didn't mess with the VCR cassette code to try to serialize file objects with … Do you think we should change it in this PR, or punt that for the future?

This also includes two new tests: …
Yay! Left comments - let me know if it would be helpful to do a pairing session (sync or async) on any of the feedback.
I think it would also be good to split the retry loop into its own function, so we don't end up with super-indented code and a 150-line function :) (rough sketch below)
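For example, the refactor could take roughly this shape; all the helper names here (`_build_request`, `_stream_request`, `_bytes_written_so_far`) are invented for illustration, and `RequestTimeoutError` stands in for the SDK's timeout exception:

```python
class API:
    RETRY_LIMIT = 3

    def download_submission(self, submission) -> "StreamedResponse":
        # Keep the public method thin and delegate the retry machinery.
        return self._download_with_retries(self._build_request(submission))

    def _download_with_retries(self, request) -> "StreamedResponse":
        """Attempt the download up to RETRY_LIMIT times, resuming on timeout."""
        offset = 0  # bytes already written to disk so far
        for _attempt in range(self.RETRY_LIMIT):
            try:
                # Hypothetical helper: sends a Range header when offset > 0.
                return self._stream_request(request, offset=offset)
            except RequestTimeoutError:
                offset = self._bytes_written_so_far()
        raise RequestTimeoutError()
```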
On this hunk:

```python
# Check for errors
if returncode != 0:
    try:
        # ...
```
I don't think these errors should raise right away; they should get retried too.
I think this got missed - these errors should retry as well. Let's do this in a follow-up.
Hm, no wait, you added c3074c6. Let me debug why I saw an error a bit more...
@legoktm this should address everything. Here are the main changes: …
Force-pushed from beea866 to 2dc6d67.
A few comments. In dev testing it looks good, but in Qubes/SDW I'm hitting errors with the initial sync failing; let me debug that further.
Alright, I think this is ready! I got the SDK tests to pass with this: 1a103a7
Yay, it looks good to me as well. I'll rebase and then do one more round of testing on Qubes before merging.
I set up a server with 50MB files, set 10 retries via QubesDB, started a large download, and then …
Force-pushed from 1ee3538 to c4569c3.
I figured out what was going on, and 1ee3538 fixes it now. I added a bunch of debug logs, ran it in sd-app, tried downloading a large file over Tor, and killed securedrop-proxy in the middle of the download. I discovered it was actually failing at this exception (securedrop-client/client/securedrop_client/sdk/__init__.py, lines 236 to 240 at 66f2984), during the loop that reads chunks from the … (securedrop-client/client/securedrop_client/sdk/__init__.py, lines 174 to 181 at 66f2984).

It turns out, when the securedrop-proxy process is killed, … I fixed it by checking the process return code when it's done executing. If it's non-zero, I set …

Also, I moved the block of code that reads the contents of the streaming file from disk into memory to below this return-code check (securedrop-client/client/securedrop_client/sdk/__init__.py, lines 186 to 189 at 66f2984). Since fobj is a tempfile, when …

This commit adds several new debug logs, but they will only get logged if you run securedrop-client like this: …

Also, when testing this, I actually killed the securedrop-proxy process twice for the same download, and watched it retry twice and still finish successfully.
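In outline, the fix described above looks something like this; a paraphrase with hypothetical names, not the PR's exact code:

```python
import subprocess
from typing import IO, Optional

def finish_download(proc: subprocess.Popen, fobj: IO[bytes]) -> Optional[bytes]:
    """Return the downloaded bytes, or None if the caller should retry."""
    if proc.wait() != 0:
        # securedrop-proxy died mid-stream (e.g. it was killed): signal the
        # retry loop instead of returning a silently truncated file.
        return None
    # Only read the tempfile back into memory once the download is
    # known-complete; the tempfile is deleted as soon as it's closed.
    fobj.seek(0)
    contents = fobj.read()
    fobj.close()
    return contents
```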
Rebased to clear the safety alert...
Force-pushed from b1f8956 to 10a0951.
The PR looks good! I successfully downloaded a ~50MB file over 3 requests. On the server side, I saw it responding with HTTP 206 (Partial Content). I pushed one commit to make the value in QubesDB optional; thanks to Cory for reviewing. I'm going to do a partial squash and then merge this (once tests pass).
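For context, the resume behavior is plain HTTP range semantics; a client-side exchange looks like this (illustrative URL, not a real endpoint):

```python
import requests

# Suppose the first attempt wrote 1024 bytes before dying; resume from there.
resp = requests.get(
    "https://example.org/submission.gz",
    headers={"Range": "bytes=1024-"},
    stream=True,
    timeout=30,
)
assert resp.status_code == 206  # Partial Content: the server honored the Range
```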
Force-pushed from 88acf0a to 8252312.
A new _streaming_download function runs a loop that retries failed downloads, using a configurable value from QubesDB/environment (default 3). Files are streamed to disk in a temporary file and then loaded into memory (a FIXME exists for that) to return a StreamedResponse with the contents as bytes. The test works by mocking subprocess.Popen: the first time it uses Popen to download a file, it returns 200 bytes and then raises subprocess.TimeoutExpired; subsequent times it uses the real, non-stubbed Popen with range requests. test_download_submission_autoresume_fail makes sure that in the event of a timeout that doesn't auto-resume, a RequestTimeoutError is ultimately thrown. We also generate bigger random files for tests using LOADDATA_ARGS to effectively test this. `__reduce__` methods were added to the custom exceptions because otherwise an exception was thrown when VCR tried to deepcopy them.
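The `__reduce__` fix works roughly like this (a sketch, not the SDK's exact code): deepcopy rebuilds an exception from `(class, self.args)`, which breaks when `__init__` doesn't accept the stored message, so `__reduce__` overrides the recipe.

```python
import copy

class RequestTimeoutError(Exception):
    def __init__(self) -> None:
        super().__init__("The request timed out.")

    def __reduce__(self):
        # Rebuild by calling the class with no arguments, instead of the
        # default (class, self.args), which would pass the message to a
        # zero-argument __init__ and raise TypeError.
        return (RequestTimeoutError, ())

copy.deepcopy(RequestTimeoutError())  # works; without __reduce__ this raises
```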
Force-pushed from ec045cc to 6d56025.
Currently QubesDB lookups are required, while environment lookups are optional. For fields with a default value, the QubesDB lookup is no longer required. Ideally we'd also make the environment lookups mandatory if the field is required (`journalist_key_fingerprint`), but this causes a number of issues with tests and should be done separately. We were relying on `gpg_domain` being optional in env but not in QubesDB; it's now explicitly optional at the type level. Fixes #2041.
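Conceptually, the lookup precedence works like this; a hypothetical helper, not the actual config module:

```python
import os
from typing import Optional

def lookup_config(name: str, qubesdb: dict, default: Optional[str] = None) -> Optional[str]:
    """Prefer QubesDB, then the environment, then the field's default.

    Fields with a default (like the retry count) no longer hard-require a
    QubesDB entry, and optional fields such as gpg_domain may legitimately
    resolve to None.
    """
    value = qubesdb.get(name)
    if value is None:
        value = os.environ.get(name.upper())
    return value if value is not None else default
```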
Okay, finally got the tests to pass. LGTM, thanks @micahflee!
@cfm, if you have time to look at 0b40907 again (I've changed it since the first time), then I think we can land this.
Status
Ready for review
Description
Will fix #1994.
I've primarily edited `API.download_submission`. There's a big code block that only runs if `self.proxy` is False, and if the proxy is in use it doesn't do anything yet. I'll still need to make this work with proxy v2.

The change works like this: the output file is opened for writing at the beginning, and then there's a loop that continues until the download is complete, or it has been retried 3 times. (I have a comment to make `RETRY_LIMIT` configurable, but it's hard-coded right now.)

Inside this loop, it tries to download the submission by sending the correct request to `self._send_json_request`. I've made it so `stream=True` is passed to `requests.request` on the download requests, which means the `requests.request` call returns immediately, but the content of the request is later streamed via `data.iter_content`. After `_send_json_request` returns, it streams a chunk at a time, writing that chunk to disk and keeping track of how many bytes have been written in total.

If there's a `ReadTimeout`, `ConnectTimeout`, `ConnectionError`, or `TooManyRedirects` exception during the `data.iter_content` call (basically, if the download starts but then times out during one of the chunks), then it retries and the loop starts over. This time though, when it calls `self._send_json_request` it sends the appropriate `Range` header to just get the data from where it left off.
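Sketched out, the loop works roughly like this; a simplified standalone version of the approach, not the SDK's exact code:

```python
import requests

RETRY_LIMIT = 3  # TODO: make configurable
CHUNK_SIZE = 1024

def download_with_resume(url: str, path: str) -> None:
    bytes_written = 0
    for _attempt in range(RETRY_LIMIT):
        headers = {}
        if bytes_written:
            # On retry, only ask for the bytes we don't already have.
            headers["Range"] = f"bytes={bytes_written}-"
        try:
            data = requests.request("GET", url, headers=headers, stream=True, timeout=30)
            with open(path, "ab" if bytes_written else "wb") as f:
                for chunk in data.iter_content(chunk_size=CHUNK_SIZE):
                    f.write(chunk)
                    bytes_written += len(chunk)
            return  # completed without a timeout
        except (
            requests.exceptions.ReadTimeout,
            requests.exceptions.ConnectTimeout,
            requests.exceptions.ConnectionError,
            requests.exceptions.TooManyRedirects,
        ):
            continue  # loop starts over, resuming from bytes_written
    raise RuntimeError(f"download failed after {RETRY_LIMIT} attempts")
```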
I've written a new test for this, `TestAPI.test_download_submission_autoresume`. It works by stubbing `requests.request` with a fake version that, the first time it runs, makes the request time out after serving 200 bytes, and on subsequent runs serves the request as it should. So it should serve 200 bytes of the download, time out, and then resume with a range request and finish (a sketch of the stub follows below).

The test passes, and I've confirmed that the code actually works like this. But the problem is that the test files are too small. In `API.download_submission`, the chunk size is 1024 bytes, but the test file I'm downloading is 651 bytes, which means that currently the test actually serves the entire 651 bytes in a single chunk and then raises a `ReadTimeout`. It still works, but it would be more realistic if we had a larger file to test with.

I confirmed that it works as it's supposed to by changing the chunk size in `API.download_submission` from 1024 to 64 bytes: it sends 4 chunks (256 bytes) and then raises a `ReadTimeout` on the subsequent chunk. Then, on retry, it makes a range request and successfully finishes the download.
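The stub can be pictured like this; a simplified sketch of the approach, not the test's exact code:

```python
import requests

real_request = requests.request
state = {"first_call": True}

def stub_request(*args, **kwargs):
    """Wrap requests.request so the first download dies after ~200 bytes."""
    resp = real_request(*args, **kwargs)
    if state["first_call"]:
        state["first_call"] = False
        real_iter = resp.iter_content

        def dying_iter(chunk_size=1024):
            served = 0
            for chunk in real_iter(chunk_size=chunk_size):
                yield chunk
                served += len(chunk)
                if served >= 200:
                    break
            # Simulate the connection dropping mid-download.
            raise requests.exceptions.ReadTimeout("simulated timeout")

        resp.iter_content = dying_iter
    return resp

# e.g. monkeypatch.setattr(requests, "request", stub_request) in pytest
```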
Can we add test data to `securedrop` to include a bigger file? And also a binary file, like maybe a 200KB image or something, since all the file uploads now are text?

What's left:

- Make `RETRY_LIMIT` configurable