New requests.exceptions.HTTPError today while wget and browser can still fetch, old Python quirk maybe? #319

Open
4Dolio opened this issue Dec 21, 2023 · 16 comments


4Dolio commented Dec 21, 2023

I began getting the following error earlier today:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://share-service-download-bucket.s3.amazonaws.com/FooBar...ManyLinesLongURL...OfVideoFile

If I manually open that URL in a browser, or fetch it with wget, it works fine...

So something seems to have broken on the Python side in the past 24 hours. Two Raspberry Pis that run my fetch scripts are both doing this. I know just enough Python to have written a wrapper for my use case some years ago, but I'm not sure why Python can no longer fetch these when wget and the local browser can. Did some core certificates change, perhaps?

4Dolio commented Dec 21, 2023

If I take the URL that the exception is thrown for and attempt to fetch it with Python this way:

from urllib.request import urlretrieve

# The pre-signed S3 URL the exception was thrown for
url = "https://share-service-download-bucket.s3.amazonaws.com/SuperDuperLongLineWith_Algo_Creds_date_Expire_etc_etc"
filename = "20231220.202003-MyCamNames_Front_Step-7314897971670496449.mp4"
urlretrieve(url, filename)  # downloads the clip with no error

Then it also works perfectly fine... So something happened with whatever Python library or function is used to fetch the video clips, is my best guess... but I cannot quite find where that is...

4Dolio commented Dec 21, 2023

For the record, in case I lose track, I began to encounter this issue at precisely: Tue 19 Dec 2023 10:50:16 AM PST

Before that time it seems to have been fetching video clips properly with some custom archival code and loops, which I have been using for a few years now...

Maybe Ring or AWS is rejecting my user_agent, which is ChromeAtHome/20220801; I tried updating it to today's date, and also tried using nothing. But that part seems to be working fine: I still get all of the event IDs, then eliminate those I already have locally, then try to fetch any that I do not already have... same as the past few years. A second system at home yields the same results. And every other client, Chromium, wget, other Python URL-fetching libraries, can still download the videos... just this one is returning 404 for some reason, all of a sudden... so weird... any crazy ideas what's wrong?

4Dolio commented Dec 22, 2023

So, I resorted to upgrading to a brand new Raspbian 12 64-bit (bookworm); same error... However, this time it threw us a clue...

"The destination name is too long (715), reducing to 236"... never mind, that was from wget, which still works for the URLs that have been throwing the new error for the last two days...

ring_doorbell.exceptions.RingError: HTTP error with status code 404 during query of url

  File "/usr/local/lib/python3.11/dist-packages/ring_doorbell/ring.py", line 208, in _query
    response = self.auth.query(
               ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/ring_doorbell/auth.py", line 166, in query
    raise RingError(
ring_doorbell.exceptions.RingError:\
HTTP error with status code 404 during query of url \
https://api.ring.com/clients_api/dings/7315351631886103745/\
recording: 404 Client Error: Not Found for \
url: https://share-service-download-bucket.s3.amazonaws.com/\
16140672/7315351631886103745_stamp.mp4?\
Bunch_of_auth_stuff...Which_I_Presume_has_always_been_there...

Although, perhaps Ring has changed something about these URLs such that they no longer work with the python-ring-doorbell project?

Or perhaps the python-urllib3 library that I believe this project uses for the clip GET calls no longer works because Ring has changed something about the way these are fetched? Shrugs.

4Dolio commented Jan 12, 2024

Is no one else having this problem?

I have no idea why this behavior changed all of a sudden on me. The traceback dumps the 404 error and the URL it failed on, as described above, and if I just run "wget url_for_share-service-download-bucket.s3.amazonaws.com_blah_blah -O name_wanted_for_saved_file.mp4" then it works perfectly fine.

I think I could go so far as to write a further bash wrapper that scrapes off my intended filename, grabs this URL from the traceback, and then uses the pair to have wget do my video downloads... But that is getting a little bit insane, like a bad Inception nightmare...

4Dolio commented Jan 12, 2024

So, I resorted to bash to catch and filter out the FILE and the URL, so as to attempt them with wget from outside the Python. It appears that there is more latency than there once was: I would also get the 404 response with wget when I did not have the natural delay of manually copy/pasting the failed URL... By adding an extraordinary 50 seconds of sleep between the url = URL_RECORDING.format(recording_id) line and the req = self._ring.query(url, timeout=timeout) line (lines 382 and 385-ish in the doorbot.py file), I am now back to successfully downloading clips again.
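
That is, roughly this (in doorbot.py, assuming time is imported up top; a crude stopgap rather than a real fix):

url = URL_RECORDING.format(recording_id)
time.sleep(50)  # crude workaround: give S3 time to materialize the clip before the query
req = self._ring.query(url, timeout=timeout)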

Even with this 50 seconds of added sleep, download attempts do still fail: 30% of the time thus far, with a sample size of 3. Manually retrying the third was successful, and now a fourth needed more than 50 seconds. I'm going to bump this up to 90 seconds and see if I can get fewer failures. So it appears the above attempt needs to explicitly catch the 404 response code and then do escalating delays and retries, such as sleeping for 15, 30, 60, 120, 240 seconds and then finally giving up. Or it could perhaps gain a pair of new variables, something like NotFoundRetryDelay of 5 seconds and NotFoundRetryAttempts of 12 (60 seconds) or 24 (2 minutes) worth of retrying. The goal should be to try a few more times, but not so rapidly that the S3 bucket thinks it is being DoSed...
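
Something like the following is what I have in mind. A rough sketch only; fetch_clip and the two knobs are hypothetical names, not the library's actual API:

import time
import requests

NOT_FOUND_RETRY_DELAY = 5      # hypothetical knob: seconds between retries
NOT_FOUND_RETRY_ATTEMPTS = 24  # hypothetical knob: about 2 minutes of retrying

def fetch_clip(url, filename):
    """Retry a pre-signed S3 clip URL that 404s until it becomes available."""
    for _ in range(NOT_FOUND_RETRY_ATTEMPTS):
        resp = requests.get(url, timeout=30)
        if resp.status_code == 404:    # clip not materialized yet; wait and retry
            time.sleep(NOT_FOUND_RETRY_DELAY)
            continue
        resp.raise_for_status()        # any other error is fatal
        with open(filename, "wb") as f:
            f.write(resp.content)
        return True
    return False                       # gave up; S3 never produced the clip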

Still got a 404 at 90 seconds, which succeeded when retried manually using wget, so I'm bumping up to 120 seconds. It might be that an initial attempt triggers S3 to actually make the target object available, so each retry might improve the odds that the object is ready to be downloaded?

Still failed at 120 but succeeded manually upon retry; increasing the delay to 150 seconds (2.5 minutes), which is absolutely ridiculous... If I am forced to wait multiple minutes between downloading each video, then my process will never keep up with archiving all of my videos like it has for the past several years...

So, I might be able to muddle through and add a new handler for 404 to get that function in the mentioned file to retry a few times after a delay, but I promise any such patch is bound to be ugly, so someone else should probably pick it up from here.

I find it difficult to believe that other people are not encountering this problem.

4Dolio commented Jan 12, 2024

The bash-flavored exception handler in all its glory. I had been using this simply to keep my session alive since late December, now adapted into retrying the failed 404 attempts:

root@pi6:~# while :; do
  for x in {0..8}; do echo; done
  date
  ./RingFetch.py $con $pas $RR $DP 2>&1 |
    grep "Fetchin\|requests.exceptions.HTTPError" | tee fetch.log
  URL=$(cat fetch.log | strings | tail -n1 |
    sed 's%requests.exceptions.HTTPError: 404 Client Error: Not Found for url: %%')
  FILE=$(echo $(cat fetch.log | strings | tail -n3 | head -n1 |
    sed 's%\[94m%%;s%Fetchin %%;s% Fetch Traceback (most recent call last):%%'))
  sleep 60    # give S3 time before the manual wget retry
  wget "$URL" -O $FILE
  sleep $(( 1 * 6 ))
done    # loop and manually attempt whenever a 404 gets thrown

I normally run a bash alias that calls this custom .py of mine and progressively increases the queue depth, starting at 8 and going up to about 4096 or more, depending on how deep I need to go to catch up on clips. Most cameras are not so busy, and a depth this large can go back a month or so, but some cameras are very busy and have many hundreds or thousands of clips per day. I have yet another bash alias that can take a date.time string and walk backwards in time by the hour or day, so as to fetch deeper and deeper in time to catch up... I'll probably need to leverage that one, as I am coming up on 30 days without having been able to fetch any clips.

Yet another rub: now getting 403: Forbidden.
Damn it, why can they not just give us an easy API...

4Dolio commented Jan 12, 2024

I believe the 403 Forbidden is because the clip download URL only remains valid for less than 3-5 minutes...

So perhaps a valid method is to slowly retry on 404 until a 403 is returned, which indicates the URL has expired.
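
In Python terms, roughly this (a sketch only; the delay and the direct use of requests are my assumptions, not what the library does today):

import time
import requests

def retry_until_expired(url, delay=10):
    """Retry a 404ing clip URL until it succeeds or the URL expires (403)."""
    while True:
        resp = requests.get(url, timeout=30)
        if resp.status_code == 404:    # not ready yet; keep waiting
            time.sleep(delay)
            continue
        if resp.status_code == 403:    # pre-signed URL expired; give up
            return None
        resp.raise_for_status()
        return resp.content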

4Dolio commented Jan 12, 2024

The clip URL seems to remain valid for about 5 minutes to 5 minutes 30 seconds-ish.

4Dolio commented Jan 12, 2024

I was not able to create a 404 handler and retry, because it exits with the traceback right after the line that attempts to download the clip, before ever getting to that point...

Perhaps instead, before fetching the clip for download, it could query some other property of the clip, like size or modified time, and only proceed to attempt the download once those no longer return 404?
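
For example, a HEAD request could poll for availability before the real download (a sketch; I am assuming the pre-signed URL honors HEAD, which I have not verified):

import time
import requests

def wait_until_available(url, delay=10, attempts=30):
    """Poll the clip with HEAD until S3 stops returning 404."""
    for _ in range(attempts):
        resp = requests.head(url, timeout=30)
        if resp.status_code != 404:
            # Size (Content-Length) and modified time now exist, so the object is ready
            return resp.status_code == 200
        time.sleep(delay)
    return False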

4Dolio commented Jan 13, 2024

A little context: my ./RingFetch.py is a crummy wrapper Python script that gives me some progress output, with nice colors and such, which looks something like the below. Sometimes a clip is 0 bytes, so we skip those. Sometimes we skip cameras by their name; ## PlasmaRandom ## is being skipped here. If we will attempt a fetch, we prefix with "Fetchin", then the file name, with a suffix of "Fetch" while in progress. Upon success, the prefix "Fetchin" is replaced with "Success#" in green, and we print the file size as a suffix.

IsZero  /ext/RingArch20xx/2024/01/11/20240111.150511-house_Side-7322980704984576518.mp4 Is Zero Size
 08:11:57 ## PlasmaRandom ##          
 08:11:57 __ sw353pl BackYard __          
Fetchin /ext/RingArch20xx/2024/01/12/20240112.071930-house_BackYard-7323231784477733804.mp4 Fetch
Success#
Fetchin /ext/RingArch20xx/2024/01/12/20240112.024137-house_BackDeck-7323160174672080082.mp4 Fetch Traceback (most recent call last):

This last one is the new (as of a month ago) unhandled 404 exception. As it turns out, if I could get it to retry until it succeeded, OR until we get a 403 response because the elapsed time has expired the URL, then these would work again. But since I'm no good at Python, I've composed this nasty bash loop in the meantime, which, while ugly, seems to be working and handles all the exceptions I've encountered thus far:

root@pi6:~# DP=360; WAIT=5
root@pi6:~# while :; do
  date
  timeout 120 ./RingFetch.py $con $pas $RR $DP 2>&1 | tee fetch.log
  URL=$(grep "Fetchin\|requests.exceptions.HTTPError" fetch.log | strings | tail -n1 |
    sed 's%requests.exceptions.HTTPError: 404 Client Error: Not Found for url: %%')
  FILE=$(echo $(grep "Fetchin\|requests.exceptions.HTTPError" fetch.log | strings | tail -n3 | head -n1 |
    sed 's%\[94m%%;s%Fetchin %%;s% Fetch Traceback (most recent call last):%%'))
  echo "" >| wget.log
  RET=0
  # retry until wget logs "200 OK" (success), "403 Forbidden" (URL expired),
  # or "Unterminated" (RingFetch.py hung and was killed by the 120s timeout)
  while [ $(grep -c "200 OK" wget.log) -eq 0 ] &&
        [ $(grep -c "403 Forbidden" wget.log) -eq 0 ] &&
        [ $(grep -c "Unterminated" wget.log) -eq 0 ]; do
    for W in $(seq $WAIT -1 0); do echo -en "\rRetry:$RET $W "; sleep 1; done
    RET=$(( RET + 1 ))
    wget "$URL" -O $FILE 2>&1 | tee wget.log | grep " saved "
  done
  echo finished $(grep "response" wget.log) | grep . --color
  echo $(date) Retry:$RET $FILE $(grep "response" wget.log) >> Exception.log
  sleep 15
done

DP is the clip depth for each camera. I normally wrap my RingFetch.py in a bash alias loop that increases the queue depth while counting the zero-length files, incrementing the depth only if the count of zero-size videos is unchanged... If the zero-length video count increases, then something went wrong and we should retry at the same queue depth until we completely fetch all clips for that given depth; only then do we fetch a larger list of clips to download.
WAIT is how many seconds until we try again in this new exception-handling bash retry loop.
If wget returns 200, then we succeeded and can exit the retry loop.
If wget returns 403, then the clip has expired and we must give up on the retry loop.
If wget returns "http://[0m Fetch : Unterminated IPv6 numeric address.", then my RingFetch.py did not even crash-dump properly, per the 120-second timeout that guards against hangs that could otherwise last forever, so we give up on the retry loop. My .py touches the target mp4 file first, so when this occurs we will skip it on the next loop; this has occasionally happened all along. This new bash loop nearly always fails during the original native download attempt, so when it retries, it looks like this:

Retry:3 0 2024-01-13 02:22:06 (2.84 MB/s) - ‘/ext/RingArch20xx/2024/01/11/20240111.072902-house_Driveway-7322863156238723367.mp4’ saved [5958218/5958218]
finished HTTP request sent, awaiting response... 200 OK

5E7EN commented Feb 16, 2024

@4Dolio quite the extensive history you've recorded here. Definitely provides some valuable insight. I've also been experiencing 404 errors when attempting to download many consecutive camera recordings.

Here's what seems to be happening:

  1. Invoke download function recording_download
  2. Request is made to https://api.ring.com/clients_api/dings/{recording_id}/recording
  3. Response from that API is a 302 redirect with a Location header pointing to https://share-service-download-bucket.s3.amazonaws.com/[...].mp4?[...]
  4. Redirect is followed and a GET request is attempted to the S3 URL - but it returns a 404 Not Found error since the resource/recording doesn't yet exist in the S3 bucket for some reason.

From what I can tell, there's some kind of wait period that needs to be satisfied before following the redirect, since the Ring API / S3 needs time to actually prepare the file for download. To confirm this theory, you can grab the S3 URL from the thrown error message and paste it into your browser a few seconds later; the file will start downloading just fine (you may need to refresh a few times if it's not ready yet).

In addition, this can be confirmed by the behavior exhibited on the official Ring website when downloading recordings the normal way.
When clicking the download button, it first makes a request to an API (one that differs from the one this lib uses) with the ID of the recording in the payload. Then it makes many of the same requests over the next 10+ seconds, each returning a result_url containing an S3 mp4 link and a status that says either pending or done. Once a request returns a status of done, it makes the final GET request to the S3 mp4 link and the download begins.

The solution for this library (@tchellomello 😉🔔) would be to implement logic that handles 404 errors and then retries the post-redirect link (S3) until no 404 is returned (with some retry counter/cap, of course); see the sketch below.
My current "workaround" has been to re-invoke the recording_download function upon encountering a 404 error, but this has been mostly ineffective, since each time that function is called it makes a new request to the pre-redirect link ([...]/clients_api/dings/{recording_id}/recording), which then redirects to a newly generated S3 link (I assume) that may also not be ready yet, thus taking many attempts until success.
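
Roughly like this (a sketch only; resolve_s3_url is a hypothetical helper standing in for however the lib obtains the redirect Location, and the cap/delay numbers are arbitrary):

import time
import requests

def resolve_s3_url(session, recording_url):
    """Hypothetical helper: hit the dings endpoint once, without following the redirect."""
    resp = session.get(recording_url, allow_redirects=False)
    resp.raise_for_status()
    return resp.headers["Location"]    # the pre-signed S3 URL

def download_with_retries(session, recording_url, filename, cap=30, delay=2):
    s3_url = resolve_s3_url(session, recording_url)   # resolve ONCE, then retry that link
    for _ in range(cap):
        resp = requests.get(s3_url, timeout=30)
        if resp.status_code != 404:
            resp.raise_for_status()
            with open(filename, "wb") as f:
                f.write(resp.content)
            return
        time.sleep(delay)              # S3 object not materialized yet
    raise TimeoutError("clip never became available before the URL expired")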

The solution above should resolve this issue and prevent additional confusion.
I invite anybody to chime in and/or request more info if needed to reach an official resolution. :)

5E7EN commented Feb 19, 2024

@sdb9696 gonna ping you on this one to bring it to your attention, see above ^

sdb9696 commented Feb 19, 2024

When clicking the download button, it first makes a request to an API (that differs from the one this lib uses)

@5E7EN do you have the details of the API call it makes?

5E7EN commented Feb 19, 2024

@5E7EN do you have the details of the API call it makes?

@sdb9696 sure, and feel free to have a look yourself as well on the Ring history web dashboard.

Clicking the "Download" button triggers the following series of API calls:

  1. [Authenticated] https://account.ring.com/api/share_service/v2/transcodings/downloads

    • Type: POST
    • Payload:
    {
        "ding_id": "[REDACTED]",
        "device_id": "[REDACTED]",
        "file_type": "VIDEO",
        "start_timestamp": 1708354079000,
        "end_timestamp": 1708354200000
    }
    
    • Response:
    {
        "ding_id": "[REDACTED]",
        "file_type": "VIDEO",
        "device_id": "[REDACTED]",
        "start_timestamp": 1708354079000,
        "end_timestamp": 1708354200000,
        "status": "pending",
        "result_url": "https://share-service-download-bucket.s3.amazonaws.com/3e9926a6-4e[...]",
        "action": "download",
        "updated_at": "2024-02-19T14:55:11Z"
    }
    
  2. [Authenticated] https://account.ring.com/api/share_service/v2/transcodings/downloads/[DING_ID]?add_download_headers=true&custom_file_name=[...]&device_id=[REDACTED]&file_type=VIDEO&start_timestamp=1708354079000&end_timestamp=1708354200000

    • Type: GET
    • Response:
    {
        "ding_id": "[REDACTED]",
        "file_type": "VIDEO",
        "device_id": "[REDACTED]",
        "start_timestamp": 1708354079000,
        "end_timestamp": 1708354200000,
        "status": "pending",
        "result_url": "https://share-service-download-bucket.s3.amazonaws.com/3e9926a6-4e[...]",
        "action": "download",
        "updated_at": "2024-02-19T14:55:11Z"
    }
    
  3. Request 2 is then repeated at a 1-second interval until the status in the response body changes from pending to done, after which the final request is made:

  4. [Unauthenticated] https://share-service-download-bucket.s3.amazonaws.com/3e9926a6-4e[...]

    • Type: GET
    • Response: Video
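
Put together, the whole flow could be scripted roughly like so (a sketch; the endpoints and payload fields come from the captures above, while the auth header, placeholders, and error handling are mine):

import time
import requests

API = "https://account.ring.com/api/share_service/v2/transcodings/downloads"
HEADERS = {"Authorization": "Bearer <AUTH_TOKEN>"}  # however your session authenticates

payload = {
    "ding_id": "<DING_ID>",
    "device_id": "<DEVICE_ID>",
    "file_type": "VIDEO",
    "start_timestamp": 1708354079000,
    "end_timestamp": 1708354200000,
}

# 1. POST kicks off preparation of the download
job = requests.post(API, json=payload, headers=HEADERS).json()

# 2./3. Poll the GET endpoint every second until status flips from pending to done
params = {k: payload[k] for k in ("device_id", "file_type", "start_timestamp", "end_timestamp")}
while job["status"] == "pending":
    time.sleep(1)
    job = requests.get(f"{API}/{payload['ding_id']}", params=params, headers=HEADERS).json()

# 4. Final unauthenticated GET against the S3 result_url
with open("clip.mp4", "wb") as f:
    f.write(requests.get(job["result_url"]).content)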

5E7EN commented Feb 20, 2024

I managed to resolve the download issue in my case. The 404 error occurs when calling cam.history(older_than=DING_ID) with a DING_ID that doesn't exist for the specified camera.
I was providing it with a ding ID of a recording that was deleted from the Ring cloud after expiring (older than 6 months).
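
In other words, only feed older_than values that a previous history call actually returned (sketch; the limit argument and the id key are my assumptions about this lib's return shape):

# Paginate with IDs the API itself returned, so older_than always exists.
events = cam.history(limit=100)
while events:
    oldest = events[-1]["id"]
    # ...process this page of events...
    events = cam.history(limit=100, older_than=oldest)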

4Dolio commented Apr 6, 2024

I managed to resolve the download issue in my case. The 404 error occurs when calling cam.history(older_than=DING_ID) with a DING_ID that doesn't exist for the specified camera. I was providing it with a ding ID of a recording that was deleted from the Ring cloud after expiring (older than 6 months).

@5E7EN does this mean that you do not get the 404 for normal, non-older_than (current) clips? I managed to work around the problem by wrapping my Python (I fumbled around until I got a Python process that uses the video file names I want and redraws the progress lines while it attempts, succeeds, fails, gets an empty file, etc.)... Anyway, I wrapped that Python in a bash alias (bash I'm way better at) that can scrape out the final failed 404 URL and the intended file name, and then repeat the download attempt using wget until it succeeds or I decide it's been too long (about 55.5 minutes).

I've been collecting counts of how many times it loops until it succeeds, so I could try to quantify the spread of the delays I've seen over the past few months of using this new method. I was seeing the same issue: the clips only become valid after some variable delay of 5 seconds up to 5.5-ish minutes, after which they expire again. Since I use mine to download all my videos locally for archival, I can't just wait 5 minutes per clip. Tangent: I also implemented a parallel race-condition lock, so I can run many instances of the bash(Python) fetcher in parallel without them DoSing the Ring service, which will get your IP blocked for hours to days. A perk of having multiple uplinks is that I can switch which uplink my RasPi is using. FYI, you should not try to fetch the clip list more than once per 5 seconds, as I recall...
