Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

request() retries fetch twice, which can cause timeouts that are twice as long as requested #191

Open
dhalbert opened this issue Apr 26, 2024 · 10 comments

Comments

@dhalbert
Copy link
Contributor

request() tries twice to get a socket, connect, and fetch twice:

# We may fail to send the request if the socket we got is closed already. So, try a second
# time in that case.
retry_count = 0
last_exc = None
while retry_count < 2:
retry_count += 1

This is confusing because if it tries a second time, it can make the default or user specified timeout (see #9210) appear to take twice as long as requested.

I am not sure about the circumstances the comment is talking about:

# We may fail to send the request if the socket we got is closed already. So, try a second

and whether it's true for non-ESP32-SPI, etc. I wonder if some checking or resetting of the socket can be done in advance.

@justmobilize
Copy link
Collaborator

When I pulled out code for ConnectionManager, I tried to find a way to make this happen for real, but couldn't. Do you have a way to make it happen?

@dhalbert
Copy link
Contributor Author

Here's a rather rough program I used (I originally used some personal domains that I knew succeeded or failed, but I tried again with the ones listed):

import os
import time
import wifi

import adafruit_connection_manager
import adafruit_requests

class Timer:
    def __init__(self, title):
        self.title = title

    def __enter__(self):
        self.start = time.monotonic()
        return self

    def __exit__(self, type, value, traceback):
        print(self.title+":", time.monotonic() - self.start)

# Get WiFi details, ensure these are setup in settings.toml
ssid = os.getenv("CIRCUITPY_WIFI_SSID")
password = os.getenv("CIRCUITPY_WIFI_PASSWORD")

SUCCEED_URL = "https://example.org"
FAIL_URL = "https://192.168.1.253"

pool = adafruit_connection_manager.get_radio_socketpool(wifi.radio)
ssl_context = adafruit_connection_manager.get_radio_ssl_context(wifi.radio)
requests = adafruit_requests.Session(pool, ssl_context)

wifi.radio.connect(ssid, password)

with Timer("default success"):
    response = requests.get(SUCCEED_URL)
try:
    with Timer("default fail"):
        response = requests.get(FAIL_URL)
except Exception as e:
    print(e)

with Timer("timeout success"):
    response = requests.get(SUCCEED_URL, timeout=2)
try:
    with Timer("timeout fail"):
        response = requests.get(FAIL_URL, timeout=3)
except Exception as e:
    print(e)

with output:

code.py output:
default success: 0.432861
default fail: 60.072
Error connecting socket: [Errno 119] EINPROGRESS
timeout success: 0.693115
timeout fail: 6.073
Error connecting socket: [Errno 116] ETIMEDOUT

Note that the last timeout fail: asked for a timeout of 3 seconds, not 6 seconds. Changing this to some other value causes double the value.

@justmobilize
Copy link
Collaborator

What MCU are you using?

On a UM FeatherS3, I get:

default success: 0.872002
default fail: 36.912
Error connecting socket: [Errno 113] ECONNABORTED
timeout success: 0.887009
timeout fail: 36.618
Error connecting socket: [Errno 113] ECONNABORTED

On an Feather ESP32-S2 with TFT, I get (added print(f"error: {e}") because I was getting blanks...):

default success: 0.694092
default fail: 0.00390625
error:
timeout success: 0.704956
timeout fail: 0.00402832
error:

@dhalbert
Copy link
Contributor Author

I was using a Metro ESP32-S3

@dhalbert
Copy link
Contributor Author

dhalbert commented Apr 26, 2024

My testing was with adafruit/circuitpython#9210, not yet merged as of this writing. I shjould test that on an S2, then.

@justmobilize
Copy link
Collaborator

What was odd was that the UM FeatherS3 seemed to ignore the timeouts. I remember some other boards did that in 8.x

@justmobilize
Copy link
Collaborator

@dhalbert tomorrow I'll try again on both my S2 and UMS3 with that build

@justmobilize
Copy link
Collaborator

@dhalbert, added some logging. The double timeout in this case is in ConnectionManager. The flow from CM is that it tries once, and then if it fails and has any sockets it can release to try again after closing them:

RQ retry_count: 0
CM try_count: 1, timeout: 60
CM self._socket_pool.socket: 0.000976563
CM socket.connect: 0.513184
default success: 0.737793

RQ retry_count: 0
CM try_count: 1, timeout: 60
CM self._socket_pool.socket: 0.0678711
CM socket.connect: 18.1948
CM connect OSError: [Errno 119] EINPROGRESS
CM try_count: 2, timeout: 60
CM self._socket_pool.socket: 0.000976563
CM socket.connect: 18.4888
CM connect OSError: [Errno 119] EINPROGRESS
default fail: 36.7739
error: Error connecting socket: [Errno 119] EINPROGRESS

RQ retry_count: 0
CM try_count: 1, timeout: 2
CM self._socket_pool.socket: 0.000976563
CM socket.connect: 0.580078
timeout success: 0.808105

RQ retry_count: 0
CM try_count: 1, timeout: 3
CM self._socket_pool.socket: 0.0678711
CM socket.connect: 3.10693
CM connect OSError: [Errno 116] ETIMEDOUT
CM try_count: 2, timeout: 3
CM self._socket_pool.socket: 0.0
CM socket.connect: 3.10596
CM connect OSError: [Errno 116] ETIMEDOUT
timeout fail: 6.30176
error: Error connecting socket: [Errno 116] ETIMEDOUT

If I just do:

try:
    with Timer("timeout fail"):
        response = requests.get(FAIL_URL, timeout=3)
except Exception as e:
    print(f"error: {e}")

I get:

RQ retry_count: 0
CM try_count: 1, timeout: 3
CM self._socket_pool.socket: 0.000976563
CM socket.connect: 3.10596
CM connect OSError: [Errno 116] ETIMEDOUT
CM try_count: 2, timeout: 3
timeout fail: 3.11914

What we could do is in CM, look at specific errors like:

errno.ECONNABORTED
errno.ETIMEDOUT
errno.ECONNREFUSED
errno.EINPROGRESS

And stop trying.

Thoughts? If you help me come up with which ones to not try again on, I can open up a PR in CM

@dhalbert
Copy link
Contributor Author

dhalbert commented Apr 27, 2024

Ah, OK, so it is not the retry loop in requests at all, you're saying? So I was mistaken. Does it not make sense to close a socket before trying, because we'd like to reuse a socket that has an existing open connection to a particular host?

I think it would be worth looking at the logic in CPython requests and lower down, if necessary, because they don't have double timeouts. I am not more experienced with socket programming than you.

@justmobilize
Copy link
Collaborator

I totally want to make this better. I trimmed what I could from the previous request code. It used to try up to 5 times...

I think making both requests and connection manager smarter is a good idea. I left this block in, because I couldn't find a way to reproduce is and so didn't want to take it out. Part of me wants to take it out and then fix it when it breaks for someone...

I'll try some other things and see if I can get it to try the second time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants