Device stuck in reconnect loop while renewing SAS token #1181

Open
dsburns opened this issue May 13, 2024 · 2 comments

dsburns commented May 13, 2024

Context

  • OS and version used: Ubuntu 20.03
  • Python version: 3.8
  • pip version: 20.0.2
  • list of installed packages:
Package                 Version      Location
----------------------- ------------ -------------------------------
absl-py                 0.15.0
appdirs                 1.4.4
astunparse              1.6.3
async-timeout           4.0.3
attrs                   20.3.0
audioread               3.0.1
azure-core              1.20.1
azure-iot-device        2.12.0
azure-storage-blob      12.9.0
cachetools              4.2.4
certifi                 2020.6.20
cffi                    1.14.3
charset-normalizer      2.0.12
clang                   5.0
cmd2                    1.3.11
colorama                0.4.4
coloredlogs             15.0.1
cryptography            3.2.1
cycler                  0.12.1
decorator               5.1.1
dependency-injector     4.34.0
deprecation             2.1.0
flatbuffers             1.12
fs                      2.4.11
gast                    0.4.0
google-auth             1.35.0
google-auth-oauthlib    0.4.6
google-pasta            0.2.0
grpcio                  1.62.1
h5py                    3.1.0
humanfriendly           10.0
idna                    2.10
importlib-metadata      7.1.0
importlib-resources     6.4.0
isodate                 0.6.0
janus                   0.4.0
joblib                  0.17.0
jsonschema              4.17.3
keras                   2.6.0
Keras-Preprocessing     1.1.2
kiwisolver              1.4.5
librosa                 0.8.0
llvmlite                0.39.1
Markdown                3.6
MarkupSafe              2.1.5
matplotlib              3.4.3
msrest                  0.6.21
nmcli                   1.3.0
numba                   0.56.4
numpy                   1.19.4
oauthlib                3.1.0
opt-einsum              3.3.0
packaging               20.4
paho-mqtt               1.6.1
pandas                  1.2.2
pathtools               0.1.2
Pillow                  8.1.2
pip                     20.0.2
pkg-resources           0.0.0
pkgutil-resolve-name    1.3.10
platformdirs            4.2.0
pooch                   1.8.1
protobuf                3.19.6
psutil                  5.9.0
pyasn1                  0.6.0
pyasn1-modules          0.4.0
pycparser               2.20
pydantic                1.9.2
pyparsing               2.4.7
pyperclip               1.8.1
pyprctl                 0.1.3
pyrsistent              0.20.0
pyserial                3.4
PySocks                 1.7.1
python-dateutil         2.9.0.post0
python-git-info         0.7.1
python-statemachine     0.8.0
pytz                    2020.4
PyYAML                  5.4.1
redis                   5.0.3
requests                2.26.0
requests-oauthlib       1.3.0
requests-unixsocket     0.2.0
resampy                 0.4.3
rsa                     4.9
scikit-learn            0.24.1
scipy                   1.5.4
setuptools              44.0.0
six                     1.15.0
SoundFile               0.10.3.post1
tensorboard             2.6.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit  1.8.1
tensorflow              2.6.2
tensorflow-addons       0.13.0
tensorflow-estimator    2.6.0
termcolor               1.1.0
threadpoolctl           2.1.0
torch                   1.10.0
torchaudio              0.10.0
torchlibrosa            0.0.9
tqdm                    4.64.0
typeguard               2.13.3
typing-extensions       3.7.4.3
urllib3                 1.26.7
watchdog                0.10.3
wcwidth                 0.2.5
werkzeug                3.0.2
wheel                   0.43.0
wrapt                   1.12.1
xgboost                 1.4.2
yappi                   1.3.3
zipp                    3.18.1

Description of the issue

The IoT Hub client appears to get stuck in a connect/disconnect loop when reauthorizing its SAS token. The issue occurs randomly across devices and can take anywhere from 12 hours to several days before its first occurrence. After several minutes of connecting and disconnecting, my application is restarted and the Azure client connects as expected.

Summary of events

  1. 00:16:10 - Azure message sent
  2. 00:16:12 - Azure blob uploaded
  3. 00:16:14 - Reauth process for SAS token starts
  4. 00:16:19 - Client disconnects
  5. 00:16:24 - ConnectionFailedError raised in mqtt_transport.py
  6. 00:16:24 - Retrying connection authorization
  7. 00:16:24 - Successfully connects to hub
  8. 00:16:34 - Starting reauthorization process for new SAS token
  9. 00:16:35 - Disconnected with result code 7

Steps 4-9 continue until the application is stopped.
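
The timing in the summary above can be made explicit with a little arithmetic. The sketch below is illustrative only: the event labels are paraphrased from the list, and the helper name `gaps` is hypothetical. It computes the seconds elapsed between consecutive events so the length of each phase of the loop is easy to see.

```python
from datetime import datetime

# Events transcribed from the summary above (labels paraphrased).
EVENTS = [
    ("00:16:10", "message sent"),
    ("00:16:12", "blob uploaded"),
    ("00:16:14", "SAS reauth starts"),
    ("00:16:19", "client disconnects"),
    ("00:16:24", "ConnectionFailedError, retry, connected"),
    ("00:16:34", "SAS reauth starts again"),
    ("00:16:35", "disconnected, result code 7"),
]


def gaps(events):
    """Return (seconds_since_previous_event, label) for each event after the first."""
    times = [datetime.strptime(ts, "%H:%M:%S") for ts, _ in events]
    return [
        (int((later - earlier).total_seconds()), label)
        for earlier, later, (_, label) in zip(times, times[1:], events[1:])
    ]


if __name__ == "__main__":
    for seconds, label in gaps(EVENTS):
        print(f"+{seconds:>2}s  {label}")
```

Read this way, the second reauthorization starts 10 seconds after the successful reconnect and ends in a disconnect 1 second later.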

Code sample exhibiting the issue

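import logging

from azure.iot.device.aio import IoTHubDeviceClient

log = logging.getLogger(__name__)  # assumed; the original excerpt does not show how `log` is created


class Client:

    def __init__(self, connection_str):
        self._connection_str = connection_str
        self._client: IoTHubDeviceClient = None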

    def _statistics_increment(self, name):
        self.statistics[name] += 1
        return self.statistics[name]

    async def open(self):
        log.debug('Azure open start')
        if self._client is not None:
            await self.close()
        self._client = IoTHubDeviceClient.create_from_connection_string(self._connection_str)
        await self._client.connect()
        self._twin = await self._client.get_twin()
        log.debug('TWIN: %s', self._twin)
        # set the message handler on the client
        self._client.on_message_received = self._message_handler
        self._client.on_twin_desired_properties_patch_received = self._twin_patch_listener
        self._client.on_method_request_received = self._device_method_listener
        log.debug('Azure open completed')

    async def close(self):
        if self._client is None:
            return
        log.info('Azure close')
        await self._client.disconnect()

    def is_connected(self):
        connected = False
        if self._client:
            connected = self._client.connected
        return connected

    async def message_send(self, payload):
        await self._client.send_message(payload)



import asyncio
import time

from azure.iot.device import exceptions as azure_exceptions


async def run(self):
    # Assumed initial state; the original excerpt does not show these definitions.
    azure = None
    last_connect_poll_time = 0
    running_tasks = set()

    while not self._exit_event.is_set():
        try:
            log.debug(f"azure service loop - {azure} connected: {azure.is_connected() if azure else False}")

            if not azure and self._azure_enabled:
                azure = self._azure_factory()

            # Determine if we need to connect to the IoT Hub. We only try to reconnect on an interval to
            # avoid spamming the IoT Hub with connection attempts.
            poll_time_now = time.time()
            attempt_reconnect = not azure.is_connected() and (poll_time_now >= (last_connect_poll_time + AZURE_OPEN_INTERVAL_SEC))
            if attempt_reconnect:
                last_connect_poll_time = poll_time_now
                log.debug("azure.open() start")
                await asyncio.wait_for(azure.open(), timeout=10.0)
                self._set_and_send_azure_connection_status(True)
                log.debug('azure.open() complete')
                last_disconnect_ts = 0

            # Send messages. Coroutines are set up to swallow exceptions. Reconnect logic is handled by the
            # azure.is_connected function, which is much more reliable and simpler than trying to propagate
            # connectivity errors back to the main loop from coroutines.
            if azure.is_connected():
                for handler in self._d2c_handlers:
                    while not handler.queue.empty():
                        _, payload, properties = handler.queue.get(False)
                        task = asyncio.create_task(handler.upload(azure, payload, properties))
                        running_tasks.add(task)
                        task.add_done_callback(running_tasks.discard)
        except (
            azure_exceptions.ClientError,
            azure_exceptions.ConnectionDroppedError,
            azure_exceptions.ConnectionFailedError,
            azure_exceptions.CredentialError,
            azure_exceptions.NoConnectionError,
            azure_exceptions.OperationCancelled,
            azure_exceptions.OperationTimeout,
            azure_exceptions.ServiceError,
        ) as ex:
            log.exception('azure service exception')
            self._set_and_send_azure_connection_status(False)
            disconnect_ts = time.time()
            if azure:
                await asyncio.wait_for(azure.close(), timeout=10.0)
                azure = None

        await asyncio.sleep(1.0)
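
The loop above gates reconnect attempts on a fixed `AZURE_OPEN_INTERVAL_SEC`. One possible variation, sketched here under assumptions, is to grow that interval after each failed attempt so that a device caught in a connect/disconnect cycle backs off instead of retrying at a constant rate. `ReconnectGate` and its parameters are hypothetical names; they are not part of azure-iot-device or of the application in this report, and this is not a confirmed fix for the issue.

```python
import time


class ReconnectGate:
    """Decides when the next reconnect attempt is allowed (hypothetical helper)."""

    def __init__(self, base_sec=5.0, max_sec=300.0, clock=time.monotonic):
        self._base = base_sec
        self._max = max_sec
        self._clock = clock
        self._failures = 0
        # No restriction until the first failure is recorded.
        self._next_allowed = float("-inf")

    def may_attempt(self):
        """Return True once the current backoff window has elapsed."""
        return self._clock() >= self._next_allowed

    def record_failure(self):
        """Double the wait after every consecutive failure, capped at max_sec."""
        delay = min(self._base * (2 ** self._failures), self._max)
        self._failures += 1
        self._next_allowed = self._clock() + delay
        return delay

    def record_success(self):
        """Reset the backoff after a connection that is considered healthy."""
        self._failures = 0
        self._next_allowed = float("-inf")
```

In a loop like the one above, `may_attempt()` would stand in for the interval comparison and `record_failure()` would be called in the `except` branch. Calling `record_success()` only after the connection has stayed up for a while would keep a loop of brief successful connects, like the one reported here, from resetting the backoff each time.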

Console log of the issue

See attached logs. The debug.log file contains DEBUG log statements for the 'azure.iot.device' logger.
azure_error_summary.log
azure_error_debug.log

dsburns added the bug label May 13, 2024

dsburns commented May 21, 2024

Bump

cartertinney (Member) commented

@dsburns

Can you tell me a bit more about your use case?

  1. Are you using IoTHub or IoT Edge?
  2. What type of authentication are you using (e.g. a connection string)?
  3. Are you specifying a sastoken_ttl value when you create your client?
  4. If using IoTHub, are you using a Gateway v1 Hub, or a Gateway v2 Hub?
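
For context on question 3: `sastoken_ttl` sets how long each generated SAS token stays valid, and the client has to reauthorize before that expiry. The sketch below is not the SDK's internal code. It follows the token format described in the Azure IoT Hub documentation (`sr`, `sig`, and `se` fields, with an HMAC-SHA256 signature over the URL-encoded resource URI and the expiry) to show where a TTL ends up in the token. The hub name, device ID, and key are made-up placeholders, and the `now` parameter is an addition for illustration.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def generate_sas_token(resource_uri, key_b64, ttl_sec=3600, now=None):
    """Build an IoT Hub style SAS token that expires ttl_sec seconds from now."""
    issued_at = time.time() if now is None else now
    expiry = int(issued_at + ttl_sec)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # The string to sign is the URL-encoded resource URI, a newline, and the expiry.
    to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    digest = hmac.new(base64.b64decode(key_b64), to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry}"


if __name__ == "__main__":
    # Placeholder values; not a real hub, device, or key.
    uri = "example-hub.azure-devices.net/devices/example-device"
    key = base64.b64encode(b"not-a-real-key").decode("ascii")
    print(generate_sas_token(uri, key, ttl_sec=3600))
```

With the SDK, the TTL is passed as a keyword argument when the client is created, for example `IoTHubDeviceClient.create_from_connection_string(conn_str, sastoken_ttl=3600)`. The code sample in this issue passes only the connection string, so the SDK default (documented as 3600 seconds) would apply.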
