Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API heartbeat error #1349

Open
dcaputo-harmoni opened this issue May 13, 2023 · 1 comment · May be fixed by #1350
Open

API heartbeat error #1349

dcaputo-harmoni opened this issue May 13, 2023 · 1 comment · May be fixed by #1350

Comments

@dcaputo-harmoni
Copy link

dcaputo-harmoni commented May 13, 2023

I'm receiving the message below for a group of devices exactly every 30 seconds when running open-balena-api 11.8.3. Upgraded from an older version (0.194.0) so am wondering if could be caused by items that existed in the device-online-state redis queue at the time of cutover that were entered using the older api. I believe previously this queue used uuid to identify devices, and that was changed over to id at some point between my last version and the current one. I'm assuming the 30 second frequency is driven by RSMQ_READ_TIMEOUT being set to 30 seconds.

And separately, does anyone know if I can address this by just flushing the device state queue in redis? If so, how would I go about doing this? Presumably I could just run FLUSHALL on my redis instance but not sure if that would have any negative effects elsewhere.

May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]: DeviceStateManager: Error updating the API state of device undefined to timeout. If the id property is set it must be non-null Error: If the id property is set it must be non-null
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at PinejsClient.compile (/usr/src/app/node_modules/pinejs-client-core/src/index.ts:1421:12)
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at PinejsClient.request (/usr/src/app/node_modules/pinejs-client-core/src/index.ts:1512:32)
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at PinejsClient.patch (/usr/src/app/node_modules/pinejs-client-core/src/index.ts:1194:15)
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at DeviceOnlineStateManager.updateDeviceModel (/usr/src/app/src/features/device-heartbeat/index.ts:251:20)
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at /usr/src/app/src/features/device-heartbeat/index.ts:303:19
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at processTicksAndRejections (node:internal/process/task_queues:95:5)
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]: An error occurred trying to process an API heartbeat event. If the id property is set it must be non-null Error: If the id property is set it must be non-null
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at PinejsClient.compile (/usr/src/app/node_modules/pinejs-client-core/src/index.ts:1421:12)
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at PinejsClient.request (/usr/src/app/node_modules/pinejs-client-core/src/index.ts:1512:32)
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at PinejsClient.patch (/usr/src/app/node_modules/pinejs-client-core/src/index.ts:1194:15)
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at DeviceOnlineStateManager.updateDeviceModel (/usr/src/app/src/features/device-heartbeat/index.ts:251:20)
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at /usr/src/app/src/features/device-heartbeat/index.ts:303:19
May 13 20:02:12 openbalena-api-78d9b9466-xqvlz api[971]:     at processTicksAndRejections (node:internal/process/task_queues:95:5)

And one further note - when I view the device-online-state:expired:Q hash in redis I get the following (truncated - but more below with uuid).

127.0.0.1:6379> hgetall device-online-state:expired:Q
  1) "gkxl808qmpd3Q09ZU5HOH7O3ZkpA3deZ"
  2) "{\"deviceId\":66,\"nextState\":\"timeout\"}"
  3) "gkvm3r7i83BPsSYBJHw8Zj8F9fAJ7zCd:fr"
  4) "1683861304212"
  5) "gkvm4ef4vabtiFjXfeIOEjUJQlfuPFMn:fr"
  6) "1683861304248"
  7) "gkvm1yk0yq5VpBcil8I9ASWEInJAevHZ:fr"
  8) "1683861304200"
  9) "gkvm1jg53eTW74Ye79n5Esmoi86Kxjef:fr"
 10) "1683861304195"
 11) "gkvm3yxck2ZMISogzEeFE3QRL3J021xk:fr"
 12) "1683861304220"
 13) "gkvluxxc96XeK4qUYO6mus29Y65zfBUp"
 14) "{\"uuid\":\"9050ca7f113caf47396d4f1493cc945f\",\"nextState\":\"timeout\"}"
 15) "gkvm3yxck2ZMISogzEeFE3QRL3J021xk:rc"
 16) "5026"
 17) "gkvlz9rltlg7SuWT99sueEBbgkf1zXYh"
 18) "{\"uuid\":\"70429306a30c6c747f88ba64739218c0\",\"nextState\":\"timeout\"}"
...
dcaputo-harmoni added a commit to dcaputo-harmoni/open-balena-api that referenced this issue May 14, 2023
@thgreasi
Copy link
Member

Hi,
We indeed changed the state manager's object and switched it from uuids to ids as you mentioned to make the implementation slightly more performant and using less space in Redis at the same time.
In previous API version the heartbeat mechanism was quite forgiving for such migration errors, but we recently revamped it so that to avoid having the DB & Redis device heartbeat states go out of sync, after which point we changed it fails for such unexpected cases, so that we can notice them if they are occurring.

In general we do suggest upgrading one major at a time and I think in this case it would have helped avoiding this issue.
A way that you can use to get your device state in sync between the DB & Redis after v11.7.0 is setting the API_HEARTBEAT_STATE_ONLINE_UPDATE_CACHE_TIMEOUT env var to any non-negative integer value (milliseconds).
I do see that you already have a PR for this though, so I guess that you might no longer be in this inconsistent state so the API_HEARTBEAT_STATE_ONLINE_UPDATE_CACHE_TIMEOUT is only a suggestion in case you face a similar issue in the future.

thgreasi pushed a commit to dcaputo-harmoni/open-balena-api that referenced this issue May 24, 2023
thgreasi pushed a commit to dcaputo-harmoni/open-balena-api that referenced this issue May 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants