Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[netatmo] API limit reached handling #16489

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

clinique
Copy link
Contributor

@clinique clinique commented Mar 5, 2024

Distinct timing (60 minutes) when API LIMIT REACHED has been issued.

Partially resolves issue #16485

@clinique clinique self-assigned this Mar 5, 2024
@clinique clinique requested a review from lolodomo as a code owner March 5, 2024 13:50
Copy link
Contributor

@jlaur jlaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that was fast! Thanks for taking care. I will take a closer look at my logs, your fixes and also give it a run on my production system.

@clinique
Copy link
Contributor Author

clinique commented Mar 5, 2024

Wow, that was fast! Thanks for taking care. I will take a closer look at my logs, your fixes and also give it a run on my production system.

You were lucky enough that I had some spare time today - I did not launch eclipse since around 3 month.

Copy link
Contributor

@jlaur jlaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ported your changes to 4.1.x and put it on my production system. All Things went offline pretty quickly - the API bridge with: "Request timed out - will attempt reconnection later".

Does it work well for you?

@jlaur jlaur added bug An unexpected problem or unintended behavior of an add-on additional testing preferred The change works for the pull request author. A test from someone else is preferred though. labels Mar 6, 2024
@jlaur jlaur added the work in progress A PR that is not yet ready to be merged label Mar 9, 2024
@clinique clinique force-pushed the netatmo_16485 branch 2 times, most recently from 07c499b to f420e73 Compare March 16, 2024 08:20
@clinique clinique requested a review from jlaur March 16, 2024 08:21
@clinique
Copy link
Contributor Author

@jlaur : if you can test on this base. I've reached something that seems pretty strong on heavy workload and normally solved the ghost process issue.

@jlaur jlaur removed the work in progress A PR that is not yet ready to be merged label Mar 16, 2024
@jlaur
Copy link
Contributor

jlaur commented Mar 16, 2024

@jlaur : if you can test on this base. I've reached something that seems pretty strong on heavy workload and normally solved the ghost process issue.

Thanks! I've backported your changes to 4.1.x and just put the binding on my production system. It started without issues, but quickly after I noticed this:

2024-03-16 12:41:35.835 [WARN ] [.core.thing.binding.BaseThingHandler] - Handler ApiBridgeHandler of thing netatmo:account:home tried checking if channel monitoring#request-count is linked although the handler was already disposed.

and:

java.lang.ClassCastException: Cannot cast org.openhab.binding.netatmo.internal.handler.ModuleHandler to org.openhab.binding.netatmo.internal.handler.CommonInterface
	at java.lang.Class.cast(Class.java:3889) ~[?:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) ~[?:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) ~[?:?]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) ~[?:?]
	at java.util.AbstractList$RandomAccessSpliterator.forEachRemaining(AbstractList.java:720) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260) ~[?:?]
	at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616) ~[?:?]
	at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:622) ~[?:?]
	at java.util.stream.ReferencePipeline.toList(ReferencePipeline.java:627) ~[?:?]
	at org.openhab.binding.netatmo.internal.handler.CommonInterface.getActiveChildren(CommonInterface.java:156) ~[?:?]
	at org.openhab.binding.netatmo.internal.handler.CommonInterface.updateReadings(CommonInterface.java:213) ~[?:?]
	at org.openhab.binding.netatmo.internal.handler.CommonInterface.proceedWithUpdate(CommonInterface.java:207) ~[?:?]
	at org.openhab.binding.netatmo.internal.handler.capability.RefreshCapability.proceedWithUpdate(RefreshCapability.java:83) ~[?:?]
	at org.openhab.binding.netatmo.internal.handler.capability.RefreshCapability.lambda$3(RefreshCapability.java:124) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:840) [?:?]

I'm not sure if the exception is related to the undisposed task from the previous version. In that case, it can be ignored.

I have now restarted openHAB and will let the binding run for some time, and I will also start analyzing your changes in relation to the experienced issues and see if I can figure out which changes addresses which issues. 🙂

@clinique clinique requested a review from jlaur March 18, 2024 08:33
@jlaur
Copy link
Contributor

jlaur commented Mar 18, 2024

@clinique - strangely, if I grep "Module refreshed, next one in" /var/log/openhab/openhab.log, I see on only three minutes poll interval since yesterday when restarting the binding:

2024-03-18 22:28:30.940 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:28:31.065 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:31:33.954 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:31:34.200 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:34:34.491 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:34:34.664 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s

Also, I do not see anything logged regarding probing, i.e. "Data validity period identified to be" nor "Data validity period not yet found, data timestamp unchanged".

I cannot rule out that I made some mistake when backporting the changes, although I went over everything a few times. Do you get different results?

@clinique
Copy link
Contributor Author

clinique commented Mar 18, 2024

@clinique - strangely, if I grep "Module refreshed, next one in" /var/log/openhab/openhab.log, I see on only three minutes poll interval since yesterday when restarting the binding:

2024-03-18 22:28:30.940 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:28:31.065 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:31:33.954 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:31:34.200 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:34:34.491 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s
2024-03-18 22:34:34.664 [DEBUG] [handler.capability.RefreshCapability] - Module refreshed, next one in 180s

Also, I do not see anything logged regarding probing, i.e. "Data validity period identified to be" nor "Data validity period not yet found, data timestamp unchanged".

I cannot rule out that I made some mistake when backporting the changes, although I went over everything a few times. Do you get different results?

Yes, I see the same. Looking at the thing definition, it has a refreshInterval configuration :
image
That's weird.
Ok, got it.

@jlaur
Copy link
Contributor

jlaur commented Mar 19, 2024

Yes, I see the same. Looking at the thing definition, it has a refreshInterval configuration :
image
That's weird.

I see you removed the default value of 180 seconds. But should it even have this configuration parameter? According to the documentation this only applies to the home Thing: https://www.openhab.org/addons/bindings/netatmo/#list-of-supported-things

@clinique
Copy link
Contributor Author

Yes, I see the same. Looking at the thing definition, it has a refreshInterval configuration :
image
That's weird.

I see you removed the default value of 180 seconds. But should it even have this configuration parameter? According to the documentation this only applies to the home Thing: https://www.openhab.org/addons/bindings/netatmo/#list-of-supported-things

Yes, it's only a quick fix. A change has been introduced here that created this issue. I'll take some to think of it and partially revert or correct PR #16492

@clinique
Copy link
Contributor Author

@jlaur : can you give a test or your feed-back on this ? I rebased with last merges.

@jlaur
Copy link
Contributor

jlaur commented Mar 23, 2024

@jlaur : can you give a test or your feed-back on this ? I rebased with last merges.

Yes, I've had it running in production since Thursday after #16546 was merged. I'm currently running some tests after your latest commit today:

Additional module:

  • Take out batteries and wait for it to go offline (when API returns "reachable": false). 🟢
  • Insert batteries again and verify it goes back online. 🟢

Main module:

  • Cut power and wait for it to go offline (when last timestamp is more than one hour ago). 🟢
  • Power on again and verify it goes back online. 🔴

Restart binding:

  • Verify number of calls (no disposed handlers still working). 🟢

Simulate API responses:

  • OAuth authentication call failed. 🔴
  • OAuth authentication call failed twice. 🔴
  • Normal call failed. 🔴
  • HTTP code 429. 🟢

Currently I'm checking the results of main module getting back online which doesn't seem to work.

For the last three tests I have provided a new console command for simulating exceptions and HTTP status codes.

I will keep you updated on the results.

@jlaur
Copy link
Contributor

jlaur commented Mar 23, 2024

On first OAuth failure there are two status updates:

2024-03-23 16:42:16.408 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from ONLINE to OFFLINE (CONFIGURATION_ERROR): Complete the configuration by granting the binding to Netatmo Connect : `/netatmo/connect/xxx`.
2024-03-23 16:42:16.420 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from OFFLINE (CONFIGURATION_ERROR): Complete the configuration by granting the binding to Netatmo Connect : `/netatmo/connect/xxx`. to OFFLINE (COMMUNICATION_ERROR)

Retry is correctly scheduled:

2024-03-23 16:42:16.416 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - Reconnection scheduled in 300 seconds

After second failure:

2024-03-23 16:47:16.470 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from OFFLINE (COMMUNICATION_ERROR) to OFFLINE (CONFIGURATION_ERROR): Complete the configuration by granting the binding to Netatmo Connect : `/netatmo/connect/xxx`.

and it doesn't recover after that.

jlaur added a commit to jlaur/openhab-addons that referenced this pull request Mar 23, 2024
Signed-off-by: Jacob Laursen <jacob-github@vindvejr.dk>
@jlaur
Copy link
Contributor

jlaur commented Mar 23, 2024

For the last three tests I have provided a new console command for simulating exceptions and HTTP status codes.

In case you are interested, see 62ae29d.

@jlaur
Copy link
Contributor

jlaur commented Mar 25, 2024

@clinique - it looks like you reverted too much with your latest force push. The support for HTTP status code 429 is no longer here. Perhaps you can also update the PR description to state exactly what is fixed.

@jlaur
Copy link
Contributor

jlaur commented Mar 25, 2024

@clinique - there is also a conflict to resolve.

@jlaur
Copy link
Contributor

jlaur commented Mar 25, 2024

Baseline tests, i.e. with PR's already merged today, but without changes from this PR:

Additional module:

  • Take out batteries and wait for it to go offline (when API returns "reachable": false). 🟢
  • Insert batteries again and verify it goes back online. 🟢

Main module:

  • Cut power and wait for it to go offline (when last timestamp is more than one hour ago). 🟢
  • Power on again and verify it goes back online. 🟢

Restart binding:

  • Verify number of calls (no disposed handlers still working). 🔴

Simulate API responses:

  • OAuth authentication call failed (should retry after max 15 minutes). 🟢
  • OAuth authentication call failed twice (should retry after max 15 minutes). 🟢
  • Normal call failed. 🟢 (logged, but status not updated)
  • HTTP code 429. 🟢 (behaves like normal call failed)

Test notes:

On second OAuth2 failure:

2024-03-25 18:29:39.637 [WARN ] [etatmo.internal.servlet.GrantServlet] - Registering servlet failed:ServletModel{id=ServletModel-58,name='org.openhab.binding.netatmo.internal.servlet.GrantServlet',alias='/netatmo/connect/xxx',urlPatterns=[/netatmo/connect/xxx/*],servlet=org.openhab.binding.netatmo.internal.servlet.GrantServlet@1c10460,contexts=[{HS,OCM-54,default,/}]} can't be registered. Context / already contains servlet mapping for alias /netatmo/connect/xxx: ServletModel{id=ServletModel-56,name='org.openhab.binding.netatmo.internal.servlet.GrantServlet',alias='/netatmo/connect/xxx',urlPatterns=[/netatmo/connect/xxx/*],servlet=org.openhab.binding.netatmo.internal.servlet.GrantServlet@136f4ae,contexts=[{HS,OCM-54,default,/}]}

After playing around for some time with failed calls, too many calls:

2024-03-25 18:39:41.734 [DEBUG] [handler.capability.RefreshCapability] - netatmo:weather-station:home:indoor_small_bathroom refreshed, next one in 300s
2024-03-25 18:39:41.762 [DEBUG] [handler.capability.RefreshCapability] - netatmo:weather-station:home:indoor_small_bathroom refreshed, next one in 120s
2024-03-25 18:40:39.161 [DEBUG] [handler.capability.RefreshCapability] - netatmo:weather-station:home:indoor_small_bathroom refreshed, next one in 120s
2024-03-25 18:41:44.420 [DEBUG] [handler.capability.RefreshCapability] - netatmo:weather-station:home:indoor_small_bathroom refreshed, next one in 120s
2024-03-25 18:42:39.513 [DEBUG] [handler.capability.RefreshCapability] - netatmo:weather-station:home:indoor_small_bathroom refreshed, next one in 120s

Will run these tests again with this PR applied after conflict has been resolved.

@jlaur
Copy link
Contributor

jlaur commented Mar 25, 2024

@clinique - there is also a conflict to resolve.

Ah, now I understand: This PR depends on #16574 now.

@lolodomo
Copy link
Contributor

Conflicts need to be resolved.

@jlaur jlaur removed the awaiting other PR Depends on another PR label Mar 29, 2024
@clinique
Copy link
Contributor Author

Conflicts need to be resolved.

Yes, I have to rebase on behalf of all that has been merged and come back working on this PR now that the landscape is cleared.

Signed-off-by: clinique <gael@lhopital.org>
Signed-off-by: clinique <gael@lhopital.org>
Signed-off-by: clinique <gael@lhopital.org>
Signed-off-by: clinique <gael@lhopital.org>
Rebased

Signed-off-by: clinique <gael@lhopital.org>
Signed-off-by: gael@lhopital.org <gael@lhopital.org>
Rebased
Reintroducing TOO_MANY_REQUESTS

Signed-off-by: clinique <gael@lhopital.org>
Signed-off-by: gael@lhopital.org <gael@lhopital.org>
Rebased
Reintroducing TOO_MANY_REQUESTS
Rebased

Signed-off-by: clinique <gael@lhopital.org>
Signed-off-by: gael@lhopital.org <gael@lhopital.org>
@clinique
Copy link
Contributor Author

@lolodomo : you can now proceed with the review of this one.

@clinique clinique closed this Mar 30, 2024
@clinique clinique deleted the netatmo_16485 branch March 30, 2024 10:00
@jlaur
Copy link
Contributor

jlaur commented Apr 1, 2024

@clinique - why did you close this PR?

@clinique clinique restored the netatmo_16485 branch April 1, 2024 15:28
@clinique clinique reopened this Apr 1, 2024
@clinique
Copy link
Contributor Author

clinique commented Apr 1, 2024

@clinique - why did you close this PR?

I don't know. Probably a mistake, I made some cleaning in my repo...sorry.

@jlaur
Copy link
Contributor

jlaur commented Apr 1, 2024

@clinique - I'm now back to this.

Can I ask how you tested this? I'm asking because I have applied my "console error simulator", and I immediately ran into an issue when being authenticated already, but then hitting an HTTP error code 429:

2024-04-01 21:11:11.736 [DEBUG] [tatmo.internal.handler.DeviceHandler] - bridgeStatusChanged for bridge netatmo:weather-station:home:indoor to OFFLINE (COMMUNICATION_ERROR): @text/maximum-usage-reached [ "3600" ]
2024-04-01 21:11:11.736 [DEBUG] [tatmo.internal.handler.DeviceHandler] - bridgeStatusChanged for bridge netatmo:weather-station:home:indoor_small_bathroom to OFFLINE (COMMUNICATION_ERROR): @text/maximum-usage-reached [ "3600" ]
2024-04-01 21:11:11.738 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor:indoor3 to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.740 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor:indoor2 to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.742 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - Reconnection scheduled in 3600 seconds
2024-04-01 21:11:11.742 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:outdoor:home:indoor:outdoor to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_office to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_alva to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_linus to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.745 [WARN ] [handler.capability.WeatherCapability] - Error retrieving weather data '32': 
2024-04-01 21:11:11.748 [DEBUG] [handler.capability.RefreshCapability] - Thing 'netatmo:weather-station:home:indoor' is not ONLINE, using special refresh interval
2024-04-01 21:11:11.750 [DEBUG] [handler.capability.RefreshCapability] - 'netatmo:weather-station:home:indoor' next refresh in PT15M
2024-04-01 21:11:11.740 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Maximum usage reached, will reconnect in 3600 seconds.
2024-04-01 21:11:11.746 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor_small_bathroom' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.748 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.752 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor3' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.753 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor2' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.754 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_office' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.755 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_alva' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.755 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_linus' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.756 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.757 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor:outdoor' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:26:11.751 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:weather-station:home:indoor

(and then everything came back online)

Then problem I see here is that we seem not to be protected at all from worsening after MAXIMUM_USAGE_REACHED or HTTP error code 429, since the child handlers are still permitted to call the API, and are actively doing so.

Looking at the code, I think the "API limiting" is only for the corner case of missing authentication or request errors?

Maybe I'm testing it the wrong way or misunderstood the intentions.

EDIT: It seems that after this occurred, API calls are still made periodically, but nothing is updated anymore. Example:

2024-04-01 21:45:47.840 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:weather-station:home:indoor_small_bathroom
2024-04-01 21:45:47.842 [DEBUG] [l.handler.capability.CacheCapability] - WeatherCapability requesting fresh data for netatmo:weather-station:home:indoor_small_bathroom
2024-04-01 21:45:47.843 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=46&get_favorites=false 
2024-04-01 21:45:47.850 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -with headers: Accept-Encoding: gzip, User-Agent: Jetty/9.4.52.v20230823, Authorization: Bearer xxx 
2024-04-01 21:45:48.166 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -returned: code [200 OK] body [...]
2024-04-01 21:45:48.170 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_office
2024-04-01 21:45:48.172 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed
2024-04-01 21:45:48.173 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_alva
2024-04-01 21:45:48.174 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_linus
2024-04-01 21:45:48.183 [DEBUG] [handler.capability.RefreshCapability] - 'netatmo:weather-station:home:indoor_small_bathroom' next refresh in PT10M4.816913174S

throw new NetatmoException("Request interrupted");
} catch (TimeoutException | ExecutionException e) {
throw exception;
} catch (InterruptedException | TimeoutException | ExecutionException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the InterruptedException handling is correct, but I think it was not correct previously either because the handler probably didn't respect it after being converted to NetatmoException. This might give a bit insight: https://rules.sonarsource.com/java/RSPEC-2142/

By initiating a retry below, it seems we are delaying the thread interruption. Perhaps it would be better to propagate the exception?

@wborn - can you advise a bit here?

@clinique
Copy link
Contributor Author

clinique commented Apr 2, 2024

@clinique - I'm now back to this.

Can I ask how you tested this? I'm asking because I have applied my "console error simulator", and I immediately ran into an issue when being authenticated already, but then hitting an HTTP error code 429:

2024-04-01 21:11:11.736 [DEBUG] [tatmo.internal.handler.DeviceHandler] - bridgeStatusChanged for bridge netatmo:weather-station:home:indoor to OFFLINE (COMMUNICATION_ERROR): @text/maximum-usage-reached [ "3600" ]
2024-04-01 21:11:11.736 [DEBUG] [tatmo.internal.handler.DeviceHandler] - bridgeStatusChanged for bridge netatmo:weather-station:home:indoor_small_bathroom to OFFLINE (COMMUNICATION_ERROR): @text/maximum-usage-reached [ "3600" ]
2024-04-01 21:11:11.738 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor:indoor3 to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.740 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor:indoor2 to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.742 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - Reconnection scheduled in 3600 seconds
2024-04-01 21:11:11.742 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:outdoor:home:indoor:outdoor to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_office to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_alva to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.743 [DEBUG] [tatmo.internal.handler.ModuleHandler] - bridgeStatusChanged for thing netatmo:indoor:home:indoor_small_bathroom:indoor_linus to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.745 [WARN ] [handler.capability.WeatherCapability] - Error retrieving weather data '32': 
2024-04-01 21:11:11.748 [DEBUG] [handler.capability.RefreshCapability] - Thing 'netatmo:weather-station:home:indoor' is not ONLINE, using special refresh interval
2024-04-01 21:11:11.750 [DEBUG] [handler.capability.RefreshCapability] - 'netatmo:weather-station:home:indoor' next refresh in PT15M
2024-04-01 21:11:11.740 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:account:home' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Maximum usage reached, will reconnect in 3600 seconds.
2024-04-01 21:11:11.746 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor_small_bathroom' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.748 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:weather-station:home:indoor' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.752 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor3' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.753 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor:indoor2' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.754 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_office' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.755 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_alva' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.755 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:indoor:home:indoor_small_bathroom:indoor_linus' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.756 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:11:11.757 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'netatmo:outdoor:home:indoor:outdoor' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE)
2024-04-01 21:26:11.751 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:weather-station:home:indoor

(and then everything came back online)

Then problem I see here is that we seem not to be protected at all from worsening after MAXIMUM_USAGE_REACHED or HTTP error code 429, since the child handlers are still permitted to call the API, and are actively doing so.

Looking at the code, I think the "API limiting" is only for the corner case of missing authentication or request errors?

Maybe I'm testing it the wrong way or misunderstood the intentions.

EDIT: It seems that after this occurred, API calls are still made periodically, but nothing is updated anymore. Example:

2024-04-01 21:45:47.840 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:weather-station:home:indoor_small_bathroom
2024-04-01 21:45:47.842 [DEBUG] [l.handler.capability.CacheCapability] - WeatherCapability requesting fresh data for netatmo:weather-station:home:indoor_small_bathroom
2024-04-01 21:45:47.843 [DEBUG] [mo.internal.handler.ApiBridgeHandler] - executeUri GET  https://api.netatmo.com/api/getstationsdata?device_id=46&get_favorites=false 
2024-04-01 21:45:47.850 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -with headers: Accept-Encoding: gzip, User-Agent: Jetty/9.4.52.v20230823, Authorization: Bearer xxx 
2024-04-01 21:45:48.166 [TRACE] [mo.internal.handler.ApiBridgeHandler] -  -returned: code [200 OK] body [...]
2024-04-01 21:45:48.170 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_office
2024-04-01 21:45:48.172 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:outdoor:home:indoor_small_bathroom:outdoor_shed
2024-04-01 21:45:48.173 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_alva
2024-04-01 21:45:48.174 [DEBUG] [l.handler.capability.CacheCapability] - MeasureCapability requesting fresh data for netatmo:indoor:home:indoor_small_bathroom:indoor_linus
2024-04-01 21:45:48.183 [DEBUG] [handler.capability.RefreshCapability] - 'netatmo:weather-station:home:indoor_small_bathroom' next refresh in PT10M4.816913174S

I'll be back on this one very soon. I'm currently focused on another PR I had open since a while.

@lolodomo
Copy link
Contributor

@clinique @jlaur : What is the status here ?
Is it work on progress or finished and ready for a review?

@clinique
Copy link
Contributor Author

@clinique @jlaur : What is the status here ?
Is it work on progress or finished and ready for a review?

Unfinished - I have to get back on it

@lolodomo lolodomo added the work in progress A PR that is not yet ready to be merged label Apr 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
additional testing preferred The change works for the pull request author. A test from someone else is preferred though. bug An unexpected problem or unintended behavior of an add-on work in progress A PR that is not yet ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants