Help with grpc error handling and time to receive device update event #1090

Open
ipzago opened this issue Oct 25, 2023 · 5 comments
Labels
🔌 feature New feature request ❓ question Further information is requested

Comments

@ipzago

ipzago commented Oct 25, 2023

Hello,

We are trying to improve our use of plgd, and some questions came up. Could you help us?

  • Regarding errors, is there any place where we can look up the possible errors returned by plgd?
    We're trying to better evaluate the errors returned from gRPC calls (get, update, create and delete),
    and for this we need to know the possible errors and the best way to identify them without having to compare strings.

  • Should a gRPC attempt to update a read-only resource return an error? We expected one, but no error is received.
    E.g.: We performed a gRPC update of the module's /cfg access identifier and received no error, just the
    resource payload with no changes.

  • Sometimes it takes several seconds to receive events that update device metadata or report unregister/register operations.
    Is there any timer in plgd or DPS that checks for or sends updates?
    E.g.: In our tests, we pause the module container or disconnect it from the network and wait for the system to react, and sometimes
    it takes longer than expected.

Best Regards,
Icaro

@ipzago ipzago added the 🔌 feature New feature request label Oct 25, 2023
@jkralik
Member

jkralik commented Oct 25, 2023

> Talking about error, is there any place that we could consult possible errors returned from PLGD?
> We're trying to evaluate better the errors returned from grpc calls (get, update, create and delete)
> and for this we need to know possible errors and the best way to identify them without need to compare strings.

The error codes that can be returned are defined by the gRPC codes and our extended codes. The status code returned by the device for the action is specified in data.status (link), and data.status is also used to set the corresponding gRPC code.
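Since the gRPC code is numeric, a client can branch on it instead of comparing error strings. The Python sketch below illustrates this, assuming you have already extracted the integer code from the failed call (with grpcio, for example, `err.code().value[0]` on the raised `grpc.RpcError`). The values in `GrpcCode` are the standard ones from the gRPC specification; plgd's extended codes are not covered, and the `Action` categories and `classify` helper are hypothetical names chosen for this example.

```python
from enum import Enum, IntEnum


class GrpcCode(IntEnum):
    """Standard gRPC status codes (numeric values from the gRPC specification)."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


class Action(Enum):
    """Hypothetical handling categories used only in this example."""
    SUCCESS = "success"
    RETRY = "retry"
    FIX_REQUEST = "fix_request"
    REAUTHENTICATE = "reauthenticate"
    FAIL = "fail"


# Codes that usually indicate a transient condition worth retrying (a policy choice).
_RETRYABLE = {GrpcCode.UNAVAILABLE, GrpcCode.DEADLINE_EXCEEDED,
              GrpcCode.RESOURCE_EXHAUSTED, GrpcCode.ABORTED}
# Codes that usually mean the request itself has to change (a policy choice).
_CLIENT_SIDE = {GrpcCode.INVALID_ARGUMENT, GrpcCode.NOT_FOUND,
                GrpcCode.ALREADY_EXISTS, GrpcCode.FAILED_PRECONDITION,
                GrpcCode.OUT_OF_RANGE, GrpcCode.UNIMPLEMENTED,
                GrpcCode.PERMISSION_DENIED}


def classify(code: int) -> Action:
    """Map a numeric gRPC status code to a handling category without string matching."""
    try:
        grpc_code = GrpcCode(code)
    except ValueError:
        # Value not in the standard table (e.g. an extended code): treat as a hard failure.
        return Action.FAIL
    if grpc_code is GrpcCode.OK:
        return Action.SUCCESS
    if grpc_code is GrpcCode.UNAUTHENTICATED:
        return Action.REAUTHENTICATE
    if grpc_code in _RETRYABLE:
        return Action.RETRY
    if grpc_code in _CLIENT_SIDE:
        return Action.FIX_REQUEST
    return Action.FAIL
```

Which codes count as retryable is a per-application decision; the sets above are only one reasonable starting point.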

> Should grpc attempt to update a read only resource return error? We were expecting it but no error is received.
> E.g.: We tried to perform grpc update of module's /cfg access identifier and no error was received, just the
> resource payload with no changes.

plgd doesn't know whether a resource is read-only, so the request is forwarded to the device, and it is the device that must return an error code. The body may additionally contain data describing the error. In that case, the error code is set in data.status of the gRPC response, and the body in data.content.data is usually encoded in CBOR.
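Because only the device can reject the update, a client has to inspect the device status in the response rather than treat a completed RPC as success. The Python sketch below assumes the gRPC response has already been converted to a nested dict whose keys mirror the field names mentioned above (`data.status`, `data.content.data`). The status names in `SUCCESS_STATUSES` and the `DeviceError`/`check_update` names are hypothetical placeholders, so check them against the plgd protobuf definitions. Decoding the CBOR body needs a third-party library such as cbor2, so the sketch hands back the raw bytes.

```python
# Hypothetical status names treated as success for an UPDATE;
# replace with the actual values of plgd's status enum.
SUCCESS_STATUSES = {"OK"}


class DeviceError(Exception):
    """Raised when data.status reports that the device rejected the request (hypothetical)."""

    def __init__(self, status: str, body: bytes) -> None:
        super().__init__(f"device returned status {status}")
        self.status = status
        self.body = body  # raw data.content.data, typically CBOR-encoded


def check_update(response: dict) -> bytes:
    """Return the response body on success; raise DeviceError otherwise."""
    data = response.get("data", {})
    status = data.get("status", "UNKNOWN")  # missing status is treated as a failure
    body = data.get("content", {}).get("data", b"")
    if status not in SUCCESS_STATUSES:
        raise DeviceError(status, body)
    return body
```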

> Sometimes, it takes seconds to receive events updating device metadata or operations of unregister/register.
> Is there any timer in PLGD os DPS to check/send updates?
> E.g.: In our tests, we pause or disconnect module container from network and wait for system reaction, and sometimes
> it takes more than expected.

This depends on various factors and use cases:

  • How soon a module is able to retrieve its configuration from DPS: the module retries at intervals in an infinite loop, and in the default worst-case scenario this can take up to 130 seconds (configurable).
  • The keepalive configured in the CoAP gateway determines when an OFFLINE event is emitted after a device has been disconnected.
  • The heartbeat configured in the CoAP gateway is relevant for the case where the CoAP gateway itself is terminated with SIGKILL.
  • How long the module has been unable to reach the hub: the cloud connector retries at intervals of several seconds in an infinite loop, with a default worst case of 66 seconds (configurable).
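The retry loops described in the first and last bullets explain why the reaction can lag behind the moment the hub or DPS becomes reachable again: the next attempt only happens at the next interval boundary. The Python sketch below simulates that, assuming the client makes an attempt at t=0 and then after each interval of a list, cycling through the list forever. The intervals in the example are hypothetical and are not plgd's defaults; `reconnect_time` is an illustrative name.

```python
from itertools import cycle
from typing import Sequence


def reconnect_time(intervals_s: Sequence[float], reachable_at_s: float) -> float:
    """Return the time of the first connection attempt at or after reachable_at_s.

    Attempts are made at t=0 and then after each interval in turn,
    wrapping around to the first interval forever (the "infinite loop").
    """
    if not intervals_s or any(i <= 0 for i in intervals_s):
        raise ValueError("intervals must be a non-empty list of positive numbers")
    t = 0.0
    for interval in cycle(intervals_s):
        if t >= reachable_at_s:
            return t
        t += interval
    raise AssertionError("unreachable: cycle() over a non-empty list never ends")


# Hypothetical schedule: the hub is reachable again at t=3.5 s,
# but the next attempt is only made at t=7 s.
print(reconnect_time([1, 2, 4], 3.5))  # 7.0
```

In this model, the lag after the service becomes reachable is at most the longest interval in the list, which is why shortening the configured intervals shortens the worst case.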

@jkralik jkralik added the ❓ question Further information is requested label Oct 25, 2023
@ipzago
Author

ipzago commented Oct 25, 2023

Hey jkralik, many thanks for the answer.
Could you please tell us which variables we should set when using the plgd bundle in Docker?

@jkralik
Member

jkralik commented Oct 25, 2023

@ipzago To use ghcr.io/plgd-dev/hub/bundle:2.12.1, you need to run it twice with a volume mounted at the directory /data:

  1. The first run generates the configuration files in the volume. Run the following command, then stop the container:

    docker run -it --rm -v /tmp/bundle_data:/data ghcr.io/plgd-dev/hub/bundle:2.12.1
    
  2. Modify the configuration files in /tmp/bundle_data. For example, you can edit /tmp/bundle_data/coap-gateway-secure.yaml and change values such as apis.coap.keepAlive.timeout or serviceHeartbeat.timeToLive to 10s.

  3. In the second run, start the bundle with the configurations from the volume, using the same volume mount:

    docker run -d --name plgd-bundle -v /tmp/bundle_data:/data ghcr.io/plgd-dev/hub/bundle:2.12.1
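Step 2 can also be scripted. The Python sketch below sets values in the nested configuration mapping using the dotted paths quoted above (apis.coap.keepAlive.timeout, serviceHeartbeat.timeToLive). It assumes PyYAML is installed for reading and writing the file (`yaml.safe_load`/`yaml.safe_dump`); PyYAML does not preserve comments, so keep a copy of the generated file. The `set_path` and `patch_config` names are illustrative, not part of plgd.

```python
from typing import Any, Mapping


def set_path(config: dict, dotted_key: str, value: Any) -> None:
    """Set a nested value, e.g. set_path(cfg, "apis.coap.keepAlive.timeout", "10s").

    Raises KeyError if any key on the path is missing, so a typo (or a
    different file layout) is reported instead of silently adding a new section.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node[key]
    if leaf not in node:
        raise KeyError(dotted_key)
    node[leaf] = value


def patch_config(path: str, updates: Mapping[str, Any]) -> None:
    """Load a YAML file, apply dotted-path updates, and write it back (requires PyYAML)."""
    import yaml  # third-party: pip install pyyaml

    with open(path, encoding="utf-8") as f:
        config = yaml.safe_load(f)
    for dotted_key, value in updates.items():
        set_path(config, dotted_key, value)
    with open(path, "w", encoding="utf-8") as f:
        yaml.safe_dump(config, f, sort_keys=False)
```

For example, `patch_config("/tmp/bundle_data/coap-gateway-secure.yaml", {"apis.coap.keepAlive.timeout": "10s", "serviceHeartbeat.timeToLive": "10s"})` between the two `docker run` invocations. A KeyError means the generated file nests these keys differently than assumed here.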
    

@ipzago
Author

ipzago commented Nov 29, 2023

@jkralik I ran some local tests setting both values to 10s and then to 5s, and it seems to make no difference. I mounted the volume as indicated and confirmed that the file in /data contains the right values. Is there any other configuration we could try?

@jkralik
Member

jkralik commented Nov 29, 2023

@ipzago Could you please check the coap-gateway logs in the file /data/log/coap-gateway.log?

There will be one INFO entry (usually the second line) similar to {"L":"INFO","T":"2023-11-16T08:19:25.93088978Z","M":"config: .... You can then verify whether the values were loaded as configured.
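That check can be automated. The Python sketch below scans the JSON-lines log for the first INFO entry whose `M` field starts with `config:`, relying only on the `L` and `M` keys visible in the example line above; lines that are not valid JSON objects are skipped. The function name is illustrative, and since the thread does not show the full format of the config dump, the sketch simply returns the message for you to inspect.

```python
import json
from typing import Optional


def find_config_entry(log_path: str) -> Optional[str]:
    """Return the message of the first INFO entry starting with "config:", or None."""
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # not a JSON log line (e.g. blank or plain text)
            if not isinstance(entry, dict):
                continue
            message = entry.get("M", "")
            if entry.get("L") == "INFO" and isinstance(message, str) and message.startswith("config:"):
                return message
    return None
```

For example, `print(find_config_entry("/data/log/coap-gateway.log"))` inside the container, then look for the keepalive and heartbeat settings in the printed message.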


3 participants