Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All appears to be working but no response being delivered #338

Open
phoebebright opened this issue Jan 25, 2024 · 11 comments
Open

All appears to be working but no response being delivered #338

phoebebright opened this issue Jan 25, 2024 · 11 comments

Comments

@phoebebright
Copy link

I am using aiocoap python client to manage communicaitons between an IOT logger device and a python coap server.

The simple process is that the logger makes a POST or GET request and receives a response from coap. I am doing these ACKs like this:

   response = aiocoap.Message(mtype=aiocoap.ACK, code=aiocoap.Code.CREATED,
                       token=request.token, payload=return_payload)

This has all been working well for over a year but when I put more than 5 loggers using the system then the request works fine but none of them get a response.

image

As far as CoAP is concerned all is well and I see the ACK being sent but the logger never receives it.

Has anyone experienced this or have any suggests where to look?

Defaults:

Python version: 3.10.7 (main, May 29 2023, 13:51:48) [GCC 12.2.0]
aiocoap version: 0.4.5
Modules missing for subsystems:
dtls: everything there
oscore: missing cbor2, cryptography, filelock, ge25519
linkheader: missing LinkHeader
prettyprint: missing LinkHeader, cbor2, termcolor, pygments
Python platform: linux
Default server transports: tinydtls:tcpserver:tcpclient:tlsserver:tlsclient:udp6
Selected server transports: tinydtls:tcpserver:tcpclient:tlsserver:tlsclient:udp6
Default client transports: tinydtls:tcpclient:tlsclient:udp6
Selected client transports: tinydtls:tcpclient:tlsclient:udp6
SO_REUSEPORT available (default, selected): True, True

@chrysn
Copy link
Owner

chrysn commented Jan 25, 2024

Please post that screenshot as text (copy-paste into a ``` delimited block as with your code example), or at least in higher resolution -- this is not legible.

Note that you should never set the mtype in a response Message: The library handles this, and will react appropriately set what the protocol demands.

@phoebebright
Copy link
Author

phoebebright commented Jan 25, 2024

Thanks for responding. Here is the log file showing a message coming and and the response going out.
Note that this has been working for over a year. If the number of units are reduced then they start receiving responses again and if they added back then it goes into "no response" mode again.

Jan 24 16:25:00 q2test.puritie.tinycomms.com coap-server DEBUG:coap-server:Incoming message <aiocoap.Message at 0x7f0ad3687fd0: NON POST (MID 50298, token 5653608e4e359e27) remote <UDP6EndpointAddress 91.168.225.244:61522 (locally 167.99.85.188%eth0)>, 1 option(s), 323 byte(s) payload>
Jan 24 16:25:00 q2test.puritie.tinycomms.com coap-server DEBUG:coap-server:New unique message received
Jan 24 16:25:00 q2test.puritie.tinycomms.com coap-server coap time received from AB0033 - 1706113500 payload 1706113234 - diff 266
Jan 24 16:25:00 q2test.puritie.tinycomms.com coap-server data as received {'schema_ver': 3, 'gadget_id': 'AB0033', 'settings_ver': 0, 'rec_type': 1, 'timestamp': 1706113234, 't_amb_c': 100, 't_rad_c': 100, 'rh_amb_pct': 100, 'pressure_hpa': 1250, 'co2_ppm': 1002, 'as_x_ms': -0.01, 'as_y_ms': -0.01, 'noise_tr_press': 45, 'noise_tr_dba': 0, 'noise_tr_dbc': 0, 'light_amb_lx': 6553, 'light_white_lx': 655, 'no2_we': 65507, 'no2_ae': 0}
Jan 24 16:25:00 q2test.puritie.tinycomms.com coap-server Data to be written AB0033:IEQ_BASE12 {'schema_ver': 3, 'gadget_id': 'AB0033', 'settings_ver': 0, 'rec_type': 'Avg', 'timestamp': 1706113234, 't_amb_c': 100.0, 't_rad_c': 100.0, 'rh_amb_pct': 100.0, 'pressure_hpa': 1250.0, 'co2_ppm': 1002, 'as_x_ms': -0.01, 'as_y_ms': -0.01, 'noise_tr_press': 45.0, 'noise_tr_dba': 0.0, 'noise_tr_dbc': 0.0, 'light_amb_lx': 6553, 'light_white_lx': 655, 'no2_we': 65507.0, 'no2_ae': 0.0, 'data_type': 'IEQ_', 'org_id': 'TEST', 'calibrated': False}
Jan 24 16:25:01 q2test.puritie.tinycomms.com coap-server /mb/ POST AB0033 responding ACK with payload b'""'
Jan 24 16:25:01 q2test.puritie.tinycomms.com coap-server DEBUG:coap-server:Sending message <aiocoap.Message at 0x7f0ad3684880: ACK 2.01 Created (MID 2510, token 5653608e4e359e27) remote <UDP6EndpointAddress 91.168.225.244:61522 (locally 167.99.85.188%eth0)>, 2 byte(s) payload>
J

@chrysn
Copy link
Owner

chrysn commented Jan 25, 2024

That's just one message that looks like it's being sent just fine. Do you have a comparison log with 5 clients requesting simultaneously?

Log with links removed and regular formatting
 Jan 24 16:25:00 q2test.puritie.tinycomms.com coap-server DEBUG:coap-server:Incoming message <aiocoap.Message at 0x7f0ad3687fd0: NON POST (MID 50298, token 5653608e4e359e27) remote <UDP6EndpointAddress 91.168.225.244:61522 (locally 167.99.85.188%eth0)>, 1 option(s), 323 byte(s) payload>
 Jan 24 16:25:00 q2test.puritie.tinycomms.com coap-server DEBUG:coap-server:New unique message received
 Jan 24 16:25:00 q2test.puritie.tinycomms.com coap-server coap time received from AB0033 - 1706113500 payload 1706113234 - diff 266
 Jan 24 16:25:00 q2test.puritie.tinycomms.com coap-server data as received {'schema_ver': 3, 'gadget_id': 'AB0033', 'settings_ver': 0, 'rec_type': 1, 'timestamp': 1706113234, 't_amb_c': 100, 't_rad_c': 100, 'rh_amb_pct': 100, 'pressure_hpa': 1250, 'co2_ppm': 1002, 'as_x_ms': -0.01, 'as_y_ms': -0.01, 'noise_tr_press': 45, 'noise_tr_dba': 0, 'noise_tr_dbc': 0, 'light_amb_lx': 6553, 'light_white_lx': 655, 'no2_we': 65507, 'no2_ae': 0}
 Jan 24 16:25:00 q2test.puritie.tinycomms.com coap-server Data to be written AB0033:IEQ_BASE12 {'schema_ver': 3, 'gadget_id': 'AB0033', 'settings_ver': 0, 'rec_type': 'Avg', 'timestamp': 1706113234, 't_amb_c': 100.0, 't_rad_c': 100.0, 'rh_amb_pct': 100.0, 'pressure_hpa': 1250.0, 'co2_ppm': 1002, 'as_x_ms': -0.01, 'as_y_ms': -0.01, 'noise_tr_press': 45.0, 'noise_tr_dba': 0.0, 'noise_tr_dbc': 0.0, 'light_amb_lx': 6553, 'light_white_lx': 655, 'no2_we': 65507.0, 'no2_ae': 0.0, 'data_type': 'IEQ_', 'org_id': 'TEST', 'calibrated': False}
 Jan 24 16:25:01 q2test.puritie.tinycomms.com coap-server /mb/ POST AB0033 responding ACK with payload b'""'
 Jan 24 16:25:01 q2test.puritie.tinycomms.com coap-server DEBUG:coap-server:Sending message <aiocoap.Message at 0x7f0ad3684880: ACK 2.01 Created (MID 2510, token 5653608e4e359e27) remote <UDP6EndpointAddress 91.168.225.244:61522 (locally 167.99.85.188%eth0)>, 2 byte(s) payload>

@phoebebright
Copy link
Author

They all look like that - no errors at all on the CoAP side but no response received by the logger. I can replicate this using python code acting as a client so it is definitely not a problem with the loggers. Where could the response message be getting lost or how could coap be appearing to send it but not actually sending it and is there a way to instrument it to identify where the problem is?

@chrysn
Copy link
Owner

chrysn commented Jan 25, 2024

The best way to split this is to run wireshark (or any similar tool) on the host running the aiocoap server; see whether the responses get out there. Thus we can split the problem space.

@phoebebright
Copy link
Author

Thanks - will give that a go.

@phoebebright
Copy link
Author

phoebebright commented Jan 31, 2024

So the problem is not that coap is not responding but that the response is very slow and the device timed out. The question is where/why is there this delay:

This is the code in my coap server:

class Pulse(GadgetResource):
    '''return message queueu '''

    async def render_get(self, request):

        start_dt = datetime.now()

        payload = json.loads(request.payload)
        gadget_id, gadget_data = validate_gadget_id(payload)
      
        if gadget_id:
            return_payload = get_messageq(gadget_id)



            # returning "ACK" to the sensor
            response = aiocoap.Message(mtype=aiocoap.ACK, code=aiocoap.Code.CREATED,
                                       token=request.token, payload=return_payload)
            if return_payload != "":
                print(f"/pulse/ GET {gadget_id} responding - ACK {return_payload}")

            # track_response("pulse", start_dt,
            #                {"gadget_id": str(gadget_id or "N/A")},
            #                {"status": (500 if (str(return_payload).startswith("ERROR")) else 200)})
        else:
            # Create a payload to explain the error
            error_payload = json.dumps({"error": "Invalid or missing gadget_id"}).encode('UTF-8')

            # Respond with a 4.00 Bad Request or 4.04 Not Found
            response = aiocoap.Message(mtype=aiocoap.ACK,
                                       code=aiocoap.Code.UNAUTHORIZED,  # or aiocoap.Code.NOT_FOUND
                                       token=request.token,
                                       payload=error_payload)
        return response

And this can be seen in the log file:

<image.png>

Looking at the log file, the elapsed time between the CON being received and the ACK being sent is less than a second.

But looking at Wireshark there seems to be a gap of 33 seconds.

image

This does not happen with just a couple of devices logging but at 4 units some messages start getting delayed and at 5 units only an occassional message makes it through. Obviously we need more than 4 units per CoAP server! Do you have any thoughts on where to look for the problem?

[edit by chrysn: Used backticks to make code readable]

@chrysn
Copy link
Owner

chrysn commented Jan 31, 2024

While I'm processing this, please check what happened to the log, and preferably upload it as text instead of an image. (Can't do text search in images).

@chrysn
Copy link
Owner

chrysn commented Jan 31, 2024

Just in the interest of removing unneeded parts and visual clutter, there is no need to set the token or the mtype in the responses -- those are managed by the library.

@chrysn
Copy link
Owner

chrysn commented Jan 31, 2024

When you upload the log, please make sure it is showing requests from the multiple clients (as the multiple clients are what causes this), and add a Wireshark view showing the multiple requests as well.

Right now, my best guess would be that the requests are all coming in from the same address, and then flow control kicks in, but the logs and network dumps will clarify that.

@phoebebright
Copy link
Author

Great, thanks for the pointers. Will give that a go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants