Downlink in passive roaming does not work #555

Open · 2 tasks done
ghost opened this issue Oct 28, 2021 · 4 comments

@ghost commented Oct 28, 2021

  • The issue is present in the latest release.
  • I have searched the issues of this repository and believe that this is not a duplicate.

What happened?

When running passive roaming, the downlink XmitDataReq from the hNS never receives a response from the fNS, and thus the device's downlink counter is never incremented.
The probable cause is that the hNS's XmitDataReq HTTP client call reuses the request context of the fNS's XmitDataReq or PRStartReq. Since the XmitDataReq is sent in a separate goroutine, the original context is cancelled before the call completes.
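
Below is a minimal Go sketch of this failure mode, assuming a plain net/http handler; the handler name, URL, and payloads are illustrative, not ChirpStack's actual code. net/http cancels the request context as soon as the handler returns, so a goroutine that inherits it loses the race:

```go
package main

import (
	"bytes"
	"log"
	"net/http"
)

// handlePRStartReq reproduces the suspected bug: the goroutine inherits
// r.Context(), which net/http cancels as soon as the handler returns, so
// the outbound POST fails with "context canceled".
func handlePRStartReq(w http.ResponseWriter, r *http.Request) {
	go func() {
		// Bug: this context dies when handlePRStartReq returns.
		req, err := http.NewRequestWithContext(r.Context(), http.MethodPost,
			"https://fns.example.com:8002", bytes.NewReader([]byte("XmitDataReq")))
		if err != nil {
			return
		}
		if _, err := http.DefaultClient.Do(req); err != nil {
			// Matches the log below: Post "https://fns.example.com:8002": context canceled
			log.Printf("downlink/data: XmitDataReq failed: %v", err)
		}
	}()
	// Returning the PRStartAns ends the handler and cancels r.Context(),
	// racing (and usually beating) the POST in the goroutine above.
	w.Write([]byte("PRStartAns"))
}

func main() {
	http.HandleFunc("/", handlePRStartReq)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```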

What did you expect?

The hNS should register the downlink as successful.

Steps to reproduce this issue

  1. Set up two ChirpStack NS instances with a roaming agreement
  2. Join a device on the hNS
  3. Move the device out of hNS coverage and into fNS coverage
  4. Make sure the uplink requires an answer (confirmed uplink or mac-command)

Could you share your log output?

Oct 28 12:46:40 hns.example.com chirpstack-network-server[88242]: time="2021-10-28T12:46:40.639572887Z" level=info msg="roaming: api request received" async_client=false ctx_id=d302c55a-49a2-4e6e-b07e-f4af9834f829 message_type=PRStartReq receiver_id=000001 sender_id=000000 transaction_id=2708645192
Oct 28 12:46:40 hns.example.com chirpstack-network-server[88242]: time="2021-10-28T12:46:40.646412561Z" level=info msg="sent uplink meta-data to network-controller" ctx_id=d302c55a-49a2-4e6e-b07e-f4af9834f829 dev_eui=0018b20000020450
Oct 28 12:46:40 hns.example.com chirpstack-network-server[88242]: time="2021-10-28T12:46:40.647317203Z" level=info msg="device gateway rx-info meta-data saved" ctx_id=d302c55a-49a2-4e6e-b07e-f4af9834f829 dev_eui=0018b20000020450
Oct 28 12:46:40 hns.example.com chirpstack-network-server[88242]: time="2021-10-28T12:46:40.648128014Z" level=info msg="device-session saved" ctx_id=d302c55a-49a2-4e6e-b07e-f4af9834f829 dev_addr=0363d4c1 dev_eui=0018b20000020450
Oct 28 12:46:40 hns.example.com chirpstack-network-server[88242]: time="2021-10-28T12:46:40.656330156Z" level=info msg="finished client unary call" ctx_id=d302c55a-49a2-4e6e-b07e-f4af9834f829 grpc.code=OK grpc.ctx_id=c4ed93c6-50ae-445c-94f6-85b2f1802c94 grpc.duration=8.843511ms grpc.method=HandleUplinkData grpc.service=as.ApplicationServerService span.kind=client system=grpc
Oct 28 12:46:40 hns.example.com chirpstack-network-server[88242]: time="2021-10-28T12:46:40.751556966Z" level=info msg="pending mac-command block set" cid=LinkADRReq commands=1 ctx_id=d302c55a-49a2-4e6e-b07e-f4af9834f829 dev_eui=0018b20000020450
Oct 28 12:46:40 hns.example.com chirpstack-network-server[88242]: time="2021-10-28T12:46:40.752367508Z" level=info msg="device-session saved" ctx_id=d302c55a-49a2-4e6e-b07e-f4af9834f829 dev_addr=0363d4c1 dev_eui=0018b20000020450
Oct 28 12:46:40 hns.example.com chirpstack-network-server[88242]: time="2021-10-28T12:46:40.752709202Z" level=info msg="storage: downlink-frame saved" ctx_id=d302c55a-49a2-4e6e-b07e-f4af9834f829 token=54018
Oct 28 12:46:40 hns.example.com chirpstack-network-server[88242]: time="2021-10-28T12:46:40.752766413Z" level=info msg="api/roaming: returned response" ctx_id=d302c55a-49a2-4e6e-b07e-f4af9834f829 message_type=PRStartAns receiver_id=000000 result_code=Success sender_id=000001 transaction_id=2708645192
Oct 28 12:46:40 hns.example.com chirpstack-network-server[88242]: time="2021-10-28T12:46:40.752935495Z" level=error msg="downlink/data: XmitDataReq failed" ctx_id=d302c55a-49a2-4e6e-b07e-f4af9834f829 dev_eui=0018b20000020450 error="http post error: Post \"https://fns.example.com:8002\": context canceled" net_id=000000
Oct 28 14:46:40 fns.example.com chirpstack-network-server[1775656]: time="2021-10-28T14:46:40.325834985+02:00" level=info msg="gateway/mqtt: uplink frame received" gateway_id=0080000000010513 uplink_id=e5f547e2-4e0d-4310-8e95-597053be51a4
Oct 28 14:46:40 fns.example.com chirpstack-network-server[1775656]: time="2021-10-28T14:46:40.628540019+02:00" level=info msg="uplink: frame(s) collected" ctx_id=5321223e-84ab-4e75-827f-d21dc8aee4d1 mtype=ConfirmedDataUp uplink_ids="[e5f547e2-4e0d-4310-8e95-597053be51a4]"
Oct 28 14:46:40 fns.example.com chirpstack-network-server[1775656]: time="2021-10-28T14:46:40.628886063+02:00" level=info msg="uplink/data: devaddr does not match netid, assuming roaming device" ctx_id=5321223e-84ab-4e75-827f-d21dc8aee4d1 dev_addr=0363d4c1
Oct 28 14:46:40 fns.example.com chirpstack-network-server[1775656]: time="2021-10-28T14:46:40.629397827+02:00" level=info msg="uplink/data: starting passive-roaming sessions with matching netids" ctx_id=5321223e-84ab-4e75-827f-d21dc8aee4d1 dev_addr=0363d4c1
Oct 28 14:46:40 fns.example.com chirpstack-network-server[1775656]: time="2021-10-28T14:46:40.629782151+02:00" level=info msg="uplink/data: starting passive-roaming session" ctx_id=5321223e-84ab-4e75-827f-d21dc8aee4d1 dev_addr=0363d4c1 net_id=000001
Oct 28 14:46:40 fns.example.com chirpstack-network-server[1775656]: time="2021-10-28T14:46:40.752562177+02:00" level=info msg="lorawan/backend: finished backend api call" message_type=PRStartReq protocol_version=1.0 receiver_id=000001 result_code=Success sender_id=000000 transaction_id=2708645192

Your Environment

Component            Version
Application Server   v3.17.3
Network Server       v3.15.3

@ghost (Author) commented Oct 29, 2021

After changing the goroutine to send the XmitDataReq with a new context, I realized that the hNS does not trigger the DownlinkTxAck flow after receiving a Success from the fNS.
The relevant spec, TS002-1.1.0_LoRaWAN_Backend_interfaces section 11.3.2, only says that the fNS should return the DLFreq and the result, so it should be safe to assume that the frame has been sent when a Success is received. Whether the frame was actually received by the gateway and transmitted on air would surely be nice-to-have information, but it is outside the scope of the Backend Interfaces spec.

I will look into a solution for this part and eventually submit a PR.
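
For reference, a sketch of the context change described here, under the same illustrative names as above (not ChirpStack's actual code): the goroutine gets its own context with a timeout instead of inheriting the inbound request's context, so the handler returning no longer cancels the in-flight XmitDataReq. On Go 1.21+, context.WithoutCancel would achieve the same while also preserving request-scoped values.

```go
package roaming

import (
	"bytes"
	"context"
	"log"
	"net/http"
	"time"
)

// sendXmitDataReqAsync posts the XmitDataReq on a context that is detached
// from the inbound request, so it survives the handler returning. The URL
// and timeout are illustrative assumptions.
func sendXmitDataReqAsync(payload []byte) {
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()

		req, err := http.NewRequestWithContext(ctx, http.MethodPost,
			"https://fns.example.com:8002", bytes.NewReader(payload))
		if err != nil {
			return
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Printf("downlink/data: XmitDataReq failed: %v", err)
			return
		}
		resp.Body.Close() // a successful response here carries the fNS's XmitDataAns
	}()
}
```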

brocaar closed this as completed in 16439a4 on Dec 6, 2021

@brocaar (Owner) commented Dec 6, 2021

Thanks for reporting. I ran into the same issue last week. This was a regression caused by the txack handling, which was added at a later point.

The above commit removes spawning the XmitDataReq within a goroutine, as I'm not sure it adds any value. It might actually be better not to run this async from the rest of the flow, as otherwise some mac-commands could be marked as requested within the device-session while the downlink was never sent.

This at least solves the issue of the queue-item staying in the queue forever. What it doesn't implement yet is the logging of the downlink transmission.

brocaar reopened this on Dec 6, 2021
brocaar added a commit that referenced this issue Dec 6, 2021

@conny-andersson (Contributor) commented:

Brilliant, thanks!

I will perform additional roaming tests based on this last commit.

@conny-andersson (Contributor) commented Dec 7, 2021

After some testing and review, I unfortunately think the async solution is needed.
If we look at the example of stateless passive roaming, we end up with a suboptimal flow:

  1. fNS sends PRStartReq including the uplink
  2. sNS processes the uplink and sends XmitDataReq
  3. fNS processes the downlink and sends XmitDataAns
  4. sNS processes the answer and sends PRStartAns

In order to follow the spec (the correct order should be 1, 4, 2, 3), the XmitDataReq needs to be sent async (or triggered after the PRStartAns has been sent) so that the PRStartReq flow can continue; a sketch of this ordering follows after this comment. Currently there is a very high risk of the fNS timing out on the PRStartReq request, since it has to wait for the XmitDataReq to finish.
The above flow results in even worse timeouts when async roaming is used, due to the extra round-trips.

Raising the async_timeout will probably solve the timeout problems, but it would be good to follow the spec here; other NS implementations might expect that ordering.
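
For illustration, a minimal sketch of the 1, 4, 2, 3 ordering suggested above (the handler shape, endpoint URL, and timeout are assumptions, not ChirpStack's actual code): the sNS answers the PRStartReq first, then dispatches the XmitDataReq on a detached context, so the fNS is not left waiting on the downlink round-trip.

```go
package main

import (
	"bytes"
	"context"
	"log"
	"net/http"
	"time"
)

// handlePRStartReq answers the PRStartReq first (step 4), then dispatches
// the XmitDataReq (steps 2-3) on a context detached from the inbound
// request, so the fNS's PRStartReq call returns without waiting on the
// downlink round-trip.
func handlePRStartReq(w http.ResponseWriter, r *http.Request) {
	// Step 4: return the PRStartAns immediately.
	w.Write([]byte("PRStartAns"))
	if f, ok := w.(http.Flusher); ok {
		f.Flush() // push the PRStartAns to the wire before starting step 2
	}

	// Steps 2-3: XmitDataReq/XmitDataAns run independently of this handler.
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()

		req, err := http.NewRequestWithContext(ctx, http.MethodPost,
			"https://fns.example.com:8002", bytes.NewReader([]byte("XmitDataReq")))
		if err != nil {
			return
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Printf("downlink/data: XmitDataReq failed: %v", err)
			return
		}
		resp.Body.Close() // step 3: XmitDataAns received
	}()
}

func main() {
	http.HandleFunc("/", handlePRStartReq)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```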
