Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SARA-R510S keeps CTS line sporadically high if DTR is used for power saving #192

Open
eeFLis opened this issue Feb 1, 2024 · 64 comments
Open

Comments

@eeFLis
Copy link
Contributor

eeFLis commented Feb 1, 2024

Hi

maybe no ubxlib related problem, but perhaps you have already observed this behavior

If the DTR pin is used for power saving, the SARA R510 s module sporadically holds the CTS line high and no longer releases it.
the module is then no longer responsive.

we had the problem several times with the module firmware version 03.15, A00.01.
We hoped that this problem would be solved with the current module firmware version 03.30, A00.01, but the problem still occurs, even if it is much less frequent (after hours ).

Have you observed such behavior in your test environment?

@RobMeades
Copy link
Contributor

Hi @eeFLis: can't say this is something we've noticed, @philwareublox is going to check internally.

@RobMeades
Copy link
Contributor

@eeFLis: some questions. You say that the module allows the CTS line to float high when DTR is used for power saving: is the module known to be otherwise functional when this problem has occurred, e.g. is VINT high, do you maybe get URC's turning up, that kind of thing? Have you happened to be able to capture any HW traces (e.g. of the UART lines with a Saleae probe or similar) that show the relationship between the DTR line and CTS or other lines?

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 1, 2024

Hi

I have attached the salea capture. However, this was already recorded some time ago with the module fw version 03.15, A00.01, but should show the problem
SARA-R510S-01B-00 keep CTS high if DTR is used for power save.zip

@RobMeades
Copy link
Contributor

Can you confirm that you are using the default value for U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES, so AT+UPSV looks like this:

AT+UPSV=1,1300
OK

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 1, 2024

Yes, we use the default value.
At the time of the salea capture, we certainly used a older version of the ubxlib.
If the default value for U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES was never adjusted in the ubxlib it was AT+UPSV=1,1300

Were you able to gather all the information from the salea capture or do you need additional information?

@RobMeades
Copy link
Contributor

RobMeades commented Feb 1, 2024

I've been staring at it with the relevant expert internally.

If you look at the picture below, at the green marker is where the module emits a +CSCON: 0, so it is no longer RRC-connected and hence could go to sleep at any time (FYI, I'm told that when DTR-controlled sleep is in play the timeout in the AT+UPSV=1,1300 doesn't apply, the module can go to sleep as soon as DTR permits it):

image

12 seconds later comes the problematic part, focusing on that:

image

You can see, in the middle of the picture, that the module has allowed CTS to float high, without there being any TXD UART activity, which must be because the module has decided to go to 32 kHz sleep. A short while later CTS goes low again, which is likely because some timed activity inside the module has caused it to wake up again, but only briefly. CTS goes high again, and it is at about this time that the MCU pulls DTR low in order to send the next AT command.

Pure speculation, of course, but a guess would be that something to do with the fact that the module is autonomously coming out of [EDIT: I meant going into] 32 kHz sleep at the same moment as DTR goes high [EDIT: I meant low] causes it to miss the DTR edge and remain in 32 kHz sleep.

Will continue to investigate...

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 1, 2024

you mean > at the same moment as DTRgoes low right?
That would make sense

@RobMeades
Copy link
Contributor

Sorry, yes at the same moment as DTR goes low.

@RobMeades
Copy link
Contributor

Out of interest, are you able to read the state of the CTS pin from SW? I had a feeling that STM32 wouldn't allow you to read the state of a pin that had been assigned to the UART? Vaguely wondering if there might be a workaround of some form that you could do if you could read it.

@RobMeades
Copy link
Contributor

Were you able to gather all the information from the Saleae capture or do you need additional information?

If you have additional Saleae captures of the problem then we could check if they look the same but no matter if you don't.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 1, 2024

Unfortunately I have no additional capture.
I think the pin status can be read even if it is assigned to the UART.
Do you have an idea for a workaround if the pin could be read? then I can test it shortly.

@RobMeades
Copy link
Contributor

Yes, not exactly sure what the workaround would be really. Haven't been able to think of anything yet that would not introduce a new problem. Will carry on thinking...

@RobMeades
Copy link
Contributor

One more question: do you know if the problem goes away if you set UPSV=0? I mean I assume it does, but I thought I'd ask.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 1, 2024

It already goes away when we set the U_CFG_APP_PIN_CELL_DTR -1.
However, this makes the application much less energy efficient.

@RobMeades
Copy link
Contributor

Thanks for confirming, understood, the aim is of course to retain the power-saving goodness, just wanted to be sure of the problem.

@RobMeades
Copy link
Contributor

RobMeades commented Feb 1, 2024

Just for my information, if you set U_CFG_APP_PIN_CELL_DTR -1, so you aren't using the DTR pin to control UART power saving, you can still have UART power saving of course, provided you have implemented uPortUartCtsSuspend(), which should be there for STM32F4. It may be slightly more clumsy, since the ubxlib code will be waking the module up by prodding it with AT if the module has not been talked-to for more than 6 seconds, but it should still save pretty much the same amount of power, unless your device is going in and out of UART power saving very frequently (e.g. every few 10's of seconds, as opposed to every few minutes).

Is this an approach you have tried?

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 2, 2024

Yes we are using this approach.

That's what I tried to say with "we set U_CFG_APP_PIN_CELL_DTR -1 instead of switching off power saving control completely with UPSV=0". Was not quite clear sorry.

The additional consumption is mainly noticeable after RRC connection ist released and the timer T3324 is running.
Any communication over UART during this time wakes up the module for 6 seconds, which is completely unnecessary.

I understand that the extra consumption doesn't look like much but if you're focusing on low power it's a crucial.
The low power features of the SARA R510s were one reason why we chose it.

@philwareublox
Copy link

For my own clarity, are you saying without using the DTR line to drop the consumption of the UART you're then seeing the basic 6 second timeout for the UART. Whereas if you were able to use the DTR line successfully every time without this CTS issue, you can immediately control the UART on/off state as you want to.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 2, 2024

yes.

As an example: during the active time (T3324) when data can still be received, the ubxlib is polling for new incomning data.
this means that the 6 second timeouts for the UART are never reached and the module do not entering power saving.

If the DTR line is used for power saving control the module can switch to power save mode immediately after the response.

@RobMeades
Copy link
Contributor

RobMeades commented Feb 2, 2024

@philwareublox and I have been discussing this, maybe there is a way forward. Background first:

  • ubxlib is designed to present a sockets interface,
  • sockets [I think] doesn't have a concept of an "is there any data on a socket" type callback, so ubxlib doesn't have one,
  • u-blox modules will send a URC on the AT interface if there is received data available on a socket,
  • however, when uSockRead() or uSockReceiveFrom() is called, ubxlib always just asks the module for data, it doesn't simply trust a data count from the URC as the URC is not emitted again until there has been a read, and whether a read has been done is an application matter, out of ubxlib's control, there would be faaaar too much opportunity for misunderstandings,
  • you have no other option than to call uSockRead() or uSockReceiveFrom(), and that shags power saving in your scenario.

A fix, then, might be to offer a function something like uSockDataAvailableCallback(): if you set such a callback, we will call it when the URC lands. You can hook that in and not bother calling uSockRead() or uSockReceiveFrom() until your callback has been called.

Could that work?

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 2, 2024

I think ubxlib already offers a solution for this. We use uCellSockGetBytesPending() to get the pending bytes reported by URC.
Only when some data has been received, we call uSockRead() or uSockReceiveFrom()

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 2, 2024

However, this only solves the problem for the example I mentioned.
Any communication with the module during the active time (T3324) prevents the module from switching to power save mode for an unnecessarily long time.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 2, 2024

or do you see a problem with the use of uCellSockGetBytesPending() for the case described in the example?

@RobMeades
Copy link
Contributor

RobMeades commented Feb 2, 2024

It should work, I guess the only problem might be if there is a lag between when the URC arrived to updated the number and when the application called uCellSockGetBytesPending(); that might cause the uSockRead() to fall into sleepy-time. But even in the best case, where you read the data immediately, you will still get the 6 second lag. And reducing the value of U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES will result in more laggy behaviour from the ubxlib APIs in general, since ubxlib will have to send an AT to wake the module up on more occasions.

We will continue thinking.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 2, 2024

OK thanks. In the meantime, are there any further findings as to where the error comes from when DTR is active?

@RobMeades
Copy link
Contributor

The expert internally is trying to reproduce the problem, just now looking to find the same FW version as you are using, to be quite sure about it. It is a strange case: the CTS line was allowed to float high, indicating that the module has entered 32 kHz sleep, fully 3 milliseconds before the DTR line is asserted: that's quite a lot of time in module terms, hard to see what the relationship might be.

@RobMeades
Copy link
Contributor

RobMeades commented Feb 2, 2024

FYI, the expert internally has your FW version now and is trying to script some form of sliding-delay-window of "AT/OK followed by a delay and then DTR raised" to see if he can make the problem occur; nothing from the few stabs he's had at this yet.

Wondering idly: I guess you have a callback hooked into uCellNetSetBaseStationConnectionStatusCallback()? I understand that the +CSCON:x URC, which that callback is triggered by, marks the start of T3324 - is there something (admittedly probably hacky and complicated) you could do with that and some "usually correct" guess at T3324 to hold off or bounce modem-access as appropriate?

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 2, 2024

thanks for the update
yes we have a callback hooked into uCellNetSetBaseStationConnectionStatusCallback() which marks the start of T3324.
If I understand you correctly, you are thinking about a solution if the DTR problem cannot be solved and we have to fall back on the UART timeout right? That could be a possible option, but then we would still have the 6 second lag.

@RobMeades
Copy link
Contributor

RobMeades commented Feb 2, 2024

That could be a possible option, but then we would still have the 6 second lag.

True: if it is the case [as I think it is] that SARA-R5 cannot run the UART from a 32 kHz clock then every AT communication pulls the modem out of 32 kHz sleep and the only way to get back in is by letting DTR rise or by waiting for N (by default 6) seconds.

I was think that, if you could know that the module was not going to enter 32 kHz sleep anyway (which might be after +CSCON: 1) then, for those periods, you could just do anything on the AT interface, while for the times when the module could enter 32 kHz sleep (which might be after +CSCON: 0) you could, somehow, hold-off/batch-up/manage calls into ubxlib that might generate AT commands in order to maximize the power saved.

All a bit complex, and vague, I know.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 2, 2024

I have done a few more tests.
I think the first assumption that a DTR edge will be missed cannot be confirmed.
The module is no longer responsive in this state even after the DTR Line is pulled high and low again.
But I still have to verify it with a scope.

@RobMeades
Copy link
Contributor

Oh, that's excellent, thanks very much for taking the time to do this, I will pass it on to the right person and get back to you.

@RobMeades
Copy link
Contributor

RobMeades commented Feb 6, 2024

Browsing the Saleae log (I'll leave it up to the expert to view the OTII log as Windows Defender seems to have decided the OTII 3 application is a trojan on my laptop and keeps closing it), just to confirm the area of interest, it is here:

image

...zooming in a bit:

image

...and zooming in again, the last thing that happens being an AT+UCGED query which seems to go fine, then:

image

...zooming in again, DTR is pulled low by the MCU to do the next thing about 100 ms later (happens to be about 1 ms after CTS has gone high) and it has no effect; CTS remains high:

image

...and stays there for 100 seconds, until you pull the plug. During that time there are no URCs, VINT is high, but nothing at all of any use seems to be happening modem-wise:

image

Is that a correct interpretation? Anything I've missed?

@RobMeades
Copy link
Contributor

RobMeades commented Feb 6, 2024

For the benefit of our internal expert, I've exported from the Saleae log the AT exchange in the portion of interest, i.e. after coming out of PSM the second time, and transcribed it as best I can below; all pretty standard ubxlib stuff, with 3GPP power saving active and UPSV mode 3 enabled.

One interesting thing is that the module seems to be still connected when the problem occurs (we have not seen a +CSCON: 0 after the +CSCON: 1 that was emitted by the module 6.94 seconds before the problem point) so it might not be very sleepy.

AT
OK
ATE0
OK
AT+CMEE=2
OK
AT+UDCONF=1,0
OK
ATI9
03.30,A00.01
OK
AT&C1
OK
AT&D0
OK
AT&K3
OK
AT+UPSV=3
OK
AT+UPSMR=1
OK
AT+CPSMS?
+CPSMS: 1,,,"01000011","00001000"
OK
AT+UMNOPROF?
+UMNOPROF: 100
OK
AT+UGPRF?
+UGPRF: 2,0,""
OK
AT+UPSD=0,0,0
OK
AT+UPSD=0,100,1
OK
AT+UPSDA=0,3
OK

+UUPSDA: 0,"10.64.62.108"

AT+USOCR=17,14127
+USOCR: 0
OK
AT+USOCO=0,"<IP address>",<port>
OK
AT+USOST=0,"<IP address>",<port>,51
@ <51 bytes of binary data follows>
OK

+USOST: 0,51

AT+UGCNTRD
+UGCNTRD: 1,79,0,79,0
OK
AT+UCPSMS?
+UCPSMS: 1,,,"00111111","00001000",1
OK

+CSCON: 1

+UUSORD: 0,17

AT+USORF=0,1024
+USORF: 0,"<IP address>",<port>,17,"<17 bytes of binary data in here>"
OK
AT+UGCNTRD
+UGCNTRD: 1,79,45,79,45
OK
AT+CSQ
+CSQ: 11,3
OK
AT+UCGED?
+UCGED: 2
6,4,228,01
6300,20,50,50,849,1098a06,499,e8953ab5,8000,c0,19,15,2.18,3,255,255,19,-50,-50,300,255,255,80
OK

@RobMeades
Copy link
Contributor

OK, our internal expert has been staring at the Saleae and Otii logs and is wondering if the issue is that, around the point of failure, ubxlib releases DTR (after the AT+UCGED query) and then the next ubxlib API call is made and so ubxlib pulls DTR low again, which in this case happens to be around 20 ms later; this time is relatively short. Putting the HW into sleep and taking it out again apparently takes somewhere around 25ish ms so it might be that short releases of DTR are the problem.

I will add code to ubxlib which remembers the time at which DTR was last released and, when a subsequent ubxlib API call is made, it will ensure that a minimum period [to be safe we will make that 50 ms] has passed before DTR is pulled low again and the ubxlib API call is executed. If I push a preview branch of that code here, would you be able to try it?

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 7, 2024

Yes for sure

@RobMeades
Copy link
Contributor

RobMeades commented Feb 7, 2024

Actually, better than that, the code is already there (I'd obviously anticipated something of this nature and then forgotten): U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS is set to U_AT_CLIENT_ACTIVITY_PIN_HYSTERESIS_INTERVAL_MS, which is 10 ms.

Could you try overriding the conditional compilation flag U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS at build time to, say, 50 ms, see if that makes any difference? Since the problem is statistical in nature I guess it might be difficult to be sure of course.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 7, 2024

I will do that. But even if the time is too short to wake up again, the module is in a state where consumption is much higher than it should be.
power profil DTR

@RobMeades
Copy link
Contributor

RobMeades commented Feb 7, 2024

Yes: the thinking is that there is a bug somewhere in the module FW which means that, if DTR is raised and then lowered again too quickly, it goes off the rails, and in such a case it would never sleep.

EDIT: this is backed-up by the fact that the module does not awaken again after about 60 seconds, you having left it for at least 100 seconds; normally, processes within the module would cause it to awaken by itself, if this is not happening then it is relatively seriously upset.

Checking the code again, can I ask that, rather than defining U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS to be 50 ms, you define U_AT_CLIENT_ACTIVITY_PIN_HYSTERESIS_INTERVAL_MS to be 50 ms? Reason is that, ultimately, it is the code within the AT client which controls the pin behaviour, rather than the code in the cellular area.

EDIT: IGNORE WHAT I SAID ABOVE and set U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS to be 50 ms!!!

@RobMeades
Copy link
Contributor

Apologies for the confusion, please stick with setting U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS to be 50 ms, I should learn how to read my own code :-).

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 7, 2024

Unfortunately, the error still occurs

Here is the recording
capture U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS 50ms.zip

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 7, 2024

This time the error occurs without the level of DTR altering. However, something seems to wake the module from sleep which causes the error.

power profil CTS high

@RobMeades
Copy link
Contributor

RobMeades commented Feb 7, 2024

Oh! That is interesting: seems like an exceptionally short wake-up (low CTS for just 5 ms) performed autonomously by the module, DTR being high for the entire time. I will alert our internal expert to the new information.

@RobMeades
Copy link
Contributor

RobMeades commented Feb 7, 2024

Looking at the Saleae trace, the DTR toggles don't seem to have increased to 50 ms, still seem to around 20 ms, I wonder if there's something up with the hysteresis code? Will check it...

image

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 7, 2024

should not have been set U_CELL_PWR_UART_POWER_SAVING_DTR_HYSTERESIS_MS to 50ms instead of U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS ?

@RobMeades
Copy link
Contributor

Gah, yes, you're right: there are two, one in the cellular code which is applied when things like CMUX need to disable DTR, and the other one in the AT Client for when it is the one that has to toggle DTR (the usual case). It is probably not a good idea that these are uncoordinated as well.

I will look at how to do this better but, as you say, set U_CELL_PWR_UART_POWER_SAVING_DTR_HYSTERESIS_MS, which is currently 20 ms, to be 50 ms for the purposes of this experiment. My apologies.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 7, 2024

should i reset U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS to the previous value?

@RobMeades
Copy link
Contributor

If you are not using CMUX (which I think is the case) then U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS is not used. If you are using CMUX (which would only be if you are using a GNSS chip connected via or within a cellular module, or you are using PPP) then it probably still doesn't matter but set it to 50 ms to be safe.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 7, 2024

have checked it in the saleae trace should be ok now.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 7, 2024

The error has just occurred again. Do you need the trace?

@RobMeades
Copy link
Contributor

You're getting good at this :-|. The trace might be useful, if possible.

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 7, 2024

:-) Would be happier if the error disappeared.

Here the recording
capture U_CELL_PRIVATE_DTR_PIN_HYSTERESIS_INTERVAL_MS 50ms -2.zip

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 14, 2024

Hi @RobMeades Is there any news on this topic?

@RobMeades
Copy link
Contributor

I was wondering that myself: apparently the internal expert was side-tracked onto something else, they will get back to it today.

@RobMeades
Copy link
Contributor

...but now they've had a high priority interrupt onto something else, will update you when I know something...

@RobMeades
Copy link
Contributor

FYI, the internal expert is now back on this, investigating...

@eeFLis
Copy link
Contributor Author

eeFLis commented Feb 27, 2024

Great, thanks for the information

@RobMeades
Copy link
Contributor

RobMeades commented Mar 4, 2024

Probably not good news but, FYI, our internal SW expert has now deferred to our internal HW expert: it seems that, on SARA-R5, when DTR is used for power saving control, there is a 3ish ms window, around the time the module is going into sleep, where pulling DTR low causes the module to get stuck, eventually resulting in a watchdog timer going off internally.

He has tried various SW workaround but so far has found nothing which can be guaranteed to avoid this window. Here's hoping the internal HW expert can find something we can use.

@eeFLis
Copy link
Contributor Author

eeFLis commented Mar 4, 2024

The good thing is that the cause of the problem could be identified.
Thank you for all the great support, if there is anything you want to test, please let me know.

@eeFLis
Copy link
Contributor Author

eeFLis commented Mar 18, 2024

Hi @RobMeades Do you have any news from the HW expert on this?

@RobMeades
Copy link
Contributor

Sorry for the delay in this: I'm told the HW team are currently discussing it, will get back to you when they have something to say.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants