sntp: fix the handling of the rtc counter wrapping #148

ghost · 2016-06-20T11:16:23Z

This reworks the accounting of the time to address the wrapping of the
rtc counter. It currently assumes that the time is accessed more
frequently than the counter would wrap. It also adds a semaphore to
synchronize access to the time storage.

projectgus · 2016-06-26T10:05:29Z

extras/sntp/sntp_fun.c

 #define tim_ref 	(RTC.SCRATCH[2])
 // Calibration value
 #define cal 		(RTC.SCRATCH[3])

+// To protect access to the above.
+xSemaphoreHandle sntp_sem = NULL;


Can this be declared static, as it's only accessed inside this source file?

ghost · 2016-06-26T11:53:35Z

Thank you for the feedback. Changed the semaphore to be static.

I have since tried other ways to improve this code and think more could be done, but this PR fixes some significant issues. Fwiw I found the RTC and/or the calibrations to be rather unstable and gave up on using sntp for obtaining times for a data logging application and will log the rtc count and the calibration and have the time estimated when analysing the data.

ghost · 2016-06-26T12:31:29Z

Just add a bit more. I also tried to use the sntp times to estimate the calibration, and only using the internal calibration to reset this if the time step was large. The corrections to the calibration were filtered and it did converge to give a lot better time than just using the internal calibration. However, it still received impulse errors larger than a second and then took time to settle. When there was a network outage it drifted significantly - the error had typically not settled so it had little chance.

I guess it needed to also take into account the jitter etc. Also resetting the time with each sntp response causes jumps in the time, and these can be backwards, so I think that is the wrong approach. When the time difference is within a small bounds it should probably just be making small periodic adjustments to avoid jumps and to avoid going backwards. Surely there is a lot of state-of-the-art NTP code that could be followed.

For my application at hand it seemed to be less demanding on the node to just log the times when the node interacts with the server etc, and it might even have GPS events logged with a time etc in some deployments. A smooth time line might then be fitted to the events during analysis.

projectgus · 2016-06-30T07:10:41Z

Thanks for this fix and the comments. I think the suggestions you make sound sensible, although I agree that for a lot of purposes this is good enough.

I'm keen to merge this, but I just want to check first if @doragasu - who contributed the original SNTP implementation - has any comments.

doragasu · 2016-06-30T08:43:24Z

I had a quick look and it looks good, thanks for the work! I already knew that synchronizing access to time keeping structures was needed, but wasn't brave enought to try because I don't know newlib internals, and I thought that it provided a built-in synchronization mechanism (instead of having to use semaphores). But after reading your patch, I suppose that's not the case.

Other than that, I'd like to throw a comment. Inside sntp_update_rtc() you are reading the timer value after xSemaphoreTake(). I think the order should be reversed, so the instant you read the timer is as close as possible to the instant the SNTP server sends you the updated time. Have in mind that as the semaphore might block, reading the timer after taking the semaphore might introduce very big differences.

If I get some time I'll try reviewing the code more in-depth and testing it.

This reworks the accounting of the time to address the wrapping of the rtc counter. It currently assumes that the time is accessed more frequently than the counter would wrap. It also adds a semaphore to synchronize access to the time storage.

ghost · 2016-06-30T14:41:18Z

Thank you for the feedback. @doragasu the mutex protects the state that handles the counter wrapping too, so if the RTC counter were read before taking the mutex then another task might win a race to update the tim_ref in sntp_get_rtc_time() in which case the time difference in sntp_update_rtc would be negative which is not handled there. It might need some more design work.

doragasu · 2016-07-01T07:19:58Z

@Iosen I understand, thanks for explaining. If I read the timer inside sntp_update_rtc(), then after I enter the critical section, the timer wraps and sntp_get_rtc_time() gets executed, tim_ref will be updated and when finally sntp_update_rtc() enters the critical section, things will go nuts.

OK, having that cleared, I have absolutely no problem with the changes, they look good to me.

BTW, when I did some tests of the implementation accuracy, I saw that it is heavily dependent on the network you use. When I did tests at my worplace, I got much lower correction values each time SNTP was updated that when I did the same tests at home (my home network is slower and has greater ping). As SNTP does not take RTT into account, I suppose this is expected.

I don't think it makes too much sense trying to improve this implementation accuracy: if you want better accuracy maybe the correct thing would be implementing a full blown NTP client.

ghost · 2016-07-01T15:01:46Z

@doragasu See #157 for an alternative design which seems to be more stable. The results of sdk_system_rtc_clock_cali_proc() seem so noisy here that it is not great for calibration.

funnydog · 2017-03-15T18:04:45Z

About the accuracy of the algorithm, I was looking at the relevant RFC4330 and in section 5 there is a simple algorithm that compensates for the round trip time of the packets.

I've hacked together a proof of concept and it seems to work well here: I converted every timestamp to int64_t and changed sntp_update_rtc() to take an int64_t offset instead.

However I deeply changed also the source of sntp.c because it's a mess of #ifdefs and I wanted to understand the code before doing anything, so I'm afraid the changes I made are not directly appliable.

funnydog · 2017-03-15T22:46:11Z

After a lot of experiments this is how it converges with an update interval of 15 sec (Delay and Offset are in usecs):

Delay: 33945, Offset: 2041547930, Cali: 25804
Delay: 33737, Offset: 13475, Cali: 25827
Delay: 36122, Offset: 9730, Cali: 25843
Delay: 14981, Offset: 186, Cali: 25843
Delay: 17382, Offset: 5573, Cali: 25852
Delay: 15722, Offset: 3952, Cali: 25858
Delay: 15534, Offset: 167, Cali: 25858
Delay: 15403, Offset: 6756, Cali: 25869
Delay: 15337, Offset: 1585, Cali: 25871
Delay: 16062, Offset: -417, Cali: 25871

While the offsets become very small it seems to me that the embedded RTC is quite unstable. The calibration value keeps increasing and still cannot compensate the offsets; and then suddenly the same calibration value is too big.

With longer update intervals the situation gets worse.

Besides my code updates the RTC.SCRATCH[0..1] locations directly so the clock is not monotonic. I tried to adjust gradually but with those fluctuations it doesn't really work.

For the record this is how the sntp_update_rtc() function looks right now:

void sntp_update_rtc(int64_t offset)
{
	static uint32_t last_cal_tim = 0;

	xSemaphoreTake(sntp_sem, portMAX_DELAY);
	uint32_t tim = RTC.COUNTER;
	uint32_t diff = tim - tim_ref;
	tim_ref = tim;

	uint64_t diff_us = ((uint64_t)diff * sntp_cali) >> 12;
	sntp_base = sntp_base + diff_us + offset;
	if (-100000 < offset && offset <= 100000) {
		sntp_cali += offset * 4096 / (tim - last_cal_tim);
	}
	last_cal_tim = tim;

	xSemaphoreGive(sntp_sem);
}

Rutger798 · 2020-02-27T07:07:49Z

I see there is a fix for the race conditions in the sntp_fun, but nobody merged them?
is the SuperHouse/esp-open-rtos EndOfLife or could you please merge this?

projectgus reviewed Jun 26, 2016
View reviewed changes

iosen added 2 commits July 1, 2016 00:05

Use a mutex for synchronization.

de66e2b

sheinz added the extra label Nov 29, 2016

This was referenced Apr 15, 2020

odd SNTP behavior that sometimes happens when synchronizing #364

Open

Klok op hol thekr1s/esp8266_wordclock#24

Closed

SNTP_fix_rtc_counter_wrapping #744

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sntp: fix the handling of the rtc counter wrapping #148

sntp: fix the handling of the rtc counter wrapping #148

ghost commented Jun 20, 2016

projectgus Jun 26, 2016

ghost commented Jun 26, 2016

ghost commented Jun 26, 2016

projectgus commented Jun 30, 2016

doragasu commented Jun 30, 2016

ghost commented Jun 30, 2016 •

edited by ghost

doragasu commented Jul 1, 2016

ghost commented Jul 1, 2016

funnydog commented Mar 15, 2017 •

edited

funnydog commented Mar 15, 2017

Rutger798 commented Feb 27, 2020

sntp: fix the handling of the rtc counter wrapping #148

Are you sure you want to change the base?

sntp: fix the handling of the rtc counter wrapping #148

Conversation

ghost commented Jun 20, 2016

projectgus Jun 26, 2016

Choose a reason for hiding this comment

ghost commented Jun 26, 2016

ghost commented Jun 26, 2016

projectgus commented Jun 30, 2016

doragasu commented Jun 30, 2016

ghost commented Jun 30, 2016 • edited by ghost

doragasu commented Jul 1, 2016

ghost commented Jul 1, 2016

funnydog commented Mar 15, 2017 • edited

funnydog commented Mar 15, 2017

Rutger798 commented Feb 27, 2020

ghost commented Jun 30, 2016 •

edited by ghost

funnydog commented Mar 15, 2017 •

edited