Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix get future timestamp from hardware clock domain. #12401

Closed

Conversation

hsinkoyu
Copy link

Frame timestamp from hardware clock domain is incorrect. Sometimes the timestamp is a future time compared to
the time of getting it. This is because the samples for calculating the linear regression coefficients are not precision.
The system time of a sample is incorrect because the delay between hardware time and system time happens during
sending back the hardware time result, not prior to retrieving the hardware time.

@Nir-Az Nir-Az requested a review from ev-mp November 14, 2023 08:52
@ev-mp
Copy link
Collaborator

ev-mp commented Nov 14, 2023

@hsinkoyu , thank you for the feedback. Can you provide more detailed scenario where you've encountered timestamps being extrapolated into future?

Also addressing the comment "the delay between hardware time and system time happens during
sending back the hardware time result, not prior to retrieving the hardware time
" -
The delay is basically a propagation delay of transitioning the request over USB back and Forth.
In other words it is both before and after the HW timestamp is acquired.
Assuming that handling of the download and upload request are equal (exhibited by 'double command_delay = (system_time_finish-system_time_start)/2;'), the estimation of system_time can be deduced symmetrically by either adding or subtracting from one of the boundaries.
It may be the case that as the interval fluctuates, and the lack of ability to distinguish whether the extra delay occurred before or after the USB request has been sent, the decision to always look at the upper bound as the "Anchor" can introduce overshoots, but so may have the opposite case when it tries to "catch up" after one or more "undershot" extrapolations.

@hsinkoyu
Copy link
Author

Hi,

We use realsense in a timestamp mattered application and found this issue.
It is easy to notice it by adding logs to compare the frame's hardware time and
current time.

@@ -281,13 +283,20 @@ namespace librealsense
 
     double global_timestamp_reader::get_frame_timestamp(const std::shared_ptr<frame_interface>& frame)
     {
+        double system_time = duration<double, std::milli>(system_clock::now().time_since_epoch()).count();
         double frame_time = _device_timestamp_reader->get_frame_timestamp(frame);
         rs2_timestamp_domain ts_domain = _device_timestamp_reader->get_frame_timestamp_domain(frame);
         if (_option_is_enabled->is_true() && ts_domain == RS2_TIMESTAMP_DOMAIN_HARDWARE_CLOCK)
         {
             auto sp = _time_diff_keeper.lock();
             if (sp)
+            {
                 frame_time = sp->get_system_hw_time(frame_time, _ts_is_ready);
+                if (frame_time >= system_time)
+                    LOG_ERROR("Got future camera shot time due to un-precision samples. Future=" << frame_time - system_time << "(ms)");
+                else
+                    LOG_INFO("Interval between hardware clock timestamp and time of getting it: " << system_time - frame_time << "(ms)");
+            }
             else
                 LOG_DEBUG("Notification: global_timestamp_reader - time_diff_keeper is being shut-down");

You'll see error logs like below soon.

 15/11 10:26:32,959 ERROR [140703287981824] (global_timestamp_reader.cpp:294) Got future camera shot time due to un-precision samples. Future=7.381348(ms)
 15/11 10:26:32,993 ERROR [140703287981824] (global_timestamp_reader.cpp:294) Got future camera shot time due to un-precision samples. Future=7.864014(ms)
 15/11 10:26:33,026 ERROR [140703287981824] (global_timestamp_reader.cpp:294) Got future camera shot time due to un-precision samples. Future=8.487061(ms)
 15/11 10:26:33,059 ERROR [140703287981824] (global_timestamp_reader.cpp:294) Got future camera shot time due to un-precision samples. Future=9.001953(ms)
 15/11 10:26:33,093 ERROR [140703287981824] (global_timestamp_reader.cpp:294) Got future camera shot time due to un-precision samples. Future=9.584961(ms)
 15/11 10:26:33,126 ERROR [140703287981824] (global_timestamp_reader.cpp:294) Got future camera shot time due to un-precision samples. Future=10.011963(ms)
 15/11 10:26:33,160 ERROR [140703287981824] (global_timestamp_reader.cpp:294) Got future camera shot time due to un-precision samples. Future=10.655762(ms)

Hardware time is calculated by the linear equation that is the regression from samples we provided.
Let's dump samples to trace the problem.

@@ -210,19 +212,19 @@ namespace librealsense
             double system_time_finish = duration<double, std::milli>(system_clock::now().time_since_epoch()).count();
             double command_delay = (system_time_finish-system_time_start)/2;
 
+            LOG_DEBUG("Duration of getting the hardware clock: " << system_time_finish - system_time_start << "(ms)");
             std::lock_guard<std::recursive_mutex> lock(_read_mtx);
 15/11 10:45:42,535 INFO [140134229989120] (global_timestamp_reader.cpp:213) Duration of getting the hardware clock: 0.416016(ms)
 15/11 10:45:43,535 INFO [140134229989120] (global_timestamp_reader.cpp:213) Duration of getting the hardware clock: 0.346191(ms)
 15/11 10:45:44,586 INFO [140134229989120] (global_timestamp_reader.cpp:213) Duration of getting the hardware clock: 50.689453(ms)
 15/11 10:45:45,587 INFO [140134229989120] (global_timestamp_reader.cpp:213) Duration of getting the hardware clock: 0.698242(ms)
 15/11 10:45:46,638 INFO [140134229989120] (global_timestamp_reader.cpp:213) Duration of getting the hardware clock: 50.782227(ms)
 15/11 10:45:47,689 INFO [140134229989120] (global_timestamp_reader.cpp:213) Duration of getting the hardware clock: 50.772705(ms)
 15/11 10:45:48,690 INFO [140134229989120] (global_timestamp_reader.cpp:213) Duration of getting the hardware clock: 0.536133(ms)
 15/11 10:45:49,741 INFO [140134229989120] (global_timestamp_reader.cpp:213) Duration of getting the hardware clock: 50.770752(ms)
 15/11 10:45:50,741 INFO [140134229989120] (global_timestamp_reader.cpp:213) Duration of getting the hardware clock: 0.472900(ms)
 15/11 10:45:51,742 INFO [140134229989120] (global_timestamp_reader.cpp:213) Duration of getting the hardware clock: 0.360352(ms)

Normally the duration of getting hardware clock is less than 1 ms, but sometimes
it is about 50+ ms. I assume the firmware spends the delay and not USB transmission.
For quick fix the future time issue, I modified the code to only add precision
samples (duration < 4.0) into the sample queue. Please see the attached patch.
only_precision_samples.patch

And the future timestamp issue will not happen again.
Below log is the interval between hardware clock timestamp and the time of getting it. Around 29 ms.

 15/11 11:21:43,025 INFO [140161744594688] (global_timestamp_reader.cpp:298) Interval between hardware clock timestamp and time of getting it: 28.636230(ms)
 15/11 11:21:43,059 INFO [140161744594688] (global_timestamp_reader.cpp:298) Interval between hardware clock timestamp and time of getting it: 29.105469(ms)
 15/11 11:21:43,092 INFO [140161744594688] (global_timestamp_reader.cpp:298) Interval between hardware clock timestamp and time of getting it: 28.712402(ms)
 15/11 11:21:43,126 INFO [140161744594688] (global_timestamp_reader.cpp:298) Interval between hardware clock timestamp and time of getting it: 29.045654(ms)
 15/11 11:21:43,159 INFO [140161744594688] (global_timestamp_reader.cpp:298) Interval between hardware clock timestamp and time of getting it: 28.687988(ms)
 15/11 11:21:43,192 INFO [140161744594688] (global_timestamp_reader.cpp:298) Interval between hardware clock timestamp and time of getting it: 28.618164(ms)
 15/11 11:21:43,226 INFO [140161744594688] (global_timestamp_reader.cpp:298) Interval between hardware clock timestamp and time of getting it: 28.977783(ms)
 15/11 11:21:43,259 INFO [140161744594688] (global_timestamp_reader.cpp:298) Interval between hardware clock timestamp and time of getting it: 28.863281(ms)
 15/11 11:21:43,292 INFO [140161744594688] (global_timestamp_reader.cpp:298) Interval between hardware clock timestamp and time of getting it: 28.685791(ms)
 15/11 11:21:43,326 INFO [140161744594688] (global_timestamp_reader.cpp:298) Interval between hardware clock timestamp and time of getting it: 29.075928(ms)

Finally back to the current code, the author was aware of the jumping delay so he/she
defined the variable _min_command_delay to record the minimum, and always push the sample
with the minimum delay and even goes back to modify samples in queue.

            if (command_delay < _min_command_delay)
            {
                _coefs.add_const_y_coefs(command_delay - _min_command_delay);
                _min_command_delay = command_delay;
            }
            double system_time(system_time_finish - _min_command_delay);

The author thinks the delay happens before retrieving the hardware clock.
Since I don't know the design of firmware, I try the code that refers delay
happening after retrieving the hardware clock.

double system_time(system_time_start + _min_command_delay);

And the result is the same with my quick fix (precision samples).

 15/11 11:52:39,151 INFO [139646096897792] (global_timestamp_reader.cpp:296) Interval between hardware clock timestamp and time of getting it: 28.823975(ms)
 15/11 11:52:39,185 INFO [139646096897792] (global_timestamp_reader.cpp:296) Interval between hardware clock timestamp and time of getting it: 28.828125(ms)
 15/11 11:52:39,218 INFO [139646096897792] (global_timestamp_reader.cpp:296) Interval between hardware clock timestamp and time of getting it: 28.779785(ms)
 15/11 11:52:39,252 INFO [139646096897792] (global_timestamp_reader.cpp:296) Interval between hardware clock timestamp and time of getting it: 29.083496(ms)
 15/11 11:52:39,285 INFO [139646096897792] (global_timestamp_reader.cpp:296) Interval between hardware clock timestamp and time of getting it: 28.731201(ms)

Thanks for reviewing the commit.

@hsinkoyu
Copy link
Author

Hi,

It seems there is a problem when hardware clock (2^32) gets overflow.
I am figuring it out and first to close this pull request.

Thanks.

@hsinkoyu hsinkoyu closed this Nov 16, 2023
@hsinkoyu
Copy link
Author

#12430

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants