Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging events and dealing with clock drift #198

Open
LPardue opened this issue Mar 19, 2022 · 1 comment
Open

Logging events and dealing with clock drift #198

LPardue opened this issue Mar 19, 2022 · 1 comment

Comments

@LPardue
Copy link
Member

LPardue commented Mar 19, 2022

We had a question about quiche's logging on cloudflare/quiche#1197 this got me thinking some more about representations of time in qlog.

In quiche, we use a monotonic non-decreasing clock. When a trace is started, we store a monotonic start time internally and don't put that in the qlog. As events are created, we calculate a relative time since the start and record that.

I've now realised this behaviour seems to differ from what the qlog draft states in https://quicwg.org/qlog/draft-ietf-quic-qlog-main-schema.html#name-timestamps

Relative: Specify a full "reference_time" timestamp (typically this is done up-front in "common_fields", see Section 3.4.7) and include only relatively-encoded values based on this reference_time with each event. The "reference_time" value is typically the first absolute timestamp. This approach uses a medium amount of characters.

The problem is that clock drift can occur. The spec mentions this briefly but I wonder if the problem is larger than people give consideration to. For the relative and base time formats, you'd potentially end up smaller or negative values, which might break qlog tooling.

I don't have much insight into solving clock drift problems but could we consider adding a monotonic time format type in addition?

@LPardue
Copy link
Member Author

LPardue commented Mar 1, 2023

@rmarx asked me to explain this further or more clearly. So lets try a different angle.

Here is what qlog says

The "time" field indicates the timestamp at which the event occurred. Its value is typically the Unix timestamp since the 1970 epoch (number of milliseconds since midnight UTC, January 1, 1970, ignoring leap seconds). However, qlog supports two more succinct timestamps formats to allow reducing file size. The employed format is indicated in the "time_format" field, which allows one of three values: "absolute", "delta" or "relative".

Each event has a timestamp, whose value is calculated according to the format. qlog hints that the UNIX epoch is "typical", which leaves the door open to other epoch. But qlog doesn't really support expressing what the epoch is explicitly.

The relative encoding is described as:

specify a full "reference_time" timestamp (typically this is done up-front in "common_fields", see Section 3.4.7) and include only relatively-encoded values based on this reference_time with each event. The "reference_time" value is typically the first absolute timestamp. This approach uses a medium amount of characters.

If wishing to not use the unix epoch as the relative format's epoch, the spec if not clear how to represent that. reference_time is just a float64. If qlog producers are free to insert any value into this, it is not clear how qlog consumers are expected to consume the value. One use case could be in tooling to convert the relative-encoded timestamps back to into an absolute time for the purposes of converting into a human-readable format like a string, such a tool might assume the value of relative_time is the number of seconds since the unix epoch and, if it isn't, the calculation would be wrong.

A common producer use case is to want to log events starting with a timestamp of 0, and then to progress forward in ascending time durations. For this use case, the reference_time field does not add much value. Setting a reference_time of 0 could be confused with "0 seconds since the unix epoch" because the qlog specification is not tight enough.

An implied reference clock based on e.g. the unix clock, sets up other caveats. In some environments, it is possible to operate in abstract time durations, where the epoch cannot be serialized to a value that makes sense for any other system. For example, Rust provides an Instant type against which monotonic time durations can be calculated. The alternative would be to use SystemTime time, where durations do not have the same guarantee due the possibility of clocks being moved forward or backwards; see https://doc.rust-lang.org/std/time/struct.SystemTime.html#method.duration_since

Finally, qlog recommends

Events in each individual trace SHOULD be logged in strictly ascending timestamp order (though not necessarily absolute value, for the "delta" format). Tools CAN sort all events on the timestamp before processing them, though are not required to (as this could impose a significant processing overhead). This can be a problem especially for multi-threaded and/or streaming loggers, who could consider using a separate post-processor to order qlog events in time if a tool do not provide this feature.

While it alludes to some of the problems of time-synchronization, it could probably be clearer. This is especially true if using a relative time format based on a wall clock reference where clock changes could cause a negative value to be logged in the timestamp. Naively sorting these logs based on the relative timestamp would produce an incorrect view of the connection's events. Instead, a tool trying to do some log consolidation and sorting needs to be very date-time aware. That probably extends to qlog consumers, for example, if they convert to a human-friendly representation then it should not be unexpected that clocks go backward or forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant