Skip to content

Time date processing for riak_ts

John Daily edited this page Jun 23, 2016 · 4 revisions

Overview

There are several overlapping tasks that arise from recent discussions regarding desired time features. These include:

  1. Allow users to enter ISO 8601 time/date strings (including timezones) in SQL (both input and query).
    • Dates will be converted to UNIX epoch for storage.
    • If a timezone is not supplied, a decision must be made: reject, assume UTC, or assume system default timezone.
    • Leap seconds make queries much trickier since the same UNIX timestamp occurs on two different days.
    • A failure by the operators to maintain updated system timezone files would potentially result in unexpected behavior.
  2. Allow users to use time operations in queries (see Postgres' doc for examples).
  3. Display resulting timestamps in riak-shell as ISO 8601 format.
    • The output could be UTC, the system's timezone, drawn from the table definition, or even specified as part of the query.
  4. Allow operators to specify alternative timezones as the default for a table.
    • Impacts quantum boundaries.
    • May impact output from riak-shell if timestamps are processed into ISO 8601 and riak-shell has access to the table metadata.
    • Probably impact default timezone on input and query date strings.
    • Would not impact storage: dates would still be stored as UTC.
    • A failure by the operators to maintain updated system timezone files would potentially result in unexpected behavior.
  5. Allow operators to specify explicit integer values as a quantum function.
    • Allows the database to have any zero point that makes sense for the application.
    • Makes it easier to semantically treat the values as something other than time values (e.g. geolocation).
  6. Allow operators to define the meaning of a timestamp increment.
    • Removes the assumption that all time values are in milliseconds.
    • Might, for simplicity, require an integer-based quantum function.
  7. Allow clients to determine any above table-specific time-related metadata for display purposes.
  8. Allow clients to set a session-specific time zone offset (see Oracle for an illustration).
  9. Implement date and time data types.

Also, there's clearly a significant documentation task: we need to make certain the current assumptions and implications are captured for end users, along with the impact of changes from the above tasks. We have made the implementation almost irrelevant to end users, but many of them will still wish to understand how this fits together.

It is also important to document that use of the riak-shell translations to and from user-friendly time strings will impact performance.

Context

Currently, users must enter integer values for times, assumed for quanta calculations to be UNIX epoch values in UTC.

  • Client libraries are free to accept any time/date format and process them into UNIX epoch time, much like riak-shell could do.
  • The timestamp field in the database is a 64-bit unsigned integer.
  • The epoch for UNIX time is midnight January 1, 1970 UTC (effectively GMT).
    • The specification of a DDL quantum function in terms of days will result in quanta boundaries at midnight UTC.
    • Timezones with partial offsets from UTC (e.g. IST) will have quanta boundaries that do not fall on the local hour.
  • The time values are assumed to be in milliseconds, which impacts all quanta calculations.
  • The system clock and default timezone is currently completely irrelevant. All time values are calculated (quantum boundaries) or specified as input by the user/application.
  • Leap seconds are silently swallowed into the timestamp.
    • If a second is added at the end of a day, both that second and the one that follows are represented by a single timestamp. Wikipedia

Other than determining quantum boundaries, the fact that timestamps are assumed to be UNIX epoch values in milliseconds has zero consequences for Riak or the user. This means that the integer values currently stored in Riak can have any semantic meaning applied that the application/user so chooses.

Prioritization

All of these features are largely independent, although there is some sequencing that can be done based on utility and level of effort.

Allow clients to determine any above table-specific time-related metadata for display purposes is effectively a subtask of the previously-defined task to expose table metadata to client libraries.

User-facing features in rough priority/effort order

  • Allow users to enter ISO 8601 date/time strings in SQL (input and query).
  • Display resulting timestamps in riak-shell as ISO 8601 format.
  • Allow clients to specify a session-specific time zone offset.
  • Allow users to use time operations in queries.
  • Implement date and time data types.

Operator-facing features in rough priority/effort order

  • Allow operators to specify explicit integer values as a quantum function.
  • Allow operators to define the meaning of a timestamp increment.
  • Allow operators to specify alternative timezones as the default for a table.

Implementation notes

Date parsing will have to take place in Riak, with associated performance impact. Applications should do their own translation to UNIX time whenever practical.

Gordon has an existing date parsing library that handles some subset of the necessary changes.

Adding new keywords is problematic because SQL queries and inserts may have to be modified with double quotes to escape column names after that point in time. riak-shell could issue warnings on future reserved words with planning.