Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

date::parse() doesn't have a formatting character for ISO 8601 strings without 'T' separator #824

Open
kiplingw opened this issue Apr 17, 2024 · 2 comments

Comments

@kiplingw
Copy link

kiplingw commented Apr 17, 2024

The ISO 8601 standard mandates as of 2019 the T separator character between the date and the start of the time. Before that it was optional.

Despite this, there is a lot of code out there which doesn't properly emit the T separator. Part of the reason is probably because people think it's optional, and by the time they find out its not there's so much code and strings already out there it's too late.

It's confusing for a lot of people. PostgreSQL output is wrong. Even ISO itself gets it wrong in its own example.

I am a fan of the robustness principle, namely that code should generate correctly formatted ISO 8601 strings. But the principle also encourages the library to still be able to consume the sometimes broken strings too.

As of write now I'm not aware of any way of elegantly handling this with the library, other than to call the library's date::parse() twice:

// Convert ISO 8601 string to a C++ time point...

    // Not all ISO 8601 strings really are correctly formatted with a T
    //  separator, despite the 2019 standard mandating it. In the event
    //  there appears to be a time separator...
    if(ISO8601String.find_first_of('T') != string::npos)
        InputStringStream >> date::parse("%FT%T %z", TimePoint);

    // Otherwise if no 'T' separator present....
    else
        InputStringStream >> date::parse("%F %T %z", TimePoint);        

    // Failed...
    if(!InputStringStream)
        throw runtime_error(
            string(_("Failed to parse last training date in ISO 8601 format: ")) +
            "\""s + ISO8601String + "\"");

One suggestion might be to add a formatting character to the library's existing list, perhaps a %q ("quirk") which consumes only a T or a single white space character. That way the above call to date::parse() could be written only once as:

InputStringStream >> date::parse("%F%q%T %z", TimePoint);

Feedback welcome.

@jleffler
Copy link

jleffler commented Apr 18, 2024 via email

@HowardHinnant
Copy link
Owner

That's an interesting suggestion. If we were to do it, there's two more difficult things that would need to be done:

  1. Generalize it to accept any character, and provide a way to parse the character so you could inspect it.
  2. Standardize it. This library is now an example implementation of C++20 <chrono>.

The second part we could do after. This library would be the field experience for the proposal. But I wouldn't want to do the field experience without being serious about standardizing it if the field experience is positive.

Here's another way to to parse this pattern. It still requires two calls to parse, but it can be done by streaming consecutively through the stream:

using TimePoint = date::sys_seconds;

TimePoint
parseISO8601(std::istream& is)
{
    date::local_days td;
    TimePoint::duration tod;
    std::chrono::minutes offset;
    is >> date::parse("%F", td);
    auto T = static_cast<char>(is.get());
    if (T != 'T' && T != ' ')
        is.setstate(std::ios::failbit);
    is >> date::parse("%T %z", tod, offset);
    if (is.fail())
        throw std::runtime_error(" ... ");
    return TimePoint{(td + tod).time_since_epoch() - offset};
}

My current feeling is that this is clean enough that I'm not motivated to extend the standard (which is a heavy lift).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants