Unexpected error extracting intervals from dates given as POSIXct or datetime format #286

Open
cboettig opened this issue Aug 22, 2022 · 4 comments

Comments

@cboettig

Turning dates to datetimes results in an undefined interval, causing tsibble to fail to parse the index correctly.
I would expect that if index_valid(datetime) is true, I could use datetime as a valid index in as_tsibble().
Consider the following minimal reprex:

We define a sequence of dates as datetime data:

library(tsibble)
dates <- seq(as.Date("2017-01-01"), as.Date("2017-01-10"), by = 1)
datetime <- lubridate::as_datetime(dates)
index_valid(datetime)
#> [1] TRUE

Above, {tsibble} tells us YES this is still a valid index! But it cannot calculate the interval:

interval_pull(datetime)
#> <interval[1]>
#> [1] ?

and so we cannot actually use this index as a tsibble index:

as_tsibble(data.frame(time = datetime), index = time)
#> Error in `validate_interval()`:
#> ! Can't obtain the interval due to the mismatched index class.
#> ℹ Please see `vignette("FAQ")` for details.

Created on 2022-08-22 by the reprex package (v2.0.1)

@mitchelloharawild
Member

From the FAQ (https://tsibble.tidyverts.org/articles/faq.html)

Error: “Can’t obtain the interval due to the mismatched index class.”

The interval depends on the index class. It is unclear in this situation to tell if it’s daily data with implicit missingness or it’s monthly data. If using Date underlying monthly data, each month could range from 28 to 31 days, which isn’t regularly spaced. But class yearmonth puts emphasis on 12 months per year, which is clearly regularly spaced and the accurate representation for aggregations over months. This applies to POSIXct for sub-daily data, Date for daily, yearquarter for quarterly, and etc. If you encounter this error “Can’t obtain the interval due to mismatched index class.”, it’s the same underlying issue.

When you convert dates into datetimes you introduce time variations between some observations. While most observations will be 24 hours apart, some could be 23 hours or 25 hours apart due to daylight savings. So the common interval could be 1 hour, 30 minutes, 24 hours, or something else depending on the timezone of the datetime.
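To make the daylight saving point concrete, here is a small base R sketch. The timezone (America/New_York) and the dates are chosen purely for illustration and are not taken from the reprex; the sketch assumes the system's timezone database includes that zone.

```r
# Illustrative dates only: local midnights around the 2017 spring-forward
# transition in New York (12 March 2017).
days <- c("2017-03-11", "2017-03-12", "2017-03-13", "2017-03-14")

local_midnights <- as.POSIXct(days, tz = "America/New_York")
utc_midnights <- as.POSIXct(days, tz = "UTC")

# Spacing between consecutive observations, in hours
local_gaps <- diff(as.numeric(local_midnights)) / 3600
utc_gaps <- diff(as.numeric(utc_midnights)) / 3600

local_gaps  # the day containing the transition is one hour shorter
utc_gaps    # UTC has no transitions, so every gap is the same
```

The same calendar days are therefore evenly spaced in UTC but not in a zone that observes daylight saving time.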

For your particular example, there are no timezone shifts so I agree that the correct interval here should be 24 hours (note not 1 day as days may not have the same duration). It's a bit pedantic, but if you have daily data you should use a daily precision to avoid these complications.
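A minimal sketch of the "use a daily precision" suggestion, assuming the data really are one observation per day: convert the datetimes back to Date before building the tsibble. The object names `daily` and `ts` are illustrative, and the explicit UTC conversion is an assumption that matches the reprex rather than a general recommendation.

```r
library(tsibble)

dates <- seq(as.Date("2017-01-01"), as.Date("2017-01-10"), by = 1)
datetime <- lubridate::as_datetime(dates)

# Drop the time-of-day component so the index has daily precision.
# The datetimes here are in UTC, so the calendar dates are unambiguous.
daily <- data.frame(time = lubridate::as_date(datetime))

ts <- as_tsibble(daily, index = time)
interval(ts)
```

This only sidesteps the problem for data that are truly daily; sub-daily data still need a datetime index.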

@cboettig
Author

Thanks @mitchelloharawild, I thought it might be something like that. Really appreciate the excellent details here. Datetime math is always confusing. I may not be following properly here, though -- wouldn't the conversion have to be ambiguous about timezone in order for my coercion to be ambiguous?

My reprex coerces dates to datetimes just to create the reprex -- in my actual use case I just have datetime data (with timezone) but hit the same error. The lubridate coercion above attaches an explicit timezone as well, doesn't it? Shouldn't the interval be unambiguous with datetime data that includes timezone information?

Even if the interval is ambiguous here, I guess I don't really understand what index_valid() is telling me in this case then. In what sense is this still a valid index if it can't be used as an index?

I definitely appreciate that tsibble is pedantic about these things! But I thought I was being on the safe/pedantic side by using a more precise time representation. If I am using unambiguous time encodings and can losslessly round-trip between them in other time packages like lubridate, it seems a bit weird to me that tsibble should suspect ambiguity where none exists. Sorry if I'm just being thick here.

@mitchelloharawild
Member

index_valid() is a class check for if that vector class can be used as an index column for a tsibble. It prevents you from setting things like a character vector as the time column.
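As a rough illustration of that class check (the input values below are made up for the example), `index_valid()` looks only at the type of the vector, not at its spacing:

```r
library(tsibble)

# Classes with a time-like meaning pass the check ...
index_valid(as.Date("2017-01-01"))
index_valid(as.POSIXct("2017-01-01", tz = "UTC"))

# ... while a character vector does not, even if it looks like a date
index_valid("2017-01-01")
```

Passing this check says nothing about whether an interval can be derived from the values themselves.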

The <interval> of a tsibble can be a few things, including:

What happens when you explicitly define the timezone as UTC or whichever is most appropriate?

@cboettig
Author

Thanks @mitchelloharawild. The above example is already explicitly setting the timezone as UTC, but we still get the error as shown in the reprex. (lubridate::as_datetime() takes an optional tz argument which defaults to UTC, presumably precisely to avoid creating the ambiguity in this case.) Would it be possible for tsibble to identify the interval in this case?
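One quick way to confirm what the reprex's datetime vector contains is to inspect its timezone and the spacing between observations. This sketch uses only lubridate and base R and does not involve tsibble's interval logic:

```r
dates <- seq(as.Date("2017-01-01"), as.Date("2017-01-10"), by = 1)
datetime <- lubridate::as_datetime(dates)

# Timezone attached by the coercion
lubridate::tz(datetime)

# Distinct gaps between consecutive observations, in seconds
unique(diff(as.numeric(datetime)))
```

A single gap of 86400 seconds is what "24 hours apart in UTC" looks like at the level of the underlying values.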

Thanks for the note about index_valid() (I also see now that the docs do say it is just checking type). Is there a different function I can use to check the validity of an index, including the interval? I see interval_pull() shows an ambiguous interval; I like the use of ? here, and the great docs are pretty clear about what that means. validate_interval() does not appear to be exposed to the user, though?
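In the absence of an exported validator, one possible stopgap is to attempt the conversion and catch the failure. `can_index()` below is a hypothetical helper written for this thread, not part of tsibble, and it reports any failure of `as_tsibble()` (including duplicated index values), not only interval problems.

```r
library(tsibble)

# Hypothetical helper (not a tsibble function): TRUE if a vector can serve
# as the sole index of a tsibble, FALSE if as_tsibble() raises any error.
can_index <- function(x) {
  tryCatch(
    {
      as_tsibble(data.frame(time = x), index = time)
      TRUE
    },
    error = function(e) FALSE
  )
}

can_index(as.Date("2017-01-01") + 0:9)  # daily Date index
can_index(letters)                      # character vector, not a valid index
```

Whether the datetime vector from the reprex passes this check depends on how the installed tsibble version derives its interval.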
