
index support for {lubridate} start - end interval #245

Open
cregouby opened this issue Dec 2, 2020 · 4 comments

Comments

@cregouby

cregouby commented Dec 2, 2020

Brief description of the problem
The *_gaps() functions are a very useful set of commands, and I would love to be able to use them on irregular longitudinal data with start and stop index columns, such as tsibbledata::nyc_bikes.

Currently, tsibbles with an irregular interval are not supported by the *_gaps() function family.

What output is expected

This is a manually edited mock-up of the expected output:

```r
nyc_bikes_dual_index <- build_tsibble(tsibbledata::nyc_bikes, key = bike_id, index = start_time, index2 = stop_time)
scan_gaps(nyc_bikes_dual_index) %>% head(3)
```

```
# A tsibble: 4,258 x 12 [0.0149047991726547µs] <America/New_York>
# Key:       bike_id [10]
   bike_id start_time          stop_time           
   <fct>   <dttm>              <dttm>              
 1 26301    2018-02-26 19:15:40 2018-02-27 07:52:49
 2 26301    2018-02-27 07:58:13 2018-02-27 12:03:27
 3 26301    2018-02-27 12:04:54 2018-02-27 13:53:51
```

@earowang
Member

earowang commented Jan 7, 2021

*_gaps() doesn't know how to handle irregular temporal data. {tsibble} doesn't support a dual index: index2 is meant for temporary grouping. Could you please elaborate on what outcome you expect for an irregular tsibble?

@cregouby
Author

cregouby commented Jan 9, 2021

Hello @earowang, sure, *_gaps() doesn't know how to handle irregular temporal data, and that is the reason for my feature request here.
Sorry for mistakenly using index2 as a secondary index in the example.

Irregular time series with start-stop times (or durations) are very common in process control, industrial robot monitoring, etc., so detecting gaps can be crucial on those datasets. As tsibble is a fantastic framework, I would love to see it extended to irregular start-stop time series, since computing gaps between start-stop records is a generalisation of computing gaps from the tsibble interval.

The expected outcome, as in the nyc_bikes example above, is every start-stop interval in which data is missing. This enables usage-ratio computation, efficiency evaluation, communication-loss detection, etc. nyc_bikes is only a toy example here, as typical start-stop data have contiguous time intervals.

@earowang changed the title from "Please provide *_gaps fuctions for start - stop irregular tsibble" to "index support for start - end duration" on Sep 28, 2021
@earowang changed the title from "index support for start - end duration" to "index support for {lubridate} start - end interval" on Sep 28, 2021
@earowang
Member

The concept of tsibble's interval is different from {lubridate}'s start-end Interval. tsibble's interval is the time difference between successive time indices, whereas lubridate's Interval defines specific start and end timestamps. A regular time series assumes a constant difference across all time indices, and *_gaps() therefore relies on that constant difference.
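As a conceptual illustration only (not tsibble's actual implementation), gap detection on a regular series can be sketched in base R: the constant interval defines a complete grid of expected time points, and every grid point that is never observed is an implicit gap. The hourly data and the `interval_secs` value below are made up for the example.

```r
# Illustrative hourly observations; the 03:00 and 04:00 readings are missing
idx <- as.POSIXct("2021-01-01 00:00:00", tz = "UTC") + 3600 * c(0, 1, 2, 5, 6)

# Assumed constant interval of the regular series, in seconds
interval_secs <- 3600

# Full grid of expected time points implied by the constant interval
grid <- seq(min(idx), max(idx), by = interval_secs)

# Implicit gaps: grid points with no matching observation
# (compared as numeric seconds so matching does not depend on the class)
missing <- grid[!as.numeric(grid) %in% as.numeric(idx)]
format(missing, "%H:%M", tz = "UTC")
```

Start-stop records have no such grid, which is why a single "time difference" is ambiguous for them.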

Using nyc_bikes as an example, the time index can be represented with lubridate's Interval class. I'm not sure what the time difference is here for x. Should it be 1 second, by aligning the starts or the ends of these two observations? Or should it be 1 second, derived from the difference between start and end? In your field, what's the common practice?

```r
library(lubridate)
obs1 <- interval(ymd_hms("2018-02-26 19:15:40"), ymd_hms("2018-02-27 07:52:49"))
obs2 <- interval(ymd_hms("2018-02-27 07:58:13"), ymd_hms("2018-02-27 12:03:27"))
x <- c(obs1, obs2)
x
#> [1] 2018-02-26 19:15:40 UTC--2018-02-27 07:52:49 UTC
#> [2] 2018-02-27 07:58:13 UTC--2018-02-27 12:03:27 UTC
```

Created on 2021-09-29 by the reprex package (v2.0.1)
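To make the candidate readings of "time difference" concrete, the sketch below computes them for the two observations in x. It uses base R POSIXct values in UTC rather than lubridate Intervals so it runs without extra packages; lubridate's int_start() and int_end() would return the same endpoints from x. The variable names are illustrative.

```r
# Endpoints of the two observations in x (UTC)
obs_start <- as.POSIXct(c("2018-02-26 19:15:40", "2018-02-27 07:58:13"), tz = "UTC")
obs_end   <- as.POSIXct(c("2018-02-27 07:52:49", "2018-02-27 12:03:27"), tz = "UTC")

# Reading 1: align on start times (start of obs2 minus start of obs1)
start_to_start <- as.numeric(difftime(obs_start[2], obs_start[1], units = "secs"))

# Reading 2: align on end times (end of obs2 minus end of obs1)
end_to_end <- as.numeric(difftime(obs_end[2], obs_end[1], units = "secs"))

# Reading 3: idle time between the end of obs1 and the start of obs2
end_to_start <- as.numeric(difftime(obs_start[2], obs_end[1], units = "secs"))

c(start_to_start = start_to_start, end_to_end = end_to_end, end_to_start = end_to_start)
```

The three readings give 45753, 15038 and 324 seconds respectively, so the choice of alignment changes the answer by orders of magnitude.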

@cregouby
Author

cregouby commented Nov 19, 2021

I fully agree that the term interval is misleading here, so let's call it min_gap: the minimum time between the end time of event n and the start time of event n+1 for that span to be considered a gap.

In my field, the alignment between an event's end time and the next event's start time depends on the situation, and is usually tied to the precision of the source timestamps (not a very useful statement, I know). We can only assume that a gap cannot be raised for a min_gap smaller than 1 unit of the last digit of the time precision, i.e. no less than one second for ymd_hms() timestamps.
The two main use cases are:

  • a single agent's or machine's sequence of actions (like a single log file), where there is an implicit min_gap of 1 second. A parameter could force min_gap higher when the timestamps are known to result from a rounding operation, such as "closest 10-second step" on slow agents/machines.
  • an aggregation of several agents' or machines' sequences of actions (like a file aggregated by a syslog server), where the implicit min_gap is strictly 1 but may be relaxed to a higher value to tolerate clock offsets between agents, etc.

This suggests a configurable min_gap parameter in the *_gaps() functions for irregular start-end time series.
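A minimal base-R sketch of the proposed behaviour, assuming gaps are measured from the end of one record to the start of the next and reported when that idle time is at least `min_gap` seconds. `find_interval_gaps()` is a hypothetical helper, not a {tsibble} function. The data are the first three bike 26301 records from the example above, parsed as UTC for simplicity. Taking the running maximum of the stop times keeps overlapping records (as in the aggregated-log use case) from opening spurious gaps.

```r
# Hypothetical helper (not part of {tsibble}): gaps between consecutive
# start-stop records. A gap is reported when the next record starts at least
# `min_gap` seconds after the latest stop time seen so far.
find_interval_gaps <- function(start_time, stop_time, min_gap = 1) {
  stopifnot(inherits(start_time, "POSIXct"), inherits(stop_time, "POSIXct"),
            length(start_time) == length(stop_time), min_gap > 0)
  tz <- attr(start_time, "tzone")
  ord <- order(start_time)
  s <- as.numeric(start_time)[ord]  # start times in seconds, sorted
  e <- as.numeric(stop_time)[ord]   # matching stop times
  n <- length(s)
  if (n < 2) {
    empty <- .POSIXct(numeric(0), tz = tz)
    return(data.frame(gap_start = empty, gap_end = empty))
  }
  # Running maximum of stop times, so a record nested inside or overlapping
  # an earlier one does not open a spurious gap
  covered_until <- cummax(e)[-n]
  next_start <- s[-1]
  is_gap <- next_start - covered_until >= min_gap
  data.frame(gap_start = .POSIXct(covered_until[is_gap], tz = tz),
             gap_end = .POSIXct(next_start[is_gap], tz = tz))
}

# First three records of bike 26301 (parsed as UTC for simplicity)
starts <- as.POSIXct(c("2018-02-26 19:15:40", "2018-02-27 07:58:13",
                       "2018-02-27 12:04:54"), tz = "UTC")
stops  <- as.POSIXct(c("2018-02-27 07:52:49", "2018-02-27 12:03:27",
                       "2018-02-27 13:53:51"), tz = "UTC")

gaps_1s   <- find_interval_gaps(starts, stops, min_gap = 1)    # 324 s and 87 s pauses
gaps_100s <- find_interval_gaps(starts, stops, min_gap = 100)  # only the 324 s pause
```

Raising min_gap from 1 to 100 seconds drops the 87-second pause and keeps the 324-second one, mirroring the rounding and clock-offset relaxations described above. For keyed data such as nyc_bikes, the helper would be applied per bike_id, for instance via split() and lapply().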


2 participants