Event Based Time Series #229

nsteins · 2020-05-01T16:59:05Z

I'm proposing a new class for Traces, EventSeries, for handling data that is a series of timestamps denoting the occurrence of discrete events. An example is this collection of 311 requests in Chicago, where each record is a request with a timestamp for when it was opened and another for when it was closed. This is a fit for Traces because it is another example of an unevenly-spaced time series, and it can use traces.TimeSeries for certain calculations.

An example of how the API might look:

import pandas as pd

# EventSeries is the proposed class; it does not exist in traces yet
df = pd.read_csv('311_Service_Requests.csv', nrows=10000)
creation = EventSeries(df['CREATED_DATE'].dropna())
completion = EventSeries(df['CLOSED_DATE'].dropna())

An EventSeries could tell you the number of events that occurred between two arbitrary timestamps:

>>> creation.events_between(pd.Timestamp('2018-01-01'), pd.Timestamp('2019-02-01'))
6681
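
For illustration only, counting events in a range can be sketched with the standard library, assuming the timestamps are kept in a sorted list. The stand-alone `events_between` function below is hypothetical and is not the proposed EventSeries method; treating both endpoints as inclusive is also an assumption.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def events_between(sorted_times, start, end):
    """Count events t with start <= t <= end in an already sorted list."""
    # Two binary searches give the count without scanning the whole list.
    return bisect_right(sorted_times, end) - bisect_left(sorted_times, start)


# Made-up sample timestamps, not taken from the 311 data set
times = sorted([
    datetime(2018, 1, 5),
    datetime(2018, 6, 1),
    datetime(2019, 1, 15),
    datetime(2019, 3, 1),
])
count = events_between(times, datetime(2018, 1, 1), datetime(2019, 2, 1))
print(count)  # 3
```

Keeping the timestamps sorted makes each query two binary searches rather than a full scan.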

EventSeries would also have a cumulative sum function, which returns a TimeSeries of the cumulative number of events that have occurred since the first record:

>>> ts = creation.cumsum()
>>> ts.plot()
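
A rough sketch of what such a cumulative count computes, using plain Python lists of `(time, count)` pairs rather than a traces.TimeSeries. The function name `cumulative_count` is hypothetical, and collapsing events that share a timestamp into a single step is an assumption.

```python
from datetime import datetime
from itertools import groupby


def cumulative_count(times):
    """Return [(time, events_so_far)] with one step per distinct timestamp."""
    total, steps = 0, []
    # groupby on the sorted list groups identical timestamps together
    for t, group in groupby(sorted(times)):
        total += len(list(group))
        steps.append((t, total))
    return steps


# Made-up sample data: two events share the first timestamp
sample = [datetime(2020, 1, 1), datetime(2020, 1, 3), datetime(2020, 1, 1)]
steps = cumulative_count(sample)
```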

For events that have an "open" and a "close" timestamp, EventSeries can calculate the number of active open cases:

>>> diff = EventSeries.count_active(creation, completion)
>>> diff.plot()
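
One way to picture the active-case count, sketched with the standard library only: treat every open as +1 and every close as -1, sort by time, and keep a running total. The stand-alone `count_active` function here is hypothetical and only mirrors the name used in the proposal.

```python
from datetime import datetime


def count_active(opened, closed):
    """Return [(time, active_count)], one entry per distinct event time."""
    deltas = [(t, 1) for t in opened] + [(t, -1) for t in closed]
    deltas.sort(key=lambda pair: pair[0])
    steps, active = [], 0
    for t, delta in deltas:
        active += delta
        if steps and steps[-1][0] == t:
            # Several events at the same instant collapse into one step
            steps[-1] = (t, active)
        else:
            steps.append((t, active))
    return steps


# Made-up sample data: three requests opened, two of them closed
opened = [datetime(2019, 7, 1), datetime(2019, 7, 3), datetime(2019, 7, 5)]
closed = [datetime(2019, 7, 4), datetime(2019, 7, 5)]
active_steps = count_active(opened, closed)
```

Sorting once and sweeping is O(n log n) in the number of events.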

Finally, EventSeries can calculate the inter-event arrival times and create visualizations for analysis

>>> after = creation.time_lag(how='after')
>>> creation.plot_time_lag(how='after')
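
As a sketch of the inter-event times, the gaps between consecutive timestamps can be paired with either the earlier or the later event. Reading `how='after'` as "time until the next event" and `how='before'` as "time since the previous event" is an assumption, and the stand-alone `time_lag` function is hypothetical.

```python
from datetime import datetime, timedelta


def time_lag(sorted_times, how="after"):
    """Pair each event with the gap to its neighbouring event."""
    gaps = [b - a for a, b in zip(sorted_times, sorted_times[1:])]
    if how == "after":
        # Gap from each event to the one that follows it
        return list(zip(sorted_times[:-1], gaps))
    if how == "before":
        # Gap from each event back to the one that precedes it
        return list(zip(sorted_times[1:], gaps))
    raise ValueError("how must be 'after' or 'before'")


# Made-up sample data
arrivals = [
    datetime(2020, 1, 1, 0, 0),
    datetime(2020, 1, 1, 0, 10),
    datetime(2020, 1, 1, 1, 0),
]
lags_after = time_lag(arrivals, how="after")
```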

I am already working on implementing this, but I would appreciate feedback and suggestions on the API or features. I'm particularly interested in whether this can be extended to support the use case outlined in issue #227.

johnhaire89 · 2020-08-26T00:00:02Z

This looks very useful, although I wonder if EventSeries could just be a special case of TimeSeries.
Using your example, each service request might be represented as a TimeSeries with two points:

service_call_event = traces.TimeSeries(default=0)
service_call_event[pd.Timestamp('2019-07-17 11:56:40')] = 1
service_call_event[pd.Timestamp('2019-07-30 13:14:54')] = 0

Suppose you have all the service calls in a list named service_call_list, where each event is a TimeSeries with two points. Then your cumsum function might be the same as a merge operation:

active_events = traces.TimeSeries.merge(service_call_list, operation=sum)

All that said, I guess that this way of processing the data would be far less efficient than your method.

I have a device that flashes according to a timetable. It reports a "commencement" event when it starts flashing and a "cessation" event when it stops. I'm looking into a method to represent the state on a timeline by creating a TimeSeries for that state and adding a value of 1 for each commencement and a value of 0 for each cessation.
I'm also trying to represent the device's timetable as a time series for the desired state, with a value of 1 for when it should start flashing and 0 for when it should stop flashing. With this method I can use an XOR operation to generate a plottable time series of all the times that the desired state didn't equal the actual state.

I like your time_lag function because I want to work out the total amount of time that my actual flashing state didn't match the desired state. However, now that I have a TimeSeries where y=1 for any time that the actual state didn't match the desired state, maybe that calculation can be performed by an existing operation as well. @devs, does Histogram.total() calculate the area under the curve?
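
To make the mismatch calculation concrete, here is a minimal sketch that does not use traces at all. It assumes each state is a step function stored as a sorted list of `(time, value)` pairs with a default of 0 before the first point; `state_at` and `mismatch_duration` are hypothetical helper names.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def state_at(steps, t, default=0):
    """Value at time t of a step function given as sorted (time, value) pairs."""
    times = [time for time, _ in steps]
    i = bisect_right(times, t)
    return steps[i - 1][1] if i else default


def mismatch_duration(actual, desired, start, end):
    """Total time in [start, end) during which actual XOR desired equals 1."""
    inner = (t for t, _ in actual + desired if start < t < end)
    cuts = sorted({start, end, *inner})
    total = timedelta(0)
    for left, right in zip(cuts, cuts[1:]):
        # Both states are constant on [left, right), so sample at the left edge
        if state_at(actual, left) ^ state_at(desired, left):
            total += right - left
    return total


# Made-up sample data: the device starts 5 minutes late and stops 2 minutes late
day = datetime(2020, 8, 26)
desired = [(day.replace(hour=8), 1), (day.replace(hour=9), 0)]
actual = [(day.replace(hour=8, minute=5), 1), (day.replace(hour=9, minute=2), 0)]
total = mismatch_duration(actual, desired, day.replace(hour=7), day.replace(hour=10))
print(total)  # 0:07:00
```

Because the XOR series only takes the values 0 and 1, the area under it over the window is the same thing as the total mismatch time.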

nsteins · 2020-09-02T18:22:18Z

You are correct that you could represent this as a TimeSeries, and in fact, that was my first approach to modeling this kind of data. It's just slow because traces.TimeSeries.merge iterates through the entire SortedDict on every insertion.

johnhaire89 · 2020-09-03T02:44:51Z

Ah. Understood.

I feel like event_series is just a list of events, rather than something that fits into the library.

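A faster way to build a time series could be:

ts = traces.TimeSeries(default=0)
# +1 marks a request opening, -1 marks a request closing
# Note: events that share a timestamp overwrite each other here
for t in df['CREATED_DATE'].dropna():
    ts[pd.Timestamp(t)] = 1
for t in df['CLOSED_DATE'].dropna():
    ts[pd.Timestamp(t)] = -1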

A cumulative sum function could be an awesome addition to the API:

cumsum_trace = traces.TimeSeries(default=0)
cumsum = 0
# Walk the points in time order, keeping a running total
for k, v in ts.items():
    cumsum += v
    cumsum_trace[k] = cumsum

As for feature requests, it could be cool if there was a function get_events(self, start_signal, end_signal) that returned a list of "events". Given (key, value) pairs in a time series, each event will have a start (key when value == start_signal) and an end (key when value == end_signal).

nsteins · 2020-09-04T17:44:39Z

I think that EventSeries fits in with Traces because it tries to follow a similar design and API to TimeSeries. There are obviously many ways to accomplish this, but I often found myself frustrated trying to accomplish this with pure pandas, and unable to do a lot of the things I wanted to with TimeSeries.

The main difference is that TimeSeries are designed around a model of an irregularly sampled continuous signal. I'm not sure what physical quantity a cumulative sum function would correspond to for a general TimeSeries.

Could you explain the get_events(self, start_signal, end_signal) request a bit more?

johnhaire89 · 2020-10-14T02:39:05Z

I think it could be nice to have a function that transforms a timeseries into a list of periods (each with a start and end time or a start time and duration) based on the values.
You can then answer questions like "provide a list of periods where a light was switched on" or, using the shopping cart example from the docs, "provide a list of periods where the user had apples in their cart".
start_signal and end_signal could be functions so that it works on non-numeric traces.
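
A minimal sketch of that idea, assuming the time series is available as `(time, value)` pairs in time order. The name `get_periods` and the predicate arguments `is_start` and `is_end` are hypothetical; a period that is still open at the end of the data is dropped here, which is only one possible choice.

```python
from datetime import datetime


def get_periods(items, is_start, is_end):
    """Turn ordered (time, value) pairs into a list of (start, end) periods."""
    periods, start = [], None
    for t, value in items:
        if start is None and is_start(value):
            start = t  # a new period opens
        elif start is not None and is_end(value):
            periods.append((start, t))  # the open period closes
            start = None
    return periods


# Made-up sample data with non-numeric values
day = datetime(2020, 10, 14)
light = [
    (day.replace(hour=1), "on"),
    (day.replace(hour=3), "on"),   # repeated start signal is ignored
    (day.replace(hour=4), "off"),
    (day.replace(hour=6), "off"),  # end signal with no open period is ignored
    (day.replace(hour=7), "on"),
    (day.replace(hour=9), "off"),
]
periods = get_periods(light, lambda v: v == "on", lambda v: v == "off")
```

Passing predicates rather than fixed values is what lets the same function work for questions like "had apples in the cart", where the start condition is a test on the value rather than an equality check.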

ThomDietrich · 2021-02-19T11:52:21Z

Hey @nsteins, coming here from #227. Are you working on this? The feedback was short, but I think this would be a great addition to the library, as an EventSeries falls equally within the task traces tries to solve: handling time series. Having these two main classes side by side makes EventSeries quite logical.
@stringertheory came to the same conclusion in #227.

Any timeline for this, or questions you still want to discuss? I guess that would be most easily managed in a preliminary PR.

nsteins self-assigned this May 4, 2020

nsteins added the Enhancement Request label May 4, 2020

nsteins mentioned this issue May 5, 2020

Initial implementation of EventSeries #231

Closed

This was referenced May 18, 2020

Initial implementation of EventSeries #233

Closed

Dev #234

Merged

stringertheory mentioned this issue Feb 2, 2024

No longer maintained? #246

Closed
