TimeSeries.values or method addition #227

Open
ThomDietrich opened this issue Dec 27, 2019 · 5 comments

@ThomDietrich

Consider the use case of tracing measurements of a process (e.g. production of paperclips over time). In some cases a user is primarily interested in the values themselves, e.g. to compute the total number of paperclips produced. Today that means reaching into a private attribute:

ts._d.values()
sum(ts._d.values())

I'd suggest implementing something similar to the following:

def values(self):
    """Return a list of all values in the time series (without time information)."""
    # list() so the result matches the docstring; .values() alone returns a view
    return list(self._d.values())

What do you think? Are there other aspects to consider?
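For illustration, here is a minimal sketch of how the accessor proposed above might be used. `MiniSeries` is a hypothetical stand-in, not the traces API; the only assumption carried over from the snippet above is that measurements live in a time-keyed mapping named `_d`.

```python
from datetime import datetime


class MiniSeries:
    """Hypothetical stand-in for a time series; NOT the traces API."""

    def __init__(self):
        # Assumption: measurements are stored in a mapping of time -> value.
        self._d = {}

    def __setitem__(self, time, value):
        self._d[time] = value

    def values(self):
        """Return the values in time order, without time information."""
        return [self._d[t] for t in sorted(self._d)]


ts = MiniSeries()
ts[datetime(2019, 12, 27, 8)] = 120   # paperclips counted at 08:00
ts[datetime(2019, 12, 27, 9)] = 95    # paperclips counted at 09:00
ts[datetime(2019, 12, 27, 10)] = 140  # paperclips counted at 10:00

# The caller no longer needs to touch the private _d attribute.
total_paperclips = sum(ts.values())
```

Whether such a sum is meaningful depends on what each value represents, which is the question the rest of the thread discusses.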

@stringertheory
Owner

Using a traces.TimeSeries for measurements of a process like that is an interesting use case that we haven't considered much; we've primarily used traces for continuous processes measured at irregular times. For example, the distribution method doesn't really make sense in that case either.

Makes me wonder if there is a separate class that would be useful for this... are there other methods of TimeSeries that you've been finding useful for the process measurements (e.g. paperclips)?

@johnhaire89

In the paperclips example, does the TimeSeries track the rate of paperclip production? If so, then you need to provide a unit for the x axis, for example "5213 paperclips per hour". Only then will you be able to work out total paperclips produced between two datetimes.

I think there needs to be a TimeSeries.total(self, x_unit) function that calculates the area under the curve (the integration of the time series).
Using the light bulb example from the documentation, we should be able to calculate the total power usage of all light bulbs between two datetimes in watt-hours.
Let's say:

  • power_consumption_traces is a list of traces where each value is the power consumed in watts
  • total_power_consumption_series is the total power consumption of all light bulbs, generated by TimeSeries.merge(power_consumption_traces, operation=sum)

We could then return the total power consumption in watt-hours using something like
total_power_consumption = total_power_consumption_series.total(x_unit=datetime.timedelta(hours=1))

Without setting a unit for the x axis though, I don't think it's possible to calculate the total.

I expected traces.Histogram.total() to do what I'm suggesting, but it doesn't.
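The area-under-the-curve idea above could be sketched roughly as follows. This is not an existing traces method: the free function `total`, its parameters, and the plain list of `(datetime, value)` pairs are all assumptions made for illustration, and each value is assumed to hold constant until the next measurement.

```python
from datetime import datetime, timedelta


def total(points, start, end, x_unit):
    """Integrate a step function between start and end (hypothetical helper).

    points: time-sorted list of (datetime, value) pairs; each value is
            assumed to hold until the next point.
    x_unit: timedelta giving the unit of the x axis, e.g. one hour, so
            that watts integrate to watt-hours.
    """
    result = 0.0
    for i, (t, value) in enumerate(points):
        # The last value is assumed to hold until the end of the window.
        next_t = points[i + 1][0] if i + 1 < len(points) else end
        lo = max(t, start)
        hi = min(next_t, end)
        if hi > lo:
            # timedelta / timedelta yields a float number of x units.
            result += value * ((hi - lo) / x_unit)
    return result


power_watts = [
    (datetime(2021, 1, 1, 0, 0), 60),
    (datetime(2021, 1, 1, 2, 0), 100),
    (datetime(2021, 1, 1, 3, 0), 0),
]

watt_hours = total(
    power_watts,
    start=datetime(2021, 1, 1, 0, 0),
    end=datetime(2021, 1, 1, 4, 0),
    x_unit=timedelta(hours=1),
)
```

Passing a different `x_unit` (for example `timedelta(minutes=1)`) changes the unit of the result, which is why the unit has to be stated for a rate-like series.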

@johnhaire89

Perhaps whichever solution is chosen should use a units library such as https://pint.readthedocs.io/en/stable/index.html

@ThomDietrich
Author

ThomDietrich commented Feb 19, 2021

Hey @johnhaire89, I agree with you on most points, but I disagree on units and on pint being the better alternative. I have used pint before, and it's not comparable to what we are trying to do here.
I think requiring a unit is one-sided. In the simple example above, the data points are not samples of a continuous process. So instead of "5213 paperclips per hour" I would rather say "5213 paperclips since the last sample". The problem is, imho, perfectly described by @nsteins in the proposal to implement an EventSeries.
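To make the "since the last sample" reading concrete, here is a small sketch in which each value is a count rather than a rate, so a total over a window is a plain sum with no x-axis unit involved. The helper `count_between` and the sample data are hypothetical, not part of traces or of the EventSeries proposal.

```python
from bisect import bisect_right
from datetime import datetime


def count_between(times, counts, start, end):
    """Sum the counts whose timestamps fall in the window (start, end].

    times:  sorted list of sample datetimes.
    counts: items produced since the previous sample, aligned with times.
    A sample taken exactly at `start` reports production before the
    window, so it is excluded.
    """
    lo = bisect_right(times, start)
    hi = bisect_right(times, end)
    return sum(counts[lo:hi])


times = [
    datetime(2021, 2, 19, 8),
    datetime(2021, 2, 19, 9),
    datetime(2021, 2, 19, 10),
    datetime(2021, 2, 19, 11),
]
counts = [5213, 4980, 5100, 5300]  # paperclips since the last sample

produced = count_between(
    times, counts, datetime(2021, 2, 19, 8), datetime(2021, 2, 19, 10)
)
```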

Adding another view to this: @stringertheory suggested a new series class for this. Imho that's a question of the philosophy behind traces: is an object purpose-driven or universal? As a counterexample, you might be familiar with InfluxDB measurements. Think of them as tables in a database specifically architected for time series data. A whole set of statistical functions can be used to query, slice, join, and aggregate data. InfluxDB does not concern itself with the nature of your data points, so whether the sum or integral functions are meaningful for your use case is up to you.

How do you guys feel about this different perspective?

@johnhaire89

johnhaire89 commented Mar 1, 2021

Apologies for the tone of my previous comments, for answering without understanding what was being discussed, and for the thought bubble about pint =)
I'm not sure if I'm adding anything beneficial to the thread, but I'm OK with the suggestion.


3 participants