Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pandas reporter final_df_output event resolution #706

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

victorgarcia98
Copy link
Contributor

Add the field output_event_resolution field to PandasReporterConfigSchema used to set the event_resolution of the final_df_output. This is useful for cases in which we want to modify the event_resolution using methods that are not part of timely-beliefs (e.g: resample).

In addition, this PR solves the issue of having NaN DataSources in the returned timely-beliefs.

…tion, if provided

Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
@victorgarcia98 victorgarcia98 self-assigned this May 30, 2023
@victorgarcia98 victorgarcia98 marked this pull request as ready for review May 30, 2023 21:21
@victorgarcia98 victorgarcia98 changed the title Pandas reporter schema output event resolution Pandas reporter final_df_output event resolution May 30, 2023
@Flix6x
Copy link
Contributor

Flix6x commented May 30, 2023

Why can't we get the resolution of the final output from self.sensor.event_resolution instead?

@victorgarcia98
Copy link
Contributor Author

Why can't we get the resolution of the final output from self.sensor.event_resolution instead?

The idea is, by default, check that the output event_resolution == self.sensor.event_resolution, happening in the Reporter class. If we set the output event_resolution to self.sensor.event_resolution, we always bypass this check.

@Flix6x
Copy link
Contributor

Flix6x commented May 31, 2023

So now you can pass the output resolution explicitly, but it must match the output sensor resolution. ...

Can we think of some alternatives? Is this resolution only passed to make that check succeed? If so, maybe we want that check only on proper BeliefsDataFrames, and want to have a different check on regular DataFrames (maybe checking the index frequency instead)?

@@ -75,7 +75,7 @@ def fetch_data(
)

# store data source as local variable
for source in bdf.sources.unique():
for source in bdf.sources.unique().dropna():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bdf.lineage.sources might also work.

https://github.com/SeitaBV/timely-beliefs/blob/main/timely_beliefs/beliefs/__init__.py#L109

I wouldn't have expected any NaN sources, actually. How come there are any?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using bdf.lineage.sources also yields a NaN source.

The following code snippet should be useful to reproduce this behavior using the data from 2023.

from datetime import datetime, timedelta
import pytz

from flexmeasures.data.models.time_series import Sensor

sensor = Sensor.query.get(6)

timezone = pytz.timezone("Europe/Amsterdam")

event_starts_after = tz.localize(datetime(2023, 1, 2))
event_ends_before = tz.localize(datetime(2023, 1, 23))
resolution = timedelta(seconds=3600)

bdf = sensor.search_beliefs(
    event_starts_after=event_starts_after,
    event_ends_before=event_ends_before,
    resolution=resolution,
)
>> bdf.sources.isna().sum()
37

…t sensor, try with the frequency of the events_start.

Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
@victorgarcia98
Copy link
Contributor Author

Can we think of some alternatives? Is this resolution only passed to make that check succeed? If so, maybe we want that check only on proper BeliefsDataFrames, and want to have a different check on regular DataFrames (maybe checking the index frequency instead)?

Good point! I've updated the code to update the event_resolution of the output BeliefDataFrame to the event_start frequency in case this doesn't match the output sensor's.

Comment on lines +88 to +95
# use event_starts frequency when final_output event resolution
# does not correspond with the event resolution of the output sensor
event_frequency = final_output.event_starts.inferred_freq

if event_frequency:
event_frequency = pd.to_timedelta(
pd.tseries.frequencies.to_offset(event_frequency)
).to_pytimedelta()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be problematic. The suggested refactoring of the reporter class may shine some light, but there are some other things we also need to consider:

  • The distinction between frequency and resolution, at least in the TimelyBeliefsReporter.
  • FlexMeasures stores resolutions in the database as intervals, which allows distinguishing between a calendar day (interval '1 day') and a 24-hour day (interval '24 hours'), but uses mostly Python datetime.timedelta objects in the code, which doesn't allow expressing a calendar day (timedelta(days=1) is defined to be exactly timedelta(hours=24)). So Pandas offsets won't always end up correctly in our database. And even within Pandas, some assumptions regarding these conversions can be challenged. For example, in the case of "24H" = timedelta(hours=24).

We could use some test cases to see what works and what doesn't. And maybe we should prioritize the refactoring.

@victorgarcia98 victorgarcia98 marked this pull request as draft October 16, 2023 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants