Pandas reporter `final_df_output` event resolution #706

victorgarcia98 · 2023-05-30T21:20:41Z

Add the output_event_resolution field to PandasReporterConfigSchema, used to set the event_resolution of the final_df_output. This is useful when we want to modify the event_resolution using methods that are not part of timely-beliefs (e.g. resample).

In addition, this PR solves the issue of having NaN DataSources in the returned timely-beliefs.


Flix6x · 2023-05-30T22:08:24Z

Why can't we get the resolution of the final output from self.sensor.event_resolution instead?

victorgarcia98 · 2023-05-31T07:44:54Z

> Why can't we get the resolution of the final output from self.sensor.event_resolution instead?

The idea is that, by default, the Reporter class checks that the output event_resolution equals self.sensor.event_resolution. If we set the output event_resolution to self.sensor.event_resolution, we always bypass this check.

Flix6x · 2023-05-31T12:25:29Z

So now you can pass the output resolution explicitly, but it must match the output sensor resolution. ...

Can we think of some alternatives? Is this resolution only passed to make that check succeed? If so, maybe we want that check only on proper BeliefsDataFrames, and want to have a different check on regular DataFrames (maybe checking the index frequency instead)?

Flix6x · 2023-05-31T12:33:01Z

flexmeasures/data/models/reporting/__init__.py

@@ -75,7 +75,7 @@ def fetch_data(
            )

            # store data source as local variable
-            for source in bdf.sources.unique():
+            for source in bdf.sources.unique().dropna():


bdf.lineage.sources might also work.

https://github.com/SeitaBV/timely-beliefs/blob/main/timely_beliefs/beliefs/__init__.py#L109

I wouldn't have expected any NaN sources, actually. How come there are any?

Using bdf.lineage.sources also yields a NaN source.

The following code snippet should be useful to reproduce this behavior using the data from 2023.

from datetime import datetime, timedelta

import pytz

from flexmeasures.data.models.time_series import Sensor

sensor = Sensor.query.get(6)
tz = pytz.timezone("Europe/Amsterdam")
event_starts_after = tz.localize(datetime(2023, 1, 2))
event_ends_before = tz.localize(datetime(2023, 1, 23))
resolution = timedelta(seconds=3600)

bdf = sensor.search_beliefs(
    event_starts_after=event_starts_after,
    event_ends_before=event_ends_before,
    resolution=resolution,
)

>>> bdf.sources.isna().sum()
37
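A minimal sketch of what the added `.dropna()` in the diff does, using a plain pandas Index as a stand-in for `bdf.sources`. The source labels are made up for illustration; in FlexMeasures the entries would be DataSource objects rather than strings.

```python
import numpy as np
import pandas as pd

# Stand-in for bdf.sources: an index of source labels with missing entries.
# The labels are hypothetical placeholders.
sources = pd.Index(["source A", np.nan, "source B", "source A", np.nan])

# Without dropna(), the missing value survives unique() and would be
# visited by a `for source in ...` loop.
with_nan = sources.unique()

# With dropna(), only actual sources remain.
without_nan = sources.unique().dropna()

print(len(with_nan), list(without_nan))
```

This only shows the effect of the fix; it does not explain why the query returns missing sources in the first place.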


victorgarcia98 · 2023-05-31T21:40:57Z

> Can we think of some alternatives? Is this resolution only passed to make that check succeed? If so, maybe we want that check only on proper BeliefsDataFrames, and want to have a different check on regular DataFrames (maybe checking the index frequency instead)?

Good point! I've updated the code to set the event_resolution of the output BeliefsDataFrame to the frequency of its event starts in case it doesn't match the output sensor's resolution.
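The fallback described here can be sketched with plain pandas. The helper name `infer_event_resolution` is hypothetical and not part of FlexMeasures or timely-beliefs; it only assumes that the event starts are available as a DatetimeIndex.

```python
from datetime import timedelta
from typing import Optional

import pandas as pd


def infer_event_resolution(event_starts: pd.DatetimeIndex) -> Optional[timedelta]:
    """Hypothetical helper: derive a resolution from regularly spaced event starts."""
    freq = event_starts.inferred_freq  # None if the spacing is irregular
    if freq is None:
        return None
    # Fixed-length offsets (hours, minutes, ...) convert cleanly to a timedelta.
    offset = pd.tseries.frequencies.to_offset(freq)
    return pd.to_timedelta(offset).to_pytimedelta()


regular = pd.date_range("2023-01-02", periods=4, freq="15min", tz="Europe/Amsterdam")
irregular = pd.DatetimeIndex(
    ["2023-01-02 00:00", "2023-01-02 00:15", "2023-01-02 01:00"]
)

print(infer_event_resolution(regular))
print(infer_event_resolution(irregular))
```

For the regular index this yields a 15-minute timedelta; for the irregular one pandas cannot infer a frequency, so the helper returns None and the caller would have to fall back to some other check.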

Flix6x · 2023-06-27T15:48:01Z

flexmeasures/data/models/reporting/pandas_reporter.py

+        # use event_starts frequency when final_output event resolution
+        # does not correspond with the event resolution of the output sensor
+        event_frequency = final_output.event_starts.inferred_freq
+
+        if event_frequency:
+            event_frequency = pd.to_timedelta(
+                pd.tseries.frequencies.to_offset(event_frequency)
+            ).to_pytimedelta()


I think this could be problematic. The suggested refactoring of the reporter class may shine some light, but there are some other things we also need to consider:

- The distinction between frequency and resolution, at least in the TimelyBeliefsReporter.

- FlexMeasures stores resolutions in the database as intervals, which allows distinguishing between a calendar day (interval '1 day') and a 24-hour day (interval '24 hours'), but uses mostly Python datetime.timedelta objects in the code, which doesn't allow expressing a calendar day (timedelta(days=1) is defined to be exactly timedelta(hours=24)). So Pandas offsets won't always end up correctly in our database. And even within Pandas, some assumptions regarding these conversions can be challenged. For example, in the case of "24H" = timedelta(hours=24).

We could use some test cases to see what works and what doesn't. And maybe we should prioritize the refactoring.
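As a starting point for such test cases, here is a small standard-library sketch of the calendar-day versus 24-hour distinction. It assumes the Europe/Amsterdam time zone used elsewhere in this thread and the daylight saving transition of 2023-03-26; it does not touch FlexMeasures or the database.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# A timedelta cannot tell a calendar day from 24 hours.
same_duration = timedelta(days=1) == timedelta(hours=24)

tz = ZoneInfo("Europe/Amsterdam")

# Clocks in this zone moved forward one hour on 2023-03-26,
# so this calendar day is shorter than 24 hours.
midnight = datetime(2023, 3, 26, tzinfo=tz)
next_midnight = datetime(2023, 3, 27, tzinfo=tz)

# Convert to UTC before subtracting; subtracting two datetimes that share
# the same tzinfo object ignores the change in UTC offset.
elapsed = next_midnight.astimezone(timezone.utc) - midnight.astimezone(timezone.utc)

print(same_duration, elapsed)
```

The calendar day from local midnight to local midnight lasts 23 hours here, which a resolution stored as timedelta(days=1) cannot represent.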

victorgarcia98 added 2 commits May 30, 2023 23:13

feat: set the final_output_df event_resolution to output_event_resolu…

aca30a4

…tion, if provided Signed-off-by: Victor Garcia Reolid <victor@seita.nl>

fix: avoid fetching a NaN DataSource

9d5c9e7

Signed-off-by: Victor Garcia Reolid <victor@seita.nl>

victorgarcia98 added Scheduling Reporting labels May 30, 2023

victorgarcia98 self-assigned this May 30, 2023

victorgarcia98 removed the Scheduling label May 30, 2023

victorgarcia98 marked this pull request as ready for review May 30, 2023 21:21

victorgarcia98 requested a review from Flix6x May 30, 2023 21:21

victorgarcia98 changed the title ~~Pandas reporter schema output event resolution~~ Pandas reporter final_df_output event resolution May 30, 2023

Flix6x reviewed May 31, 2023


fix: if output df event resolution does not match the one of th outpu…

75fbdc3

…t sensor, try with the frequency of the events_start. Signed-off-by: Victor Garcia Reolid <victor@seita.nl>

victorgarcia98 requested a review from Flix6x June 1, 2023 06:34

Flix6x reviewed Jun 27, 2023


victorgarcia98 marked this pull request as draft October 16, 2023 08:54
