New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pandas reporter final_df_output
event resolution
#706
base: main
Are you sure you want to change the base?
Pandas reporter final_df_output
event resolution
#706
Conversation
…tion, if provided Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
final_df_output
event resolution
Why can't we get the resolution of the final output from |
The idea is, by default, check that the output |
So now you can pass the output resolution explicitly, but it must match the output sensor resolution. ... Can we think of some alternatives? Is this resolution only passed to make that check succeed? If so, maybe we want that check only on proper BeliefsDataFrames, and want to have a different check on regular DataFrames (maybe checking the index frequency instead)? |
@@ -75,7 +75,7 @@ def fetch_data( | |||
) | |||
|
|||
# store data source as local variable | |||
for source in bdf.sources.unique(): | |||
for source in bdf.sources.unique().dropna(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bdf.lineage.sources
might also work.
https://github.com/SeitaBV/timely-beliefs/blob/main/timely_beliefs/beliefs/__init__.py#L109
I wouldn't have expected any NaN sources, actually. How come there are any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using bdf.lineage.sources
also yields a NaN source.
The following code snippet should be useful to reproduce this behavior using the data from 2023.
from datetime import datetime, timedelta
import pytz
from flexmeasures.data.models.time_series import Sensor
sensor = Sensor.query.get(6)
timezone = pytz.timezone("Europe/Amsterdam")
event_starts_after = tz.localize(datetime(2023, 1, 2))
event_ends_before = tz.localize(datetime(2023, 1, 23))
resolution = timedelta(seconds=3600)
bdf = sensor.search_beliefs(
event_starts_after=event_starts_after,
event_ends_before=event_ends_before,
resolution=resolution,
)
>> bdf.sources.isna().sum()
37
…t sensor, try with the frequency of the events_start. Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
Good point! I've updated the code to update the |
# use event_starts frequency when final_output event resolution | ||
# does not correspond with the event resolution of the output sensor | ||
event_frequency = final_output.event_starts.inferred_freq | ||
|
||
if event_frequency: | ||
event_frequency = pd.to_timedelta( | ||
pd.tseries.frequencies.to_offset(event_frequency) | ||
).to_pytimedelta() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be problematic. The suggested refactoring of the reporter class may shine some light, but there are some other things we also need to consider:
- The distinction between frequency and resolution, at least in the
TimelyBeliefsReporter
. - FlexMeasures stores resolutions in the database as intervals, which allows distinguishing between a calendar day (
interval '1 day'
) and a 24-hour day (interval '24 hours'
), but uses mostly Pythondatetime.timedelta
objects in the code, which doesn't allow expressing a calendar day (timedelta(days=1)
is defined to be exactlytimedelta(hours=24)
). So Pandas offsets won't always end up correctly in our database. And even within Pandas, some assumptions regarding these conversions can be challenged. For example, in the case of "24H" = timedelta(hours=24).
We could use some test cases to see what works and what doesn't. And maybe we should prioritize the refactoring.
Add the field
output_event_resolution
field toPandasReporterConfigSchema
used to set theevent_resolution
of thefinal_df_output
. This is useful for cases in which we want to modify theevent_resolution
using methods that are not part of timely-beliefs (e.g: resample).In addition, this PR solves the issue of having NaN DataSources in the returned timely-beliefs.