Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(fo): add historical production data for FO parser #6759

Conversation

amv213
Copy link
Contributor

@amv213 amv213 commented May 16, 2024

Description

If it can be helpful, this PR enhances the FO (Faroe Islands) parser to be able to fetch production data for past dates.

Currently, the parser can only fetch live "instantaneous" production data, but new APIs are available on https://www.sev.fo/framleidsla/hagtoel that make available historical data too 🙌🏽

In particular, this includes an API that offers production data with hourly granularity over windows of 48h!

image

https://www.sev.fo/api/elproduction/search?type=item&group=powerplant&from=20230215170000&to=20230216170000

The data returned is itemised by power plant name, production mode, hour, and area ('main' islands vs. 'south' islands) making it really nice to work with:

e.g.

{
    "powerplant": "Eiðisverkið",
    "motivepower": "Water",
    "area": "Main",
    "date": "2023-02-15 00:00:00",
    "mwh": 15.6
}

There is also an API that returns directly the aggregated production by production mode for each hour of the day:

image

but unfortunately the data cannot be split up into the subareas so might as well just use the unaggregated API to have access to all of the raw information directly and to the aggregations ourselves.

Now the FO parser will return hourly data for the target date of interest + the previous day!

Note that the historical API seems to make available hourly data with a delay of ~2:25h, so if requesting data for a target_date that is "live", I have set up the parser so that the historical data for the current day is supplemented with an instantaneous datapoint from the "live" API (the one that was being used exclusively so far). This will still leave a gap of missing data for 1 or 2 hours, but it should already be much better than just having access to live data.

I have also noticed on the app that for some reason it seems like Faroe Islands have not been generating any solar power (e.g. looking at the last year) but they definitely seem to be having solar production in the south of the island, so I wonder if there is a problem in how the "live" API data has been parsed so far 👀 This new API we definitely returns solar data so it might help on that front too!

Double check

  • I have tested my parser changes locally with poetry run test_parser "zone_key"
  • I have run pnpx prettier@2 --write . and poetry run format in the top level directory to format my changes.

@amv213 amv213 requested a review from VIKTORVAV99 as a code owner May 16, 2024 11:58
@github-actions github-actions bot added parser python Pull requests that update Python code tests zone config Pull request or issue for zone configurations labels May 16, 2024
parsers/FO.py Show resolved Hide resolved
@VIKTORVAV99
Copy link
Member

Very nice!

I'll review this later today (and try and get through some of the other parser upgrades) 😅

parsers/FO.py Outdated

# historical API has a ~2:25h delay, so supplement with realtime API data if applicable
# note that this will still leave a gap, but at least it will provide a realtime signal
if target_datetime > production_breakdown_list.events[-1].datetime:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might not be obvious in the diff, but the contents of this block are essentially the old parser code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd try and split these into two functions, that both return a ProductionBreakdownList, one for the historical data and one for the realtime data.

Then you can return it with return ProductionBreakdownList.update(realtime, historical, logger).to_list() which will ensure the historical data is always the one that is used even if there for some reason was a delay in the realtime data or it stopped updating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points! I've updated it but I feel like it's still not quite what you were thinking of so it might need another pass 😅 (I've still kept the conditional because I wanted to make sure we are still making a call to the realtime API only if it's needed e.g. no point if requesting data from the previous week)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take another look as soon as possible.

@VIKTORVAV99
Copy link
Member

VIKTORVAV99 commented May 16, 2024

Some early feedback before the full review:
I looked at the returned time stamps for the live data and they are not representative for the full hour, i.e they can end in 13:03 for example.

Generally we want to avoid this as it makes overwriting the data more or less impossible but it was not really an option before. Now that it is an option I think we should just remove the old "live" parser code and just use the historical parser. Once we have historical data it will allow us to enable estimation for the zone which should cover any potential gap that might occur.

EDIT: Flooring the realtime data to the hour is also an option as it would later be corrected. Might be preferable depending on how much historical data is available.

@amv213
Copy link
Contributor Author

amv213 commented May 16, 2024

I see! Thanks for the background on how things work 🙌🏽

From my little experimentation with this new API I think that in general the data for the hour is published at hour + 2:25h, so I guess that technically it actually does not satisfy the parsers' data 'freshness' requirements (e.g. in the worst case scenario at e.g. 8 UTC the latest available historical data is 5 UTC) but I thought it could be useful to at least have this shared somewhere in case it can be useful / the max 3h delay is still good enough.

If I understood correctly if a parser doesn't return any data within the most recent two hours the zone is not coloured in the app? Flooring could be a nice trick to have it coloured while also having historical data! (and then with the example above there would just be the 6 UTC and 7 UTC datapoints missing, with the 8 UTC being the instantaneous one)

@VIKTORVAV99
Copy link
Member

I see! Thanks for the background on how things work 🙌🏽

From my little experimentation with this new API I think that in general the data for the hour is published at hour + 2:25h, so I guess that technically it actually does not satisfy the parsers' data 'freshness' requirements (e.g. in the worst case scenario at e.g. 8 UTC the latest available historical data is 5 UTC) but I thought it could be useful to at least have this shared somewhere in case it can be useful / the max 3h delay is still good enough.

If I understood correctly if a parser doesn't return any data within the most recent two hours the zone is not coloured in the app? Flooring could be a nice trick to have it coloured while also having historical data! (and then with the example above there would just be the 6 UTC and 7 UTC datapoints missing, with the 8 UTC being the instantaneous one)

There should be no data missing as the live data would populate the database continuously and then the historical data would override them when they match.

The freshness requirement is less of a requirement and more of a guideline now that we support historical data in the app. But it seems we can get both here.

Copy link
Member

@VIKTORVAV99 VIKTORVAV99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good! And is working as expected!

The earliest data seem to be from datetime.datetime(2021, 12, 31, 23, 0, tzinfo=datetime.timezone.utc) so let's keep the realtime parser as I don't think that is enough historical data to enable TSA estimations.

I have two small comments though to help with the readability and performance of the parser.

parsers/FO.py Outdated

# historical API has a ~2:25h delay, so supplement with realtime API data if applicable
# note that this will still leave a gap, but at least it will provide a realtime signal
if target_datetime > production_breakdown_list.events[-1].datetime:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd try and split these into two functions, that both return a ProductionBreakdownList, one for the historical data and one for the realtime data.

Then you can return it with return ProductionBreakdownList.update(realtime, historical, logger).to_list() which will ensure the historical data is always the one that is used even if there for some reason was a delay in the realtime data or it stopped updating.

parsers/FO.py Outdated Show resolved Hide resolved
@amv213 amv213 requested a review from VIKTORVAV99 May 17, 2024 07:57
Copy link
Member

@VIKTORVAV99 VIKTORVAV99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🎉

Thanks for doing this and sorry about the review delay, had some final exams in Uni this past week I needed to focus on, I hope you understand. 🙂

@VIKTORVAV99 VIKTORVAV99 enabled auto-merge (squash) June 5, 2024 07:51
@VIKTORVAV99 VIKTORVAV99 merged commit ce81282 into electricitymaps:master Jun 5, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser python Pull requests that update Python code tests zone config Pull request or issue for zone configurations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants