Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pysteps 2.0 #216

Open
ladc opened this issue Jul 23, 2021 · 10 comments
Open

pysteps 2.0 #216

ladc opened this issue Jul 23, 2021 · 10 comments

Comments

@ladc
Copy link
Contributor

ladc commented Jul 23, 2021

During a meeting with @dnerini, @loforest and @wdewettin, the suggestion was made to start working towards a new version of pysteps.

  1. The initial motivation is the implementation of the blending with NWP (Implement STEPS blending scheme #212) which is likely to require some API-breaking changes.
  2. Secondly, a move towards a new data model based on xarray would simplify the existing code as well as the future implementation of the blending. This was suggested before, see Implement a data model #12 and this gist by @kmuehlbauer. This move might also simplify Implement importer for NWP (netcdf) data #215 and NetCDF exporter for nowcasts #5 since the self-describing structure of xarrays is close to the netCDF format.
  3. Implementation of @dnerini's Kalman Filter would be nice to have.
  4. The interface should be generic enough to allow for machine learning nowcasting and post-processing methods, which require multiple predictors. Maybe OO or duck typing can be useful here. The interface to perform the nowcasts could have a generic fit/predict/update structure.
  5. Logging can also be improved: Use logging instead of print #140
  6. Achieve 90% test coverage (Add tests to pysteps modules #32)
@dnerini dnerini added this to To do in pysteps-v2 Jul 24, 2021
@dnerini dnerini moved this from To do to In progress in pysteps-v2 Jul 24, 2021
@dnerini
Copy link
Member

dnerini commented Jul 25, 2021

At this point, it would be interesting to hear the opinion of the other core contributors @pulkkins @aperezhortal @cvelascof @RubenImhoff and in general the whole pysteps community.

Also, the list above is by no means exhaustive nor fixed. Version 2 will be the opportunity to introduce substantial changes based on our experience so far: we need your inputs! For this purpose, I created a new project where to list and track issues and PRs related to this effort.

In terms of roadmap, I'd suggest the following:

Phase 1: up to end of October 2021(*):

  • master contains pysteps-v1
  • version 1.5 is released, after which no more PRs into v1 except for bug fixes (only patches are released)
  • setup new development branch for pysteps-v2 (alpha), features branches are forked and merged into this v2 branch

Goal: pysteps-v2 is published as a pre-release (beta)

Phase 2: up to September 2022 (ERAD):

  • previous master with v1 is renamed to pysteps-v1
  • v1 will release patches only
  • development branch pysteps-v2 is renamed master and becomes default branch
  • submit abstract to ERAD2022

Goal: consolidate v2, which becomes default version.

Phase 3:

  • pysteps v2 is stable
  • pysteps v1 is discontinued (no more support, no more releases of any kind)
  • paper on v2
  • next focus: machine/deep learning methods

Goal: close project, publish results.

(*) This deadline is given by time constrains on the RMI side, which has the lead on this initial phase of the project (@ladc )

@RubenImhoff
Copy link
Contributor

Hi everyone,

Unfortunately I couldn't join the last meeting due to a week of vacation, but this looks really good! It is great to see how our initial idea to pick up blending again is a reason to move forward to a v2 of pysteps!

The present ideas are what I was thinking of as well, so not much to add at this point. I think that during my stay at RMI in August, we will come up with some (additional) ideas. The new importer that allows for importing radar and NWP data, hopefully allowing for some sort of up-/downscaling in space and time to match the resolution of the radar and NWP data, will be one of the challenges here. I think if we can make that as generic as possible, we already have a super useful tool (let alone for the blending and other functionalities that we will add). So, a great move forward!

In a week from now, I can devote most of my time to the project, so we will keep in touch!

@dnerini dnerini pinned this issue Jul 25, 2021
@aperezhortal
Copy link
Member

Hi everyone,

Version 2 sounds great 💪

This new version maybe is a good opportunity to change the pystepsrc parameter file format. Although the JSON configuration file does the job, it has a few disadvantages. A better, and more readable, alternative is the YAML format that I personally like a lot.

Also, this is a good opportunity to revisit the interface as @ladc mentioned in point 4. For example, using an interface like Scikit-learn allows a better integration of PySTEPS with other ML libraries (pytorch, keras, scikit-learn, etc.).

This was referenced Jul 26, 2021
@pulkkins
Copy link
Member

pulkkins commented Jul 27, 2021

Hi everyone,

I was also on a vacation, so I could not join the last meeting.

The points listed by @dnerini are very good and clearly show the limitations of the current version. It should be noted that its main goal was to implement a Python version of STEPS. As a result, the codebase is not generic enough to contain all the planned features (or even the currently implemented ones). The design was also a bit rushed since we needed a minimal working version for ERAD 2018. For implementing more advanced features like blending (item 1), a new data model (item 2) or a more generic interface compatible with machine learning methods (item 4), a major redesign is the only way to go.

@fox91
Copy link

fox91 commented Jul 28, 2021

Hi everyone,
I don't know if the topic has already come up but I would suggest splitting the library into multiple components (eg: core, visualization, etc.) so that you can only install what you need (eg: downscaling) and avoid what you don't need (eg: eg: visualizattion) and all related dependencies (eg: matplotlib).

I say this for all those like me who find themselves installing libraries in AWS Lambda (and similar) where space is precious and therefore it is very convenient to be able to install only what you need without having to manually delete files and folders after installation.

@aperezhortal
Copy link
Member

aperezhortal commented Jul 28, 2021

Hi @fox91, thanks for the suggestion. I like the idea.

I did a quick search and it seems that maybe we can do that using something like Namespace packages.

@fox91
Copy link

fox91 commented Jul 28, 2021

Hi @fox91, thanks for the suggestion. I like the idea.

I did a quick search and it seems that maybe we can do that using something like Namespace packages.

I don't know how is implemented but I like the way I can install submodules in boto3-stubs.

@dnerini
Copy link
Member

dnerini commented Jul 28, 2021

Hi @fox91

In a way, we already partially support the same idea by using optional dependencies.
What is still missing is the option to install a particular set of dependencies depending on the needs.

For example, pip install pysteps should only install the core dependencies (and I agree that matplotlib should not be part of it), while pip install pysteps[visualization] could come with matplotlib and pip install pysteps[develop] would come with black, pytest and other.

This should be easily achievable with the extra_requires option in the setup.cfg file.

Or do you have something different in mind?

(ps: I suggest we open a dedicated issue and continue the discussion there?)

@cvelascof
Copy link
Member

cvelascof commented Aug 11, 2021

I love the idea of using xarray for pySTEPSv2.
However wondering if we can agree the process we will follow with the conversion of original pySTEPS rutines into xarray.
The possible options that I can see are:

  • Modify the contents of each routine to work with both xarray and legacy and keep the name of the routine.
  • To create new versions of each routine solely using xarray and use different names for the new routines.
  • To create wraps (pipe) of the original routines using xarray as it was proposed in this gist by @kmuehlbauer.
  • A combination of all above.

What are your thoughts?

@dnerini
Copy link
Member

dnerini commented Aug 11, 2021

All very relevant points, thanks @cvelascof for raising them. This is my personal opinion:

I see this as an iterative process, where version 2.0.0 should include the changes necessary to provide support to xarray across the whole library. This means that all methods need to be able to use and return xarray objects instead of numpy arrays as it is currently done. meaning:

  • no more ndarray, metadata dichotomy
  • all methods that apply to precipitation fields ingest and output a xr.DataArray object that includes x and y coordinates as well as timestamps when needed
  • deprecate utility methods that are already part of the xarray library (for example the utils.dimension module)

How this changes will be implemented in practice will depend a lot on the individual methods. My impressions is that most methods would benefit from a more in-depth refactoring leveraging the advanced functionalities of xarray, but I'm also conscious that this would represent a considerable effort. As a compromise, we could adapt the most complex methods (as for example the existing nowcasting methods) so that the input xarray object is simply converted into a numpy array at the beginning of the function, and back to an xarray at the very end (I guess this would correspond more or less to the wrapping option that you mention above).

Because the switch to v2, we can introduce breaking changes, including renaming methods and modules. So I wouldn't worry too much in terms of back-compatibility, although it makes sense to try and keep the general structure as close as possible to v1.

There will be plenty of details to sort out, but I think we can discuss and solve them within the relevant PRs as we proceed with the refactoring (see #223 for a practical example).

Edit: just to make it clearer, I don't think we want to create new versions of each routine (your option 2), that would just be very confusing. Personally, I see pysteps V2 working with xarray objects only (at least in terms of interfaces). This means discarding your options 1 and 2, and going for option 3 for methods that are too expensive to refactor, while applying a more in-depth refactoring where possible.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
pysteps-v2
In progress
Development

No branches or pull requests

8 participants