Split ArviZ modules into separate packages #2088

Open
OriolAbril opened this issue Aug 8, 2022 · 2 comments

@OriolAbril
Member

The 3 modules inside ArviZ are already quite independent. It would be good to split them into smaller packages and, in the process, clean up the dependency handling.

  • Create an arviz-data, inferencedata or arviz-converters package. It should contain only the InferenceData base library (or maybe we could use the opportunity to switch to DataTree, see "Track DataTree progress" #2015), the converters, maybe iterators like the ones in the sel utils, and little more. The main advantage of this library would be keeping it minimal, so that it is as easy as possible for other libraries to depend on it; i.e. both netcdf and zarr should be optional dependencies, not required ones. It would need to depend on xarray (and therefore also numpy and pandas), but not even scipy would be needed, much less matplotlib.
  • Create an arviz-stats, arviz-diagnostics or arviz-compute package with the stats module and general utilities (e.g. the labeller classes are used mostly in plots but also in summary, so they would go here). It would depend on the library above plus scipy and xarray-einstats, and probably nothing else.
  • Create an arviz-plots package. It would depend on the two above and contain the plots module; both matplotlib and bokeh would be optional.
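
To illustrate what "optional but not required" could look like in code, here is a minimal sketch of a guarded import. The helper name `optional_import` and the `arviz-plots[...]` extras syntax in the error message are assumptions made up for this example; nothing in the proposal fixes them.

```python
import importlib
import importlib.util


def optional_import(module_name, extra):
    """Import an optional dependency or fail with an actionable message.

    `module_name` must be a top-level module name (e.g. "matplotlib").
    `extra` is the hypothetical extras group that would pull it in.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is needed for this feature but is not installed. "
            f"Hypothetical install hint: pip install 'arviz-plots[{extra}]'"
        )
    return importlib.import_module(module_name)


def load_backend(name="matplotlib"):
    """Resolve a plotting backend lazily, only when a plot is requested."""
    if name not in ("matplotlib", "bokeh"):
        raise ValueError(f"Unknown backend: {name!r}")
    return optional_import(name, extra=name)
```

With this pattern, importing the package never touches matplotlib or bokeh; the cost (and the possible ImportError) is deferred to the first call that actually needs a backend.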

Things for consideration:

  • I think keeping an arviz library, even if it is only a metalibrary that installs and imports the ones above, would be much friendlier to the average user. I am not sure about the dependencies though: should that library continue to depend on netcdf and matplotlib by default, for example?

Useful references:

  • I used https://github.com/astrojuanlu/cookiecutter-pylib as the template from which to generate xarray-einstats. It made the whole process quite easy and fast. I basically only had to set up codecov, remove mypy and swap flake for pylint. It set up all the build infrastructure, local testing and CI with GitHub Actions, readthedocs, black, isort and pydocstyle. It could be useful for this.

Extra notes:

@michaelosthege
Contributor

For the data structures package, what do you think about adopting the protobuf-generated data structures? (e.g. meta.proto → meta.py)

This would formalize a PPL-agnostic specification of the metadata structures that will end up in storage structures such as InferenceData.
And you could generate the Julia code or even C++ code for Stan.

@OriolAbril
Member Author

More specific proposal to start working on this:

arviz-base

Library with base ArviZ functionality that is common to plots, stats and converters, i.e. the converters themselves, xarray_to_sel_iter and similar utilities, labellers, var_names filtering utilities and rcparams. I would try using DataTree instead of InferenceData to see if we can reduce the codebase.

Dependencies: numpy, xarray, datatree
Optional dependencies: h5netcdf and/or netcdf4, zarr, ujson

arviz-stats

Library with stats and diagnostics that are specific to Bayesian modeling/MCMC (more basic things are already in xarray-einstats, for example).

Dependencies: arviz-base, numpy, xarray, datatree, scipy, xarray-einstats, pandas
Optional dependencies: numba, dask

xrtist

General functionality for faceting and aesthetics mapping on xarray objects (note: it might end up being only a module in arviz-plots).

Dependencies: numpy, xarray, datatree
Optional dependencies: bokeh, matplotlib

arviz-plots

Library with "plug and play" plotting functions similar to the existing ones, as well as lower-level building blocks that ease the creation of custom plots when combined with xrtist.

Dependencies: arviz-stats
Optional dependencies: matplotlib, bokeh
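
As a quick sanity check on the layering, the required dependencies between the proposed packages can be written down as a graph and sorted. This is only a sketch: it records the inter-package edges stated above (third-party libraries such as numpy or xarray are left out), and the variable names are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Required dependencies among the proposed packages, as listed above.
# xrtist is "combined with" arviz-plots but is not listed as a dependency,
# so it has no edges here.
PROPOSED_DEPENDENCIES = {
    "arviz-base": set(),
    "arviz-stats": {"arviz-base"},
    "xrtist": set(),
    "arviz-plots": {"arviz-stats"},
}

# static_order() raises graphlib.CycleError if the graph has a cycle,
# so this doubles as a check that the split stays acyclic.
install_order = list(TopologicalSorter(PROPOSED_DEPENDENCIES).static_order())
print(install_order)
```

Any valid order places arviz-base before arviz-stats and arviz-stats before arviz-plots; the position of xrtist is unconstrained.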

arviz

It would continue to exist as more of a meta-library that imports all the previous ones and exposes them through a common namespace.
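
A rough sketch of the "common namespace" idea follows. In the real meta-library this would probably be a few star imports in `__init__.py`; the helper below (`build_namespace`, a name invented for this example) does the same thing explicitly, and uses the standard-library modules `math` and `statistics` as stand-ins because the split packages do not exist yet.

```python
import importlib
import types


def build_namespace(name, module_names):
    """Collect the public names of several modules into one namespace module.

    Later modules win if two of them export the same name.
    """
    namespace = types.ModuleType(name)
    for module_name in module_names:
        module = importlib.import_module(module_name)
        # Respect __all__ when the module defines it, else use non-underscore names.
        public = getattr(module, "__all__", None)
        if public is None:
            public = [attr for attr in vars(module) if not attr.startswith("_")]
        for attr in public:
            setattr(namespace, attr, getattr(module, attr))
    return namespace


# Stand-in modules; the meta-library would list the split packages instead.
demo = build_namespace("demo", ["math", "statistics"])
print(demo.sqrt(9.0), demo.mean([1, 2, 3]))
```

One design point this makes visible: name clashes between the sub-packages are resolved silently by import order, so the split packages would need to keep their public names disjoint.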

@OriolAbril OriolAbril self-assigned this Apr 14, 2023
@OriolAbril OriolAbril pinned this issue Apr 21, 2023