Repository structure (file layouts, cookiecutter templating engines, distribution options) #1

Open
agriyakhetarpal opened this issue Jul 28, 2023 · 13 comments
Labels: documentation (Improvements or additions to documentation), help wanted (Extra attention is needed), infrastructure (Issues that relate to the infrastructure of the repository or the template provided through it)

Comments

@agriyakhetarpal (Member) commented Jul 28, 2023

Starting this as a placeholder issue for tracking tasks, both pending and completed. I will divide these into separate issues and PRs.

Cookiecutters

  1. https://github.com/cookieninja-generator/cookieninja
  2. https://github.com/cookiecutter/cookiecutter

as suggested by @Saransh-cpp

Examples

  1. https://github.com/osl-incubator/scicookie
  2. https://github.com/scientific-python/cookie
  3. https://github.com/copier-org/copier
  4. https://github.com/cruft/cruft/
  5. Cookiecutter Data Science (drivendata.github.io)

Suggestions from @brosaplanella:

  1. "A bit different, but Julia has DrWatson.jl which has many cool features, maybe we can get some ideas".
  2. A figures folder (with a .gitkeep)
  3. A data folder (later we could have some pipelines to process it, like the Data Science example above).

Possible layout

The folder structure can look like this:

├── .github/workflows
├── src
├── data
├── docs
│   ├── examples          # notebooks that can be rendered with nbsphinx
│   ├── _static
│   ├── sphinxext
│   ├── source
│   └── conf.py
├── tests                 # (optional)
├── parameters
│   └── my_parameters.py  # contains the get_parameter_values() function
├── examples/             # alternatively, example scripts or notebooks that are not to be rendered with the Sphinx builder
├── pyproject.toml
├── README.md
├── .pre-commit-config.yaml
├── .readthedocs.yaml
├── .env                  # contains DATA_PATH
└── noxfile.py            # (or tox.ini, if users want to use tox)

The required documentation should

  • Explain how to clone this template to start a new project
  • Explain how to rename the project in pyproject.toml and docs/conf.py
  • Explain how to structure a typical project, with source files and utility classes and methods (in src/) and unit tests (in tests/)
  • Explain how to register parameter entry points in pyproject.toml, with

[project.entry-points.pybamm_parameter_sets]
MyParameters = "package_name.parameters.my_parameters:get_parameter_values"

which can then be accessed as pybamm.ParameterValues("MyParameters") in the source code (a minimal sketch of my_parameters.py follows below)
  • Point to the PyBaMM documentation, where the developments and advancements in this repository shall be reflected on a separate page or section in the user guide
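As a minimal, hypothetical sketch, parameters/my_parameters.py could contain something like the following; the parameter names and values are purely illustrative, and, as noted later in the thread, parameter sets are plain Python dictionaries:

def get_parameter_values():
    """Return the parameter set registered under the pybamm_parameter_sets
    entry point as a plain dictionary (illustrative values only)."""
    return {
        "chemistry": "lithium_ion",
        "Nominal cell capacity [A.h]": 5.0,
        "Ambient temperature [K]": 298.15,
    }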

Tracked in #6.

Configuration options

Build backends

  1. hatch
  2. flit
  3. poetry
  4. setuptools (later)

Documentation

  1. Theme
  2. Sphinx extensions

Project structure

  1. Project metadata in pyproject.toml
  2. and so on

Available licenses (#2)

  1. MIT
  2. Apache-2.0
  3. BSD-3-Clause
  4. and so on (should be permissive licenses suitable for collaborative research practices and open science)

Addendum 27/02/2024: we would also want entry points for models in the PyBaMM model structure, rather than just parameter sets; please see pybamm-team/PyBaMM#3839 (comment)

@agriyakhetarpal pinned this issue Jul 28, 2023
@agriyakhetarpal added the documentation, help wanted, and infrastructure labels Jul 28, 2023
@agriyakhetarpal changed the title from "Repository template (file layouts, cookiecutter templating engines, distribution options)" to "Repository structure (file layouts, cookiecutter templating engines, distribution options)" Jul 28, 2023
@agriyakhetarpal (Member Author)

I am not sure how this will go yet, but I learned that cruft retains full compatibility with cookiecutter-based templates, whereas copier has some differences (it uses a YAML file instead of JSON for the project specification).

However, scientific-python/cookie supports all three of them, so I think our use case, a stripped-down, barebones version of it, could also support all three; unless that is unnecessary and supporting just cookiecutter and cruft will be enough.

@Saransh-cpp (Member)

Thanks for summarising this! Don't worry about supporting a lot of things at the start. We can start with a simple structure, support for one build backend (hatch), and cookiecutter support only.

@valentinsulzer (Member)

Rather than having a data folder, we should encourage separating code and data, with the data path specified via the .env file. People are welcome to keep their code and data in the same place, but the data should ideally not be pushed to GitHub along with the code, except for some examples.

If the data path is set as DATA_PATH="path/to/data" in .env, then the following code will load it:

import os
from dotenv import load_dotenv

load_dotenv()  # read variables from .env into os.environ
path_to_data = os.environ["DATA_PATH"]

We could add that as one of the default utility functions in the src folder, e.g. in util.py:

import os
from dotenv import load_dotenv

load_dotenv()


def environ():
    """Return the environment, including the variables loaded from .env."""
    return os.environ
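A project generated from the template could then use it like this (the package_name.util module path is an assumption based on the layout above):

from package_name.util import environ  # hypothetical module path under src/

data_path = environ()["DATA_PATH"]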

@agriyakhetarpal (Member Author)

What sorts of data would DATA_PATH contain ideally? We can further streamline the process of using it with some extra utility functions based on that too

@agriyakhetarpal (Member Author)

Also, I renamed the project from pybamm-cookie-cutter to pybamm-cookiecutter because I saw that most templates with the cookiecutter topic on GitHub were named as such, i.e., without the hyphen between "cookie" and "cutter".

@valentinsulzer (Member)

What sorts of data would DATA_PATH contain ideally? We can further streamline the process of using it with some extra utility functions based on that too

Probably csv or parquet

@agriyakhetarpal (Member Author)

I think CSV and Parquet files would be nice; we would have to add pandas as a dependency in that case (or just get it from the optional dependencies after pybamm-team/PyBaMM#3144 is merged).

A utility function for them could be something like

from pybamm_cookiecutter.util import DataLoader
import pybamm

battery_data = DataLoader.load_data("file1.csv")

In other words, it would be a wrapper over a combination of load_dotenv and pandas.read_csv(), with some customisation here and there; a minimal sketch of such a wrapper is below.
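A rough sketch of what such a DataLoader could look like, assuming DATA_PATH is set in .env and pandas plus python-dotenv are available; the class and method names follow the snippet above and are purely illustrative:

import os
from pathlib import Path

import pandas as pd
from dotenv import load_dotenv


class DataLoader:
    """Hypothetical helper that resolves file names against DATA_PATH from .env."""

    @staticmethod
    def load_data(filename):
        load_dotenv()  # make DATA_PATH from .env available via os.environ
        data_dir = Path(os.environ["DATA_PATH"])
        filepath = data_dir / filename
        # Dispatch on the file extension; CSV and Parquet were suggested above
        if filepath.suffix == ".parquet":
            return pd.read_parquet(filepath)
        return pd.read_csv(filepath)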

@agriyakhetarpal (Member Author) commented Aug 1, 2023

See also: https://learn.scientific-python.org/development/patterns/data-files/. We could adopt pooch within PyBaMM too, especially for the SuiteSparse and SUNDIALS downloadables in scripts/install_KLU_Sundials.py
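For context, a minimal sketch of the download-and-cache pattern pooch provides; the URL below is a placeholder, not the real SUNDIALS artefact:

import pooch

# Fetch the archive once and cache it locally; subsequent calls reuse the
# cached copy. Passing a real SHA-256 to known_hash pins and verifies the file.
archive_path = pooch.retrieve(
    url="https://example.org/downloads/sundials-6.0.0.tar.gz",  # placeholder URL
    known_hash=None,  # replace with "sha256:..." to verify the download
)
print(archive_path)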

@valentinsulzer (Member)

Adding something else to this roadmap: it would be nice if we could add new models via entry points as well. This requires a few changes in PyBaMM, though.

@agriyakhetarpal (Member Author)

Adding a model via an entry point sounds like a nice idea, but it could also become excessive if it isn't done correctly. Do you have a proof of concept? I'm not entirely sure how it would work.

@valentinsulzer (Member)

The author would create a model class similar to PyBaMM's BasicDFN that is entirely self-contained; anyone else would then be able to load the model with something like pybamm.lithium_ion.Model("author/model-name").

This would solve several existing pain points with adding new models:

  • PyBaMM's submodel structure is too complex, which is a huge barrier to entry
  • You have to add things to the PyBaMM repo, which leaves ownership and IP uncertain
  • The PyBaMM team has to "endorse" every model or gatekeep

With entry points, adding a new model stays separate from PyBaMM: authors retain ownership, and we don't have to endorse the models. A sketch of how such discovery could work is below.
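A minimal sketch of how PyBaMM could discover such models, assuming a hypothetical pybamm_models entry-point group analogous to pybamm_parameter_sets above; the group name and helper are illustrative, not an existing PyBaMM API, and entry_points(group=...) requires Python 3.10+:

from importlib.metadata import entry_points


def load_model(name):
    """Look up and instantiate a model class registered under the given
    entry-point name, e.g. "author/model-name"."""
    for entry_point in entry_points(group="pybamm_models"):  # hypothetical group
        if entry_point.name == name:
            model_class = entry_point.load()  # import the self-contained class
            return model_class()
    raise KeyError(f"No model registered under the name {name!r}")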

@agriyakhetarpal (Member Author)

Ah, sounds great. A bootstrapped model should be possible to implement and, IIUC, should work similarly to how we handle parameter sets. I would note, though, that parameter sets are returned as Python dictionaries, so they are easier to handle; here we might have to establish a class that can either parse the AST for a model (or rather just import a JSON-serialised model) in order to pass it to pybamm.lithium_ion.Model("author/model-name"). This might be better to do in the PyBaMM source code, as you mentioned.

This issue has been referenced on the GSoC 2024 ideas page for potential readers and contributors, so if and when we flesh out these ideas a bit more, I suggest we edit the top of the thread and add everything there as well.

@santacodes

Rather than having a data folder, we should encourage separating code and data, with the data path specified via the .env file. People are welcome to keep their code and data in the same place, but the data should ideally not be pushed to GitHub along with the code, except for some examples.

I guess that, now that the pooch PR is merged, we could also add the default pooch data path for storing data here, which is under .cache for POSIX and under %APPDATA% for Windows machines. That way, we could use pybamm.DataLoader to load data files inside PyBaMM-based projects.
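For illustration, pooch exposes this default per-user cache location via pooch.os_cache; a small sketch, assuming "pybamm" as the project name:

import pooch

# Default cache folder used by pooch for a given project, e.g. ~/.cache/pybamm
# on Linux and a folder under the user's AppData directory on Windows.
cache_dir = pooch.os_cache("pybamm")
print(cache_dir)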
