Infrastructure for nightly releases #3251

Open
agriyakhetarpal opened this issue Aug 8, 2023 · 5 comments · May be fixed by #3945

@agriyakhetarpal
Member

agriyakhetarpal commented Aug 8, 2023

I found some resources on hosting solutions for our proposed nightly builds, plus options for how we can configure them to work with our existing release infrastructure.

  1. Anaconda provides artifact storage as a custom PyPI repository and offers 5 GB of storage on a free plan. The paid plans start from $9/month. Note: this method will require the creation of our own Anaconda organisation, since the scientific-python repository is not open to packages outside its ecosystem. Some examples of projects hosting nightly wheels this way are
    a. NumPy and SciPy store their weekly wheels in this repository: https://anaconda.org/scipy-wheels-nightly/repo
    b. Some packages under the scientific-python umbrella (scikit-learn, pandas, matplotlib, xarray, et cetera) publish at https://anaconda.org/scientific-python-nightly-wheels/repo
    c. Anaconda’s own repository
    d. Astropy used Azure earlier; they now use Anaconda.
    e. If we decide to use Anaconda, we can follow the Scientific Python SPEC-0004, which was specifically drafted for this.
  2. Cloudsmith.io provides an expensive artifact repository priced at $89 per user per month.
  3. Google Cloud Artifact Registry provides hosting and storage for Python packages. This is an enterprise solution, so it might be private-only; I am not sure if it can be used for public, open-source projects.
  4. A Homebrew formula for PyBaMM, written in Ruby. This is a niche, mostly undocumented solution with few resources to consult, and it requires submission to and acceptance by homebrew-core (here is a tutorial I found: https://til.simonwillison.net/homebrew/packaging-python-cli-for-homebrew). cookiecutter is an example of a Python package that has been packaged for Homebrew, since it can be installed with brew install cookiecutter. This solution would be available on Linux and macOS only, since Homebrew does not support Windows.
  5. Nexus Repository from Sonatype has support for many packaging formats and supports proxying and hosting for PyPI packages. This might also be a private-only solution.
  6. Artifactory also provides a super expensive artifact registry plan priced at $150 per user per month.
  7. PEP 503 describes how we can make our own PyPI-style repository for indexing. Anaconda is compliant with this specification. Some self-hosted solutions that are compatible with it include
    a. devpi
    b. pypiserver, and a tutorial by Linode to accompany it
    c. Artipie, an open-source PyPI package repository
    d. Pulp is another self-hosted package index that can be used as described in this blog by Red Hat
  8. I looked at GitHub Packages (https://github.com/features/packages) and found it to be pretty cool, but it doesn’t support Python packages yet and probably will not. There was a discussion about this on their roadmap, but they closed the feature request, sadly: Packages: Python (PyPi) support github/roadmap#94. This is possibly because Microsoft owns GitHub and so has a say in what the folks at GitHub end up doing; I assume they would like to keep Azure as the standard for a Python package registry rather than offer a competing one.
  9. However, related to point 7, we could create a PEP 503-compliant pybamm-team/pybamm-nightly repository and push wheels to it daily. The size of a PyBaMM installation is around 160 MB (sourced from libraries.io). We could write a workflow to delete releases older than 30 days so that the size of the repository remains limited; scientific-python has a workflow that does this for their Anaconda index. Though this approach lacks the real-time vulnerability detection that commercial solutions provide, we have more control over our release infrastructure and can mitigate bad actors ourselves. It is also easy to install a package from GitHub, since we can use pip install git+ with the version from the release tag. There are many resources available; some that I found are
    a. https://www.freecodecamp.org/news/how-to-use-github-as-a-pypi-server-1c3b0d07db2/
    b. https://medium.com/network-letters/using-github-as-a-private-python-package-index-server-798a6e1cfdef
    c. An example of a PyPI index hosted as a GitHub repository: https://github.com/astariul/github-hosted-pypi which works with GitHub Releases. The cumulative size of GitHub Releases has no limit, but each file in a release has to be below 2 GB, which is well above our release size.
  10. The GitLab package registry supports PyPI packages. We could host a read-only mirror of PyBaMM on GitLab that gets updated with every commit to the develop and main branches. A better way would be to create a package registry there and write a GitLab CI pipeline that downloads the artifacts uploaded by a GitHub Actions pipeline, thereby starting a chain of CI/CD pipelines. Some resources (GitHub Actions in the marketplace, StackOverflow answers, and blogs) that might be useful in this case are listed below.
    a. https://github.com/marketplace/actions/trigger-gitlab-ci
    b. https://stackoverflow.com/questions/63308904/push-to-gitlab-with-access-token-using-github-actions
    c. https://github.com/marketplace/actions/trigger-gitlab-ci-through-webhooks
    d. https://github.com/marketplace/actions/trigger-gitlab-ci-job
    e. https://github.com/marketplace/actions/gitlab-pipeline-trigger
    f. https://dev.to/edersonbrilhante/gitlab-runners-as-a-service-with-github-action-149n
    g. https://www.anapaulagomes.me/2021/04/publishing-your-python-package-in-your-gitlab-package-registry/
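Points 7 and 9 both rest on the "simple" index layout from PEP 503: one HTML page per project, served under a normalized project name, with one anchor per downloadable file. Below is a minimal Python sketch of what generating such a page could look like. It is an illustration under assumptions, not a workflow from any of the linked resources: the function names are made up for this example, and the wheel filename and download URL are hypothetical placeholders. Optional extras from the spec, such as hash fragments on the URLs, are left out.

```python
import html
import re


def normalize(name: str) -> str:
    """Normalize a project name using the rule given in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render_project_page(project: str, files: dict[str, str]) -> str:
    """Render the page an index would serve at /simple/<normalized-name>/.

    `files` maps each distribution filename to the URL it can be fetched from.
    PEP 503 requires the anchor text to be the filename and the href to be
    the download URL.
    """
    links = "\n".join(
        f'    <a href="{html.escape(url, quote=True)}">{html.escape(filename)}</a><br>'
        for filename, url in sorted(files.items())
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "  <body>\n"
        f"    <h1>Links for {html.escape(normalize(project))}</h1>\n"
        f"{links}\n"
        "  </body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical wheel name and URL, used only to show the output shape.
    wheel = "pybamm-0.0.0.dev0-py3-none-any.whl"
    url = f"https://example.invalid/pybamm-nightly/{wheel}"
    print(render_project_page("PyBaMM", {wheel: url}))
```

A scheduled workflow could regenerate this page after each upload and publish it as static files, which is roughly the idea behind the GitHub-hosted index in 9c.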

We might need to ensure that the guide to downloading and using nightly releases is documented properly and warn unsuspecting users against using them. An edge case to take care of is that pip does not treat --extra-index-url as a fallback that is consulted only when a package is missing from the primary index: it considers candidates from every configured index together, which is a common modus operandi for dependency hijacking attacks. Source: https://discuss.python.org/t/advice-to-avoid-extra-index-url-to-install-private-packages-from-gitlab-ci/18242/11
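One way the install guide could sidestep --extra-index-url entirely is a two-step install: first the stable release and its dependencies from PyPI, then only the nightly wheel from the custom index. The Python sketch below just builds the two pip command lines so they can be inspected or passed to subprocess.run. The index URL is a placeholder (no host has been chosen in this thread) and the function name is made up for this example; the pip flags used (--upgrade, --pre, --no-deps, --index-url) are existing pip options.

```python
import sys

# Placeholder URL: an assumption for illustration, not a real index.
NIGHTLY_INDEX = "https://example.invalid/pybamm-nightly/simple"


def nightly_install_commands(
    package: str = "pybamm",
    index_url: str = NIGHTLY_INDEX,
    python: str = sys.executable,
) -> list[list[str]]:
    """Return two pip invocations that never mix indexes in one resolution."""
    pip = [python, "-m", "pip", "install"]
    # Step 1: stable release plus all dependencies, from the default index.
    from_pypi = pip + [package]
    # Step 2: only the package itself, only from the nightly index.
    # --pre allows development versions; --no-deps stops pip from looking up
    # dependencies on the nightly index.
    from_nightly = pip + [
        "--upgrade",
        "--pre",
        "--no-deps",
        "--index-url",
        index_url,
        package,
    ]
    return [from_pypi, from_nightly]


if __name__ == "__main__":
    for command in nightly_install_commands():
        print(" ".join(command))
```

The trade-off is that a nightly build needing a newer dependency than the stable release would not get it automatically, so the guide would have to mention that case.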

@agriyakhetarpal
Member Author

I think either Anaconda or GitHub Releases would be the best option overall. Both of them can be integrated with the existing release infrastructure very well.

@valentinsulzer
Member

We are below the 5 GB limit (see https://pypi.org/project/pybamm/#files), so we can try the Anaconda free plan.
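To sanity-check the quota against the 30-day cleanup idea from the first comment, here is a rough Python sketch. It assumes, purely for illustration, that one nightly upload is no larger than the 160 MB figure quoted above and that the 5 GB quota is counted in decimal units (1 GB = 1000 MB); the real upload size depends on how many wheels are built per night. Both function names and the release tags are hypothetical.

```python
from datetime import date, timedelta


def releases_to_delete(
    release_dates: dict[str, date], today: date, keep_days: int = 30
) -> list[str]:
    """Return the tags of releases that are more than `keep_days` days old."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(tag for tag, day in release_dates.items() if day < cutoff)


def fits_quota(upload_mb: float, keep_days: int = 30, quota_gb: float = 5.0) -> bool:
    """Check whether `keep_days` uploads of `upload_mb` each fit in the quota.

    Assumes 1 GB = 1000 MB, which may not match how the host counts storage.
    """
    return upload_mb * keep_days <= quota_gb * 1000


if __name__ == "__main__":
    # Hypothetical release tags, used only to exercise the functions.
    releases = {
        "nightly-2023-08-08": date(2023, 8, 8),
        "nightly-2023-09-07": date(2023, 9, 7),
    }
    print(releases_to_delete(releases, today=date(2023, 9, 8)))
    print(fits_quota(160))
```

Under these assumptions, 30 uploads of 160 MB come to 4800 MB, which fits in 5 GB with little headroom, so a shorter retention window might be the safer default.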

@lskillen

lskillen commented Aug 9, 2023

Hello! I work at Cloudsmith. :-) A small correction: it's $89 flat pcm, not per user, at Cloudsmith.

Happy to help with questions!

@agriyakhetarpal
Member Author

Hi there @lskillen, does Cloudsmith's artifact management solution offer a free plan for technical open-source scientific projects like PyBaMM? I ask because I found this resource in the Cloudsmith guides (https://help.cloudsmith.io/docs/open-source-hosting-policy). It would be great if we qualify as an eligible project!

@lskillen

> Hi there @lskillen, does Cloudsmith's artifact management solution offer a free plan for technical open-source scientific projects like PyBaMM? I ask because I found this resource in the Cloudsmith guides (https://help.cloudsmith.io/docs/open-source-hosting-policy). It would be great if we qualify as an eligible project!

If it's open-source, I don't see why not? 😁 Generally we don't require pre-approval for OSS projects: just sign up, create an OSS repository, add a license, and away you go. All we require is an attribution link to say we're providing it for you. Approval is only needed if you start to use significantly more bandwidth (as mentioned in the doc).
