Support for incremental migration of conda environments #292

Open
2 tasks done
Wh1isper opened this issue Oct 20, 2023 · 0 comments
Labels
type::feature request for a new feature or capability


Checklist

  • I added a descriptive title
  • I searched open requests and couldn't find a duplicate

What is the idea?

  • Incremental Packaging Support in conda-pack
  • One new command that can merge two conda environments with the same Python version

Why is this needed?

Scenario: In a unified container environment, a user develops against a pre-built conda environment and, for a variety of reasons, may change the packages in it (add, remove, update, ...). We want to deploy the user's code together with the modified environment.

Status quo: Packaging an existing environment with conda-pack is easy. However, scientific computing tasks often pull in a large number of bulky libraries, so the conda env may be tens of gigabytes in size. Transferring such an environment to the deployment environment (which may have multiple nodes), or packaging it as a container image, significantly raises storage usage.

Benefit: With incremental packaging, we only need to transfer the packages that changed, not the entire conda environment, and we can update the conda environment inside the container directly, similar to a git merge.

Alternatives: Download the packages over the network, but internet access may be restricted in some organizations. Or save the user's image (docker commit), but the image may contain a lot of unneeded data.

What should happen?

Input:

  • A list of excluded conda and pip packages, including their versions. (In the scenario above, these are the packages and versions of the container environment provided to the user.)

Output:

  • An archive containing the packages that are not in the exclude list, or whose version differs from the excluded version
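
The selection rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not conda-pack code: `select_incremental` and the name-to-version dictionaries are hypothetical, and it assumes both package lists have already been parsed from the environment and from the exclude files.

```python
from typing import Dict


def select_incremental(installed: Dict[str, str], excluded: Dict[str, str]) -> Dict[str, str]:
    """Return the packages that belong in the incremental archive.

    A package is kept when it is missing from `excluded`, or when
    `excluded` lists it with a different version.
    (Hypothetical helper, not part of conda-pack.)
    """
    return {
        name: version
        for name, version in installed.items()
        if excluded.get(name) != version
    }


# Example data (made up): the user's modified env vs. the base container env.
debug_env = {"python": "3.10.12", "numpy": "1.26.0", "pandas": "2.1.1"}
base_env = {"python": "3.10.12", "numpy": "1.25.2"}

incremental = select_incremental(debug_env, base_env)
print(incremental)
```

Here `numpy` is kept because its version changed and `pandas` because it is new, while `python` is dropped because it matches the base environment.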

Incremental migration: remove old, then install new

  • Remove from the new environment any packages that are also present in the archive
  • Extract the packages from the archive into the new environment

Exception:

  • When the two Python versions are not the same, raise an error
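
A rough sketch of how the migration step and the version check might fit together is below. Everything here is an assumption for illustration: `plan_migration` is a hypothetical helper, the Python versions are passed in separately, and "same Python version" is interpreted as matching major.minor, which the proposal does not specify.

```python
from typing import Dict, List, Tuple


def major_minor(version: str) -> Tuple[str, ...]:
    """Reduce a version such as '3.10.12' to ('3', '10')."""
    return tuple(version.split(".")[:2])


def plan_migration(
    target_pkgs: Dict[str, str],
    archive_pkgs: Dict[str, str],
    target_python: str,
    archive_python: str,
) -> Tuple[List[str], Dict[str, str]]:
    """Plan an incremental migration (hypothetical, not conda-pack API).

    Returns the package names to remove from the target environment and
    the packages to extract from the archive.
    """
    # Assumption: only major.minor has to match between the environments.
    if major_minor(target_python) != major_minor(archive_python):
        raise ValueError(
            f"Python version mismatch: target {target_python}, archive {archive_python}"
        )
    # Remove old: packages in the target that the archive will replace.
    to_remove = sorted(name for name in archive_pkgs if name in target_pkgs)
    # Install new: everything contained in the archive.
    to_install = dict(archive_pkgs)
    return to_remove, to_install


to_remove, to_install = plan_migration(
    {"numpy": "1.25.2", "scipy": "1.11.0"},
    {"numpy": "1.26.0", "pandas": "2.1.1"},
    target_python="3.10.12",
    archive_python="3.10.4",
)
print(to_remove, to_install)
```

Note that this plan only covers added and updated packages. Packages the user removed from the source environment are not handled, since the proposal does not describe that case.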

Example:

Packaging an incremental archive

conda-pack -n debug-env --exclude-conda environment.yaml --exclude-pip requirements.txt

Incremental installation

conda-pack migration -n deploy-env debug-env.tar.gz

Additional Context

I understand that this goal may differ somewhat from the original goal of conda-pack, but I've found that a lot of the code in conda-pack can be reused.

I'd like to know whether the conda-pack maintainers are interested in adding this feature; if so, I can open a PR for it.

@Wh1isper Wh1isper added the type::feature request for a new feature or capability label Oct 20, 2023