
v1.7.0

@kevin-wangzefeng kevin-wangzefeng released this 08 Jan 16:18
1933d46

What's New

Enhanced Plugin for PyTorch Jobs

As one of the most popular AI frameworks, PyTorch is widely used in deep learning fields such as computer vision and natural language processing. More and more users run PyTorch in containers on Kubernetes to improve resource utilization and parallel processing efficiency.

Volcano 1.7 enhances the plugin for PyTorch Jobs, freeing you from manually configuring container ports and the MASTER_ADDR, MASTER_PORT, WORLD_SIZE, and RANK environment variables.
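With the plugin handling these variables, a training script only needs to read them. The Python sketch below shows that consumer side, assuming the four variable names listed above are present in the container and that the script uses PyTorch's `env://` initialization, which reads the same variables. The helper names `read_dist_config` and `init_distributed` are hypothetical, and the `gloo` backend is an arbitrary choice (GPU jobs typically use `nccl`).

```python
import os
from dataclasses import dataclass

# Variable names taken from the release note; the plugin is expected to inject them.
REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK")


@dataclass(frozen=True)
class DistConfig:
    master_addr: str
    master_port: int
    world_size: int
    rank: int


def read_dist_config(env=None):
    """Read and validate the rendezvous variables injected into the container."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    cfg = DistConfig(
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
        world_size=int(env["WORLD_SIZE"]),
        rank=int(env["RANK"]),
    )
    # Every process must have a rank in [0, WORLD_SIZE).
    if not 0 <= cfg.rank < cfg.world_size:
        raise ValueError(f"RANK {cfg.rank} is outside [0, {cfg.world_size})")
    return cfg


def init_distributed(backend="gloo"):
    """Validate the environment, then let PyTorch read the same variables itself."""
    cfg = read_dist_config()
    import torch.distributed as dist  # lazy import: read_dist_config works without PyTorch
    # init_method="env://" makes PyTorch take MASTER_ADDR, MASTER_PORT,
    # WORLD_SIZE and RANK from the environment.
    dist.init_process_group(backend=backend, init_method="env://")
    return cfg
```

Validating the values before calling `init_process_group` turns a misconfigured Job into an immediate, readable error instead of a rendezvous that hangs.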

Volcano also provides plugins for TensorFlow and MPI Jobs. They are designed to help you run computing jobs on your chosen training framework with ease.

In addition, Volcano provides an extensible development framework so that you can tailor Job plugins to your needs.
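To illustrate what a tailored Job plugin might look like, the Python sketch below models a plugin as a hook that runs on each pod before it is created, and implements a hypothetical plugin that derives PyTorch-style rendezvous variables. It is an analogy only: Volcano's plugins are written in Go, and the class names, the dictionary shapes used for jobs and pods, the `<job>-master-0` address format, the default port 23456, and the master-first rank ordering are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class JobPlugin(ABC):
    """Hypothetical hook interface: a plugin may mutate each pod before creation."""

    name = "base"

    @abstractmethod
    def on_pod_create(self, pod, job):
        ...


class RendezvousEnvPlugin(JobPlugin):
    """Hypothetical plugin that injects rendezvous variables into every pod.

    Assumes the job has a "master" task; ranks are numbered master first,
    then the remaining tasks' replicas in index order.
    """

    name = "rendezvous-env"

    def __init__(self, port=23456):
        self.port = port  # assumed default port, for illustration only

    def on_pod_create(self, pod, job):
        tasks = job["tasks"]  # e.g. {"master": 1, "worker": 2}
        world_size = sum(tasks.values())
        if pod["task"] == "master":
            rank = pod["index"]
        else:
            rank = tasks["master"] + pod["index"]
        pod.setdefault("env", {}).update({
            "MASTER_ADDR": f"{job['name']}-master-0",  # hypothetical naming scheme
            "MASTER_PORT": str(self.port),
            "WORLD_SIZE": str(world_size),
            "RANK": str(rank),
        })


def create_pods(job, plugins):
    """Build one pod dict per task replica and run every plugin hook on it."""
    pods = []
    for task, replicas in job["tasks"].items():
        for index in range(replicas):
            pod = {"name": f"{job['name']}-{task}-{index}", "task": task, "index": index}
            for plugin in plugins:
                plugin.on_pod_create(pod, job)
            pods.append(pod)
    return pods
```

For a job declared as `{"name": "demo", "tasks": {"master": 1, "worker": 2}}`, `create_pods` yields three pods, each with `WORLD_SIZE` set to 3, and ranks 0, 1, and 2. Keeping each plugin behind the same hook lets several plugins (environment, networking, and so on) be composed on one Job without knowing about each other.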

Refer to the links for more details. (#2313, @ccchenjiahuan)

Ray on Volcano

Ray is a unified framework for scaling AI and Python applications. It can run on any machine, cluster, or cloud, including Kubernetes clusters, and its community and ecosystem are growing steadily.

As machine learning workloads demand more computing power than ever before, single-node environments can no longer provide enough resources for training tasks. This is where Ray comes in: it seamlessly coordinates the resources of an entire cluster, rather than a single node, to run the same code. Ray is designed for general-purpose scenarios and all types of workloads.

For users running multiple types of Jobs, Volcano works with Ray to provide high-performance batch scheduling. Ray on Volcano is available starting with KubeRay 0.4.

Refer to the links for more details. (#2601, #755, @tgaddair)

Enhance Scheduling for Kubernetes long-running services

This enhancement makes Volcano fully compatible with the Kubernetes default scheduler for long-running services, so users can use Volcano to schedule both long-running services and batch workloads uniformly in a single cluster.

Refer to the links for more details.

Support Kubernetes v1.25

This feature makes Volcano compatible with Kubernetes v1.25.

Refer to the links for more details. (#2533, @wangyang0616)

Support multi-arch images for Volcano

This feature enables cross-compiling Volcano images for different architectures, for example, building an image for the ARM64 architecture on an AMD64 machine.

Refer to the links for more details. (#2435, @ccchenjiahuan)

Optimize Queue Status Information

This feature enriches the status information of queues. Users can view the resource allocation of each queue in real time, which helps administrators plan resources dynamically.
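As an illustration of how richer queue status could feed resource planning, the Python sketch below computes per-resource allocation ratios for a queue. It assumes the allocated amounts and the queue's capacity have already been read from the Queue object (for example, with `kubectl get queue <name> -o yaml`) as Kubernetes quantity strings. The helper functions and the simplified quantity parser, which handles only plain numbers and the `m`, `k`, `M`, `G`, `Ki`, `Mi`, `Gi`, and `Ti` suffixes, are assumptions for illustration, not Volcano code.

```python
from fractions import Fraction

# Simplified subset of Kubernetes quantity suffixes (assumption: enough for CPU and memory).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "m": Fraction(1, 1000), "k": 10**3, "M": 10**6, "G": 10**9,
}


def parse_quantity(text):
    """Convert a quantity string such as "500m" or "8Gi" to an exact Fraction."""
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)  # plain number, e.g. "4" or "1.5"


def allocation_ratios(allocated, capability):
    """Return allocated / capability for every resource with a non-zero capability."""
    ratios = {}
    for resource, cap_text in capability.items():
        cap = parse_quantity(cap_text)
        if cap == 0:
            continue  # skip resources with no capacity to avoid dividing by zero
        used = parse_quantity(allocated.get(resource, "0"))
        ratios[resource] = float(used / cap)
    return ratios


# Example with made-up numbers: 3 of 4 CPUs and 4Gi of 16Gi memory allocated.
for resource, ratio in allocation_ratios(
    {"cpu": "3", "memory": "4Gi"}, {"cpu": "4", "memory": "16Gi"}
).items():
    print(f"{resource}: {ratio:.0%}")  # prints "cpu: 75%" then "memory: 25%"
```

Using `Fraction` keeps the arithmetic exact until the final conversion, so milli-CPU and binary memory units do not accumulate floating-point error.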

Refer to the links for more details. (#2592, @jiangkaihua)

Other Notable Changes

Bug Fixes