
Edit task environment variables #503

Open
mjpieters opened this issue Feb 26, 2024 · 9 comments
Labels
f: Help wanted, s: Client, s: Daemon, s: Pueue-lib, t: Feature

Comments

@mjpieters
Contributor

A detailed description of the feature you would like to see added.

Currently, you can edit three aspects of a scheduled task: the command to be run, the path where the task is run, and its label.

Tasks have another component: the environment variables under which the command is executed. Can we have an option to edit these?

A stretch goal would be the ability to add, set, or remove environment variables across all tasks or a task group. This lets you do things like set bandwidth limits on a whole group of tasks, where the commands in those tasks honour a limit set in an environment variable. Powerful stuff!
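To make the stretch goal concrete, a purely hypothetical invocation (none of these flags exist today, and the variable name is made up):

```sh
# Hypothetical syntax -- nothing below is implemented today.
# Throttle every task in the "downloads" group, assuming the queued
# commands honour a (made-up) RATE_LIMIT environment variable:
pueue edit --group downloads --env RATE_LIMIT=500K
```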

Explain your usecase of the requested feature

Loads of UNIX commands change their behaviour based on the environment variables present when they start. Sometimes we need to change what environment variables a given task sees, because circumstances change or we made a mistake with the env vars when the task was added.

The ability to alter environment variables across all tasks is another powerful option. This may need to be a separate command (or even a separate FR!).

Alternatives

I've actively edited the command for tasks to add extra environment variables. This is a poor workaround, however, as it is easy to accidentally alter the command itself.

Additional context

No response

@Nukesor
Owner

Nukesor commented Feb 27, 2024

It's a bit of an edge-case, but I definitely see how this is a good QoL feature.

I'm thinking about the best way to do this.
My first thought would be to set the key-value pairs directly on the command line, especially if one wants to set multiple values at the same time (e.g. pueue edit 0 -e KEY=value -e OTHER_KEY=othervalue).
The behavior would then be a bit inconsistent, but I feel like editing environment variables in an editor would be pretty cumbersome, as there are usually a lot of variables.

Maybe we should add a new subcommand for this? That would also make it quite easy to edit variables for groups or multiple tasks, as I feel like those flags would be very confusing on the current edit command.
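A rough sketch of what such a subcommand could look like (a design sketch only; no such subcommand exists at the time of writing):

```sh
# Design sketch -- not an existing pueue subcommand.
pueue env set 0 KEY value          # set one variable on task 0
pueue env unset 0 KEY              # remove it again
pueue env set -g gpu KEY value     # stretch goal: a whole group at once
```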

Nukesor added the t: Feature, s: Pueue-lib, s: Daemon, and s: Client labels on Feb 29, 2024
@mjpieters
Contributor Author

Yes, the more I think about this, the more this sounds like a separate subcommand.

@Nukesor
Copy link
Owner

Nukesor commented Mar 10, 2024

If you like, feel free to start hacking on this :) Your contributions have always been top notch and I believe that you'll make the right design decisions ;D

@activatedgeek

Just wanted to chime in with another use case.

I currently schedule jobs on a machine with GPUs using pueue. Most of the time I need multiple GPUs and set the CUDA_VISIBLE_DEVICES environment variable to a list of GPU IDs for a particular run. To get a queueing effect that runs jobs serially on one set of GPUs, I create a group with the default parallel limit of 1 and manually attach the variable with a bash function:

```sh
function pugrun {
  # Default to GPU 0 unless the caller sets GPUS (e.g. GPUS=0,1).
  GPUS=${GPUS:-0}

  # Create a per-GPU-set group; new pueue groups default to a parallel
  # limit of 1, so tasks in the group run serially. (This prints an error
  # if the group already exists, which is harmless here.)
  pueue group add gpu-${GPUS}

  # pueue captures the client environment at `pueue add` time, so the
  # task inherits CUDA_VISIBLE_DEVICES.
  CUDA_VISIBLE_DEVICES=${GPUS} pueue add -g gpu-${GPUS} -- "${@}"

  unset GPUS
}
```
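A hypothetical invocation, following from the function above (since the group's parallel limit is 1, the two jobs run one after the other):

```sh
# Queue two jobs serially on GPU 1, in group "gpu-1":
GPUS=1 pugrun python train.py --config a.yaml
GPUS=1 pugrun python train.py --config b.yaml
```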

Editing/adding environment variables to groups would be a great addition! Thank you for the amazing project!

@Nukesor
Owner

Nukesor commented Apr 4, 2024

@activatedgeek It sounds like you could make good use of the https://github.com/Nukesor/pueue/wiki/Get-started#load-balancing feature. This would do automatic balancing across your GPUs for you :)

@activatedgeek

Thank you.

My understanding was that this behavior would require me to change existing code to parse out PUEUE_WORKER_ID and transform it into appropriate values of CUDA_VISIBLE_DEVICES (e.g. something like a round robin over the list of GPUs). To avoid going back and changing internal code plumbing, it was easiest to just set the variable at pueue add time.

But of course, nothing too complicated: instead of calling the GPU job directly, I can call another bash script that does the transform and then invokes the GPU job.
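For reference, a minimal sketch of such a wrapper, assuming a 2-GPU machine and a group with a parallel limit of 2; the script name and GPU list are made up, while PUEUE_WORKER_ID is the variable pueue sets for each running task:

```sh
#!/usr/bin/env bash
# with-gpu.sh (hypothetical) -- map the pueue worker slot to one GPU.
GPUS=(0 1)                                     # assumed GPU pool
export CUDA_VISIBLE_DEVICES=${GPUS[$PUEUE_WORKER_ID]}
exec "$@"                                      # run the actual job
```

Queued as e.g. `pueue add -g cluster1 -- ./with-gpu.sh python train.py`.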

Another tricky scenario with the example was this series of steps, say for group cluster1 with a parallel limit of 2 (assuming a machine with 2 GPUs, each job using 1 GPU in a round-robin manner):

1. Queue experiment 1 on cluster1
2. Queue experiment 2 on cluster1
3. Queue experiment 3 on cluster1
4. Scheduled experiment 1 (PUEUE_WORKER_ID = 0, assigned to GPU 0)
5. Scheduled experiment 2 (PUEUE_WORKER_ID = 1, assigned to GPU 1)
6. Completed experiment 2
7. Scheduled experiment 3 (PUEUE_WORKER_ID = 2, assigned to GPU 0)

Now experiment 1 and experiment 3 both run on GPU 0, which in many of my cases with large models would lead to an out-of-memory error on GPU 0. Of course, this could be fixed with slightly smarter scheduling at the expense of more plumbing, which is why I took the easier route of limiting each group to a fixed set of GPU IDs.

Let me know if the scenario makes sense, and if I am missing something simpler here.

@Nukesor
Owner

Nukesor commented Apr 4, 2024

> My understanding was that this behavior would require me to change existing code to parse out PUEUE_WORKER_ID and transform it into appropriate values of CUDA_VISIBLE_DEVICES (e.g. something like a round robin over the list of GPUs).

Exactly. That's how it was designed to be used when the IDs aren't directly mappable to the external resources (except for the round-robin part).

> Now experiment 1 and experiment 3 both run on GPU 0, which in many of my cases with large models would lead to an out-of-memory error on GPU 0. Of course, this could be fixed with slightly smarter scheduling at the expense of more plumbing, which is why I took the easier route of limiting each group to a fixed set of GPU IDs.

That's actually not how it would work. In your example, the last two steps would look like this:

6. Completed experiment 2
7. Scheduled experiment 3 (PUEUE_WORKER_ID = 1, assigned to GPU 1)

The ID is not the task id, but rather the internal "worker" (or "slot") id. A queue with 5 parallel tasks gets 5 slots with ids 0-4. As soon as a task in a queue finishes, its slot is freed and the next task gets that slot.

If it doesn't work this way, this is definitely a bug! That system was specifically designed to handle your use case: machines/pools with a set number of limited resources, to which tasks are then assigned in a greedy manner.

If the current wiki entry doesn't properly convey this behavior, it would be awesome if you could rephrase it! The wiki is free to be edited by anyone :)

@activatedgeek

Ahaa! That makes much more sense. Thank you for clarifying.

Let me run a quick test simulating this scenario, and I'll report back. I'll update the wiki with this example so that the behavior is more explicit.

@activatedgeek

Sorry for the delay. I have verified this behavior, and this actually solves my specific problem.

I also realized I missed this:

> There'll never be two concurrent tasks with the same $PUEUE_WORKER_ID in the same group.

In that case, the wiki already explains it precisely, and I will skip the wiki edit.
