
Edit task environment variables #503

Open
mjpieters opened this issue Feb 26, 2024 · 9 comments
Labels
f: Help wanted, s: Client, s: Daemon, s: Pueue-lib, t: Feature

Comments

@mjpieters
Contributor

A detailed description of the feature you would like to see added.

Currently, you can edit three aspects of a scheduled task: the command to be run, the path where the task is run, and its label.

Tasks have another component: the environment variables under which the command is executed. Can we have an option to edit these?

A stretch goal would be the ability to add, set, or remove environment variables across all tasks or a task group. This lets you do things like set bandwidth limits on a whole group of tasks, where the commands in those tasks honour a limit set in an environment variable. Powerful stuff!
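To make the stretch goal concrete, a purely hypothetical invocation (none of these flags exist today, and the variable name is made up):

```sh
# Hypothetical syntax -- nothing below is implemented today.
# Throttle every task in the "downloads" group, assuming the queued
# commands honour a (made-up) RATE_LIMIT environment variable:
pueue edit --group downloads --env RATE_LIMIT=500K
```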

Explain your usecase of the requested feature

Loads of UNIX commands change their behaviour based on the environment variables present when they start. Sometimes we need to change what environment variables a given task sees, because circumstances change or we made a mistake with the env vars when the task was added.

The ability to alter environment variables across all tasks is another powerful option. This may need to be a separate command (or even a separate FR!).

Alternatives

I've actively edited the command for tasks to add extra environment variables. This is a poor workaround, however, as it is easy to accidentally alter the command itself.

Additional context

No response

@Nukesor
Owner

Nukesor commented Feb 27, 2024

It's a bit of an edge-case, but I definitely see how this is a good QoL feature.

I'm thinking about the best way to do this.
My first thought would be to set the key-value pairs directly on the command line, especially if one wants to set multiple values at the same time (e.g. pueue edit 0 -e KEY=value -e OTHER_KEY=othervalue).
The behavior would then be a bit inconsistent, but I feel like editing environment variables in an editor would be pretty cumbersome, as there are usually a lot of variables.

Maybe we should add a new subcommand for this? That would also make it quite easy to edit variables for groups or multiple tasks, as I feel like those flags would be very confusing on the current edit command.
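A rough sketch of what such a subcommand could look like (a design sketch only; no such subcommand exists at the time of writing):

```sh
# Design sketch -- not an existing pueue subcommand.
pueue env set 0 KEY value          # set one variable on task 0
pueue env unset 0 KEY              # remove it again
pueue env set -g gpu KEY value     # stretch goal: a whole group at once
```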

Nukesor added the t: Feature, s: Pueue-lib, s: Daemon, and s: Client labels on Feb 29, 2024
@mjpieters
Contributor Author

Yes, the more I think about this, the more this sounds like a separate subcommand.

@Nukesor
Copy link
Owner

Nukesor commented Mar 10, 2024

If you like, feel free to start hacking on this :) Your contributions have always been top notch and I believe that you'll make the right design decisions ;D

@activatedgeek

Just wanted to chime in with another use case.

I currently schedule jobs on a machine with GPUs using pueue. Most of the time I need multiple GPUs and set the CUDA_VISIBLE_DEVICES environment variable to a list of GPU IDs for a particular run. To get a queueing effect that runs jobs serially on one set of GPUs, I create a group with the default parallel limit of 1 and manually attach the variable with a bash function:

```sh
function pugrun {
  # Default to GPU 0 unless the caller sets GPUS (e.g. GPUS=0,1).
  GPUS=${GPUS:-0}

  # Create a per-GPU-set group; new pueue groups default to a parallel
  # limit of 1, so tasks in the group run serially. (This prints an error
  # if the group already exists, which is harmless here.)
  pueue group add gpu-${GPUS}

  # pueue captures the client environment at `pueue add` time, so the
  # task inherits CUDA_VISIBLE_DEVICES.
  CUDA_VISIBLE_DEVICES=${GPUS} pueue add -g gpu-${GPUS} -- "${@}"

  unset GPUS
}
```
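A hypothetical invocation, following from the function above (since the group's parallel limit is 1, the two jobs run one after the other):

```sh
# Queue two jobs serially on GPU 1, in group "gpu-1":
GPUS=1 pugrun python train.py --config a.yaml
GPUS=1 pugrun python train.py --config b.yaml
```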

Editing/adding environment variables to groups would be a great addition! Thank you for the amazing project!

@Nukesor
Owner

Nukesor commented Apr 4, 2024

@activatedgeek It sounds like you could make good use of the https://github.com/Nukesor/pueue/wiki/Get-started#load-balancing feature. This would do automatic balancing across your GPUs for you :)

@activatedgeek

Thank you.

My understanding was that this behavior would require me to change existing code to parse out PUEUE_WORKER_ID and transform it into appropriate values of CUDA_VISIBLE_DEVICES (e.g. something like a round robin over the list of GPUs). To avoid going back and changing internal code plumbing, it was easiest to just set the variable at pueue add time.

But of course, nothing too complicated: instead of calling the GPU job directly, I can call another bash script that does the transform and then invokes the GPU job.
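For reference, a minimal sketch of such a wrapper, assuming a 2-GPU machine and a group with a parallel limit of 2; the script name and GPU list are made up, while PUEUE_WORKER_ID is the variable pueue sets for each running task:

```sh
#!/usr/bin/env bash
# with-gpu.sh (hypothetical) -- map the pueue worker slot to one GPU.
GPUS=(0 1)                                     # assumed GPU pool
export CUDA_VISIBLE_DEVICES=${GPUS[$PUEUE_WORKER_ID]}
exec "$@"                                      # run the actual job
```

Queued as e.g. `pueue add -g cluster1 -- ./with-gpu.sh python train.py`.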

Another tricky scenario with the example was this series of steps, say for group cluster1 with a parallel limit of 2 (assuming a machine with 2 GPUs, each job using 1 GPU in a round-robin manner):

1. Queue experiment 1 on cluster1
2. Queue experiment 2 on cluster1
3. Queue experiment 3 on cluster1
4. Scheduled experiment 1 (PUEUE_WORKER_ID = 0, assigned to GPU 0)
5. Scheduled experiment 2 (PUEUE_WORKER_ID = 1, assigned to GPU 1)
6. Completed experiment 2
7. Scheduled experiment 3 (PUEUE_WORKER_ID = 2, assigned to GPU 0)

Now experiment 1 and experiment 3 both run on GPU 0, which in many of my cases with large models would lead to an out-of-memory error on GPU 0. Of course, this could be fixed with slightly smarter scheduling at the expense of more plumbing, which is why I took the easier route of limiting each group to a fixed set of GPU IDs.

Let me know if the scenario makes sense, and if I am missing something simpler here.

@Nukesor
Owner

Nukesor commented Apr 4, 2024

> My understanding was that this behavior would require me to change existing code to parse out PUEUE_WORKER_ID and transform it into appropriate values of CUDA_VISIBLE_DEVICES (e.g. something like a round robin over the list of GPUs).

Exactly. That's how it was designed to be used when the IDs aren't directly mappable to the external resources (except for the round-robin part).

> Now experiment 1 and experiment 3 both run on GPU 0, which in many of my cases with large models would lead to an out-of-memory error on GPU 0. Of course, this could be fixed with slightly smarter scheduling at the expense of more plumbing, which is why I took the easier route of limiting each group to a fixed set of GPU IDs.

That's actually not how it would work. In your example, the last two steps would look like this:

6. Completed experiment 2
7. Scheduled experiment 3 (PUEUE_WORKER_ID = 1, assigned to GPU 1)

The ID is not the task id, but rather the internal "worker" (or "slot") id. A queue with 5 parallel tasks gets 5 slots with ids 0-4. As soon as a task in a queue finishes, its slot is freed and the next task gets that slot.

If it doesn't work this way, this is definitely a bug! That system was specifically designed to handle your use case: machines/pools with a set number of limited resources, to which tasks are then assigned in a greedy manner.

If the current wiki entry doesn't properly convey this behavior, it would be awesome if you could rephrase it! The wiki is free to be edited by anyone :)

@activatedgeek

Ahaa! That makes much more sense. Thank you for clarifying.

Let me run a quick test simulating this scenario, and I'll report back. I'll update the wiki with this example so that the behavior is more explicit.

@activatedgeek

Sorry for the delay. I have verified this behavior, and this actually solves my specific problem.

I also realized I missed this:

> There'll never be two concurrent tasks with the same $PUEUE_WORKER_ID in the same group.

In that case, the wiki already explains it precisely, and I will skip the wiki edit.
