columnflow setup for slurm cluster #418

Open
JohanWulff opened this issue Apr 16, 2024 · 7 comments

Comments

@JohanWulff

Dear Columnflow Developers,

I am trying to run the hh2bbtautau analysis using a slurm scheduler that connects to a cluster at my institute. When running tasks with the --workflow slurm parameter set, I run into this error: sbatch: error: invalid partition specified: cms-uhh. The corresponding job script contains the following line causing this:

#SBATCH --partition=cms-uhh

I cannot find where this parameter is set in columnflow. This partition is clearly uhh-specific and I'd have to change it but I cannot find where. Any pointer would be appreciated.
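One way to track down where a hard-coded value like this comes from is a recursive search of the checked-out sources. The helper below is a minimal sketch, not part of columnflow: the function name `find_setting` and the example path are hypothetical, and the `--include` option assumes GNU or BSD grep.

```shell
# find_setting PATTERN [DIR]
# Hypothetical helper: prints file:line:match for every literal occurrence of
# PATTERN in shell and Python sources under DIR (default: current directory).
find_setting() {
  grep -rnF --include='*.sh' --include='*.py' -- "$1" "${2:-.}"
}

# Example call; the path is an assumption about where columnflow is checked out:
# find_setting "cms-uhh" ./columnflow
```

The same search can be pointed at the law installation if nothing turns up in the columnflow tree.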

@pkausw
Member

pkausw commented Apr 16, 2024

Thanks for raising this issue, @JohanWulff! The remote workflows are managed by law, so I'm tagging @riga as well. Did you also check your law installation when you looked for the uhh-specific line?

@JohanWulff
Author

JohanWulff commented Apr 16, 2024

Thanks @pkausw! Thanks also to @kramerto, who already kindly pointed out where this variable is set:
columnflow/setup.sh and columnflow/columnflow/tasks/framework/remote.py.
I'm wondering, though, whether this should be included in the list of variables the user sets when running setup.sh for the first time?

@pkausw
Member

pkausw commented Apr 16, 2024

Ah, very good! From the code, it looks like you can override this by setting the appropriate environment variables; see

columnflow/setup.sh

Lines 326 to 328 in c8516a5

export CF_HTCONDOR_FLAVOR="${CF_HTCONDOR_FLAVOR:-${cf_htcondor_flavor_default}}"
export CF_SLURM_FLAVOR="${CF_SLURM_FLAVOR:-${cf_slurm_flavor_default}}"
export CF_SLURM_PARTITION="${CF_SLURM_PARTITION:-${cf_slurm_partition_default}}"
The one you are after would be CF_SLURM_PARTITION.

You can add this to your shell script in .setups/YOURSETUPNAME.sh so that it is always sourced automatically when you source your analysis setup script. Can you try this?
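As a minimal sketch of why this works: the `${VAR:-default}` expansion used in setup.sh keeps a value that is already set and non-empty, and only falls back to the default otherwise. In the snippet below, `my-partition` is a placeholder, and the default value `cms-uhh` is an assumption inferred from the error message rather than copied from setup.sh.

```shell
# Hypothetical line in .setups/YOURSETUPNAME.sh; "my-partition" is a placeholder
# for a partition that exists on your cluster.
export CF_SLURM_PARTITION="my-partition"

# Assumed default, inferred from the sbatch error message in this thread.
cf_slurm_partition_default="cms-uhh"

# Same pattern as the setup.sh line quoted above: an existing, non-empty value
# is kept; the default is used only if the variable is unset or empty.
export CF_SLURM_PARTITION="${CF_SLURM_PARTITION:-${cf_slurm_partition_default}}"

echo "partition: ${CF_SLURM_PARTITION}"
```

On a machine with the Slurm client tools, `sinfo -h -o "%P"` lists the partition names the scheduler knows about (the default one is marked with a trailing `*`), which helps in picking a valid value.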

@JohanWulff
Author

Thanks @pkausw, overriding the variable works in principle. I'm now stuck on a Slurm-related submit error, but that no longer has anything to do with columnflow. Looking at the default of cf_slurm_flavor, which is maxwell, I was under the impression that I'd also have to change this, since I obviously don't have access to maxwell from my institute. Changing it causes law to complain, though, saying that my choice of cf_slurm_flavor is not a valid choice from {maxwell}. Is this desired? It happens because the possible choices for this parameter are defined in columnflow/columnflow/tasks/framework/remote.py as: ("maxwell")

@pkausw
Member

pkausw commented Apr 16, 2024

Hm, that is indeed a shame :/ What happens if you add your choice of slurm flavor to the list? I'm not entirely sure how law will react to this, of course, but it might be worth trying nonetheless.

@pkausw
Member

pkausw commented Apr 24, 2024

Hi @JohanWulff, are there any updates regarding this? Were you able to resolve the issue?

If yes, we might want to think about removing the choices for this particular parameter, or at least amending the list of supported options. It would be very interesting to know what you added to the choices to move on with this!

@JohanWulff
Author

Not yet, sorry. I'll try to understand how submission is handled by our scheduler here, so that I can test this properly and close the issue soon.

2 participants