Update JSON for Sherlockv1, set JSON structure, and upload a SLURM-to-JSON script #4

Open · wants to merge 3 commits into master
Conversation

akkornel

This PR does a number of things:

• Updates machines.json with real information from Sherlock v1.
• Uploads a script to generate "intermediate JSON" from a SLURM cluster, with instructions on how to combine multiple intermediates into a single machines.json file.

Unless my understanding of the SLURM bits changes, I think the basic JSON structure is set. The only thing really missing is information on which users can access which partitions. Right now, only globally-accessible partitions are specified, under the `*` user.

slurm2json reads a SLURM cluster's `slurm.conf` file, and also calls out to SLURM's `sacctmgr`, to generate a JSON file of information about the cluster's partitions, QOSes, features, and user access.

slurm2json creates an "intermediate" JSON file, which will need additional processing to output the `machines.json` file needed by job-maker. This covers Sherlock v1; Sherlock v2 is TBD.
@vsoch (Member) commented May 16, 2017

This is really great! A few small requests:

Not a dependency

Right now, this is one massive batch script that runs from start to finish, with no modularity. It shouldn't be a requirement for the user to run this script to generate data for the web interface; it should be something as easy as writing a manual text file, with the application handling the rest. This is the approach I would take (see the folder sketch below):

  • Have a folder of individual files (some format; yml or json is fine), one per cluster (or even a level deeper, per partition), that the application knows to build when it's pushed to GitHub.

  • This means the user can generate these files manually, or (optionally) use this extra provided Perl tool, given access to the correct command-line utils and permissions.

We can't assume that the user (an admin of some sort) has access to all these tools, is familiar with Perl and its modules, or wants to generate one huge file at once.
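For illustration, the folder might look something like this (the file names are made up):

```
machines/
    sherlock.json        # one file per cluster, hand-written or generated
    other-cluster.json
    ...

# a build step run on push merges these into the single file the webapp loads:
machines.json
```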

Modular

So the second request is to make this script more of an optional module. This means a few things:

Organization

I would suggest a folder called modules or helpers, with a separate README explaining how to use the tool (if they desire) to generate the equivalent data.

Containers

Further, the module should come as a container, both Docker and Singularity. The user could install all the stuff manually, but they would probably find it easier to run a container.

Functions

Right now it's one massive Perl dump, and I don't see functions or anything. Could we have functions instead? If you are not averse, I'd also like to have Python (in addition and/or instead). Perl isn't hugely used in scientific programming.

In a nutshell, I would want to be able to use the tool, OR just edit some files manually, push the pages, and have it work, without needing to install Perl modules / dependencies or even use this tool at all. There are several approaches that (from the client / JavaScript side) compile individual files from a folder into the machines.json upon build, and that's the direction we want to go in. If you want to start by giving Python a go, or by making the Perl more modular (and teaching me how to test and run it), we can start from there! Thanks for helping with this!

@vsoch (Member) commented May 16, 2017

btw @akkornel, the sheer amount of code that you produced for this PR in a short amount of time is nothing short of astounding! I think it could be a really good learning opportunity for me to pick up some perl too, what say you? What is the best environment / interpreter for testing things?

I'm going to start dinner and just stare at it for a while, lol :#)

@akkornel (Author) commented
Woah, lots of text! Here are my replies to the various sections:

Not a Dependency

To be honest, I don't think a cluster admin (the intended audience for this tool) should need to use this tool to get the webapp up and working in a minimal mode. It should be enough, I think, for job-maker to provide a basic example JSON file, as well as documentation on the minimum amount of machines.json that the webapp needs in order to work.

For example, I think the minimum would be…

  • A list of features, if the cluster has any.
  • A list of partitions (there should be at least one).
  • A list of QOSes (I think there should be at least one).
  • An entry for user *, saying that user * (that is, all users) has access to the listed partitions.

With just those options, it should be enough to fill in things like drop-down menus & select/multi-select boxes. What would be missing is basic validation, to check beforehand if a job can't fit in a given partition/QOS. But, the minimum listed above should be enough to get the user (that is, the cluster admin) going, which is the most important thing.
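For concreteness, a minimal machines.json along those lines might look something like the following. (The field names here are just my guesses for illustration, not a schema the project has settled on.)

```json
{
  "features":   ["gpu", "bigmem"],
  "partitions": { "normal": { "maxTimeMinutes": 2880 } },
  "qos":        { "normal": {} },
  "users":      { "*": { "partitions": ["normal"] } }
}
```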

So, a separate issue/PR would be good, to have an example minimal JSON file, and documentation on format/fields.

The "folder of stuff"

To be honest, I'd like to keep the number of "source files" at an absolute minimum.

I should define what I mean when I say "source files".

All of the information in the JSON file comes from /etc/slurm/slurm.conf, as well as from SLURM information that exists in a MySQL database. That information (database + slurm.conf) is the "source". The cluster admin already had to do the manual work of setting up slurm.conf, as well as running the sacctmgr commands to load the info into the database; and the cluster admin still has to do the additional work of maintaining those things, as people and machines come and go.

I really don't want the cluster admin to have to manually maintain the same information in another set of files, in a different format. All of the information that job-maker needs should be obtainable from SLURM directly, which is why I think a conversion tool is the best option here.

But, as I said back in the 'Not a Dependency' section, there should be documentation telling the cluster admin that they don't need to provide all the cluster information to get a basically-functioning webapp. So, if the cluster admin doesn't want to, she should be able to—as a one-time activity—provide the minimal amount of information needed to have a functioning app.

There's an additional benefit: The cluster admin could use the "minimal JSON" to set up the webapp as a proof of concept, and then later (once a decision is made to move forward with the webapp) set up this tool to generate the "complete JSON".

By the way, I understand that it's not possible (or at least not easy) for the webapp to pull the info directly from SLURM at runtime, for a number of reasons:

  • If this webapp is written in JavaScript, it makes sense to have the app's configuration in JSON, even though this means that conversion work is required. I expect it's easier to do the conversion work separately, instead of coding the JavaScript necessary to parse `slurm.conf` directly. Besides, since you need `sacctmgr` to get some of the information from the cluster, you'd still need some sort of dynamic, server-side support. So, it's much easier to do the JSON creation separately.
  • Since the webapp can support multiple clusters, it's not really feasible to have the web app reach out to clusters directly for all this info. Besides, even though the information changes over time, it's not so dynamic as to require querying the cluster on every page load.

Organization

I don't really have a preference regarding organization, except I don't like the word "modules", because this code never gets called by the webapp. "helper" would be a more appropriate term, in my opinion.

Although, right now job-maker only has one helper, so putting slurm2json into a helpers directory—again, in my opinion—just adds an additional directory layer. Unless you've got plans for adding other helpers?

Still, I'm not particularly passionate on this topic.

Containers

You're absolutely right: I should have gotten this set up as a Singularity container, to make it easy to deploy onto a cluster. That'll be the next thing I do here.

BTW, when you said "the module should come as a container", I assume you meant a bootstrap definition file? I assume you weren't asking me to add a pre-built image to the repo; adding a big binary blob to a Git repo doesn't seem right to me (unless you have Git LFS).
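For example, I'd expect the definition file to look roughly like this (the base image, module list, and install path are all assumptions on my part at this point):

```
Bootstrap: docker
From: perl:5.24

%post
    # the perl Docker images ship with cpanm; install the tool's dependencies
    cpanm List::Util JSON Text::CSV

%runscript
    # hypothetical install path for the script inside the image
    exec perl /opt/slurm2json/slurm2json.pl "$@"
```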

Functions

I kind of expected the "Could you write it in Python?" comment. 8-\ Honestly, both the "Why Perl?" and "Why no functions?" questions have the same answer: for the most part, this program is a simple parser, which Perl does extremely well, and functions would break up the flow.

The vast majority of the lines of code (~350 out of 461 total) are one big loop, iterating over all the lines of slurm.conf. There are only two real places inside the loop where I could see refactoring code into functions: the code blocks dealing with NodeName (lines 102-139) and PartitionName (lines 141-231). But moving those out to functions would mean that, whenever you want to look at the NodeName or PartitionName code, you'd have to mentally context-shift out of the main loop and into the function.

It might help if you do what I did while writing the code: in a separate window, log into a Sherlock v1 node and open `/etc/slurm/slurm.conf`.
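For a flavor of what the parser walks through, the relevant lines look like these (the names and numbers are invented, not Sherlock's actual values):

```
NodeName=sh-1-[1-24] CPUs=16 RealMemory=64000 Feature=CPU_HSW
PartitionName=normal Nodes=sh-1-[1-24] Default=YES MaxTime=2-00:00:00 State=UP
```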

I'm leaning much more towards moving the external command-calling code (lines 293-403) into separate functions, which live in a separate Perl file. Then, the code could be broken into three files:

  1. The slurm.conf parsing loop.
  2. Code which gets info from sacctmgr.
  3. Main code, which takes the above stuff & builds the JSON.

So, I'll look at breaking up things into the three parts listed above.
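Roughly, the top-level file would then reduce to something like this (the module and function names are placeholders I'm making up, not final):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON;

# Placeholder module names for the proposed split:
use SlurmConf qw(parse_slurm_conf);   # part 1: the slurm.conf parsing loop
use Sacctmgr  qw(get_qos_info);       # part 2: the sacctmgr-calling code

# Part 3: main code, combining the above into the intermediate JSON.
my $conf = parse_slurm_conf('/etc/slurm/slurm.conf');
my $qos  = get_qos_info();

print JSON->new->pretty->canonical->encode({
    features   => $conf->{features},
    partitions => $conf->{partitions},
    qos        => $qos,
});
```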

@akkornel (Author) commented
If you're interested in picking up some Perl, I can't say that my code is the best thing to look at, but I hope it'll help!

As for a place to run it, Sherlock v1 is probably the best place to do this, for a few reasons:

  • There is a new-enough Perl available (as a module), which isn't available yet on Sherlock v2.
  • The Perl module supports local::lib, which is similar to Python's virtualenv.
  • The tool needs access to a slurm.conf and sacctmgr.

Perl is a compiler-interpreter: it compiles Perl code to an internal form, which it then executes, so Perl doesn't have a built-in interactive interpreter (REPL) like Python does (though `perl -de 0` gets you something close via the debugger). But startup time is very quick.

On Sherlock v1, something similar to this should get you up and running with the needed Perl modules:

```
module load perl/5.24
module load gcc          # pick some newer GCC version
eval `perl -Mlocal::lib`
cpan install List::Util
cpan install JSON
cpan install Text::CSV
```

To see what shell commands are being generated by the `eval` line, just run `perl -Mlocal::lib` directly. It sets environment variables (`PERL5LIB`, `PERL_MM_OPT`, `PERL_MB_OPT`, `PATH`, and friends) so that Perl looks for modules in your home directory, and so that the module installers know to install modules into your home directory. Each `cpan install` command takes care of auto-installing necessary dependencies.

Then, to run the tool, see the README I included. Note that you'll need to use one of the longer-form commands, because of the taint-mode issue I described.

(BTW, Taint Mode is an optional feature: when taint mode is on, anything coming in from the outside is "tainted". If you use a tainted variable somewhere, that result is also tainted (so concatenating a tainted string and an untainted string taints the result). If you try to use tainted data in a dangerous way (like executing an outside program), Perl kills your program. The way you un-taint a variable is to pass it through a regular expression and keep only what the capture groups matched. It's an awesome safety feature that very few other languages offer.)
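A tiny illustration (a made-up example, not code from this PR):

```perl
#!/usr/bin/perl -T
use strict;
use warnings;

$ENV{PATH} = '/usr/bin:/bin';   # taint mode also insists on a trusted PATH

my $partition = $ARGV[0];       # anything from outside the program is tainted

# Passing $partition straight to system() would die with
# "Insecure dependency in system while running with -T switch".
# Un-taint it by keeping only what a regex capture matches:
my ($clean) = $partition =~ /^([A-Za-z0-9_.-]+)$/
    or die "Bad partition name\n";

system('scontrol', 'show', 'partition', $clean);
```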

If you have questions, feel free to let me know!
