jobspec: allow multiple 'slot' entries, particularly above 'node' #5958

Open
jameshcorbett opened this issue May 10, 2024 · 3 comments

@jameshcorbett
Member

The following jobspec

{
  "resources": [
    {
      "type": "slot",
      "count": 1,
      "label": "rabbit",
      "with": [
        {
          "type": "node",
          "count": 1,
          "exclusive": true,
          "with": [
            {
              "type": "slot",
              "count": 1,
              "with": [
                {
                  "type": "core",
                  "count": 1
                }
              ],
              "label": "task"
            }
          ]
        }
      ]
    }
  ],
  "tasks": [
    {
      "command": [
        "hostname"
      ],
      "slot": "task",
      "count": {
        "per_slot": 1
      }
    }
  ],
  "attributes": {
    "system": {
      "duration": 0,
      "environment": {},
      "shell": {}
    }
  },
  "version": 1
}

is rejected by the shell with the error "jobspec: node resource encountered after slot resource". The shell is correctly enforcing a V1 restriction. However, given how rabbit resources are currently organized, it would be very useful to allow a top-level 'slot' entry above 'node' and 'ssd'. Fluxion understands such jobspecs and, according to @grondo, so does sched-simple.
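
For illustration, the kind of ordering check described above can be sketched in a few lines of Python. This is not the shell's actual parser; the function name `find_node_below_slot` and the path format it returns are made up for this example. It only shows why the jobspec above trips a "node after slot" rule while a conventional V1 layout does not.

```python
def find_node_below_slot(resources, inside_slot=False, path=()):
    """Return the path to the first 'node' nested under a 'slot', or None.

    Hypothetical helper: mimics the ordering rule discussed in this issue,
    not the job shell's real implementation.
    """
    for index, res in enumerate(resources):
        here = path + (f"{res['type']}[{index}]",)
        if res["type"] == "node" and inside_slot:
            return "/".join(here)
        found = find_node_below_slot(
            res.get("with", []),
            inside_slot or res["type"] == "slot",
            here,
        )
        if found:
            return found
    return None


# Resource section of the jobspec from this issue: slot > node > slot > core.
rabbit_resources = [
    {"type": "slot", "count": 1, "label": "rabbit", "with": [
        {"type": "node", "count": 1, "exclusive": True, "with": [
            {"type": "slot", "count": 1, "label": "task", "with": [
                {"type": "core", "count": 1}]}]}]}
]

# Conventional layout with no slot above the node: node > slot > core.
v1_resources = [
    {"type": "node", "count": 1, "with": [
        {"type": "slot", "count": 1, "label": "task", "with": [
            {"type": "core", "count": 1}]}]}
]

print(find_node_below_slot(rabbit_resources))  # slot[0]/node[0]
print(find_node_below_slot(v1_resources))      # None
```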

Thoughts on whether we could enable this functionality / disable this check in the shell?

@grondo
Contributor

grondo commented May 10, 2024

There was some work by @SteVwonder a few years ago in the job shell jobspec parser to support non-V1 jobspecs, e.g. #3160 and #3175. However, the job shell now explicitly checks for version == 1 and rejects any jobspec without that version, so I'm not sure how a non-V1 jobspec is meant to be processed by the shell.
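
As a rough picture of the version gate mentioned above, here is a minimal Python sketch. The function name `load_supported_jobspec`, the `supported` parameter, and the error text are assumptions for illustration only; the real check lives in the job shell and may look quite different.

```python
import json


def load_supported_jobspec(jobspec_text, supported=(1,)):
    """Parse a jobspec and reject it unless its version is supported.

    Hypothetical sketch of a 'version == 1' gate, not the shell's code.
    """
    jobspec = json.loads(jobspec_text)
    version = jobspec.get("version")
    # type() check keeps JSON 'true' from passing as 1 (True == 1 in Python).
    if type(version) is not int or version not in supported:
        raise ValueError(f"unsupported jobspec version: {version!r}")
    return jobspec


load_supported_jobspec('{"version": 1, "resources": []}')    # accepted
# load_supported_jobspec('{"version": 2, "resources": []}')  # raises ValueError
```

Written this way, accepting a future V2 (or Vn) would mean widening `supported` and adding a parser for it, whereas extending V1 would leave the gate alone and relax the resource-ordering check instead.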

I'm not sure if we need to draft a V2 (or Vn), or extend V1, etc. Looking for any opinions here... :-)

@jameshcorbett
Member Author

At the moment this is mostly needed for flux-coral2 testing. In actual usage, jobspecs like the one above are constructed after submission, and only the job-manager's copy (which it sends to the scheduler) is modified. As @grondo mentioned on a call, the copy the shell receives will still be the user's original, signed version. So I think this issue shouldn't be seen in production under the current design of flux-coral2.
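
To make the distinction between the two copies concrete, here is a small Python sketch. It is not flux-coral2's code: `wrap_in_top_level_slot` is a hypothetical helper, and the real modification also involves 'ssd' resources, which are left out here. It only shows that wrapping a copy in a top-level slot leaves the user's original untouched.

```python
import copy


def wrap_in_top_level_slot(jobspec, label="rabbit"):
    """Return a modified copy whose resources sit under a new top-level slot.

    Hypothetical illustration; the original jobspec dict is not changed.
    """
    modified = copy.deepcopy(jobspec)
    modified["resources"] = [
        {"type": "slot", "count": 1, "label": label,
         "with": modified["resources"]}
    ]
    return modified


# What the user submitted (and what the shell would still receive).
user_jobspec = {
    "version": 1,
    "resources": [
        {"type": "node", "count": 1, "exclusive": True, "with": [
            {"type": "slot", "count": 1, "label": "task", "with": [
                {"type": "core", "count": 1}]}]}
    ],
}

# What the scheduler would see in this sketch.
scheduler_copy = wrap_in_top_level_slot(user_jobspec)

print(user_jobspec["resources"][0]["type"])    # node
print(scheduler_copy["resources"][0]["type"])  # slot
```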

@grondo
Contributor

grondo commented May 16, 2024

Related: #3310
