refactor: migrate docker & wasm run to v2 api #3970

frrist · 2024-05-02T01:04:21Z

This change removes the pkg/model dependency from the docker run and wasm run commands and migrates both of them to the new API methods. The flags on each command remain unchanged.

Makes progress towards #3831 and #3832.

TODO:

Fix some TODOs that I'll call out in review.

coderabbitai · 2024-05-02T01:04:28Z

Important

Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

frrist

left some comments to guide review

cmd/cli/docker/docker_run.go

frrist · 2024-05-02T01:06:05Z

cmd/cli/docker/docker_run.go

-	labels, err := parse.Labels(ctx, opts.SpecSettings.Labels)
-	if err != nil {
-		return nil, err
+	// TODO(forrest) [refactor]: this logic is duplicated in wasm_run


I should extract the common logic here before merging.

frrist · 2024-05-02T01:07:29Z

cmd/cli/docker/docker_run.go

-		),
-	)
+	task, err := models.NewTaskBuilder().
+		Name("TODO").


We currently don't have a means for users to specify a task name with the current flags on wasm run or docker run. I could stub in a UUID here and add a --task-name flag later. Thoughts?

We can use "main" as the default task name. Task names are only useful once we support multiple tasks per job, and that won't be doable through docker run and wasm run, as they will most likely submit a single task per job. So calling it "main" or any other standard name is acceptable.

Actually, this is what we are currently doing:

bacalhau/pkg/models/migration/legacy/from.go

Line 91 in a02d156

Name: "main",

frrist · 2024-05-02T01:08:13Z

cmd/cli/docker/docker_run.go

+		Name:        "TODO",
+		Namespace:   "TODO",


Similar point to task name in my comment above - these could be UUIDs, values from flags we add, or something else?

We have a default namespace for bacalhau job run:

bacalhau/pkg/models/constants.go

Line 5 in 6f00c8a

DefaultNamespace = "default"

We are still using clientID for docker run, though. I think we should use the default namespace, but expose a flag to set the desired namespace. This means users won't be able to filter their own jobs in the public network, but that might be acceptable based on our longer-term plans. Thoughts?

frrist · 2024-05-02T01:09:38Z

cmd/cli/docker/docker_run.go

+		Type:        models.JobTypeBatch,
+		Priority:    0,


I believe it is assumed that docker and wasm jobs are batch when run over the CLI, but we could easily change that. Additionally, priority doesn't have a flag. My gut says we need flags for these; ideas on good default values appreciated.

There is a --target flag that controls this:

--target all|any Whether to target the minimum number of matching nodes ("any") (default) or all matching nodes ("all") (default any)

any is a batch job, whereas all is an ops job.
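
The mapping described above could be sketched as a small helper. This is only an illustration: `jobTypeForTarget` and the two constants are hypothetical stand-ins (the diff shows `models.JobTypeBatch`; the ops constant name is assumed), and treating an empty value as "any" is an assumption based on the flag's documented default.

```go
package main

import "fmt"

// Hypothetical stand-ins for the job type constants in pkg/models.
const (
	jobTypeBatch = "batch"
	jobTypeOps   = "ops"
)

// jobTypeForTarget maps the existing --target flag value onto a job type:
// "any" (the default) is a batch job, "all" is an ops job.
func jobTypeForTarget(target string) (string, error) {
	switch target {
	case "", "any": // assumption: an unset flag behaves like the default "any"
		return jobTypeBatch, nil
	case "all":
		return jobTypeOps, nil
	default:
		return "", fmt.Errorf("invalid target %q: expected \"any\" or \"all\"", target)
	}
}

func main() {
	for _, target := range []string{"any", "all", "some"} {
		jobType, err := jobTypeForTarget(target)
		fmt.Println(target, jobType, err)
	}
}
```

Keeping the translation in one function means the flag surface stays untouched while the job spec still gets an explicit type.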

My comment here was with respect to the priority fields. Are we okay adding a flag called --priority that sets the field?

With respect to the job type, currently controlled via --target:
Would we be willing to drop the target flag in favor of a --type flag whose default value is batch and other possible values are service, ops, and daemon? This would bring the CLI more in line with the fields of the job spec.

Let's do the migration one step at a time. Let's avoid exposing or renaming flags, as this PR's main focus should be internal: just switching to the new APIs.

To answer your questions though, --priority is not fully implemented: the compute nodes, for example, don't respect priority when queueing executions locally in their buffer. It will make sense to expose priority, blog about it, and make sure it is documented once it is fully implemented. We should use whatever default value bacalhau job run has.

As for --target, we don't and should not support service or daemon jobs using imperative job submissions (i.e. docker run and wasm run). We should allow users to easily update their long running jobs, compare what is deployed in the network and what their local job spec says, and utilize gitops to deploy and update jobs based on committed specs. All that is not feasible with imperative job submissions, and we shouldn't make it easy for users to do bad things to their network.

frrist · 2024-05-02T01:18:00Z

cmd/util/flags/cliflags/spec.go

-		`name:path of the output data volumes. `+
-			`'outputs:/outputs' is always added unless '/outputs' is mapped to a different name.`,
+		`name=path of the output data volumes. `+
+			`'outputs=/outputs' is always added unless '/outputs' is mapped to a different name.`,


I don't think this is true anymore. Will need to address this before merging.

Why the change from name:path to name=path? This will break users. We will also need to define a default output path outputs:/outputs for both docker run and wasm run to avoid breaking existing behavior and all our examples and docs. I don't think it is a bad design for those commands anyway, but it is worth exploring the right long-term option with @aronchick.

For context here @aronchick: This comment is about a change in behavior of the output flag from expecting : as a separator to = as a separator.

When using a flag that requires pairs of strings (key, value), for example environment variables or, as in this case, ResultPaths, I find that a StringToStringVar flag type (a map) provides clearer validation than a StringArrayVar flag (a slice). As an example, consider:

Valid:
bacalhau docker run -o outputs=/outputs -o inputs=/inputs

Invalid:
bacalhau docker run -o outputs=/outputs -o inputs

Cobra, our CLI library, will immediately catch the invalid case and return an error since inputs is missing a value field.

Comparatively, the behavior prior to this change accepted both options, and it was on us (developers) to write the validation logic to catch the invalid case of a key missing a value.

If we feel this is too large of a change for our users to understand - replacing : with = - then I can write a bespoke implementation of a map flag parser that uses : as the separator instead of = to preserve existing behavior.
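
For reference, a bespoke parser along those lines might look roughly like the sketch below. It uses the standard library `flag.Value` interface so that it runs on its own; `colonMap` is a hypothetical name, and wiring it into Cobra would go through pflag, whose `Value` interface additionally requires a `Type()` method.

```go
package main

import (
	"flag"
	"fmt"
	"sort"
	"strings"
)

// colonMap is a hypothetical flag value that collects repeated name:path
// pairs into a map, rejecting entries with a missing name or path.
type colonMap map[string]string

// String renders the pairs in sorted order so the output is stable.
func (m colonMap) String() string {
	pairs := make([]string, 0, len(m))
	for name, path := range m {
		pairs = append(pairs, name+":"+path)
	}
	sort.Strings(pairs)
	return strings.Join(pairs, ",")
}

// Set parses one name:path occurrence of the flag.
func (m colonMap) Set(value string) error {
	name, path, found := strings.Cut(value, ":")
	if !found || name == "" || path == "" {
		return fmt.Errorf("invalid output %q: expected name:path", value)
	}
	m[name] = path
	return nil
}

func main() {
	outputs := colonMap{}
	fs := flag.NewFlagSet("docker run", flag.ContinueOnError)
	fs.Var(outputs, "o", "name:path of the output data volumes")

	// Both pairs are well formed; an entry such as "-o inputs" would fail in Set.
	args := []string{"-o", "outputs:/outputs", "-o", "inputs:/inputs"}
	if err := fs.Parse(args); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(outputs)
}
```

This keeps the `:` separator users already rely on while moving the "key without a value" check into the flag type itself.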

Can we please stop changing behaviour or adding more complexity to the migration story? The migration of the APIs and CLIs is a huge and complex story as is, and we've discussed that no changes in behaviour are to be made while we migrate, to avoid as many issues as possible.

While I appreciate you identifying areas of improvements as you work on this task, the right way to handle this is by cutting issues to track them.

frrist · 2024-05-02T01:18:54Z

cmd/util/opts/storage.go

+	// TODO(forrest) [correctness]: need to allow aliases to be provided over CLI
+	alias := "TODO"


Need to address, not sure if the alias should be passed by the client or picked by the CLI.

If I remember correctly, alias is the replacement for name. How did we set names before?

frrist · 2024-05-02T01:20:14Z

pkg/models/docker_spec_config.go

This was ported from the model package. We will be deleting the model package, at which point the duplicated code will be removed.

frrist · 2024-05-02T01:20:32Z

pkg/models/publisher_spec_config.go

also ported from model.

frrist · 2024-05-02T01:20:39Z

pkg/models/storage_source_config.go

also ported from model.

wdbaruni · 2024-05-07T06:47:19Z

cmd/cli/docker/docker_run.go

 	if opts.RunTimeSettings.DryRun {
-		// Converting job to yaml
-		var yamlBytes []byte
-		yamlBytes, err = yaml.Marshal(j)
+		out, err := helpers.JobToYaml(job)
 		if err != nil {
-			return fmt.Errorf("converting job to yaml: %w", err)
+			return err
 		}
-		cmd.Print(string(yamlBytes))
+		cmd.Print(out)
 		return nil
 	}

-	executingJob, err := util.ExecuteJob(ctx, j, opts.RunTimeSettings)
+	api := util.GetAPIClientV2(cmd)


Can we do things in a way where docker run and wasm run only generate the job spec from flags and then call job run, or the same method that job run calls? I assume --dry-run, --follow, --id-only and similar flags apply to all of them and can be handled by job run.

Yes I can unify the method calls across the docker, wasm, exec, and run commands in this PR. Great way to de-dupe some of this code!
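
A rough sketch of that shape, with every name hypothetical: `Job`, `RunOptions`, `runJob`, `dockerJob`, and `wasmJob` are simplified stand-ins, and the API call is abstracted behind a function so the example runs without a network.

```go
package main

import "fmt"

// Job and RunOptions are simplified, hypothetical stand-ins.
type Job struct {
	Engine string
	Target string
}

type RunOptions struct {
	DryRun bool
	IDOnly bool
}

// submitFunc abstracts the API call so the sketch runs without a network.
type submitFunc func(Job) (string, error)

// runJob is the single path shared by every command.
func runJob(job Job, opts RunOptions, submit submitFunc) (string, error) {
	if opts.DryRun {
		// The real command prints the job as YAML; a plain rendering stands in here.
		return fmt.Sprintf("%+v", job), nil
	}
	id, err := submit(job)
	if err != nil {
		return "", fmt.Errorf("submitting job: %w", err)
	}
	if opts.IDOnly {
		return id, nil
	}
	return "job submitted: " + id, nil
}

// dockerJob and wasmJob only translate flags into a job spec.
func dockerJob(image string) Job { return Job{Engine: "docker:" + image, Target: "any"} }
func wasmJob(module string) Job  { return Job{Engine: "wasm:" + module, Target: "any"} }

func main() {
	fake := func(Job) (string, error) { return "j-123", nil }

	out, err := runJob(dockerJob("ubuntu"), RunOptions{IDOnly: true}, fake)
	fmt.Println(out, err)

	out, err = runJob(wasmJob("main.wasm"), RunOptions{DryRun: true}, fake)
	fmt.Println(out, err)
}
```

With this split, flags such as --dry-run and --id-only are handled once, and each command shrinks to a spec builder.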

wdbaruni · 2024-05-07T06:55:54Z

cmd/cli/docker/docker_run.go

+	engineSpec, err := models.DockerSpecBuilder(image).
+		WithParameters(parameters...).
+		WithWorkingDirectory(opts.WorkingDirectory).
+		WithEntrypoint(opts.Entrypoint...).
+		WithEnvironmentVariables(opts.EnvironmentVariables...).
+		Build()


We started to define the specs related to each publisher, source and executor under their own package instead of models. This helps with initializing new types, validation, and serde from and to models.SpecConfig. There is already a defined type under pkg/executor/docker/models/types.go:22 that should be used.

There is also a builder in that package, but I don't think it is correct/useful as it is building a models.SpecConfig directly instead of EngineSpec

Also feel free to move EngineSpec from executor/docker/models to executor/docker

wdbaruni · 2024-05-07T06:56:02Z

cmd/cli/docker/docker_run.go

+// Function for validating the workdir of a docker command.
+func validateWorkingDir(jobWorkingDir string) error {
+	if jobWorkingDir != "" {


We can add these validations under EngineSpec.Validate() method
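
As a minimal sketch of what that could look like: the `EngineSpec` below is a simplified, hypothetical stand-in, not the type in pkg/executor/docker/models, and the rule that the working directory must be an absolute path is an assumption about what validateWorkingDir checks.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// EngineSpec is a simplified, hypothetical stand-in for the docker engine spec.
type EngineSpec struct {
	Image            string
	WorkingDirectory string
}

// Validate checks the spec in one place so CLI commands need no validators of their own.
func (s EngineSpec) Validate() error {
	if s.Image == "" {
		return errors.New("docker engine spec: image is required")
	}
	// Assumed rule: a working directory, when set, must be absolute.
	// Container paths are slash-separated, so package path is used, not path/filepath.
	if s.WorkingDirectory != "" && !path.IsAbs(s.WorkingDirectory) {
		return fmt.Errorf("docker engine spec: working directory %q must be an absolute path", s.WorkingDirectory)
	}
	return nil
}

func main() {
	specs := []EngineSpec{
		{Image: "ubuntu:latest", WorkingDirectory: "/app"},
		{Image: "ubuntu:latest", WorkingDirectory: "app"},
	}
	for _, spec := range specs {
		fmt.Printf("%+v -> %v\n", spec, spec.Validate())
	}
}
```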

wdbaruni · 2024-05-07T07:06:42Z

cmd/cli/helpers/togo.go

+// users are not permitted to set, like ID, Version, ModifyTime, State, etc.
+// The solution here is to have a "JobSubmission" type that is different from the actual job spec.
+func JobToYaml(job *models.Job) (string, error) {


I hear your point. My preference is to delay any decision to change our API specs and have different ways to submit jobs until we collect more feedback about current APIs, and have more dedicated time to explore alternatives and align on the right option. Options might include splitting the job type into spec and state top-level fields, where state holds the system-defined fields, and spec is what the user submits. Again, I truly don't recommend figuring this out just yet.

One piece of low-hanging fruit is to introduce a SanitizeForSubmission() method under models.Job that clears out all fields that are reserved by the system, so that something like bacalhau job describe <job_id> --output yaml | bacalhau job run would work.
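
A minimal sketch of such a method, assuming a pared-down `Job` type. The reserved fields are the ones named in the code comment (ID, Version, ModifyTime, State); their Go types here, and the rest of the struct, are hypothetical.

```go
package main

import "fmt"

// Job is a pared-down, hypothetical stand-in for models.Job.
type Job struct {
	// Fields the user is expected to provide.
	Name      string
	Namespace string
	Type      string

	// Fields reserved by the system.
	ID         string
	Version    uint64
	ModifyTime int64
	State      string
}

// SanitizeForSubmission clears every system-reserved field in place,
// leaving only what a user is allowed to submit.
func (j *Job) SanitizeForSubmission() {
	j.ID = ""
	j.Version = 0
	j.ModifyTime = 0
	j.State = ""
}

func main() {
	// A job as it might come back from "job describe".
	job := &Job{
		Name:       "example",
		Namespace:  "default",
		Type:       "batch",
		ID:         "j-123",
		Version:    3,
		ModifyTime: 1714612345,
		State:      "Completed",
	}
	job.SanitizeForSubmission()
	fmt.Printf("%+v\n", *job)
}
```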

wdbaruni · 2024-05-07T07:11:41Z

cmd/util/opts/storage.go

+	// TODO(forrest) [correctness]: need to allow aliases to be provided over CLI
+	alias := "TODO"


If I remember correctly, alias is the replacement for name. How did we set names before?

wdbaruni · 2024-05-07T07:21:26Z

pkg/models/spec_config.go

+func DecodeSpecConfig[T any](spec *SpecConfig) (*T, error) {
+	params, err := json.Marshal(spec.Params)


How useful is this method? Today each type implements its own way to decode SpecConfig to the right type, e.g. pkg/s3/types.go:92.
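
For context, the generic helper under discussion presumably round-trips Params through JSON into a typed struct. The sketch below completes the two lines shown in the diff under that assumption; `SpecConfig` is simplified and `dockerParams` is a hypothetical target type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SpecConfig is a simplified stand-in: a type name plus untyped params.
type SpecConfig struct {
	Type   string
	Params map[string]interface{}
}

// DecodeSpecConfig round-trips Params through JSON into the typed struct T.
func DecodeSpecConfig[T any](spec *SpecConfig) (*T, error) {
	if spec == nil {
		return nil, fmt.Errorf("spec config is nil")
	}
	params, err := json.Marshal(spec.Params)
	if err != nil {
		return nil, fmt.Errorf("marshaling params: %w", err)
	}
	out := new(T)
	if err := json.Unmarshal(params, out); err != nil {
		return nil, fmt.Errorf("decoding params into %T: %w", out, err)
	}
	return out, nil
}

// dockerParams is a hypothetical typed view of a docker spec's params.
type dockerParams struct {
	Image      string   `json:"Image"`
	Entrypoint []string `json:"Entrypoint"`
}

func main() {
	spec := &SpecConfig{
		Type: "docker",
		Params: map[string]interface{}{
			"Image":      "ubuntu:latest",
			"Entrypoint": []string{"echo"},
		},
	}
	got, err := DecodeSpecConfig[dockerParams](spec)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *got)
}
```

The trade-off is that a generic decoder knows nothing about the target type, so any type-specific defaults or validation that the per-type decoders perform would still have to run afterwards.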

wdbaruni · 2024-05-07T07:21:55Z

pkg/models/storage_source_config.go

+//nolint:gocyclo
+func StorageStringToSpecConfig(sourceURI, destinationPath, alias string, options map[string]string) (*InputSource, error) {
+	sourceURI = strings.Trim(sourceURI, " '\"")
+	destinationPath = strings.Trim(destinationPath, " '\"")


Same comment as for the publisher parser.

wdbaruni · 2024-05-07T07:22:44Z

pkg/models/wasm_spec_config.go

+// WasmEngineSpec contains necessary parameters to execute a wasm job.
+type WasmEngineSpec struct {
+	// EntryModule is a Spec containing the WASM code to start running.


similar comment as in the docker engine spec

wdbaruni · 2024-05-07T07:22:58Z

pkg/models/wasm_spec_config.go

+// WasmEngineSpec contains necessary parameters to execute a wasm job.
+type WasmEngineSpec struct {
+	// EntryModule is a Spec containing the WASM code to start running.


similar comment as in the docker engine spec

wdbaruni · 2024-05-07T14:03:44Z

pkg/util/idgen/idgen.go

+
+	// TaskNamePrefix is the prefix of a system generated task name.
+	TaskNamePrefix = "t-name-"
+
+	// JobNamePrefix is the prefix of a system generated job name.
+	JobNamePrefix = "j-name-"


I don't like this. Names are not system generated; our expectation is for the user to provide them as a reference to their jobs.

Yeah, me neither, I just needed something to generate a name if one wasn't provided. What is the expected behavior if the user doesn't provide a name? The bacalhau job run command will fail; should docker run and wasm run do the same? Note, this would be a change in their existing behavior.

You are right about the job name. This PR should help: #4005
For the task, use main.

frrist · 2024-05-22T18:29:54Z

Scope of this change is larger than expected and there are some breaking changes in the new job spec. Descoping this to a smaller change via #4020.

frrist marked this pull request as ready for review May 2, 2024 01:04

frrist commented May 2, 2024

View reviewed changes

frrist requested a review from wdbaruni May 2, 2024 01:21

wdbaruni reviewed May 7, 2024

View reviewed changes

frrist added 6 commits May 21, 2024 12:16

refactor: migrate docker & wasm run to v2 api

ee8d7c9

refactor: simplify flags and fix docker tests

100eb4d

fix: uncomment code in get cli

be4f0d0

fix: uncomment code in run cli

9168096

wasm cli tests pass

289e5d8

wip

c796ccd

frrist force-pushed the frrist/cli/deprecate-model branch from 793da06 to c796ccd Compare May 21, 2024 21:05

wdbaruni mentioned this pull request May 22, 2024

Migrate docker CLI to API v2 #4020

Draft

frrist closed this May 22, 2024
