Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add custom key to projects table, backfilling based on current project name, and API support #9134

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

corban-beaird
Copy link
Contributor

@corban-beaird corban-beaird commented Apr 9, 2024

Ticket

ET-97
ET-101
ET-102
ET-164

Description

Introduce custom project key (similar to JIRA's project key), to be able to refer to projects. This adds the column to the DB & backfills existing projects based on their existing project name.

This also includes allowing the key to be specified in the CreateProject endpoint (PostProject).

Test Plan

  • Automated Integration Testing

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

Commentary

Work will need to be done to allow updating project keys of existing projects (that is being handled in ET-98). This will likely include updating the logic for how we determine uniqueness, since the requirement will need to ensure no past project keys had been in use.

Copy link

netlify bot commented Apr 9, 2024

Deploy Preview for determined-ui ready!

Name Link
🔨 Latest commit a165316
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/663ead19cb4ae000082fdc68
😎 Deploy Preview https://deploy-preview-9134--determined-ui.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Apr 9, 2024

Codecov Report

Attention: Patch coverage is 25.00000% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 46.49%. Comparing base (c3901c8) to head (a165316).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9134      +/-   ##
==========================================
+ Coverage   45.14%   46.49%   +1.35%     
==========================================
  Files        1230      743     -487     
  Lines      154567   106604   -47963     
  Branches     2405     2405              
==========================================
- Hits        69782    49568   -20214     
+ Misses      84590    56841   -27749     
  Partials      195      195              
Flag Coverage Δ
backend 34.23% <ø> (-7.52%) ⬇️
harness 64.07% <25.00%> (-0.01%) ⬇️
web 36.06% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
harness/determined/common/api/bindings.py 40.20% <25.00%> (-0.02%) ⬇️

... and 487 files with indirect coverage changes

@corban-beaird corban-beaird changed the title chore: add custom project keys to project table and backfill feat: add custom key to projects table, backfilling based on current project name, and API support Apr 9, 2024
@corban-beaird corban-beaird marked this pull request as ready for review April 19, 2024 14:40
@corban-beaird corban-beaird requested review from a team as code owners April 19, 2024 14:40
Copy link
Contributor

@AmanuelAaron AmanuelAaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.

p := &projectv1.Project{}
err = a.m.db.QueryProto("insert_project", p, req.Name, req.Description,
req.WorkspaceId, curUser.ID)
req.WorkspaceId, curUser.ID, projectKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worried about a race condition here where two projects with the same first 3 chars are being created at the same time. Both of them would generate the same key. Perhaps we can create the project first without the key, then generates the key filtering out projects created after its own "created_at" time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the race condition here will cause a rough UX, "if no key is specified during project creation, but getting an error about a key conflict" doesn't really make sense.

The suggest approach would require dropping the unique constraint on the key column (which might not be the worst move, since I think that is really just a bandaid on an application level solution for how we handle uniqueness on project keys). I'd also lean away from using the created_at time, since it's susceptible to clock drift/syncs, and could cause some funky flakes. We could piggy back of the strict event ordering work done by streaming updates and leverage the seq value (would do the same thing we'd want to do with a timestamp, without the risk of clock drift).

count_suffix := CAST(count_value as text);

-- Concatenate prefix and count suffix
RETURN lower(prefix || count_suffix);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure about this way of generating project keys. Imagine that we have a project with key aa12, then we have a project with name aa1, running this function would run into duplicated keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, you have a good point, I don't think the case you brought up is an immediate issue, assuming this function is only used to backfill keys for existing projects, but it will be a problem if it gets used to generate keys for new projects when there are existing keys.

If this is only being used to backfill:

  • if we have a project key of aa12, that would mean that:
    • this is either the 12 instance of the prefix aa (assuming the name of the project is aa)
    • or the 2nd instance of the prefix aa1 (assuming the name of the project is aa1)
  • so if we have a project with the prefix aa1
    • the count would be 3 since we only care about the keys that start with aa1
    • since there is only going to be 2 or 3 other cases where that string appears as a key
      • aa11 and aa12 (and possibly aa1)

That being said, I do see a case where:

  • project name: aa1...
  • resulting key: aa11
  • If there are 10 other projects with the name aa (can happen with enough workspaces)
  • then we'd end up with:
    • aa1 -> aa11
    • aa -> aa2
    • aa -> aa3
    • aa -> aa4
    • ...
    • aa -> aa10
    • aa -> aa11 << error on duplicate key

Now I don't think this case is as likely to happen since it's requires the project name to only be 2 characters, there to be identical 2 character projects across 10 different workspaces; but it does demonstrate it's possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we sort the way these things are handled to avoid cases like this, so instead of handling any project with aa1... first, we'd instead handle all the aa projects first, and then that would result with the aa1... project being assigned the key aa12 instead of aa11.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't put us in the clear for using this style of approach for generating keys for new projects since we can't order how those will be processed or how they had been handled previously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For new project, since we allow passing key when creating, it would by alright to just reject duplicated key as long as we have a clear error message

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose we have keys a1 a2 a3 in the system, then we delete the row associated with a1, then a newly created project with name a would have the generated key to be a3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is also an issue with this approach if used for anything outside of backfilling. If instead of counting we parse the number at the end of the key, and get the max from the set, then we could avoid that problem, but then we get into a tough spot since the generated keys will never re-use earlier values.

But then if do the parsing + max approach, someone could specify a key of a99999 and if we don't have way to cycle back to free keys we'll be stuck.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd be in a position where we'd have to search for the free key, we could do a binary search by comparing the max & count in each subset; but this all seems like over engineering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really want something like this, I could imagine instead of basing it off an incremental value, keep the 1-3 character prefix from the project name + 2-4 random characters (w/ a retry).

That avoids the RBAC concern outside of key collisions + avoids never re-using freed keys. This does mean that the key generation performance is probabilistic and degrades as we start hitting the upper limits of a given prefix.

If it wasn't a product requirement to have the generated keys be based on the names, either assigning this to be 5 random alphanumeric character (or just be equivalent to the project id by default), we could avoid this problem entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus using the random character suffix means we don't have to enforce and order on the update & can use the same function to generate keys outside of backfill

@corban-beaird corban-beaird force-pushed the corban-beaird-add-project-key branch from 23446b9 to 21c95ba Compare May 7, 2024 16:52
Copy link
Contributor

@gt2345 gt2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to allow changing project key after creation? My concern is that, it's possible that the generated key is an annoying/racist word, and the user might want to change that

projects
SET
key = function_generate_project_key(5, 3, name)
WHERE key IS NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a migration down file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, down migrations got removed recently 🥳

@@ -50,3 +66,61 @@ func ProjectIDByName(ctx context.Context, workspaceID int, projectName string) (
}
return &pID, nil
}

// GenerateProjectKey generates a unique project key for a project based on its name.
func generateProjectKey(ctx context.Context, tx bun.Tx, projectName string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we switch from using sql function to using go function?

return nil
}
}

func (a *apiServer) PostProject(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add the ability to change the project key. Or do we want to add that in a separate ticket?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a test case for when someone adds with a key that ends in a number like "TEST99". Additionally in that case what would happen if we try to add a another project with that same key

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants