New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add custom key to projects table, backfilling based on current project name, and API support #9134
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for determined-ui ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9134 +/- ##
==========================================
+ Coverage 45.14% 46.49% +1.35%
==========================================
Files 1230 743 -487
Lines 154567 106604 -47963
Branches 2405 2405
==========================================
- Hits 69782 49568 -20214
+ Misses 84590 56841 -27749
Partials 195 195
Flags with carried forward coverage won't be shown. Click here to find out more.
|
b30f36e
to
23446b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall.
master/internal/api_project.go
Outdated
p := &projectv1.Project{} | ||
err = a.m.db.QueryProto("insert_project", p, req.Name, req.Description, | ||
req.WorkspaceId, curUser.ID) | ||
req.WorkspaceId, curUser.ID, projectKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worried about a race condition here where two projects with the same first 3 chars are being created at the same time. Both of them would generate the same key. Perhaps we can create the project first without the key, then generates the key filtering out projects created after its own "created_at" time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the race condition here will cause a rough UX, "if no key is specified during project creation, but getting an error about a key conflict" doesn't really make sense.
The suggest approach would require dropping the unique constraint on the key column (which might not be the worst move, since I think that is really just a bandaid on an application level solution for how we handle uniqueness on project keys). I'd also lean away from using the created_at time, since it's susceptible to clock drift/syncs, and could cause some funky flakes. We could piggy back of the strict event ordering work done by streaming updates and leverage the seq value (would do the same thing we'd want to do with a timestamp, without the risk of clock drift).
count_suffix := CAST(count_value as text); | ||
|
||
-- Concatenate prefix and count suffix | ||
RETURN lower(prefix || count_suffix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too sure about this way of generating project keys. Imagine that we have a project with key aa12
, then we have a project with name aa1
, running this function would run into duplicated keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, you have a good point, I don't think the case you brought up is an immediate issue, assuming this function is only used to backfill keys for existing projects, but it will be a problem if it gets used to generate keys for new projects when there are existing keys.
If this is only being used to backfill:
- if we have a project key of
aa12
, that would mean that:- this is either the 12 instance of the prefix
aa
(assuming the name of the project isaa
) - or the 2nd instance of the prefix
aa1
(assuming the name of the project isaa1
)
- this is either the 12 instance of the prefix
- so if we have a project with the prefix
aa1
- the count would be 3 since we only care about the keys that start with
aa1
- since there is only going to be 2 or 3 other cases where that string appears as a key
aa11
andaa12
(and possiblyaa1
)
- the count would be 3 since we only care about the keys that start with
That being said, I do see a case where:
- project name:
aa1...
- resulting key:
aa11
- If there are 10 other projects with the name
aa
(can happen with enough workspaces) - then we'd end up with:
aa1
->aa11
aa
->aa2
aa
->aa3
aa
->aa4
- ...
aa
->aa10
aa
->aa11
<< error on duplicate key
Now I don't think this case is as likely to happen since it's requires the project name to only be 2 characters, there to be identical 2 character projects across 10 different workspaces; but it does demonstrate it's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we sort the way these things are handled to avoid cases like this, so instead of handling any project with aa1...
first, we'd instead handle all the aa
projects first, and then that would result with the aa1...
project being assigned the key aa12
instead of aa11
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't put us in the clear for using this style of approach for generating keys for new projects since we can't order how those will be processed or how they had been handled previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For new project, since we allow passing key when creating, it would by alright to just reject duplicated key as long as we have a clear error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose we have keys a1 a2 a3
in the system, then we delete the row associated with a1
, then a newly created project with name a
would have the generated key to be a3
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is also an issue with this approach if used for anything outside of backfilling. If instead of counting we parse the number at the end of the key, and get the max from the set, then we could avoid that problem, but then we get into a tough spot since the generated keys will never re-use earlier values.
But then if do the parsing + max approach, someone could specify a key of a99999
and if we don't have way to cycle back to free keys we'll be stuck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd be in a position where we'd have to search for the free key, we could do a binary search by comparing the max & count in each subset; but this all seems like over engineering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really want something like this, I could imagine instead of basing it off an incremental value, keep the 1-3 character prefix from the project name + 2-4 random characters (w/ a retry).
That avoids the RBAC concern outside of key collisions + avoids never re-using freed keys. This does mean that the key generation performance is probabilistic and degrades as we start hitting the upper limits of a given prefix.
If it wasn't a product requirement to have the generated keys be based on the names, either assigning this to be 5 random alphanumeric character (or just be equivalent to the project id by default), we could avoid this problem entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus using the random character suffix means we don't have to enforce and order on the update & can use the same function to generate keys outside of backfill
23446b9
to
21c95ba
Compare
…ck on duplicate project name
… testing, adjust transaction isolation level
21c95ba
to
a165316
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we plan to allow changing project key after creation? My concern is that, it's possible that the generated key is an annoying/racist word, and the user might want to change that
projects | ||
SET | ||
key = function_generate_project_key(5, 3, name) | ||
WHERE key IS NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a migration down file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, down migrations got removed recently 🥳
@@ -50,3 +66,61 @@ func ProjectIDByName(ctx context.Context, workspaceID int, projectName string) ( | |||
} | |||
return &pID, nil | |||
} | |||
|
|||
// GenerateProjectKey generates a unique project key for a project based on its name. | |||
func generateProjectKey(ctx context.Context, tx bun.Tx, projectName string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we switch from using sql function to using go function?
return nil | ||
} | ||
} | ||
|
||
func (a *apiServer) PostProject( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add the ability to change the project key. Or do we want to add that in a separate ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a test case for when someone adds with a key that ends in a number like "TEST99". Additionally in that case what would happen if we try to add a another project with that same key
Ticket
ET-97
ET-101
ET-102
ET-164
Description
Introduce custom project key (similar to JIRA's project key), to be able to refer to projects. This adds the column to the DB & backfills existing projects based on their existing project name.
This also includes allowing the key to be specified in the
CreateProject
endpoint (PostProject).Test Plan
Checklist
docs/release-notes/
.See Release Note for details.
Commentary
Work will need to be done to allow updating project keys of existing projects (that is being handled in ET-98). This will likely include updating the logic for how we determine uniqueness, since the requirement will need to ensure no past project keys had been in use.