Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize job names with runner? #185

Open
jbusecke opened this issue Apr 22, 2024 · 2 comments
Open

Sanitize job names with runner? #185

jbusecke opened this issue Apr 22, 2024 · 2 comments

Comments

@jbusecke
Copy link
Contributor

jbusecke commented Apr 22, 2024

I am running repeatedly into issues with dataflow job names that are not allowed on dataflow like here.

Runner warns about this, but fails. This is acceptable in general but adds avoidable overhead for the maintainer (who often already has a steep learning curve ahead of them). It is also only an issue on Dataflow and thus would be nicely abstracted away on the runner level?

I suggest that we sanitize jobnames within runner in the following way:

  • replace offending characters (e.g. _ with -)
  • trim the job name to the max length (with appropriate warning)
@cisaacstern
Copy link
Member

This block of code is supposed to be preventing this from happening. So there must be something missing from the logic here:

@validate("job_name")
def _validate_job_name(self, proposal):
"""
Validate that job_name conforms to ^[a-z][-_0-9a-z]{0,62}$ regex.
That's what is valid in dataflow job names, so let's keep everyone
in that range.
Dataflow job names adhere to the following GCP cloud labels requirements:
https://cloud.google.com/resource-manager/docs/creating-managing-labels#requirements
"""
validating_regex = r"^[a-z][-_0-9a-z]{0,62}$"
if not re.match(validating_regex, proposal.value):
raise ValueError(
f"job_name must match the regex {validating_regex}, instead found {proposal.value}"
)
return proposal.value

@jbusecke
Copy link
Contributor Author

@cisaacstern I think it is 100% working! It raises exactly that error. But what I am proposing is to actually change the jobname within runner so that the error is not raised in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants