fix: Deterministic foreach task id's for Argo Workflows (#1704)
* change to more deterministic task ids for argo workflows
* cleanup comments
* wip: rework task id generation for nested foreaches
* wip: stash
* wip: rework
* wip: possibly finally working nested foreach joins
* cleanup
* cleanup and fix non-nested foreach input params
* cleanup
* one more fix for foreach join cases.
* add more thorough comment on foreach step task id generation
* rename max-split to split-cardinality
* more comments on task id generation
* cleanup generate_input_paths
* comment updates
* changes

---------

Co-authored-by: savin <savingoyal@gmail.com>
1 parent 9989bc6, commit cb200c5
Showing 4 changed files with 188 additions and 39 deletions.
@@ -0,0 +1,23 @@

    import sys
    from hashlib import md5


    def generate_input_paths(step_name, timestamp, input_paths, split_cardinality):
        # => run_id/step/:foo,bar
        run_id = input_paths.split("/")[0]
        foreach_base_id = "{}-{}-{}".format(step_name, timestamp, input_paths)

        ids = [_generate_task_id(foreach_base_id, i) for i in range(int(split_cardinality))]
        return "{}/{}/:{}".format(run_id, step_name, ",".join(ids))


    def _generate_task_id(base, idx):
        # For foreach splits generate the expected input-paths based on split_cardinality and base_id.
        # newline required at the end due to 'echo' appending one in the shell side task_id creation.
        task_str = "%s-%s\n" % (base, idx)
        hash = md5(task_str.encode("utf-8")).hexdigest()[-8:]
        return "t-" + hash


    if __name__ == "__main__":
        print(generate_input_paths(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]))
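
The point of the added file is that a foreach join can recompute its children's task ids from inputs it already knows, instead of looking them up. The sketch below restates the two functions from the diff in a self-contained form and calls them twice to show that the output is reproducible. The step name, timestamp, and input path are hypothetical values chosen for illustration, and the helper names `task_id` and `input_paths_for` are not from the commit.

```python
from hashlib import md5


def task_id(base, idx):
    # Same scheme as _generate_task_id in the diff: hash "<base>-<idx>\n"
    # (the newline mirrors what shell `echo` appends) and keep the last
    # 8 hex characters of the MD5 digest.
    digest = md5(("%s-%s\n" % (base, idx)).encode("utf-8")).hexdigest()
    return "t-" + digest[-8:]


def input_paths_for(step_name, timestamp, input_paths, split_cardinality):
    # Same scheme as generate_input_paths in the diff.
    run_id = input_paths.split("/")[0]
    base = "{}-{}-{}".format(step_name, timestamp, input_paths)
    ids = [task_id(base, i) for i in range(int(split_cardinality))]
    return "{}/{}/:{}".format(run_id, step_name, ",".join(ids))


# Hypothetical arguments; in the commit they arrive as strings via sys.argv.
args = ("process", "1700000000", "argo-run-1/start/t-abc12345", "3")

first = input_paths_for(*args)
second = input_paths_for(*args)

print(first)            # run_id/step/:<three comma-separated task ids>
print(first == second)  # identical inputs give identical paths
```

Because each id depends only on the step name, the timestamp, the parent input path, and the split index, any process that knows those four values should be able to derive the same list without coordination.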