-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the taxonomy script for testing classification #442
base: main
Are you sure you want to change the base?
Refactor the taxonomy script for testing classification #442
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are good with me, but they will conflict with @AlmightyYakob's PR #433.
np, it is a pretty easy thing to fix-up. I also added the script I used to do the testing. Usage is something like python validate_taxonomy.py $(ls trace*) |
OPENSEARCH_ENDPOINT = os.environ["OPENSEARCH_ENDPOINT"] | ||
OPENSEARCH_USERNAME = os.environ["OPENSEARCH_USERNAME"] | ||
OPENSEARCH_PASSWORD = os.environ["OPENSEARCH_PASSWORD"] | ||
return job_error_class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following #440 (comment), why can't we:
return job_error_class | |
return matching_patterns, job_error_class |
?
break | ||
else: | ||
job_error_class = "other" | ||
job_error_class = classify(job_trace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one good reason is not filtering to a single match can lead to mislabeled classifications. The ordering below is meant to identify the "best' error match.
For example, file_not_found
is only an error if there are no other errors since it is possible for uploading artifacts to not find a file, but not be a relevant error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mean to hide taxonomy, that's another field (as you can see I am returning both). I'd like to know all the matches. Then I can try to query things like: how many fetch errors also match a network error? Or stuff like that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, thought I approved this already. This will need to be rebased to resolve conflicts but the changes LGTM
I re-worked this a bit so I can use this script as an imported module for testing.