Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the taxonomy script for testing classification #442

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kwryankrattiger
Copy link
Collaborator

I re-worked this a bit so I can use this script as an imported module for testing.

import upload_gitlab_failure_logs as taxonomy

with read("my-test-trace.txt", "r") as trace:
    print(taxonomy.classify(trace))

Copy link
Collaborator

@mvandenburgh mvandenburgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are good with me, but they will conflict with @AlmightyYakob's PR #433.

@kwryankrattiger
Copy link
Collaborator Author

np, it is a pretty easy thing to fix-up. I also added the script I used to do the testing.

Usage is something like

python validate_taxonomy.py $(ls trace*)

OPENSEARCH_ENDPOINT = os.environ["OPENSEARCH_ENDPOINT"]
OPENSEARCH_USERNAME = os.environ["OPENSEARCH_USERNAME"]
OPENSEARCH_PASSWORD = os.environ["OPENSEARCH_PASSWORD"]
return job_error_class
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following #440 (comment), why can't we:

Suggested change
return job_error_class
return matching_patterns, job_error_class

?

break
else:
job_error_class = "other"
job_error_class = classify(job_trace)
Copy link
Member

@alalazo alalazo Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here do something like:

Suggested change
job_error_class = classify(job_trace)
matches, job_error_class = classify(job_trace)
job_input_data["matches"] = sorted(matches)

?1

Footnotes

  1. Obviously, "matches" is just a placeholder, so any better name for the field works for me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one good reason is not filtering to a single match can lead to mislabeled classifications. The ordering below is meant to identify the "best' error match.

For example, file_not_found is only an error if there are no other errors since it is possible for uploading artifacts to not find a file, but not be a relevant error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean to hide taxonomy, that's another field (as you can see I am returning both). I'd like to know all the matches. Then I can try to query things like: how many fetch errors also match a network error? Or stuff like that...

Copy link
Collaborator

@mvandenburgh mvandenburgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, thought I approved this already. This will need to be rebased to resolve conflicts but the changes LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants