Investigate go-git concurrency issues #162

Closed
nerdalert opened this issue Apr 5, 2024 · 3 comments · Fixed by #372
Labels
bug, priority, worker

Comments

@nerdalert
Member

Consistently seeing the fetch retries reach 3/3. We should either bump the retry count or dig into the root cause. Adding more retries isn't a big deal imo since it's just a local thing, but it's worth a little time investigating.

Example:

2024-04-04T18:54:19.938Z	INFO	cmd/generate.go:246	Processing job	{"job": "5"}
2024-04-04T18:54:20.037Z	INFO	cmd/generate.go:310	Retrying fetching updates, attempt 2/3	{"job": "5", "pr_number": "4", "work_dir": "/data", "origin": "origin"}
2024-04-04T18:54:22.097Z	INFO	cmd/generate.go:310	Retrying fetching updates, attempt 3/3	{"job": "5", "pr_number": "4", "work_dir": "/data", "origin": "origin"}
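
For reference, the "bump the retries" option could look roughly like the sketch below: a bounded retry loop with exponential backoff. This is not the worker's actual code. `retry`, `errTransient`, and the `fetchUpdates` closure are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical stand-in for a fetch error worth retrying.
var errTransient = errors.New("some refs were not updated")

// retry calls op up to attempts times and doubles the delay after each failure.
// It returns nil on the first success, otherwise the last error wrapped with
// the number of attempts made.
func retry(attempts int, initialDelay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := initialDelay
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// fetchUpdates simulates a fetch that fails twice and then succeeds.
	fetchUpdates := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	err := retry(5, 10*time.Millisecond, fetchUpdates)
	fmt.Println(calls, err) // prints: 3 <nil>
}
```

A higher limit only helps if the failure is transient. If every attempt fails for the same reason, the loop just delays the error.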
@nerdalert nerdalert self-assigned this Apr 5, 2024
@russellb
Member

russellb commented Apr 6, 2024

Another option would be to bail on go-git and use a wrapper for the real git.

https://github.com/ldez/go-git-cmd-wrapper?tab=readme-ov-file
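
A minimal sketch of the "use the real git" idea, assuming the standard library's `os/exec` rather than the linked wrapper (whose API is not shown here). `fetchPRArgs`, `runGit`, and the local ref name `refs/remotes/<remote>/pr/<n>` are hypothetical choices for illustration; `refs/pull/<n>/head` is the ref GitHub exposes for a pull request.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// fetchPRArgs builds the arguments for `git fetch` of a pull request head.
// The leading "+" lets git update the local ref even when the update is not
// a fast-forward, for example after the PR branch was force-pushed.
func fetchPRArgs(remote string, pr int) []string {
	refspec := fmt.Sprintf("+refs/pull/%d/head:refs/remotes/%s/pr/%d", pr, remote, pr)
	return []string{"fetch", remote, refspec}
}

// runGit runs the git binary with args inside dir and returns its combined
// stdout and stderr, so callers can log what git actually reported.
func runGit(ctx context.Context, dir string, args ...string) (string, error) {
	cmd := exec.CommandContext(ctx, "git", args...)
	cmd.Dir = dir
	out, err := cmd.CombinedOutput()
	if err != nil {
		return string(out), fmt.Errorf("git %s: %w", strings.Join(args, " "), err)
	}
	return string(out), nil
}

func main() {
	args := fetchPRArgs("origin", 80)
	fmt.Println("git " + strings.Join(args, " "))
	// To run it for real (needs git on PATH and an existing clone in workDir):
	//   out, err := runGit(context.Background(), workDir, args...)
}
```

Shelling out trades an in-process dependency for a requirement that git is installed on the worker host.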

@russellb russellb added the worker and bug labels Apr 6, 2024
@nerdalert
Member Author

@mingxzhao reported back-to-back errors in this job (instructlab/taxonomy#80):

Apr 18 15:49:24 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: 2024-04-18T15:49:24.581Z        INFO        cmd/generate.go:647        Job took 34s to run
Apr 18 15:49:24 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: 2024-04-18T15:49:24.633Z        INFO        cmd/generate.go:518        Job done        {"job": "163", "pr_number": "715", "work_dir": "/home/fedora/instruct-lab-bot", "origin": "origin", "out_dir": "/home/fedora/instruct-lab-bot/precheck-pr-715-8e87e8e15b71c0fe3e9aa7aced29211003d5c138"}
Apr 18 15:57:55 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: 2024-04-18T15:57:55.883Z        INFO        cmd/generate.go:341        Processing job 164        {"job": "164"}
Apr 18 15:57:56 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: 2024-04-18T15:57:56.034Z        INFO        cmd/generate.go:561        Retrying fetching updates, attempt 2/3        {"job": "164", "pr_number": "80", "work_dir": "/home/fedora/instruct-lab-bot", "origin": "origin"}
Apr 18 15:57:58 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: 2024-04-18T15:57:58.167Z        INFO        cmd/generate.go:561        Retrying fetching updates, attempt 3/3        {"job": "164", "pr_number": "80", "work_dir": "/home/fedora/instruct-lab-bot", "origin": "origin"}
Apr 18 15:58:00 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: 2024-04-18T15:58:00.660Z        ERROR        cmd/generate.go:389        git operations error: could not fetch PR branch: some refs were not updated
Apr 18 15:58:00 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: github.com/instruct-lab/instruct-lab-bot/worker/cmd.(*Worker).processJob
Apr 18 15:58:00 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]:         /home/fedora/worker-intruct-lab-bot/instruct-lab-bot/worker/cmd/generate.go:389
Apr 18 15:58:00 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: github.com/instruct-lab/instruct-lab-bot/worker/cmd.glob..func1.4
Apr 18 15:58:00 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]:         /home/fedora/worker-intruct-lab-bot/instruct-lab-bot/worker/cmd/generate.go:196
Apr 18 17:40:59 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: 2024-04-18T17:40:59.883Z        INFO        cmd/generate.go:341        Processing job 165        {"job": "165"}
Apr 18 17:41:00 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: 2024-04-18T17:41:00.382Z        ERROR        cmd/generate.go:389        git operations error: could not fetch PR branch: some refs were not updated
Apr 18 17:41:00 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: github.com/instruct-lab/instruct-lab-bot/worker/cmd.(*Worker).processJob
Apr 18 17:41:00 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]:         /home/fedora/worker-intruct-lab-bot/instruct-lab-bot/worker/cmd/generate.go:389
Apr 18 17:41:00 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]: github.com/instruct-lab/instruct-lab-bot/worker/cmd.glob..func1.4
Apr 18 17:41:00 ip-172-31-13-79.us-east-2.compute.internal instruct-lab-bot-worker[226392]:         /home/fedora/worker-intruct-lab-bot/instruct-lab-bot/worker/cmd/generate.go:196

@nerdalert nerdalert removed their assignment Apr 29, 2024
@nerdalert nerdalert added and removed the good first issue label Apr 29, 2024
@nerdalert
Member Author

Here is another instance of this: instructlab/taxonomy#1060

nerdalert added a commit to nerdalert/instructlab-bot that referenced this issue May 23, 2024
- delete the repo after the job completes
- clone at the beginning of a job
- resolves the retries that were present in all of the job runs

Closes instructlab#162

Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
nerdalert added a commit to nerdalert/instructlab-bot that referenced this issue May 23, 2024
- delete the repo after the job completes
- clone at the beginning of a job
- resolves the retries that were present in all of the job runs
e.g. "Retrying fetching updates, attempt 1/5 2/5 3/5..."

Closes instructlab#162

Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
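
The approach in the commit message above (clone at the start of a job, delete the repo when it completes) can be sketched as follows. This is an illustration under stated assumptions, not the code from the referenced commit. `withFreshWorkDir`, `exists`, and the `clone` and `job` callbacks are hypothetical names, and the stub clone only writes a file where a real one would call go-git or the git CLI.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// exists reports whether path is present on disk.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// withFreshWorkDir creates an empty temporary directory, lets clone populate
// it, runs job inside it, and removes the directory afterwards, even when
// clone or job fails. No repository state survives from one job to the next.
func withFreshWorkDir(clone, job func(dir string) error) (err error) {
	dir, err := os.MkdirTemp("", "job-*")
	if err != nil {
		return fmt.Errorf("create work dir: %w", err)
	}
	defer func() {
		// Report a cleanup failure only if nothing else went wrong.
		if rmErr := os.RemoveAll(dir); rmErr != nil && err == nil {
			err = rmErr
		}
	}()
	if err = clone(dir); err != nil {
		return fmt.Errorf("clone: %w", err)
	}
	return job(dir)
}

func main() {
	var used string
	// Stub clone: a real implementation would clone the repository into dir.
	clone := func(dir string) error {
		return os.WriteFile(filepath.Join(dir, "README"), []byte("stub"), 0o644)
	}
	job := func(dir string) error {
		used = dir
		fmt.Println("during job:", exists(filepath.Join(dir, "README")))
		return nil
	}
	err := withFreshWorkDir(clone, job)
	fmt.Println("after job:", exists(used), err)
}
```

The cost is one full clone per job. In exchange, stale local refs from an earlier job cannot affect a later fetch.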
@nerdalert nerdalert mentioned this issue May 23, 2024
@mergify mergify bot closed this as completed in #372 May 24, 2024