Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account for delays in GitHub API updates #187

Open
bluekeyes opened this issue Jun 8, 2020 · 6 comments
Open

Account for delays in GitHub API updates #187

bluekeyes opened this issue Jun 8, 2020 · 6 comments
Labels
bug Problems with existing functionality

Comments

@bluekeyes
Copy link
Member

We've observed a rare race condition where the commit status API does not show the latest values for statuses when processing a status event. This can cause PRs to get stuck unmerged, since there are no more events.

I'm not sure how to fix this, but one option might be to thread through the status payload from the event and merge it with the values from the API. This doesn't help if two different status payloads arrive at the same time and both don't appear in the API response but it might help in the common case.

@bluekeyes bluekeyes added the bug Problems with existing functionality label Jun 8, 2020
@nmiyake
Copy link
Contributor

nmiyake commented Jun 9, 2020

Just wanted to flag that I don't think this is necessarily rare -- looks like there were at least 5-6 instances of this occurring for a round of excavator PRs, and anecdotally I find that there are often excavator PRs that are unmerged on my repos (wasn't aware of the trick to poke via comment until today)

@aeweidne
Copy link

I believe we may be hitting this as well, in combination with Policybot. Policybot UI indicates that the policy matches and is fulfilled, but the commit status seems to be hung and Bulldozer won't merge until someone pokes with a comment. It appears to happen once we have more than 10-15 PRs being managed at once actively, which seems low

@bluekeyes
Copy link
Member Author

@shimmerjs in your case, it sounds like GitHub is still showing a pending status for the Policybot check. Is that correct? If so, leaving a comment also triggers Policybot evaluation again, which will post a new status and trigger Bulldozer. Viewing the UI re-evaluates the policy (without posting a new status), so it's possible for it to disagree with the last posted status.

If you have access to the Policybot logs, there might be something there to explain why the status wasn't updated. 10-15 PRs is very low compared to the number we manage internally, but there are several possible reasons why it might happen: rate limits, saturated processing queue, etc.

@aeweidne
Copy link

@bluekeyes thanks for the response.

in your case, it sounds like GitHub is still showing a pending status for the Policybot check.

Specifically, it is actually still showing failure (no policies match -- our default policy simply requires CI status checks and a review from a member of our team, so the Policybot status remains in a failure state until those CI status checks are present & passing), instead of pending. It seems to get hung in that state even though the UI will correctly show pending (CI status checks passed + no review) or successful (CI status checks passed + one review).

If you have access to the Policybot logs, there might be something there to explain why the status wasn't updated.

I do. I was tailing them on Friday when I noticed the issue and wasn't able to see anything logged as error/warn that indicated anything was incorrectly happening, in fact I saw that policies were being successfully applied with the intended result consistently, which led me to thinking that Bulldozer might be the issue.

I did however notice some calls for team membership were returning size: -1, and assumed it was because Policybot wasn't pulling children team members by default -- I added all children team members to the parent team referenced in our Policybot configuration and that doesn't seem to have changed anything. I will continue investigating as cycles become available and sharing information here. Unless you believe I should file a Policybot issue?

Thanks again

@bluekeyes
Copy link
Member Author

Ah, thank you for clarifying. That does sound like it could be a similar issue, where Policy Bot does not detect the new status after receiving a status event. Since the root cause might be different and the solution will probably be different, it would help to have a separate issue on Policy Bot. If you can include the logs for one of the evaluations that should have updated but didn't, that would be great, but I understand if it's too hard to remove the identifying information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Problems with existing functionality
Projects
None yet
Development

No branches or pull requests

3 participants