
Timeout while generating release notes #648

Open

RCTycooner opened this issue Mar 18, 2020 · 37 comments · Fixed by #893, #894, #896, #897 or #912

Comments

@RCTycooner

RCTycooner commented Mar 18, 2020

Current Status of this Issue

This is an issue with the underlying Azure DevOps Node SDK or REST API endpoints, not this task. Hence, an issue has been raised in the appropriate repo: #425

Historically the only workaround has been to always place this task, and any associated tasks (e.g. one that uploads the generated release notes to a WIKI), in a dedicated YML pipeline job. This allows the task to be easily retried without rerunning the whole pipeline.
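
For illustration, a minimal sketch of that layout in a YAML pipeline. The task name, version and inputs below are placeholders rather than verified values, so check the extension documentation for the exact ones to use:

```yaml
# Sketch of the workaround described above: keep release notes generation (and
# the WIKI upload) in their own job so only this job needs re-running on a timeout.
# Task references and inputs are illustrative placeholders.
jobs:
- job: Build
  steps:
  - script: echo "normal build and test steps go here"

- job: ReleaseNotes
  dependsOn: Build
  steps:
  - task: XplatGenerateReleaseNotes@3   # placeholder task reference
    inputs:
      outputfile: '$(Build.ArtifactStagingDirectory)/releasenotes.md'
  # ...followed by whatever task uploads the generated file to the WIKI
```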

However, with 3.37.x the error handling in the task has been changed to treat any error that occurs whilst accessing the API as a warning, allowing the task to run on and generate whatever release notes it can from the data it has managed to get. This is far from perfect, but a bit more robust.

Issue Details

Azure DevOps Extension you are using

Generate Release Notes (Crossplatform)

Where are you running it?

  • Azure DevOps Service (VSTS)

Version of Extension/Task

Current public version, according to the log: 2.28.7

Expected behaviour and actual behaviour

Trying to see if this is a solution for us, but I keep getting a timeout:
[error]Error: connect ETIMEDOUT 13.107.43.18:443

Any idea on how to fix or circumvent it?

Full log attached:
log.txt

@rfennell
Owner

Never seen that before.

A quick search of the web suggests this can occur when proxies are involved between the agent and Azure DevOps.

So I guess the obvious questions are:

  • is this a hosted or private agent?
  • is it possible to try with a different agent, to see if it's agent/connection specific?
  • if you rerun on the same agent does it always fail at the same point?
  • is the list of WIs all the ones you expect, i.e. is the failure while trying to get a WI or on the next step/API call?

@RCTycooner
Author

It's running on a hosted agent. Currently a vs2017-win2016.

Rerunning on the same agent gives the same error, but it happened after a different message (in the "getting the details of xxxx" step it got down to a lower number after some retries).
Not sure what that number represents, as we're only up to about 400 work items in this project.

This is the first time I'm using the tool, so it's trying to collect information about a few months of history. So I'd expect about 350-380 work items to be fetched.

@rfennell
Owner

The Getting the details of xxx message is the task looping across all the builds that have occurred in the past. In your case, as it is the first run, that is every build.

I could improve the logging to give the number it intends to loop over, and put the word 'build' in the message to make it clearer.

I would expect the next message to be Detected xxxx commits/changesets and xxxx workitems between the builds, which we obviously are not getting to.
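
To illustrate the kind of loop being described, here is a rough sketch using the public azure-devops-node-api calls. It is not the task's actual code; the organisation URL, project name and PAT environment variable are placeholders:

```typescript
import * as azdev from "azure-devops-node-api";

// Rough illustration of the "Getting the details of xxx" loop described above,
// not the task's real implementation. Org URL, project and PAT are placeholders.
async function scanBuilds(): Promise<void> {
    const authHandler = azdev.getPersonalAccessTokenHandler(process.env.AZURE_DEVOPS_PAT ?? "");
    const connection = new azdev.WebApi("https://dev.azure.com/yourorg", authHandler);
    const buildApi = await connection.getBuildApi();

    // On a first run there is no previous successful release to compare against,
    // so every past build in the project ends up being inspected.
    const builds = await buildApi.getBuilds("YourProject");
    console.log(`Found ${builds.length} builds to check`);

    for (const build of builds) {
        console.log(`Getting the details of build ${build.id}`);
        // ...per-build API calls (changes, work items) happen here and are the
        // point at which the ETIMEDOUT errors have been observed.
    }
}
```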

That said, I will get a new build out with more logging to give a clue where the problem lies. I have now added that logging and run it through all the tests in my release pipeline:

  • It ran ok on a private agent (on my laptop)
  • The logs showed it found 154 builds
  • When I tried it on the same hosted VS2017 build agents you are using, I got timeouts too.

I think it is an API throttling issue.

I will continue to investigate, but any chance you could try a private agent?

@rfennell
Owner

Try 2.29.4 which has just been released. I have altered the retry settings for all Azure DevOps API calls.

It has fixed the issue for me on hosted agents

Let me know how it goes

@RCTycooner
Author

Hi, just tried with v2.29.4. Same error, but on a different build. See the attached log.
log2.txt

I'm unable to try it on a private agent as we don't have any running at this time.

@rfennell
Owner

Looks like it managed to get further this time. But I can see that you have a lot more builds to scan than I have in my tests (700+ versus about 150), so more load if it is a throttling issue.

I have had another look at the Azure DevOps Node API and added another timeout parameter. I set this to what I think is 10 seconds, assuming it is measured in milliseconds; it is a bit unclear.

I made that change last night (I am in the UK so GMT) and tested it, but it timed out with a hosted agent. I tried it again this morning and it got further. I wonder if it is Azure DevOps load based?

Anyway, I have upped the timeout to 30s and retested; it still failed for a hosted agent, but again was fine for a private agent.

I continue to investigate for a real solution
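
For reference, a sketch of the kind of setting being described, assuming the timeout in question is the socketTimeout option (measured in milliseconds) that the azure-devops-node-api WebApi constructor accepts; the values, org URL and token handling are illustrative only:

```typescript
import * as azdev from "azure-devops-node-api";
import { IRequestOptions } from "azure-devops-node-api/interfaces/common/VsoBaseInterfaces";

// Illustrative only: socketTimeout is specified in milliseconds, so the 10s
// experiment above would be 10000 and the 30s retest 30000.
const options: IRequestOptions = {
    socketTimeout: 30000
};

const authHandler = azdev.getPersonalAccessTokenHandler(process.env.AZURE_DEVOPS_PAT ?? "");
const connection = new azdev.WebApi("https://dev.azure.com/yourorg", authHandler, options);
```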

@RCTycooner
Author

Thanks, haven't looked at the code yet, but perhaps a retry mechanism is in order? (I'm in Belgium btw, GMT+1)
It sure could be that they throttle it based on the overall load of Azure DevOps.

@rfennell
Owner

The SDK is meant to handle throttling.

I have just done an upgrade of all packages to make sure I have the latest and retested. I still get the problem.

I am going to reach out to Microsoft to see if they have a suggestion, or if I need to write my own retry logic

I will get back when I have more to report
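
If hand-rolled retry logic does turn out to be needed, it would look something like the sketch below. This is a generic illustration of the idea, not code from this task:

```typescript
// Generic retry-with-backoff helper, shown only to illustrate the "own retry
// logic" idea discussed above; it is not part of the task's code base.
async function withRetry<T>(action: () => Promise<T>, attempts = 5, delayMs = 1000): Promise<T> {
    let lastError: unknown;
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            return await action();
        } catch (err) {
            lastError = err;
            console.log(`Attempt ${attempt} of ${attempts} failed (${err}), retrying in ${delayMs}ms`);
            await new Promise(resolve => setTimeout(resolve, delayMs));
            delayMs *= 2; // exponential backoff between attempts
        }
    }
    throw lastError;
}

// Example usage: wrap any Azure DevOps API call that has been timing out.
// const builds = await withRetry(() => buildApi.getBuilds("YourProject"));
```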

@rfennell rfennell self-assigned this Apr 4, 2020
@rfennell
Owner

Just an update, I still have no real fix for this. The workaround seems to be to use a private build agent. I cannot get a timeout with a private agent.

I have an issue logged on the Azure DevOps Node SDK repo: 378

@RCTycooner
Author

Hi, thanks for the update. I've actually written my own code to get the release notes based on what you were doing. So far it seems to work fine. Not sure what I'm doing differently...

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@github-actions github-actions bot added the Stale label Jul 15, 2020
@rfennell rfennell removed the Stale label Jul 15, 2020
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@rfennell
Owner

Closing this as the upstream issue has been closed with no solution

@mturner67

Hi Richard, did you figure out anything more on this? I'm seeing this same timeout issue using version 3.30.21. Same scenario where it's the first time generating release notes for a "classic" release with lots of WI history. It is also sporadic: it will fail, then it might succeed on a subsequent re-deploy.

@rfennell
Owner

Yes I have been seeing the same. I have raised issues on various repos and forums, but with no answer.

I think I am going to have to implement my own retry logic, as the code in the Azure DevOps Node SDK does not seem to do the job.

@mturner67

Thanks for the update. Do you have an ETA on the new retry logic?

@rfennell rfennell reopened this Dec 1, 2020
@rfennell
Owner

rfennell commented Dec 1, 2020

I have started on a PR; assuming it works (though it is hard to test) it should be out soon.

@rfennell
Owner

rfennell commented Dec 2, 2020

Release 3.33.2 now creates the Azure DevOps REST SDK connection objects; hopefully that should clear the retry issues.

rfennell added a commit that referenced this issue Dec 2, 2020
… related to a connection to Azure DevOps (#896)

fixes #648
@rfennell rfennell reopened this Dec 2, 2020
@mturner67

Okay... my first run was successful with no retries, but the 2nd run failed all 5 retries with a wait of 60 seconds. The log below from the first retry appears to show the objects being recreated, but it immediately returns the ETIMEDOUT. It's like the host is hosed once this starts with the DevOps API.
[screenshot of the retry log]

@rfennell
Owner

rfennell commented Dec 2, 2020

Well, I am out of ideas, short of a huge pause.

@rfennell
Owner

rfennell commented Dec 3, 2020

I have ripped out all my retry logic, as it does not help, and wired my configuration options to the retry setting built into the Node SDK. The new retry setting is 20 attempts. The pause time is now ignored, as this is not an option in the SDK.

This release will come out as 3.34.x later today.
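
For anyone wanting to see what the retry setting built into the Node SDK looks like, a hedged sketch: the option names come from the SDK's IRequestOptions, the value of 20 mirrors the setting mentioned above, and the org URL and token handling are placeholders:

```typescript
import * as azdev from "azure-devops-node-api";
import { IRequestOptions } from "azure-devops-node-api/interfaces/common/VsoBaseInterfaces";

// Sketch of enabling the SDK's built-in retry rather than hand-rolled logic.
// allowRetries/maxRetries are standard IRequestOptions fields; there is no
// pause/backoff interval option, which is why the task's pause time is ignored.
const options: IRequestOptions = {
    allowRetries: true,
    maxRetries: 20
};

const authHandler = azdev.getPersonalAccessTokenHandler(process.env.AZURE_DEVOPS_PAT ?? "");
const connection = new azdev.WebApi("https://dev.azure.com/yourorg", authHandler, options);
```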

@mturner67

Thanks for the update. I'll give it a test when it's available.

@rfennell rfennell reopened this Dec 3, 2020
@mturner67

I tested the new version using the default retry of 20. The first 2 runs failed, with the 3rd run succeeding. For now, I'll inform my client to re-deploy until a successful run, which seems to occur within 2-3 re-deploys. I'll keep you posted if we uncover anything new that might help.

@rfennell
Owner

rfennell commented Dec 3, 2020

Sorry I can't get any further. I am going to log a new, more detailed issue with the Azure DevOps Node SDK team.

@mturner67

Thanks Richard... I can tell you've tried everything you can on your side. I've seen this type of odd platform issue myself in other areas of Azure.

@rfennell
Owner

Current Status of this Issue

This is an issue with the underlying Azure DevOps Node SDK or REST API endpoints, not this task. Hence, an issue has been raised in the appropriate repo: #425

The best workaround at present is to always place this task, and any associated tasks (e.g. one that uploads the generated release notes to a WIKI), in a dedicated YML pipeline job. This allows the task to be easily retried without rerunning the whole pipeline.

@rfennell
Owner

Historically the only workaround has been to always place this task, and any associated tasks (e.g. one that uploads the generated release notes to a WIKI), in a dedicated YML pipeline job. This allows the task to be easily retried without rerunning the whole pipeline.

However, with 3.37.x the error handling in the task has been changed to treat any error that occurs whilst accessing the API as a warning, allowing the task to run on and generate whatever release notes it can from the data it has managed to get. This is far from perfect, but a bit more robust.
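
As a sketch of the "treat API errors as warnings" behaviour described above (illustrative only, assuming the task uses azure-pipelines-task-lib; not the task's actual code):

```typescript
import * as tl from "azure-pipelines-task-lib/task";

// Illustration of the 3.37.x behaviour described above: an API failure is logged
// as a pipeline warning and processing carries on with whatever data has already
// been fetched, instead of failing the task outright. Not the real implementation.
async function fetchOrWarn<T>(description: string, call: () => Promise<T[]>): Promise<T[]> {
    try {
        return await call();
    } catch (err) {
        tl.warning(`Error while ${description}, continuing with partial data: ${err}`);
        return [];
    }
}
```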
