This repository has been archived by the owner on Aug 28, 2019. It is now read-only.

Intermittent Netlify failures #7925

Closed
2 tasks done
jp-sauve opened this issue Dec 7, 2017 · 14 comments

Comments

@jp-sauve
Contributor

jp-sauve commented Dec 7, 2017

Netlify builds failing occasionally, requiring a manual restart.

It looks like Node is throwing an "ENOSPC" error, which means either that the filesystem is full or that there are not enough free inotify file watchers available for what the build is doing.

There is advice and a brief description here:
https://github.com/guard/listen/wiki/Increasing-the-amount-of-inotify-watchers#the-technical-details

If raising the watcher limit isn't possible, then I think limiting the number of concurrent runs (of Gatsby, I think) might help.
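For anyone who wants to check the limit on their own machine, here is a minimal sketch. It assumes a Linux host where the limit is exposed at the standard `/proc/sys/fs/inotify/max_user_watches` path; the function name `read_inotify_limit` is only illustrative, and nothing here confirms what the Netlify build image exposes.

```python
from pathlib import Path
from typing import Optional

# Standard Linux location of the per-user inotify watch limit.
INOTIFY_LIMIT_PATH = Path("/proc/sys/fs/inotify/max_user_watches")


def read_inotify_limit(path: Path = INOTIFY_LIMIT_PATH) -> Optional[int]:
    """Return the inotify watch limit, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Not Linux, /proc not mounted, or unexpected file contents.
        return None
```

Calling `read_inotify_limit()` with no arguments reads the real limit; it returns `None` on systems without that file.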

✅️ By submitting this issue, I have verified the following

  • Checked to see if the issue has already been discussed before. 🤔️
  • If proposing new content to be added, made sure enough details were provided. 🔍️
@Ethan-Arrowood
Member

There is also an aspect of timing here. I noticed this evening, after powering through some PRs, that Netlify will back up and won't trigger on resolved-conflict merges. This creates the illusion that the PR is ready to be merged, when the Netlify build script simply hasn't been triggered.

@systimotic
Member

systimotic commented Dec 7, 2017

Thanks for reporting this @jp-sauve.

Some impacted PRs were shared on Gitter: #2830 #2848 #2840
The build logs: log 1 log 2 log 3 log 4

We actually ran into this issue before: #204 Sadly, I don't think we can use the same solution on Netlify.
We're not the only ones having this issue with Gatsby: gatsbyjs/gatsby#1767

Gatsby isn't great at handling massive sites like the Guide (yet). We had an issue on their repo about the memory usage: gatsbyjs/gatsby#1829 gatsbyjs/gatsby#445 I think splitting the build might also fix this issue.

I'm not sure if there's anything we can do with Netlify to fix this issue, but I've sent an email to their support to ask if they have any suggestions.

/cc @Bouncey Any ideas?


@Ethan-Arrowood Netlify building was enabled about two weeks ago, so PRs older than that won't have its tests yet. I'm not sure when it's triggered on updates to PRs that existed before it was added.

@brycekahle

Hey folks, it looks like Gatsby had a bug that was fixed in 1.9.17. Can you try upgrading?

@systimotic
Member

Thanks @brycekahle! I actually came across that issue, but I didn't realize it could be related to this.

Unfortunately, we currently can't update to a newer version of Gatsby because of gatsbyjs/gatsby#1979 (still an issue in 1.9). 1.9.17 does not work anymore because of incompatible dependency versions, and any version after it has the call stack issue.

Looking into it a bit more, I don't think the fix in 1.9.17 would have helped here, as it should only apply to development, not the build.

@brycekahle

> Unfortunately, we currently can't update to a newer version of Gatsby because of gatsbyjs/gatsby#1979 (still an issue in 1.9)

I'll take a look and see if I can contribute to getting that issue fixed. Sounds like you are pushing the limits of Gatsby. 😄

> Looking into it a bit more, I don't think the fix in 1.9.17 would have helped here, as it should only apply to development, not the build.

You may be right here. I do find it odd that Gatsby is using inotify during a production build. Watchers are usually only used for "watch" style operations. I wonder if their usage has been removed in later versions too.

@KyleAMathews
Contributor

Hey y'all! There were several PRs recently which should help out with your larger site :-)

Would love to see if the latest Gatsby is working for ya!

@QuincyLarson
Contributor

@KyleAMathews Awesome! Thanks for the heads-up, and for everything you're doing with Gatsby! Loved your Jam Stack Podcast interview!

@Bouncey Could you take a look at these PRs when you get a chance?

@systimotic
Member

Hi @KyleAMathews! Thank you for all of the hard work you've put into Gatsby ❤️

I tested this with gatsby@1.9.146. It runs perfectly in Firefox, and also builds fine if I allocate the extra memory, but gatsbyjs/gatsby#1979 is still present in the latest stable Chrome (63).

Here's where it gets really weird: This issue did not appear in Chrome Canary when its version was 63, but now that 63 is stable it's broken again. The current Canary (65) doesn't show this issue either, but I'm not confident that it will still work when 65 is stable 😅

I've also found that the call stack numbers I found previously are definitely not reliable, and I'm not seeing a large call stack in the profiler anymore.

I'll add my findings to the original issue tomorrow, but I honestly have no idea what's causing this and how to fix it. I'd love to hear if anybody has any suggestions.


I'm thinking about seeing whether it would be possible to create a build script that catches the ENOSPC error and does a limited number of retries as a temporary workaround. 🤔
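A rough sketch of what such a retry wrapper could look like, written in Python purely for illustration. The function name `build_with_retries`, the retry count, and the idea of detecting the failure by searching the output for the string "ENOSPC" are all assumptions, not something the project is confirmed to use.

```python
import subprocess
import sys
import time
from typing import Sequence, Tuple


def build_with_retries(
    command: Sequence[str], max_attempts: int = 3, delay_seconds: float = 0.0
) -> Tuple[int, int]:
    """Run command, retrying only when its output mentions ENOSPC.

    Returns (exit code of the last run, number of attempts made).
    """
    returncode = 1
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(command, capture_output=True, text=True)
        # Pass the build output through so the build log stays readable.
        sys.stdout.write(result.stdout)
        sys.stderr.write(result.stderr)
        returncode = result.returncode
        if returncode == 0:
            return returncode, attempt
        if "ENOSPC" not in result.stdout + result.stderr:
            # A different failure: retrying is unlikely to help.
            return returncode, attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return returncode, max_attempts
```

It would be called with the real build command as a list of arguments, and the build's exit status would be taken from the first element of the returned tuple. Retrying only on ENOSPC keeps genuine build errors from being masked by repeated runs.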

@johnkennedy9147
Contributor

This seems to have gotten worse. Now, if there are concurrent deploy preview builds running, the second will fail every time. Interestingly, the production builds seem to be fine with multiple running concurrently. Is there any difference in the configuration, resources, ...?

@QuincyLarson
Contributor

@johnkennedy9147 Thanks for letting us know.

@Bouncey Is Guide running the latest version of Gatsby? I'm curious whether the changes @KyleAMathews mentioned will alleviate these Netlify build issues.

@Bouncey
Member

Bouncey commented Feb 24, 2018

We have tried upgrading multiple times, but we have an issue with Chrome 63 giving itself a stack overflow. All other browsers are fine.

I can try again after the beta release.

@QuincyLarson
Contributor

@Bouncey Thanks for looking into this. When you say "beta release", do you mean of beta.freecodecamp.org?

@Bouncey
Member

Bouncey commented Feb 25, 2018

So, after a bit of digging, this issue goes as far down as Node itself. On Linux, Node's fs module uses inotify to watch files and directories, and inotify has a limit of 8192 watchers. That limit can be raised, but doing so requires root privileges, which we do not have inside the Netlify build image.

We may have to rethink our implementation.
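To get a feel for how close a content tree comes to that limit, a small sketch like the one below counts the directories and files under a root. This is only an upper-bound estimate under an assumption: which paths actually receive an inotify watch depends on the watcher library and its configuration, which this thread does not establish. The name `count_watch_candidates` is made up for the example.

```python
import os
from typing import Tuple


def count_watch_candidates(root: str) -> Tuple[int, int]:
    """Return (directories, files) under root, counting root itself as a directory."""
    directories = 0
    files = 0
    # os.walk visits root and every subdirectory once (symlinked dirs are not followed).
    for _current, _subdirs, filenames in os.walk(root):
        directories += 1
        files += len(filenames)
    return directories, files
```

Comparing either number against the 8192 figure above gives a quick sense of whether the watcher limit is plausibly the bottleneck for a given tree.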

@Bouncey
Member

Bouncey commented Mar 29, 2018

A reply from Netlify on this ongoing issue:

> We used to share the same pool of 8GB of ram for four separate build bots, but it was changed about 2 months ago with the move to Kubernetes because some builds that used large amounts of memory and CPU resources would then use most of the resources in a single build box, which meant that people would have inconsistent results. Now each build has access to 2GB of memory which makes builds more predictable. Previously it was possible for one build to starve out another one. There isn't a way to increase the memory in our regular buildbot networks

After some investigating locally, our build process currently tops out at 8.186GB of RAM. This is the cause of the once-intermittent failures, which are now permanent until we can find a way around the 2GB limit.

Currently I am having to build the site locally in order to deploy it.

Closing until we have a viable solution to reduce our build resource consumption by 75%+
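For reference, the 75%+ figure follows from the two numbers above (an 8.186GB peak against a 2GB per-build limit). A quick sketch of the arithmetic, with an illustrative helper name:

```python
def required_reduction_percent(current_gb: float, limit_gb: float) -> float:
    """Percentage by which usage must shrink to fit within the limit."""
    return (1 - limit_gb / current_gb) * 100


# Peak build usage versus the per-build memory limit quoted by Netlify.
print(f"{required_reduction_percent(8.186, 2.0):.1f}%")  # prints 75.6%
```

This treats the 2GB limit as entirely available to the build process, which is an optimistic assumption, so the real target is likely somewhat higher.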

@Bouncey Bouncey closed this as completed Mar 29, 2018

8 participants