out of memory #395

Open
akohlbecker opened this issue Sep 24, 2021 · 11 comments

@akohlbecker

After starting the service, Facette consumes about 19 GB of RAM and then terminates with fatal error: runtime: out of memory

Please find the corresponding system log attached to this ticket.
facette-outofmem.log

@vbatoufflet
Member

It clearly looks like a nasty memory leak.

To help pinpoint what's going on: you're using the RRD provider, right? And did you just let the service run with a refresh interval (if so, what was the value)?

@akohlbecker
Author

Hi Vincent,

Yes, I am using the RRD provider. The refresh interval for the UI is set to 10s, which I guess is the default.
As far as I know there is no other refresh interval setting, or did I miss something?

The memory leak occurs purely on startup of the server, even if it is running completely without any client interaction.
I assume it initially scans the rrd sources and the memory leak occurs in that initialization phase.

Can I provide more information to help pin down the cause of the memory leak?

Best,
Andreas

@vbatoufflet
Member

Sadly, Facette has no pprof endpoint to help gather information on this leak.

I'll try to reproduce the issue and build a custom version with such an endpoint.

Regarding the refresh interval, I was talking about the one from the provider definition (default is 0, i.e. no refresh):

[Screenshot: "New provider" form in the Facette administration panel, 2021-09-27]

@vbatoufflet
Member

vbatoufflet commented Sep 27, 2021

hey @akohlbecker,

Quick additional questions:

  1. Do you have any symbolic links in your RRD folders, and is there any possible symlink loop (this case is indeed not handled)?
  2. Did you try running Facette in debug mode (see https://github.com/facette/facette/blob/master/docs/examples/facette.yaml#L6) to see if there is something weird in the logs?

Regards,
Vincent

@akohlbecker
Author

The refresh interval for the RRD provider was set to 10. After setting it to 500, Facette seems to behave normally.
These are milliseconds, not seconds as I was assuming originally, right?

I will report long-term results of this settings change tomorrow.

@vbatoufflet
Member

> These are milliseconds, not seconds as I was assuming originally, right?

The unit for this setting is seconds: https://docs.facette.io/latest/api/providers/#create-a-provider

It would be surprising if raising this interval fixed the issue. The leak might just take longer to trigger.

@akohlbecker
Author

akohlbecker commented Sep 28, 2021

Hi Vincent,

Your expectation was correct: memory consumption increased overnight and is now at about 10 GB.

I checked the rrd folders for symlinks and found none.

The debug log contains many entries like these (elided here):

2021/09/28 09:59:51.058919 DEBUG: poller[collectd]: inserted record {Origin: "collectd", Source: .... in "collectd" catalog
2021/09/28 09:59:51.058926 DEBUG: poller[collectd]: does not match "/average$" sieve pattern, discarding: .... 

Apart from 588313 such lines, logged after running Facette for 10 minutes with an RRD provider refresh interval of 500, the log contains only these entries:

2021/09/28 09:59:49.834419 INFO: http: started
2021/09/28 09:59:49.834420 INFO: poller: started
2021/09/28 09:59:49.834630 INFO: http: listening on "127.0.0.1:12003"
2021/09/28 09:59:49.835399 DEBUG: poller[collectd]: started
2021/09/28 09:59:50.867371 DEBUG: poller[collectd]: restored previous catalog state in 1.031827415s
2021/09/28 09:59:50.867445 DEBUG: poller[collectd]: refreshing "collectd" provider
2021/09/28 10:08:10.867708 DEBUG: poller[collectd]: refreshing "collectd" provider

@vbatoufflet
Member

Which version are you running, the latest release or a build from master?

What's your platform/architecture, linux/amd64?

I built a custom version yesterday with a pprof HTTP endpoint that would allow us to visualize heap usage while the service is running. I'll try to push it to a dedicated branch tonight, but I can build the binary for you to test if you want.

@akohlbecker
Author

I am running version 0.5.1 on linux/amd64 (4.9.0-15-amd64 #1 SMP Debian 4.9.258-1 (2021-03-08) x86_64 GNU/Linux)

It would be great if you could build the binary for me.

TNX
Andreas

@vbatoufflet
Member

Hi @akohlbecker,

Sorry for the delay here.

I just pushed changes to a dedicated branch that register debugging pprof endpoints on the web server, see 593ce3f.

Here is a .deb file embedding those changes (note: I had to gzip it to make GitHub accept it 🤷):
facette_0.6.0-0~git20211005.593ce3f7_amd64.deb.gz

Once installed and the issue triggered, you should be able to visualize heap information from the running service using:

go tool pprof -http=:8080 http://your-facette-instance:12003/debug/pprof/heap

If you could extract it for me, it would be great too:

curl -s http://your-facette-instance:12003/debug/pprof/heap >facette-heap.out

@akohlbecker
Author

akohlbecker commented Oct 11, 2021

Hi Vincent,

thank you for the binary.

BTW: Since I set the refresh interval for the RRD provider to 500, I have had no further problems.

I installed the debug build anyway, and here is the pprof output:
facette-heap.out.gz

Cheers
Andreas
