Memory leak when replication targets are unhealthy #562

Open
pbedat opened this issue Jan 28, 2024 · 4 comments

Comments

@pbedat

pbedat commented Jan 28, 2024

I have the following replication setup:

We are replicating ~40 databases to two CIFS mounts (Storage Boxes hosted by Hetzner). Those boxes sometimes undergo maintenance, and one of the mounts goes bad.
This happened on 27.01. around 11:15 AM, and from that point on memory increased steadily with each snapshot interval (4h):

Screenshot from 2024-01-28 19-57-58

I'm also seeing the following error logs:

"snapshots: cannot fetch generations: open /opt/copilot/data/replicas-2/milchsackfabrik/generations: permission denied"

It's not a huge problem, but I just wanted to report it.
If you need anything, I could set up a replication environment and take readings.

Edit: v0.3.13

@hifi
Collaborator

hifi commented Jan 28, 2024

It's likely the LZ4 compression library: it never frees a buffer pool it keeps, and error conditions seem to hit it hard. This pattern is quite suspicious, though, as I'd expect it to be able to reuse the same pool, so it might not be it.

If you can take a pprof memory dump from a repro, it would show where the memory is actually going.

@pbedat
Author

pbedat commented Jan 28, 2024

@hifi I guess you are right about lz4.

profile001

pprof.litestream.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz

PS: I took the snapshot from a replication setup with only 4 DBs. Hence the smaller size.

@hifi
Collaborator

hifi commented Jan 28, 2024

Yeah, unfortunately there's nothing Litestream can do except change the implementation or support multiple compression schemes with different tradeoffs, like CPU over RAM.

There's an open issue against the lz4 library about the ever-growing pool, but the author hadn't responded to it the last time I checked.

@pbedat
Author

pbedat commented Jan 28, 2024

Never mind. It's not a problem, since I get alerts when replicas go unhealthy and can respond before memory runs out.
Thanks for clarifying it!


2 participants