tmp directory not being cleared #179

Open
caleb15 opened this issue Apr 20, 2021 · 4 comments

@caleb15
Contributor

caleb15 commented Apr 20, 2021

One of our containers has been a naughty naughty boy:

root@influxdb0:/var/lib/docker/overlay2# sudo docker exec pganalyze-mirror-heavy du / -h -d 1
331.6G	/tmp
... other results truncated
root@influxdb0:/var/lib/docker/overlay2# sudo docker exec pganalyze-mirror-heavy du /tmp -h -d 2 -a
913.9M	/tmp/515771707
211.4M	/tmp/206147942
212.5M	/tmp/563747060
211.1M	/tmp/219882107
583.1M	/tmp/675476903
214.5M	/tmp/481994459
132.7M	/tmp/201391050
215.8M	/tmp/994851552
... etc
root@influxdb0:/var/lib/docker/overlay2# sudo docker exec pganalyze-mirror-heavy head /tmp/515771707
2021-03-27 12:54:33 UTC:<ip removed>(<port removed>):<user removed>@<removed>:[<removed>]:LOG:  connection authorized: user=<removed> database=<removed> <removed>
2021-03-27 12:54:33 UTC:<ip removed>(<port removed>):<user removed>@<removed>:[<removed>]:LOG:  connection authorized: user=<removed> database=<removed> <removed>
2021-03-27 12:54:33 UTC:<ip removed>(<port removed>):[<user removed>]<removed>un<removed>]:[5190]:LOG:  connection received: host=<removed> port=<removed>
2021-03-27 12:54:33 UTC:<ip removed>(<port removed>):<user removed>@<removed>:[<removed>]:LOG:  disconnection: session time: 0:00:00.036 user=<removed> database=<removed> host=<removed> port=<removed>
... and so on

The /tmp directory has been growing steadily in size ever since March 18th, when we set a 2 GB memory limit on the container. Oddly, it's just pganalyze-mirror-heavy that ran into the issue. The other mirrors are normal:

root@influxdb0:/var/lib/docker/overlay2# sudo docker exec pganalyze-mirror du /tmp -h -d 2 -a
4.0K	/tmp
root@influxdb0:/var/lib/docker/overlay2# sudo docker exec pganalyze-mirror1 du /tmp -h -d 2 -a
4.0K	/tmp

I'll leave the container alone for a bit in case you want to look into it. I'll have to delete the container soon so it doesn't take up all the space on the server.

@lfittl
Member

lfittl commented Apr 20, 2021

@caleb15 Any chance that container has been crashing and restarting multiple times? (there is a condition when the collector crashes that can leave behind files)
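
One way a crash can leave files behind, sketched here purely as an illustration: if a process is killed with SIGKILL, no in-process cleanup gets a chance to run. The thread does not say how the collector manages its temp files, so the `leak_demo` helper and the `scratch.tmp` file name below are hypothetical stand-ins, not the collector's actual behavior.

```shell
#!/bin/sh
# Hypothetical demo: an EXIT trap stands in for in-process cleanup.
# The trap never runs when the process receives SIGKILL, so the file stays.
leak_demo() {
    workdir=$1
    marker="$workdir/scratch.tmp"

    # Child: register cleanup, create a temp file, then idle until killed.
    sh -c 'trap "rm -f \"$1\"" EXIT; : > "$1"; sleep 5' sh "$marker" \
        >/dev/null 2>&1 &
    child=$!

    # Wait (at most ~10 seconds) for the child to create its file.
    tries=0
    while [ ! -e "$marker" ] && [ "$tries" -lt 10 ]; do
        sleep 1
        tries=$((tries + 1))
    done

    # SIGKILL cannot be caught, so the EXIT trap is skipped.
    kill -KILL "$child"
    status=0
    wait "$child" || status=$?
    echo "exit_status=$status"

    if [ -e "$marker" ]; then
        echo "leftover=yes"
    else
        echo "leftover=no"
    fi
}
```

Calling `leak_demo "$(mktemp -d)"` should report the file as left over, with an exit status of 128 plus the signal number (9 for SIGKILL).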

@caleb15
Contributor Author

caleb15 commented Apr 20, 2021

Maybe? Status says the container has only been up for 35 minutes. Is there someplace where I can see how many times / when the container has crashed? 🤔

@lfittl
Member

lfittl commented Apr 20, 2021

@caleb15 You could look in "docker ps -a" and then run "docker logs" on any instances that have run recently but have been stopped.

Also, in case it's an out-of-memory condition, review any memory limit settings you have on the container, or memory limits on the system overall. For some systems with high log volume we've seen issues with high memory usage during log parsing.
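
As a complement to `docker ps -a` and `docker logs`, a minimal sketch of reading the restart count, OOM flag, and memory limit in one go with `docker inspect`. The `container_health` function name and the output labels are made up for this example. `RestartCount`, `State.OOMKilled`, and `HostConfig.Memory` are fields of the inspect output, but how long the OOM flag persists across restarts may vary, so treat it as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: summarize restart and memory settings for a container.
container_health() {
    name=$1
    # HostConfig.Memory is reported in bytes; 0 means no limit is set.
    docker inspect --format \
        'restarts={{.RestartCount}} oom_killed={{.State.OOMKilled}} mem_limit_bytes={{.HostConfig.Memory}}' \
        "$name"
}
```

For the container in this thread that would be `container_health pganalyze-mirror-heavy`, run with whatever privileges `docker` needs on the host (for example via `sudo`).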

@caleb15
Contributor Author

caleb15 commented Apr 20, 2021

nvm, I found it: https://serverfault.com/a/909267/512362

You're exactly right, looks like the container has been repeatedly dying from OOM:

caleb@influxdb0.CLOUD100:~$ sudo docker events --since=120m
2021-04-20T04:06:12.653592243Z container oom 490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0 (image=quay.io/pganalyze/collector:v0.36.0, name=pganalyze-mirror-heavy)
2021-04-20T04:06:12.886462028Z container die 490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0 (exitCode=137, image=quay.io/pganalyze/collector:v0.36.0, name=pganalyze-mirror-heavy)
2021-04-20T04:06:12.982581172Z network disconnect c9bbf642c5c183bff16c2933ecdb988d3b0be6ea6cbc33b856780aacd138617b (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, name=bridge, type=bridge)
2021-04-20T04:06:13.049410541Z volume unmount ea1bd04f298155a3275c1da243e766e818779d5150d4fa6ece492a91983ad925 (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, driver=local)
2021-04-20T04:06:13.084732358Z network connect c9bbf642c5c183bff16c2933ecdb988d3b0be6ea6cbc33b856780aacd138617b (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, name=bridge, type=bridge)
2021-04-20T04:06:13.091250793Z volume mount ea1bd04f298155a3275c1da243e766e818779d5150d4fa6ece492a91983ad925 (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, destination=/state, driver=local, propagation=, read/write=true)
2021-04-20T04:06:13.504545174Z container start 490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0 (image=quay.io/pganalyze/collector:v0.36.0, name=pganalyze-mirror-heavy)
2021-04-20T05:09:49.300653527Z container oom 490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0 (image=quay.io/pganalyze/collector:v0.36.0, name=pganalyze-mirror-heavy)
2021-04-20T05:09:49.606117173Z container die 490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0 (exitCode=137, image=quay.io/pganalyze/collector:v0.36.0, name=pganalyze-mirror-heavy)
2021-04-20T05:09:49.734725089Z network disconnect c9bbf642c5c183bff16c2933ecdb988d3b0be6ea6cbc33b856780aacd138617b (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, name=bridge, type=bridge)
2021-04-20T05:09:49.855292233Z volume unmount ea1bd04f298155a3275c1da243e766e818779d5150d4fa6ece492a91983ad925 (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, driver=local)
2021-04-20T05:09:49.900484616Z network connect c9bbf642c5c183bff16c2933ecdb988d3b0be6ea6cbc33b856780aacd138617b (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, name=bridge, type=bridge)
2021-04-20T05:09:49.904166747Z volume mount ea1bd04f298155a3275c1da243e766e818779d5150d4fa6ece492a91983ad925 (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, destination=/state, driver=local, propagation=, read/write=true)
2021-04-20T05:09:50.301050392Z container start 490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0 (image=quay.io/pganalyze/collector:v0.36.0, name=pganalyze-mirror-heavy)
2021-04-20T05:16:09.101193715Z container oom 490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0 (image=quay.io/pganalyze/collector:v0.36.0, name=pganalyze-mirror-heavy)
2021-04-20T05:16:09.273349383Z container die 490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0 (exitCode=137, image=quay.io/pganalyze/collector:v0.36.0, name=pganalyze-mirror-heavy)
2021-04-20T05:16:09.402131327Z network disconnect c9bbf642c5c183bff16c2933ecdb988d3b0be6ea6cbc33b856780aacd138617b (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, name=bridge, type=bridge)
2021-04-20T05:16:09.419386173Z volume unmount ea1bd04f298155a3275c1da243e766e818779d5150d4fa6ece492a91983ad925 (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, driver=local)
2021-04-20T05:16:09.444734606Z network connect c9bbf642c5c183bff16c2933ecdb988d3b0be6ea6cbc33b856780aacd138617b (container=490ccc38d41dfbe712a80bfa48e033f62173ff6e90167e192702ceaf3812b6c0, name=bridge, type=bridge)
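
To tally events like the ones above instead of reading them by eye, here is a rough sketch that counts `container oom` lines per container name. The `count_oom_events` name is invented for this example, and the parsing assumes the default text format shown in this log, with `name=` inside the trailing parentheses. For reference, `exitCode=137` is 128 + 9, i.e. the process was terminated by SIGKILL.

```shell
#!/bin/sh
# Hypothetical helper: read `docker events` text on stdin and print
# "<count> <container name>" for every container with OOM events.
count_oom_events() {
    awk '
        $2 == "container" && $3 == "oom" {
            # Pull the value of name=... out of the attribute list.
            if (match($0, /name=[^,)]+/)) {
                name = substr($0, RSTART + 5, RLENGTH - 5)
                counts[name]++
            }
        }
        END { for (n in counts) print counts[n], n }
    '
}
```

`docker events` keeps streaming unless it is given an end time, so a bounded call such as `sudo docker events --since=120m --until="$(date +%s)" | count_oom_events` is probably what you want; the `date +%s` part assumes a `date` that supports `%s`.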

We only have so much memory :( I suppose we will need to buy more memory for our EC2 instance.

I might suggest changing pganalyze to clear /tmp at start to avoid disk space piling up. Not sure if it's worth it to do that considering the root problem but maybe? Up to you. Feel free to close this.
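
Until something like that exists in the collector, a wrapper entrypoint could do the cleanup itself. This is only a sketch of the idea, not how pganalyze does or would implement it: the `clear_stale_tmp` name and the 60-minute default are assumptions, and `-maxdepth` and `-mmin` are widely supported `find` extensions rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: delete regular files directly under DIR that were
# last modified more than MINUTES minutes ago (default 60).
clear_stale_tmp() {
    dir=$1
    minutes=${2:-60}

    # Refuse missing or obviously dangerous targets.
    case "$dir" in
        ""|"/")
            echo "refusing to clean '$dir'" >&2
            return 1
            ;;
    esac

    # Nothing to do if the directory does not exist.
    [ -d "$dir" ] || return 0

    find "$dir" -maxdepth 1 -type f -mmin "+$minutes" -exec rm -f -- {} +
}
```

In an entrypoint script this might be called as `clear_stale_tmp /tmp 60` before handing over to the container's original command with `exec "$@"`. The age threshold is there so that files a freshly started process is still writing are left alone.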
