Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgresql database size #89

Open
solarmon opened this issue Sep 28, 2020 · 21 comments
Open

Postgresql database size #89

solarmon opened this issue Sep 28, 2020 · 21 comments

Comments

@solarmon
Copy link

Hi,

I'm having problems my dockered Homer 7 setup. It was working and now I am not able to login. Initially login was saying the password is wrong, now I'm getting a '502 Bad Gateway'.

The docker 'sipcapture/webapp' docker imaga seems to be in a 'Restarting' state after trying to restart it, or reboot the VM it is on.

I then noticed that there was no space left on root and the following directory was taking up most of the storage space:

/usr/local/src/homer7-docker/heplify-server/hom7-prom-all/postgres-data/base/24576

I assume this where the SIP captures are actually stored omn the postgresql database.

How do I clean this up to make more space, without being able to login to the Homer GUI? Is there a way to truncate or dropthe SIP captures after a certain date?

Thank you.

@solarmon
Copy link
Author

I checked the helplify-server environment variables and rotation should be enable with the HEPLIFYSERVER_DBROTATE and HEPLIFYSERVER_DBDROPDAYS variables:

HEPLIFYSERVER_DBROTATE=true
HEPLIFYSERVER_DBDROPDAYS=5
HEPLIFYSERVER_LOGLVL=info
HEPLIFYSERVER_LOGSTD=true

So I'm not sure why/how the postgresql database has got so large.

@solarmon
Copy link
Author

I managed to clear enough storage space to be able to restart the docker services, get it running, and now able to login.

However, how do I now manually delete the old entries?!

@lmangani
Copy link
Member

We have reports of the ENV variables not working indeed. The immediate solution is map your own heplify configuration file to the container and specify the values there while we investigate this.

@solarmon
Copy link
Author

solarmon commented Sep 29, 2020

We have reports of the ENV variables not working indeed. The immediate solution is map your own heplify configuration file to the container and specify the values there while we investigate this.

Hi. Thanks for your response!

I'm new to docker so please could you provide further details on how exactly to do that.

Also, I'm able to access the 'db' docker image now and can use psql to manage the postgres database. I see that the 'homer_data' database has many tables in it - I assume this is whether the SIP captures are kept. Is it just a case of dropping these tables to delete the data, or would that break anything that is referencing them?

Thank you!

@lmangani
Copy link
Member

Could you provide the output of docker logs heplify-server

@solarmon
Copy link
Author

Hi.

Not sure whether you need all of it, but this is the end of the logs. There are lots of ERR entries for rotator.go.

2020/09/29 08:58:57.367745 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:58:59.365966 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:01.366679 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:03.366140 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:05.366100 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:07.367422 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:09.366574 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:11.367762 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:13.366531 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:15.366537 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:17.366794 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:19.364418 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:21.363446 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving
2020/09/29 08:59:23.364576 rotator.go:94: ERR dial tcp 172.18.0.3:5432: connect: connection refused
2020/09/29 08:59:25.368256 rotator.go:406: WARN pq: database "homer_data" already exists
2020/09/29 08:59:25.370230 rotator.go:406: WARN pq: role "homer_user" already exists
2020/09/29 08:59:25.371741 rotator.go:406: WARN pq: role "postgres" does not exist
2020/09/29 08:59:25.375747 rotator.go:344: INFO start creating tables (2020-09-29 08:59:25.375725691 +0000 UTC m=+778.360364288)
2020/09/29 08:59:29.303691 heplify-server.go:102: INFO stopping heplify-server...
2020/09/29 08:59:29.338305 udp.go:27: INFO stopping UDP listener on [::]:9060
2020/09/29 08:59:30.303800 tls.go:44: INFO stopping TLS listener on [::]:9060
2020/09/29 08:59:38.572172 rotator.go:357: INFO end creating tables (2020-09-29 08:59:38.572152754 +0000 UTC m=+791.556791401)
2020/09/29 08:59:38.579472 postgres.go:61: INFO postgres connection established
2020/09/29 08:59:38.579495 database.go:75: INFO close postgres channel
2020/09/29 08:59:38.579510 remotelog.go:44: INFO close remotelog channel
2020/09/29 08:59:38.579528 metric.go:69: INFO close metric channel
2020/09/29 08:59:38.579538 heplify-server.go:111: INFO heplify-server has been stopped
2020/09/29 08:59:38.913283 server.go:91: INFO start heplify-server 1.53 with config.HeplifyServer{HEPAddr:"0.0.0.0:9060", HEPTCPAddr:"", HEPTLSAddr:"0.0.0.0:9060", HEPWSAddr:"", ESAddr:"", ESDiscovery:true, ESUser:"", ESPass:"", LokiURL:"http://loki:3100/api/prom/push", LokiBulk:400, LokiTime r:2, LokiBuffer:100000, LokiHEPFilter:[]int{1, 5, 100}, ForceHEPPayload:[]int(nil), PromAddr:"0.0.0.0:9096", PromTargetIP:"", PromTargetName:"", DBShema:"homer7", DBDriver:"postgres", DBAddr:"db:5432", DBUser:"root", DBPass:"homerSeven", DBDataTable:"homer_data", DBConfTable:"homer_config", D BBulk:400, DBTimer:4, DBBuffer:400000, DBWorker:8, DBRotate:true, DBPartLog:"2h", DBPartIsup:"6h", DBPartSip:"2h", DBPartQos:"6h", DBDropDays:5, DBDropDaysCall:0, DBDropDaysRegister:0, DBDropDaysDefault:0, DBDropOnStart:false, Dedup:false, DiscardMethod:[]string(nil), AlegIDs:[]string(nil), C ustomHeader:[]string(nil), SIPHeader:[]string{"ruri_user", "ruri_domain", "from_user", "from_tag", "to_user", "callid", "cseq", "method", "user_agent"}, LogDbg:"", LogLvl:"info", LogStd:true, LogSys:false, Config:"./heplify-server.toml", ConfigHTTPAddr:"", ConfigHTTPPW:"", Version:false, Scri ptEnable:false, ScriptEngine:"lua", ScriptFolder:"", ScriptHEPFilter:[]int{1, 5, 100}, TLSCertFolder:"."}
2020/09/29 08:59:38.913490 prometheus.go:38: INFO expose metrics without or unbalanced targets
Could not find toml config file, use flag defaults.
2020/09/29 08:59:40.943668 rotator.go:406: WARN pq: database "homer_data" already exists
2020/09/29 08:59:40.946180 rotator.go:406: WARN pq: role "homer_user" already exists
2020/09/29 08:59:40.947029 rotator.go:406: WARN pq: role "postgres" does not exist
2020/09/29 08:59:40.949883 rotator.go:344: INFO start creating tables (2020-09-29 08:59:40.949871595 +0000 UTC m=+2.045789502)
2020/09/29 08:59:41.267689 rotator.go:357: INFO end creating tables (2020-09-29 08:59:41.267679821 +0000 UTC m=+2.363597725)
2020/09/29 08:59:41.272125 postgres.go:61: INFO postgres connection established
2020/09/29 09:04:38.917968 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8937, Filtered: 0, Error: 0
2020/09/29 09:09:38.917958 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8906, Filtered: 0, Error: 0
2020/09/29 09:14:38.917962 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8914, Filtered: 0, Error: 0
2020/09/29 09:19:38.917886 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8912, Filtered: 0, Error: 0
2020/09/29 09:24:38.917910 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8892, Filtered: 0, Error: 0
2020/09/29 09:29:38.917904 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8916, Filtered: 0, Error: 0
2020/09/29 09:34:38.917864 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8904, Filtered: 0, Error: 0
2020/09/29 09:39:38.917947 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8910, Filtered: 0, Error: 0
2020/09/29 09:44:38.917905 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8914, Filtered: 0, Error: 0
2020/09/29 09:49:38.917816 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8906, Filtered: 0, Error: 0
2020/09/29 09:54:38.917949 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8892, Filtered: 0, Error: 0

Please let me know if there is anything else that is needed.

Thank you.

@lmangani
Copy link
Member

The settings seems to propagate correctly. Could you confirm you have data older than 5 days in the DB?

@solarmon
Copy link
Author

solarmon commented Sep 29, 2020

To be honest, I'm not sure how check for the data, since this particular system does not get any/much SIP traffic - this why it is worrying that storage space has been heavily used by the postgresql database.

I can see that homer_data database contains lost of tables, with the date stamps in their names that are older than 5 days - i.e. from 17th September. I haven't tried to check the size and contents of the tables yet. There are 607 rows.

For example:

List of relations
Schema | Name | Type | Owner
--------+----------------------------------------+-------+-------
public | hep_proto_100_default | table | root
public | hep_proto_100_default_20200917_0000 | table | root
public | hep_proto_100_default_20200917_0200 | table | root
public | hep_proto_100_default_20200917_0400 | table | root
:
public | hep_proto_1_call | table | root
public | hep_proto_1_call_20200917_0000 | table | root
public | hep_proto_1_call_20200917_0200 | table | root
public | hep_proto_1_call_20200917_0400 | table | root
public | hep_proto_1_call_20200917_0600 | table | root
:
public | hep_proto_1_default | table | root
public | hep_proto_1_default_20200917_0000 | table | root
public | hep_proto_1_default_20200917_0200 | table | root
public | hep_proto_1_default_20200917_0400 | table | root
public | hep_proto_1_default_20200917_0600 | table | root
public | hep_proto_1_default_20200917_0800 | table | root
:
public | hep_proto_1_registration | table | root
public | hep_proto_1_registration_20200917_0000 | table | root
public | hep_proto_1_registration_20200917_0200 | table | root
public | hep_proto_1_registration_20200917_0400 | table | root
:
public | hep_proto_35_default | table | root
public | hep_proto_35_default_20200917_0000 | table | root
public | hep_proto_35_default_20200917_0600 | table | root
public | hep_proto_35_default_20200917_1200 | table | root
:
public | hep_proto_54_default | table | root
public | hep_proto_54_default_20200917_0000 | table | root
public | hep_proto_54_default_20200917_0600 | table | root
public | hep_proto_54_default_20200917_1200 | table | root
:
public | hep_proto_5_default | table | root
public | hep_proto_5_default_20200917_0000 | table | root
public | hep_proto_5_default_20200917_0600 | table | root
public | hep_proto_5_default_20200917_1200 | table | root
:

What does the different hep_prot_x values means?

@solarmon
Copy link
Author

Our root partition is 16G in size and the homer_data database is 11G in size:

homer_data=# SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database;
datname | size
--------------+---------
postgres | 7577 kB
root | 7717 kB
template1 | 7577 kB
template0 | 7577 kB
homer_config | 8257 kB
homer_data | 11 GB
(6 rows)

I've checked the storage again and it has got full again - after I cleared up only several hundred megs of storage to get it accessible again. This on a setup with hardly any SIP traffic running through it at the moment.

@solarmon
Copy link
Author

solarmon commented Sep 29, 2020

Please advise whether I can just delete the tables in the homer_data database.

@lmangani
Copy link
Member

Sure, that's what the rotation does as well. Did you see any suspect logs on the postgres node?

@solarmon
Copy link
Author

solarmon commented Sep 29, 2020

I can't use psql now - since there is no more storage... :'(

@lmangani
Copy link
Member

I suppose the container maps to a local folder? If so - You should be able to drop data manually and restart to regain control

@solarmon
Copy link
Author

I managed to clear some storage - deleted the old kernels - and got access with psql again.

I deleted the tables:

hep_proto_1_default_20200917*
hep_proto_1_default_20200918*

Do I need to delete any other related/correspond tables? I.e. any other hep_proto_x tables on those dates?

This has given me 2G of storage so I should be OK for a while.

@solarmon
Copy link
Author

Hi,

I checked the server this morning and it seems something has happened and I now have 8.3G of free space.

However, looking at the heplify-server logs, I see the following such messages:

2020/09/30 03:44:22.532031 postgres.go:291: ERR pq: Could not complete operation in a failed transaction
2020/09/30 03:44:26.536029 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device
2020/09/30 03:44:26.537880 postgres.go:291: ERR pq: Could not complete operation in a failed transaction
2020/09/30 03:44:30.531951 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device
2020/09/30 03:44:30.536009 postgres.go:291: ERR pq: Could not complete operation in a failed transaction
2020/09/30 03:44:34.535546 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device
2020/09/30 03:44:34.537660 postgres.go:291: ERR pq: Could not complete operation in a failed transaction
2020/09/30 03:44:38.532047 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device
2020/09/30 03:44:38.536830 postgres.go:291: ERR pq: Could not complete operation in a failed transaction
2020/09/30 03:44:42.531374 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device
2020/09/30 03:44:42.531555 postgres.go:291: ERR pq: Could not complete operation in a failed transaction
2020/09/30 03:44:46.534014 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device
2020/09/30 03:44:46.534527 postgres.go:291: ERR pq: Could not complete operation in a failed transaction
2020/09/30 03:44:50.531989 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device
2020/09/30 03:44:50.532223 postgres.go:291: ERR pq: Could not complete operation in a failed transaction
2020/09/30 03:44:54.537805 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device
2020/09/30 03:44:54.539538 postgres.go:291: ERR pq: Could not complete operation in a failed transaction
2020/09/30 03:44:58.534055 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device
2020/09/30 03:44:58.534536 postgres.go:291: ERR pq: Could not complete operation in a failed transaction
2020/09/30 03:45:00.000329 rotator.go:319: INFO run drop job
2020/09/30 03:45:00.213385 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.219928 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.225446 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.230886 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.235984 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.241855 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.246985 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.252332 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.257559 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.262968 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.267984 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.272982 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.278314 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.283454 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.288574 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.293860 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.298996 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.304384 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.309573 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.314994 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.319979 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.324976 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.330305 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.335328 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.340490 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:00.345512 rotator.go:406: WARN pq: could not write init file
2020/09/30 03:45:05.252843 rotator.go:323: INFO finished drop job, next will run at 2020-10-01 03:45:05.252822203 +0000 UTC m=+140301.891340068
2020/09/30 03:46:43.373121 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8896, Filtered: 0, Error: 0
2020/09/30 03:51:43.373071 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8896, Filtered: 0, Error: 0
2020/09/30 03:56:43.373282 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8910, Filtered: 0, Error: 0
2020/09/30 04:01:43.373119 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8926, Filtered: 0, Error: 0
2020/09/30 04:06:43.373205 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8904, Filtered: 0, Error: 0
2020/09/30 04:11:43.373167 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8910, Filtered: 0, Error: 0
2020/09/30 04:16:43.373146 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8922, Filtered: 0, Error: 0

Checking the docker processes, I see that alertmanager and prometheus were in 'Restarting' state, and restarting them doesn't change the status. Whether this is related to the above log messages, I don't know.

Please advise what those log ERR and WARN messages mean and how to resolve them.

Thank you.

@solarmon
Copy link
Author

Looking at the contents of the homer_data tables, it seems most of it is filled with SIP OPTIONS. However, this traffic is not visible in the Homer web app.

I will look in to how to stop SIP OPTIONS from being sent from the HEP source to reduce the storage requirements. But how can I make SIP OPTIONS visible in the Homer web app? Having visibility of SIP OPTIONS can be useful in certain situations.

@lmangani
Copy link
Member

lmangani commented Oct 1, 2020 via email

@solarmon
Copy link
Author

solarmon commented Oct 1, 2020

Do you mean in the Search form/widget in the Homer web app GUI? I don't see a 'Transaction Type' search field to add to it. Only 'Proto Family', 'Protocol Type', 'Payload Type'.

But even with all the search fields empty - so everything should be included? - it does not show SIP OPTIONS in the Display Results.

Am I looking in the wrong place?

@solarmon
Copy link
Author

solarmon commented Oct 1, 2020

In the 'SIP Method' search field - only the following methods appears in the drop down menu:

INVITE
BYE
100
200
183
CANCEL

Even if I type in 'OPTIONS' manually in to it, it does not show any search results.

@lmangani
Copy link
Member

lmangani commented Oct 1, 2020

HOMER splits messages by transaction type (call, registration, everything else) in different buckets to allow rotating them at different speeds - Most likely you are searching for out-of-band messages. Try using the SIP - Default protocol bucket.

image

Please confirm if this is your case so we can improve documentation - its all pretty fresh.

@solarmon
Copy link
Author

solarmon commented Oct 1, 2020

Ah, thank you!

I now see where the 'Proto' field is in the search Settings window. It is currently on 'SIP - call' and I have now changed it to 'SIP - default' and the search SIP Methods and results now include SIP OPTIONS.

I was not aware that this 'Proto' option and that it limits what the results return. I now read it as for the 'SIP Method' search field, if nothing is selected then it will match all methods in that list, which can be different based on what is selected in the search Settings for 'Proto'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants