
On server we lose our platform ID. #719

Open
bfis108137 opened this issue Jul 20, 2021 · 25 comments

Comments

@bfis108137

Our server is with DigitalOcean. The platformId just disappears randomly. I believe it happens under heavy activity, but I am not sure. It only happens with DigitalOcean; we are in the process of trying another provider. On a simple VirtualBox VM with Debian or Ubuntu there is no problem, but we have issues in the cloud.

@bfis108137
Author

Interesting development. The platform ID disappeared again. Interestingly enough, it happened after a delay of about 5 seconds. I decided to restart Stelace, and I did a reboot too because the containers wouldn't come up after about 15 seconds. After the reboot the platformId came back. It's as if a bad result is being cached somewhere. Could that be it?

@woyuen
Member

woyuen commented Jul 22, 2021

What do you mean by "the platformId disappears"?
When you hit GET /store/platforms, the platformId is no longer in the list, is that right?
There is no cache, so it shouldn't disappear.

I cannot reproduce your problem in my local environment.
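
If it helps, a crude watch loop like the one below makes it easy to timestamp exactly when the platform ID disappears. This is only a sketch: the base URL and the auth header are placeholders, not the project's actual defaults, so adjust them to whatever your deployment uses.

BASE_URL="http://localhost:4100"              # placeholder, use your server's address and port
AUTH_HEADER="x-example-auth: changeme"        # placeholder, use whatever credentials your deployment requires
while true; do
  curl -s -H "$AUTH_HEADER" "$BASE_URL/store/platforms"
  echo " -- $(date)"
  sleep 15
done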

@bfis108137
Author

I believe it's due to load. I think I can reproduce it.

@bfis108137
Author

OK, I have an update. I don't think it's due to load (but we are still checking that). It happened again, and restarting Stelace didn't help; however, after a reboot of the server the problem disappeared. Apparently there is some problem with one of the Docker containers (Postgres, Redis, or Elasticsearch). I bet it's Postgres, but I can't say for sure.

@DmitryKvant

DmitryKvant commented Jul 26, 2021

I get the same problem; this is the error:

{
  "message": "Internal Server Error",
  "statusCode": 500,
  "_stack": [
    "Error: Platform 1 does not exist.",
    "    at getPlatformEnvData (/home/kvant/dringo-back/src/redis.js:191:24)",
    "    at runMicrotasks ()",
    "    at processTicksAndRejections (internal/process/task_queues.js:97:5)",
    "    at async getConnection (/home/kvant/dringo-back/src/models/index.js:96:28)",
    "    at async getModels (/home/kvant/dringo-back/src/models/index.js:142:34)",
    "    at async /home/kvant/dringo-back/src/services/apiKey.js:453:24",
    "    at async newHandler (/home/kvant/dringo-back/src/cote/CustomResponder.js:34:24)"
  ],
  "_message": "Platform 1 does not exist."
}

Rebooting the Docker services helps.

@bfis108137
Author

> I get the same problem […] rebooting the Docker services helps

Are you using Digital Ocean?

@bfis108137
Author

I opened a free account on Google Cloud and I don't have this issue there; only on Digital Ocean (DO). The thing is that the server works fine otherwise, so I will have a hard time explaining this to their support. I would be willing to give @woyuen access to a test server. I am currently running a script testing the two servers, and so far only the DO server has shown the problem.

@bfis108137
Author

Happened again on DO.

@bfis108137
Author

bfis108137 commented Jul 29, 2021

OK, I rebuilt the server, except this time I didn't use the Redis Docker container; instead I installed Redis directly and it runs as a system service (rough sketch of what I did below). After 28 hours the platform ID has not disappeared. Can you give me some insight as to how this could happen? Admittedly, a proper Redis install should probably be used anyway, perhaps even on a separate server, but the problem still doesn't happen with other providers, and that concerns me. I would like to file a formal complaint, but I have no idea what I would tell them.

FYI, the other two Docker containers are still running. In addition, the platform ID disappeared twice in the 24 hours before the new server.
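
For reference, this is roughly what I did on the host (Debian, assuming root; the commands below are the standard distro install, nothing Stelace-specific):

apt-get update && apt-get install -y redis-server   # install Redis from the distro packages
systemctl enable --now redis-server                 # run it as a regular system service instead of a container
redis-cli ping                                      # should answer PONG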

@woyuen
Member

woyuen commented Jul 30, 2021

@bfis108137, thanks for the test.

That's really weird. We haven't encountered any data loss with the Docker containers, although we didn't use Digital Ocean.
Maybe there's some issue with memory; is there a memory limitation with this provider?

Just to be sure, is there any action you can recall taking before noticing the disappearance of this platform ID?
Did it reappear only when you restarted the Docker service?

@bfis108137
Author

> Just to be sure, is there any action you took before noticing the disappearance of this platform ID? Did it reappear only when you restarted the Docker service?

No action taken; I'm just checking the platform ID every 15 seconds. I don't think there is a memory issue. We have 4 GB of dedicated memory, currently at about 50% usage, and I didn't see any increase in memory usage. We have 2 dedicated virtual CPUs. You never know what that really means, but I assume it means 2 cores, not shared.

I am running a new test. I started two VPSes with 2 GB of memory and 1 vCPU each. One is running Redis as a Docker container and one is running Redis as a regular service under systemd. I will leave them running for a few days and we will see what happens.

@bfis108137
Author

I can already report that the one running the Redis Docker container lost its platformId sometime yesterday. I have two other servers at Digital Ocean running Redis as a regular service with no issue: one has the exact same specs as the test server, and the other has 2 dedicated vCPUs. I was not watching (it was a day off), so I can't say how long the platformId was lost for.

@bfis108137
Author

bfis108137 commented Jul 31, 2021

Interesting that I just got the following error.

MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.

This is very weird because I have never seen this error (even when the platformId disappeared) and I never had to configure permissions in other installs. If there really were such an issue, wouldn't it be expected to never work?
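
If anyone wants to check the same thing on their side, the relevant state can be inspected from redis-cli (assuming the container is the redis service from the project's docker-compose file):

docker-compose exec redis redis-cli config get stop-writes-on-bgsave-error
docker-compose exec redis redis-cli lastsave   # unix timestamp of the last successful RDB save
# setting stop-writes-on-bgsave-error to "no" would only hide the symptom; the failed save itself still needs fixing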

@bfis108137
Author

It's almost certain that it's the Redis Docker container. I have started numerous servers on Digital Ocean with the Redis container running and they all had the problem. One server, now running for 5 days with Redis as a system service, has not lost its platformId even once. I am suspicious of that error, though, because on more powerful servers I didn't get it; I just lost the platformId.

@woyuen
Member

woyuen commented Aug 4, 2021

Indeed, this save error is most likely what lost your data.

We haven't encountered this error. You shouldn't lose any data, even with powerful servers.
Maybe there are some restrictions on Digital Ocean servers, but I cannot find any.
If you check the Redis container logs, can you confirm this error was present from the beginning?

@bfis108137
Author

bfis108137 commented Aug 4, 2021 via email

@bfis108137
Author

I was mistaken; this is the error. For some reason my script doesn't detect it, but a manual check reveals it clear as day.

Earlier you said "even with powerful servers"; don't you mean even with weak servers?

The issue is clearly reduced on a more powerful server. On 2 GB RAM / 1 shared vCPU it happened once every 4 hours. On 4 GB RAM / 2 dedicated vCPUs it happened once every 12 hours, and I once saw it last for 20 hours.

What do you mean by lost data? A restart of Redis fixes the problem and everything comes back.

How can I get the Redis container's logs?

@woyuen
Member

woyuen commented Aug 6, 2021

What I mean by lost data is that your platform ID temporarily disappears. The fact that your more powerful server has this problem less frequently suggests it's related to machine characteristics.

It is indeed weird that the data recovers after a Redis restart.

If you use the docker-compose files we ship in the project to create Redis, you can run this in the project folder:

docker-compose logs redis
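
To follow the log live and keep only the most recent lines, the standard flags also work here:

docker-compose logs -f --tail=100 redis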

@bfis108137
Author

Below is my Redis log after redoing things.

There were two things that I noticed.

1.) WARNING overcommit_memory is set to 0! Background save may fail under low memory condition.

2.) WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis.

I have implemented the two recommendations from the log for those issues (rough sketch after the log below), and I have restarted the Docker containers and Stelace. Let's see what happens.

redis_1 | 1:M 06 Aug 2021 14:12:57.698 * Ready to accept connections
redis_1 | 1:signal-handler (1628259256) Received SIGTERM scheduling shutdown...
redis_1 | 1:M 06 Aug 2021 14:14:16.279 # User requested shutdown...
redis_1 | 1:M 06 Aug 2021 14:14:16.279 * Saving the final RDB snapshot before exiting.
redis_1 | 1:M 06 Aug 2021 14:14:16.281 * DB saved on disk
redis_1 | 1:M 06 Aug 2021 14:14:16.281 # Redis is now ready to exit, bye bye...
redis_1 | 1:C 06 Aug 2021 14:18:59.479 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1 | 1:C 06 Aug 2021 14:18:59.480 # Redis version=5.0.1, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1 | 1:C 06 Aug 2021 14:18:59.480 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis_1 | 1:M 06 Aug 2021 14:18:59.499 * Running mode=standalone, port=6379.
redis_1 | 1:M 06 Aug 2021 14:18:59.505 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
redis_1 | 1:M 06 Aug 2021 14:18:59.505 # Server initialized
redis_1 | 1:M 06 Aug 2021 14:18:59.506 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis_1 | 1:M 06 Aug 2021 14:18:59.506 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
redis_1 | 1:M 06 Aug 2021 14:18:59.508 * DB loaded from disk: 0.002 seconds
redis_1 | 1:M 06 Aug 2021 14:18:59.508 * Ready to accept connections
redis_1 | 1:signal-handler (1628259788) Received SIGTERM scheduling shutdown...
redis_1 | 1:M 06 Aug 2021 14:23:08.743 # User requested shutdown...
redis_1 | 1:M 06 Aug 2021 14:23:08.743 * Saving the final RDB snapshot before exiting.
redis_1 | 1:M 06 Aug 2021 14:23:08.745 * DB saved on disk
redis_1 | 1:M 06 Aug 2021 14:23:08.745 # Redis is now ready to exit, bye bye...
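
For reference, applying the two recommendations from those warnings on the host looked roughly like this (taken straight from the messages above; assumes root on the Docker host):

sysctl vm.overcommit_memory=1                              # takes effect immediately
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf        # persists across reboots
echo never > /sys/kernel/mm/transparent_hugepage/enabled   # disable THP now; add the same line to /etc/rc.local to keep it after a reboot
# then restart the Redis container so it starts with both settings in place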

@bfis108137
Author

bfis108137 commented Aug 7, 2021

In the end it seems the following error is the issue. From my Google search this seems to be a security issue, but these are fresh new installs with only the software Stelace requires. Interesting that Redis wants to save to /etc...

redis_1 | 1:M 07 Aug 2021 21:51:55.132 # Background saving error
redis_1 | 1:M 07 Aug 2021 21:52:01.051 * 100 changes in 300 seconds. Saving...
redis_1 | 1:M 07 Aug 2021 21:52:01.051 * Background saving started by pid 245
redis_1 | 245:C 07 Aug 2021 21:52:01.051 # Failed opening the RDB file crontab (in server root dir /etc) for saving: Permission denied
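
A quick way to see where Redis currently thinks it should be saving (assuming the redis service name from the project's docker-compose file):

docker-compose exec redis redis-cli config get dir          # directory for the RDB file; per the error above this had become /etc
docker-compose exec redis redis-cli config get dbfilename   # RDB file name; per the error above this had become "crontab"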

@bfis108137
Author

bfis108137 commented Aug 7, 2021


Interesting. I thought it was the provider's setup, since I don't have this problem elsewhere, so I manually uploaded a cloud-init image directly from Debian's site, but it had the same issue. Either Digital Ocean's provisioning adds the problem anyway, or it's actually something in the cloud-init image.

In any case, how can I fix this? It seems the correct solution is to change the folder where the file is saved, since the other alternative is to give permissions on /etc, which is obviously wrong. How do I do that?

@woyuen
Member

woyuen commented Aug 10, 2021

Did you mount a volume for the Redis container?
Normally with the official image, mounting a volume at /data inside the container does the job.

I'll keep searching if that doesn't work.
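
For reference, a minimal sketch of what that mount looks like (the service and volume names are illustrative, not necessarily the ones used in the project's compose files):

# in docker-compose.yml:
#   redis:
#     image: redis:5.0
#     volumes:
#       - redis-data:/data    # the official image keeps its RDB/AOF files under /data
# volumes:
#   redis-data:
# plain docker equivalent for a quick test:
docker run -d --name redis-test -v redis-data:/data redis:5.0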

@ahmed-umair

I ran into the same issue, and I believe it may be due to the Redis instance being exposed to the internet when the default settings are left untouched.

@bfis108137 if you don't mind, could you please answer the following:

  1. Did you use the .env file as-is without setting up a password for redis?
  2. Did you by any chance try checking the keys on the redis instance after losing platformID?

I didn't assign a password for redis in the .env file and this happened.

In my case, the folder that the Redis instance was trying to write to was /var/spool/cron, and I had the fishy "backupXYZ" keys as pointed out in the first answer here (https://stackoverflow.com/questions/41887280/redis-config-dir-periodically-modified-to-var-spool-cron-with-failed-opening) as well.

Perhaps it would be better not to expose the supporting containers' ports (Elasticsearch, Postgres, Redis) to the host by default in the docker-compose files. Those who really need them exposed can do it themselves by changing the docker-compose file, a minor inconvenience at best, especially when compared to the security issues it will help prevent.
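
For anyone who wants to lock this down in the meantime, here is a sketch of the kind of change I mean (the YAML fragments are illustrative; adapt them to the actual docker-compose files in the repo):

# in docker-compose.yml, either publish the port on loopback only:
#   redis:
#     ports:
#       - "127.0.0.1:6379:6379"
# or remove the ports: entry entirely so only other containers on the compose network can reach Redis,
# and/or require a password:
#   redis:
#     command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}"]
# then verify from another machine that the port is no longer reachable:
redis-cli -h YOUR_SERVER_IP -p 6379 ping   # should now fail instead of answering PONG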

@bfis108137
Author

bfis108137 commented Aug 30, 2021 via email

I did not touch a thing. Used all default settings. I did not check the keys.

@ahmed-umair

Well, that makes two of us.

Disabling the export of container ports to the host by default is the most feasible thing I can think of right now to prevent this from happening to other unsuspecting users. Does anyone have any other suggestions which you think would make more sense? I'm all ears.
