On server we lose our platform ID. #719
Comments
Interesting development. The platform ID disappeared again, and interestingly it happened after a delay of about 5 seconds. I restarted Stelace, and then rebooted the machine too because the containers wouldn't come up after about 15 seconds. After the reboot the platformId came back. It's as if a bad result is being cached somewhere. Could that be it? |
What do you mean by the platformId disappears? I cannot reproduce your problem in my local environment. |
I believe it's due to load. I think I can reproduce it. |
Ok. I have an update. I don't think it's due to load (but we are still checking this). It happened again, and restarting Stelace didn't help; however, after a reboot of the server the problem disappeared. Apparently there is some problem with one of the Docker containers (Postgres, Redis, or Elasticsearch). I bet it's Postgres, but I can't say for sure. |
I get the same problem, with an error; rebooting the Docker services helps. |
Are you using Digital Ocean? |
I opened up a free account on Google Cloud and I don't have this issue. Only on Digital Ocean (DO). The thing is that the server works fine otherwise so I will have a hard time explaining this to support. I would be willing to give @woyuen access to a test server. I am currently running a script testing the two servers and so far I have only seen the DO server have the problem |
Happened again on DO. |
Ok. I rebuilt the server, except this time I didn't use the Redis Docker container; instead I installed Redis directly and run it as a service. After 28 hours the platform ID has not disappeared. Can you give me some insight as to how this could happen? Granted, in a real deployment Redis should probably be installed natively, perhaps even on a different server. Still, this doesn't happen with other providers, and it concerns me. I would like to file a formal complaint, but I have no idea what I would tell them. FYI, the other two Docker containers are still running. Also, the platform ID disappeared twice in the 24 hours before I set up the new server. |
@bfis108137, thanks for the test. That's really weird. We didn't encounter any data loss with Docker containers, although we didn't use Digital Ocean. Just to be sure, can you recall any action you took before noticing the disappearance of this platform ID? |
No action taken; I was just checking the platform ID every 15 seconds. I don't think there is a memory issue: we have 4 GB of dedicated memory, currently at 50% usage, and I didn't see any increase in memory usage. We have 2 dedicated virtual CPUs. You never know what that really means, but I guess it means 2 cores, not shared. I am running a new test: I started two VPSes with 2 GB of memory and 1 vCPU each. One runs Redis as a Docker container and the other runs Redis as a regular service via systemctl. I will leave them running for a few days and we will see what happens. |
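The kind of watcher described above (polling for the platform ID every 15 seconds and noting when it goes missing) could be sketched as follows. This is a minimal illustration, not the actual script from the thread; since the thread doesn't say how the platform ID is retrieved, the fetch function is injected rather than assumed to be any particular Stelace API call:

```python
import time

def watch_platform_id(fetch, interval_s=15, max_checks=None):
    """Call fetch() every interval_s seconds; record the time of each
    check where it returned None (i.e. the platform ID was missing)."""
    outages = []
    checks = 0
    while max_checks is None or checks < max_checks:
        platform_id = fetch()
        if platform_id is None:
            outages.append(time.time())
        checks += 1
        if max_checks is not None and checks >= max_checks:
            break
        time.sleep(interval_s)
    return outages

# Example with a fake fetcher that "loses" the ID on the second check:
results = iter(["platf-1", None, "platf-1"])
lost = watch_platform_id(lambda: next(results), interval_s=0, max_checks=3)
print(len(lost))  # -> 1
```

In practice `fetch` would hit whatever endpoint or Redis key holds the platform ID, and the outage timestamps would let you correlate disappearances with server load or container restarts.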
I can already report that the one running the Redis Docker container lost its platformId sometime yesterday. I have two other servers at Digital Ocean running Redis as a regular service with no issue: one with the exact same specs for this test, and another with 2 dedicated vCPUs. I was not watching (it was a day off), so I can't say how long the platformId was lost for. |
Interesting that I just got the following error.
This is very weird, because I have never seen this error before (even when the platformId disappeared), and I never had to configure permissions in other installs. If there really were such an issue, wouldn't it be expected to never work? |
It's almost certain that it's the Redis Docker container. I have started numerous servers on Digital Ocean with the Redis container running, and they all had the problem. One server that has now been running for 5 days with Redis as a system service has not lost its platformId even once. I am suspicious of that error, because on more powerful servers I didn't get it; I just lost the platformId. |
Indeed, this save error certainly lost your data. We haven't encountered this error; you shouldn't lose any data, even with powerful servers. Maybe there are some restrictions on Digital Ocean servers, but I cannot find them. If you check the Redis container's logs, can you confirm this error was present from the beginning? |
As I said before, I don't think it's the same issue. On more powerful servers I don't see the error, but the platform ID still disappears. I am on a 4 GB RAM / 2 vCPU server, and the platformId disappears about once every 12 hours.
|
I was mistaken. This is the error. For some reason my script doesn't detect it, but a manual check reveals it clear as day. Earlier you said "even with powerful servers"; don't you mean even with weak servers? The issue is clearly reduced on a more powerful server: on 2 GB RAM / 1 shared vCPU it happened once every 4 hours; on 4 GB RAM / 2 dedicated vCPUs it happened once every 12 hours, and I once saw it last for 20 hours. What do you mean by lost data? A restart of Redis fixes the problem and everything comes back. How can I log the Redis container? |
What I mean by lost data is that your platform ID temporarily disappears. The fact that your more powerful server has the problem less frequently suggests it's related to machine characteristics. It's indeed weird that the data comes back after a Redis restart. If you use the docker-compose files we provide in the project to create Redis, you can run this in the project folder: docker-compose logs redis |
Below is my Redis log after redoing things. There were two warnings that I noticed:
1. WARNING overcommit_memory is set to 0! Background save may fail under low memory condition.
2. WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis.
I have applied the two recommendations from the log for these issues, and I have restarted the Docker containers and Stelace. Let's see what happens.
|
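For reference, the two host-level changes those Redis startup warnings recommend look like this. They are applied on the Docker host (as root), not inside the container; the first takes effect immediately, the second lasts until the next reboot unless also persisted:

```shell
# 1) Allow the kernel to overcommit memory so background saves
#    (fork + RDB write) don't fail under memory pressure.
sysctl vm.overcommit_memory=1
# Persist the setting across reboots
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf

# 2) Disable Transparent Huge Pages for the current boot
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```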
In the end, it seems the following error is the issue. From my Google search this seems to be a security issue, but these are fresh installs with only the software required by Stelace. Interesting that Redis wants to save to /etc...
|
Interesting. I thought it was the provider's setup, since I don't have this problem elsewhere, so I manually uploaded a cloud-init image directly from Debian's site, but it had the same issue. So either the provisioning from Digital Ocean introduces the problem anyway, or it's actually part of the cloud-init image. In any case, how can I fix this? It seems the correct solution is to change the folder where the file is saved, since the alternative is to grant write permissions on /etc, which is obviously wrong. How do I do that? |
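Assuming Redis runs in a container from the project's docker-compose setup, the runtime save location can be inspected and reset with redis-cli (the container name below is a placeholder, not the project's actual name). Note that finding `dir` switched to `/etc` or `/var/spool/cron` on a fresh install is a known signature of attacks on unauthenticated, internet-exposed Redis instances rather than a misconfiguration:

```shell
# Inspect where Redis currently writes its RDB snapshot
docker exec <redis-container> redis-cli CONFIG GET dir
docker exec <redis-container> redis-cli CONFIG GET dbfilename

# Point it back at a sane data directory at runtime
docker exec <redis-container> redis-cli CONFIG SET dir /data
docker exec <redis-container> redis-cli CONFIG SET dbfilename dump.rdb
```

Resetting `dir` only treats the symptom, though; if something external changed it, the instance is reachable from the internet and needs to be firewalled or password-protected.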
Did you mount a volume for the Redis container? I'll keep searching if that doesn't work. |
I ran into the same issue, and I believe it may have been due to the Redis instance being exposed to the internet when the default settings are left untouched. @bfis108137, if you don't mind, could you please answer the following:
1. Did you use the .env file as-is, without setting up a password for Redis?
2. Did you by any chance try checking the keys on the Redis instance after losing the platformID?
I didn't assign a password for Redis in the .env file and this happened. In my case, the folder the Redis instance was trying to write to was /var/spool/cron, and I had the fishy "backupXYZ" keys as pointed out in the first answer here <https://stackoverflow.com/questions/41887280/redis-config-dir-periodically-modified-to-var-spool-cron-with-failed-opening> as well. Perhaps it would be better not to expose the supporting containers' ports (elasticsearch, postgres, redis) to the host by default in the docker-compose files. Those who really need them exposed can change the docker-compose file themselves, a minor inconvenience at best, especially compared to the security issues it would help prevent. |
I did not touch a thing. Used all default settings. I did not check the keys.
|
Well, that makes two of us. Not exporting the container ports to the host by default is the most feasible fix I can think of right now to prevent this from happening to other unsuspecting users. Does anyone have any other suggestions that would make more sense? I'm all ears.
|
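The mitigation discussed above could look roughly like this in a compose file: drop the host port mapping so Redis is reachable only on the internal Docker network. This is a sketch against a generic docker-compose layout; the service name, image tag, and password handling are illustrative, not the project's exact file:

```yaml
services:
  redis:
    image: redis:5-alpine
    # Removed: the default host port mapping that exposes Redis publicly.
    # ports:
    #   - "6379:6379"
    # Other compose services can still reach it at redis:6379 on the
    # internal network. If host access is truly needed, bind loopback only:
    # ports:
    #   - "127.0.0.1:6379:6379"
    command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}"]
```

Requiring a password as well (rather than relying on network isolation alone) gives a second line of defense if the port is ever exposed again.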
Our server is with Digital Ocean. The platformId just disappears randomly. I believe it happens under heavy activity, but I am not sure. It only happens with Digital Ocean; we are in the process of trying another provider. On a simple VirtualBox VM with Debian or Ubuntu there is no problem, but we have issues in the cloud.