[Megathread] pockethost.io system status, health, and outage reports - Report Problems Here 🌮 #223
Replies: 29 comments 73 replies
-
Ok everyone, I think there are enough of us experiencing downtime issues that we can pull together to solve it.

Background

Several of you (including me) have noticed that PocketHost seems to intermittently disappear and then come back to life 10-20 minutes later. This has never been observed during development, so it seems to be an issue that only shows up in production and under real use conditions.

Possible causes and solutions

Something about Digital Ocean
Likelihood: extremely unlikely 🚫
Cause: It is possible that the Digital Ocean network or VPS becomes inaccessible for periods of time.
Solution: Change providers.

A bug in PocketBase
Likelihood: unlikely 😒
Cause: It is possible there is an unaddressed bug in PocketBase. I consider this to be unlikely because the outages happen regardless of which version of PocketBase a user is running.
Solution: Identify and report bugs to the PocketBase project.

'Bad Neighbor' theory - something about how a particular user is using PocketBase
Likelihood: likely 🤨
Cause: We have already seen instances where a single PocketBase process will spike to max CPU usage because of a badly written or long query. This starves the other processes and has the effect of making the PocketHost server unresponsive.
Solution: Isolate user processes and use cgroups to prevent a single PocketBase process from becoming a 'bad neighbor'. This is worth implementing anyway because it would make the system more secure by disallowing system-level access to PocketBase processes.

Something in the PocketHost code
Likelihood: most likely 🙈
Cause: Humility demands that I recognize my own code is the most likely problem :) When a web request comes in, it hits a nodejs proxy that decides what to do and routes the request accordingly. Most requests are forwarded to a secondary proxy layer (called the daemon) which is responsible for launching PocketBase instances on demand. Then, the request is forwarded to the actual PocketBase instance. We have seen cases where something in this chain of proxies stops responding. It isn't clear why, but it has been observed.
Solution: There is no universal answer to this. The code is not particularly deep or complicated, but it is possible we have a leak somewhere and that the process eventually runs out of some kind of resource.

Current action items

While I think the PocketHost code itself is the most likely culprit, I think that adding support to isolate PocketBase processes and limit resource usage per instance is a good way to create more visibility. If a PocketBase process runs out of control due to a bad query, or really for any reason at all, the nodejs code might not handle that situation well, especially if incoming requests are piling up. So I think I'll focus my energy on implementing process isolation and then see how stability changes.
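One way to make a request pile-up visible, independent of process isolation, is to count in-flight requests per instance inside the proxy. The following is only a minimal TypeScript sketch of that idea, not PocketHost's actual code: the `InflightTracker` class, its `maxPerInstance` limit, and the instance IDs are all hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch: track in-flight requests per instance so that a
// pile-up behind one slow PocketBase process shows up as a number.
class InflightTracker {
  private counts = new Map<string, number>();
  private maxPerInstance: number;

  constructor(maxPerInstance: number) {
    this.maxPerInstance = maxPerInstance;
  }

  // Call before forwarding a request. Returns false when the instance
  // already has the maximum number of pending requests.
  start(instanceId: string): boolean {
    const current = this.counts.get(instanceId) ?? 0;
    if (current >= this.maxPerInstance) return false;
    this.counts.set(instanceId, current + 1);
    return true;
  }

  // Call when the response completes or the request fails.
  finish(instanceId: string): void {
    const current = this.counts.get(instanceId) ?? 0;
    if (current <= 1) {
      this.counts.delete(instanceId);
    } else {
      this.counts.set(instanceId, current - 1);
    }
  }

  // Number of requests currently pending for an instance.
  pending(instanceId: string): number {
    return this.counts.get(instanceId) ?? 0;
  }
}
```

A proxy could reject a request early (for example with a 503) when `start` returns false, and log `pending` values periodically to see which instance requests are queuing behind.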
-
@benallfree it seems like the long-term solution may be to isolate the environments, even if the problem is in your code. Your "bad neighbor" scenario never goes away, and it gets worse as your repo (and PocketBase) grow in popularity. Have you considered building on top of something like railway.io? They use a Docker implementation of PocketBase. It is fairly simple to use, though that is subjective, and it would still require some Docker knowledge, so I can still see a platform like PocketHost being very useful to the community given the roadmap and features you already provide. There is a generous free tier, it would inherently isolate each instance, and you could impose a hard-capped limit more easily so the experience is stable. I think what you're doing is great, but the product manager in me sees this as non-viable for a single person long term, and I want PocketHost and PocketBase to go on! That's my input; I will be watching with a learning eye. :)
-
Thank you for structuring the possible causes and initial solutions, @benallfree. Would it help to run some controlled PocketBase instances on infrastructure that mirrors where PocketHost is hosted, let them simulate production website patterns (such as queries per minute, execution time, and query complexity) for a few hours, and observe how CPU, bandwidth, disk I/O, memory, etc. behave and which errors appear in the logs? Maybe this could give more insight.
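As a rough illustration of the "queries per minute" part of that idea, here is a minimal TypeScript sketch that turns a load profile into evenly spaced send times. The `LoadProfile` shape and the `buildSchedule` function are hypothetical names for this example; real traffic is burstier than an even spacing, so treat this as a starting point only.

```typescript
// Hypothetical load profile for a controlled test run.
interface LoadProfile {
  queriesPerMinute: number;
  durationMinutes: number;
}

// Returns the offsets (in milliseconds from the start of the run) at which
// each query should be sent, evenly spaced across the whole duration.
function buildSchedule(profile: LoadProfile): number[] {
  const total = Math.floor(profile.queriesPerMinute * profile.durationMinutes);
  if (total <= 0) return [];
  const intervalMs = 60000 / profile.queriesPerMinute;
  const offsets: number[] = [];
  for (let i = 0; i < total; i++) {
    offsets.push(Math.round(i * intervalMs));
  }
  return offsets;
}
```

A test driver could walk this list, send one request to a test instance at each offset, and record the response time alongside CPU and memory readings.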
-
Testing some new features; there may be some instability for a few hours!
-
#234: Reports of an expired cert. Investigating.
-
PocketHost will be down for a scheduled maintenance window while we roll out v0.8.1.
-
Are you all running production apps with PH/PB yet? I love the DX so far but am worried about longer-term stability. If I do go with PB/PH, is it relatively easy to migrate out to PlanetScale, Supabase, etc. if needed?
-
It's down, buddy!
-
Looks like it is down.
-
Looking into it.
-
We are back up, now investigating!
-
Just to recap, it looks like we had our first outage in a month. This is actually really impressive: the stability improvements I made to PocketHost appear to be very durable. We had a problem with the logging, and I'm not sure if I will be able to determine the cause of this most recent incident, but I did fix the logging issue.

For reference, here's what an outage looks like: [image not preserved]

As you can see, bandwidth dropped to nothing and then shortly after that, CPU usage died off as well. To cause this, something would have needed to cause the main

Thanks for the reports @MOBO-Timmy @GroomingMetro
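The signature described here (bandwidth falls to nothing, then CPU usage dies off shortly afterwards) could in principle be detected automatically from metric samples. The sketch below is a hedged TypeScript illustration only: the `Sample` shape, the thresholds, and the `findOutageStart` function are assumptions made for this example, not part of PocketHost's monitoring.

```typescript
// Hypothetical metric sample; field names and units are assumptions.
interface Sample {
  t: number;              // timestamp (any monotonic unit)
  bandwidthBytes: number; // bytes transferred in this interval
  cpuPercent: number;     // CPU usage in this interval
}

// Returns the timestamp of the first sample where bandwidth has dropped to
// the floor and CPU also falls to its floor within `maxLagSamples` samples,
// or null when no such pattern is found.
function findOutageStart(
  samples: Sample[],
  bandwidthFloor: number = 0,
  cpuFloor: number = 1,
  maxLagSamples: number = 3
): number | null {
  for (let i = 0; i < samples.length; i++) {
    if (samples[i].bandwidthBytes > bandwidthFloor) continue;
    const last = Math.min(samples.length - 1, i + maxLagSamples);
    for (let j = i; j <= last; j++) {
      if (samples[j].cpuPercent <= cpuFloor) return samples[i].t;
    }
  }
  return null;
}
```

Requiring both signals avoids flagging a quiet period with no traffic, where bandwidth is zero but the processes are still alive and using some CPU.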
-
Getting reports of 403 errors from #253, #254, and #255. Investigating.
-
#259 reports an outage. Investigating...
-
Hi! It seems the site is currently down. https://pockethost.io/ is not showing the Login / Dashboard buttons. Thanks
-
Same here, seeing the login button for a split second, then it disappears:

Unexpected response {"url":"","status":0,"data":{},"isAbort":false,"originalError":{"cause":{"errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":8090}},"name":"ClientResponseError 0"} from mothership
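For anyone reading this error: `ECONNREFUSED` on a `connect` call means nothing accepted the TCP connection at that address and port, which here suggests the process expected on 127.0.0.1:8090 was not listening at that moment. As a small, hedged TypeScript sketch, the relevant fields can be pulled out of the message like this; `describeConnectionError` is a hypothetical helper written for this example, not part of PocketHost or the PocketBase SDK.

```typescript
// Hypothetical helper: extract the low-level connection failure from an
// "Unexpected response {...} from mothership" message like the one above.
function describeConnectionError(message: string): string | null {
  const start = message.indexOf("{");
  const end = message.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: any;
  try {
    parsed = JSON.parse(message.slice(start, end + 1));
  } catch {
    return null; // The embedded payload was not valid JSON.
  }

  const cause = parsed?.originalError?.cause;
  if (!cause || typeof cause.code !== "string") return null;
  return `${cause.code} ${cause.syscall} ${cause.address}:${cause.port}`;
}
```

Applied to the message in this comment, the helper returns `ECONNREFUSED connect 127.0.0.1:8090`, which is a compact line to include in an outage report.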
-
@HEAVYPOLY @cheskoxd Back up, thanks for the report. Investigating the cause now. Usually it ends up being some corner case that wasn't properly handled in code. Every time we go down, the system gets stronger! Thank you again!
-
Great, thank you!
-
FYI: possible DNS outages today as we move to Cloudflare.
-
About 10 minutes ago, I noticed that my website has been experiencing errors due to the connection with PocketBase timing out repeatedly.
-
My instance seems to be down. Site can't be reached.
-
My instance is giving me the below error
-
Getting this on the dashboard:

Unexpected response {"url":"","status":0,"data":{},"isAbort":false,"originalError":{"cause":{"errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":8090}},"name":"ClientResponseError 0"} from mothership
-
I get this:

Unexpected response {"url":"","status":0,"data":{},"isAbort":false,"originalError":{"cause":{"errno":-111,"code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":8090}},"name":"ClientResponseError 0"} from mothership

And my service is down. Please fix this immediately.
-
@twiddlecoding @HEAVYPOLY @burraksumer @DiceBreakers Please try now. As per usual, investigating the cause takes some time, but restarting the system is easy and effective :) I'll update this thread when I find out what is failing. I soft-released 0.9.0, which Dockerizes things, so there could be some instability with that. Typically the issues revolve around processes failing and the main thread giving up on serving more requests. Always looking for ways to continue improving this.
-
FYI there may be some instability this morning as I roll out some enhancements. Shouldn't be too noticeable though.
-
Your app is down
-
The new update appears to have caused an outage.
-
Everyone, please join us on Discord: https://discord.gg/nVTxCMEcGT
This thread has moved to https://discord.com/channels/1128192380500193370/1179852349011939439
Discord provides a much better forum for troubleshooting issues in real time.
-
If you experience any issues where you believe PocketHost may be down or experiencing a service disruption, please use this thread to report and discuss.
Also, head to https://discord.com/channels/1128192380500193370/1156285045494005771 for live incident chat.