direct reads in high availability mode #153
@mapshen I wasn't able to recreate this problem yet with your information. I am assuming that you also have Can you try to start the processes with Does this happen if you do a
I can sort of replicate this by using
Yes, I do use What you described above is exactly what I am seeing. When the 2 processes start, they come up in
When that happens, one of the two new processes (927dc273ea23) will register itself and resume work
Logs from 927dc273ea23 also support this observation:
But 927dc273ea23 still doesn't process the direct reads whereas, I think, it should in this case. Also, I turned on It seems the shutdown procedure never gets run for me because I see the document in By the way, is it correct that the timestamp written to
In addition, an easier way to reproduce this is:
You will see the Elasticsearch index doesn't continue to populate.
Thanks for confirming the behavior. One of the issues is the following code, which turns off direct reads when the process is in a cluster and it is not enabled: https://github.com/rwynn/monstache/blob/v4.13.0/monstache.go#L3730

The assumption there is that if the process is not enabled, then some other process must be (and doing direct reads), and in that case we don't want every process in the cluster to re-perform the same direct reads. Not sure if it would be better to have them all process the direct reads regardless of whether or not they are enabled. In this case that assumption is bad because all processes are starting disabled due to the zombie process still lingering in the

The problem with using the timestamp in The best you can do in monstache for delta direct reads is to set up a pipeline with a $match clause that targets a date/timestamp field in your document with a $gt operator.

Here is a link to that shutdown logic that resets the cluster state, which looks not to be happening here:
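For concreteness, here is a minimal sketch (using the official MongoDB Go driver, not monstache itself) of the kind of $match/$gt stage described above. The namespace mydb.things, the updatedAt field, and the 24-hour cutoff are hypothetical placeholders; the cutoff would have to be tracked outside monstache.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(ctx)

	// Only pick up documents changed since the last direct-read pass.
	since := time.Now().Add(-24 * time.Hour)
	pipeline := mongo.Pipeline{
		{{Key: "$match", Value: bson.D{
			{Key: "updatedAt", Value: bson.D{{Key: "$gt", Value: since}}},
		}}},
	}

	cur, err := client.Database("mydb").Collection("things").Aggregate(ctx, pipeline)
	if err != nil {
		panic(err)
	}
	defer cur.Close(ctx)
	for cur.Next(ctx) {
		fmt.Println(cur.Current) // each changed document would be synced to Elasticsearch
	}
}
```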
Clearly it's my oversight when dockerizing monstache. Made a change on my end and now Although this means the two cases I mentioned above are no longer a concern, I found a third scenario where this would happen:
So for high availability, monstache only works for oplog-tailing events, not direct reads. Thanks for pointing me to the source code; that really helps. To be honest, I haven't read all the code, but I was wondering if there is logic for checking whether the direct reads are complete or not and, if so, whether we could save that value to
You raise a valid issue. I think it would be difficult to track and reliably clean up the state of direct reads. I think we could add an option such that all processes do direct reads and only active processes track the oplog (they would be redundant, but maybe not a big overhead?). Or, a different approach would be to turn off direct reads in the configs of the cluster processes and then add a third docker container that only does direct reads. This 3rd (non-clustered) container could be set up to turn on
I was thinking about scheduled direct-reads (restart after an interval) but thought maybe that should be handled outside of monstache.
This scenario could potentially be dangerous in the sense that it's possible for us to lose data without knowing it when one process goes down during direct reads, which is something we can't afford to live with. Put some thought into this, and I think I'm in favor of your approach of adding a third container. What I have in mind is that all three containers will still share the same configuration file, but the first one will take

P.S. I have two other ideas, in which the second process starts with the first one but will stand by till it fails. When running in HA mode, processes started will not perform direct reads automatically, i.e., they will not start go routines to read the docs from a collection. Based on this, my two ideas are:
Then the process at work will first check

Note that there will be a TTL index on the "expireAt" field in the "_direct_reads" doc, in order to clean up the doc. The downside of this is that when we need to rebuild an index, we stop the containers, remove the index, and update the template, but we can't restart the containers right away with the same because of the legacy "_direct_reads" doc.
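To make the idea concrete, here is a rough sketch of how a TTL-guarded "_direct_reads" marker could be claimed with the Go driver. This is not monstache code; storing the marker in the monstache.cluster collection and the 24-hour expiry are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// claimDirectReads reports whether this process should run the direct reads.
// It returns true if it created the marker, false if another process already had.
func claimDirectReads(ctx context.Context, coll *mongo.Collection) (bool, error) {
	// TTL index on expireAt so a stale marker eventually cleans itself up.
	_, err := coll.Indexes().CreateOne(ctx, mongo.IndexModel{
		Keys:    bson.D{{Key: "expireAt", Value: 1}},
		Options: options.Index().SetExpireAfterSeconds(0),
	})
	if err != nil {
		return false, err
	}
	// Inserting a fixed _id acts as the lock: a duplicate-key error means
	// some other process already claimed (or finished) the direct reads.
	_, err = coll.InsertOne(ctx, bson.D{
		{Key: "_id", Value: "_direct_reads"},
		{Key: "expireAt", Value: time.Now().Add(24 * time.Hour)},
	})
	if mongo.IsDuplicateKeyError(err) {
		return false, nil
	}
	return err == nil, err
}

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(ctx)

	ok, err := claimDirectReads(ctx, client.Database("monstache").Collection("cluster"))
	if err != nil {
		panic(err)
	}
	fmt.Println("this process should run the direct reads:", ok)
}
```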
I will respond at more length when I get some time. But one thing.
Good call, Ryan. Will test it out and circle back.
@mapshen I put some fixes in related to this in
By wait I mean that the go routine consuming events from a channel will be blocked until an active status is determined. This whole thing is predicated on the fact that the go routines producing data from MongoDB cannot advance unless the channels they write to are emptied. So by not reading from those channels, the producers (direct readers) will feel back pressure and effectively be paused.

I think this should fix all the cases that you outlined in this issue. For the first case, where all the monstache processes are killed forcibly without cleanup, monstache will wait till the zombie process is expired by MongoDB (max 30 seconds) and then proceed to process the direct reads. In the case where the cleanup does occur and one of the new active processes is killed before completing the direct reads, the subsequent active process in the cluster will still sync direct reads (repeating any from the killed process).
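A toy illustration of the back-pressure mechanism described above (not monstache source): because the producer can only advance when the consumer reads from the channel, a consumer that waits for "active" status effectively pauses the direct reader.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	docs := make(chan int)        // unbuffered: the producer blocks until someone reads
	active := make(chan struct{}) // closed once this process becomes the active one

	// Producer: stands in for a direct-read goroutine streaming documents.
	go func() {
		for i := 1; ; i++ {
			docs <- i // blocks here while the consumer is not reading
			fmt.Println("produced", i)
		}
	}()

	// Consumer: refuses to read until active status is determined.
	go func() {
		<-active
		for d := range docs {
			fmt.Println("indexed", d)
		}
	}()

	time.Sleep(2 * time.Second) // producer is stuck on its first send the whole time
	close(active)               // process takes over; reads resume and the producer unblocks
	time.Sleep(1 * time.Second)
}
```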
@rwynn Thanks a lot for the fix, much appreciated. Did a test in my dev env with my setup, where two monstache processes run as a cluster: I killed the active process when it finished the direct reads, so the subsequent one started a full re-sync. It worked as expected. RAM usage increased about 50% on the Elasticsearch data nodes during the re-sync, and the amount of disk space used had doubled when the re-sync was done, which we can accept given this is the worst-case scenario. Of course, the nodes will try to utilize all the CPUs available. In sum, we are going down this road.
@mapshen that's interesting about the disk usage. I can see the RAM usage increasing, but I'm surprised about the disk, since the documents should be overwriting existing ones, not adding more to the index. Maybe this will decrease again when Elasticsearch merges segments?
@rwynn Yes, the disk usage went back to normal later on, but I haven't found a good explanation of why the merges only happen after the re-sync is complete. This also seems to be what happens even when I build the indexes from scratch.
@rwynn I think we need to reopen this, as it seems I have found a case in which this solution fails (and it just happened in our production environment). Here are the steps to reproduce it:
If we take a look at those extra docs, you will find they are the docs we removed from mongo earlier. The reason, I suspect, is that although 3d5ef1f2efdd doesn't consume events from a channel when it's not active, old events get queued up. When it starts to work, it processes those events as if they had just come in, when they should have been discarded. The consequence is that I have to manually clean up data in Elasticsearch periodically, given that fail-over happens once in a while. Would we be able to fix this? Let me know if you need more information. Many thanks in advance!
@mapshen I will take a look at fixing this issue. One measure monstache has to prevent old data from getting into Elasticsearch is the use of version numbers (used by default but not used when Monstache uses the timestamp from the oplog as a version number such that old data will get rejected, ensuring the index is consistent with MongoDB. However, when it comes to deletes, Elasticsearch garbage collects deleted versions after 60s by default. This means the rejection does not happen if the 2nd process takes over more than 60s after the delete is performed by the 1st.

For example, I just tried the scenario but killed the main process immediately after performing the deletes. The 2nd process took over and sent the buffered index requests, but since the version number was less than the current one, they were rejected. Obviously, this won't work in the general case since the process may take over well after the 60s has elapsed. You can change 60s to a higher value using the

All that having been said, I think there are some measures I can take to ensure that the accumulated buffer of change events is cleared when a process takes over for another. The trick is to still ensure that the original reason for this issue remains fixed: all the direct reads should still be performed when a process takes over. So I cannot just discard those; I will need to rerun them.
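As a standalone illustration of why timestamp-based versioning rejects stale buffered writes, here is a small sketch that exercises Elasticsearch's external versioning through its REST API. The index name, document ID, and local endpoint are placeholders, and this is not how monstache itself issues requests.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// putWithVersion indexes a document with an externally supplied version number.
// Elasticsearch rejects the request with 409 Conflict if the version is not
// greater than the one already stored for that document.
func putWithVersion(id string, version int64, body string) (int, error) {
	url := fmt.Sprintf(
		"http://localhost:9200/test/_doc/%s?version=%d&version_type=external", id, version)
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewBufferString(body))
	if err != nil {
		return 0, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// A newer version (think: later oplog timestamp) is accepted...
	status, err := putWithVersion("1", 200, `{"state":"current"}`)
	fmt.Println(status, err) // expect 200/201
	// ...but a buffered, older event is rejected.
	status, err = putWithVersion("1", 100, `{"state":"stale"}`)
	fmt.Println(status, err) // expect 409 Conflict
}
```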
Agreed, clearing the buffer and re-performing the direct reads will do. Happy to help test the fix. Let me know!
@rwynn Thanks for the quick turnaround; it looks good in my testing. However, it seems we've got a regression here. Currently, the behavior is that the http server won't start until a process first becomes "enabled". Since we use something like
@mapshen Thanks for reporting the issue with the http server. I just checked in a fix across all branches. Do you think you would be able to do a local build and let me know if you run into any issues?
@rwynn all good now in my testing. Had to change things up a bit to support local builds on my end though ;)
Are we going to make a release for this?
@mapshen just pushed a new release. thanks.
Hi Ryan,
Recently noticed a "strange" behavior of Monstache and am hoping you could shed some light on it.
Sometimes, because of an index template change, we need to recreate an index. By that, we mean stopping Monstache, deleting the current index, updating the index template, and restarting Monstache immediately. However, at that point, Monstache doesn't seem to realize the old index is gone, and it will not perform a direct read to recreate the index.
Here are my steps to reproduce it:
Suspect it may have something to do with monstache.cluster, which has a TTL index on expireAt, because this issue will not come up if we wait for about 1 minute before restarting Monstache.
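A hypothetical workaround sketch based on the suspicion above (not a confirmed fix): clear any leftover document in the monstache.cluster collection before restarting, rather than waiting for the TTL to expire. The connection string and the assumption that monstache.cluster is the namespace holding the stale state come from the comment above.

```go
package main

import (
	"context"
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(ctx)

	// Remove whatever cluster-state document the previous run left behind so
	// the restarted process can become enabled and run its direct reads.
	res, err := client.Database("monstache").Collection("cluster").DeleteMany(ctx, bson.D{})
	if err != nil {
		panic(err)
	}
	fmt.Println("removed", res.DeletedCount, "stale cluster document(s)")
}
```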