This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

Cluster dies on mem3_rep_manager #106

Open
opie4624 opened this issue May 23, 2012 · 1 comment

@opie4624

Yesterday a single node was unable to start due to errors in mem3_rep_manager. After a few hours, all 6 nodes were unable to start.

Right now, each node's startup log looks like this:

[Tue, 22 May 2012 19:04:07 GMT] [info] [<0.87.0>] [--------] Apache CouchDB has started on http://undefined:5986/
[Tue, 22 May 2012 19:04:08 GMT] [error] [emulator] [--------] Error in process <0.171.0> on node 'bigcouch@couchdb1' with exit value: {{badmatch,nil},[{fabric_view,remove_down_shards,2},{rexi_utils,process_mailbox,6},{fabric_view_changes,receive_results,5},{fabric_view_changes,send_changes,6},{fabric_view_changes,go,5}]}


[Tue, 22 May 2012 19:04:08 GMT] [error] [<0.164.0>] [--------] ** Generic server mem3_rep_manager terminating 
** Last message in was {'EXIT',<0.171.0>,
                               {{badmatch,nil},
                                [{fabric_view,remove_down_shards,2},
                                 {rexi_utils,process_mailbox,6},
                                 {fabric_view_changes,receive_results,5},
                                 {fabric_view_changes,send_changes,6},
                                 {fabric_view_changes,go,5}]}}
** When Server state == {state,<0.165.0>,10,nil,[<0.171.0>]}
** Reason for termination == 
** {unexpected_msg,{'EXIT',<0.171.0>,
                           {{badmatch,nil},
                            [{fabric_view,remove_down_shards,2},
                             {rexi_utils,process_mailbox,6},
                             {fabric_view_changes,receive_results,5},
                             {fabric_view_changes,send_changes,6},
                             {fabric_view_changes,go,5}]}}}
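A minimal sketch of the termination pattern in the log above (this is illustrative only, not BigCouch's actual mem3_rep_manager code): a gen_server that traps exits receives {'EXIT', Pid, Reason} messages when a linked worker crashes, and if no handle_info clause matches them, a catch-all clause stops the server with {unexpected_msg, ...}. The {badmatch,nil} itself means some expression in fabric_view:remove_down_shards/2 returned the atom nil where a pattern such as {ok, Value} was expected.

```erlang
%% Hypothetical sketch, NOT the real mem3_rep_manager module.
-module(rep_manager_sketch).
-behaviour(gen_server).
-export([start_link/0, init/1, handle_call/3, handle_cast/2,
         handle_info/2, terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Trapping exits converts crashes of linked workers into
    %% {'EXIT', Pid, Reason} messages delivered to handle_info/2.
    process_flag(trap_exit, true),
    {ok, []}.

handle_call(_Msg, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

%% No clause matches {'EXIT', ...}, so a worker that dies with
%% {badmatch, nil} falls through to this catch-all, and the server
%% stops with {unexpected_msg, {'EXIT', ...}} -- the same reason
%% shape shown in the log above.
handle_info(Msg, State) ->
    {stop, {unexpected_msg, Msg}, State}.

terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

For the badmatch itself, the crashing worker would hit something like `{ok, Shards} = Lookup` where `Lookup` evaluated to `nil`, which raises {badmatch, nil} and kills the process.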

The last HTTP request was retrieving a view; then a stack trace just like the one above appeared and the node went down. Since the load balancer redirected each request to a still-working node, this eventually took down the entire cluster.

@opie4624
Author

Here's a crash dump and the console output from trying to start up one of the nodes. http://ge.tt/3EPR6AI
