This repository has been archived by the owner on Jun 22, 2018. It is now read-only.

Zookeeper timeout on AWS when using hostname #483

Open
philwinder opened this issue Jul 18, 2016 · 3 comments

Comments

@philwinder
Contributor

philwinder commented Jul 18, 2016

There seems to be an issue with hostnames when using a Zookeeper version prior to 3.5.
https://issues.apache.org/jira/browse/ZOOKEEPER-2367
https://issues.apache.org/jira/browse/ZOOKEEPER-2171
soabase/exhibitor#269
http://grokbase.com/t/kafka/users/163jd6pj49/zookeeper-dns-ttl
mesosphere/marathon#412

I think the issue is that Zookeeper gets the IP from DNS on startup, then never (or not very often) re-resolves it.

And my minimesos Marathon /v2/info states:

  "marathon_config": {
    "master": "zk://minimesos-zookeeper:2181/mesos",
    "failover_timeout": 604800,
...
  "zookeeper_config": {
    "zk": "zk://minimesos-zookeeper:2181/marathon",
    "zk_timeout": 10000,

So I think it actually has to fail before it tries to get a valid IP address. The odd thing is that this seems only to be a serious issue on AWS.
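
To make the suspected behaviour concrete, here is a minimal sketch. It is not taken from the Zookeeper or Marathon code: the class names, the injected `resolve` callable, and the IP addresses are all made up for illustration. One client resolves the hostname once at startup and keeps the result; the other looks it up again on every connection attempt.

```python
from typing import Callable, Tuple


class ResolveOnceClient:
    """Resolves the hostname a single time, at construction (the suspected behaviour)."""

    def __init__(self, host: str, resolve: Callable[[str], str]) -> None:
        self.host = host
        self.ip = resolve(host)  # cached for the lifetime of the client

    def address(self) -> str:
        return self.ip


class ReResolvingClient:
    """Looks the hostname up again on every (re)connect attempt."""

    def __init__(self, host: str, resolve: Callable[[str], str]) -> None:
        self.host = host
        self.resolve = resolve

    def address(self) -> str:
        return self.resolve(self.host)


def demo() -> Tuple[str, str]:
    # Fake DNS table standing in for a container that comes back with a new IP.
    # Both addresses are invented for this example.
    dns = {"minimesos-zookeeper": "172.17.0.2"}
    lookup = dns.__getitem__
    stale = ResolveOnceClient("minimesos-zookeeper", lookup)
    fresh = ReResolvingClient("minimesos-zookeeper", lookup)
    dns["minimesos-zookeeper"] = "172.17.0.9"  # the record changes after startup
    return stale.address(), fresh.address()
```

With a real resolver you would pass something like `socket.gethostbyname` instead of the dictionary lookup. The first client keeps returning the old address until it is restarted, which matches the "has to fail first" symptom.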

I think an upgrade to Zookeeper 3.5 would fix this, but I'm not confident about the source of the issue.

@philwinder
Contributor Author

I've just had another look at the timings. This only occurs when I start my CPU-intensive app. I think the app is starving the Marathon/Mesos/Zookeeper containers of resources, and it times out. Upon refresh it seems to struggle to reconnect.

I'm not sure what the solution is.

@philwinder
Contributor Author

I've found the issue. Zookeeper is timing out when writing the transaction log:

2016-07-18 08:33:18,848 [myid:] - WARN  [SyncThread:0:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:0 took 2802ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
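
If you want to see how often this happens, a small script can pull the reported duration out of warnings like the one above. This is only a sketch: the regular expression is derived from this single log line, and the 1000 ms default threshold is an arbitrary choice, not a Zookeeper setting.

```python
import re
from typing import Iterable, Iterator, Optional

# Matches the duration in the slow-fsync warning, e.g. "... took 2802ms ...".
FSYNC_WARNING = re.compile(r"fsync-ing the write ahead log in \S+ took (\d+)ms")


def fsync_millis(log_line: str) -> Optional[int]:
    """Return the reported fsync duration in ms, or None for any other line."""
    match = FSYNC_WARNING.search(log_line)
    return int(match.group(1)) if match else None


def slow_fsyncs(lines: Iterable[str], threshold_ms: int = 1000) -> Iterator[int]:
    """Yield every reported duration that is at or above threshold_ms."""
    for line in lines:
        millis = fsync_millis(line)
        if millis is not None and millis >= threshold_ms:
            yield millis
```

Feeding it the Zookeeper log and comparing the durations with the `zk_timeout` of 10000 ms gives a rough idea of how much of the session timeout is being spent waiting on the disk.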

The solution for me is to use a faster disk (not EBS volumes on AWS) or to edit the Zookeeper settings to disable the forced synchronisation of the transaction log. I.e. leave the writes in memory and don't wait for them to be flushed to disk. This is obviously a bit risky: if Zookeeper fails before the log is flushed, its state will be out of sync.
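
To check whether the disk really is the bottleneck, you can time a few small write-plus-fsync round trips in the directory that holds the transaction log. This is a rough sketch, not a benchmark: the function name, the 4 KiB payload, and the number of writes are arbitrary choices for the example.

```python
import os
import tempfile
import time


def measure_fsync_ms(directory: str, writes: int = 20, payload: bytes = b"x" * 4096) -> float:
    """Return the slowest write+fsync round trip, in milliseconds, for a temp file in `directory`."""
    worst = 0.0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to the device, as a synced log write would
            worst = max(worst, (time.perf_counter() - start) * 1000.0)
    finally:
        # Always clean up the temporary file, even if a write fails.
        os.close(fd)
        os.remove(path)
    return worst
```

Running it against the EBS-backed directory while the CPU-intensive app is active, and again on a local disk, should show whether the multi-second fsync times come from the volume itself.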

forceSync
(Java system property: zookeeper.forceSync)

Requires updates to be synced to media of the transaction log before finishing processing the update. If this option is set to no, ZooKeeper will not require updates to be synced to the media.

So, given the property name above, the JVM flag is -Dzookeeper.forceSync=no

I would recommend this in minimesos, as it is intended for testing. This will significantly improve zookeeper performance.
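
As a sketch of how such settings could be turned into JVM flags: the helper below is hypothetical and is not part of minimesos or Zookeeper. It assumes every property takes the `zookeeper.` prefix, which matches the `zookeeper.forceSync` name quoted above but may not hold for every Zookeeper setting.

```python
from typing import Dict, List


def zookeeper_jvm_flags(properties: Dict[str, str]) -> List[str]:
    """Turn {"forceSync": "no"} into ["-Dzookeeper.forceSync=no"].

    Keys that already carry the "zookeeper." prefix are left unchanged.
    """
    flags = []
    for key, value in sorted(properties.items()):
        # Assumption: the Java system property is the setting name with a "zookeeper." prefix.
        name = key if key.startswith("zookeeper.") else "zookeeper." + key
        flags.append("-D{}={}".format(name, value))
    return flags
```

The resulting list would be appended to the `java` command line that starts Zookeeper. How minimesos would expose this to users is a separate question.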

@frankscholten
Contributor

frankscholten commented Jul 20, 2016

Let's make Zookeeper properties configurable in the zookeeper block in the minimesosFile.
