DCOS_OSS-5277: Explore switching to ZooKeeper 3.5.x for DC/OS 1.14 #5878

Open
wants to merge 11 commits into
base: master

Conversation

jgehrcke
Contributor

The first step is to see if integration tests pass.

https://jira.mesosphere.com/browse/DCOS_OSS-5277

@d2iq-mergebot
Collaborator

This repo has @mesosphere-mergebot integration. You can perform the following commands by submitting a comment. Submit a comment with the content "@mesosphere-mergebot help" to view more detailed help text and examples. Be sure to have a look at the mergebot documentation, too.

@mesosphere-mergebot changelog-not-required reason 
@mesosphere-mergebot sync  
@mesosphere-mergebot bump-ee  
@mesosphere-mergebot merge-it  
@mesosphere-mergebot override-status pr-status-check jira-url 
@mesosphere-mergebot request-review  
@mesosphere-mergebot label [Ship It|Request For Comment|Ready For Review|Holding|Work In Progress|merge-by-mergebot] 
  • PR creators can apply one of [Ready For Review|Work In Progress]. Owners can apply any label.

@d2iq-mergebot
Collaborator

@mesosphere-mergebot bump-ee

@d2iq-mergebot
Collaborator

Enterprise Bump PR: mesosphere/dcos-enterprise/pull/6194

@d2iq-mergebot
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/6194 updated.

@jgehrcke
Contributor Author

@mesosphere-mergebot override-status teamcity/dcos/test/terraform/aws/onprem/static/group3 https://jira.mesosphere.com/browse/DCOS-56126

@d2iq-mergebot
Collaborator

@jgehrcke, only listed owners other than the PR creator may issue that command.

@adamtheturtle
Contributor

@mesosphere-mergebot override-status teamcity/dcos/test/terraform/aws/onprem/static/group3 https://jira.mesosphere.com/browse/DCOS-56126

@adamtheturtle
Contributor

(as discussed, @jgehrcke will re-run the overridden command).

@jgehrcke
Contributor Author

Got ZooKeeper to start, but Exhibitor does not detect that it is serving. I raised the log level to DEBUG for both ZK and Exhibitor, and I think the output contains an important clue:

Jul 18 13:21:59 dcos-e2e-default-f52b5-master-0 java[5263]: [myid:1] INFO  [main:ContainerManager@64] - Using checkIntervalMs=60000 maxPerMinute=10000
Jul 18 13:21:59 dcos-e2e-default-f52b5-master-0 start_exhibitor.py[5149]: INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper Server: Starting zookeeper ... STARTED [pool-2-thread-2]
Jul 18 13:22:00 dcos-e2e-default-f52b5-master-0 java[5263]: [myid:1] DEBUG [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:53942
Jul 18 13:22:00 dcos-e2e-default-f52b5-master-0 java[5263]: [myid:1] INFO  [NIOWorkerThread-1:FourLetterCommands@234] - The list of known four letter word commands is : [{1936881266=srvr, 1937006964=stat, 2003003491=wchc, 168541732
Jul 18 13:22:00 dcos-e2e-default-f52b5-master-0 java[5263]: [myid:1] INFO  [NIOWorkerThread-1:FourLetterCommands@235] - The list of enabled four letter word commands is : [[srvr]]
Jul 18 13:22:00 dcos-e2e-default-f52b5-master-0 java[5263]: [myid:1] DEBUG [NIOWorkerThread-1:NIOServerCnxn@511] - Command ruok is not executed because it is not in the whitelist.
Jul 18 13:22:00 dcos-e2e-default-f52b5-master-0 java[5263]: [myid:1] DEBUG [NIOWorkerThread-1:NIOServerCnxn@627] - Closed socket connection for client /127.0.0.1:53942 (no session established for client)
Jul 18 13:22:00 dcos-e2e-default-f52b5-master-0 start_exhibitor.py[5149]: INFO  com.netflix.exhibitor.core.activity.ActivityLog  ZooKeeper down/not-serving waiting 2021 of 40000 ms before restarting [ActivityQueue-0]
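
The log above shows a local client sending `ruok` to port 2181 and being rejected right before Exhibitor reports ZooKeeper as down. To reproduce that probe by hand, here is a minimal Python sketch. The function names are hypothetical and this is not Exhibitor's actual implementation; it only assumes the plain-text four-letter-word protocol on the client port.

```python
import socket


def send_four_letter_word(cmd: str, host: str = "127.0.0.1", port: int = 2181,
                          timeout: float = 5.0) -> bytes:
    """Send a four-letter-word command and return the raw reply."""
    if len(cmd) != 4:
        raise ValueError("four-letter-word commands are exactly 4 characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def looks_healthy(reply: bytes) -> bool:
    """A serving ZooKeeper answers `ruok` with `imok`; anything else counts as down."""
    return reply.strip() == b"imok"
```

On a master in the state logged above, `looks_healthy(send_four_letter_word("ruok"))` should come back false, because the server refuses the command instead of answering `imok`.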

@jgehrcke
Contributor Author

Command ruok is not executed because it is not in the whitelist.

https://issues.apache.org/jira/browse/ZOOKEEPER-2764

4lw.commands.whitelist
(Java system property: zookeeper.4lw.commands.whitelist)

New in 3.5.3: A list of comma separated Four Letter Words commands that user wants to use. A valid Four Letter Words command must be put in this list else ZooKeeper server will not enable the command. By default the whitelist only contains "srvr" command which zkServer.sh uses. The rest of four letter word commands are disabled by default.

Here's an example of the configuration that enables stat, ruok, conf, and isro command while disabling the rest of Four Letter Words command:

    4lw.commands.whitelist=stat, ruok, conf, isro
              
If you really need enable all four letter word commands by default, you can use the asterisk option so you don't have to include every command one by one in the list. As an example, this will enable all four letter word commands:

    4lw.commands.whitelist=*
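
To make the quoted semantics concrete, here is a small Python model of the whitelist as the docs describe it: comma-separated names, only `srvr` by default, and `*` enabling everything. The helper names are hypothetical and this is a reading of the documentation, not ZooKeeper's own parsing code.

```python
DEFAULT_WHITELIST = "srvr"  # per the quoted docs, the only command enabled by default


def enabled_commands(whitelist: str = DEFAULT_WHITELIST) -> set:
    """Split a `4lw.commands.whitelist` value into its command names."""
    return {item.strip() for item in whitelist.split(",") if item.strip()}


def is_enabled(cmd: str, whitelist: str = DEFAULT_WHITELIST) -> bool:
    """True if `cmd` is allowed by the whitelist; `*` enables every command."""
    commands = enabled_commands(whitelist)
    return "*" in commands or cmd in commands
```

With the default value, `is_enabled("ruok")` is false, which matches the "not in the whitelist" message in the log. With the docs' example value `stat, ruok, conf, isro` it is true.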

@jgehrcke
Contributor Author

@mesosphere-mergebot label Ready For Review

@jgehrcke
Contributor Author

Bumped kazoo, rebased on current master, force-pushed

@d2iq-mergebot
Collaborator

Your pull request's branch is not based on the most recent version of master. Please rebase your changes against this repo's master branch.

Otherwise we get

    cp: cannot create regular file '/opt/mesosphere/packages/exhibitor--XX/usr/zookeeper/lib/': Not a directory

Otherwise Exhibitor cannot detect ZK health, error message:

    Command ruok is not executed because it is not in the whitelist.

Quote from ZK docs:

    The AdminServer is an embedded Jetty server that
    provides an HTTP interface to the four letter word
    commands. By default, the server is started on port
    8080, and commands are issued by going to the URL
    "/commands/[command name]", e.g.,
    http://localhost:8080/commands/stat. The command
    response is returned as JSON

In DC/OS, do not expose this on all interfaces, but
instead accept connections from localhost only.

Also, use a port different from 8080 (this is Root
Marathon's port).

Explicitly listen on IPv4 interface. Config var is documented with:

    New in 3.3.0: the address (ipv4, ipv6 or hostname)
    to listen for client connections; that is, the address
    that clients attempt to connect to. This is optional,
    by default we bind in such a way that any connection
    to the clientPort for any address/interface/nic on the
    server will be accepted.
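
The commit messages above can be condensed into a small sketch that renders the corresponding `zoo.cfg` lines. `admin.serverAddress`, `admin.serverPort`, and `clientPortAddress` are ZooKeeper configuration keys. The function name, the loopback literal `127.0.0.1`, and the values passed in the usage note are assumptions for illustration, not the values this PR settled on.

```python
def render_zk_overrides(admin_port: int, client_port_address: str) -> str:
    """Render zoo.cfg lines for the settings discussed in the commit messages."""
    if admin_port == 8080:
        # 8080 is the AdminServer default and clashes with Root Marathon.
        raise ValueError("8080 is Root Marathon's port; pick a different one")
    settings = {
        # Bind the AdminServer to loopback so only localhost can reach it.
        "admin.serverAddress": "127.0.0.1",
        "admin.serverPort": str(admin_port),
        # Address the client port binds to (an IPv4 address in this PR).
        "clientPortAddress": client_port_address,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())
```

For example, `render_zk_overrides(admin_port=8181, client_port_address="0.0.0.0")` yields three `key=value` lines; both argument values here are placeholders.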

@d2iq-mergebot
Collaborator

Enterprise Bump mesosphere/dcos-enterprise/pull/6194 unchanged and not updated.

@d2iq-mergebot
Collaborator

👋 This PR has been inactive for 7 days. Please review the status checks to see what needs to be done to move it forward or change the label to Work in Progress. Thank you!

@jgehrcke
Contributor Author

(we found a problem while working on this patch: https://issues.apache.org/jira/browse/ZOOKEEPER-3466)

@adamtheturtle left a comment
Contributor

Review added to remove this from my review requests list in GitHub.
