Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

returned non-zero exit status 3 #1861

Open
wiwengweng opened this issue Jul 4, 2019 · 4 comments
Open

returned non-zero exit status 3 #1861

wiwengweng opened this issue Jul 4, 2019 · 4 comments

Comments

@wiwengweng
Copy link

I am using miniDCOS to install and test dcos but without successful.

env:
ubuntu16.04
docker: Docker version 18.09.7, build 2d0083d
Python 3.6
minidcos, version 2019.06.19.0

(base) root@bogon:~/dcos-venv/bin# ./minidcos docker wait --transport docker-exec -vvv
and get the following output:

DEBUG:dcos_e2e.node:Running command `/bin/sh -c systemctl status sshd.socket || systemctl status sshd` on a node `Node(public_ip=172.17.0.5, private_ip=172.17.0.5)`
DEBUG:dcos_e2e._subprocess_tools:● sshd.socket - OpenSSH Server Socket
DEBUG:dcos_e2e._subprocess_tools:   Loaded: loaded (/usr/lib/systemd/system/sshd.socket; disabled; vendor preset: disabled)
DEBUG:dcos_e2e._subprocess_tools:   Active: inactive (dead)
DEBUG:dcos_e2e._subprocess_tools:     Docs: man:sshd(8)
DEBUG:dcos_e2e._subprocess_tools:           man:sshd_config(5)
DEBUG:dcos_e2e._subprocess_tools:   Listen: 0.0.0.0:22 (Stream)
DEBUG:dcos_e2e._subprocess_tools: Accepted: 0; Connected: 0
DEBUG:dcos_e2e._subprocess_tools:● sshd.service - OpenSSH server daemon
DEBUG:dcos_e2e._subprocess_tools:   Loaded: loaded (/usr/lib/systemd/system/sshd.service; disabled; vendor preset: enabled)
DEBUG:dcos_e2e._subprocess_tools:   Active: inactive (dead)
DEBUG:dcos_e2e._subprocess_tools:     Docs: man:sshd(8)
DEBUG:dcos_e2e._subprocess_tools:           man:sshd_config(5)
DEBUG:dcos_e2e._subprocess_tools:
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 systemd[1]: Starting OpenSSH server daemon...
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 sshd[243]: Server listening on 0.0.0.0 port 22.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 sshd[243]: Server listening on :: port 22.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 systemd[1]: Started OpenSSH server daemon.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:56:32 dcos-e2e-default-07034-public-agent-0 systemd[1]: Stopping OpenSSH server daemon...
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:56:32 dcos-e2e-default-07034-public-agent-0 systemd[1]: Stopped OpenSSH server daemon.
WARNING:retry.api:Command '['docker', 'exec', '--user', 'root', '--interactive', '3d603f5136ab808bcb4953e997a4128f71aaf1092dc355fc3592c746e3906afc', '/bin/sh', '-c', 'systemctl status sshd.socket || systemctl status sshd']' returned non-zero exit status 3., retrying in 1 seconds...
DEBUG:dcos_e2e.node:Running command `/bin/sh -c systemctl status sshd.socket || systemctl status sshd` on a node `Node(public_ip=172.17.0.5, private_ip=172.17.0.5)`
DEBUG:dcos_e2e._subprocess_tools:● sshd.socket - OpenSSH Server Socket
DEBUG:dcos_e2e._subprocess_tools:   Loaded: loaded (/usr/lib/systemd/system/sshd.socket; disabled; vendor preset: disabled)
DEBUG:dcos_e2e._subprocess_tools:   Active: inactive (dead)
DEBUG:dcos_e2e._subprocess_tools:     Docs: man:sshd(8)
DEBUG:dcos_e2e._subprocess_tools:           man:sshd_config(5)
DEBUG:dcos_e2e._subprocess_tools:   Listen: 0.0.0.0:22 (Stream)
DEBUG:dcos_e2e._subprocess_tools: Accepted: 0; Connected: 0
DEBUG:dcos_e2e._subprocess_tools:● sshd.service - OpenSSH server daemon
DEBUG:dcos_e2e._subprocess_tools:   Loaded: loaded (/usr/lib/systemd/system/sshd.service; disabled; vendor preset: enabled)
DEBUG:dcos_e2e._subprocess_tools:   Active: inactive (dead)
DEBUG:dcos_e2e._subprocess_tools:     Docs: man:sshd(8)
DEBUG:dcos_e2e._subprocess_tools:           man:sshd_config(5)
DEBUG:dcos_e2e._subprocess_tools:
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 systemd[1]: Starting OpenSSH server daemon...
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 sshd[243]: Server listening on 0.0.0.0 port 22.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 sshd[243]: Server listening on :: port 22.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 systemd[1]: Started OpenSSH server daemon.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:56:32 dcos-e2e-default-07034-public-agent-0 systemd[1]: Stopping OpenSSH server daemon...
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:56:32 dcos-e2e-default-07034-public-agent-0 systemd[1]: Stopped OpenSSH server daemon.
WARNING:retry.api:Command '['docker', 'exec', '--user', 'root', '--interactive', '3d603f5136ab808bcb4953e997a4128f71aaf1092dc355fc3592c746e3906afc', '/bin/sh', '-c', 'systemctl status sshd.socket || systemctl status sshd']' returned non-zero exit status 3., retrying in 1 seconds...
DEBUG:dcos_e2e.node:Running command `/bin/sh -c systemctl status sshd.socket || systemctl status sshd` on a node `Node(public_ip=172.17.0.5, private_ip=172.17.0.5)`
DEBUG:dcos_e2e._subprocess_tools:● sshd.socket - OpenSSH Server Socket
DEBUG:dcos_e2e._subprocess_tools:   Loaded: loaded (/usr/lib/systemd/system/sshd.socket; disabled; vendor preset: disabled)
DEBUG:dcos_e2e._subprocess_tools:   Active: inactive (dead)
DEBUG:dcos_e2e._subprocess_tools:     Docs: man:sshd(8)
DEBUG:dcos_e2e._subprocess_tools:           man:sshd_config(5)
DEBUG:dcos_e2e._subprocess_tools:   Listen: 0.0.0.0:22 (Stream)
DEBUG:dcos_e2e._subprocess_tools: Accepted: 0; Connected: 0
DEBUG:dcos_e2e._subprocess_tools:● sshd.service - OpenSSH server daemon
DEBUG:dcos_e2e._subprocess_tools:   Loaded: loaded (/usr/lib/systemd/system/sshd.service; disabled; vendor preset: enabled)
DEBUG:dcos_e2e._subprocess_tools:   Active: inactive (dead)
DEBUG:dcos_e2e._subprocess_tools:     Docs: man:sshd(8)
DEBUG:dcos_e2e._subprocess_tools:           man:sshd_config(5)
DEBUG:dcos_e2e._subprocess_tools:
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 systemd[1]: Starting OpenSSH server daemon...
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 sshd[243]: Server listening on 0.0.0.0 port 22.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 sshd[243]: Server listening on :: port 22.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 systemd[1]: Started OpenSSH server daemon.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:56:32 dcos-e2e-default-07034-public-agent-0 systemd[1]: Stopping OpenSSH server daemon...
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:56:32 dcos-e2e-default-07034-public-agent-0 systemd[1]: Stopped OpenSSH server daemon.
WARNING:retry.api:Command '['docker', 'exec', '--user', 'root', '--interactive', '3d603f5136ab808bcb4953e997a4128f71aaf1092dc355fc3592c746e3906afc', '/bin/sh', '-c', 'systemctl status sshd.socket || systemctl status sshd']' returned non-zero exit status 3., retrying in 1 seconds...
DEBUG:dcos_e2e.node:Running command `/bin/sh -c systemctl status sshd.socket || systemctl status sshd` on a node `Node(public_ip=172.17.0.5, private_ip=172.17.0.5)`
DEBUG:dcos_e2e._subprocess_tools:● sshd.socket - OpenSSH Server Socket
DEBUG:dcos_e2e._subprocess_tools:   Loaded: loaded (/usr/lib/systemd/system/sshd.socket; disabled; vendor preset: disabled)
DEBUG:dcos_e2e._subprocess_tools:   Active: inactive (dead)
DEBUG:dcos_e2e._subprocess_tools:     Docs: man:sshd(8)
DEBUG:dcos_e2e._subprocess_tools:           man:sshd_config(5)
DEBUG:dcos_e2e._subprocess_tools:   Listen: 0.0.0.0:22 (Stream)
DEBUG:dcos_e2e._subprocess_tools: Accepted: 0; Connected: 0
DEBUG:dcos_e2e._subprocess_tools:● sshd.service - OpenSSH server daemon
DEBUG:dcos_e2e._subprocess_tools:   Loaded: loaded (/usr/lib/systemd/system/sshd.service; disabled; vendor preset: enabled)
DEBUG:dcos_e2e._subprocess_tools:   Active: inactive (dead)
DEBUG:dcos_e2e._subprocess_tools:     Docs: man:sshd(8)
DEBUG:dcos_e2e._subprocess_tools:           man:sshd_config(5)
DEBUG:dcos_e2e._subprocess_tools:
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 systemd[1]: Starting OpenSSH server daemon...
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 sshd[243]: Server listening on 0.0.0.0 port 22.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 sshd[243]: Server listening on :: port 22.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 systemd[1]: Started OpenSSH server daemon.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:56:32 dcos-e2e-default-07034-public-agent-0 systemd[1]: Stopping OpenSSH server daemon...
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:56:32 dcos-e2e-default-07034-public-agent-0 systemd[1]: Stopped OpenSSH server daemon.
WARNING:retry.api:Command '['docker', 'exec', '--user', 'root', '--interactive', '3d603f5136ab808bcb4953e997a4128f71aaf1092dc355fc3592c746e3906afc', '/bin/sh', '-c', 'systemctl status sshd.socket || systemctl status sshd']' returned non-zero exit status 3., retrying in 1 seconds...
DEBUG:dcos_e2e.node:Running command `/bin/sh -c systemctl status sshd.socket || systemctl status sshd` on a node `Node(public_ip=172.17.0.5, private_ip=172.17.0.5)`
DEBUG:dcos_e2e._subprocess_tools:● sshd.socket - OpenSSH Server Socket
DEBUG:dcos_e2e._subprocess_tools:   Loaded: loaded (/usr/lib/systemd/system/sshd.socket; disabled; vendor preset: disabled)
DEBUG:dcos_e2e._subprocess_tools:   Active: inactive (dead)
DEBUG:dcos_e2e._subprocess_tools:     Docs: man:sshd(8)
DEBUG:dcos_e2e._subprocess_tools:           man:sshd_config(5)
DEBUG:dcos_e2e._subprocess_tools:   Listen: 0.0.0.0:22 (Stream)
DEBUG:dcos_e2e._subprocess_tools: Accepted: 0; Connected: 0
DEBUG:dcos_e2e._subprocess_tools:● sshd.service - OpenSSH server daemon
DEBUG:dcos_e2e._subprocess_tools:   Loaded: loaded (/usr/lib/systemd/system/sshd.service; disabled; vendor preset: enabled)
DEBUG:dcos_e2e._subprocess_tools:   Active: inactive (dead)
DEBUG:dcos_e2e._subprocess_tools:     Docs: man:sshd(8)
DEBUG:dcos_e2e._subprocess_tools:           man:sshd_config(5)
DEBUG:dcos_e2e._subprocess_tools:
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 systemd[1]: Starting OpenSSH server daemon...
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 sshd[243]: Server listening on 0.0.0.0 port 22.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 sshd[243]: Server listening on :: port 22.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:48:48 dcos-e2e-default-07034-public-agent-0 systemd[1]: Started OpenSSH server daemon.
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:56:32 dcos-e2e-default-07034-public-agent-0 systemd[1]: Stopping OpenSSH server daemon...
DEBUG:dcos_e2e._subprocess_tools:Jul 04 03:56:32 dcos-e2e-default-07034-public-agent-0 systemd[1]: Stopped OpenSSH server daemon.
Traceback (most recent call last):
  File "./minidcos", line 10, in <module>
    sys.exit(minidcos())
  File "/root/dcos-venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/root/dcos-venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/root/dcos-venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/dcos-venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/dcos-venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/dcos-venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/root/dcos-venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/dcos-venv/lib/python3.6/site-packages/dcos_e2e_cli/dcos_docker/commands/wait.py", line 70, in wait
    cluster=cluster_containers.cluster,
  File "/root/dcos-venv/lib/python3.6/site-packages/dcos_e2e_cli/dcos_docker/commands/_common.py", line 190, in cluster
    public_agents=set(map(self.to_node, self.public_agents)),
  File "/root/dcos-venv/lib/python3.6/site-packages/dcos_e2e/cluster.py", line 116, in from_nodes
    cluster_backend=backend,
  File "/root/dcos-venv/lib/python3.6/site-packages/dcos_e2e/cluster.py", line 86, in __init__
    _wait_for_ssh(node=node)
  File "</root/dcos-venv/lib/python3.6/site-packages/decorator.py:decorator-gen-6>", line 2, in _wait_for_ssh
  File "/root/dcos-venv/lib/python3.6/site-packages/retry/api.py", line 74, in retry_decorator
    logger)
  File "/root/dcos-venv/lib/python3.6/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/root/dcos-venv/lib/python3.6/site-packages/dcos_e2e/cluster.py", line 46, in _wait_for_ssh
    shell=True,
  File "/root/dcos-venv/lib/python3.6/site-packages/dcos_e2e/node.py", line 520, in run
    capture_output=capture_output,
  File "/root/dcos-venv/lib/python3.6/site-packages/dcos_e2e/_node_transports/_docker_exec_transport.py", line 125, in run
    pipe_output=capture_output,
  File "/root/dcos-venv/lib/python3.6/site-packages/dcos_e2e/_subprocess_tools.py", line 145, in run_subprocess
    stderr=stderr,
subprocess.CalledProcessError: Command '['docker', 'exec', '--user', 'root', '--interactive', '3d603f5136ab808bcb4953e997a4128f71aaf1092dc355fc3592c746e3906afc', '/bin/sh', '-c', 'systemctl status sshd.socket || systemctl status sshd']' returned non-zero exit status 3.
@wiwengweng
Copy link
Author

Did it use the ssh service? I also try to enable sshd service and start the service, but again see the last error.

@wiwengweng
Copy link
Author

as I ran on, I recreate the cluster again. But still meet some errors.

  1. this time ssh is up, but others are not.
⠴ Waiting for DC/OS to startWARNING:retry.api:Command '['docker', 'exec', '--user', 'root', '--interactive', '2669ffd6c3f2633df6fb3be6fad73fd8f5080918f825465e376b48d3b914aa72', '/bin/sh', '-c', 'sudo /opt/mesosphere/bin/dcos-check-runner check node-poststart || sudo /opt/mesosphere/bin/dcos-diagnostics check node-poststart || /opt/mesosphere/bin/3dt --diag']' returned non-zero exit status 127., retrying in 10 seconds...

It means command returned status is not correct. So I log into master node, and check the commands and services.

  1. some of the commands does not exist.
[root@dcos-e2e-default-766cf-master-0 bin]# /opt/mesosphere/bin/dcos-diagnostics --verbose check node-poststart
Error: unknown command "check" for "dcos-diagnostics"
Run 'dcos-diagnostics --help' for usage.
unknown command "check" for "dcos-diagnostics"

in fact dcos-diagnostics dose not support check command anymore, and here is the help file.

Usage:
  dcos-diagnostics [flags]
  dcos-diagnostics [command]

Available Commands:
  daemon      Start an http server and polls the components health.
  help        Help about any command

Flags:
      --config string   config file (default is $HOME/.dcos-diagnostics.yaml)
      --diag            Check DC/OS components health.
  -h, --help            help for dcos-diagnostics
      --role string     Set node role
      --verbose         Use verbose debug output.
      --version         Print dcos-diagnostics version
  1. 3dt command dose not exist.
[root@dcos-e2e-default-766cf-master-0 bin]#  /opt/mesosphere/bin/3dt --diag
bash: /opt/mesosphere/bin/3dt: No such file or directory
  1. key services are not running good.
[root@dcos-e2e-default-766cf-master-0 bin]# systemctl | grep dcos-
  dcos-adminrouter.service             loaded activating auto-restart       Admin Router Master: Uses NGINX to expose a unified control plane proxy for components and services
  dcos-bouncer.service                 loaded activating auto-restart       DC/OS Identity and Access Manager (Bouncer): offers login methods and user management
  dcos-checks-api.service              loaded active     running            DC/OS Checks API: Provides node and cluster checks
● dcos-checks-poststart.service        loaded failed     failed             DC/OS Poststart Checks: Run node-poststart checks

and these services cannot be restarted using systemctl restart command. Hope someone can help me. TIA

@adamtheturtle
Copy link
Contributor

Thank you @wiwengweng for this thorough bug report.
I appreciate that the logging could be better.

The rough order of operations on the wait command is this:

  • Start nodes
  • Wait for the SSH service to start up on all nodes (this is what did not work the first time for you, I am unsure why)
  • Wait for DC/OS checks to pass

The last of these has to support multiple versions of DC/OS.
It runs multiple commands on each node, to support each version of DC/OS.
Only one of those needs to pass.
It runs those commands on repeat until the services are running well.

The following is a related comment in the code:

    The execution will differ for different version of DC/OS.
    ``dcos-check-runner`` only exists on DC/OS 1.12+. ``dcos-diagnostics
    check node-poststart`` only works on DC/OS 1.10 and 1.11. ``3dt`` only
    exists on DC/OS 1.9. ``node-poststart`` requires ``sudo`` to allow
    reading the CA certificate used by certain checks.

This means that it is not a problem for example that 3dt does not exist on your nodes.

as I ran on, I recreate the cluster again. But still meet some errors.

How long did you wait to see if the services just started up?

@wiwengweng
Copy link
Author

wiwengweng commented Jul 15, 2019

thanks for your response, adamtheturtle. I think I have waited for about 30mins+ before I see this error. Then I also try to manually start sshd in all nodes, and be sure they are all alive. But still see some services are "down/auto restart" state in my master container.

  dcos-adminrouter.service             loaded activating start-pre    start Admin Router Master: Uses NGINX to expose a unified control plane proxy for components and services
  dcos-bouncer.service                 loaded activating auto-restart       DC/OS Identity and Access Manager (Bouncer): offers login methods and user management
  dcos-checks-api.service              loaded active     running            DC/OS Checks API: Provides node and cluster checks
● dcos-checks-poststart.service        loaded failed     failed             DC/OS Poststart Checks: Run node-poststart checks
  dcos-cockroach.service               loaded activating auto-restart       CockroachDB: Database for the DC/OS IAM

by this command line docker exec --user root --interactive 4449ed14b85f /bin/sh -c systemctl status sshd.socket || systemctl status sshd

and here 4449ed14b85f is just my master container ID

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants