CP ends up ignoring that its jobs have been killed #577

daledude opened this issue Mar 8, 2019 · 0 comments

daledude commented Mar 8, 2019

  • what is happening and what you expect to see
    Consul had a half-hour problem accepting service checks. ContainerPilot eventually stopped PUT-ing health check updates to Consul for all jobs. CP does, however, continue to PUT health status updates for itself.
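
To confirm whether Consul is still receiving the TTL updates, the agent's check table can be queried directly (a quick sketch; GET /v1/agent/checks is the stock Consul HTTP API, and CONTAINER_HOST is the same variable used in the config below):

# Dump the Consul agent's current view of all registered checks, including
# the TTL checks that ContainerPilot registers for each job.
curl -s "http://${CONTAINER_HOST}:8500/v1/agent/checks" | python -m json.tool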

Also, CP seems to get into a state where it doesn't notice that the spawned jobs are gone: the /status endpoint still shows jobs as healthy even after I killed them manually.
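
For reference, I read job state from the /status endpoint roughly like this (a sketch; it assumes telemetry is enabled on port 9090, which is omitted from the config below):

# Ask ContainerPilot for its view of every job and its health state.
curl -s "http://localhost:9090/status" | python -m json.tool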

Also, the rsyslog check that is in every config ends up logging the following even though running the check manually succeeds:

check.rsyslog timeout after 5s: '[514]'
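
Running the same check by hand under the same 5s budget succeeds immediately (assuming check-port is on PATH, as CP execs it by name):

# Reproduce what CP does: run the check with a 5s deadline and print the
# exit code (0 means something is listening on port 514).
timeout 5 check-port 514; echo "exit=$?"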

The "check-port" health check script is merely this:

#!/bin/bash
# check-port: exit 0 if a TCP or UDP listener is bound to port $1.
/bin/netstat -tunl | /bin/grep ":$1 " > /dev/null 2>&1
ret=$?
exit $ret
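
So check-port <port> exits 0 when a listener is bound to that port and non-zero otherwise; nothing in it should come anywhere near the 5s timeout.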
  • the output of containerpilot -version
    Version: 3.8.0
    GitHash: 408dbc9

  • the ContainerPilot configuration you're using
    The config doesn't matter; this happens in all my containers. Here is one anyway:

{
    consul: "{{.CONTAINER_HOST}}:8500",
    logging:
    {
        level: "INFO",
        format: "default",
        output: "stdout"
    },
    jobs: [
        {
            name: "rsyslog",
            exec: [ "rsyslogd-wrapper" ],
            restarts: "unlimited",
            health:
            {
                exec: "check-port 514", // Just simple: netstat ntlp | grep PORT
                interval: 2,
                ttl: 10,
                timeout: "5s",
            },
        },
        {{ if .DNSMASQ_SIDECAR }}
        {
            name: 'dnsmasq-{{.SERVICE_NAME_FULL}}',
            exec: [ "/usr/sbin/dnsmasq", "-k" ],
            restarts: "unlimited",
            port: "53",
            health:
            {
                exec: "check-port 53",
                interval: 2,
                ttl: 10,
                timeout: "5s",
            },
        },
        {{ end }}
        {
            name: "{{.SERVICE_NAME_FULL}}",
            when: {
              source: "watch.namingservice-{{.SERVICE_PLATFORM}}-{{.SERVICE_ENVIRONMENT}}",
              once: "healthy"
            },
            exec: [ 
                   "gosu", "admin",
                   "{{.BINDIR}}/{{.SERVICE_NAME}}", "-c", "{{.BASEDIR}}/cfg/{{.SERVICE_NAME}}.cfg", "-r", "short-recovery"
                  ],
	        restarts: "unlimited",
            port: "{{.SERVICE_PORT}}", // Causes service to be registered with Consul.
            health:
            {
                exec: "check-port {{.SERVICE_PORT}}",
                interval: 1,
                ttl: 10,
                timeout: "5s",
            },
            tags: [
                "{{.SERVICE_NAME}}",
                "{{.CONTAINER_HOST}}",
                "{{.SERVICE_ENVIRONMENT}}",
                "{{.SERVICE_PLATFORM}}"
            ],
            interfaces: [
                "10.0.0.0/8"
            ],
            consul:
            {
                enableTagOverride: true,
                deregisterCriticalServiceAfter: "6h"
            }
        },
        {
            // This job will watch for an event from Containerpilot that is fired
            //   when the "source" job in this config exits with a retcode > 0.
            // It then sends an event through Consul to notify this has occured.
            // A script run on the monitoring server will read the event
            //   from Consul.
            name: "{{.SERVICE_NAME_FULL}}-exit-failed-watcher",
            when: {
                source: "{{.SERVICE_NAME_FULL}}", // Must match the job name of the exec to watch.
                each: "exitFailed"
            },
            exec: [
                "send-consul-event", "service-exit-failed", "container_host={{.CONTAINER_HOST}}|service={{.SERVICE_NAME_FULL}}|hostname={{.HOSTNAME}}"
            ]
        }
    ],
    watches: [
      {
        name: "namingservice-{{.SERVICE_PLATFORM}}-{{.SERVICE_ENVIRONMENT}}",
        interval: 3
      }
    ]
}
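
send-consul-event (referenced above) is a local helper rather than part of ContainerPilot; conceptually it is just a thin wrapper around the stock consul event command, roughly:

#!/bin/bash
# Sketch of the send-consul-event helper: fire a Consul user event with
# the given name ($1) and opaque payload ($2) so that a watcher on the
# monitoring server can react to it.
exec consul event -name="$1" "$2"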
  • the output of any logs you can share; if you can it would be very helpful to turn on debug logging by adding logging: { level: "DEBUG"} to your ContainerPilot configuration.
    I have logging set to DEBUG, but there is nothing related to the issue in the output. It seems the logging output itself stopped?