
can't start container which has been run #715

Open

linengier opened this issue Apr 19, 2018 · 24 comments

@linengier

When I start my container "hello", the result is "please create it first". But when I try to create container "hello", hyperctl outputs ""/hello" is already in use". And when I use the hyperctl list command, nothing is shown at all. Is this a bug, or does hyper not support this the way Docker does? The /var/lib/hyper/ directory contains the same kind of files as Docker's.

```
hyperctl start hello
hyperctl ERROR: Error from daemon's response: The pod(hello) can not be found, please create it first

hyperctl run -d --name hello a7c41708ef58
hyperctl ERROR: Error from daemon's response: Conflict. The name "/hello" is already in use by container 20d8fea07a87785aced03c1e7be26f4fb5b268f958f287619dfce5a2becfea70. You have to remove (or rename) that container to be able to reuse that name.

hyperctl list -p hello
POD ID    POD Name    VM name    Status
```

@gnawux
Member

gnawux commented Apr 19, 2018

Looks like there are some bugs in the rollback procedure.

What is a7c41708ef58?

@linengier
Author

a7c41708ef58 is my Docker image, pulled from my registry.

```
hyperctl images
REPOSITORY                        TAG   IMAGE ID       CREATED               VIRTUAL SIZE
127.0.0.1:5000/linhaidong/hello   v1    a7c41708ef58   2018-01-16 15:50:14   122.8 MB
```

@gnawux
Member

gnawux commented Apr 19, 2018

Could you try hyperctl run with another name?

And could you list all pods and containers? The message suggests there is no pod named hello, but there is already a container named /hello.
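For example, something along these lines (the exact subcommand names are recalled from memory and may differ between hyperctl versions):

```bash
# List pods, containers, and VMs separately so orphaned entries show up.
# Subcommand names are assumed from hyperctl's help text; adjust as needed.
hyperctl list pod
hyperctl list container
hyperctl list vm
```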

@linengier
Author

While hyperd is running, everything works normally. After I restart hyperd, hyperctl list no longer shows the containers I created with hyperctl run.

```
# hyperd running:
$ hyperctl list
POD ID POD Name VM name Status
hello3 hello3 vm-EcbpSCOwia running
hello4 hello4 vm-TpUJhFLJsC running
$ hyperctl stop hello3
Successfully shutdown the POD: hello3!
$ hyperctl list
POD ID POD Name VM name Status
hello3 hello3 failed
hello4 hello4 vm-TpUJhFLJsC running
$ hyperctl start hello3
Successfully started the Pod(hello3)

# after hyperd restart:
$ hyperctl list
POD ID POD Name VM name Status
$ hyperctl start hello3
hyperctl ERROR: Error from daemon's response: The pod(hello3) can not be found, please create it first
$ hyperctl run -d --name hello3 a7c41708ef58
hyperctl ERROR: Error from daemon's response: Conflict. The name "/hello3" is already in use by container 88390a41f913fc76518d18292f26bb2aa113d13da890530e5aca3fabe837d347. You have to remove (or rename) that container to be able to reuse that name.
```

@joelmcdonald

I had a similar issue with container name conflicts appearing in the hyperd INFO, WARNING and ERROR logs (/var/log/hyper/hyperd.*).

It turned out to be some orphan hosts (/var/lib/hyper/hosts/) and containers (/var/lib/hyper/containers/) hanging around after a hyperd service restart. These were enough to trigger the errors and cause contracts to fail, but would never appear in the POD list (hyperctl list)... just like you're describing.

After I deleted everything from the hosts and containers directories, and restarted hyperd, everything worked fine.

It would be good if a hyperd service restart somehow triggered a graceful stop of any active PODs, and then started them again afterwards. Theory needs testing, but I think this would avoid creating the orphans.
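For anyone hitting the same thing, here is a minimal sketch of the manual cleanup described above (paths are the ones from this thread — stop the daemon first and double-check before deleting anything):

```bash
# Workaround sketch: clear orphaned pod state left behind by a hyperd restart.
# Assumes the default /var/lib/hyper layout mentioned in this thread.
systemctl stop hyperd

# Remove the leftover per-pod hosts and container directories that hyperd no
# longer lists but that still block the container names from being reused.
rm -rf /var/lib/hyper/hosts/*
rm -rf /var/lib/hyper/containers/*

systemctl start hyperd
```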

@gnawux
Member

gnawux commented Jul 10, 2018

@joelmcdonald which version of hyperd are you working on? master branch or a binary package?

@joelmcdonald

good question, it was installed by remote script as part of setting up a codius host... https://codius.s3.amazonaws.com/hyper-bootstrap.sh

'hyperctl info' shows Library Version: 1.02.146-RHEL7 (2018-01-22)

@gnawux
Member

gnawux commented Jul 10, 2018

Since that release, quite a few fixes have landed in the master branch, and we are going to release a new package within a week (some of the PRs have not been merged yet). Before then, would you mind trying a build from the latest master?
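For reference, the from-source build usually looks roughly like this (steps recalled from the project README, so treat the exact commands as assumptions):

```bash
# Rough sketch of building hyperd from the master branch.
# Assumes Go and the autotools toolchain are installed, and that the
# autogen/configure/make flow from the README still applies.
mkdir -p "$GOPATH/src/github.com/hyperhq"
cd "$GOPATH/src/github.com/hyperhq"
git clone https://github.com/hyperhq/hyperd.git
cd hyperd
./autogen.sh
./configure
make
```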

@joelmcdonald

Happy to try, I'll come back with the results

@joelmcdonald

@gnawux I'm having trouble with the make... it returns lots of "cannot use ... in field value" type errors.

Pre-requisites all installed ok, though I did have to add a symlink to resolve a failed file path.

The path below looks very circular... I get 10+ of these on screen, then make[2] fails with error 2, and make[1] fails with error 1.

Have I missed something?

```
[root@auscodius hyperd]# make
Making all in cmd
make[1]: Entering directory `/src/github.com/hyperhq/hyperd/cmd'
Making all in hyperd
make[2]: Entering directory `/src/github.com/hyperhq/hyperd/cmd/hyperd'
go build -gcflags="`if [ "" != "" ]; then echo "-N -l"; else echo ""; fi`" -tags "static_build exclude_graphdriver_btrfs libdm_no_deferred_remove" -ldflags "-X github.com/hyperhq/hyperd/utils.VERSION=1.0.0 -X github.com/hyperhq/hyperd/utils.GITCOMMIT=`git describe --dirty --always --tags 2> /dev/null || true`" hyperd.go

github.com/hyperhq/hyperd/vendor/github.com/hyperhq/hyperd/daemon/pod
/usr/lib/golang/src/github.com/hyperhq/hyperd/vendor/github.com/hyperhq/hyperd/daemon/pod/persist.go:359:11: cannot use c.descript (type *"github.com/hyperhq/hyperd/vendor/github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/api".ContainerDescription) as type *"github.com/hyperhq/hyperd/vendor/github.com/hyperhq/hyperd/vendor/github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/api".ContainerDescription in field value
```
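That error pattern — the same runv/api package resolved through two different vendor paths — usually shows up when the tree is compiled from somewhere other than its canonical import path; here the source is reached through the symlink under /usr/lib/golang/src, so Go resolves the vendored packages twice. A sketch of a layout that avoids it (the diagnosis and the paths are assumptions based on the error output above):

```bash
# Build from the canonical import path under GOPATH instead of reaching the
# tree through a symlink under /usr/lib/golang/src (GOROOT).
# Paths here are assumptions based on the error output above.
export GOPATH="$HOME/go"
mkdir -p "$GOPATH/src/github.com/hyperhq"
rm /usr/lib/golang/src/github.com/hyperhq/hyperd   # drop the symlink workaround
git clone https://github.com/hyperhq/hyperd.git "$GOPATH/src/github.com/hyperhq/hyperd"
cd "$GOPATH/src/github.com/hyperhq/hyperd"
./autogen.sh && ./configure && make
```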

@gnawux
Member

gnawux commented Jul 10, 2018

could you try these RPMs:

https://s3-us-west-1.amazonaws.com/hypercontainer-build/upload/1.1-rc1/hyper-container-1.0.0-1.el7.src.rpm
https://s3-us-west-1.amazonaws.com/hypercontainer-build/upload/1.1-rc1/hyperstart-1.0.0-1.el7.x86_64.rpm

I have just built them for release testing and have not updated the package metadata yet (they still show version 1.0.0-1).

@joelmcdonald

Installed OK, service started OK...

I'll load a pod, restart and see if it errors

```
[root@auscodius centos]# systemctl status hyperd
hyperd.service - hyperd
   Loaded: loaded (/usr/lib/systemd/system/hyperd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-07-10 08:47:11 UTC; 11s ago
     Docs: http://docs.hypercontainer.io
 Main PID: 12671 (hyperd)
   CGroup: /system.slice/hyperd.service
           └─12671 /usr/bin/hyperd --log_dir=/var/log/hyper

Jul 10 08:47:11 auscodius.com systemd[1]: Started hyperd.
Jul 10 08:47:11 auscodius.com systemd[1]: Starting hyperd...
Jul 10 08:47:11 auscodius.com hyperd[12671]: time="2018-07-10T08:47:11Z" level=warning msg="devmapper: Usage of loopback devices is strongly discouraged for production use. Please use --storage-opt dm.thinpooldev or us...pooldev section."
Jul 10 08:47:11 auscodius.com hyperd[12671]: time="2018-07-10T08:47:11Z" level=warning msg="devmapper: Base device already exists and has filesystem xfs on it. User specified filesystem will be ignored."
Jul 10 08:47:11 auscodius.com hyperd[12671]: time="2018-07-10T08:47:11Z" level=info msg="[graphdriver] using prior storage driver "devicemapper""
Jul 10 08:47:11 auscodius.com hyperd[12671]: time="2018-07-10T08:47:11Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Jul 10 08:47:11 auscodius.com hyperd[12671]: time="2018-07-10T08:47:11Z" level=info msg="Firewalld running: true"
Jul 10 08:47:11 auscodius.com hyperd[12671]: time="2018-07-10T08:47:11Z" level=info msg="Loading containers: start."
Jul 10 08:47:11 auscodius.com hyperd[12671]: time="2018-07-10T08:47:11Z" level=info msg="Loading containers: done."
Hint: Some lines were ellipsized, use -l to show in full.
```

@joelmcdonald

Unfortunately, the issue is still there in the new rpms.

  1. upgraded and started hyperd
  2. successfully loaded test POD
  3. hyperctl list showed POD active
  4. restarted hyperd
  5. hyperctl list showed nothing
  6. hosts and containers were still present in /var/lib/hyper/hosts/ and /var/lib/hyper/containers/
  7. logs confirm name conflict error (below)

```
[root@auscodius centos]# more /var/log/hyper/hyperd.INFO
Log file created at: 2018/07/10 08:52:36
Running on machine: auscodius
Binary: Built with gc go1.8 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0710 08:52:36.477023 13221 config.go:39] config file: %!(EXTRA string=/etc/hyper/config)
I0710 08:52:36.477346 13221 config.go:74] [/etc/hyper/config] config items: &types.HyperConfig{ConfigFile:"/etc/hyper/config", Root:"/var/lib/hyper", Host:"", GRPCHost:"", StorageDriver:"", StorageBaseSize:"", VmFactoryPolicy:"", Driver:" ", Kernel:"/var/lib/hyper/kernel", Initrd:"/var/lib/hyper/hyper-initrd.img", Bridge:"", BridgeIP:"", DisableIptables:false, EnableVsock:false, DefaultLog:"", DefaultLogOpt:map[string]string{}, logPrefix:"[/etc/hyper/config] "}
I0710 08:52:36.631031 13221 daemon.go:225] The hypervisor's driver is
I0710 08:52:36.653608 13221 migration.go:23] Migrate lagecy persistent pod data, found: 0, migrated: 0
I0710 08:52:36.653858 13221 persist.go:57] layout loading finished
E0710 08:52:36.654224 13221 persist.go:100] Pod[l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa] failed to load inf info of : leveldb: not found
W0710 08:52:36.654536 13221 daemon.go:82] Got a unexpected error when creating(load) pod l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa, leveldb: not found
I0710 08:52:36.654706 13221 hyperd.go:189] Hyper daemon: 1.0.0
I0710 08:52:44.537987 13221 list.go:94] got list request for pod (pod: , vm: )
E0710 08:53:15.119353 13221 server.go:170] Handler for GET /pod/info returned error: Can not get Pod info with pod ID(l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa)
I0710 08:53:15.122847 13221 vm_states.go:301] SB[vm-YhvXsEcVAg] startPod: &json.Pod{Hostname:"l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa", DeprecatedContainers:[]json.Container(nil), DeprecatedInterfaces:[]json.NetworkInf(nil), Dns:[]string(nil), DnsOptions:[]string(nil), DnsSearch:[]string(nil), DeprecatedRoutes:[]json.Route(nil), ShareDir:"share_dir", PortmappingWhiteLists:(*json.PortmappingWhiteList)(0xc4208195f0)}
I0710 08:53:19.510273 13221 vm_states.go:304] SB[vm-YhvXsEcVAg] pod start successfully
I0710 08:53:19.510309 13221 provision.go:285] Pod[l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa] sandbox init result: <nil>
E0710 08:53:19.512929 13221 container.go:411] Pod[l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa] Con[(l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa__app)] Conflict. The name "/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa__app" is already in use by container 0931f8ea35412777a4bd22e76a50f52ccd760383d55dc91cc4b3eb6df2583e6a. You have to remove (or rename) that container to be able to reuse that name.
I0710 08:53:19.512963 13221 vm_states.go:332] SB[vm-YhvXsEcVAg] poweroff vm based on command: vm.Kill()
E0710 08:53:19.512982 13221 run.go:34] l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa: failed to add pod: Conflict. The name "/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa__app" is already in use by container 0931f8ea35412777a4bd22e76a50f52ccd760383d55dc91cc4b3eb6df2583e6a. You have to remove (or rename) that container to be able to reuse that name.
E0710 08:53:19.512997 13221 server.go:170] Handler for POST /pod/create returned error: Conflict. The name "/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa__app" is already in use by container 0931f8ea35412777a4bd22e76a50f52ccd760383d55dc91cc4b3eb6df2583e6a. You have to remove (or rename) that container to be able to reuse that name.
I0710 08:53:19.513193 13221 qemu_process.go:93] kill Qemu... 13334
I0710 08:53:19.513243 13221 context.go:199] SB[vm-YhvXsEcVAg] VmContext Close()
I0710 08:53:19.513277 13221 qmp_handler.go:344] quit QMP by command QMP_QUIT
E0710 08:53:19.513293 13221 qmp_handler.go:141] QMP exit as got error: read unix @->/var/run/hyper/vm-YhvXsEcVAg/qmp.sock: use of closed network connection
I0710 08:53:19.513335 13221 decommission.go:536] Pod[l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa] got vm exit event
I0710 08:53:19.513342 13221 etchosts.go:97] cleanupHosts /var/lib/hyper/hosts/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa, /var/lib/hyper/hosts/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa/hosts
E0710 08:53:19.513395 13221 json.go:601] SB[vm-YhvXsEcVAg] get hyperstart API version error: hyperstart closed
E0710 08:53:19.513427 13221 json.go:141] read init data failed
E0710 08:53:19.513439 13221 json.go:601] SB[vm-YhvXsEcVAg] get hyperstart API version error: hyperstart closed
W0710 08:53:19.513450 13221 hypervisor.go:47] SB[vm-YhvXsEcVAg] keep-alive test end with error: hyperstart closed
E0710 08:53:19.513482 13221 json.go:401] read tty data failed
E0710 08:53:19.513496 13221 json.go:458] SB[vm-YhvXsEcVAg] tty socket closed, quit the reading goroutine: read unix @->/var/run/hyper/vm-YhvXsEcVAg/tty.sock: use of closed network connection
I0710 08:53:19.517880 13221 server.go:388] getting image: calerobertson/test-manifest@sha256:7c8b236c5f01e5abc78c9500cce7c974ffcedd980e60c2d3a5f1a44c8455ae2d:
I0710 08:53:19.519579 13221 decommission.go:578] Pod[l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa] pod stopped
I0710 08:53:22.206354 13221 server.go:406] got image: calerobertson/test-manifest@sha256:7c8b236c5f01e5abc78c9500cce7c974ffcedd980e60c2d3a5f1a44c8455ae2d
I0710 08:53:22.207930 13221 server.go:388] getting image: docker.coil.com/codius-moneyd@sha256:4c02fc168e6b4cfde90475ed3c3243de0bce4ca76b73753a92fb74bf5116deef:
I0710 08:53:23.005769 13221 server.go:406] got image: docker.coil.com/codius-moneyd@sha256:4c02fc168e6b4cfde90475ed3c3243de0bce4ca76b73753a92fb74bf5116deef
I0710 08:53:23.007600 13221 vm_states.go:301] SB[vm-njySycyLVm] startPod: &json.Pod{Hostname:"l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa", DeprecatedContainers:[]json.Container(nil), DeprecatedInterfaces:[]json.NetworkInf(nil), Dns:[]string(nil), DnsOptions:[]string(nil), DnsSearch:[]string(nil), DeprecatedRoutes:[]json.Route(nil), ShareDir:"share_dir", PortmappingWhiteLists:(*json.PortmappingWhiteList)(0xc42083b1d0)}
I0710 08:53:27.387600 13221 vm_states.go:304] SB[vm-njySycyLVm] pod start successfully
I0710 08:53:27.387630 13221 provision.go:285] Pod[l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa] sandbox init result: <nil>
E0710 08:53:27.390305 13221 container.go:411] Pod[l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa] Con[(l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa__app)] Conflict. The name "/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa__app" is already in use by container 0931f8ea35412777a4bd22e76a50f52ccd760383d55dc91cc4b3eb6df2583e6a. You have to remove (or rename) that container to be able to reuse that name.
I0710 08:53:27.390330 13221 vm_states.go:332] SB[vm-njySycyLVm] poweroff vm based on command: vm.Kill()
E0710 08:53:27.390345 13221 run.go:34] l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa: failed to add pod: Conflict. The name "/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa__app" is already in use by container 0931f8ea35412777a4bd22e76a50f52ccd760383d55dc91cc4b3eb6df2583e6a. You have to remove (or rename) that container to be able to reuse that name.
E0710 08:53:27.390358 13221 server.go:170] Handler for POST /pod/create returned error: Conflict. The name "/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa__app" is already in use by container 0931f8ea35412777a4bd22e76a50f52ccd760383d55dc91cc4b3eb6df2583e6a. You have to remove (or rename) that container to be able to reuse that name.
I0710 08:53:27.390481 13221 qemu_process.go:93] kill Qemu... 13348
I0710 08:53:27.390529 13221 context.go:199] SB[vm-njySycyLVm] VmContext Close()
I0710 08:53:27.390555 13221 qmp_handler.go:344] quit QMP by command QMP_QUIT
E0710 08:53:27.390571 13221 qmp_handler.go:141] QMP exit as got error: read unix @->/var/run/hyper/vm-njySycyLVm/qmp.sock: use of closed network connection
I0710 08:53:27.390611 13221 decommission.go:536] Pod[l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa] got vm exit event
I0710 08:53:27.390618 13221 etchosts.go:97] cleanupHosts /var/lib/hyper/hosts/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa, /var/lib/hyper/hosts/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa/hosts
I0710 08:53:27.390626 13221 etchosts.go:101] cannot find /var/lib/hyper/hosts/l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa/hosts
I0710 08:53:27.390671 13221 decommission.go:578] Pod[l2xvchk27rbrnh3mc3y4p3iaeoonjzur2u24qxuod2iaqd5mlioa] pod stopped
E0710 08:53:27.390682 13221 json.go:601] SB[vm-njySycyLVm] get hyperstart API version error: hyperstart closed
E0710 08:53:27.390706 13221 json.go:401] read tty data failed
E0710 08:53:27.390714 13221 json.go:458] SB[vm-njySycyLVm] tty socket closed, quit the reading goroutine: read unix @->/var/run/hyper/vm-njySycyLVm/tty.sock: use of closed network connection
E0710 08:53:27.390730 13221 json.go:141] read init data failed
E0710 08:53:27.390737 13221 json.go:601] SB[vm-njySycyLVm] get hyperstart API version error: hyperstart closed
W0710 08:53:27.390744 13221 hypervisor.go:47] SB[vm-njySycyLVm] keep-alive test end with error: hyperstart closed
```

@gnawux
Member

gnawux commented Jul 10, 2018

Could you purge your old pods and try again with the new binaries?

Purge the /var/lib/hyper/ contents, but leave /var/lib/hyper/{kernel,hyper-initrd.img} in place.
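Roughly like this (a sketch of the purge described above — double-check the paths on your host before running it):

```bash
# Purge hyperd's on-disk state but keep the guest kernel and initrd.
# Paths are the defaults mentioned in this thread; verify before deleting.
systemctl stop hyperd
find /var/lib/hyper -mindepth 1 -maxdepth 1 \
  ! -name kernel ! -name hyper-initrd.img \
  -exec rm -rf {} +
systemctl start hyperd
```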

@joelmcdonald

Pretty sure that's what I did... stopped and removed all PODs, stopped hyperd, deleted all hosts and containers, installed new rpms with --force option, started hyperd, etc per above

Did I miss anything?

@joelmcdonald

sorry... just read the second half of your message. I'll purge those files, apply the new binaries and retest

@joelmcdonald

OK, purged /var/lib/hyper/* but left kernel and hyper-initrd.img, reinstalled new binaries, started service and retested...

The issue is still there. The logs report the container name conflict, the old containers and hosts are still present, but nothing shows up in hyperctl list.

@gnawux
Member

gnawux commented Jul 10, 2018

Weird. Since it reports a name conflict, the old leveldb data should still be there.

@joelmcdonald

No rush, it will be 24 hours before I have a chance to retest any further updates.

What's the path for the leveldb file?

@gnawux
Member

gnawux commented Jul 10, 2018

It should be under /var/lib/hyper/lib/.

@joelmcdonald

joelmcdonald commented Jul 10, 2018

Within /var/lib/hyper/lib/hyper.db I have 3 .ldb files, 1 .log file, and a CURRENT, LOCK, LOG and MANIFEST-000010 file.

Only the log file has been updated since the service was restarted.

hyperd service status
hyperd: active (running) since Tue 2018-07-10 10:11:02 UTC; 1h 13min ago

file list of hyper.db
```
[root@auscodius hyper.db]# ls -la
total 36
drwxr-xr-x 2 root root  139 Jul 10 10:11 .
drwxr-xr-x 3 root root   50 Jul 10 09:57 ..
-rw-r--r-- 1 root root 4205 Jul 10 10:01 000002.ldb
-rw-r--r-- 1 root root  192 Jul 10 10:08 000005.ldb
-rw-r--r-- 1 root root 5790 Jul 10 10:11 000008.ldb
-rw-r--r-- 1 root root  152 Jul 10 10:12 000009.log
-rw-r--r-- 1 root root   16 Jul 10 10:11 CURRENT
-rw-r--r-- 1 root root    0 Jul 10 09:57 LOCK
-rw-r--r-- 1 root root 2301 Jul 10 10:11 LOG
-rw-r--r-- 1 root root  657 Jul 10 10:11 MANIFEST-000010
```

@1maginarium

Any luck on this?

@bergwolf
Member

@1maginarium @linengier @joelmcdonald We have recently released v1.1.0, and it contains several bugfixes for resource management. Would you please check whether the problem still exists?

@1maginarium

1maginarium commented Oct 10, 2018

```
hyper-container.x86_64   1.1.0-1.el7          installed
hyperstart.x86_64        1.1.0-1.el7          installed
qemu-hyper.x86_64        2.4.1-3.el7.centos   installed
```

This issue is very easy to reproduce: simply stop or restart hyperd (or the server) while a pod is running.
The pods then turn into stale, zombie pods that cannot be seen in hyperctl list, only in the pod count of hyperctl info. Their names also show up in systemctl status hyperd -l, but they cannot be found or deleted under any circumstances. This prevents the self-test from succeeding, and prevents any pods from actually being able to run in hyperd.
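A reproduction sketch based on the steps above (the pod name and image are just examples taken from earlier in this thread):

```bash
# Reproduce the zombie-pod state described above.
hyperctl run -d --name demo a7c41708ef58   # start any pod
hyperctl list                              # pod shows as running

systemctl restart hyperd                   # or restart/reboot the host

hyperctl list                              # empty: the pod no longer appears
hyperctl info                              # ...but the pod is still counted here
hyperctl run -d --name demo a7c41708ef58   # fails: the name "/demo" is already in use
```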
