dlite not starting correctly after reboot #214

Open
getninjaN opened this issue Dec 1, 2016 · 18 comments
Comments

@getninjaN

Bug Reports

  • dlite version in use (run dlite --version):
    dlite version 2.0.0-beta8

  • expected behavior:
    dlite should start correctly after reboot and make my day the best day ever.

  • actual behavior:
    dlite doesn't start correctly after reboot and makes the reboot day the worst day ever.

  • steps to reproduce
    I haven't got a clue

TL;DR

Something seems to be wrong with extractUser, lookupUser or proxy on my machine, I don't really know...

My story

After the first install dlite starts without any problems and runs great, but after a reboot it won't start correctly.
Directly after I log in I can find a dlite process but no hyperkit process in Activity Monitor.
The dlite process is using 1-2 MB of RAM, which sounds small but probably isn't anything weird.
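The same check can be done from a terminal instead of Activity Monitor. This is a minimal sketch: `has_process` is a hypothetical helper (not part of dlite), and the exact process names, `dlite` and `hyperkit`, are assumptions taken from the description above.

```shell
# has_process NAME: read a process list (one command path per line) on stdin
# and succeed if any entry's basename equals NAME.
has_process() {
  awk -v name="$1" '
    { n = split($0, parts, "/"); if (parts[n] == name) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

Possible usage: `ps -A -o comm= | has_process hyperkit || echo "no hyperkit process"`.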

docker ps returns an error

$ docker ps
Error response from daemon: Unable to connect to the virtual machine

dlite start times out. (There are two dlite processes during this time; when it finishes, one process is terminated and the original process persists.)

$ dlite start
Starting the virtual machine: ERROR!
Timed out waiting for virtual machine

dlite stop runs into infinity and beyond until I press ctrl-c. (the dlite process is still running)

Running dlite stop again after this:

$ dlite stop
Stopping the virtual machine: done

(the dlite process is still running)

Debug mode activated

So I started digging and found out that some commands make an HTTP POST request to http://127.0.0.1:1050/[command].

Running curl -X POST http://127.0.0.1:1050/start returns Unauthorized
Running curl -X POST --header "X-Username: emil" http://127.0.0.1:1050/start returns Timed out waiting for virtual machine
Running curl -X POST http://127.0.0.1:1050/stop returns Virtual machine is not running (which is expected)
Using Chrome and visiting http://127.0.0.1:1050/status also returns Unauthorized.

It seems like there's something wrong with extractUser, lookupUser or proxy.
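The requests above can be wrapped in a small helper so that each endpoint is probed the same way. This is only a sketch: the base URL and the X-Username header come from the curl calls above, while the function names `dlite_url` and `dlite_probe` and the `DLITE_API` variable are hypothetical and not part of dlite.

```shell
# Base URL of the local dlite API, as seen in the curl calls above.
DLITE_API="${DLITE_API:-http://127.0.0.1:1050}"

# dlite_url COMMAND: print the full URL for one API command.
dlite_url() {
  printf '%s/%s\n' "$DLITE_API" "$1"
}

# dlite_probe METHOD COMMAND [USERNAME]: send one request and print the
# response body followed by the HTTP status code on its own line.
# USERNAME defaults to the current user (an assumption about what the
# daemon expects in X-Username).
dlite_probe() {
  curl --silent --show-error --max-time 60 \
    --request "$1" \
    --header "X-Username: ${3:-$(id -un)}" \
    --write-out '\n%{http_code}\n' \
    "$(dlite_url "$2")"
}
```

Possible usage: `dlite_probe GET status`, then `dlite_probe POST start`. Comparing the status codes with and without the header should show whether the Unauthorized response depends on X-Username alone.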

I have tried to uninstall everything (I think) and reinstall dlite but with the same results.
I have tried to unload local.docker.plist and loading it again but with the same results.

@kakoni

kakoni commented Dec 9, 2016

I'm experiencing the same symptoms, except that http://127.0.0.1:1050/status returns a valid-looking status:

{"id":"a54321e06-be54-11e6-9769-7056818e1367","hostname":"local.docker","disk_size":20,"disk_path":"/Users/kakoni/.dlite/disk.qcow","cpu_cores":2,"memory":2,"dns_server":"192.168.64.1","docker_version":"latest","docker_args":"","route":true,"started":true,"ip":"192.168.64.7","pid":8549}

@bibendi

bibendi commented Dec 13, 2016

My story

reboot

$ docker ps
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

$ ls -l /var/run/docker.sock
srwxrwxrwx 1 root daemon 0 13 Dec 23:03 /var/run/docker.sock

$ sudo rm -rf /var/run/docker.sock
$ sudo launchctl stop local.dlite
$ sudo launchctl start local.dlite
$ dlite start
Starting the virtual machine: ERROR!
Timed out waiting for virtual machine
$ dlite status
vm_state: started
ip_address: 192.168.64.4
pid: 963
id: aceb2746-a7d0-11e6-affc-80e6502222b0
hostname: local.docker
disk_size: 25
disk_path: /Users/merkushin/.dlite/disk.qcow
cpu_cores: 2
memory: 3
dns_server: 192.168.64.1
docker_version: latest
docker_args: --bip=172.17.0.1/24 --dns=172.17.0.1

$ docker ps
Error response from daemon: Unable to connect to the virtual machine

Waiting 1 minute...

$ docker ps
CONTAINER ID   IMAGE                           COMMAND     CREATED       STATUS         PORTS                   NAMES
7b5e00b68d69   aacebedo/dnsdock:latest-amd64   "dnsdock"   4 weeks ago   Up 2 seconds   172.17.0.1:53->53/udp   dnsdock
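Since docker ps only succeeded after about a minute in the session above, the manual wait can be replaced by polling. This is a minimal sketch of a generic retry loop. `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary choices, not values dlite uses.

```shell
# retry ATTEMPTS DELAY COMMAND [ARGS...]: run COMMAND until it succeeds,
# at most ATTEMPTS times, sleeping DELAY seconds between attempts.
retry() {
  _retry_attempts=$1
  _retry_delay=$2
  shift 2
  _retry_i=1
  while [ "$_retry_i" -le "$_retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Pause only if another attempt is coming.
    if [ "$_retry_i" -lt "$_retry_attempts" ]; then
      sleep "$_retry_delay"
    fi
    _retry_i=$((_retry_i + 1))
  done
  return 1
}
```

Possible usage: `retry 12 5 docker ps` polls for up to roughly a minute and returns as soon as the daemon answers.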

@nlf
Owner

nlf commented Dec 13, 2016

can those of you experiencing this issue run dlite ssh and tell me the version of dlite-os shown?

if you have a version earlier than 1.0.0-beta3 this problem should be fixed by re-running dlite init, if you have dlite-os 1.0.0-beta3 and you're still experiencing this issue i have more debugging to do

@getninjaN
Author

$ dlite ssh
dlite-os version 1.0.0-beta3
Docker version 1.12.3, build 6b644ec

@bibendi

bibendi commented Dec 19, 2016

$ dlite ssh
merkushin@local.docker's password:

ctrl + c =)

$ ssh docker@$(dlite ip)
docker@192.168.64.4's password:
dlite-os version 1.0.0-beta3
Docker version 1.12.3, build 6b644ec

@getninjaN
Author

A thing just hit me, I downloaded the binary from "Releases" and did not build it myself.
Could this be a thing causing me trouble?

@nlf
Owner

nlf commented Dec 31, 2016

@getninjaN that shouldn't be causing a problem, that's the binary i run on my own laptop without issues.

interesting that the vm seems to be coming up and just isn't phoning home correctly.. can someone who is able to login to their vm run df and tell me available disk space on their vm?

@maeldur

maeldur commented Jan 22, 2017

i'm getting this same issue (beta8) and

$ dlite ssh
ssh: connect to host local.docker port 22: Operation timed out

@getninjaN
Author

I managed to fix this.

TL;DR

PEBKAC 🙃

The long story

  1. $ dlite stop (2.0.0-beta8 or beta9)
  2. Run Docker Toolbox Uninstall Script
  3. $ brew uninstall docker docker-machine
  4. $ dlite uninstall (2.0.0-beta8 or beta9)
  5. $ brew uninstall dlite (2.0.0-beta8 or beta9)
  6. $ brew install dlite (1.1.5)
  7. $ brew uninstall dlite (1.1.5)
  8. Restart macOS
  9. $ brew install docker-compose
  10. Download dlite 2.0.0-beta9
  11. $ cp dlite /usr/local/bin
  12. $ dlite init
  13. Got DISK ERROR!, as in issue #217 ("Creating disk: ERROR!")
  14. $ brew install libev (To fix DISK ERROR!)
  15. $ dlite init
  16. docker-compose up and when this was done...
  17. ... restart macOS
  18. $ dlite start
  19. ...
  20. PROFIT!

Conclusion

Now everything is working like clockwork again.
The original problem was probably caused by a combination of having used Kitematic, Docker for Mac and dlite-1.1.5 without properly uninstalling each one before switching to the next.
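To spot leftovers like these before reinstalling, one option is to list which related commands are still on the PATH. This is only a sketch: `list_present` is a hypothetical helper, and the command names in the usage line are taken from the steps above. It cannot detect app bundles such as Kitematic.

```shell
# list_present NAME...: print each named command that is found on PATH.
list_present() {
  for _name in "$@"; do
    if command -v "$_name" >/dev/null 2>&1; then
      printf '%s\n' "$_name"
    fi
  done
}
```

Possible usage: `list_present docker docker-machine docker-compose dlite`. Anything printed after an uninstall is worth a second look.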

@bibendi

bibendi commented Feb 13, 2017

I'm tired of waiting for a fix for this problem 😸
docker-machine-driver-xhyve is working like a charm

@getninjaN
Author

Well sh*t... Ran into another problem now.
My Mac was acting up and I had to kill it with the power switch.

Now when I try to run dlite start I get this error

Starting the virtual machine: ERROR!
chown /Users/emil/.dlite/vm.tty: no such file or directory

In my console I get this for InternetSharing (/usr/libexec/InternetSharing)

2017-02-15 17:03:20.636867
com.docker.hyperkit: com.apple.NetworkSharing.broadcast-1 has been started
2017-02-15 17:03:20.650614
com.docker.hyperkit: com.apple.NetworkSharing.broadcast-1 (idle) has been stopped

Peace and love

@nlf
Owner

nlf commented Feb 15, 2017

well that's a new one.. i've actually seen the no such file or directory for the vm.tty before, though, so i'll open an issue for that specifically.

the InternetSharing stuff though, that's a new one. is there anything interesting in /Users/emil/.dlite/vm.log? likely at the very bottom
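A quick way to check both the modification time and the last lines of the log is sketched below. `log_summary` is a hypothetical helper, and the default of 20 lines is an arbitrary choice.

```shell
# log_summary FILE [LINES]: show the file's listing (including its
# modification time) and its last LINES lines (default 20).
log_summary() {
  if [ ! -f "$1" ]; then
    printf 'missing: %s\n' "$1"
    return 1
  fi
  ls -l "$1"
  tail -n "${2:-20}" "$1"
}
```

Possible usage: `log_summary /Users/emil/.dlite/vm.log`.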

@getninjaN
Copy link
Author

Nope.. vm.log wasn't modified at all.
I checked it when I first hit the problem and again after another reboot (to see if that made it work); it had nothing new in it.

I can try to see if I'm able to reproduce this and check again.
Tried a whole bunch of stupid things without any success so I just reinstalled.

@KiSchulte

Hi,

still an issue.
Is there any progress on this issue?

Just downloaded the binary today.

$ dlite ssh
dlite-os version 1.0.0-beta3
Docker version 1.12.3, build 6b644ec

@synic

synic commented Apr 5, 2017

This is the biggest issue I have with dlite ATM. Any progress here? Is dlite still being developed?

@nlf
Owner

nlf commented Apr 5, 2017

@synic sorry, yes. i'm still working on this one. doing some refactoring to make things more testable and also make it easier to handle error cases, and log more debugging information.

not being able to reproduce this one makes fixing it like playing a game of whack-a-mole in the dark with a blindfold on. rather than doing that, i'm going to shuffle things around to try to isolate pieces of logic as much as possible. with that and some additional logging it should become a lot clearer when things go wrong. plus it means i can start actually writing unit tests for things, which will be nice.

it is, however, slow going. i promise it'll all be worth it in the long run though!

@synic

synic commented Apr 5, 2017 via email

@WoodrowShigeru

Not sure if just a coincidence but … after over an hour of dlite start and dlite stop, I just deactivated my WLAN and ran dlite start again – it started on the first try.

Maybe this helps.


8 participants