Bootstrapping of vmx 18.2 isn't working #183

Open
tias77 opened this issue Nov 26, 2018 · 10 comments

Comments

@tias77
Contributor

tias77 commented Nov 26, 2018

I'm working with vmx-bundle-18.2R1.9.tgz
The reason it isn't working seems to be related to the bootstrapping of the config.
I have tried adding "set system services ssh root-login allow" to the bootstrapping but I'm not quite there yet.

@tias77
Contributor Author

tias77 commented Nov 26, 2018

Perhaps related:

ALERT:Auto image Upgrade will start. This can terminate config CLI session(s). Modified configuration will be lost. To stop Auto-image, in CLI do the following: 'edit; delete chassis auto-image-upgrade; commit'.

@tias77
Contributor Author

tias77 commented Nov 27, 2018

After a fresh start, there seems to be something built into vmx that does a factory reset after "a while", committed via the "button":

root> show system commit    
0   2018-11-27 08:14:21 UTC by root via button
1   2018-11-27 08:14:11 UTC by root via cli
2   2018-11-27 08:14:04 UTC by root via cli
3   2018-11-27 08:10:24 UTC by root via other

Adding a "time.sleep(15)" at the beginning of bootstrap_config helps:

root> show system commit 
0   2018-11-27 09:04:02 UTC by root via cli
1   2018-11-27 09:03:59 UTC by root via cli
2   2018-11-27 09:03:53 UTC by root via button
3   2018-11-27 09:00:32 UTC by root via other
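
The two listings above differ only in whether the "button" commit lands after or before the "cli" commits. A minimal sketch of checking that automatically, assuming the output format shown here; `parse_commits` and `bootstrap_overwritten` are hypothetical helper names, not part of vrnetlab:

```python
import re
from typing import List, NamedTuple


class Commit(NamedTuple):
    index: int       # 0 is the most recent commit
    timestamp: str   # e.g. "2018-11-27 08:14:21 UTC"
    user: str
    method: str      # "cli", "button", "other", ...


# Matches lines such as: "0   2018-11-27 08:14:21 UTC by root via button"
COMMIT_RE = re.compile(
    r"^(\d+)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+) by (\S+) via (\S+)\s*$"
)


def parse_commits(output: str) -> List[Commit]:
    """Extract commit entries from 'show system commit' output."""
    commits = []
    for line in output.splitlines():
        m = COMMIT_RE.match(line.strip())
        if m:
            commits.append(
                Commit(int(m.group(1)), m.group(2), m.group(3), m.group(4))
            )
    return commits


def bootstrap_overwritten(commits: List[Commit]) -> bool:
    """True if a 'button' commit is newer than the newest 'cli' commit.

    Assumption: the bootstrap config is what shows up as 'via cli'.
    Returns False when either kind of commit is missing.
    """
    cli = [c.index for c in commits if c.method == "cli"]
    button = [c.index for c in commits if c.method == "button"]
    if not cli or not button:
        return False
    # Lower index means more recent.
    return min(button) < min(cli)
```

With the first listing this reports that the bootstrap was overwritten; with the second (after the sleep) it does not.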

@fredsod
Contributor

fredsod commented Dec 2, 2018

@tias77 I think this will be fixed when #178 is merged.

@plajjan
Collaborator

plajjan commented Dec 11, 2018

@tias77 @fredsod any feedback since #178 is merged?

@networkop

networkop commented Jan 7, 2019

I don't think that commit fixed this issue completely. The auto image upgrade gets started on boot when it can't find the saved config, and it adds some special autoimage config. When this happens before we run bootstrap_config, we're good and bootstrap completes successfully. However, in my case, on some vmx devices, it happened during bootstrap_config, which disconnected the config/cli session and left vmx uninitialized. It might look something like this:

Retype new password:
2019-01-07 08:10:17,135: launch     DEBUG    writing to serial console: VR-netlab9
                                                                               
Broadcast Message from root@                                                   
        (no tty) at 8:10 UTC...                                                
                                                                               
Committing autoinstall config                                                  
                                                                               
cli: remote side unexpectedly closed connection

root@:~ #
2019-01-07 08:10:17,178: launch     DEBUG    writing to serial console: set interfaces fxp0 unit 0 family inet address 10.0.0.15/24
2019-01-07 08:10:17,223: launch     DEBUG    RES:  set interfaces fxp0 unit 0 family inet address 10.0.0.15/24
set: Variable name must begin with a letter.
root@:~ #
2019-01-07 08:10:17,223: launch     DEBUG    writing to serial console: delete interfaces fxp0 unit 0 family inet dhcp
2019-01-07 08:10:17,268: launch     DEBUG    RES:  delete interfaces fxp0 unit 0 family inet dhcp
delete: Command not found.
root@:~ #
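
The log above shows configuration commands being typed into the shell after the CLI session was closed. A minimal sketch of detecting which mode the console is in before sending a command, using only the prompt shapes that appear in this thread ("root@:~ #", "root>", "root#"); `classify_prompt` is a hypothetical helper and the patterns are assumptions that may not cover every Junos prompt:

```python
import re

# Shell prompt, e.g. "root@:~ #" or "root@%" (same shapes as the
# "root@(%|:~ #)" pattern used elsewhere in this thread).
SHELL_RE = re.compile(r"root@\S*(%|:~ #)\s*$")
# Operational-mode CLI prompt, e.g. "root>"
OPER_RE = re.compile(r"^\S*>\s*$")
# Configuration-mode CLI prompt, e.g. "root#"
CONF_RE = re.compile(r"^\S*#\s*$")


def classify_prompt(console_tail: str) -> str:
    """Classify the last non-empty line of console output.

    Returns 'shell', 'operational', 'configuration' or 'unknown'.
    """
    lines = [line for line in console_tail.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    last = lines[-1].strip()
    # Check the shell first: it is the state we must not send config into.
    if SHELL_RE.search(last):
        return "shell"
    if OPER_RE.match(last):
        return "operational"
    if CONF_RE.match(last):
        return "configuration"
    return "unknown"
```

A bootstrap routine could call this before each "set" or "delete" and restart from "cli" when the result is "shell", rather than writing configuration statements to the shell.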

I've solved it by making the config session exclusive, like this:

    def bootstrap_config(self):
        """ Do the actual bootstrap config
        """
        self.wait_write("cli", None)
        self.wait_write("configure exclusive", '>', 10)       
        self.wait_write("set chassis fpc 0 pic 0 number-of-ports 96")

Happy to do a pull request if everyone's happy with this.

@networkop

Looks like it was just a false positive when I thought that workaround actually worked. This auto image upgrade thing will kick you out even if you're in an exclusive session. In fact, there are many other ways it can fail the bootstrap. So the only solution that is more or less stable for me is this:

@@ -98,7 +98,13 @@ class VMX_vcp(vrnetlab.VM):
             if ridx == 1:
                 if self.install_mode:
                     self.logger.info("requesting power-off")
-                    self.wait_write("cli", None)
+                    self.wait_write("/usr/sbin/mgd \"-ZS\" \"intialsetup-commit\" \"ex_series_auto_config\"", None)
+                    time.sleep(10)
+                    self.wait_write("cli", "root@(%|:~ #)")
+                    self.wait_write("configure", '>', 10)
+                    self.wait_write("delete chassis auto-image-upgrade")
+                    self.wait_write("commit")
+                    self.wait_write("exit")
                     self.wait_write("request system power-off", '>')
                     self.wait_write("yes", 'Power Off the system')
                     self.running = True

It adds 10 extra seconds to the build time (who cares when it's 15 mins anyway), but like any timer-based solution it can still fail. The benefit is that if the build phase passes, you're guaranteed that all subsequent runs of that image will be successful.
But I reckon config-drive should be the way forward, as it is the most reliable solution to this ZTP mess.

@plajjan
Copy link
Collaborator

plajjan commented Jan 13, 2019

@networkop can you elaborate on the solution (second/last one) you suggested? What does that mgd command do? Feel free to open PR. I just want a better explanation so I understand what is going on here (the next maintainer needs to understand) and I've never seen this initialsetup-commit stuff before.

I agree 10 seconds is nothing for a long build time (it's more like 5 minutes for me though). Can't we write a for loop and make sure it's configured the way we want it?

Do you think config-drive should be for run-time only, for build, or both?

@networkop

Based on my understanding, that mgd command is what triggers the auto-image-upgrade mode on vMX. It can be found in /usr/sbin/auto_image_config_enable inside the ex_auto_image_config_mgd_action function. When it is run, it installs a special "ztp" config (e.g. dhcp on fxp0) and commits it. Normally, it would run on boot anyway, but by running it explicitly we're forcing it to happen sooner.

A for loop could be a better solution and may ultimately prove more robust; however, sleep is a) simpler and b) does the job in 100% of my cases.

I think config-drive for run-time, since we don't really need to do much during the build/install phases. I'm assuming that config-drive will prevent auto_image_upgrade from being triggered, so none of the above workarounds would be needed (worst case, just adding the delete chassis auto-image-upgrade to the bootstrap config would be enough). Plus, it's more reliable than expect, and you can potentially mount a startup configuration from outside of the container, merge it with the bootstrap config and pass it all the way inside the VM.

So in my view, solutions to this problem in order of preference are:

  1. config-drive
  2. for loop
  3. sleep 10
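
Option 2 (the for loop) can be sketched as a bounded poll that replaces the fixed sleep. This is only an illustration: `retry_until` and the `check` callable are hypothetical, and in vrnetlab the check would have to send a command over the serial console and inspect the reply, which is not shown here.

```python
import time
from typing import Callable


def retry_until(
    check: Callable[[], bool],
    attempts: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, pausing `delay` seconds between tries.

    Returns True as soon as check() succeeds, False if every attempt fails.
    The `sleep` parameter is injectable so the loop can be tested without waiting.
    """
    for attempt in range(attempts):
        if check():
            return True
        # No point sleeping after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Compared with "sleep 10", the loop returns as soon as the condition holds and fails explicitly when it never does, at the cost of needing a reliable condition to test.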

@hellt

hellt commented Jan 20, 2021

It's funny how 2 years later I still hit the very same issue =) Thanks @networkop, your witchery with mgd helped to solve this riddle for me.

ADD: unfortunately that workaround seems to have its problems.
For starters, I am not sure whether we should still have an auto-upgrade statement after running mgd, because it seems I don't have it in 20.2:

2021-01-21 10:33:25,267: launch     INFO     matched login prompt
2021-01-21 10:33:25,267: launch     DEBUG    writing to serial console: root
2021-01-21 10:33:26,552: launch     INFO     requesting power-off
2021-01-21 10:33:26,552: launch     DEBUG    writing to serial console: /usr/sbin/mgd "-ZS" "intialsetup-commit" "ex_series_auto_config"
2021-01-21 10:33:36,562: launch     TRACE    Waiting for root@(%|:~ #)
2021-01-21 10:33:36,563: launch     TRACE    Read:  /usr/sbin/mgd "-ZS" "intialsetup-commit" "ex_series_auto_config"
root@:~ #
2021-01-21 10:33:36,563: launch     DEBUG    writing to serial console: cli
2021-01-21 10:33:36,563: launch     TRACE    Waiting for >
2021-01-21 10:33:37,267: launch     TRACE    Read:  cli
root>
2021-01-21 10:33:37,268: launch     DEBUG    writing to serial console: configure
2021-01-21 10:33:37,268: launch     TRACE    Waiting for #
2021-01-21 10:33:37,312: launch     TRACE    Read:  configure 
Entering configuration mode

[edit]
root#
2021-01-21 10:33:37,312: launch     DEBUG    writing to serial console: delete chassis auto-image-upgrade
2021-01-21 10:33:37,312: launch     TRACE    Waiting for #
2021-01-21 10:33:37,334: launch     TRACE    Read:  delete chassis auto-image-upgrade 
warning: statement not found

[edit]
root#
2021-01-21 10:33:37,334: launch     DEBUG    writing to serial console: commit
2021-01-21 10:33:37,334: launch     TRACE    Waiting for #
2021-01-21 10:33:38,319: launch     TRACE    Read:  commit 
commit complete

@hellt

hellt commented Jan 21, 2021

Unfortunately this workaround doesn't work all the time.
I started to have consecutive failures, as delete chassis auto-image-upgrade doesn't apply and returns "warning: statement not found".
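
One way to make the build step less brittle is to tell "the statement was never there" apart from a real error. The sketch below does not address the root cause of the failures; it only classifies the console output of the delete command. Only the "warning: statement not found" string comes from this thread; treating lines that start with "error:" as failures is an assumption, and `delete_outcome` is a hypothetical helper.

```python
def delete_outcome(output: str) -> str:
    """Classify the console output of a Junos 'delete ...' command.

    Returns:
      'absent'  - the statement did not exist ("warning: statement not found")
      'error'   - a line starting with "error:" was seen (assumed failure marker)
      'deleted' - neither of the above, so the delete is assumed to have applied
    """
    lines = [line.strip() for line in output.splitlines()]
    if any(line.startswith("error:") for line in lines):
        return "error"
    if any(line == "warning: statement not found" for line in lines):
        return "absent"
    return "deleted"
```

A caller could skip the following commit when the result is "absent", and abort the build only on "error".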


5 participants