Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xrv9k slow boot #184

Open
plajjan opened this issue Dec 5, 2018 · 8 comments
Open

xrv9k slow boot #184

plajjan opened this issue Dec 5, 2018 · 8 comments

Comments

@plajjan
Copy link
Collaborator

plajjan commented Dec 5, 2018

This presentation https://clnv.s3.amazonaws.com/2018/eur/pdf/BRKSPG-2724.pdf hints that there is a "baking" process happening the first time xrv9k is booted. I think I've seen this in logs, i.e. xr starts up, does something and then reloads. It likely installs itself somehow.

However, that presentation and several of the guide indicate that one should provide a disk for xrv9k and the install media. That's not what we do. What's the difference? Is there a difference?

One thing I did seem to see is that some SSH key was generated on that first boot, so by making it part of the build process we might end up with the same host key on all VRs.

@bdreisbach
Copy link

I have done some testing around this, and indeed doing an "install" procedure can cut the boot time by roughly half(~11 minutes in my testing). Here are my observations:

  1. the calvados host key is identical on all VRs, but does this matter? the XR layer is generating its own unique keys when its bootstrapped the first time.
  2. xrv9000 is quite sensitive to kernel version(i think). for example i was pretty much unable to build the xrv9k image on a debian buster docker host(cpu lock ups), but on ubuntu xenial it works fine.
  3. if you start a VR with --nics other than what the install procedure used it will do the long boot procedure(basically rebuilds the XR partition it seems)
  4. the number of interfaces/pci buses seems to have some impact on how fast the boot time is(not done a ton of testing around this, but i discovered the above issue when i would install with 1 nic and it was definitely faster than with 128)

my patch looks like this:

diff --git a/xrv9k/Makefile b/xrv9k/Makefile
index 40f7df5..f9ad59d 100644
--- a/xrv9k/Makefile
+++ b/xrv9k/Makefile
@@ -11,3 +11,4 @@ VERSION=$(shell echo $(IMAGE) | sed -e 's/.\+[^0-9]\([0-9]\.[0-9]\.[0-9]\(\.[0-9

 -include ../makefile-sanity.include
 -include ../makefile.include
+-include ../makefile-install.include
diff --git a/xrv9k/docker/launch.py b/xrv9k/docker/launch.py
index b4ae986..88ece71 100755
--- a/xrv9k/docker/launch.py
+++ b/xrv9k/docker/launch.py
@@ -6,6 +6,7 @@ import os
 import random
 import re
 import signal
+import subprocess
 import sys
 import telnetlib
 import time
@@ -33,12 +34,13 @@ logging.Logger.trace = trace


 class XRV_vm(vrnetlab.VM):
-    def __init__(self, username, password, nics):
+    def __init__(self, username, password, nics, install_mode=False):
         for e in os.listdir("/"):
             if re.search(".qcow2", e):
                 disk_image = "/" + e
         super(XRV_vm, self).__init__(username, password, disk_image=disk_image, ram=12288)
         self.num_nics = nics
+        self.install_mode = install_mode
         self.qemu_args.extend(["-cpu", "host",
                                "-smp", "cores=4,threads=1,sockets=1",
                                "-serial", "telnet:0.0.0.0:50%02d,server,nowait" % (self.num + 1),
@@ -94,9 +96,9 @@ class XRV_vm(vrnetlab.VM):
                 self.wait_write(self.username, wait=None)
                 self.wait_write(self.password, wait="Enter secret:")
                 self.wait_write(self.password, wait="Enter secret again:")
-                self.credentials.insert(0, [self.username, self.password])
             if ridx == 3: # matched login prompt, so should login
                 self.logger.debug("matched login prompt")
+                self.credentials.insert(0, [self.username, self.password])
                 try:
                     username, password = self.credentials.pop(0)
                 except IndexError as exc:
@@ -107,6 +109,15 @@ class XRV_vm(vrnetlab.VM):
                 self.wait_write(password, wait="Password:")
                 self.logger.debug("logged in with %s / %s" % (username, password))
             if self.xr_ready == True and ridx == 4:
+                if self.install_mode:
+                    # wait for the baking process to complete
+                    self.wait_write("", wait="SDR/XR partition preparation completed successfully")
+                    self.logger.debug('install completed, shutting down device')
+                    self.logger.info("requesting power-off")
+                    self.wait_write("admin hw-module location 0/RP0 shutdown", '#')
+                    self.wait_write("yes", 'Shutdown hardware module ? [no,yes]')
+                    self.running = True
+                    return
                 # run main config!
                 if not self.bootstrap_config():
                     # main config failed :/
@@ -213,6 +224,38 @@ class XRV(vrnetlab.VR):
         self.vms = [ XRV_vm(username, password, nics) ]


+class XRV_installer(XRV):
+    """ XRV9K installer
+    """
+    def __init__(self, username, password, nics):
+        super(XRV, self).__init__(username, password)
+
+        self.vms = [ XRV_vm(username, password, nics, install_mode=True) ]
+
+    def install(self):
+        self.logger.info("Installing XRV9K")
+        xrv = self.vms[0]
+        while not xrv.running:
+            xrv.work()
+
+        # wait for system to shut down cleanly
+        for i in range(0, 600):
+            time.sleep(1)
+            try:
+                (ridx, match, res) = xrv.tn.expect([b"System halted"], 1)
+                if not match:
+                    self.logger.trace("OUTPUT XRV9K: %s" % res.decode())
+                else:
+                    self.logger.info("System halted")
+                    break
+            except Exception as exc:
+                # assume it's dead
+                self.logger.info("Can't communicate over serial console, assuming VM has shut down properly." + str(exc))
+                break
+
+        xrv.stop()
+        self.logger.info("Installation complete")

 if __name__ == '__main__':
     import argparse
@@ -220,6 +263,7 @@ if __name__ == '__main__':
     parser.add_argument('--trace', action='store_true', help='enable trace level logging')
     parser.add_argument('--username', default='vrnetlab', help='Username')
     parser.add_argument('--password', default='VR-netlab9', help='Password')
+    parser.add_argument('--install', action='store_true', help='Install XRV9K')
     parser.add_argument('--nics', type=int, default=128, help='Number of NICS')
     args = parser.parse_args()

@@ -231,5 +275,9 @@ if __name__ == '__main__':
     if args.trace:
         logger.setLevel(1)

-    vr = XRV(args.username, args.password, args.nics)
-    vr.start()
+    if args.install:
+        vr = XRV_installer(args.username, args.password, args.nics)
+        vr.install()
+    else:
+        vr = XRV(args.username, args.password, args.nics)
+        vr.start()

i can submit a pull request if there is interest and this looks ok.

@hellt
Copy link

hellt commented Mar 5, 2021

Hi @bdreisbach
thanks for playing with the install process
How did you enforce the number of nics you used during the install? Did you just changed the default or manually changed the makefile to specify the nics?

@bdreisbach
Copy link

i guess manually, in launch.py i called XRV_installer with 1 vs args.nics

@hellt
Copy link

hellt commented Mar 7, 2021

I tried to apply this patch and used 7.2.1

Seems like it doesn't work for me for because prompt doesn't match the expected string...:

2021-03-07 16:31:11,688: launch     INFO     IOS XR system configuration is complete, should be able to proceed with bootstrap configuration
2021-03-07 16:31:11,689: vrnetlab   DEBUG    writing to serial console:
2021-03-07 16:31:11,689: launch     TRACE    OUTPUT: Not settable: Success
2021-03-07 16:31:13,692: launch     TRACE    OUTPUT:


2021-03-07 16:31:41,724: launch     TRACE    OUTPUT: 0/RP0/ADMIN0:Mar  7 16:31:39.530 UTC: inst_agent[3128]: %INFRA-INSTAGENT-4-XR_PART_PREP_RPM : SDR/XR additional RPM installation is in progress

2021-03-07 16:37:12,048: launch     TRACE    OUTPUT: 0/RP0/ADMIN0:Mar  7 16:37:10.139 UTC: inst_agent[3128]: %INFRA-INSTAGENT-4-XR_PART_PREP_RESP : SDR/XR partition preparation completed successfully

2021-03-07 16:37:30,065: launch     TRACE    OUTPUT:
!!!!!!!!!!!!!!!!!!!! NO root-system username is configured. Need to configure root-system username. !!!!!!!!!!!!!!!!!!!!Configuration lock is held by another agent.  Please wait. [.
2021-03-07 16:37:34,070: launch     TRACE    OUTPUT: OK]


         --- Administrative User Dialog ---


2021-03-07 16:37:35,072: launch     INFO     Creating initial user
2021-03-07 16:37:35,072: vrnetlab   DEBUG    writing to serial console: vrnetlab
2021-03-07 16:37:35,073: vrnetlab   TRACE    waiting for 'Enter secret:' on serial console
2021-03-07 16:37:36,581: vrnetlab   TRACE    read from serial console: : vrnetlab

  % Entry must not be null.

  Enter root-system username:   Enter secret:
2021-03-07 16:37:36,581: vrnetlab   DEBUG    writing to serial console: VR-netlab9
2021-03-07 16:37:36,581: vrnetlab   TRACE    waiting for 'Enter secret again:' on serial console
2021-03-07 16:37:36,737: vrnetlab   TRACE    read from serial console:
  Enter secret again:
2021-03-07 16:37:36,737: vrnetlab   DEBUG    writing to serial console: VR-netlab9
2021-03-07 16:37:36,738: launch     TRACE    OUTPUT:
  Enter root-system username
2021-03-07 16:37:38,740: launch     TRACE    OUTPUT:  VR-netlab9

2021-03-07 16:37:56,760: launch     TRACE    OUTPUT:

2021-03-07 16:37:57,761: launch     DEBUG    matched login prompt
2021-03-07 16:37:57,762: launch     DEBUG    trying to log in with vrnetlab / VR-netlab9
2021-03-07 16:37:57,762: vrnetlab   DEBUG    writing to serial console: vrnetlab
2021-03-07 16:37:57,762: vrnetlab   TRACE    waiting for 'Password:' on serial console
2021-03-07 16:37:59,986: vrnetlab   TRACE    read from serial console:  0/RP0/ADMIN0:Mar  7 16:37:56.333 UTC: vm_manager[3155]: %INFRA-VM_MANAGER-4-INFO : Info: vm_manager started VM default-sdr--2
vrnetlab
Password:
2021-03-07 16:37:59,986: vrnetlab   DEBUG    writing to serial console: VR-netlab9
2021-03-07 16:37:59,986: launch     DEBUG    logged in with vrnetlab / VR-netlab9
Use the 'configure' command to modify this configuration.
User Access Verification

Username:
2021-03-07 16:38:01,990: launch     TRACE    OUTPUT:



2021-03-07 16:38:02,991: vrnetlab   TRACE    waiting for 'SDR/XR partition preparation completed successfully' on serial console

for comparison, that is the successful log without install:

2021-03-07 18:41:16,386: launch     INFO     IOS XR system configuration is complete, should be able to proceed with bootstrap configuration
2021-03-07 18:41:16,386: vrnetlab   DEBUG    writing to serial console:
2021-03-07 18:41:16,387: launch     TRACE    OUTPUT: Not settable: Success
2021-03-07 18:41:18,389: launch     TRACE    OUTPUT:


2021-03-07 18:41:46,426: launch     TRACE    OUTPUT: 0/RP0/ADMIN0:Mar  7 18:41:43.987 UTC: inst_agent[3127]: %INFRA-INSTAGENT-4-XR_PART_PREP_RPM : SDR/XR additional RPM installation is in progress

2021-03-07 18:46:48,792: launch     TRACE    OUTPUT: 0/RP0/ADMIN0:Mar  7 18:46:46.560 UTC: inst_agent[3127]: %INFRA-INSTAGENT-4-XR_PART_PREP_RESP : SDR/XR partition preparation completed successfully

2021-03-07 18:47:16,827: launch     TRACE    OUTPUT:
!!!!!!!!!!!!!!!!!!!! NO root-system username is configured. Need to configure root-system username. !!!!!!!!!!!!!!!!!!!!Configuration lock is held by another agent.  Please wait. [.
2021-03-07 18:47:20,236: launch     INFO     Creating initial user
2021-03-07 18:47:20,236: vrnetlab   DEBUG    writing to serial console: clab
2021-03-07 18:47:20,236: vrnetlab   TRACE    waiting for 'Enter secret:' on serial console
2021-03-07 18:47:21,100: vrnetlab   TRACE    read from serial console: :
  % Entry must not be null.

  Enter root-system username: clab
  Enter secret:
2021-03-07 18:47:21,100: vrnetlab   DEBUG    writing to serial console: clab@123
2021-03-07 18:47:21,100: vrnetlab   TRACE    waiting for 'Enter secret again:' on serial console
2021-03-07 18:47:21,439: vrnetlab   TRACE    read from serial console:  clab@123

  Enter secret again:
2021-03-07 18:47:21,439: vrnetlab   DEBUG    writing to serial console: clab@123
2021-03-07 18:47:21,440: launch     TRACE    OUTPUT: OK]


         --- Administrative User Dialog ---


  Enter root-system username
2021-03-07 18:47:23,442: launch     TRACE    OUTPUT:
2021-03-07 18:47:31,456: launch     TRACE    OUTPUT:

2021-03-07 18:47:32,458: launch     DEBUG    matched login prompt
2021-03-07 18:47:32,458: launch     DEBUG    trying to log in with clab / clab@123
2021-03-07 18:47:32,458: vrnetlab   DEBUG    writing to serial console: clab
2021-03-07 18:47:32,458: vrnetlab   TRACE    waiting for 'Password:' on serial console
2021-03-07 18:47:35,987: vrnetlab   TRACE    read from serial console:  clab
0/RP0/ADMIN0:Mar  7 18:47:32.454 UTC: vm_manager[3152]: %INFRA-VM_MANAGER-4-INFO : Info: vm_manager started VM default-sdr--2
Password:
2021-03-07 18:47:35,987: vrnetlab   DEBUG    writing to serial console: clab@123
2021-03-07 18:47:35,987: launch     DEBUG    logged in with clab / clab@123
Use the 'configure' command to modify this configuration.
User Access Verification

Username:
2021-03-07 18:47:37,991: launch     TRACE    OUTPUT:



2021-03-07 18:47:38,993: launch     INFO     applying bootstrap configuration
2021-03-07 18:47:38,993: vrnetlab   DEBUG    writing to serial console:
2021-03-07 18:47:38,993: vrnetlab   TRACE    waiting for '#' on serial console
2021-03-07 18:47:39,033: vrnetlab   TRACE    read from serial console:
RP/0/RP0/CPU0:ios#
2021-03-07 18:47:39,033: vrnetlab   DEBUG    writing to serial console: terminal length 0
2021-03-07 18:47:39,033: vrnetlab   TRACE    waiting for '#' on serial console
2021-03-07 18:47:42,694: vrnetlab   TRACE    read from serial console: terminal length 0

@vincentbernat
Copy link

So, I gave it a try. It works for me with 7.2.2 but it redoes the whole installation. The fact that each try is so slow makes it a bit difficult to iterate.

From 1e8b4127a599dd1e43531c907aa721be01aa9a57 Mon Sep 17 00:00:00 2001
From: Vincent Bernat <vincent@bernat.ch>
Date: Mon, 20 Sep 2021 10:12:29 +0200
Subject: [PATCH] WIP

---
 xrv9k/Makefile         |  1 +
 xrv9k/docker/launch.py | 97 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/xrv9k/Makefile b/xrv9k/Makefile
index 40f7df5723c0..f9ad59df6b75 100644
--- a/xrv9k/Makefile
+++ b/xrv9k/Makefile
@@ -11,3 +11,4 @@ VERSION=$(shell echo $(IMAGE) | sed -e 's/.\+[^0-9]\([0-9]\.[0-9]\.[0-9]\(\.[0-9
 
 -include ../makefile-sanity.include
 -include ../makefile.include
+-include ../makefile-install.include
diff --git a/xrv9k/docker/launch.py b/xrv9k/docker/launch.py
index 69d28437aa6f..a8f534951741 100755
--- a/xrv9k/docker/launch.py
+++ b/xrv9k/docker/launch.py
@@ -7,6 +7,7 @@
 import signal
 import sys
 import time
+import subprocess
 
 
 import vrnetlab
@@ -38,7 +39,17 @@ def trace(self, message, *args, **kws):
 
 
 class XRV_vm(vrnetlab.VM):
-    def __init__(self, hostname, username, password, nics, conn_mode, vcpu, ram):
+    def __init__(
+        self,
+        hostname,
+        username,
+        password,
+        nics,
+        conn_mode,
+        vcpu,
+        ram,
+        install_mode=False,
+    ):
         disk_image = ""
         for e in os.listdir("/"):
             if re.search(".qcow2", e):
@@ -47,6 +58,7 @@ def __init__(self, hostname, username, password, nics, conn_mode, vcpu, ram):
         self.hostname = hostname
         self.conn_mode = conn_mode
         self.num_nics = nics
+        self.install_mode = install_mode
         self.qemu_args.extend(
             [
                 "-cpu",
@@ -111,7 +123,7 @@ def gen_mgmt(self):
         return res
 
     def bootstrap_spin(self):
-        """"""
+        """ """
 
         if self.spins > 600:
             # too many spins with no result ->  give up
@@ -162,6 +174,19 @@ def bootstrap_spin(self):
                 self.wait_write(password, wait="Password:")
                 self.logger.debug("logged in with %s / %s" % (username, password))
             if self.xr_ready is True and ridx == 4:
+                if self.install_mode:
+                    # wait for the baking process to complete
+                    self.logger.debug("install completed, shutting down device")
+                    self.logger.info("requesting power-off")
+                    self.wait_write("", None)
+                    self.wait_write("conf t")
+                    self.wait_write("commit replace best-effort")
+                    self.wait_write("yes", "Do you wish to proceed? [no]")
+                    self.wait_write("exit")
+                    self.wait_write("admin hw-module location 0/RP0 shutdown", "#")
+                    self.wait_write("yes", "Shutdown hardware module ? [no,yes]")
+                    self.running = True
+                    return
                 # run main config!
                 if not self.bootstrap_config():
                     # main config failed :/
@@ -220,10 +245,6 @@ def bootstrap_config(self):
         if not self._wait_config("show interfaces description", "Gi0/0/0/0"):
             return False
 
-        # wait for call-home in config
-        if not self._wait_config("show running-config call-home", "service active"):
-            return False
-
         self.wait_write("configure")
         self.wait_write(f"hostname {self.hostname}")
         # configure netconf
@@ -269,9 +290,55 @@ def _wait_config(self, show_cmd, expect):
 
 
 class XRV(vrnetlab.VR):
-    def __init__(self, hostname, username, password, nics, conn_mode, vcpu, ram):
+    def __init__(self, hostname, username, password, *remaining):
+        super(XRV, self).__init__(username, password)
+        self.vms = [XRV_vm(hostname, username, password, *remaining)]
+
+
+class XRV_installer(XRV):
+    """XRV9K installer"""
+
+    def __init__(self, username, password, *remaining):
         super(XRV, self).__init__(username, password)
-        self.vms = [XRV_vm(hostname, username, password, nics, conn_mode, vcpu, ram)]
+        self.vms = [XRV_vm(username, password, *remaining, install_mode=True)]
+
+    def install(self):
+        self.logger.info("Installing XRV9K")
+        xrv = self.vms[0]
+        while not xrv.running:
+            xrv.work()
+
+        # wait for system to shut down cleanly
+        for i in range(0, 600):
+            time.sleep(1)
+            try:
+                xrv.p.communicate(timeout=1)
+            except subprocess.TimeoutExpired:
+                pass
+            except Exception as exc:
+                # assume it's dead
+                self.logger.info(
+                    "Can't communicate with qemu process, assuming VM has shut down properly."
+                    + str(exc)
+                )
+                break
+
+            try:
+                (ridx, match, res) = xrv.tn.expect([b"System halted"], 1)
+                if res != b"":
+                    self.logger.trace("OUTPUT XRV9K: %s" % res.decode())
+                else:
+                    break
+            except Exception as exc:
+                # assume it's dead
+                self.logger.info(
+                    "Can't communicate over serial console, assuming VM has shut down properly."
+                    + str(exc)
+                )
+                break
+
+        xrv.stop()
+        self.logger.info("Installation complete")
 
 
 if __name__ == "__main__":
@@ -284,12 +351,13 @@ def __init__(self, hostname, username, password, nics, conn_mode, vcpu, ram):
     parser.add_argument("--hostname", default="vr-xrv9k", help="Router hostname")
     parser.add_argument("--username", default="vrnetlab", help="Username")
     parser.add_argument("--password", default="VR-netlab9", help="Password")
-    parser.add_argument("--nics", type=int, default=128, help="Number of NICS")
+    parser.add_argument("--install", action="store_true", help="Install XRV9K")
+    parser.add_argument("--nics", type=int, default=64, help="Number of NICS")
     parser.add_argument(
         "--vcpu", type=int, default=2, help="Number of cpu cores to use"
     )
     parser.add_argument(
-        "--ram", type=int, default=12228, help="Number RAM to use in MB"
+        "--ram", type=int, default=14336, help="Number RAM to use in MB"
     )
     parser.add_argument(
         "--connection-mode",
@@ -309,7 +377,7 @@ def __init__(self, hostname, username, password, nics, conn_mode, vcpu, ram):
     logger.debug(f"Environment variables: {os.environ}")
     vrnetlab.boot_delay()
 
-    vr = XRV(
+    xrv_args = (
         args.hostname,
         args.username,
         args.password,
@@ -318,4 +386,9 @@ def __init__(self, hostname, username, password, nics, conn_mode, vcpu, ram):
         args.vcpu,
         args.ram,
     )
-    vr.start()
+    if args.install:
+        vr = XRV_installer(*xrv_args)
+        vr.install()
+    else:
+        vr = XRV(*xrv_args)
+        vr.start()
-- 
2.33.0

@hellt
Copy link

hellt commented Sep 20, 2021

@vincentbernat what did you change from the base patch?

@vincentbernat
Copy link

Mostly, I did a factory reset after initial install. Also, I did not alter the way initial user is created and logged in.

@plajjan
Copy link
Collaborator Author

plajjan commented Sep 22, 2021

I can indeed confirm that giving XRV more NICs makes bootup slower. We did some testing of that, probing through number of nics from like 1 to 128 or so in steps of 10 and it's pretty linear. This was not with a special install mode, but just vanilla image as produced by vrnetlab today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants