Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grains missing/lost when keep_alive is False and connection is closed. #69

Open
TheBirdsNest opened this issue Jul 20, 2021 · 3 comments

Comments

@TheBirdsNest
Copy link

TheBirdsNest commented Jul 20, 2021

Opening this as I believe it to be a bug/undesirable state.

When refresh grains or any sync activity on the proxy (saltutil.sync_all) grains populated by Napalm are reset to 'None' when the connection to the device has closed. This occurs when the proxy setting keep_alive:False is applied.

Instead, what should happen is if a grains refresh is requested, the connection should be restarted and grains re-collected.

proxy.sls:

proxy:
  global_delay_factor: 5
  proxytype: napalm
  driver: ios
  always_alive: False

Workflow:

/run # salt -I 'realm:WSC' grains.get model
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    model:
        C891-24X/K9

/run # salt 'c40988df-1a86-4ff0-bf8d-e018cc6c55bd' saltutil.refresh_grains
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    True

/run # salt 'c40988df-1a86-4ff0-bf8d-e018cc6c55bd' grains.get model
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    None

Proxy log:

[ERROR   ] Cannot execute "cli" on <redacted> as Salt. Reason: Socket is closed!
[ERROR   ] Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/napalm/ios/ios.py", line 205, in _send_command
    output = self.device.send_command(command)
  File "/usr/lib/python3.9/site-packages/netmiko/utilities.py", line 500, in wrapper_decorator
    return func(self, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/netmiko/base_connection.py", line 1475, in send_command
    prompt = self.find_prompt(delay_factor=delay_factor)
  File "/usr/lib/python3.9/site-packages/netmiko/base_connection.py", line 1179, in find_prompt
    self.write_channel(self.RETURN)
  File "/usr/lib/python3.9/site-packages/netmiko/base_connection.py", line 459, in write_channel
    self._write_channel(out_data)
  File "/usr/lib/python3.9/site-packages/netmiko/base_connection.py", line 417, in _write_channel
    self.remote_conn.sendall(write_bytes(out_data, encoding=self.encoding))
  File "/usr/lib/python3.9/site-packages/paramiko/channel.py", line 846, in sendall
    sent = self.send(s)
  File "/usr/lib/python3.9/site-packages/paramiko/channel.py", line 801, in send
    return self._send(s, m)
  File "/usr/lib/python3.9/site-packages/paramiko/channel.py", line 1198, in _send
    raise socket.error("Socket is closed")
OSError: Socket is closed

Please note I applied the fix for the bug noted here: saltstack/salt#60025

Versions Report:

/run # salt-proxy --versions-report
Salt Version:
          Salt: 3003

Dependency Versions:
          cffi: 1.14.5
      cherrypy: 18.6.1
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.0.1
       libgit2: 1.1.0
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: 3.10.1
  pycryptodome: 3.10.1
        pygit2: 1.4.0
        Python: 3.9.5 (default, May 12 2021, 20:44:22)
  python-gnupg: Not Installed
        PyYAML: 5.4.1
         PyZMQ: 19.0.2
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: alpine 3.14.0
        locale: utf-8
       machine: x86_64
       release: 5.4.72-microsoft-standard-WSL2
        system: Linux
       version: Alpine Linux 3.14.0

Note: This is a lab setup but the same is seen in our production setup with the same version but running on CentOS.

Many thanks!

@TheBirdsNest
Copy link
Author

TheBirdsNest commented Jul 20, 2021

P.S - the reason for us enabling keep_alive:False is due to a Cisco CVE where the implementation of SSH Global Requests is causing a FSM exception and crashing devices.

@TheBirdsNest
Copy link
Author

I believe I have found the issue in salt.utilts.napalm package..

I can do a PR to the SaltStack repo if thats the right place for it?

>>> import napalm.base as napalm_base
>>> from napalm_base.exceptions import ConnectionClosedException
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'napalm_base'
>>>
>>>
>>>
>>> from napalm.base.exceptions import ConnectionClosedException

It appears naplam_base is never loaded (correctly?) so the HAS_CONN_CLOSED_EXC_CLASS which causes this reconnect clause to be ignored in the call() function:

elif retry and HAS_CONN_CLOSED_EXC_CLASS and isinstance(error, ConnectionClosedException):

Tested my manually importing:

from napalm.base.exceptions import ConnectionClosedException and its working well now:

/run # salt -I 'realm:WSC' grains.get model
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    C891-24X/K9


-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 1
# of minions returned: 1
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------
/run # salt -I 'realm:WSC' saltutil.refresh_grains
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    True


-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 1
# of minions returned: 1
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------
/run # salt -I 'realm:WSC' grains.get model
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    C891-24X/K9


-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 1
# of minions returned: 1
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------
/run #

@TheBirdsNest
Copy link
Author

I created a related bug in the SaltStack project as I guess that is the right place for it: saltstack/salt#60581
Feel free to close this one if needed otherwise I will on its resolution in the SaltStack repo.

I'm aiming on completing a PR for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant