massive Peer rework, based on FernyTransport #19668

allisonkarlitskaya · 2023-11-29T18:06:03Z

revert PR "test: workaround an old (and very common) flake" #19835; this PR fixes it properly (hopefully)

src/cockpit/peer.py

src/cockpit/remote.py

src/cockpit/beiboot.py

allisonkarlitskaya · 2023-12-06T13:23:14Z

The permission denied error in the superuser case is annoying: that's the sound of the kill() syscall failing to reap the bridge running as root...

That's python/cpython#112800 but we're going to need a workaround...
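
One possible shape for such a workaround, as a rough sketch only: treat a refused or stale kill() as "nothing more to do" instead of letting the exception escape. `kill_quietly` is a made-up helper name, not something in cockpit or asyncio, and it says nothing about where the real workaround would need to hook in.

```python
import os
import signal


def kill_quietly(pid: int, sig: int = signal.SIGKILL) -> bool:
    """Send `sig` to `pid`; return False instead of raising if we can't.

    Hypothetical helper: PermissionError is what kill() reports for a
    process owned by another user (e.g. a bridge running as root), and
    ProcessLookupError means the process is already gone.
    """
    try:
        os.kill(pid, sig)
    except (PermissionError, ProcessLookupError):
        return False
    return True
```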

src/cockpit/beiboot.py

allisonkarlitskaya · 2023-12-08T16:53:40Z

Looking at TestReverseProxy.testNginxNoTLS. It starts ws with --local-session=cockpit-bridge, but that bridge crashes right away:

Looking at the polkit code, and thinking about the error message, it's worth noting:

            self.subject = ('unix-session', {'session-id': Variant(os.environ['XDG_SESSION_ID'], 's')})

So clearly it's the not-a-real-session spawning of the bridge that's the issue here, which means that this problem is specific to this testcase.

That being said, there's nothing wrong with robustifying the code a bit there. There may be other reasons that polkit isn't going to want us to become an agent...
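
A minimal sketch of what "robustifying" could mean here, assuming the trouble starts with XDG_SESSION_ID being absent or empty: look the variable up without raising and let the caller skip agent registration when there is no session. `session_id_from_env` is a hypothetical helper, not existing polkit code.

```python
import os
from typing import Mapping, Optional


def session_id_from_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return XDG_SESSION_ID, or None when there is no real session.

    Hypothetical helper: a caller would only build the 'unix-session'
    subject (and try to register as an agent) when this is not None.
    """
    # .get() avoids a KeyError; an empty value is treated like a missing one.
    return environ.get('XDG_SESSION_ID') or None
```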

allisonkarlitskaya · 2023-12-11T16:46:22Z

Okay, so most stuff is getting under control by now.

A large part of the remaining problem is that the NumberOfPasswordPrompts SSH option doesn't do what its name suggests: it also controls the number of times that passphrases are prompted for, and setting it to zero effectively disables the use of locked keys.

I could add yet another option for dealing with that, but I start to wonder if maybe it's time to draw a clearer line in the sand between "interactive" and "non-interactive" uses of cockpit-beiboot.

The non-interactive case would immediately prompt for the password on startup using the * authorize message and never send authorize messages again after that. Any other place where an askpass prompt arrived would effectively be fatal. The interactive mode wouldn't prompt on startup, but would behave more or less the same as Cockpit Client expects.
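
Roughly, the split could be modelled like this. It is a purely hypothetical sketch (neither `AskpassPolicy` nor its methods exist in cockpit.beiboot), only meant to pin down the two behaviours described above:

```python
class AskpassPolicy:
    """Hypothetical model of the interactive/non-interactive split."""

    def __init__(self, interactive: bool) -> None:
        self.interactive = interactive

    def prompt_on_startup(self) -> bool:
        # Non-interactive: ask for the password once, up front, via the
        # '*' authorize message. Interactive: no up-front prompt.
        return not self.interactive

    def handle_askpass(self, prompt: str) -> str:
        # Interactive: every askpass prompt is forwarded to the client.
        if self.interactive:
            return 'forward'
        # Non-interactive: any prompt after startup is fatal.
        raise RuntimeError(f'unexpected askpass prompt: {prompt!r}')
```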

I'll roll this over in my head a bit. Pending a reasonable solution there, and a resolution of how to run cockpit.beiboot from inside the beipack, this is starting to look promising, though...

src/cockpit/beiboot.py

 from pathlib import Path
-from typing import Dict, Iterable, Optional, Sequence
+from typing import Iterable, NamedTuple, Sequence


src/cockpit/peer.py

src/cockpit/beiboot.py

-        message['superuser'] = False
-        self.ssh_peer.write_control(message)
+    # Step 2: when the client replies, create the peer
+    def do_authorize(self, message: JsonObject) -> None:


test/pytest/test_beiboot.py

 import sys
+import typing


test/pytest/test_peer.py

martinpitt · 2024-03-22T11:55:30Z

testBeibootWithBridge is rather shallow: There's just some impedance mismatch with the recent commit 22f38b3. Either we keep unknown-host (as on current main), and that fixes it:

--- src/cockpit/beiboot.py
+++ src/cockpit/beiboot.py
@@ -252,7 +252,7 @@ class ForwarderPeer(Peer):
 
     def do_exception(self, exc: Exception) -> None:
         if isinstance(exc, (OSError, socket.gaierror)):
-            raise CockpitProblem('no-host', error='no-host', message=str(exc)) from exc
+            raise CockpitProblem('unknown-host', error='unknown-host', message=str(exc)) from exc
 
         elif isinstance(exc, ferny.SshHostKeyError):
             hostkey_info = self.authorize_handler.hostkey_info or {}

Or we need to change the tests to expect no-host instead:

--- test/verify/check-client
+++ test/verify/check-client
@@ -206,7 +206,7 @@ Command = /usr/bin/env python3 -m cockpit.beiboot --interactive
         # unreachable host
         b.set_val("#server-field", "unknownhost")
         b.click("#login-button")
-        b.wait_in_text("#login-error-message", "Host is unknown")
+        b.wait_in_text("#login-error-message", "Unable to connect to that address")
         b.wait_val("#server-field", "unknownhost")
         # does not appear in recent hosts
         b.wait_in_text("#recent-hosts-list", "10.111.113.2")
@@ -215,7 +215,7 @@ Command = /usr/bin/env python3 -m cockpit.beiboot --interactive
         # wrong port
         b.set_val("#server-field", "10.111.113.2:222")
         b.click("#login-button")
-        b.wait_in_text("#login-error-message", "Host is unknown")
+        b.wait_in_text("#login-error-message", "Unable to connect to that address")
 
         # unencrypted SSH key login
         self.m_client.execute("runuser -u admin -- ssh-keygen -t rsa -N '' -f ~admin/.ssh/id_rsa")

I actually prefer the latter -- I reviewed the two errors, and "unknown-host" feels more like "SSH does not know this" rather than "network/DNS failure".
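
To make the second option concrete, here is a rough sketch of the mapping with the no-host code kept. `problem_code` is a made-up helper, not the actual do_exception() method, and the ferny-specific branches are left out.

```python
import socket
from typing import Optional


def problem_code(exc: BaseException) -> Optional[str]:
    """Map a connection-time exception to a problem code (illustrative only)."""
    # socket.gaierror is a subclass of OSError, so a single isinstance()
    # check covers DNS failures as well as refused/unreachable connections.
    if isinstance(exc, OSError):
        return 'no-host'
    return None
```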

martinpitt · 2024-03-22T11:59:28Z

TestHostSwitching.testBasic is just an unexpected message:

/manifests.js: external channel failed: internal-error

It's a flake, too, and doesn't reproduce locally. However, the test already has:

        self.allow_journal_messages(".*: failure while serving external channel: internal-error")

but that string doesn't actually exist any more -- the bridge has done this for a long time already:

src/ws/cockpitchannelresponse.c:          g_message ("%s: external channel failed: %s", self->logname, problem);

So the test just needs to adjust the expected error message (that should be a separate commit, as it would be a flake on main as well). I'm also happy to send that as a separate PR (but it by itself feels too small to run the whole CI machinery on..)

martinpitt · 2024-03-22T12:09:13Z

testBeibootNoBridge is harder: The "switch to admin access" dialog never goes away, but it actually got quite far: on the "target" machine I see

\_ sshd: admin [priv]
|   \_ sshd: admin@notty
|       \_ python3 -ic # cockpit-bridge
|           \_ sh -ec echo SSH_AUTH_SOCK=$SSH_AUTH_SOCK; read a
|           |   \_ ssh-agent sh -ec echo SSH_AUTH_SOCK=$SSH_AUTH_SOCK; read a
|           \_ sudo -k -A python3 -ic # cockpit-bridge --privileged
|               \_ python3 -ic # cockpit-bridge --privileged

i.e. the remote priv bridge did start. Before I debug that further, I'll coordinate with Lis.

test/verify/check-superuser TestSuperuserDashboard is hopefully/likely the same root cause: no actual superuser session on a remote machine after logging in.

And finally, TestSuperuserDashboardOldMachine.test is supposedly the hardest: CentOS 7 did not yet have any concept of a "remote privilege indicator". It tries to run pkexec cockpit-bridge --privileged, but that hangs forever, and that's presumably the reason why the "reboot" root channel (and thus that dialog) blocks. So let's sort out the other two above first.

martinpitt · 2024-03-22T14:23:46Z

Oh, and there's of course the too old Python crash for RHEL 9.4's py 3.9:

   File "/usr/lib/python3.9/site-packages/cockpit/peer.py", line 144, in ConfiguredPeer
    helper: BridgeBeibootHelper | None = None
  TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
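
For reference, a spelling of that annotation that also evaluates on Python 3.9 is `Optional[...]`; the `X | None` form between plain classes only works at runtime from Python 3.10 on (`from __future__ import annotations` would be another way to avoid evaluating it). A minimal sketch, with an empty stand-in for the real BridgeBeibootHelper:

```python
from typing import Optional


class BridgeBeibootHelper:
    """Empty stand-in; only the name is taken from the traceback above."""


class ConfiguredPeer:
    # Optional[X] is evaluated without using the | operator on classes,
    # so this class body also works on Python 3.9.
    helper: Optional[BridgeBeibootHelper] = None
```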

allisonkarlitskaya · 2024-03-25T11:04:21Z

> testBeibootNoBridge is harder: The "switch to admin access" dialog never goes away, but it actually got quite far: on the "target" machine I see

This is also the failing unit test...

allisonkarlitskaya · 2024-03-25T12:24:54Z

> > testBeibootNoBridge is harder: The "switch to admin access" dialog never goes away, but it actually got quite far: on the "target" machine I see
>
> This is also the failing unit test...

Okay. Fixed that, and it's getting a lot farther, but now it's failing for a new reason: the processes aren't getting cleaned up properly.

martinpitt · 2024-03-26T05:59:37Z

test/verify/check-shell-host-switching

-        self.allow_journal_messages(".*: failure while serving external channel: internal-error")
+        self.allow_journal_messages(".*: external channel failed: internal-error")


Oops, it seems I was wrong -- The message still does exist:

src/ws/cockpitchannelresponse.c: g_debug ("%s: failure while serving external channel: %s", self->logname, problem);

So instead I suggest

self.allow_journal_messages(".*external channel.*: internal-error")
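
A quick check that this pattern covers both spellings, as a sketch: it assumes the allowed patterns are matched against the whole journal line (re.fullmatch here; the same strings also pass with re.match), and the second sample line reuses the /manifests.js logname purely for illustration.

```python
import re

PATTERN = ".*external channel.*: internal-error"

SAMPLES = [
    # g_message() variant seen in the failing test
    "/manifests.js: external channel failed: internal-error",
    # g_debug() variant (logname assumed for illustration)
    "/manifests.js: failure while serving external channel: internal-error",
]


def is_allowed(message: str, pattern: str = PATTERN) -> bool:
    """Return True if the journal line is covered by the pattern."""
    return re.fullmatch(pattern, message) is not None
```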

martinpitt · 2024-03-26T06:04:01Z

test/verify/check-reauthorize

-        # TODO: this should only be 'access-denied'
-        legit_results = [f'result: {err}' for err in ('access-denied', 'authentication-failed', 'terminated')]
-        self.assertIn(b.text(".super-channel span"), legit_results)


Seems it's back 😢 so perhaps drop that commit for the time being?

I was kinda hoping to slay this one, to be honest...

martinpitt · 2024-03-26T06:05:58Z

> failing for a new reason: the processes aren't getting cleaned up properly.

I'm not even sure if that's a new problem. We've had failures like this for ages, especially on Arch. It's also "just" the second affected retry on /devel, but everywhere else it failed on the first run, so at least it made this bug much easier to reproduce. It would of course be super fun if your old fixup proposal would help here at all 😉

martinpitt · 2024-03-26T06:36:09Z

test/pytest/test_beiboot.py

+    if not os.access('/dev/kvm', os.W_OK):
+        pytest.skip('kvm unavailable')


ATM, pytest only runs on github workflows, so this test doesn't run anywhere.

So AFAICS we have two choices here:

Treat this as a unit test. Then this shouldn't involve VMs (or even containers) and network access. I get why we don't want to test against src/ssh/mock-sshd.c, as we are trying to get rid of that code and the libssh dependency. But ferny tests against python-asyncssh, and a mock SSH server looks quite reasonable?

Treat it as an integration test. Then this shouldn't reimplement all the bots/machine/hostkey/privkey handling here, but just use the bots API. You can drive this from Python in just a few lines of code if you insist on doing this with pytest fixtures (but honestly our standard MachineCase seems just fine for that, and it should be possible to wrap that into a fixture). The integration tests already assume that they run from the git checkout (e.g. as they need test/verify/files), so importing the beiboot module and running it on the host seems fine to me -- the integration tests don't have to involve a browser. E.g. most of test/verify/check-connection doesn't.

It makes sense to split the test. I.e. keep the "unit-y" bits here, like test_parse_destination() or def test_conds() (with some adjustments), and move the integration-y ones to test/verify/check-client or a new test/verify/check-beiboot.

This is sort of what I was hoping you'd look into...

This is definitely some kind of a "third type" of test. fwiw, I've been running them locally and they're helping me tonnes (and they all currently pass).

Summary from our discussion:

- we keep the general shape, but move it into test/pytest-integration/
- drop that bots fixture and just import testvm statically; integration tests just assume bots/
- add a pytest -k mumble mumble select test/pytest-integration/* to test/run, so that it runs on our CI independently of run-tests; it doesn't parallelize, but it's also fast enough (preferably the /networking scenario, it's the fastest)

But also: We don't want to/have to block this PR on that. We can live with not running these for a while. The more useful work would be to add a proper beiboot scenario on an unprepared image. The covered functionality is mostly just a faster/smaller version of what the integration tests already do.

github-advanced-security bot found potential problems Nov 29, 2023

allisonkarlitskaya force-pushed the ferny-transport branch from bf525f8 to 3aec639 Compare November 30, 2023 13:48

martinpitt mentioned this pull request Dec 1, 2023

various cleanups for typing stuff #19270

Merged

allisonkarlitskaya force-pushed the ferny-transport branch 2 times, most recently from 2965bb2 to dc9cd0e Compare December 5, 2023 11:07

allisonkarlitskaya force-pushed the ferny-transport branch 3 times, most recently from 8efcb70 to 5eae275 Compare December 6, 2023 12:52

github-advanced-security bot found potential problems Dec 6, 2023

allisonkarlitskaya force-pushed the ferny-transport branch from 5eae275 to 5d20586 Compare December 6, 2023 15:22

allisonkarlitskaya force-pushed the ferny-transport branch 2 times, most recently from 09a5c4d to dc4ed95 Compare December 8, 2023 07:59

github-advanced-security bot found potential problems Dec 8, 2023

allisonkarlitskaya force-pushed the ferny-transport branch from dc4ed95 to 6ab101e Compare December 8, 2023 08:30

allisonkarlitskaya force-pushed the ferny-transport branch 4 times, most recently from 6698519 to 3d568bf Compare December 11, 2023 14:53

martinpitt mentioned this pull request Dec 12, 2023

amplify testSudo flake #19328

Closed

allisonkarlitskaya force-pushed the ferny-transport branch 2 times, most recently from 075c65b to e8d01ca Compare December 15, 2023 08:30

github-advanced-security bot found potential problems Dec 15, 2023

allisonkarlitskaya force-pushed the ferny-transport branch from e8d01ca to 65b2f8c Compare December 15, 2023 11:16

github-advanced-security bot found potential problems Dec 15, 2023

allisonkarlitskaya force-pushed the ferny-transport branch 3 times, most recently from 7414e8f to 779a474 Compare December 15, 2023 14:12

github-advanced-security bot found potential problems Dec 15, 2023

test/pytest/test_beiboot.py

import sys

import typing

Code scanning / CodeQL notice: Module 'typing' is imported with both 'import' and 'import from'.

allisonkarlitskaya force-pushed the ferny-transport branch from 779a474 to 35ed01e Compare December 15, 2023 18:56

martinpitt mentioned this pull request Jan 11, 2024

test: workaround an old (and very common) flake #19835

Merged

allisonkarlitskaya force-pushed the ferny-transport branch from 35ed01e to ebab696 Compare February 15, 2024 10:54

github-advanced-security bot found potential problems Feb 15, 2024

martinpitt mentioned this pull request Mar 22, 2024

pybridge: Add initial authorize request to cockpit-beiboot, and handle host key prompts #19401

Open

3 tasks

allisonkarlitskaya force-pushed the ferny-transport branch from ebab696 to 01ca58d Compare March 22, 2024 10:01

allisonkarlitskaya added 2 commits March 25, 2024 11:36

test: adjust a journal message match

217fe08

This message doesn't exist anymore — update to the new string written by cockpit-ws. Thanks Martin :)

Revert "test: workaround an old (and very common) flake"

5183714

This reverts commit a0f6bfe.

allisonkarlitskaya force-pushed the ferny-transport branch 2 times, most recently from b29c305 to 35fc1d4 Compare March 25, 2024 10:47

allisonkarlitskaya force-pushed the ferny-transport branch from 35fc1d4 to ba48757 Compare March 25, 2024 11:49

martinpitt reviewed Mar 26, 2024

ferny transport

fccc304

allisonkarlitskaya force-pushed the ferny-transport branch from ba48757 to fccc304 Compare April 3, 2024 11:55
