[Feature] Resurrection: saving and restoring terminal layout/contents/commands #5013

crides · 2024-02-15T07:34:06Z

Drafting as this is WIP, but want to seek feedback/get discussion going on this.

Currently:

content/scrollback saving and restoring works. This is done by having a reverse attribute to CSI converter, so that the saved contents are in "plain text" (other than attributes; just like tmux)
basic layout saving/restoring works. The layout is saved to a slightly minimized tree in JSON form.
basic shell/cwd/program works. shell, cwd and foreground* program just gets obtained thru existing APIs, and shell and cwd are restored thru pane spawning/splitting; program is just converted back to a shell string and send to the pane. Note: foreground is technically not the right term, as we grab the newest of the first level children of the shell and spawn it back.
the whole state is wrapped in a .tar.zstd for compression. each pane's content gets its own file (just like tmux)

the save/restore functionalities are exported as lua APIs so users can do whatever they want
locally I'm using this for testing (manually saving/restoring):

      { mods = 'CTRL|SHIFT', key = 'S', action = wezterm.action_callback(function(_, _, _)
          wezterm.mux.save_state_to("state_test")
      end) },
      { mods = 'CTRL|SHIFT', key = 'R', action = wezterm.action_callback(function(_, _, _)
          wezterm.mux.restore_state_from("state_test")
      end) },

there's basically 0 error handling. that definitely needs to be improved

This has not been thoroughly tested, and I'll integrate it into my workflow and see how it goes

crides · 2024-02-15T07:59:38Z

possibly related:

I remember there should be one more related issue (more about content than layout) but I can't find it (though it's pretty short/less important)

wez

Thanks for tackling this! There are a couple of existing things that I think you can build on top of that will make this simpler (see inline comments on the code), but there are also a couple of things that I think might make you frown a little bit, because there's more to reconcile before we can merge in this functionality.

From a security and privacy perspective: storing the terminal output on disk is a hazard, because it may contain private/privileged/secret information. At the very least we need to ensure that the files that we write for that are private. In theory, the UmaskSaver that we create at process start time should mask off the group+other bits from the files that we save so you may not strictly need to take any extra steps around that most basic safeguard.

However: I know that there was a CVE reported against VTE when it stored scrollback unencrypted in plain text as part of its infinite scrollback handling, and there are parallels between that and doing it here. I'd feel more comfortable if we had a reasonable solution around this, although the logistics are a bit different: for scrollback that is discarded when the process exits, the key material can also be discarded. But when we explicitly want to load things again later, then we need a persistent key that we can use to decrypt it later so there is additional configuration and infrastructure required to deal with that.

There are some headaches here. If we don't have a solution for this then we need to ensure that the save/restore functionality is opt-in and that is it very clear to the user that it is not encrypted and that they need to satisfy themselves that it is suitable for their environment.

In terms of deciding the program and re-launching it, there are a couple of things to consider; here are a couple of issues that may not seem immediately relevant:

why are they relevant here?

The command that we spawned into the pane is not necessarily the foreground or youngest process currently running into the pane. I'm not sure that it is wise to grab the foreground/youngest process TBH.
Not all panes are local panes with processes (eg: could be a remote ssh session, or a serial port) so querying the process list will never work for those
A pane may be a "local" pane but may be in a different domain (such as an ExecDomain or WslDomain) that has a slightly different way to get launched
The command that was spawned could be dangerous! Consider someone launching rm -rf into a pane. Zellij's session resumption prompts the user to confirm that it is safe to re-launch. We could potentially shortcut an unconditional prompt with some basic heuristics. For example, if the process passes the mux-is-stateful checks (which include looking at https://wezfurlong.org/wezterm/config/lua/config/skip_close_confirmation_for_processes_named.html) then we could assume that a basic shell invocation need not be prompted about

What I had in mind around this was a very rough idea that we might save the CommandBuilder that was used to spawn a given pane so that we could retrieve and reason about it later via something like Pane:get_spawning_command(&self) -> Option<CommandBuilder>

Then there could be a general mechanism to handle the "pane isn't running, show a prompt to see if it should be (re)started" state.

There's complications around this:

The existing logic that cares about the process having exited and showing the hold message is only present for local panes. There's a decent chunk of code to reconcile the exit status of the process and the configured exit behavior and it is all specific to a local process.
There is currently no way to represent a not-yet-spawned process/pane, and I don't think we should intoduce that concept to the Pane interface itself because it would make it more awkward to reason about the state of a Pane.

Instead, I think we can draw from the technique used in mux/src/ssh.rs and/or things like the copy and quickselect overlays to shim in a wrapping Pane instance.

So, when mapping the saved version of a pane in, a separate ResumedPane would be built instead. We'd pass in the original CommandBuilder and the saved scrollback. The pane would render those, then it would handle the safe-to-restart logic and show a prompt if required. When it is safe/confirmed to spawn the actual command it would then do so in the appropriate domain and switch its mode of operation to either delegate to that newly spawned pane, or to replace itself in its containing tab with that newly spawned pane.

I'm also open to other ideas on how to deal with this. I do think it is important to handle the various different kinds of panes holistically, or otherwise find a composable/reusable way to deal with the (re)start behavior.

mux/src/lib.rs

wez · 2024-02-15T13:48:43Z

mux/src/lib.rs

@@ -1372,6 +1378,112 @@ impl Mux {

        Ok((tab, pane, window_id))
    }
+


I think pretty much all of this should be moved to a new mux/src/statesaver.rs file. I don't love how generic statesaver sounds. If you can propose a nice and descriptive without being too unwieldy name that encompasses something like session layout state, I'm all ears!

wez · 2024-02-15T13:51:19Z

mux/src/lib.rs

@@ -1372,6 +1378,112 @@ impl Mux {

        Ok((tab, pane, window_id))
    }
+
+    pub fn save(&self) -> SavedState {


save is a bit too generic a name, so I would propose something more descriptive and in alignment with whatever naming we decide for the state saver file in the comment above

wez · 2024-02-15T13:57:17Z

mux/src/lib.rs

+        }
+    }
+
+    pub fn save_state_to<P: AsRef<Path>>(&self, name: P) {


I have some general stylistic commentary here which I think is mostly addressed by handling errors; the general rules are:

Make this return anyhow::Result<()>

No .unwrap(). Prefer instead to ? the error and let it bubble up

Less nesting; prefer to extract indented code that needs to handle errors out into its own function so that it can also use ?

When using ?, if the error doesn't include sufficient context of its own, prefer to use .context("explain it")? or .with_context(||format!("explain it with {parameters}"))?. For example, the standard library filesystem functions don't print the filename that they are operating upon, so you always want to use with_context to include the file name.

If ? is somehow not appropriate/possible, use .expect("reason why this will never be a runtime error that we can handle") instead of .unwrap(). That way the reasoning is embedded directly in the code, and we can use the presence of .unwrap() as a bad code smell without having to re-read and grok every call site.

mux/src/pane.rs

termwiz/src/color.rs

termwiz/src/surface/line/clusterline.rs

termwiz/src/surface/line/line.rs

termwiz/src/surface/line/storage.rs

wez · 2024-02-15T14:46:31Z

mux/src/lib.rs

+pub struct SavedPaneState {
+    id: usize,
+    cwd: Option<SerdeUrl>,
+    root_argv: Vec<String>,


the domain and other important info are missing from this, but I think the approach here needs to be different; I'll cover this more in the main overall review comment

crides · 2024-02-16T04:39:37Z

Thanks for the very detailed reply!

but there are also a couple of things that I think might make you frown a little bit, because there's more to reconcile before we can merge in this functionality.

No problem. I would like to do things in a correct way for other users, but first I have some things I would like clarifying in your comment.

In theory, the UmaskSaver that we create at process start time should mask off the group+other bits from the files that we save so you may not strictly need to take any extra steps around that most basic safeguard.

Yep, the state saves as 0600 right now

But when we explicitly want to load things again later, then we need a persistent key that we can use to decrypt it later so there is additional configuration and infrastructure required to deal with that.

Encryption would be nice, but it'd also be nice if the user can easily dump the contents without having wezterm load it first. I'm not sure of a good way to do both yet.

If we don't have a solution for this then we need to ensure that the save/restore functionality is opt-in and that is it very clear to the user that it is not encrypted and that they need to satisfy themselves that it is suitable for their environment.

I'm not sure about the security parts, but right now the feature is totally opt in. Nothing extra will be done unless the user triggers save_state_to/restore_state_from in their config. Given we use lua for configs, I think exposing the saving/restoring API + some timer/cron utilities would be easy and enough to achieve what tmux-resurrect has, instead of wrapping a bunch of stuff behind config options

As for spawning commands, I don't quite understand the problems yet:

The command that we spawned into the pane is not necessarily the foreground or youngest process currently running into the pane.

No. But the foreground command running on top of the shell should be. Also, we can't grab the second or deeper children of the shell, because that'll break the spawning relationships

Not all panes are local panes with processes (eg: could be a remote ssh session, or a serial port) so querying the process list will never work for those

I'll need to look into those, but in those cases I think we shouldn't restore recursively all the layers? As in, we should just attempt to restore the pane history and connection, but not care about what's running inside of those panes.

A pane may be a "local" pane but may be in a different domain (such as an ExecDomain or WslDomain) that has a slightly different way to get launched

I'll need to look into those

The command that was spawned could be dangerous! Consider someone launching rm -rf into a pane. Zellij's session resumption prompts the user to confirm that it is safe to re-launch. We could potentially shortcut an unconditional prompt with some basic heuristics.

tmux-resurrect handles this by only allowing default and config whitelisted processes to be restored. I think this is better than just ask for confirmation*. Maybe we can use a combination of whitelisting, and asking for confirmation for non-whitelisted things.

*: because, if I have 50 panes with editors and repls inside some of them, then I'd need to go to every pane to restore them

wez · 2024-02-17T01:27:28Z

Not all panes are local panes with processes (eg: could be a remote ssh session, or a serial port) so querying the process list will never work for those

I'll need to look into those, but in those cases I think we shouldn't restore recursively all the layers? As in, we should just attempt to restore the pane history and connection, but not care about what's running inside of those panes.

If I have a 3-pane layout, where I have a regular local shell on the left, a shell in a docker container (implemented as an ExecDomain that essentially ran docker exec ...) in the middle, and an ssh session (implemented as an SshDomain that used the internal ssh client to connect to a remote host) in the right pane, there's no reason that the save/restore shouldn't be able to handle those three things in the same.

For the local pane case: what was run by the user was the shell, and I think the principle of least surprise is to resume their shell (and show their scrollback up to that point). I think running whatever the most recent command was in the shell (assuming that we figure it out correctly) might be interesting, but would be confusing when that program terminates and it doesn't leave them in their shell
For the docker container case, I don't think we can guarantee to see inside the container to understand what it was running purely from process tree introspection, and again, I think just dropping back into the command that was used to spawn the pane is the least surprising and probably most anticipated outcome
For the ssh case we definitely cannot see into what it was running, but it is similar to the other two cases where I think the reasonable expectation is to restore the prior scrollback and drop you into a freshly logged in shell there

The command that was spawned could be dangerous! Consider someone launching rm -rf into a pane. Zellij's session resumption prompts the user to confirm that it is safe to re-launch. We could potentially shortcut an unconditional prompt with some basic heuristics.

tmux-resurrect handles this by only allowing default and config whitelisted processes to be restored. I think this is better than just ask for confirmation*. Maybe we can use a combination of whitelisting, and asking for confirmation for non-whitelisted things.

*: because, if I have 50 panes with editors and repls inside some of them, then I'd need to go to every pane to restore them

Yeah, I think a combination of an allow list and some reasonable heuristics would make sense for this. That said, I don't think the prompt would be onerous: what I had in mind is that the pane is restored and you will see a simple text prompt in that pane to initiate a restart, so it's not like it would be a series of 50 modals that you had to consider immediately on restore, you would only have to confront it if/when you wanted to.

Destroy666x · 2024-03-06T21:50:59Z

I don't think security is that important if it's something that can be willingl enabled in Lua.

Mikilio · 2024-03-10T15:59:50Z

Maybe it could be made secure by storing the data in /tmp and introducing a Lua setting to specify your own command to store/retrieve it to disk. The defaults would be a naive copy, and we could add a warning that this configuration is insecure.

For people who want more security, they can then encrypt their restoration data, filter it as they see fit or use any third party program to handle backups from tempfs in general.

crides added 4 commits February 14, 2024 23:20

simple pane dump

f818163

reduce CSI output

5847b58

save state

ee52b32

restore

7acf3e5

crides changed the title ~~[Feature] Resurrection: saving and restoring terminal states/contents/commands~~ [Feature] Resurrection: saving and restoring terminal layout/contents/commands Feb 15, 2024

wez reviewed Feb 15, 2024

View reviewed changes

crides added 2 commits February 17, 2024 17:48

use lines_as_escapes; pane_id

e31b307

error

21ae2ba

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Resurrection: saving and restoring terminal layout/contents/commands #5013

[Feature] Resurrection: saving and restoring terminal layout/contents/commands #5013

crides commented Feb 15, 2024 •

edited

crides commented Feb 15, 2024

wez left a comment

wez Feb 15, 2024

wez Feb 15, 2024

wez Feb 15, 2024

wez Feb 15, 2024

crides commented Feb 16, 2024

wez commented Feb 17, 2024

Destroy666x commented Mar 6, 2024

Mikilio commented Mar 10, 2024

		@@ -1372,6 +1378,112 @@ impl Mux {

		Ok((tab, pane, window_id))
		}

[Feature] Resurrection: saving and restoring terminal layout/contents/commands #5013

Are you sure you want to change the base?

[Feature] Resurrection: saving and restoring terminal layout/contents/commands #5013

Conversation

crides commented Feb 15, 2024 • edited

crides commented Feb 15, 2024

wez left a comment

Choose a reason for hiding this comment

wez Feb 15, 2024

Choose a reason for hiding this comment

wez Feb 15, 2024

Choose a reason for hiding this comment

wez Feb 15, 2024

Choose a reason for hiding this comment

wez Feb 15, 2024

Choose a reason for hiding this comment

crides commented Feb 16, 2024

wez commented Feb 17, 2024

Destroy666x commented Mar 6, 2024

Mikilio commented Mar 10, 2024

crides commented Feb 15, 2024 •

edited