Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Resurrection: saving and restoring terminal layout/contents/commands #5013

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

crides
Copy link
Sponsor Contributor

@crides crides commented Feb 15, 2024

Drafting as this is WIP, but want to seek feedback/get discussion going on this.

Currently:

  • content/scrollback saving and restoring works. This is done by having a reverse attribute to CSI converter, so that the saved contents are in "plain text" (other than attributes; just like tmux)
  • basic layout saving/restoring works. The layout is saved to a slightly minimized tree in JSON form.
  • basic shell/cwd/program works. shell, cwd and foreground* program just gets obtained thru existing APIs, and shell and cwd are restored thru pane spawning/splitting; program is just converted back to a shell string and send to the pane. Note: foreground is technically not the right term, as we grab the newest of the first level children of the shell and spawn it back.
  • the whole state is wrapped in a .tar.zstd for compression. each pane's content gets its own file (just like tmux)
  • the save/restore functionalities are exported as lua APIs so users can do whatever they want
    locally I'm using this for testing (manually saving/restoring):
          { mods = 'CTRL|SHIFT', key = 'S', action = wezterm.action_callback(function(_, _, _)
              wezterm.mux.save_state_to("state_test")
          end) },
          { mods = 'CTRL|SHIFT', key = 'R', action = wezterm.action_callback(function(_, _, _)
              wezterm.mux.restore_state_from("state_test")
          end) },
    
  • there's basically 0 error handling. that definitely needs to be improved

This has not been thoroughly tested, and I'll integrate it into my workflow and see how it goes

@crides crides changed the title [Feature] Resurrection: saving and restoring terminal states/contents/commands [Feature] Resurrection: saving and restoring terminal layout/contents/commands Feb 15, 2024
@crides
Copy link
Sponsor Contributor Author

crides commented Feb 15, 2024

possibly related:

I remember there should be one more related issue (more about content than layout) but I can't find it (though it's pretty short/less important)

Copy link
Owner

@wez wez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this! There are a couple of existing things that I think you can build on top of that will make this simpler (see inline comments on the code), but there are also a couple of things that I think might make you frown a little bit, because there's more to reconcile before we can merge in this functionality.

From a security and privacy perspective: storing the terminal output on disk is a hazard, because it may contain private/privileged/secret information. At the very least we need to ensure that the files that we write for that are private. In theory, the UmaskSaver that we create at process start time should mask off the group+other bits from the files that we save so you may not strictly need to take any extra steps around that most basic safeguard.

However: I know that there was a CVE reported against VTE when it stored scrollback unencrypted in plain text as part of its infinite scrollback handling, and there are parallels between that and doing it here. I'd feel more comfortable if we had a reasonable solution around this, although the logistics are a bit different: for scrollback that is discarded when the process exits, the key material can also be discarded. But when we explicitly want to load things again later, then we need a persistent key that we can use to decrypt it later so there is additional configuration and infrastructure required to deal with that.

There are some headaches here. If we don't have a solution for this then we need to ensure that the save/restore functionality is opt-in and that is it very clear to the user that it is not encrypted and that they need to satisfy themselves that it is suitable for their environment.

In terms of deciding the program and re-launching it, there are a couple of things to consider; here are a couple of issues that may not seem immediately relevant:

why are they relevant here?

  • The command that we spawned into the pane is not necessarily the foreground or youngest process currently running into the pane. I'm not sure that it is wise to grab the foreground/youngest process TBH.
  • Not all panes are local panes with processes (eg: could be a remote ssh session, or a serial port) so querying the process list will never work for those
  • A pane may be a "local" pane but may be in a different domain (such as an ExecDomain or WslDomain) that has a slightly different way to get launched
  • The command that was spawned could be dangerous! Consider someone launching rm -rf into a pane. Zellij's session resumption prompts the user to confirm that it is safe to re-launch. We could potentially shortcut an unconditional prompt with some basic heuristics. For example, if the process passes the mux-is-stateful checks (which include looking at https://wezfurlong.org/wezterm/config/lua/config/skip_close_confirmation_for_processes_named.html) then we could assume that a basic shell invocation need not be prompted about

What I had in mind around this was a very rough idea that we might save the CommandBuilder that was used to spawn a given pane so that we could retrieve and reason about it later via something like Pane:get_spawning_command(&self) -> Option<CommandBuilder>

Then there could be a general mechanism to handle the "pane isn't running, show a prompt to see if it should be (re)started" state.

There's complications around this:

  • The existing logic that cares about the process having exited and showing the hold message is only present for local panes. There's a decent chunk of code to reconcile the exit status of the process and the configured exit behavior and it is all specific to a local process.
  • There is currently no way to represent a not-yet-spawned process/pane, and I don't think we should intoduce that concept to the Pane interface itself because it would make it more awkward to reason about the state of a Pane.

Instead, I think we can draw from the technique used in mux/src/ssh.rs and/or things like the copy and quickselect overlays to shim in a wrapping Pane instance.

So, when mapping the saved version of a pane in, a separate ResumedPane would be built instead. We'd pass in the original CommandBuilder and the saved scrollback. The pane would render those, then it would handle the safe-to-restart logic and show a prompt if required. When it is safe/confirmed to spawn the actual command it would then do so in the appropriate domain and switch its mode of operation to either delegate to that newly spawned pane, or to replace itself in its containing tab with that newly spawned pane.

I'm also open to other ideas on how to deal with this. I do think it is important to handle the various different kinds of panes holistically, or otherwise find a composable/reusable way to deal with the (re)start behavior.

mux/src/lib.rs Outdated Show resolved Hide resolved
@@ -1372,6 +1378,112 @@ impl Mux {

Ok((tab, pane, window_id))
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pretty much all of this should be moved to a new mux/src/statesaver.rs file. I don't love how generic statesaver sounds. If you can propose a nice and descriptive without being too unwieldy name that encompasses something like session layout state, I'm all ears!

@@ -1372,6 +1378,112 @@ impl Mux {

Ok((tab, pane, window_id))
}

pub fn save(&self) -> SavedState {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

save is a bit too generic a name, so I would propose something more descriptive and in alignment with whatever naming we decide for the state saver file in the comment above

mux/src/lib.rs Outdated
}
}

pub fn save_state_to<P: AsRef<Path>>(&self, name: P) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some general stylistic commentary here which I think is mostly addressed by handling errors; the general rules are:

  • Make this return anyhow::Result<()>
  • No .unwrap(). Prefer instead to ? the error and let it bubble up
  • Less nesting; prefer to extract indented code that needs to handle errors out into its own function so that it can also use ?
  • When using ?, if the error doesn't include sufficient context of its own, prefer to use .context("explain it")? or .with_context(||format!("explain it with {parameters}"))?. For example, the standard library filesystem functions don't print the filename that they are operating upon, so you always want to use with_context to include the file name.
  • If ? is somehow not appropriate/possible, use .expect("reason why this will never be a runtime error that we can handle") instead of .unwrap(). That way the reasoning is embedded directly in the code, and we can use the presence of .unwrap() as a bad code smell without having to re-read and grok every call site.

mux/src/pane.rs Outdated Show resolved Hide resolved
termwiz/src/color.rs Outdated Show resolved Hide resolved
termwiz/src/surface/line/clusterline.rs Outdated Show resolved Hide resolved
termwiz/src/surface/line/line.rs Outdated Show resolved Hide resolved
termwiz/src/surface/line/storage.rs Outdated Show resolved Hide resolved
pub struct SavedPaneState {
id: usize,
cwd: Option<SerdeUrl>,
root_argv: Vec<String>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the domain and other important info are missing from this, but I think the approach here needs to be different; I'll cover this more in the main overall review comment

@crides
Copy link
Sponsor Contributor Author

crides commented Feb 16, 2024

Thanks for the very detailed reply!

but there are also a couple of things that I think might make you frown a little bit, because there's more to reconcile before we can merge in this functionality.

No problem. I would like to do things in a correct way for other users, but first I have some things I would like clarifying in your comment.


In theory, the UmaskSaver that we create at process start time should mask off the group+other bits from the files that we save so you may not strictly need to take any extra steps around that most basic safeguard.

Yep, the state saves as 0600 right now

But when we explicitly want to load things again later, then we need a persistent key that we can use to decrypt it later so there is additional configuration and infrastructure required to deal with that.

Encryption would be nice, but it'd also be nice if the user can easily dump the contents without having wezterm load it first. I'm not sure of a good way to do both yet.

If we don't have a solution for this then we need to ensure that the save/restore functionality is opt-in and that is it very clear to the user that it is not encrypted and that they need to satisfy themselves that it is suitable for their environment.

I'm not sure about the security parts, but right now the feature is totally opt in. Nothing extra will be done unless the user triggers save_state_to/restore_state_from in their config. Given we use lua for configs, I think exposing the saving/restoring API + some timer/cron utilities would be easy and enough to achieve what tmux-resurrect has, instead of wrapping a bunch of stuff behind config options


As for spawning commands, I don't quite understand the problems yet:

  • The command that we spawned into the pane is not necessarily the foreground or youngest process currently running into the pane.

No. But the foreground command running on top of the shell should be. Also, we can't grab the second or deeper children of the shell, because that'll break the spawning relationships

  • Not all panes are local panes with processes (eg: could be a remote ssh session, or a serial port) so querying the process list will never work for those

I'll need to look into those, but in those cases I think we shouldn't restore recursively all the layers? As in, we should just attempt to restore the pane history and connection, but not care about what's running inside of those panes.

  • A pane may be a "local" pane but may be in a different domain (such as an ExecDomain or WslDomain) that has a slightly different way to get launched

I'll need to look into those

  • The command that was spawned could be dangerous! Consider someone launching rm -rf into a pane. Zellij's session resumption prompts the user to confirm that it is safe to re-launch. We could potentially shortcut an unconditional prompt with some basic heuristics.

tmux-resurrect handles this by only allowing default and config whitelisted processes to be restored. I think this is better than just ask for confirmation*. Maybe we can use a combination of whitelisting, and asking for confirmation for non-whitelisted things.

*: because, if I have 50 panes with editors and repls inside some of them, then I'd need to go to every pane to restore them

@wez
Copy link
Owner

wez commented Feb 17, 2024

  • Not all panes are local panes with processes (eg: could be a remote ssh session, or a serial port) so querying the process list will never work for those

I'll need to look into those, but in those cases I think we shouldn't restore recursively all the layers? As in, we should just attempt to restore the pane history and connection, but not care about what's running inside of those panes.

If I have a 3-pane layout, where I have a regular local shell on the left, a shell in a docker container (implemented as an ExecDomain that essentially ran docker exec ...) in the middle, and an ssh session (implemented as an SshDomain that used the internal ssh client to connect to a remote host) in the right pane, there's no reason that the save/restore shouldn't be able to handle those three things in the same.

  • For the local pane case: what was run by the user was the shell, and I think the principle of least surprise is to resume their shell (and show their scrollback up to that point). I think running whatever the most recent command was in the shell (assuming that we figure it out correctly) might be interesting, but would be confusing when that program terminates and it doesn't leave them in their shell
  • For the docker container case, I don't think we can guarantee to see inside the container to understand what it was running purely from process tree introspection, and again, I think just dropping back into the command that was used to spawn the pane is the least surprising and probably most anticipated outcome
  • For the ssh case we definitely cannot see into what it was running, but it is similar to the other two cases where I think the reasonable expectation is to restore the prior scrollback and drop you into a freshly logged in shell there
  • The command that was spawned could be dangerous! Consider someone launching rm -rf into a pane. Zellij's session resumption prompts the user to confirm that it is safe to re-launch. We could potentially shortcut an unconditional prompt with some basic heuristics.

tmux-resurrect handles this by only allowing default and config whitelisted processes to be restored. I think this is better than just ask for confirmation*. Maybe we can use a combination of whitelisting, and asking for confirmation for non-whitelisted things.

*: because, if I have 50 panes with editors and repls inside some of them, then I'd need to go to every pane to restore them

Yeah, I think a combination of an allow list and some reasonable heuristics would make sense for this. That said, I don't think the prompt would be onerous: what I had in mind is that the pane is restored and you will see a simple text prompt in that pane to initiate a restart, so it's not like it would be a series of 50 modals that you had to consider immediately on restore, you would only have to confront it if/when you wanted to.

@Destroy666x
Copy link

I don't think security is that important if it's something that can be willingl enabled in Lua.

@Mikilio
Copy link

Mikilio commented Mar 10, 2024

Maybe it could be made secure by storing the data in /tmp and introducing a Lua setting to specify your own command to store/retrieve it to disk. The defaults would be a naive copy, and we could add a warning that this configuration is insecure.

For people who want more security, they can then encrypt their restoration data, filter it as they see fit or use any third party program to handle backups from tempfs in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants