Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UNIX domain socket transport #6183

Closed
PhrozenByte opened this issue Jan 27, 2022 · 23 comments · Fixed by #7615
Closed

Add UNIX domain socket transport #6183

PhrozenByte opened this issue Jan 27, 2022 · 23 comments · Fixed by #7615
Assignees
Milestone

Comments

@PhrozenByte
Copy link
Contributor

/kind enhancement

When creating full system backups one must usually run Borg as root to avoid permission issues. However, it might be undesirable to store these backups as root, as people often prefer storing backups in known places (like /var/backups) owned by dedicated users (like the default backup user in Debian-based distributions).

Right now there are only two possible solutions to accomplish this:

  • One might chown the repo after borg create et al. exits
  • One might abuse Borg's remote repository feature to connect to localhost using SSH as different user (as suggested by @ThomasWaldmann, see references below)

IMO this is a reasonable use case and Borg should support it without taking such rather desperate solutions.

Thus I'd like to suggest adding a socket:// transport (like socket:///run/borg/borg.sock) to Borg. The socket should be created by borg serve by accepting an additional --socket option with a path (e.g. borg serve --socket /run/user/1000/borg/borg.sock, umask 0117). How borg serve is being invoked is up to the user. borg create et al. can now connect to this socket using the socket:// transport (e.g. borg create socket:///run/borg/borg.sock /path/to/backup).

Even though this is not the goal, it also makes https://github.com/borgbackup/borg/blob/master/docs/deployment/pull-backup.rst#socat easier by eliminating socat. It might also enable some more use cases I just can't think of right now.

As I don't know Borg's sources I can't really judge the complexity, but since Borg already supports remote transports with borg serve, this could be as easy as replacing sys.stdin/sys.stdout of borg serve with a UNIX domain socket and to let borg create et al. communicate with this socket instead of the ssh subprocess.

References

For non-ssh repos and running borg as root, but not having the repo owned by root, you can use the ssh://user@localhost/myrepo trick.

- #4082 (comment)

You can work around that problem by using borg with user@localhost:/path/to/repo as repo (in that scenario, borg client can run as root and borg serve would run as user.

- #3587 (comment)

@ThomasWaldmann ThomasWaldmann added this to the 1.2.x milestone Jan 27, 2022
@ThomasWaldmann
Copy link
Member

Sounds good, thanks for the detailled suggestion!

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Apr 8, 2023

https://docs.python.org/3/library/socket.html

The code in borg.remote uses FDs (of stdin, stdout, stderr) and os.select/read/write/set_blocking.

It seems like select() and setblocking() can also be used with sockets, so would it be just send/recv instead of `write/read? How about error / exception handling?

The remote code is rather complex and not very nice to work with, I'ld like to avoid additional complexity or risks there.

@RonnyPfannschmidt
Copy link
Contributor

RonnyPfannschmidt commented Apr 8, 2023

Sockets can be turned into file objects for feature parity using the makefile api.

https://docs.python.org/3/library/socket.html#socket.socket.makefile

If those readable / writable objects can be used, it may be possible without pain.

The only caveat would be that the socket server would have a one client at a time limit for lock parity.

@ThomasWaldmann
Copy link
Member

But socket.makefile returns a python file object, not a OS-level FD (int).

@RonnyPfannschmidt
Copy link
Contributor

in the case of socket fd integers - on unix read/write are equivalent to send/recv without flags

@PhrozenByte
Copy link
Contributor Author

@ThomasWaldmann You can call fileno() on the file object to get the fd from a socket.makefile() (Popen.stdin and Popen.stdout are file objects, too). As pointed out by @RonnyPfannschmidt, there should be no differences on Unix-like systems. Windows is a whole different story though, but since Windows isn't supported anyway, this should be no major issue. However, I must admit that I personally never used socket.makefile(), but only the high-level socket methods...

Thanks to #6270 giving me a nice starting point I just started looking into Borg's remote code and I agree that the current implementation is indeed not so easy to work with. The code very much confused me, especially whether borg serve relies on stderr. If borg serve relies on stderr, this is going to be a big issue - because with a Unix socket, you just don't have the separation of stdout and stderr, but only a single bidirectional channel. There are solutions, but it isn't as easy as initially thought then...

However, looking at the code, borg serve doesn't seem to use stderr. Even though it defines a stderr_fd, it only uses it for a single error, but isn't set to stderr anyway, but stdout, and won't be sent over the wire anyway. So, unless I'm missing something, borg serve doesn't use stderr.

borg/src/borg/remote.py

Lines 198 to 201 in 1e1c922

def serve(self):
stdin_fd = sys.stdin.fileno()
stdout_fd = sys.stdout.fileno()
stderr_fd = sys.stdout.fileno()

What confuses me about this is that the "client" part, i.e. the Popen call, is differentiating between stdout and stderr:

borg/src/borg/remote.py

Lines 589 to 592 in 1e1c922

self.p = Popen(borg_cmd, bufsize=0, stdin=PIPE, stdout=PIPE, stderr=PIPE, env=env, preexec_fn=ignore_sigint)
self.stdin_fd = self.p.stdin.fileno()
self.stdout_fd = self.p.stdout.fileno()
self.stderr_fd = self.p.stderr.fileno()

Popen's stderr is then sent (here) to handle_remote_line(), which is doing quite sophisticated things to send this stuff to logging (and logging only, which is good). So, looking at this (and not understanding much more about Borg's code ❗), I presume that some older versions of borg serve did in fact use stderr, but no longer. If this is true, handle_remote_line() only exists for backwards compatibility reasons and can be removed in Borg 2.0, because Borg 2.0 isn't supporting old clients anyway. All server errors are sent via stdout and the only thing for which stderr is still needed, is to catch error output of ssh, e.g. reporting about failed auth.

If all of this is true, it's basically really just replacing the fds on both ends by the fds of a socket. The fds behave the same. Also see #6270 and my comment there.

The only caveat would be that the socket server would have a one client at a time limit for lock parity.

On a important side note: My intention with this suggestion is not making borg serve a "real" server accepting multiple clients simultaneously. For now it's totally fine to still require one borg serve process (and unix socket) per client, just as before, and therefore only accept a single client at a time. However, just to make this clear, this can be expanded to support multiple clients simultaneously at some later point. It naturally requires threading then, which AFAIK is a bigger topic for Borg, therefore it's intentionally out of the scope of this suggestion. But it's laying some ground work...

However, out of curiosity @RonnyPfannschmidt, why is socket.makefile() limiting it to a single client? As I said earlier, I never used socket.makefile(), but unless I understand the API docs wrong, we can (and actually must) call this per client and therefore get separate fds per client?

@RonnyPfannschmidt
Copy link
Contributor

Makefile isn't, the need for locking limits based on usage patterns

@ThomasWaldmann
Copy link
Member

@PhrozenByte btw, windows support is coming, @RayyanAnsari works on it (no remote repos yet though).

@ThomasWaldmann ThomasWaldmann self-assigned this Apr 10, 2023
@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Apr 10, 2023

About Location specifier for --repo:

socket:///abs/path/to/borg.sock::/abs/path/to/borg.repo

Is that a good, usual syntax or are there better ideas?

We need to transport socket path and repo path via that URL.

Update:

hmm, can we always assume that both must be local paths? So we could create the socket file inside the repo path? If so, we could use:

socket:///abs/path/to/borg.repo

and just create the socket inside the repo dir as "socket" or ".socket".

@PhrozenByte
Copy link
Contributor Author

❤️ for #7519 😊

About Location specifier for --repo:

socket:///abs/path/to/borg.sock::/abs/path/to/borg.repo

Is that a good, usual syntax or are there better ideas?

It mimics Borg's ssh:// transport and follows URL-style syntax, so it is common for sure. The number of slashes might be a bit confusing. Strictly speaking, URLs only require <scheme>:<address>, therefore socket:<absolute path> (e.g. socket:/abs/path/to/repo) would be fine, too. Other standards commonly use the double slashes to indicate a server address (e.g. //borgbackup.org in https://borgbackup.org), but this rather is a later interpretation than actual standard. The internet is a inconsistent mess: Think of the file: scheme, it was meant to address local files only, but for some reason one can specify a hostname nobody ever needs, resulting in file:///abs/path/to/…. The double slashes have a big advantage though: A user unaware of socket: might actually mean ./socket:/some/path with borg create socket:/some/path ("socket:" is a totally valid directory name...). This is less likely with borg create socket:///some/path...

So, in the end, we can choose what inconsistency we want: We can use socket://<absolute path> and confuse people with the three slashes, or use socket:<absolute path> and confuse people why ssh is requiring the two slashes but socket isn't. To me that's a matter of personal preference. I wouldn't introduce yet another syntax though.

hmm, can we always assume that both must be local paths?

No.

  1. Even if we assume that borg create and borg serve run on the same machine (i.e. both paths are indeed local), we can't assume that the user running borg create has access to the repo. Just think of my original use case: My repo is stored in /var/backups/borg and /var/backups is owned by backup with 700 perms. Sure, my full system backup runs with root and therefore has access to the repo, but I also want to run borg create with the unprivileged daniel user. daniel can't even see /var/backups/borg. But depending on the socket's perms he can use the socket.

  2. Another use case for the socket transport is implementing an improved pull mechanism. See Add proof of concept for pull mechanism #6270 (comment)

  3. Yet another use case for the socket transport is proxying. Think of a storage server to which the client has no direct network access, but needs a proxy. The new socket transport would allow one to use a rendezvous server to forward the connection.

So we could create the socket file inside the repo path? If so, we could use:

socket:///abs/path/to/borg.repo

and just create the socket inside the repo dir as "socket" or ".socket".

Am I right that your thinking about requiring local paths originates in the impracticability of socket:///abs/path/to/borg.sock::/abs/path/to/borg.repo as repo path? First, you're right, that's kinda weird. But I don't like the solution of requiring the socket to be in the repo path for multiple reasons (in no particular order):

  1. See above, requiring this prevents the use cases mentioned before.
  2. People might confuse borg serve --socket with access restriction. We have borg serve --restrict-to-* for that.
  3. Until now borg serve was independent of single repos, with this it gets dependant on a single repo.
  4. We limit possible future development of this feature. Right now we have a "single client, single repo" policy, but in the future we might want to extent this to a full "Borg server", allowing an arbitrary number of clients to access an arbitrary number of different repos. Not now for sure, but we should keep the possible future in mind.
  5. Sockets kinda belong to /run.

However, I still see the impracticability of socket:///abs/path/to/borg.sock::/abs/path/to/borg.repo. So I was thinking what might be a viable alternative and --rsh came to my mind. Why don't we simply assume the socket to be in the repo, but allow users to specify it otherwise using an option, e.g. --socket?

So, borg create socket:///abs/path/to/repo assumes the socket to be in /abs/path/to/repo/socket, but one can specify otherwise using borg create --socket /run/my-borg-socket socket:///abs/path/to/repo, just like --rsh allows one to overwrite the ssh command used.

borg serve still receives an arbitrary path for --socket; it can be /abs/path/to/repo/socket, but can also be /run/my-borg-socket, or /some/other/path, because we might not even be on the same machine but rather let ssh forward the socket for us. Additionally checking whether the dirname of borg serve --socket is a Borg repo and implying --restrict-to-repository then is a good idea nevertheless.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Apr 11, 2023

Thanks for the feedback (I had similar ideas/concerns, but just wanted something simple to get going and then refine after feedback).

The reason why I chose double-slash + socket_abs_path is that this could grow support for TCP sockets (by using a server name or IP and port after the double-slash instead of the UDS path). The :: separator is unusual for a URL though, but guess we can't use just a /.

Your idea with a separate --socket option would also work, but is a bit inconsistent compared to ssh://server:port/path borg URLs where the server and port are part of the repo url and not contained in a separate (e.g. --ssh) option.

So, how could we get a socket file path at place of SOCKET without parsing ambiguities?

borg create --repo=ssh://user@server:port/path/to/borg.repo  # for consistency comparison
borg create --repo=socket://server:port/path/to/borg.repo  # possible future with TCP socket
borg create --repo=socket://SOCKET/path/to/borg.repo  # UNIX domain socket

Hmm, how about just mandating that SOCKET:

  • must have .sock suffix
  • must be an absolute path (start with a slash)
    ?

That would result in these usage patterns:

borg create --repo=socket:///path/to/borg.repo  # SOCKET empty -> UDS in repo dir
borg create --repo=socket:///run/borg.sock/path/to/borg.repo  # SOCKET w/ slash and .sock -> path, use that!
borg create --repo=socket://server:port/path/to/borg.repo  # SOCKET w/o leading slash -> TCP

@ThomasWaldmann
Copy link
Member

A more radical approach could be splitting that url into --remote=SERVER_URL --repo=local_repo_path.

@PhrozenByte
Copy link
Contributor Author

PhrozenByte commented Apr 11, 2023

Yeah, good point. I agree, adding a --socket option isn't perfect and adds some inconsistency, but if combined with your idea to specify a default location for the socket, a user will easily understand how it works.

Thinking about it, adding the socket's path to --repo just feels a bit weird to me: with --repo we're just asking for the repo's location. Locations that are not local paths, must match <transport>://<host>/<path to repo>, because the host is undoubtedly part of the location. But a socket's path is no host - it's another path. Adding a --socket option with a default value is very easy to explain, but "magic separators" (like ::, or requiring an absolute path and a .sock suffix) are hard to understand. Wasn't this confusion the reason (among the regex getting unmanageable) why repo path and archive names were split up in Borg 2?

Adding a --socket option just feels like the best option, especially when combined with a default value. Deriving the default value from the repo's path sounds reasonable. So, we end up with this:

borg create --repo=ssh://user@server:port/path/to/borg.repo
borg create --rsh='ssh -i /path/to/id_rsa' --repo=ssh://user@server:port/path/to/borg.repo

borg create --repo=socket://server:port/path/to/borg.repo  # possible future with TCP socket
borg create --repo=socket:///path/to/borg.repo  # UNIX domain socket in default location
borg create --socket=/other/path/to/borg.sock --repo=socket:///path/to/borg.repo  # UNIX domain socket in custom location

In the future we might even not just look in the repo's path, but in other default locations as well. I just had a whole bunch of ideas, e.g. $XDG_RUNTIME_DIR/borg.sock if we ever implement a "full Borg server". But that doesn't really matter right now, by keeping the syntax (<transport>://<host>/<path to repo>) simple and rather use a separate option, we stay flexible.

Hmm, how about just mandating that SOCKET:

  • must have .sock suffix
  • must be an absolute path (start with a slash)
    ?

In practice, this means that .sock becomes a "magic separator", just as :: would be. IMHO we should avoid this. It's hard to explain and might conflict with otherwise valid paths.

A more radical approach could be splitting that url into --remote=SERVER_URL --repo=local_repo_path.

You mean like --remote=ssh://server:port --repo=/path/to/borg.repo? Yeah, pretty radical, but reasonable. --remote and --repo would then strongly depend on each other. If a user forgets adding --remote (e.g. in a wrapper script appending options), Borg might happily do things wrong without even noticing. Plus, we'd still need the --rsh option, right? IMO it feels like this would make things harder.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Apr 11, 2023

@PhrozenByte about stderr: yeah, the borg serve function does not really use it, but the borg repository code emits logging messages on stderr (which usually go either directly to e.g. the console (for a local repo) or get sent via the ssh connection to the clientside code in RemoteRepository).

So, guess we have that problem that there is no easy solution with just 1 bidi UDS.

logging.handlers.SocketHandler maybe? Hmm, maybe rather not - that would need the borg client being the "log server" side (maybe not a big issue for UDS, but for TCP socket).

So, guess all we have left is to multiplex stdout and stderr over one channel (and de-mux on the other side).

Why is there no fd012-mux-socket-demux-fd012 tool? :-)

@PhrozenByte
Copy link
Contributor Author

PhrozenByte commented Apr 11, 2023

That's very, very unfortunate 😒

Since Borg uses Python's built-in logging facility, I'd indeed recommend creating a logging.handlers.RemoteHandler that is responsible for sending not just log messages, but also exceptions and errors of borg serve (including Borg's repo code) to the client. This handler can and should be used for SSH-based connections, too. To do so we could hook into setup_logging() (replace is_serve=False (and stream=None) by handler=StreamHandler, so that we can pass handler=RemoteHandler for the socket transport) and more or less move the existing code for exceptions to this new logging.handlers.RemoteHandler. Additional work is required to handle the log messages on the client side, which definitely requires diving deep into the remote code... I can totally understand that this isn't desirable.

To avoid this we could multiplex stdout and stderr, by e.g. prefixing all messages with a channel indicator (a single byte would do, e.g. 1 for stdout and 2 for stderr) and more or less send the current logging messages unchanged (i.e. no message packing required; you can then keep the current exception handling unchanged). This won't save us much work on the server side (i.e. borg serve) because we must still hook into the logging facility (i.e. add a logging.handlers.RemoteHandler) to change where the log messages are sent to. But it saves a lot work on the client side, because checking for the channel indicator can be done very early in processing (possibly right after os.read()). However, please note that this will break any attempt to support older clients (but that's fine I guess, we break support anyway). For future clients it won't be much of a problem apart from having a possibly unused channel indicator at the beginning of every message.

Yet another solution is to quite literally open a second channel for stderr (i.e. calling socket.connect() resp. socket.accept() twice and using the second connection for stderr). It's by far the easiest solution, doesn't require much work on both sides, however, it has the major disadvantage of being incompatible with a possible future TCP-based transport (connecting twice is fine for a unix socket, but not for TCP) and creates future BC issues in general (abolishing the second connection would be a BC-breaking change).

I'd vote for the second approach, it's a good compromise. But indeed, this idea is no longer as easy as I initially thought it would be 😞

logging.handlers.SocketHandler maybe? Hmm, maybe rather not - that would need the borg client being the "log server" side (maybe not a big issue for UDS, but for TCP socket).

I'm not 100% sure what you mean. Are you referring to some log messages actually belonging to the server, not the client? I agree, this is a delicate topic, but there's no solution other than checking every single log message whether it's meant for the client, or the server. However, due to our "one client, one socket" policy, we don't really have to think about this right now: we just send everything to the client, the same way we already do for SSH. How we do this (see above), doesn't really matter IMHO. We won't really need this differentiation until we implement a "full Borg server".

@ThomasWaldmann
Copy link
Member

about socket file in the repo directory:

that would limit operations to existing repos, we can't have the socket file in a repo dir that does not yet exist.
so guess we rather forget about that.

@PhrozenByte
Copy link
Contributor Author

PhrozenByte commented May 30, 2023

Great work Thomas! 👏 ❤️

I agree. However, the issue what path to use for --socket by default remains an issue tough...

The "right place" for the socket is $XDG_RUNTIME_DIR for unprivileged users, e.g. $XDG_RUNTIME_DIR/borg/borg.sock (let's reserve a directory, who knows for what we might wanna use it in the future...), and /run/borg/borg.sock for root.

However, there are two open questions with $XDG_RUNTIME_DIR:

  1. What to do when $XDG_RUNTIME_DIR is not set
  2. What to do when $BORG_BASE_DIR is set

First, what's the right place for the socket if $XDG_RUNTIME_DIR is not set? The XDG basedir spec doesn't really specify a default location for $XDG_RUNTIME_DIR, but Systemd basically made /run/user/$UID the default. We might do the same. However, if /run/user/$UID doesn't exist yet, we probably won't be able to create it due to insufficient permissions; thus I'd vote against /run/user/$UID as "implicit" default if $XDG_RUNTIME_DIR is not set and rather use a "safe" fallback. Again, the XDG basedir spec doesn't really tell us what to do, just to use "a replacement directory with similar capabilities". Some recommend /tmp/run, but IMHO this is a rather weird alternative... I'd rather recommend falling back to $BORG_CACHE_DIR/run.

Second, what shall we do when $BORG_BASE_DIR is set? AFAIK this variable was created before Borg followed the XDG basedir specs. Since $BORG_BASE_DIR plays pretty nice with the XDG basedir specs for configs and caches, there was no reason not to keep it. But the runtime dir is different. If we assume $BORG_BASE_DIR to be the definitive "I want all my stuff there" answer of the user, I'd say that if $BORG_BASE_DIR is set, we always use $BORG_CACHE_DIR/run, no matter whether $XDG_RUNTIME_DIR is set or not, for both unprivileged users and root.

Last, we should allow the user to give us a "definitive answer" where one wants his stuff... So a $BORG_RUNTIME_DIR variable should be added, too.

Just for record, I was also thinking about a $BORG_SOCKET_DIR variable, but would recommend to be less specific (i.e. better just use $BORG_RUNTIME_DIR): If we - some day, maybe never - add a "full Borg server", we will need a location for other runtime data, e.g. the daemon's PID. Adding yet another env variable for a single file is just overkill... That's also the reason why I recommend using $BORG_CACHE_DIR/run, not e.g. $BORG_CACHE_DIR/sockets.

So, let's wrap this up: We always use $BORG_RUNTIME_DIR/borg.sock. If $BORG_RUNTIME_DIR is set, use it. If it is not set, but $BORG_BASE_DIR is set, set $BORG_RUNTIME_DIR to $BORG_CACHE_DIR/run and use this location. If neither variables are set, we differentiate between root and unprivileged users: For root, set $BORG_RUNTIME_DIR to /run/borg and use this location. For unprivileged users, set $BORG_RUNTIME_DIR to $XDG_RUNTIME_DIR/borg if $XDG_RUNTIME_DIR is set, otherwise set it to $BORG_CACHE_DIR/run and use this location.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented May 30, 2023

For most of the logic, we want to use platformdirs package, they have a user_runtime_dir() function:

https://github.com/platformdirs/platformdirs/blob/main/src/platformdirs/unix.py#L155

Misc. fs helpers already exist there:

https://github.com/borgbackup/borg/blob/master/src/borg/helpers/fs.py

Note: platformdirs does not special case root, nor do they have a site runtime dir.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented May 30, 2023

socket file permissions:
due to constants.UMASK_DEFAULT == 0o0077, the socket file has 0o700 permissions (user only, no permissions for group).

Would that be a problem? Usually we have these scenarios:

  • run borg client and server as same user: no problem
  • run borg client as root and server as non-root: no problem either

Hmm, guess if one had a scenario where multiple different users would want to access the same borg serve socket, then we would need group permissions and all the users being members of that group.

socket file location for different user scenario:
obviously, the default socket location would not work, so guess the location of the socket created by borg serve has to be given to the borg client via --socket=/that/path.

@PhrozenByte
Copy link
Contributor Author

Yes, group r+w is required for scenarios with different unprivileged users on the same machine; this includes my original scenario in which the backup user (or maybe a distinct borg user) runs borg serve. Using umask 007 for the socket must be documented though; users must be aware that by using sockets, all members of the group borg serve is run with gain access to repos (and one might run borg serve with a group different from the primary group). For these scenarios one must always pass --socket (note: not for all scenarios though, e.g. for a proxied socket one won't necessarily need --socket). However, I don't think that this is an issue, because such setups always require manual setup anyway (that's what I meant with "How borg serve is being invoked is up to the user")...

About using user_runtime_dir() from the platformdirs package: It looks like that this lib works fine for unprivileged users, but yields /run/user/0 for root. This is a rather unorthodox location. As common practice it should be just /run for root.

@ThomasWaldmann
Copy link
Member

@PhrozenByte could you file a bug at platformdirs?

@PhrozenByte
Copy link
Contributor Author

@PhrozenByte could you file a bug at platformdirs?

ref platformdirs/platformdirs#188

@ThomasWaldmann
Copy link
Member

@PhrozenByte merged stuff into master.

if you have time, guess it would be good to give it some practical testing.
maybe the long running borg serve --socket shows some yet unknown issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants