Add option to follow symbolic links #3863

Open. cccapon wants to merge 2 commits into base: master

@cccapon cccapon commented Aug 10, 2022

A new flag '--follow-symlinks' was added to the backup command (default off). When enabled, restic will dereference any symbolic link encountered to its destination file or folder instead. This allows the destination of symbolic links to be backed up as if they were standard files or folders.

In order to prevent infinite loops or backup of unwanted symlinks, use the --exclude options as well.

What does this PR change? What problem does it solve?

Symbolic links are used for a variety of purposes in the Windows file system and sometimes it is necessary to archive the linked contents.

Under Windows, restic does not currently back up folders linked to by symbolic links.

This change causes restic to dereference symbolic links during backup and examine the underlying file or folder instead. Restic's default behavior does not change: unless this option is used, symbolic links will still be ignored.

Was the change previously discussed in an issue or on the forum?

Issues:
#542
#1078
#1215
#2211
#2564
#2699
#2708
#3674
#3594
#3848

Forum:
4582
1508

Checklist

  • I have read the contribution guidelines.
  • I have enabled maintainer edits.
  • I have added tests for all code changes.
  • I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I have run gofmt on the code in all commits.
  • All commit messages are formatted in the same style as the other commits in the repo.
  • I'm done! This pull request is ready for review.

@MichaelEischer
Member

To be honest, I'm not at all convinced by that approach. It is all too easy to create infinite loops. Users shouldn't have to somehow determine beforehand which symlinks could lead to endless loops. But I also don't have time at the moment to think about what an alternative could look like.

@kerryland

Maybe this is naive, but here's a simple technique that will use a little memory if you use '--follow-symlinks'. That might be the price you have to pay for the feature. I'm assuming that restic backups run in a single process, and you don't need to consider clusters of restic processes performing backups in parallel.

if '--follow-symlinks' enabled
{
   if processing a directory
   {
      generate a hash to uniquely identify it, and stick it in a list.
   }
   else if processing a symlink
   {
      generate a hash for the target directory.
      If the hash is already in the list
      {
          produce a warning message
      }
      else
      {
          backup the target directory
      }
   }
}
else
{
  if processing a directory
  {
     backup the directory
  }
}

@cccapon
Author

cccapon commented Oct 17, 2022

@kerryland,
The suggestion of building a list of visited destination folders is a good one. But I'm not getting anything back from Go for Readlink. Without it, I don't think it will be possible.

Here is what I've found so far.

os.Lstat() .Mode()  --> Lrw-rw-rw- 
os.Stat() .Mode()  --> drwxrwxrwx
os.Readlink() --> error: The system cannot find the file specified.
filepath.EvalSymlinks() --> error: The system cannot find the file specified.

The basis of my patch is to use os.Stat() instead of os.Lstat() -- and then Restic works.
Without the link destination, though, I don't think loop tracking can be coded.

Do you know of any other way to find the destination path? I'm not keen on launching a shell command to find out.

(fyi: my previous patch involved options for patterns which identify Symbolic links to descend into)

By the way, command: dir /a shows <JUNCTION> as the folder type and the mounted filesystem as
\??\Volume{d6cc17c5-1734-4085-bca7-964f1e9f5de9}\
Presumably, I could back up that volume, but it would be tough to identify locations using a GUID.

@MichaelEischer ,
What ideas do you have for handling the infinite loop scenario? I don't mind coding it, but I'd like some directions from you guys if possible.

@MichaelEischer
Member

> I'm not keen on launching a shell command to find out.

Launching a shell command from within the archiver is completely out of the question.

> What ideas do you have for handling the infinite loop scenario?

On Unix we could also use inodes to detect duplicates. But that won't work on Windows. If we included the symlink targets in the backup, then we would somehow have to recognize whether we've already visited a certain folder. The suggestion from kerryland would do that, but it depends on being able to determine the symlink target.

I haven't given the matter much thought though, as I currently don't have time to come up with a concept on how to better handle symlinks without making a total mess out of things.

@cccapon
Author

cccapon commented Nov 14, 2022

Well. That's about it from me.

For those who follow, I provided two solutions and ran into one obstacle:

  1. Have the user identify which symbolic links are to be included in the backup. This was not accepted because it was too complicated for users.

  2. Flag all symbolic links to be backed up, then use 'exclude' to prevent infinite loops. This was not accepted because it relies on the user to prevent infinite loops.

  3. There doesn't appear to be a way to identify the destination of symbolic links in Windows. Using Go 1.18.1, the functions os.Readlink() and filepath.EvalSymlinks() both fail on symbolic links of type "Junction" with "File not found". Without knowing the destination of the link, the only other way I know to detect infinite loops would be to look for duplicates using directory contents. That approach cannot tell the difference between a loop and a duplicate folder.

@MichaelEischer If you come up with another solution, let me know and I will try to code it up.

For now, I'm at a stand-still.

@kerryland

I've never used Go before, which is why I've been conspicuously silent, but if these Go functions really don't work on Windows then surely that's a major Go bug that needs to be reported and fixed?

Anyway, I just had a quick play on Windows with 1.19.3 and didn't see any problem. Is it a bug specific to 1.18.1? Can't we upgrade?

Here's the awful code I just wrote/scavenged.

package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

func main() {
	pathName := os.Args[1]
	// Lstat does not follow links, so it describes the link itself.
	f, err := os.Lstat(pathName)
	if err != nil {
		fmt.Println(err)
		os.Exit(1) // f is nil here, so stop instead of dereferencing it
	}
	switch mode := f.Mode(); {
	case mode.IsRegular():
		fmt.Println("regular file")
	case mode.IsDir():
		fmt.Println("directory")
	case mode&fs.ModeSymlink != 0:
		fmt.Println("symbolic link")
	case mode&fs.ModeNamedPipe != 0:
		fmt.Println("named pipe")
	}

	// Best effort: the returned error is ignored.
	_ = readLink(pathName)
}

// readLink prints the fully resolved path of file, but only if file is a link.
func readLink(file string) error {
	// Readlink fails for anything that is not a link.
	if _, err := os.Readlink(file); err != nil {
		return err
	}

	path, err := filepath.EvalSymlinks(file)

	fmt.Printf("%s\n", path)
	return err
}

I created a directory "realdir", then a symlink "softlink", and a hardlink to the "realdir" directory.

The output looks like this:

D:\dev\golang\play> go run .\example\hello.go .\realdir
directory

D:\dev\golang\play> go run .\example\hello.go .\softlink\
symbolic link

D:\dev\golang\play> go run .\example\hello.go .\hardlink\
symbolic link

@Dobatymo

@MichaelEischer @cccapon getting the target of a junction (reparse point) on Windows is not fun. It's possible with the winapi https://stackoverflow.com/questions/46383428/get-the-immediate-target-path-from-symlink-reparse-point but I don't know if that's implemented in Go somewhere.

@deajan

deajan commented Jan 17, 2023

I've done detection based on a regex that catches cloud file errors (reparse points) in my restic wrapper.
Can share the code.

@cccapon
Author

cccapon commented Jan 17, 2023

@deajan I'd be interested to hear more. How are you using regex in this context?

@deajan

deajan commented Jan 17, 2023

@cccapon I'm wrapping restic in a Python script, the interesting part being the following:

        # run restic backup (this fragment needs: import os, re)
        exit_code, stdout, stderr = command_runner(restic_command)

        if exit_code == 0:
            # Everything is okay, let's just return status and output from restic
            return True, stdout
        elif exit_code == 3 and os.name == 'nt':
            # TEMP-FIX-4155: we don't have reparse point support for Windows
            # (see https://github.com/restic/restic/issues/4155), so we filter
            # manually for cloud errors which should not affect the backup result.
            # exit_code == 3 means errors are present but a snapshot could be created.
            cloud_error = re.compile(
                r'.*: The cloud operation is not supported on a read-only volume\.'
                r'|.*: The media is write protected\.',
                re.IGNORECASE,
            )
            error_lines = [
                line for line in stderr.split('\n')
                if re.match('error', line, re.IGNORECASE)
            ]
            # Succeed only if every error line is a known reparse point error.
            if all(cloud_error.match(line) for line in error_lines):
                return True, stdout
            # TEMP-FIX-4155-END
        return False, stderr

If interested, I've built a whole restic wrapper, including a YAML config file and a nice GUI that allows backup/restore operations as well as GUI config for MS Windows and Linux.
The program is usually called via scheduled tasks/cron and will back up only if no recent snapshot is already present.
GUI snapshot follows:
[image: GUI screenshot]

Sorry if I kind of hijacked the thread, but the above solution is my current workaround for reparse points in a Windows environment.

@cowwoc

cowwoc commented Feb 19, 2023

@cccapon I think you guys are conflating symbolic links with junctions. mklink /d creates symbolic links and is the default behavior. mklink /j creates a junction. https://superuser.com/a/343079 (and other answers in that thread) discusses the differences.

Also, https://go-review.googlesource.com/c/go/+/460595 might be of interest.

Let's start by adding support for symbolic links, and follow up with a separate issue for junctions.

@cccapon
Author

cccapon commented Oct 12, 2023

For those who follow, I did find a way around this problem with the current restic.exe (v 0.16.0).

In your backup script, change directories (cd in BAT, Push-Location in PowerShell) to the root directory of the symbolic link, then launch restic and back up the '.' (dot) current directory. When restic's starting point is already inside the mount point (junction), it runs fine. restic just can't cross the division between one file system and the next.
