Syncing extended attributes

Jakob Borg edited this page Jun 3, 2022 · 8 revisions

Background

Extended attributes are extra file metadata. Their precise form and limitations differ between operating systems and filesystems. On most Unixes, an extended attribute is a namespaced string name with a small associated value, where "small" means at most a few kilobytes, often bounded by the filesystem block size or similar. Both the OS APIs and the filesystem in use impose their own limits. The values tend to be small in practice: comment strings, ACL strings, etc. On macOS and Windows, the size of attributes isn't explicitly limited and can, in some cases, be large; historically, macOS has stored icon data and similar in "resource forks", which are exposed as a form of extended attribute.

Any meaning to be inferred from extended attributes is OS-specific. While both FreeBSD and Linux can store complex file ACLs in extended attributes, they do not store them in the same format and the naming rules for the attributes differ, so syncing one to the other is meaningless.

Mechanics

We aim to implement syncing of extended attributes in the cases where it makes sense -- that is, between nodes sharing the same operating system. If a file has extended attributes originating on Linux, we will not apply them when syncing the file on FreeBSD.
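As a rough illustration of the same-OS rule, the sketch below assumes each stored attribute is tagged with its operating system of origin. The Go type and function names (OperatingSystem, ExtendedAttribute, applicable) are hypothetical and chosen for this example only; they are not taken from the Syncthing code base.

```go
package main

import "fmt"

// OperatingSystem is a hypothetical Go stand-in for the OS tag on each attribute.
type OperatingSystem int

const (
	Linux OperatingSystem = iota
	FreeBSD
	MacOS
	Windows
)

// ExtendedAttribute is a hypothetical in-memory form of one stored attribute.
type ExtendedAttribute struct {
	OS    OperatingSystem
	Name  string
	Value []byte
}

// applicable returns only the attributes that originate on the local OS.
// Attributes from any other OS are skipped rather than applied.
func applicable(attrs []ExtendedAttribute, local OperatingSystem) []ExtendedAttribute {
	var out []ExtendedAttribute
	for _, a := range attrs {
		if a.OS == local {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	attrs := []ExtendedAttribute{
		{OS: Linux, Name: "user.comment", Value: []byte("from linux")},
		{OS: FreeBSD, Name: "user.comment", Value: []byte("from freebsd")},
	}
	// On a FreeBSD node only the FreeBSD attribute would be applied.
	for _, a := range applicable(attrs, FreeBSD) {
		fmt.Printf("%s=%s\n", a.Name, a.Value)
	}
}
```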

Depending on filesystem limitations, it may not be possible to apply every extended attribute. For example, a file that originates on a Linux system on ext4 can be synced to a Linux system on FAT, but we will fail to apply the extended attributes.

We will read and replace the set of extended attributes in the file metadata when scanning a file. If extended attributes for another operating system are present in the metadata, these will be retained on a best-effort basis. That is, touching a file with Linux extended attributes on a Windows system will generally keep the Linux extended attributes, as long as Syncthing already knew about them and had them in the database. The "best-effort" part means we do not attempt to make Linux extended attributes survive a rename on Windows, etc.
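The scan-time behaviour described above could be sketched as follows. This is a minimal illustration under assumptions: the type names and the mergeScanned helper are hypothetical, and "known" stands for whatever the database already holds for the file.

```go
package main

import "fmt"

// OperatingSystem is a hypothetical Go stand-in for the OS tag on each attribute.
type OperatingSystem int

const (
	Linux OperatingSystem = iota
	FreeBSD
	MacOS
	Windows
)

// ExtendedAttribute is a hypothetical in-memory form of one stored attribute.
type ExtendedAttribute struct {
	OS    OperatingSystem
	Name  string
	Value []byte
}

// mergeScanned replaces the local-OS attributes with the freshly scanned set
// and retains attributes recorded for other operating systems.
func mergeScanned(known, scanned []ExtendedAttribute, local OperatingSystem) []ExtendedAttribute {
	merged := make([]ExtendedAttribute, 0, len(known)+len(scanned))
	for _, a := range known {
		if a.OS != local {
			merged = append(merged, a) // foreign attribute: keep as-is
		}
	}
	return append(merged, scanned...)
}

func main() {
	known := []ExtendedAttribute{
		{OS: Linux, Name: "user.comment", Value: []byte("kept")},
		{OS: Windows, Name: "old", Value: []byte("replaced")},
	}
	scanned := []ExtendedAttribute{{OS: Windows, Name: "new", Value: []byte("fresh")}}
	for _, a := range mergeScanned(known, scanned, Windows) {
		fmt.Printf("os=%d %s=%s\n", a.OS, a.Name, a.Value)
	}
}
```

Note that this only works if the previous metadata is available, which matches the caveat above: a rename that loses the database entry also loses the foreign attributes.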

Some extended attributes require elevated privileges to read or write. It's up to the user to arrange for these permissions.

Implementation

We treat extended attributes as file metadata, similar to permissions, and store them in file metadata (FileInfo). To detect whether the set of attributes has changed, we look at the inode change time (XXX: On macOS and Linux at least, others to be confirmed) and compare it to a new local attribute in the FileInfo. In the FileInfo, the extended attributes are represented as an ordered list of messages, similar to the following:

message FileInfo {
  ...
  repeated ExtendedAttribute extended_attributes = 42;
  ...
  int64 local_inode_change_time_ns = 1002; // not sent over the wire
}

message ExtendedAttribute {
  OperatingSystem os = 1;
  string name = 2;
  bytes value = 3;
}
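The change-time comparison described before the message definitions might look roughly like the sketch below. The fileInfo struct, its field names, and the rescanXattrs helper are hypothetical; obtaining the inode change time itself is platform-specific and is left out, so the caller passes it in as a nanosecond value.

```go
package main

import "fmt"

// fileInfo is a hypothetical, trimmed-down stand-in for FileInfo.
type fileInfo struct {
	XattrNames        []string
	InodeChangeTimeNs int64 // local only, not sent over the wire
}

// rescanXattrs re-reads the attributes only when the inode change time
// differs from the stored value. It reports whether a re-read happened.
func rescanXattrs(prev fileInfo, currentNs int64, read func() ([]string, error)) (fileInfo, bool, error) {
	if prev.InodeChangeTimeNs == currentNs {
		return prev, false, nil // nothing changed, skip the read
	}
	names, err := read()
	if err != nil {
		return prev, false, err
	}
	return fileInfo{XattrNames: names, InodeChangeTimeNs: currentNs}, true, nil
}

func main() {
	prev := fileInfo{XattrNames: []string{"user.comment"}, InodeChangeTimeNs: 100}
	read := func() ([]string, error) { return []string{"user.comment", "user.tag"}, nil }
	next, changed, err := rescanXattrs(prev, 200, read)
	fmt.Println(next.XattrNames, changed, err)
}
```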

The {os, name} tuple must be unique in the list, so this can also be viewed as a map from {os, name} to value.
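To illustrate the map view, the sketch below converts a list into a map keyed by {os, name} and rejects duplicates. The Go types and the toMap helper are hypothetical names for this example, not generated protobuf code.

```go
package main

import "fmt"

// OperatingSystem is a hypothetical Go stand-in for the enum of the same name.
type OperatingSystem int

const (
	Linux OperatingSystem = iota
	FreeBSD
	MacOS
	Windows
)

// ExtendedAttribute mirrors the message fields.
type ExtendedAttribute struct {
	OS    OperatingSystem
	Name  string
	Value []byte
}

// attrKey is the {os, name} tuple that must be unique in the list.
type attrKey struct {
	OS   OperatingSystem
	Name string
}

// toMap converts the list to a map and fails on a duplicate {os, name} tuple.
func toMap(attrs []ExtendedAttribute) (map[attrKey][]byte, error) {
	m := make(map[attrKey][]byte, len(attrs))
	for _, a := range attrs {
		k := attrKey{OS: a.OS, Name: a.Name}
		if _, dup := m[k]; dup {
			return nil, fmt.Errorf("duplicate extended attribute {%d, %q}", a.OS, a.Name)
		}
		m[k] = a.Value
	}
	return m, nil
}

func main() {
	attrs := []ExtendedAttribute{
		{OS: Linux, Name: "user.comment", Value: []byte("a")},
		{OS: MacOS, Name: "user.comment", Value: []byte("b")}, // same name, different OS: allowed
	}
	m, err := toMap(attrs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(m), "unique attributes")
}
```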

The OperatingSystem type is an enum covering the operating systems where we have extended attribute support: Linux, FreeBSD, macOS, Windows. The name and value are self-explanatory. On FreeBSD and NetBSD, attributes are namespaced into user or system by a separate namespace integer, while on other systems the namespace is encoded in the name. We convert the integer to and from a user. or system. prefix on the string. (This is also how, for example, golang.org/x/sys/unix does it.)
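The prefix conversion could be sketched as below. The namespace values 1 and 2 correspond to EXTATTR_NAMESPACE_USER and EXTATTR_NAMESPACE_SYSTEM in FreeBSD's <sys/extattr.h>; the joinName and splitName helpers are hypothetical names for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Namespace integers as used by the FreeBSD extattr API.
const (
	nsUser   = 1 // EXTATTR_NAMESPACE_USER
	nsSystem = 2 // EXTATTR_NAMESPACE_SYSTEM
)

// joinName encodes the namespace integer as a prefix on the attribute name.
func joinName(ns int, name string) (string, error) {
	switch ns {
	case nsUser:
		return "user." + name, nil
	case nsSystem:
		return "system." + name, nil
	}
	return "", fmt.Errorf("unknown namespace %d", ns)
}

// splitName is the inverse: it strips the prefix and returns the namespace.
func splitName(full string) (int, string, error) {
	switch {
	case strings.HasPrefix(full, "user."):
		return nsUser, strings.TrimPrefix(full, "user."), nil
	case strings.HasPrefix(full, "system."):
		return nsSystem, strings.TrimPrefix(full, "system."), nil
	}
	return 0, "", fmt.Errorf("name %q has no known namespace prefix", full)
}

func main() {
	full, _ := joinName(nsUser, "comment")
	ns, name, err := splitName(full)
	fmt.Println(full, ns, name, err)
}
```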

We will expose some new configuration values on a per-folder basis (the names are approximate and open to bikeshedding):

  • syncExtendedAttributes: boolean, to enable or disable the feature. Enabled means in both directions: picking up attributes when scanning and applying them when syncing. (Default TBD, but I'm leaning towards "on".)
  • extendedAttributesFilter: probably a list of strings, containing something like prefixes, regexps, unanchored string matches... This is a way to say "sync only these attributes". It is blank by default, which propagates all attributes.
  • maxExtendedAttributeValueSize: integer, the largest size of an attribute value we will accept (while scanning), default to something like 1 KiB.
  • maxExtendedAttributesPerFile: integer, the maximum number of extended attributes we will accept (while scanning), default to something reasonable (but not unlimited).
  • requireSyncingExtendedAttributes: boolean, whether to consider a failure to read or write extended attributes fatal. This would typically be off, as this is sort of how Syncthing works today, but there are cases where, for example, not syncing the ACL or MAC attributes makes having the file at all a problem, and the sync must fail.
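A rough sketch of how these options might be applied while scanning follows. It rests on several assumptions that the list above leaves open: the filter entries are treated as name prefixes, and exceeding a limit is reported as an error that the caller would treat as fatal only when requireSyncingExtendedAttributes is set. The struct, field, and method names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// xattrConfig is a hypothetical Go form of the per-folder options.
type xattrConfig struct {
	Sync         bool     // syncExtendedAttributes
	Filter       []string // extendedAttributesFilter, assumed to be name prefixes
	MaxValueSize int      // maxExtendedAttributeValueSize, in bytes
	MaxPerFile   int      // maxExtendedAttributesPerFile
	Require      bool     // requireSyncingExtendedAttributes
}

type attr struct {
	Name  string
	Value []byte
}

// matches reports whether name passes the filter; an empty filter passes all.
func (c xattrConfig) matches(name string) bool {
	if len(c.Filter) == 0 {
		return true
	}
	for _, p := range c.Filter {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

// acceptScanned applies the filter and the two limits to scanned attributes.
func (c xattrConfig) acceptScanned(attrs []attr) ([]attr, error) {
	if !c.Sync {
		return nil, nil // feature disabled: pick up nothing
	}
	var out []attr
	for _, a := range attrs {
		if !c.matches(a.Name) {
			continue
		}
		if len(a.Value) > c.MaxValueSize {
			return nil, fmt.Errorf("attribute %q: value of %d bytes exceeds limit %d", a.Name, len(a.Value), c.MaxValueSize)
		}
		out = append(out, a)
	}
	if len(out) > c.MaxPerFile {
		return nil, fmt.Errorf("%d attributes exceeds limit %d", len(out), c.MaxPerFile)
	}
	return out, nil
}

func main() {
	cfg := xattrConfig{Sync: true, Filter: []string{"user."}, MaxValueSize: 1024, MaxPerFile: 16}
	out, err := cfg.acceptScanned([]attr{
		{Name: "user.comment", Value: []byte("hi")},
		{Name: "security.selinux", Value: []byte("ctx")},
	})
	fmt.Println(len(out), err)
}
```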

Potential problems, annoyances, and caveats

  • In practice, the operating systems we would support would be Linux, FreeBSD, Windows, and macOS, plus maybe NetBSD and Solaris/Illumos. I've added these as "maybe" because they round to 0.0% of the user base together, and if it turns out to be annoying, I propose skipping them. In practice, I think NetBSD has the same API as FreeBSD and might happen "for free", while Solaris/Illumos will be wildly different.

  • We need to be able to pick up extended-attribute-only changes when scanning, such as when a user changes an ACL or removes a quarantine attribute. In at least some systems, this is visible as an update to the inode change time.

  • We need to implement a read-merge-write step for syncing extended attributes only, similar to the permission-only shortcut. This can potentially fail, and if it does, I'm not sure how we'd go about rolling it back.

  • The FileInfo might grow large. We'll have limits to prevent unreasonable things like multi-megabyte attributes, but there is still an increase in size. Of course, this isn't an issue if extended attributes are not in use or the feature is disabled.
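The read-merge-write step mentioned in the list above could start from a diff between what is on disk and what should be there. The sketch below is only the "merge" part, with a hypothetical plan helper operating on name-to-value maps for the local OS; the actual reads and writes, and any rollback on failure, are not addressed.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// plan compares the attributes currently on disk with the desired set and
// returns the attributes to write and the names to remove.
func plan(current, desired map[string][]byte) (set map[string][]byte, remove []string) {
	set = make(map[string][]byte)
	for name, want := range desired {
		have, ok := current[name]
		if !ok || !bytes.Equal(have, want) {
			set[name] = want // new or changed value
		}
	}
	for name := range current {
		if _, ok := desired[name]; !ok {
			remove = append(remove, name) // no longer wanted
		}
	}
	sort.Strings(remove) // deterministic order for logging and testing
	return set, remove
}

func main() {
	current := map[string][]byte{"user.a": []byte("1"), "user.c": []byte("3")}
	desired := map[string][]byte{"user.a": []byte("1"), "user.d": []byte("4")}
	set, remove := plan(current, desired)
	fmt.Println(len(set), "to set;", "to remove:", remove)
}
```

Computing the full plan before touching the file keeps the number of writes small, but a failure partway through applying it still leaves the file in a mixed state, which is the rollback concern raised above.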