Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mtree: add xattr support #1400

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft

Conversation

maxcrees
Copy link

@maxcrees maxcrees commented Jun 18, 2020

This adds support for optionally writing xattrs to mtree output, and
limited ability to read them from mtree input. The format of the keyword
is the following, one for each xattr present for the entry:

xattr.<name>=<b64 encoded value>

This format is inspired by go-mtree by Vincent Batts, though
there is a possible slight incompatibility: libarchive's existing base64
encoder does not perform padding because it was written to encode xattrs
into PAX attributes which already have a length component. The decoder
should accept padding ok, but the mtree parser may not like the
unescaped '=' characters.

The name is escaped according to existing mtree escaping practices which
is octal encoding of [^!-}] except [ #=\] is also encoded. go-mtree
performs its own implementation of vis(3) on it with the flags
VIS_WHITE, VIS_OCTAL, and VIS_GLOB. There is some disjointedness here
but it should be compatible in the common case of no escaping of course.

Note that for output this is gated on --options=mtree:xattrs, which I've
left out of the default mtree options.

So far I've tested converting a PAX file directly to mtree with bsdtar
and it seems to work well.

todo:

  • reading mtree currently doesn't decode the xattr name
  • add to bidding?
  • test bsdtar from filesystem, including --[no-]xattrs
  • test directory xattr support
  • add tests?
  • when reading an mtree, also fill in xattrs from disk if available -
    this is tricky...
  • PAX tar files made by libarchive default to putting both the
    LIBARCHIVE and SCHILY PAX attributes in order to encode xattrs, but
    this causes libarchive to store the same xattr twice internally.
    Therefore if converting directly from PAX to mtree you get duplicated
    xattrs in the output. I don't think there's any easy way to fix this
    other than changing how libarchive keeps track of xattrs internally.
  • add information to mtree(5)
  • test behavior when duplicate entries with xattrs are encountered

Comments and suggestions welcome! I'm opening this now to see if this is something upstream is interested in before I go further.

@maxcrees
Copy link
Author

maxcrees commented Jun 18, 2020

While testing this I've discovered what might be considered a bug: when writing SCHILY.xattr PAX attributes, libarchive urlencodes the xattr name, but when reading back SCHILY.xattr it doesn't decode the name. LIBARCHIVE.xattr urlencodes and decodes the xattr name.

As far as I can tell schily-tools tar does not perform any sort of encoding on the name (similar to the value) but considers it a string (so it must not contain a NUL). I think libarchive should just read/write the SCHILY name without any coding then. GNU tar urlencodes '=' and '%' so that behavior is probably more important.

@mmatuska
Copy link
Member

Please don't put functions into header files.
We are probably looking for something like archive_base64.h + archive_base64.c

This adds support for optionally writing xattrs to mtree output, and
limited ability to read them from mtree input. The format of the keyword
is the following, one for each xattr present for the entry:

xattr.<name>=<b64 encoded value>

This format is inspired by go-mtree by Vincent Batts, though
there is a possible slight incompatibility: libarchive's existing base64
encoder does not perform padding because it was written to encode xattrs
into PAX attributes which already have a length component. The decoder
should accept padding ok, but the mtree parser may not like the
unescaped '=' characters.

The name is escaped according to existing mtree escaping practices which
is octal encoding of [^!-}] except [ #=\] is also encoded.  go-mtree
performs its own implementation of vis(3) on it with the flags
VIS_WHITE, VIS_OCTAL, and VIS_GLOB. There is some disjointedness here
but it should be compatible in the common case of no escaping of course.

Note that for output this is gated on --options=mtree:xattrs, which I've
left out of the default mtree options.

So far I've tested converting a PAX file directly to mtree with bsdtar
and it seems to work well.

todo:

* when reading an mtree, also fill in xattrs from disk if available -
  this is tricky...
* PAX tar files made by libarchive default to putting both the
  LIBARCHIVE and SCHILY PAX attributes in order to encode xattrs, but
  this causes libarchive to store the same xattr twice internally.
  Therefore if converting directly from PAX to mtree you get duplicated
  xattrs in the output. I don't think there's any easy way to fix this
  other than changing how libarchive keeps track of xattrs internally.
@maxcrees
Copy link
Author

@mmatuska I've moved the functions to archive_base64.c, put the prototypes in archive_base64_private.h, and rebased onto master. Any thoughts on the other items?

@maxcrees
Copy link
Author

I think I could add grabbing xattrs from disk (when checkfs is enabled) in parse_file() if I were to rewrite it using the archive_read_disk family of functions; otherwise I would need to copy the ifdef mess from there.

@maxcrees
Copy link
Author

For PAX->mtree etc. duplicate xattrs issue it is a minor problem but it could potentially be fixed by storing xattrs on archive entries as a hash table with last one wins semantics instead of a linked list, but I'd probably do that in a separate PR.

char *
base64_decode(const char *s, size_t len, size_t *out_len)
{
static const unsigned char digits[64] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems identical to the digits[] in encode. If so, let's move it to global scope while preserving the static const notation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants