Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multivolume tar archives and code review. #1599

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

zBeeble42
Copy link

So... This pull request contains two things: code review patches and mutivolume write patches. It is intended for review and discussion and not release. No changes to documentation or long options yet.

A: Code review. In doing my work, I noticed two things that could be improved.

  1. The numbering for long options' enumeration. Starting at '1' and reaching '55' at this point, it was already denying -0 and other single numbers in addition to denying anything like -? or -% ... we don't use them, but the solution is equally easy: change the numbering to start somewhere else. It's an int, so it could be -100 or -1000. I chose 257 to start above the range of single byte characters. It could equally start at 65536 (above two byte characters). Anyways: simple fix.
  2. Archive_write_open_filename and archive_write_open_fd share a lot of code. Seems like bad form to have separate files. Merging the code was simple and tests well. Less code to maintain ... simpler to trace what code is doing when you're me... or someone like me.

B: Multivolume tar archives.

To properly do a multivolume archive with a tape drive, you need to close the tape drive to change the tape. This will be true for any drive that can hold the tape and prevent eject. Might even be true for some floppies (what are they, even?). This patch allows multivolume tar writes (-c). I intend to also create a read patch (shortly). I also talked in the google group about a "multivolume header" allowing reading the non-first volume fairly easily (even given a regular tar and dd, if need be). That patch will take me longer as I must go deeper.

For this patch I needed to duplicate what was in archive_write_open_{filename|fd} inside the bsdtar code. This was due to some functions being "private" or static and/or who knows what at which level. Of note, you can't create a multivolume archive on stdout (you can't close and reopen it)... so I deny volume switching if filename is NULL. I track an print the volume number in the message. It's intended that the verbose output might print the volume number (ahead of the 'a') for an index ... and of course the "multivolume header" will use it.

I chose the gnutar compatible '-M' as the switch. I haven't put in the long option yet.

I have tested this with a SAS LTO-6 tape drive so far.

@mmatuska
Copy link
Member

mmatuska commented Feb 8, 2022

I suggest to split this PR into two PRs:
1.) code deduplication in archive_write_open_fd.c and archive_write_open_filename.c
2.) multi-volume feature for bsdtar

@zBeeble42
Copy link
Author

I suggest to split this PR into two PRs: 1.) code deduplication in archive_write_open_fd.c and archive_write_open_filename.c 2.) multi-volume feature for bsdtar

So... here's the thing. The multivolume stuff pretty much can't exist without the redundancy removal. And once you remove the redundancy, the multivolume support is really small. Basically:

  1. Add the option in cmdline.c and in bsdtar.h
  2. Check the option and volume number at EOT.

What I'm saying is that the redundancy removal isn't worth it without mutivolume write and multivolume write doesn't work without the removal of a layer (which ends up being the redundancy). 99% of this patch is about removing the file archive_write_open_fd.c --- which is an extra layer between archive_write_open and archive_write_open_filename. All the functions are duplicated there and the concern it's addressing is when you open an fd or a filename. Now... a tape can be either an fd or a filename (tar -cvf - >/dev/sa0 or tar -cvf /dev/sa0) and by combining the two options we loose a whole level of indirection (at the cost of, I think, one variable of record keeping) and we gain the ability to close and re-open an fd or a filename (multi-volume).

Note that the code as submitted does keep exporting the functions from archive_write_open_fd --- but the files are combined as the function pointers are static (local) and so both sets of advertised functions need be in the same compilation unit.

@zBeeble42
Copy link
Author

OK. I remember now (after re-reading things) why specifically this is work to separate. You'll note that code is pulled into tar/write.c. Specifically static functions for file_(open|write|close) ... because in addition to the redundancies, information hiding directed me to make tar ever-so-smarter about multivolume in a way that libarchive is just not.

TBH, I was torn on this and discussion on this point might be more pertinent. The reason I don't have read patches yet is that the read path in libarchive is more complex and I haven't decided the level at which to insert this code. The approach I took with the write won't work directly with read.

So... the question (and I asked in this in the forum --- hoping for some guidance) ... do we make libarchive understand multivolume --- in which case libarchive needs the ability to contact the user (with callbacks or whatever) or do we leave this as a feature of bsdtar. This, in turn, is linked with many of the multivolume header formats and their support.

If it is the judgement of this community that libarchive should grow this rather major change, I would be willing to champion it. If not, I'm pretty sure I can make what I need (bsdtar support for multivolume) with this change and another minor set of changes in the read path.

@kientzle
Copy link
Contributor

There's already precedent for bsdtar to support features (appending to tar archives, for example) that aren't supported by libarchive per se. And the utility of multivolume tar is significantly less for libarchive than for bsdtar. So personally I'd be content to see this go into bsdtar and not libarchive.

static ssize_t file_write(struct archive *, void *, const void *buff, size_t);
static int open_filename(struct archive *, int, const void *);

int
archive_write_open_fd(struct archive *a, int fd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion about merging archive_write_open_fd.c and archive_write_open_filename.c. There are a couple of guiding principles at work:

  • Libarchive tries hard to keep everything static that isn't part of the official API. That avoids the need to fuss with linker-specific tricks to manage symbol visibility, but also implies that we try to limit our use of "private" internal interfaces.
  • Libarchive also tries hard to avoid static link pollution: If you reference a symbol, you get just the necessary pieces needed to support that symbol and very little else. But that means we have to be careful about combining features into a single source file.

A side effect of the above principles is that we do naturally get some duplication of common basic functionality. Personally, I'm fine with that: it has the side effect of keeping most source files standalone and easy to understand without having to dig through the entire system to untangle a lot of specialized vernacular. It means that most library functionality could be implemented outside of the library without any problems.

In this particular case, I don't think there's a really compelling argument either way: The duplicated functions aren't that significant and neither is the link pollution from combining the two. But I would not like to see other unrelated files get merged just because they happen to share a few helper functions.

@@ -1055,3 +1070,190 @@ test_for_append(struct bsdtar *bsdtar)
*/

}

static int
bsdtar_write_open_fd(struct bsdtar *b, struct archive *a, int fd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be duplicated in bsdtar? I thought write_open_fd was only used for stdout, which shouldn't need any multivolume support.

@kientzle
Copy link
Contributor

Is there any reasonable way to write tests for this? We've accepted features like this in the past without tests, but it's always better if we have some tests to help guard against regressions.

@zBeeble42
Copy link
Author

I don't know how your test regime works. The trigger for this function (because compressed tapes are, by their very nature, variable length from the perspective of the user) is the EOT delivery by the device. To test it, you'd either be simulating it or have a test bench that can deliver that EOT status.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants