Multivolume tar archives and code review. #1599

zBeeble42 · 2021-10-21T18:13:15Z

So... This pull request contains two things: code review patches and mutivolume write patches. It is intended for review and discussion and not release. No changes to documentation or long options yet.

A: Code review. In doing my work, I noticed two things that could be improved.

The numbering for long options' enumeration. Starting at '1' and reaching '55' at this point, it was already denying -0 and other single numbers in addition to denying anything like -? or -% ... we don't use them, but the solution is equally easy: change the numbering to start somewhere else. It's an int, so it could be -100 or -1000. I chose 257 to start above the range of single byte characters. It could equally start at 65536 (above two byte characters). Anyways: simple fix.
Archive_write_open_filename and archive_write_open_fd share a lot of code. Seems like bad form to have separate files. Merging the code was simple and tests well. Less code to maintain ... simpler to trace what code is doing when you're me... or someone like me.

B: Multivolume tar archives.

To properly do a multivolume archive with a tape drive, you need to close the tape drive to change the tape. This will be true for any drive that can hold the tape and prevent eject. Might even be true for some floppies (what are they, even?). This patch allows multivolume tar writes (-c). I intend to also create a read patch (shortly). I also talked in the google group about a "multivolume header" allowing reading the non-first volume fairly easily (even given a regular tar and dd, if need be). That patch will take me longer as I must go deeper.

For this patch I needed to duplicate what was in archive_write_open_{filename|fd} inside the bsdtar code. This was due to some functions being "private" or static and/or who knows what at which level. Of note, you can't create a multivolume archive on stdout (you can't close and reopen it)... so I deny volume switching if filename is NULL. I track an print the volume number in the message. It's intended that the verbose output might print the volume number (ahead of the 'a') for an index ... and of course the "multivolume header" will use it.

I chose the gnutar compatible '-M' as the switch. I haven't put in the long option yet.

I have tested this with a SAS LTO-6 tape drive so far.

…ame and _fd as I noted in my "code audit" email posts. Also Started to modify things for the incoming multi-volume changes.

mmatuska · 2022-02-08T09:43:58Z

I suggest to split this PR into two PRs:
1.) code deduplication in archive_write_open_fd.c and archive_write_open_filename.c
2.) multi-volume feature for bsdtar

zBeeble42 · 2022-02-12T22:40:54Z

I suggest to split this PR into two PRs: 1.) code deduplication in archive_write_open_fd.c and archive_write_open_filename.c 2.) multi-volume feature for bsdtar

So... here's the thing. The multivolume stuff pretty much can't exist without the redundancy removal. And once you remove the redundancy, the multivolume support is really small. Basically:

Add the option in cmdline.c and in bsdtar.h
Check the option and volume number at EOT.

What I'm saying is that the redundancy removal isn't worth it without mutivolume write and multivolume write doesn't work without the removal of a layer (which ends up being the redundancy). 99% of this patch is about removing the file archive_write_open_fd.c --- which is an extra layer between archive_write_open and archive_write_open_filename. All the functions are duplicated there and the concern it's addressing is when you open an fd or a filename. Now... a tape can be either an fd or a filename (tar -cvf - >/dev/sa0 or tar -cvf /dev/sa0) and by combining the two options we loose a whole level of indirection (at the cost of, I think, one variable of record keeping) and we gain the ability to close and re-open an fd or a filename (multi-volume).

Note that the code as submitted does keep exporting the functions from archive_write_open_fd --- but the files are combined as the function pointers are static (local) and so both sets of advertised functions need be in the same compilation unit.

zBeeble42 · 2022-02-12T22:53:19Z

OK. I remember now (after re-reading things) why specifically this is work to separate. You'll note that code is pulled into tar/write.c. Specifically static functions for file_(open|write|close) ... because in addition to the redundancies, information hiding directed me to make tar ever-so-smarter about multivolume in a way that libarchive is just not.

TBH, I was torn on this and discussion on this point might be more pertinent. The reason I don't have read patches yet is that the read path in libarchive is more complex and I haven't decided the level at which to insert this code. The approach I took with the write won't work directly with read.

So... the question (and I asked in this in the forum --- hoping for some guidance) ... do we make libarchive understand multivolume --- in which case libarchive needs the ability to contact the user (with callbacks or whatever) or do we leave this as a feature of bsdtar. This, in turn, is linked with many of the multivolume header formats and their support.

If it is the judgement of this community that libarchive should grow this rather major change, I would be willing to champion it. If not, I'm pretty sure I can make what I need (bsdtar support for multivolume) with this change and another minor set of changes in the read path.

kientzle · 2022-02-13T04:35:43Z

There's already precedent for bsdtar to support features (appending to tar archives, for example) that aren't supported by libarchive per se. And the utility of multivolume tar is significantly less for libarchive than for bsdtar. So personally I'd be content to see this go into bsdtar and not libarchive.

kientzle · 2022-02-13T04:52:23Z

libarchive/archive_write_open_filename.c

 static ssize_t	file_write(struct archive *, void *, const void *buff, size_t);
 static int	open_filename(struct archive *, int, const void *);

+int
+archive_write_open_fd(struct archive *a, int fd)


I don't have a strong opinion about merging archive_write_open_fd.c and archive_write_open_filename.c. There are a couple of guiding principles at work:

Libarchive tries hard to keep everything static that isn't part of the official API. That avoids the need to fuss with linker-specific tricks to manage symbol visibility, but also implies that we try to limit our use of "private" internal interfaces.

Libarchive also tries hard to avoid static link pollution: If you reference a symbol, you get just the necessary pieces needed to support that symbol and very little else. But that means we have to be careful about combining features into a single source file.

A side effect of the above principles is that we do naturally get some duplication of common basic functionality. Personally, I'm fine with that: it has the side effect of keeping most source files standalone and easy to understand without having to dig through the entire system to untangle a lot of specialized vernacular. It means that most library functionality could be implemented outside of the library without any problems.

In this particular case, I don't think there's a really compelling argument either way: The duplicated functions aren't that significant and neither is the link pollution from combining the two. But I would not like to see other unrelated files get merged just because they happen to share a few helper functions.

kientzle · 2022-02-13T04:55:02Z

tar/write.c

@@ -1055,3 +1070,190 @@ test_for_append(struct bsdtar *bsdtar)
 */

 }
+
+static int
+bsdtar_write_open_fd(struct bsdtar *b, struct archive *a, int fd)


Does this need to be duplicated in bsdtar? I thought write_open_fd was only used for stdout, which shouldn't need any multivolume support.

kientzle · 2022-02-13T04:59:55Z

Is there any reasonable way to write tests for this? We've accepted features like this in the past without tests, but it's always better if we have some tests to help guard against regressions.

zBeeble42 · 2022-02-13T05:07:06Z

I don't know how your test regime works. The trigger for this function (because compressed tapes are, by their very nature, variable length from the perspective of the user) is the EOT delivery by the device. To test it, you'd either be simulating it or have a test bench that can deliver that EOT status.

David Gilbert added 4 commits October 18, 2021 17:24

Cahnged numbering on multi-letter command options and combined _filen…

9e27858

…ame and _fd as I noted in my "code audit" email posts. Also Started to modify things for the incoming multi-volume changes.

1st try at being able to write plain multi-volume bsdtar archives.

664d98f

OOps. Wrong FD closed.

7b6303f

OK. Missed that. Simple fix. Testing good.

aa7e154

mmatuska requested a review from kientzle November 15, 2021 23:57

Merge branch 'libarchive:master' into master

aa98200

Merge remote-tracking branch 'upstream/master'

ea77b13

kientzle reviewed Feb 13, 2022

View reviewed changes

DHowett mentioned this pull request Aug 11, 2023

write_xar: move libxml2 behind an abstraction layer #1849

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multivolume tar archives and code review. #1599

Multivolume tar archives and code review. #1599

zBeeble42 commented Oct 21, 2021

mmatuska commented Feb 8, 2022 •

edited

zBeeble42 commented Feb 12, 2022

zBeeble42 commented Feb 12, 2022

kientzle commented Feb 13, 2022

kientzle Feb 13, 2022

kientzle Feb 13, 2022

kientzle commented Feb 13, 2022

zBeeble42 commented Feb 13, 2022

Multivolume tar archives and code review. #1599

Are you sure you want to change the base?

Multivolume tar archives and code review. #1599

Conversation

zBeeble42 commented Oct 21, 2021

A: Code review. In doing my work, I noticed two things that could be improved.

B: Multivolume tar archives.

mmatuska commented Feb 8, 2022 • edited

zBeeble42 commented Feb 12, 2022

zBeeble42 commented Feb 12, 2022

kientzle commented Feb 13, 2022

kientzle Feb 13, 2022

Choose a reason for hiding this comment

kientzle Feb 13, 2022

Choose a reason for hiding this comment

kientzle commented Feb 13, 2022

zBeeble42 commented Feb 13, 2022

mmatuska commented Feb 8, 2022 •

edited