New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multivolume tar archives and code review. #1599
base: master
Are you sure you want to change the base?
Conversation
…ame and _fd as I noted in my "code audit" email posts. Also Started to modify things for the incoming multi-volume changes.
I suggest to split this PR into two PRs: |
So... here's the thing. The multivolume stuff pretty much can't exist without the redundancy removal. And once you remove the redundancy, the multivolume support is really small. Basically:
What I'm saying is that the redundancy removal isn't worth it without mutivolume write and multivolume write doesn't work without the removal of a layer (which ends up being the redundancy). 99% of this patch is about removing the file archive_write_open_fd.c --- which is an extra layer between archive_write_open and archive_write_open_filename. All the functions are duplicated there and the concern it's addressing is when you open an fd or a filename. Now... a tape can be either an fd or a filename (tar -cvf - >/dev/sa0 or tar -cvf /dev/sa0) and by combining the two options we loose a whole level of indirection (at the cost of, I think, one variable of record keeping) and we gain the ability to close and re-open an fd or a filename (multi-volume). Note that the code as submitted does keep exporting the functions from archive_write_open_fd --- but the files are combined as the function pointers are static (local) and so both sets of advertised functions need be in the same compilation unit. |
OK. I remember now (after re-reading things) why specifically this is work to separate. You'll note that code is pulled into tar/write.c. Specifically static functions for file_(open|write|close) ... because in addition to the redundancies, information hiding directed me to make tar ever-so-smarter about multivolume in a way that libarchive is just not. TBH, I was torn on this and discussion on this point might be more pertinent. The reason I don't have read patches yet is that the read path in libarchive is more complex and I haven't decided the level at which to insert this code. The approach I took with the write won't work directly with read. So... the question (and I asked in this in the forum --- hoping for some guidance) ... do we make libarchive understand multivolume --- in which case libarchive needs the ability to contact the user (with callbacks or whatever) or do we leave this as a feature of bsdtar. This, in turn, is linked with many of the multivolume header formats and their support. If it is the judgement of this community that libarchive should grow this rather major change, I would be willing to champion it. If not, I'm pretty sure I can make what I need (bsdtar support for multivolume) with this change and another minor set of changes in the read path. |
There's already precedent for bsdtar to support features (appending to tar archives, for example) that aren't supported by libarchive per se. And the utility of multivolume tar is significantly less for libarchive than for bsdtar. So personally I'd be content to see this go into bsdtar and not libarchive. |
static ssize_t file_write(struct archive *, void *, const void *buff, size_t); | ||
static int open_filename(struct archive *, int, const void *); | ||
|
||
int | ||
archive_write_open_fd(struct archive *a, int fd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion about merging archive_write_open_fd.c
and archive_write_open_filename.c
. There are a couple of guiding principles at work:
- Libarchive tries hard to keep everything
static
that isn't part of the official API. That avoids the need to fuss with linker-specific tricks to manage symbol visibility, but also implies that we try to limit our use of "private" internal interfaces. - Libarchive also tries hard to avoid static link pollution: If you reference a symbol, you get just the necessary pieces needed to support that symbol and very little else. But that means we have to be careful about combining features into a single source file.
A side effect of the above principles is that we do naturally get some duplication of common basic functionality. Personally, I'm fine with that: it has the side effect of keeping most source files standalone and easy to understand without having to dig through the entire system to untangle a lot of specialized vernacular. It means that most library functionality could be implemented outside of the library without any problems.
In this particular case, I don't think there's a really compelling argument either way: The duplicated functions aren't that significant and neither is the link pollution from combining the two. But I would not like to see other unrelated files get merged just because they happen to share a few helper functions.
@@ -1055,3 +1070,190 @@ test_for_append(struct bsdtar *bsdtar) | |||
*/ | |||
|
|||
} | |||
|
|||
static int | |||
bsdtar_write_open_fd(struct bsdtar *b, struct archive *a, int fd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be duplicated in bsdtar? I thought write_open_fd
was only used for stdout, which shouldn't need any multivolume support.
Is there any reasonable way to write tests for this? We've accepted features like this in the past without tests, but it's always better if we have some tests to help guard against regressions. |
I don't know how your test regime works. The trigger for this function (because compressed tapes are, by their very nature, variable length from the perspective of the user) is the EOT delivery by the device. To test it, you'd either be simulating it or have a test bench that can deliver that EOT status. |
So... This pull request contains two things: code review patches and mutivolume write patches. It is intended for review and discussion and not release. No changes to documentation or long options yet.
A: Code review. In doing my work, I noticed two things that could be improved.
B: Multivolume tar archives.
To properly do a multivolume archive with a tape drive, you need to close the tape drive to change the tape. This will be true for any drive that can hold the tape and prevent eject. Might even be true for some floppies (what are they, even?). This patch allows multivolume tar writes (-c). I intend to also create a read patch (shortly). I also talked in the google group about a "multivolume header" allowing reading the non-first volume fairly easily (even given a regular tar and dd, if need be). That patch will take me longer as I must go deeper.
For this patch I needed to duplicate what was in archive_write_open_{filename|fd} inside the bsdtar code. This was due to some functions being "private" or static and/or who knows what at which level. Of note, you can't create a multivolume archive on stdout (you can't close and reopen it)... so I deny volume switching if filename is NULL. I track an print the volume number in the message. It's intended that the verbose output might print the volume number (ahead of the 'a') for an index ... and of course the "multivolume header" will use it.
I chose the gnutar compatible '-M' as the switch. I haven't put in the long option yet.
I have tested this with a SAS LTO-6 tape drive so far.