Tar: Improve unseekable stream handling (#84279)
* Add tests that verify we handle unseekable streams correctly.
Adjust existing unseekable stream tests to verify correctness for all formats and with multiple tar entries.

* Add expected data field locations for all supported formats.

* Add exception message for when attempting to write an unseekable data stream into an unseekable archive stream.

* Add seekability validation in public TarWriter entry writing methods.

* Add TarFile stream roundtrip tests for unseekable streams.

* Add missing async TarFile roundtrip tests.

* Support unseekable streams in TarHeader.Write.

* Reuse and simplify the code.

* More reuse, remove unused and not needed.

* Remove TarFile.CreateFromDirectoryAsync.File.Roundtrip.cs. Submit it in a separate PR.

* Remove unnecessary resx comments.

* Dedicated method for writing fields to buffer depending on the format.

* Specify `Data` in name of method that expects unseekable data stream. Add extra debug asserts.

* Delete unnecessary method.

* Rename WritePadding to WriteEmptyPadding

* Rename test variables

* Merge identical test arrays into one

* Invert if else to be more clear about conditions

* Remove size assignment comment

* Remove redundant debug assert

* Async padding byte array creation simplification

* Apply suggestions from code review

---------

Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
carlossanlop and adamsitnik committed Jun 1, 2023
1 parent 1575ed9 commit 84a7be7
Showing 12 changed files with 584 additions and 258 deletions.
64 changes: 4 additions & 60 deletions src/libraries/System.Formats.Tar/src/Resources/Strings.resx
@@ -1,64 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
<root>
<!--
Microsoft ResX Schema
Version 2.0
The primary goals of this format is to allow a simple XML format
that is mostly human readable. The generation and parsing of the
various data types are done through the TypeConverter classes
associated with the data types.
Example:
... ado.net/XML headers & schema ...
<resheader name="resmimetype">text/microsoft-resx</resheader>
<resheader name="version">2.0</resheader>
<resheader name="reader">System.Resources.ResXResourceReader, System.Windows.Forms, ...</resheader>
<resheader name="writer">System.Resources.ResXResourceWriter, System.Windows.Forms, ...</resheader>
<data name="Name1"><value>this is my long string</value><comment>this is a comment</comment></data>
<data name="Color1" type="System.Drawing.Color, System.Drawing">Blue</data>
<data name="Bitmap1" mimetype="application/x-microsoft.net.object.binary.base64">
<value>[base64 mime encoded serialized .NET Framework object]</value>
</data>
<data name="Icon1" type="System.Drawing.Icon, System.Drawing" mimetype="application/x-microsoft.net.object.bytearray.base64">
<value>[base64 mime encoded string representing a byte array form of the .NET Framework object]</value>
<comment>This is a comment</comment>
</data>
There are any number of "resheader" rows that contain simple
name/value pairs.
Each data row contains a name, and value. The row also contains a
type or mimetype. Type corresponds to a .NET class that support
text/value conversion through the TypeConverter architecture.
Classes that don't support this are serialized and stored with the
mimetype set.
The mimetype is used for serialized objects, and tells the
ResXResourceReader how to depersist the object. This is currently not
extensible. For a given mimetype the value must be set accordingly:
Note - application/x-microsoft.net.object.binary.base64 is the format
that the ResXResourceWriter will generate, however the reader can
read any of the formats listed below.
mimetype: application/x-microsoft.net.object.binary.base64
value : The object must be serialized with
: System.Runtime.Serialization.Formatters.Binary.BinaryFormatter
: and then encoded with base64 encoding.
mimetype: application/x-microsoft.net.object.soap.base64
value : The object must be serialized with
: System.Runtime.Serialization.Formatters.Soap.SoapFormatter
: and then encoded with base64 encoding.
mimetype: application/x-microsoft.net.object.bytearray.base64
value : The object must be serialized into a byte array
: using a System.ComponentModel.TypeConverter
: and then encoded with base64 encoding.
-->
<xsd:schema id="root" xmlns="" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xsd:import namespace="http://www.w3.org/XML/1998/namespace" />
<xsd:element name="root" msdata:IsDataSet="true">
@@ -270,4 +211,7 @@
<data name="TarExtAttrDisallowedValueChar" xml:space="preserve">
<value>The value of the extended attribute key '{0}' contains a disallowed '{1}' character.</value>
</data>
</root>
<data name="TarStreamSeekabilityUnsupportedCombination" xml:space="preserve">
<value>Cannot write the unseekable data stream of entry '{0}' into an unseekable archive stream.</value>
</data>
</root>
@@ -49,5 +49,9 @@ internal static class FieldLocations
internal const ushort V7Padding = LinkName + FieldLengths.LinkName;
internal const ushort PosixPadding = Prefix + FieldLengths.Prefix;
internal const ushort GnuPadding = RealSize + FieldLengths.RealSize;

internal const ushort V7Data = V7Padding + FieldLengths.V7Padding;
internal const ushort PosixData = PosixPadding + FieldLengths.PosixPadding;
internal const ushort GnuData = GnuPadding + FieldLengths.GnuPadding;
}
}
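Each of the three `*Data` constants added above is a padding offset plus a padding length. As a rough check, assuming the standard tar field widths, all three should land on the 512-byte record boundary where entry data begins. The class name, constant names, and literal lengths below are illustrative assumptions and are not copied from the library's `FieldLengths` class.

```csharp
// Hypothetical mirror of the tar header layout; every name here is illustrative.
// Lengths are the standard tar field widths in bytes (assumed to match FieldLengths).
static class TarLayoutSketch
{
    // Fields shared by all formats: name, mode, uid, gid, size, mtime,
    // checksum, typeflag, linkname.
    const int CommonFields = 100 + 8 + 8 + 8 + 12 + 12 + 8 + 1 + 100; // 257

    // POSIX (ustar/pax) and GNU additions: magic, version, uname, gname,
    // devmajor, devminor.
    const int PosixCommonFields = 6 + 2 + 32 + 32 + 8 + 8; // 88

    // POSIX-only trailing field.
    const int PrefixLength = 155;

    // GNU-only trailing fields: atime, ctime, offset, longnames, unused,
    // sparse (4 x 24), isextended, realsize.
    const int GnuExtraFields = 12 + 12 + 12 + 4 + 1 + 96 + 1 + 12; // 150

    // Offsets at which each format's padding starts.
    public const int V7Padding = CommonFields;                                     // 257
    public const int PosixPadding = CommonFields + PosixCommonFields + PrefixLength; // 500
    public const int GnuPadding = CommonFields + PosixCommonFields + GnuExtraFields; // 495

    // Assumed padding lengths that fill each header up to one 512-byte record.
    const int V7PaddingLength = 255;
    const int PosixPaddingLength = 12;
    const int GnuPaddingLength = 17;

    // Same shape as the constants in the diff: padding offset + padding length.
    public const int V7Data = V7Padding + V7PaddingLength;
    public const int PosixData = PosixPadding + PosixPaddingLength;
    public const int GnuData = GnuPadding + GnuPaddingLength;
}
```

Under these assumptions, `V7Data`, `PosixData`, and `GnuData` all evaluate to 512, so the first data byte sits immediately after a single header record in every format.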

Large diffs are not rendered by default.

@@ -222,6 +222,7 @@ public void WriteEntry(TarEntry entry)
ObjectDisposedException.ThrowIf(_isDisposed, this);
ArgumentNullException.ThrowIfNull(entry);
ValidateEntryLinkName(entry._header._typeFlag, entry._header._linkName);
ValidateStreamsSeekability(entry);
WriteEntryInternal(entry);
}

@@ -270,6 +271,7 @@ public Task WriteEntryAsync(TarEntry entry, CancellationToken cancellationToken
ObjectDisposedException.ThrowIf(_isDisposed, this);
ArgumentNullException.ThrowIfNull(entry);
ValidateEntryLinkName(entry._header._typeFlag, entry._header._linkName);
ValidateStreamsSeekability(entry);
return WriteEntryAsyncInternal(entry, cancellationToken);
}

@@ -281,12 +283,8 @@ private void WriteEntryInternal(TarEntry entry)

switch (entry.Format)
{
case TarEntryFormat.V7:
entry._header.WriteAsV7(_archiveStream, buffer);
break;

case TarEntryFormat.Ustar:
entry._header.WriteAsUstar(_archiveStream, buffer);
case TarEntryFormat.V7 or TarEntryFormat.Ustar:
entry._header.WriteAs(entry.Format, _archiveStream, buffer);
break;

case TarEntryFormat.Pax:
@@ -323,8 +321,7 @@ private async Task WriteEntryAsyncInternal(TarEntry entry, CancellationToken can

Task task = entry.Format switch
{
TarEntryFormat.V7 => entry._header.WriteAsV7Async(_archiveStream, buffer, cancellationToken),
TarEntryFormat.Ustar => entry._header.WriteAsUstarAsync(_archiveStream, buffer, cancellationToken),
TarEntryFormat.V7 or TarEntryFormat.Ustar => entry._header.WriteAsAsync(entry.Format, _archiveStream, buffer, cancellationToken),
TarEntryFormat.Pax when entry._header._typeFlag is TarEntryType.GlobalExtendedAttributes => entry._header.WriteAsPaxGlobalExtendedAttributesAsync(_archiveStream, buffer, _nextGlobalExtendedAttributesEntryNumber++, cancellationToken),
TarEntryFormat.Pax => entry._header.WriteAsPaxAsync(_archiveStream, buffer, cancellationToken),
TarEntryFormat.Gnu => entry._header.WriteAsGnuAsync(_archiveStream, buffer, cancellationToken),
@@ -374,6 +371,14 @@ private async ValueTask WriteFinalRecordsAsync()
return (fullPath, actualEntryName);
}

private void ValidateStreamsSeekability(TarEntry entry)
{
if (!_archiveStream.CanSeek && entry._header._dataStream != null && !entry._header._dataStream.CanSeek)
{
throw new IOException(SR.Format(SR.TarStreamSeekabilityUnsupportedCombination, entry.Name));
}
}

private static void ValidateEntryLinkName(TarEntryType entryType, string? linkName)
{
if (entryType is TarEntryType.HardLink or TarEntryType.SymbolicLink)
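The `ValidateStreamsSeekability` check above rejects only one of the four seekability combinations: an unseekable data stream written into an unseekable archive stream. The sketch below tries each combination from the caller's side. It assumes a runtime that includes this change, and `UnseekableStream` and `SeekabilityDemo` are hypothetical helpers written for this example (the tests in this commit use their own `WrappedStream` helper, which is not shown here).

```csharp
using System;
using System.Formats.Tar;
using System.IO;

// Hypothetical helper: forwards reads and writes to an inner stream
// but always reports CanSeek == false.
sealed class UnseekableStream : Stream
{
    private readonly Stream _inner;
    public UnseekableStream(Stream inner) => _inner = inner;

    public override bool CanRead => _inner.CanRead;
    public override bool CanWrite => _inner.CanWrite;
    public override bool CanSeek => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}

static class SeekabilityDemo
{
    // Returns true if TarWriter.WriteEntry throws IOException for the given
    // combination of archive and data stream seekability.
    public static bool WriteThrows(bool archiveSeekable, bool dataSeekable)
    {
        MemoryStream backing = new();
        Stream archive = archiveSeekable ? backing : new UnseekableStream(backing);
        using TarWriter writer = new(archive, TarEntryFormat.Pax, leaveOpen: true);

        MemoryStream payload = new(new byte[] { 1, 2, 3 });
        PaxTarEntry entry = new(TarEntryType.RegularFile, "file.bin")
        {
            DataStream = dataSeekable ? payload : new UnseekableStream(payload)
        };

        try
        {
            writer.WriteEntry(entry);
            return false;
        }
        catch (IOException)
        {
            // Expected only when neither stream can seek.
            return true;
        }
    }
}
```

With the validation in place, `SeekabilityDemo.WriteThrows(archiveSeekable: false, dataSeekable: false)` should return `true`, while the other three combinations are expected to write the entry without throwing.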
@@ -1,9 +1,9 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.IO.Enumeration;
using System.Linq;
using Xunit;

@@ -204,5 +204,65 @@ public void PaxNameCollision_DedupInExtendedAttributes()
Assert.True(File.Exists(path1));
Assert.True(Path.Exists(path2));
}

[Theory]
[MemberData(nameof(GetTestTarFormats))]
public void UnseekableStreams_RoundTrip(TestTarFormat testFormat)
{
using TempDirectory root = new();

using MemoryStream sourceStream = GetTarMemoryStream(CompressionMethod.Uncompressed, testFormat, "many_small_files");
using WrappedStream sourceUnseekableArchiveStream = new(sourceStream, canRead: true, canWrite: false, canSeek: false);

TarFile.ExtractToDirectory(sourceUnseekableArchiveStream, root.Path, overwriteFiles: false);

using MemoryStream destinationStream = new();
using WrappedStream destinationUnseekableArchiveStream = new(destinationStream, canRead: true, canWrite: true, canSeek: false);
TarFile.CreateFromDirectory(root.Path, destinationUnseekableArchiveStream, includeBaseDirectory: false);

FileSystemEnumerable<FileSystemInfo> fileSystemEntries = new FileSystemEnumerable<FileSystemInfo>(
directory: root.Path,
transform: (ref FileSystemEntry entry) => entry.ToFileSystemInfo(),
options: new EnumerationOptions() { RecurseSubdirectories = true });

destinationStream.Position = 0;
using TarReader reader = new TarReader(destinationStream, leaveOpen: false);

// The files in many_small_files.tar are expected to be tiny and all the same size
int bufferLength = 1024;
byte[] fileContent = new byte[bufferLength];
byte[] dataStreamContent = new byte[bufferLength];
TarEntry entry = reader.GetNextEntry();
do
{
Assert.NotNull(entry);
string entryPath = Path.TrimEndingDirectorySeparator(Path.GetFullPath(Path.Join(root.Path, entry.Name)));
FileSystemInfo fsi = fileSystemEntries.SingleOrDefault(file =>
file.FullName == entryPath);
Assert.NotNull(fsi);
if (entry.EntryType is TarEntryType.RegularFile or TarEntryType.V7RegularFile)
{
Assert.NotNull(entry.DataStream);

using Stream fileData = File.OpenRead(fsi.FullName);

// If the size of the files in many_small_files.tar ever gets larger than bufferLength,
// these asserts should fail and the test will need to be updated
AssertExtensions.LessThanOrEqualTo(entry.Length, bufferLength);
AssertExtensions.LessThanOrEqualTo(fileData.Length, bufferLength);

Assert.Equal(fileData.Length, entry.Length);

Array.Clear(fileContent);
Array.Clear(dataStreamContent);

fileData.ReadExactly(fileContent, 0, (int)entry.Length);
entry.DataStream.ReadExactly(dataStreamContent, 0, (int)entry.Length);

AssertExtensions.SequenceEqual(fileContent, dataStreamContent);
}
}
while ((entry = reader.GetNextEntry()) != null);
}
}
}
@@ -1,9 +1,9 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.IO.Enumeration;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
@@ -269,5 +269,65 @@ await using (TarWriter writer = new(stream, TarEntryFormat.Pax, leaveOpen: true)
Assert.True(File.Exists(path1));
Assert.True(Path.Exists(path2));
}

[Theory]
[MemberData(nameof(GetTestTarFormats))]
public async Task UnseekableStreams_RoundTrip_Async(TestTarFormat testFormat)
{
using TempDirectory root = new();

await using MemoryStream sourceStream = GetTarMemoryStream(CompressionMethod.Uncompressed, testFormat, "many_small_files");
await using WrappedStream sourceUnseekableArchiveStream = new(sourceStream, canRead: true, canWrite: false, canSeek: false);

await TarFile.ExtractToDirectoryAsync(sourceUnseekableArchiveStream, root.Path, overwriteFiles: false);

await using MemoryStream destinationStream = new();
await using WrappedStream destinationUnseekableArchiveStream = new(destinationStream, canRead: true, canWrite: true, canSeek: false);
await TarFile.CreateFromDirectoryAsync(root.Path, destinationUnseekableArchiveStream, includeBaseDirectory: false);

FileSystemEnumerable<FileSystemInfo> fileSystemEntries = new FileSystemEnumerable<FileSystemInfo>(
directory: root.Path,
transform: (ref FileSystemEntry entry) => entry.ToFileSystemInfo(),
options: new EnumerationOptions() { RecurseSubdirectories = true });

destinationStream.Position = 0;
await using TarReader reader = new TarReader(destinationStream, leaveOpen: false);

// The files in many_small_files.tar are expected to be tiny and all the same size
int bufferLength = 1024;
byte[] fileContent = new byte[bufferLength];
byte[] dataStreamContent = new byte[bufferLength];
TarEntry entry = await reader.GetNextEntryAsync();
do
{
Assert.NotNull(entry);
string entryPath = Path.TrimEndingDirectorySeparator(Path.GetFullPath(Path.Join(root.Path, entry.Name)));
FileSystemInfo fsi = fileSystemEntries.SingleOrDefault(file =>
file.FullName == entryPath);
Assert.NotNull(fsi);
if (entry.EntryType is TarEntryType.RegularFile or TarEntryType.V7RegularFile)
{
Assert.NotNull(entry.DataStream);

await using Stream fileData = File.OpenRead(fsi.FullName);

// If the size of the files in many_small_files.tar ever gets larger than bufferLength,
// these asserts should fail and the test will need to be updated
AssertExtensions.LessThanOrEqualTo(entry.Length, bufferLength);
AssertExtensions.LessThanOrEqualTo(fileData.Length, bufferLength);

Assert.Equal(fileData.Length, entry.Length);

Array.Clear(fileContent);
Array.Clear(dataStreamContent);

await fileData.ReadExactlyAsync(fileContent, 0, (int)entry.Length);
await entry.DataStream.ReadExactlyAsync(dataStreamContent, 0, (int)entry.Length);

AssertExtensions.SequenceEqual(fileContent, dataStreamContent);
}
}
while ((entry = await reader.GetNextEntryAsync()) != null);
}
}
}
@@ -161,13 +161,18 @@ public void GetNextEntry_CopyDataTrue_UnseekableArchive()
Assert.Throws<ObjectDisposedException>(() => entry.DataStream.Read(new byte[1]));
}

[Fact]
public void GetNextEntry_CopyDataFalse_UnseekableArchive_Exceptions()
[Theory]
[InlineData(TarEntryFormat.V7)]
[InlineData(TarEntryFormat.Ustar)]
[InlineData(TarEntryFormat.Pax)]
[InlineData(TarEntryFormat.Gnu)]
public void GetNextEntry_CopyDataFalse_UnseekableArchive_Exceptions(TarEntryFormat format)
{
MemoryStream archive = new MemoryStream();
using (TarWriter writer = new TarWriter(archive, TarEntryFormat.Ustar, leaveOpen: true))
TarEntryType fileEntryType = GetTarEntryTypeForTarEntryFormat(TarEntryType.RegularFile, format);
using MemoryStream archive = new MemoryStream();
using (TarWriter writer = new TarWriter(archive, format, leaveOpen: true))
{
UstarTarEntry entry1 = new UstarTarEntry(TarEntryType.RegularFile, "file.txt");
TarEntry entry1 = InvokeTarEntryCreationConstructor(format, fileEntryType, "file.txt");
entry1.DataStream = new MemoryStream();
using (StreamWriter streamWriter = new StreamWriter(entry1.DataStream, leaveOpen: true))
{
@@ -176,30 +181,34 @@ public void GetNextEntry_CopyDataFalse_UnseekableArchive_Exceptions()
entry1.DataStream.Seek(0, SeekOrigin.Begin); // Rewind to ensure it gets written from the beginning
writer.WriteEntry(entry1);

UstarTarEntry entry2 = new UstarTarEntry(TarEntryType.Directory, "dir");
TarEntry entry2 = InvokeTarEntryCreationConstructor(format, TarEntryType.Directory, "dir");
writer.WriteEntry(entry2);
}

archive.Seek(0, SeekOrigin.Begin);
using WrappedStream wrapped = new WrappedStream(archive, canRead: true, canWrite: false, canSeek: false);
UstarTarEntry entry;
TarEntry entry;
byte[] b = new byte[1];
using (TarReader reader = new TarReader(wrapped)) // Unseekable
{
entry = reader.GetNextEntry(copyData: false) as UstarTarEntry;
entry = reader.GetNextEntry(copyData: false);
Assert.NotNull(entry);
Assert.Equal(TarEntryType.RegularFile, entry.EntryType);
Assert.Equal(fileEntryType, entry.EntryType);
entry.DataStream.ReadByte(); // Reading is possible as long as we don't move to the next entry

// Attempting to read the next entry should automatically move the position pointer to the beginning of the next header
Assert.NotNull(reader.GetNextEntry());
TarEntry entry2 = reader.GetNextEntry();
Assert.NotNull(entry2);
Assert.Equal(format, entry2.Format);
Assert.Equal(TarEntryType.Directory, entry2.EntryType);
Assert.Null(reader.GetNextEntry());

// This is not possible because the position of the main stream is already past the data
Assert.Throws<EndOfStreamException>(() => entry.DataStream.Read(new byte[1]));
Assert.Throws<EndOfStreamException>(() => entry.DataStream.Read(b));
}

// The reader must stay alive because it's in charge of disposing all the entries it collected
Assert.Throws<ObjectDisposedException>(() => entry.DataStream.Read(new byte[1]));
Assert.Throws<ObjectDisposedException>(() => entry.DataStream.Read(b));
}

[Theory]
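The test above shows that with `copyData: false` on an unseekable archive, an entry's data can no longer be read once the reader has advanced past it. A minimal sketch of the complementary case follows: passing `copyData: true` keeps the first entry readable after the reader has moved on. It uses a `GZipStream` as a naturally unseekable archive stream. `CopyDataSketch`, its method names, and the entry names and contents are assumptions made up for this example.

```csharp
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;
using System.Text;

static class CopyDataSketch
{
    // Builds a small gzip-compressed tar archive in memory with two file entries.
    private static MemoryStream BuildTarGz()
    {
        MemoryStream compressed = new();
        using (GZipStream gzip = new(compressed, CompressionMode.Compress, leaveOpen: true))
        using (TarWriter writer = new(gzip, TarEntryFormat.Pax, leaveOpen: true))
        {
            foreach (string name in new[] { "a.txt", "b.txt" })
            {
                PaxTarEntry entry = new(TarEntryType.RegularFile, name)
                {
                    // Seekable data stream, so writing into the unseekable gzip stream is allowed.
                    DataStream = new MemoryStream(Encoding.UTF8.GetBytes("content of " + name))
                };
                writer.WriteEntry(entry);
            }
        }
        compressed.Position = 0;
        return compressed;
    }

    // Reads the first entry's text only after the reader has advanced to the second entry.
    public static string ReadFirstEntryAfterAdvancing()
    {
        using MemoryStream compressed = BuildTarGz();
        using GZipStream gzip = new(compressed, CompressionMode.Decompress); // CanSeek is false
        using TarReader reader = new(gzip);

        // copyData: true asks the reader to copy the entry data into its own stream.
        TarEntry first = reader.GetNextEntry(copyData: true)
            ?? throw new InvalidDataException("Archive has no entries.");

        // Advance past the first entry's data in the underlying archive stream.
        reader.GetNextEntry(copyData: true);

        Stream data = first.DataStream
            ?? throw new InvalidDataException("First entry has no data stream.");
        using StreamReader text = new(data, leaveOpen: true);
        return text.ReadToEnd();
    }
}
```

Copying costs memory proportional to the entry size, so `copyData: false` remains the better fit when each entry is consumed before the next one is requested.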