HADOOP-18679. Add API for bulk/paged delete of files #6726

Merged

Conversation

Contributor

@mukund-thakur mukund-thakur commented Apr 12, 2024

Adding tests on top of #6494

Description of PR

How was this patch tested?

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

steveloughran and others added 6 commits April 15, 2024 12:36
A more minimal design that is easier to use and implement.

The caller creates a BulkOperation; they get its page size
and then submit batches of deletes no larger than that size.

The outcome of each call contains a list of failures.

S3A implementation to show how straightforward it is.

Even with the single entry page size, it is still more
efficient to use this as it doesn't try to recreate a
parent dir or perform any probes to see if it is a directory:
it maps straight to a DELETE call.

Change-Id: Ibe8737e7933fe03d39070e7138a710b50c3d60c2
Add methods in FileUtil to take an FS, cast to a BulkDeleteSource
then perform the pageSize/bulkDelete operations.

This is to make reflection based access straightforward: no new interfaces
or types to work with, just two new methods with type-erased lists.

Change-Id: I2d7b1bf8198422de635253fc087ca551a8bc6609
Change-Id: Ib098c07cc1f7747ed1a3131b252656c96c520a75
Using this PR to start with the initial design, implementation
and services offered by having lower-level interaction with S3
pushed down into an S3AStore class, with interface/impl split.

The bulk delete callbacks now talk to the store, *not* s3afs,
with some minor changes in behaviour (IllegalArgumentException is
raised if root paths / are to be deleted)

Mock tests are failing; I expected that: they are always brittle.

What next? Get this in and then move lower-level FS ops
over, one s3client-calling method at a time, or in groups, as appropriate.

The metrics of success are:
* all callback classes created in S3A FS can work through the store
* no direct s3client invocation in S3AFS

Change-Id: Ib5bc58991533fd5d13e78426e88b884b6ae5205c
Changing results of method calls, using Tuples.pair() to
return Map.Entry() instances as immutable tuples.

Change-Id: Ibdd5a5b11fe0a57b293b9cb623272e272c8bab69
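
A minimal usage sketch of the API described in the commit messages above: the caller obtains a BulkDelete from the filesystem, reads its page size, and submits batches no larger than that size, checking the returned failure list. Names such as BulkDeleteSource, createBulkDelete() and the Map.Entry result type follow the descriptions in this PR, but the exact signatures, and the use of try-with-resources, are assumptions rather than the final API.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.BulkDeleteSource;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class BulkDeleteUsageSketch {

  /** Delete all the given paths under base, one page at a time. */
  public static void deleteAll(FileSystem fs, Path base, List<Path> paths)
      throws IOException {
    // the filesystem is expected to act as the BulkDeleteSource
    BulkDeleteSource source = (BulkDeleteSource) fs;
    try (BulkDelete bulk = source.createBulkDelete(base)) {
      final int pageSize = bulk.pageSize();
      List<Path> page = new ArrayList<>(pageSize);
      for (Path p : paths) {
        page.add(p);
        if (page.size() == pageSize) {
          checkFailures(bulk.bulkDelete(page));
          page.clear();
        }
      }
      if (!page.isEmpty()) {
        checkFailures(bulk.bulkDelete(page));
      }
    }
  }

  /** The outcome of each call is a list of (path, error) entries. */
  private static void checkFailures(List<Map.Entry<Path, String>> failures)
      throws IOException {
    if (!failures.isEmpty()) {
      Map.Entry<Path, String> first = failures.get(0);
      throw new IOException("Failed to delete " + first.getKey()
          + ": " + first.getValue());
    }
  }
}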
@mukund-thakur mukund-thakur force-pushed the bulk-delete-mukund-HADOOP-18679 branch from 4bbb3b7 to a61f139 Compare April 15, 2024 18:02
@github-actions github-actions bot added the ABFS label Apr 15, 2024
@steveloughran steveloughran self-assigned this Apr 15, 2024
@steveloughran steveloughran changed the title Add API for bulk/paged object deletion HADOOP-18679. Add API for bulk/paged object deletion Apr 15, 2024
@steveloughran
Contributor

Commented. I've also done PR #6738, which tunes the API to work with Iceberg, having just written a PoC of the Iceberg binding.

My PR

  • moved the wrapper methods to a new wrappedio.WrappedIO class
  • added a probe for the API being available
  • I also added an availability probe in the interface; not sure about that, as we really should make it available everywhere, always.

Can you cherry-pick this PR onto your branch and then address the review comments?

After which, please do not do any rebasing of your PR. That way, it is easier for me to keep my own branch in sync with your changes. Thanks.

PoC of Iceberg integration, based on their S3FileIO implementation.

https://github.com/steveloughran/iceberg/blob/s3/HADOOP-18679-bulk-delete-api/core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java#L208

The Iceberg API passes in a collection of paths, which may span multiple filesystems.

To handle this,

  • the bulk delete API should take a Collection, not a list
  • it needs to be implemented in every FS, because trying to distinguish case-by-case on support would be really complex.

We are going to need a default FS impl which just
invokes delete(path, false) and maps any IOE to a failure.

Change-Id: If56bca7cb8529ccbfbb1dfa29cedc8287ec980d4
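
A rough sketch of what that default implementation could look like: page size 1, each path mapped to delete(path, false), with any IOException recorded as a failure entry rather than thrown. Illustrative only; it may differ from the DefaultBulkDeleteOperation class reviewed below, and it uses a plain JDK Map.Entry where the PR itself uses the Tuples.pair() helper.

import java.io.IOException;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Fallback bulk delete: one path per page, delete(path, false) per path. */
class FallbackBulkDeleteSketch {

  private final FileSystem fs;

  FallbackBulkDeleteSketch(FileSystem fs) {
    this.fs = fs;
  }

  int pageSize() {
    return 1;
  }

  List<Map.Entry<Path, String>> bulkDelete(Collection<Path> paths) {
    List<Map.Entry<Path, String>> failures = new ArrayList<>();
    for (Path path : paths) {
      try {
        fs.delete(path, false);
      } catch (IOException e) {
        // map the IOE to a failure entry instead of propagating it
        failures.add(new SimpleImmutableEntry<>(path, e.toString()));
      }
    }
    return failures;
  }
}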
Contributor

@steveloughran steveloughran left a comment

commented

*/
public class DefaultBulkDeleteOperation implements BulkDelete {

private final int pageSize;
Contributor

this is always 1, isn't it? so much can be simplified here

  • no need for the field
  • no need to pass it in the constructor
  • pageSize() to return 1

validateBulkDeletePaths(paths, pageSize, basePath);
List<Map.Entry<Path, String>> result = new ArrayList<>();
// this for loop doesn't make sense as pageSize must be 1.
for (Path path : paths) {
Contributor

there's only ever going to be 1 entry here

try {
fs.delete(path, false);
// What to do if this return false?
// I think we should add the path to the result list with value "Not Deleted".
Contributor

Good q. I'd say yes. Or actually do a getFileStatus() call and see what is there:

  • file doesn't exist (not an error)
  • path is a directory: add to result

The key is that file not found isn't escalated.
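
A sketch of the handling being suggested here, for illustration only: probe with getFileStatus() when delete() returns false, swallow FileNotFoundException, and record a directory as a failure. The Tuples.pair() helper mentioned earlier is assumed to live in org.apache.hadoop.util.functional.

import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.functional.Tuples;

final class SingleDeleteHandlingSketch {

  /** Delete one path; missing files are not errors, directories are failures. */
  static void deleteOne(FileSystem fs, Path path,
      List<Map.Entry<Path, String>> result) throws IOException {
    if (!fs.delete(path, false)) {
      try {
        if (fs.getFileStatus(path).isDirectory()) {
          // directories cannot be deleted through the bulk delete API
          result.add(Tuples.pair(path, "path is a directory"));
        }
      } catch (FileNotFoundException ignored) {
        // file not found is not escalated: the path is already gone
      }
    }
  }
}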

bindReadOnlyRolePolicy(assumedRoleConfig, readOnlyDir);
roleFS = (S3AFileSystem) destDir.getFileSystem(assumedRoleConfig);

int range = 10;
Contributor

on a store where bulk delete page size == 1, use that as the range

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 05s No case conflicting files found.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+0 🆗 xmllint 0m 01s xmllint was not available.
+0 🆗 spotbugs 0m 01s spotbugs executables are not available.
+0 🆗 markdownlint 0m 01s markdownlint was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 2m 31s Maven dependency ordering for branch
+1 💚 mvninstall 91m 09s trunk passed
+1 💚 compile 40m 32s trunk passed
+1 💚 checkstyle 6m 09s trunk passed
-1 ❌ mvnsite 4m 42s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 javadoc 14m 12s trunk passed
+1 💚 shadedclient 171m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 2m 24s Maven dependency ordering for patch
+1 💚 mvninstall 11m 00s the patch passed
+1 💚 compile 38m 33s the patch passed
+1 💚 javac 38m 33s the patch passed
-1 ❌ blanks 0m 00s /blanks-eol.txt The patch has 5 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 checkstyle 6m 31s the patch passed
-1 ❌ mvnsite 4m 36s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 javadoc 14m 19s the patch passed
+1 💚 shadedclient 185m 02s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ asflicense 5m 46s /results-asflicense.txt The patch generated 1 ASF License warnings.
555m 55s
Subsystem Report/Notes
GITHUB PR #6726
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname MINGW64_NT-10.0-17763 cfb6e8c364ad 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / 7415427
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/testReport/
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/1/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Add bulk delete path capability true for all FS
@github-actions github-actions bot added the HDFS label Apr 25, 2024
Contributor

@steveloughran steveloughran left a comment

I'm happy with this; I just want to make sure we are happy with the fallback logic when delete() returns false. It's such a vague result anyway.

We should get reviews from people like the Iceberg devs.

@@ -85,6 +88,9 @@ public ITestS3AContractBulkDelete(boolean enableMultiObjectDelete) {
protected Configuration createConfiguration() {
Configuration conf = super.createConfiguration();
S3ATestUtils.disableFilesystemCaching(conf);
conf = propagateBucketOptions(conf, getTestBucketName(conf));
skipIfNotEnabled(conf, Constants.ENABLE_MULTI_DELETE,
Contributor

This should actually be guarded with 'if (enableMultiObjectDelete)' so at least the single-entry delete ops are tested with GCS. This does matter, as GCS returns 404 on a DELETE of a nonexistent path; we need to make sure that doesn't trigger a failure.
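
Roughly what the suggested guard might look like in the test's createConfiguration(); the skip message string is made up for illustration.

if (enableMultiObjectDelete) {
  // only require multi-object delete support when testing page sizes > 1;
  // single-entry delete runs should still execute against stores such as GCS
  skipIfNotEnabled(conf, Constants.ENABLE_MULTI_DELETE,
      "Bulk delete tests with page size > 1 require multi-object delete");
}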

Contributor Author

Nice catch. Tested with a GCS bucket as well.

@steveloughran
Contributor

For the Iceberg support to work, all filesystems MUST implement the API, or we have to modify that PoC to handle the case where they don't. I'd rather the spec say that any FS which supports delete() MUST support this, since it comes for free.

@steveloughran
Contributor

Iceberg PoC PR: apache/iceberg#10233

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 06s No case conflicting files found.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+0 🆗 xmllint 0m 01s xmllint was not available.
+0 🆗 spotbugs 0m 01s spotbugs executables are not available.
+0 🆗 markdownlint 0m 01s markdownlint was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 10 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 2m 19s Maven dependency ordering for branch
+1 💚 mvninstall 91m 56s trunk passed
+1 💚 compile 40m 19s trunk passed
+1 💚 checkstyle 6m 14s trunk passed
-1 ❌ mvnsite 4m 28s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 javadoc 19m 58s trunk passed
+1 💚 shadedclient 195m 53s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 2m 25s Maven dependency ordering for patch
+1 💚 mvninstall 16m 00s the patch passed
+1 💚 compile 37m 43s the patch passed
+1 💚 javac 37m 43s the patch passed
+1 💚 blanks 0m 00s The patch has no blanks issues.
+1 💚 checkstyle 5m 59s the patch passed
-1 ❌ mvnsite 4m 36s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 javadoc 19m 48s the patch passed
+1 💚 shadedclient 199m 09s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 5m 37s The patch does not generate ASF License warnings.
597m 32s
Subsystem Report/Notes
GITHUB PR #6726
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname MINGW64_NT-10.0-17763 e57383186604 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / 0339eeb
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/4/testReport/
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/4/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@steveloughran steveloughran left a comment

commented.

  • I'd prefer BulkDeleteSource to be implemented in FileSystem, with the base implementation returning a DefaultBulkDeleteOperation instance.
  • and so we should have one of the contract tests doing it directly through the API on the getFileSystem() instance
  • other than that though, looking really good.

Contributor

@steveloughran steveloughran left a comment

Commented. Now, one more thing based on my changes in #6686.

it's adding many more methods to this class, so I'm giving them the name of the class/interface + "_" + method name.

can you do the same here? some style checker will complain but it will help us to separate the methods in the new class.

other than that, all good

@@ -4980,4 +4982,17 @@ public MultipartUploaderBuilder createMultipartUploader(Path basePath)
methodNotSupported();
return null;
}

/**
* Create a default bulk delete operation to be used for any FileSystem.
Contributor

This doesn't hold for the subclasses. Better to say:

Create a bulk delete operation.
The default implementation returns an instance of {@link DefaultBulkDeleteOperation}

@mukund-thakur
Contributor Author

can you do the same here? some style checker will complain but it will help us to separate the methods in the new class.

I don't understand what to do here.

Contributor

@steveloughran steveloughran left a comment

@mukund-thakur Tried to clarify what I mean: we separate the class/interface from the method/operation with a _ character.

It makes a lot more sense when you look at my PR, which brings a lot more methods into the same class. Why all in the same class? Less reflection code.

* @throws IllegalArgumentException path not valid.
* @throws IOException problems resolving paths
*/
public static int bulkDeletePageSize(FileSystem fs, Path path) throws IOException {
Contributor

rename bulkDelete_pageSize

* @throws IOException IO problems including networking, authentication and more.
* @throws IllegalArgumentException if a path argument is invalid.
*/
public static List<Map.Entry<Path, String>> bulkDelete(FileSystem fs,
Contributor

rename bulkDelete_delete
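
For reference, the shape the renamed WrappedIO methods might take after this review, based on the signatures quoted above plus the underscore convention; the createBulkDelete() factory name and the try-with-resources use of BulkDelete are assumptions.

public static int bulkDelete_pageSize(FileSystem fs, Path path)
    throws IOException {
  // ask the filesystem for a BulkDelete and report its page size
  try (BulkDelete bulk = fs.createBulkDelete(path)) {
    return bulk.pageSize();
  }
}

public static List<Map.Entry<Path, String>> bulkDelete_delete(FileSystem fs,
    Path base, Collection<Path> paths) throws IOException {
  // delegate to the bulk delete operation; the result is the failure list
  try (BulkDelete bulk = fs.createBulkDelete(base)) {
    return bulk.bulkDelete(paths);
  }
}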

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 07s No case conflicting files found.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+0 🆗 xmllint 0m 01s xmllint was not available.
+0 🆗 spotbugs 0m 01s spotbugs executables are not available.
+0 🆗 markdownlint 0m 01s markdownlint was not available.
+1 💚 @author 0m 01s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 11 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 2m 42s Maven dependency ordering for branch
+1 💚 mvninstall 107m 34s trunk passed
+1 💚 compile 48m 30s trunk passed
+1 💚 checkstyle 7m 10s trunk passed
-1 ❌ mvnsite 5m 19s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 javadoc 23m 51s trunk passed
+1 💚 shadedclient 225m 30s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 228m 42s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 2m 57s Maven dependency ordering for patch
+1 💚 mvninstall 19m 17s the patch passed
+1 💚 compile 45m 43s the patch passed
+1 💚 javac 45m 43s the patch passed
+1 💚 blanks 0m 01s The patch has no blanks issues.
+1 💚 checkstyle 7m 25s the patch passed
-1 ❌ mvnsite 5m 28s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 javadoc 24m 00s the patch passed
+1 💚 shadedclient 232m 12s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 7m 14s The patch does not generate ASF License warnings.
700m 51s
Subsystem Report/Notes
GITHUB PR #6726
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname MINGW64_NT-10.0-17763 296d6abd6fb2 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / e37d88f
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/5/testReport/
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6726/5/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor

mukund, if you can do those naming changes then I'm +1

@steveloughran steveloughran changed the title HADOOP-18679. Add API for bulk/paged object deletion HADOOP-18679. Add API for bulk/paged delete of files May 17, 2024
@steveloughran
Contributor

+1

I was about to merge, then I realised that Yetus wasn't ready. Here is my draft commit message:

  1. As soon as this is in, I will rebase HADOOP-19131. Assist reflection IO with WrappedOperations class #6686 onto it, which extends WrappedIO and adds the reflection utility classes from Parquet to assist in testing.
  2. I'll leave you to do the cherry-pick and merge onto 3.4.x.
  3. And I want to get a minimal version into 3.3.x, maybe with a page size of 1 even on S3A, but without the safety checks, so it still saves on LIST/HEAD calls.

create a BulkDelete implementation from a
BulkDeleteSource; the BulkDelete interface provides
the pageSize(): the maximum number of entries which can be
deleted, and a bulkDelete(Collection paths)
method which can take a collection up to pageSize() long.

This is optimized for object stores with bulk delete APIs;
the S3A connector will offer the page size of
fs.s3a.bulk.delete.page.size unless bulk delete has
been disabled.

Even with a page size of 1, the S3A implementation is
more efficient than delete(path)
as there are no safety checks for the path being a directory
or probes for the need to recreate directories.

The interface BulkDeleteSource is implemented by
all FileSystem implementations, with a page size
of 1 and mapped to delete(pathToDelete, false).
This means that callers do not need to have special
case handling for object stores versus classic filesystems.

To aid use through reflection APIs, the class
org.apache.hadoop.io.wrappedio.WrappedIO
has been created with "reflection friendly" methods.

Contributed by Mukund Thakur and Steve Loughran
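
To illustrate the "reflection friendly" aspect called out above, a sketch of how a library might invoke the WrappedIO methods without compiling against the new API. The class name follows the commit message and the method name follows the earlier rename discussion; error handling is deliberately minimal and this is not the PR's own code.

import java.lang.reflect.Method;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class ReflectiveBulkDeleteSketch {

  @SuppressWarnings("unchecked")
  static List<Map.Entry<Path, String>> bulkDelete(FileSystem fs, Path base,
      Collection<Path> paths) throws Exception {
    // load the wrapper class; a ClassNotFoundException means the API is absent
    Class<?> wrappedIO = Class.forName("org.apache.hadoop.io.wrappedio.WrappedIO");
    Method delete = wrappedIO.getMethod("bulkDelete_delete",
        FileSystem.class, Path.class, Collection.class);
    return (List<Map.Entry<Path, String>>) delete.invoke(null, fs, base, paths);
  }
}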

@steveloughran steveloughran merged commit 47be1ab into apache:trunk May 20, 2024
1 of 2 checks passed
@steveloughran
Contributor

Merged to trunk. Mukund, can you do the backport to branch-3.4.1, while we think about what to do for 3.3.9? Speaking of which, we should think about that...

@steveloughran
Contributor

FYI, #6686 adds dynamic loading of the wrapped methods, and calls them through the tests.

@mukund-thakur
Contributor Author

While backporting this to branch-3.4 I see this failure. Will check if this is happening on trunk as well.
[INFO] [ERROR] Failures: [ERROR] TestStagingCommitter.testJobCommitFailure:662 [Committed objects compared to deleted paths org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@2de1acf4{ requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, deletes=0}] Expecting: <["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", "s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", "s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", "s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", "s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]> to contain exactly in any order: <[]> but the following elements were unexpected: <["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", "s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", "s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", "s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", "s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]>

@steveloughran
Contributor

Yeah, just seen those too. The mocked S3 client isn't getting the delete-one-object calls via the store object.

I've got a small PR up to help debug it, which is not a fix.

@steveloughran
Contributor

The problem is that store is null in the innermost s3afs; this triggers an NPE in deleteObject() before the AWS client has its delete operation called, so the list of deleted paths is not updated.

The easiest immediate fix is to mock deleteObject(); longer term we should actually be stubbing the entire store, as that's what the interface/impl split is meant to assist.

@mukund-thakur
Contributor Author

Fixed this with Mockito magic somehow: https://github.com/apache/hadoop/pull/6843/files
This was not easy to debug.

@steveloughran
Contributor

I tried to do it yesterday too. I think my solution would have pulled createStore() out into a method and overridden it, or added an S3AInternals.setStore() call.

@steveloughran
Contributor

Update: #5081 does successfully show the staging UT failure (good!) but also some failures in hadoop-common due to the new interface... we need to modify the tests to indicate it is safe not to override the base implementation.

mukund-thakur added a commit that referenced this pull request May 28, 2024
Applications can create a BulkDelete instance from a
BulkDeleteSource; the BulkDelete interface provides
the pageSize(): the maximum number of entries which can be
deleted, and a bulkDelete(Collection paths)
method which can take a collection up to pageSize() long.

This is optimized for object stores with bulk delete APIs;
the S3A connector will offer the page size of
fs.s3a.bulk.delete.page.size unless bulk delete has
been disabled.

Even with a page size of 1, the S3A implementation is
more efficient than delete(path)
as there are no safety checks for the path being a directory
or probes for the need to recreate directories.

The interface BulkDeleteSource is implemented by
all FileSystem implementations, with a page size
of 1 and mapped to delete(pathToDelete, false).
This means that callers do not need to have special
case handling for object stores versus classic filesystems.

To aid use through reflection APIs, the class
org.apache.hadoop.io.wrappedio.WrappedIO
has been created with "reflection friendly" methods.

Contributed by Mukund Thakur and Steve Loughran
 Conflicts:
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java