Performance degradation when upgrading 3-2.2.8 #891

Open
selimelawwa opened this issue Oct 18, 2022 · 1 comment

When upgrading from hadoop3-1.9.17 to hadoop3-2.2.8 (using the shaded jar of the new version), I saw a performance degradation that almost doubled the run time of my tests.

I also created this Stack Overflow question.

I have a performance test that runs against my file system implementation, which uses org.apache.hadoop.fs.FileSystem. The test runs several operations (create, read, write, rename, checkIfExists, mkDir) on 100 files using multiple threads.

I ran the same tests several times on both versions of the Hadoop connector, and the new one (2.2.8) shows a slower overall execution time (about 2 to 2.2x that of the old connector).

Below is a comparison between the average execution time for each operation while using each connector version:

Operation   hadoop3-1.9.17   hadoop3-2.2.8   Slowdown (new / old)
READ        4542.71          10171.26        ~2.2x
RENAME      1347.75          4483.27         ~3.3x
EXISTS      47.23            1538.74         ~32.6x
CREATE      570.1            1539.81         ~2.7x
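
The slowdown factors follow directly from the averages in the table. As a minimal sketch (the class and method names here are made up for illustration, not part of the connector or of my test code), they can be recomputed like this:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: recomputes new/old ratios from the reported averages.
final class SlowdownRatios {

    /** Returns how many times larger the new average is than the old one. */
    static double ratio(double oldAverage, double newAverage) {
        return newAverage / oldAverage;
    }

    public static void main(String[] args) {
        // Averages copied from the table: {hadoop3-1.9.17, hadoop3-2.2.8}.
        Map<String, double[]> averages = new LinkedHashMap<>();
        averages.put("READ", new double[] {4542.71, 10171.26});
        averages.put("RENAME", new double[] {1347.75, 4483.27});
        averages.put("EXISTS", new double[] {47.23, 1538.74});
        averages.put("CREATE", new double[] {570.1, 1539.81});

        for (Map.Entry<String, double[]> entry : averages.entrySet()) {
            double[] v = entry.getValue();
            System.out.println(String.format(
                    Locale.ROOT, "%-7s %.2fx", entry.getKey(), ratio(v[0], v[1])));
        }
    }
}
```

With the averages above this works out to roughly 2.2x for READ, 3.3x for RENAME, 32.6x for EXISTS, and 2.7x for CREATE, so EXISTS is by far the most affected operation.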

I checked this GitHub issue and tried to follow its recommendations for tuning performance through the configs/params, but saw no improvement.

Are there any guidelines on parameter configuration to improve the times of the above operations?

Or could this performance issue be due to an incompatibility among the jars on my class path? Even though I am using the shaded jar, can other jars interfere?

Here is a list of jars I have in my class path:

  • gcs-connector-hadoop3-2.2.8-shaded.jar
  • google-extensions-0.7.1.jar
  • google-api-client-1.32.2.jar
  • google-http-client-apache-v2-1.40.1.jar
  • proto-google-common-protos-2.7.3.jar
  • google-http-client-1.41.8.jar
  • google-oauth-client-1.33.3.jar
  • google-http-client-jackson2-1.40.1.jar
  • grpc-google-cloud-storage-v2-2.2.2-alpha.jar
  • google-http-client-gson-1.41.8.jar
  • google-cloud-monitoring-1.82.0.jar
  • google-cloud-core-http-2.5.4.jar
  • proto-google-cloud-storage-v2-2.2.2-alpha.jar
  • google-api-client-jackson2-1.32.2.jar
  • google-api-services-iamcredentials-v1-rev20210326-1.32.1.jar
  • google-oauth-client-java6-1.27.0.jar
  • google-cloud-core-grpc-2.5.4.jar
  • google-http-client-appengine-1.34.2.jar
  • google-cloud-core-2.5.4.jar
  • google-auth-library-credentials-1.7.0.jar
  • google-cloud-storage-1.106.0.jar
  • proto-google-iam-v1-1.2.3.jar
  • google-api-services-storage-v1-rev20211018-1.32.1.jar
  • google-auth-library-oauth2-http-1.7.0.jar
  • proto-google-cloud-monitoring-v3-1.64.0.jar
  • grpc-services-1.43.2.jar
  • grpc-netty-shaded-1.43.2.jar
  • grpc-alts-1.43.2.jar
  • grpc-stub-1.43.2.jar
  • grpc-census-1.43.2.jar
  • grpc-protobuf-1.43.2.jar
  • grpc-api-1.43.2.jar
  • grpc-xds-1.43.2.jar
  • grpc-core-1.43.2.jar
  • grpc-protobuf-lite-1.43.2.jar
  • grpc-context-1.43.2.jar
  • opencensus-contrib-grpc-metrics-0.31.0.jar
  • grpc-auth-1.43.2.jar
  • gax-grpc-2.7.1.jar
  • grpc-grpclb-1.43.2.jar
  • api-common-2.1.4.jar
  • gax-2.7.1.jar
  • gax-httpjson-0.73.0.jar
  • util-2.2.8.jar
  • util-hadoop-hadoop3-2.2.8.jar
  • auto-value-annotations-1.9.jar
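
One way to check whether another jar shadows classes from the shaded connector is to ask the JVM where a given class was actually loaded from. The sketch below uses only standard JDK calls; the helper class `ClassOrigin` is hypothetical, and the class names passed to it (for example `com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem`) are just examples of what one might look up.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports which jar or directory a class comes from.
final class ClassOrigin {

    /**
     * Returns the location (jar or directory URL) a class was loaded from,
     * or a placeholder for classes without a code source, such as JDK classes.
     */
    static String locate(String className) throws ClassNotFoundException {
        // initialize=false: only resolve the class, do not run its static initializers.
        Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Pass fully qualified class names as arguments.
        for (String name : args) {
            try {
                System.out.println(name + " -> " + locate(name));
            } catch (ClassNotFoundException | LinkageError e) {
                System.out.println(name + " -> not loadable: " + e);
            }
        }
    }
}
```

Run with the same class path as the tests, this should show whether connector classes resolve to gcs-connector-hadoop3-2.2.8-shaded.jar or to one of the other jars listed above (for instance util-2.2.8.jar or util-hadoop-hadoop3-2.2.8.jar).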

selimelawwa commented Oct 18, 2022

My File class, which has methods such as write and read:

class File {
    // A Path rather than a String, since fs.create and fs.open take an org.apache.hadoop.fs.Path.
    private Path path;
    private FileSystem fs;

}

Here is how my write method is implemented:

    @Override
    public OutputStream write(boolean overwriteIfExists) throws IOException {
        return fs.create(path, overwriteIfExists);
    }

And my read method:

    @Override
    public InputStream read() throws IOException {
        return fs.open(path);
    }

My test case simply creates many threads, each with its own instance of a File object that has a different path (the path to a unique GCS bucket object, e.g. gs://some-bucket/objectX), and then performs an operation on it, for example a read.
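
The shape of such a test can be sketched as follows. This is not my actual test code: `OperationTimer`, its `Operation` interface, the pool size, and the use of milliseconds are all assumptions for illustration, and the stand-in operation just sleeps where the real test would call something like read() on a File.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness: times independent operations on a fixed thread pool.
final class OperationTimer {

    /** One file operation that may throw, e.g. a read of a single object. */
    interface Operation {
        void run() throws Exception;
    }

    /** Runs all operations on the pool and returns the average time per operation in milliseconds. */
    static double averageMillis(List<Operation> operations, int threads) throws Exception {
        if (operations.isEmpty()) {
            throw new IllegalArgumentException("no operations to time");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Operation op : operations) {
                // Each task measures only its own operation, not its time in the queue.
                Callable<Long> timed = () -> {
                    long start = System.nanoTime();
                    op.run();
                    return System.nanoTime() - start;
                };
                results.add(pool.submit(timed));
            }
            long totalNanos = 0;
            for (Future<Long> result : results) {
                totalNanos += result.get(); // rethrows any failure from the operation
            }
            return totalNanos / 1_000_000.0 / operations.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for 100 per-file operations; replace the sleep with the real call.
        List<Operation> operations = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            operations.add(() -> Thread.sleep(5));
        }
        System.out.println("average ms per operation: " + averageMillis(operations, 8));
    }
}
```

Timing each operation inside its own task keeps queueing delay out of the average, which matters when comparing two connector versions under the same thread count.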
