Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: Remove FsStorageProvider from Table Service #104

Merged
merged 8 commits into from May 15, 2024

Conversation

HotSushi
Copy link
Collaborator

@HotSushi HotSushi commented May 13, 2024

Summary

❗❗ The volume of this PR might be intimidating, please read through the description to make reviewing easier.

The primary goal of this PR is to remove FsStorageProvider from :services:tables (and additionally :tables-test-fixture)

To achieve this goal, we do:
1: Add additional methods to StorageClient interface

StorageClient {
   + String getRootPrefix()
   + String getEndpoint()
}

2: Add additional method to FileIOManager to reverse lookup a Storage for a FileIO

FileIOManager {
   + Storage getStorage(fileIO)
}

3: Map FsStorageProvider methods to StorageManager

# pattern 1
storageProvider.rootPath() ->  
    storageManager.getDefaultStorage().getClient().getRootPrefix()

# pattern 2
storageProvider.endPoint() ->  
    storageManager.getDefaultStorage().getClient().getEndPoint()

# pattern 3
storageProvider.storageClient() -> 
    if(ensureHDFS) {
          storageManager.getDefaultStorage().getClient().getNativeClient()
    } else {
          throw exception or log 
    }

4: Fix log for StorageType

Instead of printing objecthash, print the type properly, ie StorageType.Type(local)

👆 thats it

Is this the end state of refactoring? NO. The goal of the PR is limited to removing fsStorageProvider. In future PRs we'll address CTAS, fileSecurer, stripPrefix, defaultTableLocation etc, currently these have been tagged with TODO

Changes

  • Refactoring
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

  • [ ✅ ] Manually Tested on local docker setup. Please include commands ran, and their output.
scala> spark.sql("CREATE TABLE openhouse.db.tb (ts timestamp, col1 string, col2 string) PARTITIONED BY (days(ts))").show()
++
||
++
++

scala>  spark.sql("INSERT INTO TABLE openhouse.db.tb VALUES (current_timestamp(), 'val1', 'val2')")
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SELECT * FROM openhouse.db.tb").show()
+--------------------+----+----+
|                  ts|col1|col2|
+--------------------+----+----+
|2024-05-13 23:52:...|val1|val2|
+--------------------+----+----+

scala> spark.sql("CREATE TABLE openhouse.db.tb0 AS SELECT 1").show()
++
||
++
++

scala> spark.sql("SELECT * FROM openhouse.db.tb0").show()
+---+
|  1|
+---+
|  1|
+---+

Audit events for these tests:
{"eventTimestamp":"1715644331261","clusterName":"LocalHadoopCluster","databaseName":"db","tableName":"tb","user":"openhouse","operationType":"CREATE","operationStatus":"SUCCESS","currentTableRoot":"hdfs://namenode:9000/data/openhouse/db/tb-97df14b9-752e-4b7f-b540-5058c784994c/00000-d4aca2a1-c8bc-40eb-8449-5b70d2e24598.metadata.json","grantor":null,"grantee":null,"role":null}

{"eventTimestamp":"1715644370032","clusterName":"LocalHadoopCluster","databaseName":"db","tableName":"tb","user":"openhouse","operationType":"COMMIT","operationStatus":"SUCCESS","currentTableRoot":"hdfs://namenode:9000/data/openhouse/db/tb-97df14b9-752e-4b7f-b540-5058c784994c/00001-54ef9c91-4747-4852-ba5b-adfb0b4f6fb3.metadata.json","grantor":null,"grantee":null,"role":null}

{"eventTimestamp":"1715644425694","clusterName":"LocalHadoopCluster","databaseName":"db","tableName":"tb","user":"openhouse","operationType":"READ","operationStatus":"SUCCESS","currentTableRoot":"hdfs://namenode:9000/data/openhouse/db/tb-97df14b9-752e-4b7f-b540-5058c784994c/00001-54ef9c91-4747-4852-ba5b-adfb0b4f6fb3.metadata.json","grantor":null,"grantee":null,"role":null}

  • [ ✅ ] Added new tests for the changes made.
  • [ ✅ ] Updated existing tests to reflect the changes made.
  • [ ✅ ] Some other form of testing like staging or soak time in production. Please explain.

Additional Information

@HotSushi
Copy link
Collaborator Author

cc: @ctrezzo

Copy link
Collaborator

@sumedhsakdeo sumedhsakdeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good, few nits/comments

sumedhsakdeo
sumedhsakdeo previously approved these changes May 14, 2024
Copy link
Collaborator

@autumnust autumnust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly have interface questions for a few current code blocks but they don't have to be blocking this change.

@HotSushi HotSushi merged commit ca05f77 into linkedin:main May 15, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants