Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for REST catalog in Iceberg connector #22417

Merged
merged 2 commits into from
May 24, 2024

Conversation

kiersten-stokes
Copy link
Contributor

@kiersten-stokes kiersten-stokes commented Apr 3, 2024

Description

This PR is two commits:

  • one that adds support for REST, which necessitates updating com.fasterxml.jackson.core:jackson-core and :jackson-databind to be >2.11 in the root pom
  • one that re-factors the Iceberg catalog implementation to be more modular (contains the bulk of the code changes)

Motivation and Context

Closes #20422

Impact

  • Adds support for using the Iceberg REST catalog

Test Plan

Tests have been added that spin up a testing HTTP server to create an Iceberg REST catalog with JDBC backing. Currently, three of the distributed smoke tests fail when using this set up due to a bug in Iceberg that I intend to follow up on

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Upgrades fasterxml.jackson artifacts to 2.11 :pr:`22417`

Iceberg Connector Changes
* Add support for Iceberg REST catalog :pr:`22417`

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the documentation! Just two nits.

presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@ZacBlanco ZacBlanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good overall except that I think I disagree with leaving the refactor until later on IcebergResourceFactory. Might even be better to have this PR with 2 commits: one for the refactor and one for the REST catalog?

How were you thinking about refactoring it?

steveburnett
steveburnett previously approved these changes Apr 4, 2024
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local build. Everything looks good.

@hantangwangd
Copy link
Member

For reference, I agree with @ZacBlanco, use a separate commit for refactoring IcebergResourceFactory in the same PR would make the code more clearer.

@kiersten-stokes
Copy link
Contributor Author

@ZacBlanco @hantangwangd love the idea of doing two separate commits. That would be the best of both worlds since I can keep it separate from the functionality but also get the refactor done sooner than later

As for the re-factor, there are certainly several ways to go about it. I'm currently (at least once I'm back from vacation) considering two methods. First is passing a newSetBinder to the IcebergResourceFactory and accessing the relevant methods from there. This may be a bit easier. The second method is probably more involved, separating out each type of Iceberg catalog into it's own module and extending all the relevant classes from there (I think just config and resource factory's getCatalogProperties, though I haven't delved deep into this yet).

Please feel free to suggest additional ways and to weigh in on the above! As mentioned, I won't be back to this in another day or two 🙂

@ZacBlanco
Copy link
Contributor

ZacBlanco commented Apr 16, 2024

I took a brief look. Re-factor overall looks good! I will perform a detailed review once out of draft stage

@kiersten-stokes kiersten-stokes force-pushed the iceberg-rest branch 4 times, most recently from 874c730 to abd351b Compare April 23, 2024 19:58
@kiersten-stokes kiersten-stokes force-pushed the iceberg-rest branch 3 times, most recently from c6890d3 to 2a6589f Compare May 6, 2024 21:04
@kiersten-stokes kiersten-stokes marked this pull request as ready for review May 8, 2024 17:55
@kiersten-stokes kiersten-stokes requested review from a team and hantangwangd as code owners May 8, 2024 17:55
@hantangwangd
Copy link
Member

Hi @kiersten-stokes, can you rebase your branch onto the newest master branch? it's a bit too far behind :-).

@steveburnett
Copy link
Contributor

nit, suggest adding PR # into each item of the release note entry:

== RELEASE NOTES ==

General Changes
* Upgrades fasterxml.jackson artifacts to 2.11 :pr:`22417`

Iceberg Connector Changes
* Add support for Iceberg REST catalog :pr:`22417`

steveburnett
steveburnett previously approved these changes May 16, 2024
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local docs build, looks good!

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this great feature. I took a look at the first commit, overall it looks good to me. But I'm a little confused by the large amount of error logs printed by the test cases.

After check the code, I found that the error log is printed by RESTCatalogServlet, and by comparing the test implementations of Iceberg and Trino, I notice that they both use certain methods to hide these error messages (that is to say, the error messages are still logged, but not displayed). Iceberg inject a NOPLogger, while Trino overwrite the servlet RestCatalogServlet and injected its own configurable io.airlift.log.Logger.

So should we as well do something to get rid of these annoying error messages? What's your opinion?

Copy link
Contributor

@ZacBlanco ZacBlanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly just nits

restJdbcBackend.setConf(hdfsEnvironment.getConfiguration(new HdfsContext(SESSION), new Path(location.getAbsolutePath())));

Map<String, String> properties = ImmutableMap.<String, String>builder()
.put(URI, "jdbc:h2:mem:test" + System.nanoTime() + "_" + ThreadLocalRandom.current().nextInt())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: will we actually need nanoTime and a random variable? - also, maybe add another underscore between :test and `System.nanoTime()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that both is probably overkill, although this pattern came up a few different times in other tests that use this type of catalog. I was guessing it was a 'better safe than sorry' situation. Definitely agree on the underscore!

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took a whole look at all the commits, it's a very pretty refactoring, overall looks good to me. Just one thing for discussion, and some nits.

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, only some little nits. Again, it's a very pretty refactoring, make things more clearer.

Make nessie config optional in session properties
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the great work.

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome refactor and nice work--thank you!

@tdcmeehan tdcmeehan merged commit ba8ce85 into prestodb:master May 24, 2024
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for REST catalog in Iceberg connector
5 participants