Implement schemas.list RPC method #3598

Merged
seancolsen merged 7 commits into architectural_overhaul from schemas_list on May 20, 2024

Conversation

seancolsen
Contributor

Fixes #3596

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the develop branch of the repository
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@seancolsen seancolsen added this to the Beta milestone May 16, 2024
@seancolsen seancolsen added the pr-status: review label (A PR awaiting review) May 16, 2024
@seancolsen
Contributor Author

Note this is stacked on top of #3597

Base automatically changed from rpc_method_list_refactor to architectural_overhaul May 20, 2024 06:23
Contributor

@mathemancer mathemancer left a comment

Tests look great, overall organization is also nice.

I have some changes to request for the SQL code itself; see my specific line comments.

Comment on lines 686 to 687
min(s.nspname) AS name,
min(d.description) AS description,
Contributor

I'd prefer either adding these to the GROUP BY, or just using the any_value aggregate here. I find the min call a bit misleading. My personal preference is adding these columns to the GROUP BY since it makes it clear(er) that you're getting a single row per oid, name, description tuple.
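To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The `schemas` and `tables` tables, their columns, and the `table_count` column are hypothetical and are not the query from this PR; they only show that wrapping the per-schema columns in `min` and adding them to the `GROUP BY` return the same rows when `oid` already identifies a single schema.

```python
import sqlite3

# Hypothetical toy tables standing in for the system catalogs; not the PR's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schemas (oid INTEGER PRIMARY KEY, name TEXT, description TEXT);
    CREATE TABLE tables (oid INTEGER PRIMARY KEY, schema_oid INTEGER);
    INSERT INTO schemas VALUES (1, 'public', 'default schema'), (2, 'library', NULL);
    INSERT INTO tables VALUES (10, 1), (11, 1), (12, 2);
""")

# Variant 1: group by oid only and wrap the other columns in min().
with_min = conn.execute("""
    SELECT s.oid, min(s.name) AS name, min(s.description) AS description,
           count(t.oid) AS table_count
    FROM schemas s LEFT JOIN tables t ON t.schema_oid = s.oid
    GROUP BY s.oid
    ORDER BY s.oid
""").fetchall()

# Variant 2: add the columns to the GROUP BY, which states directly that
# there is one row per (oid, name, description) tuple.
with_group_by = conn.execute("""
    SELECT s.oid, s.name, s.description, count(t.oid) AS table_count
    FROM schemas s LEFT JOIN tables t ON t.schema_oid = s.oid
    GROUP BY s.oid, s.name, s.description
    ORDER BY s.oid
""").fetchall()
```

Both variants yield identical rows here; the second simply avoids suggesting that a minimum is being computed.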

Contributor Author

Fixed in d85df60

I originally wanted to use any_value but (to my surprise) this function was only just recently added in PostgreSQL 16.

I'm accustomed to the behavior in MySQL which basically automatically applies any_value any time you don't explicitly use an aggregate function.

It's helpful to know what sort of patterns you prefer to use for cases like this. Thanks. I'll try to use additional grouping columns going forward.

Contributor

> I originally wanted to use any_value but (to my surprise) this function was only just recently added in PostgreSQL 16.

Innnteresting. I could swear there was something like that before, but maybe I dreamt it.

Comment on lines 686 to 687
min(s.nspname) AS name,
min(d.description) AS description,
Contributor

Are you aware of the obj_description function? It's specific to PostgreSQL (I think), but it would simplify things by avoiding one of the joins below.

Contributor Author

Fixed in 38d8cee.

I did notice that function. But I chose to use the join because my instinct was that it would be more performant. I suppose performance isn't critical here anyway because we're not likely to be returning thousands of schemas. And perhaps Postgres needs to do similar things under the hood when using obj_description vs when querying system catalog tables. But basically in the general sense I just wouldn't be inclined to use lookup functions in the SELECT clause when we could use a JOIN to get the same data. Since you seem to prefer the lookup function at least in this case I've changed it here for clarity/simplicity.

Contributor

I suspect you're correct that the join would be more performant, but I note that the psql command \d uses the obj_description, so it's probably not that bad.

db/sql/0_msar.sql (resolved thread)
@seancolsen
Contributor Author

Ready for re-review, @mathemancer

Contributor

@mathemancer mathemancer left a comment

I have one tiny change to request. If you're okay with that, feel free to merge after namespacing the obj_description function.

db/sql/0_msar.sql (resolved thread)

SELECT
s.oid AS oid,
s.nspname AS name,
obj_description(s.oid) AS description,
Contributor

I'm sorry to go around again for such a small thing, but I think you were correct in a previous conversation (or comment; can't remember) when you noted that we should namespace these things since a (perhaps foolish) user could have created a function with the same name in some context. So, it'd be better to use pg_catalog.obj_description.

Contributor Author

Cool. Makes sense. Fixed in f328964

@mathemancer mathemancer added the pr-status: revision label (A PR awaiting follow-up work from its author after review) and removed the pr-status: review label (A PR awaiting review) May 20, 2024
@seancolsen seancolsen merged commit 2c15ae1 into architectural_overhaul May 20, 2024
33 checks passed
@seancolsen seancolsen deleted the schemas_list branch May 20, 2024 15:11
@Anish9901
Member

@seancolsen I noticed that this PR doesn't have any Python test for the schemas.list_ RPC function; it only has one that checks the expected name of the endpoint. Is there a reason for that?

@seancolsen
Contributor Author

@Anish9901 It didn't seem worth testing to me. Do you think it should have a test? What would you want to assert in that test? I'm certainly open to it, but I'm also trying to avoid too much boilerplate in order to move quickly. For reference, we discussed this in our last meeting (starting at about 48:20), and my takeaway from that discussion was that the test coverage that matters most to us right now is the test coverage at the SQL layer.

@Anish9901
Member

Anish9901 commented May 20, 2024

OK, I just went through the video. As I understand it, we didn't agree to skip Python tests for the RPC endpoints; rather, we decided against end-to-end Python tests that call the database and therefore tend to be slow. FWIW, I think we should have at least one test that makes sure the SQL function is called with the appropriate parameters; this is relatively low effort. I'd like to hear @mathemancer's opinion on whether we want tests that check the wiring of the functions called from within the RPC functions.

@mathemancer
Contributor

As I'm looking through the PR with @Anish9901 's concerns in mind:

  • Given that we're not doing any E2E or even full-backend-call testing, we need to be extra conscientious about testing that the wiring between functions is as expected, and that functions are mangling things appropriately.
  • For the schema listing RPC function, therefore, we could add a test to:
    • Make sure the call to the underlying python-layer function is as expected, and
    • The list comprehension produces expected output, given some monkeypatched result from the python-layer function.

This would be similar to, for example, the test for columns.list_. It's kind of a pain to put together, but it does give us something to break if we accidentally typo something in the function logic that we might not otherwise notice.
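As a rough illustration of the kind of wiring test described above, here is a minimal sketch using unittest.mock from the standard library. Every name in it (`db_layer`, `get_schemas`, `list_schemas`, `check_list_schemas_wiring`, and the `connect` parameter) is hypothetical and is not taken from the Mathesar code base or from the existing columns.list_ test; it only shows the two assertions suggested: that the underlying python-layer function is called as expected, and that the list comprehension produces the expected output from a patched result.

```python
from types import SimpleNamespace
from unittest import mock


def _real_get_schemas(conn):
    # Hypothetical python-layer function; the real one would query the database.
    raise RuntimeError("would hit the database")


# Hypothetical stand-in for the module that holds the python-layer function.
db_layer = SimpleNamespace(get_schemas=_real_get_schemas)


def list_schemas(database_id, connect=lambda database_id: f"conn-{database_id}"):
    """Hypothetical RPC function: fetch schema rows and reshape each one."""
    conn = connect(database_id)
    rows = db_layer.get_schemas(conn)
    return [
        {"oid": r["oid"], "name": r["name"], "description": r["description"]}
        for r in rows
    ]


def check_list_schemas_wiring():
    fake_rows = [
        {"oid": 2200, "name": "public", "description": None, "extra": "dropped"},
        {"oid": 16384, "name": "library", "description": "Books"},
    ]
    # Patch the python-layer function so no database is needed.
    with mock.patch.object(db_layer, "get_schemas", return_value=fake_rows) as fake:
        result = list_schemas(7)
    # The underlying function was called exactly once with the expected argument.
    fake.assert_called_once_with("conn-7")
    return result
```

A test in this style breaks if an argument is dropped or a key is mistyped in the comprehension, without ever touching the database.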
