Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue 1075][pulsaradmin] Reorganize pulsaradmin codebase #1085

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

flowchartsman
Copy link
Contributor

@flowchartsman flowchartsman commented Aug 30, 2023

Master Issue: #1075

Motivation

This PR represents the changes first proposed by me on #streamnative/pulsar-admin-go#40 and #streamnative/pulsar-admin-go#39 squashed and replayed on top of master after #1079 landed.

Modifications

The API has been thoroughly rehashed, with most subpackages removed in favor of a cleaner top-level API on the base package which is much friendlier to autocomplete.

In addition, every REST endpoint can now be explicitly versioned.

It also includes the following changes:

flowchartsman/pulsar-admin-go@ef9273f
flowchartsman/pulsar-admin-go@c74ba31
flowchartsman/pulsar-admin-go@97989ac
flowchartsman/pulsar-admin-go@feace75
flowchartsman/pulsar-admin-go@756acd8
flowchartsman/pulsar-admin-go@2bd7185

Verifying this change

This PR includes some additional tests, however pulsaradmin as a whole is far from fully exercised, however I have a plan for testing (see next section).

Proposed Changes To Integration and Testing

Once this PR is complete (or as part of it), I plan to convert all of the raw API calls in the main library test code to use an integration fixture pulsaradmin client instead, which will help exercise the code in real-world use.

In addition, I would like to combine this with the changes I started in #1076 to remove the need for makefile-driven testing in favor of using a Docker-based IT framework such as https://github.com/testcontainers/testcontainers-go. There would be several benefits to this approach, chief among them that tests could be run piecemeal, and only those tests which actually use the integration container would cause it to become initialized. This would improve feature iteration across the board.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): shockingly, no.
  • The public API: yes
    • As mentioned above, this is a rather significant change to the pulsaradmin API
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: no

Documentation

While this PR does not introduce a new feature per se, any existing examples using the old pulsaradmin API would need to be modified.

Remaining work

There is one remaining point I need to address before this PR is complete, and several nice-to-haves that can be separate issues, if need be.

Required

  • Config/Auth merge
    • The auth and config for pulsaradmin are still completely separate, and use a different initialization strategy to the main package. They will need to DRYed with the code in `pulsar/auth

Nice-to have

  • Make Parameters more consistent
    • There are still some rough edges and inconsistencies in the API, particularly in regards to method paramaters. One example is methods that consume types like TopicName and NamespaceName. I understand the desire for these types, but they are inconsistently applied, since elsewhere the API allows you to pass in strings for either of these types, sometimes separated, sometimes combined. This should be consistent.
    • It is also awkward to call their Get- functions only to immediately need to check or discard an error and then to dereference, since the methods consume values, while the Get- functions return pointers This is cumbersome, and can be simplified by externalizing the fields and allowing literal init, or by changing the getter functions to return value types instead (along with better names), and simply having the methods validate the arguments themselves. This reduces two error checks and a dereference down to one error check on the method call itself. In theory, these types could even be converted to string types with some extra methods, without too much trouble.
  • Integrate pulsaradmin into pulsar unit tests
    • This API represents a great opportunity for integration into the testing code for package pulsar itself, since most of the raw API calls made in that code are administrative. Combining this with testcontainers-go and an integration harness would absolutely slap.
  • use request cloning in pulsar/auth transports
    • This is probably a separate issue, but I notice that the transports in pulsar/auth do not clone the requests. I mention it here, since it's a modification I made to the auth handlers in pulwar-admin-go, and which is highly encouraged by Go. Since I'll likely be handling that code anyway, now might be a good time to take care of it.

@flowchartsman flowchartsman changed the title [Issue 1075][pulsaradmin] Reorganize [Issue 1075][pulsaradmin] Reorganize pulsaradmin codebase Aug 31, 2023
@flowchartsman flowchartsman marked this pull request as ready for review August 31, 2023 00:27
Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changeset is large. May I ask if you do any non-trivial changes except moving packages?

I'd suggest we fix the package layout as a flattened one + internal package. And do any non-trivial changes in the following PRs. This would help us merge patches smoothly instead of discussing up and down among multiple proposals.

If the changeset follows this pattern exactly, I'd expect to see a sheet of the rehashing details - this also helps downstream projects to migrate. For example, when I'm thinking of upgrade pulsar-client-go for https://github.com/streamnative/terraform-provider-pulsar, I need to know what new APIs I should use.

@flowchartsman
Copy link
Contributor Author

flowchartsman commented Sep 6, 2023

I list the changes above, however, the main changes to the API are client creation and auth, which I intended on replacing with the stuff from pulsar/auth once I had some feedback. Once that is incorporated there won't be a lot of major changes.

I also added the ability to discriminate errors that are:

  • "something not found"
  • "actual error"
  • "unexpected error that doesn't have a reason"

And you can see that here

The only other big change is the addition of the versioning for every endpoint. This is internal, and has a default value, so it is not user-facing as far as creating the client, but I believe it is necessary since it was the only thing that allowed me to use the library to administer pulsar functions, since pulsar-admin-go was locked to v2 endpoints, which did not work with recent versions of pulsar.

I am unsure what you mean by

I'd suggest we fix the package layout as a flattened one + internal package

Since this is already a flat layout could you elaborate on that further? I am not sure how much of the library can be moved into internal that I did not already move there, since most of it is either endpoint listings or API argument types and the types need to be exposed to the user at the package level.

If the idea is for me to back out any changes that weren't moving packages around, I would rather avoid that if I can and just fix auth here. It was already a pain to recreate the changes I did to pulsar-admin-go and it took me the better part of a day to replay them all in a way that made sense, and I had to go through and manually checkout -p all of the changes to the license comments so that it applied cleanly. That was not fun ;)

If I absolutely must do that, I will, but I want to be very clear on what's being asked of me before I go through all that effort a third time.

@flowchartsman
Copy link
Contributor Author

flowchartsman commented Sep 6, 2023

If you're curious about the API and migration, I think you would find it rather easy, here are some snippets to give you an idea:

padminClient, err := pulsaradmin.NewClient(flagPulsarWebServiceURL, flagPulsarClusterURL, flagPulsarAuthToken)
if err != nil {
	return nil, fmt.Errorf("error creating pulsar admin client: %w", err)
}

// listing topics
ns, _ := pulsaradmin.GetNameSpaceName(myTenant, myNamespace) // like I mentioned above, this is something I also want to fix, which would be in those later PRs you talked about
topicsPartitioned, topicsUnpartitioned, err := padminClient.Topics().List(*ns)
if err != nil {
	return fmt.Errorf("unable to pull list of topics for namespace %s: %v", myNamespace, err)
}

// checking function status
functionStatus, err := s.pulsarAdminClient.Functions().GetFunctionStatus(common.TenantFlows, common.NamespaceAgents, uuid)
if pulsaradmin.IsNotFound(err) { // this is also new
	// handle not found separately rather than returning an error
}
if err != nil {
        return fmt.Errorf("unexpected error retrieving function status: %v", err)
}

// creating a function
funcURL := "function://myTenant/myNamespace/name_of_fn@1.0"
functionConfig := &pulsaradmin.FunctionConfig{
	CleanupSubscription: true,
	AutoAck:                      true,
	LogTopic:                     logtopic,
	ProcessingGuarantees:         "ATLEAST_ONCE",
	Tenant:                       myTenant,
	Namespace:                    myNamespace,
	Name:                         functionName,
	Inputs:                       []string{inputTopic},
        Output:                       outputTopic,
	ClassName:                    "name_of_fn", //not necessary, but prettier
	ForwardSourceMessageProperty: true,
	Go:                           &funcURL,
	SubscriptionPosition:         "Latest",
}
if err := s.padminClient.Functions().CreateFuncWithURL(functionConfig, funcURL); err != nil {
	return false, fmt.Errorf("couldn't create the function: %w", err)
}

Basically you are typing all of the same things, only there are less packages to deal with, since everything is in pulsaradmin now.

@tisonkun
Copy link
Member

tisonkun commented Sep 10, 2023

could you elaborate on that further

I mean that when you include some nontrivial changes like:

  • Add API versions field;
  • Refactor Error structs;
  • ...

with a > 3000 lines refactor where most of its changes are rename, it's very hard for downstream developers who encounter the interface changed to figure out the exact difference before and after this patch - they should find the significant 20-40 line in these 3000+ lines, with no dedicated commit hint.

You refer to an outstanding issue about unify "auth". Do you want to include more nontrivial changes in this patch? I'll strongly suggest you submit a separate patch since we don't release per PR but we have time to finish your full proposal in several PRs (you can submit them at once and we do a simultaneous review, see apache/flink#9832 as an example, multiple commits or PRs instead of one commit for everything). Once a PR goes over thousands lines and contain a few lines nontrivial, I'm afraid that there is no more reviewers but we "merge and bless".

@nodece
Copy link
Member

nodece commented Sep 11, 2023

Thank you for your contribution! Your code looks good to me, but this PR is large, and includes changes to the feature and file movement.

Could you split multiple PRs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants