Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal API changes #1

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Internal API changes #1

wants to merge 7 commits into from

Conversation

danielhfrank
Copy link
Contributor

Just wanted to put some work I've done on the internal api changes up for folks to look at. The main goal of all this stuff is to increase the separation between the epsilon-greedy stuff and the standard hostpool so they can be composed as desired. I'll try to summarize the main changes:

  • The epsilonGreedyHostPoolResponse keeps a pointer to a epsilonGreedyHostPool struct, not the interface. Internally it embeds a HostPoolResponse - the interface, not the basic struct impl.
  • The epsilonGreedyHostPool embeds the HostPool interface, not the basic struct impl
  • The various HostPoolResponse impls no longer have a sync.Once. I don't think there should be any expectation that they be threadsafe
  • The HostPool interface now has a private selectHost method to produce a HostPoolResponse from a host string. This also takes care of any state changes involved in selecting that host, such as dealing with retry time and starting a time for epsilonGreedy operation
  • getRoundRobin and getEpsilonGreedy just select a hostname. The selectHost method takes care of other state changes involved in delivering the HostResponse to the user.

I have some more changes in mind that I would eventually like to make in the service of the separation I mentioned, but I think that this is a good place to start. The tests pass, though I did have to change them slightly: I have to make a couple of type assertions, and I had to remove one line that tested the sync.Once functionality of the HostPoolResponses.

cc @jehiah @mreiferson

@danielhfrank
Copy link
Contributor Author

I tried to update this with a change that I think had gotten us stuck earlier. Specifically, it sometimes makes sense for us to separate the logic of choosing the next host to request, and maintaining the state around that choice. This is because extensions of the HostPool (such as epsilon greedy or anything else a user might come up with) often want to be able to invoke (or override) only one of those functions, without knowing the details of how they're implemented. This led to me exposing those two as public methods on the HostPool interface. The intention was to communicate that they should be used when developing extensions to HostPool, but as a user the Get method should be used to get the next host.

Unfortunately, in yet another head-on collision with the golang type system, it appears that the only way for this solution to proceed will be to refactor the public API from a Get method bound to a HostPool instance (and as part of the interface) to a package-level function that takes a HostPool instance as an argument. I'll omit the boring details of why that is here, but I've made the change in the last commit and have tests passing. Nevertheless, I'm not super happy with needing to change the public API, much less for a pull request called "internal changes".

@rymo4, it was your idea to try to pull this out of the grave, you got any ideas? @mreiferson, I know you're tired of writing javascript, you got any input? Does anyone think it's worthwhile to spend any more time on this?

@mreiferson
Copy link

I'm not sure I understand what necessitated the breaking API change (although I admit the end result is a reasonable outcome)...

I think the real question is what do we want this external API to support?

If we're looking to provide end-users the ability to introduce custom selection mechanisms what about an external API like this:

type HostPool interface {
    Get() Response
}

type Response interface {
    Host() string
    Mark(error)
}

type Selector interface {
    SelectNextHost() Response
    MarkHost(string, error)
}

func NewHostPool(hosts []string, selector Selector) HostPool

It would be used something like:

hp := hostpool.New([]string{"a", "b", "c"}, &hostpool.EpsilonGreedySelector{})
r := hp.Get()
host := r.Host()
err := doWork(host)
r.Mark(err)

It seems like the common subset of functionality is a list of hosts and where behavior differs is in the selection.. so I feel like the interfaces should reflect that, right?

The nature of the proposed Selector interface would enable each concrete type to maintain (separate) state per host (SelectNextHost is called to iterate, and MarkHost is called via HostPoolResponse.Mark).

Thoughts?

@rymo4
Copy link

rymo4 commented Sep 4, 2013

@mreiferson Nice, I think that the Selector interface makes sense and makes hostpool easy to extend. It would be nice to document these changes and how to add new selection algs in the README too.

@danielhfrank
Copy link
Contributor Author

I do like the approach of separating the interfaces that the user interacts with from those that developers of extensions work with, and think I may take a stab at putting that together. One thing worth keeping in mind is that, when extending HostPools (or Selectors here), it's often helpful to embed some kind of Selector. For example, doing this means that the Epsilon Greedy hostpool doesn't need to keep track of dead hosts, because it can leverage the base hostpool to do that. It was this practice that forced me to change the external API a bit, basically because those embedded Hostpools don't behave like superclasses in other languages (duh). I think that, for the same reason the package-level function got me out of that jam, separate Selector and HostPool interfaces won't suffer from my earlier problems.

Now, the way I'm envisioning implementing this, the struct backing HostPool will be very very thin, not much more than what the package level functions would have provided. But I do think that having the HostPool interface exposed to the user is more natural, as well as being backwards compatible.

@mreiferson
Copy link

I think the base functionality is so simple that I'm not convinced it's necessary to re-use just for the sake of DRY... I think it contributed to some of the original complexity. Also, I believe the difficulty with the proposed new API will be Selector types getting access to host data that is stored on the HostPool types...

One way or another, to settle on an API we're happy with, I don't think we're going to be able to avoid any breaking API changes, so I wouldn't worry too much about it.

I'm excited to see how it comes out...

@danielhfrank
Copy link
Contributor Author

I think you've been incepted 😜

}
h.epsilonCounts[h.epsilonIndex]++
h.epsilonValues[h.epsilonIndex] += int64(duration.Seconds() * 1000)
}

func (s *epsilonGreedySelector) MakeHostResponse(host string) HostPoolResponse {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there's any value in maintaining a separate MakeHostResponse, having the function is fine, but making it part of the interface is unnecessary and can be hidden as an implementation detail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree on this point. I really want it to be possible to create a HostPool with multiple Selectors composed together. To do that, I don't think we can rely on hidden implementation details of the Selectors (among other things, extensions shouldn't need to be in the hostpool package). I haven't quite gotten to that point in this pull req, due to some stuff with the implementation of hostEntry that I've been deliberately leaving to a subsequent pull req to keep things from getting too complicated.

MakeHostResponse does the work necessary to update state based on the chosen host, I do think it's a necessary inclusion in the interface. Note here that we call out to the embedded Selector's MakeHostResponse, without privileged access to its implementation. That's the pattern I'd like to encourage for other Selectors going forward

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have been more specific. I wasn't suggesting that you rely on or even have access to internals...

It seems like SelectNextHost is the only public interface that a Selector needs to implement, it just needs to be changed to return a HostPoolResponse interface... that method's internal implementation doesn't seem relevant.

But, I'm not quite sure where you're going with selector composition, so please elaborate if I'm just missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, that definitely sounds more straightforward, and is closer to what I originally wanted to do. The problem I was running into was that we typically want to update some state when we select which host to try. This may be on the HostPool/Selector side (state around retrying a host previously marked dead) or on the HostPoolResponse side (storing the time that the request was initiated for timing purposes). MakeHostResponse was a poorly named way to shoehorn all those state updates together.

All of this is further muddied by the fact that I'm still just working with one hostEntry implementation. I'll try to continue thinking about this on a higher level in terms of what state needs to kept and when it needs to be accessed, maybe I'll come up with something simpler.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the same page w/ different selectors needing to maintain separate state, I just don't think that requires two public interface methods, right?

I agree with your assessment that hostEntry has too much responsibility and should be specific to each selector, ideally.

@mreiferson
Copy link

Another cleanup item... we can easily remove the embedded EpsilonValueCalculator in the Log and Polynomial types by just returning 1 / <calc> :)

✂️ 🚧

@@ -42,86 +46,84 @@ type epsilonGreedyHostPool struct {
// To compute the weighting scores, we perform a weighted average of recent response times, over the course of
// `decayDuration`. decayDuration may be set to 0 to use the default value of 5 minutes
// We then use the supplied EpsilonValueCalculator to calculate a score from that weighted average response time.
func NewEpsilonGreedy(hosts []string, decayDuration time.Duration, calc EpsilonValueCalculator) HostPool {
func NewEpsilonGreedy(decayDuration time.Duration, calc EpsilonValueCalculator) Selector {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/NewEpsilonGreedy/NewEpsilonGreedySelector

sthulb pushed a commit to HailoOSS/go-hostpool that referenced this pull request Sep 14, 2016
Allow hosts for an existing hostpool to be changed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants