
Allow the ProviderManager to have more parallelism #729

Open
aschmahmann opened this issue Jul 21, 2021 · 4 comments
Labels
effort/days: Estimated to take multiple days, but less than a week
exp/intermediate: Prior experience is likely helpful
kind/enhancement: A net-new feature or improvement to an existing feature
need/triage: Needs initial labeling and prioritization

Comments

@aschmahmann (Contributor) commented Jul 21, 2021

The ProviderManager has a single event loop that serializes all requests (puts, gets, etc.):

```go
func (pm *ProviderManager) run(proc goprocess.Process) {
```

The major operations in the event loop, such as adding and getting providers, block the loop and can potentially take a long time (e.g. a lookup against a network-based datastore):

```go
case np := <-pm.newprovs:
	err := pm.addProv(np.key, np.val)
	if err != nil {
		log.Error("error adding new providers: ", err)
		continue
	}
	if gcSkip != nil {
		// we have a GC running; tell it to skip this provider
		// as we've updated it since the GC started.
		gcSkip[mkProvKeyFor(np.key, np.val)] = struct{}{}
	}
case gp := <-pm.getprovs:
	provs, err := pm.getProvidersForKey(gp.key)
	if err != nil && err != ds.ErrNotFound {
		log.Error("error reading providers: ", err)
	}
	// set the cap so the user can't append to this.
	gp.resp <- provs[0:len(provs):len(provs)]
```

This can leave the ProviderManager backlogged and slow down responses to the network.

If we instead allow the calls inside the event loop to run in parallel, e.g. many adds or gets in flight at once, users could reduce response latencies by increasing the resources they devote to the ProviderManager.
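To make this concrete, here's a rough, self-contained sketch of the idea. The names (getReq, slowLookup) and the pool size are made up for illustration; this is not the real ProviderManager API. The event loop only acquires a slot from a counting semaphore, and the slow datastore read happens off-loop in a goroutine:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// getReq models a get-providers request: a key in, a response channel out.
// (Illustrative only; not the real ProviderManager types.)
type getReq struct {
	key  string
	resp chan []string
}

// run sketches an event loop that hands slow gets off to a bounded
// worker pool, so the loop itself never blocks on a datastore read.
func run(getprovs <-chan getReq, done <-chan struct{}) {
	const maxWorkers = 8
	sem := make(chan struct{}, maxWorkers) // counting semaphore bounding concurrency
	var wg sync.WaitGroup
	for {
		select {
		case gp := <-getprovs:
			sem <- struct{}{} // blocks only when all workers are busy
			wg.Add(1)
			go func(gp getReq) {
				defer wg.Done()
				defer func() { <-sem }() // release the worker slot
				gp.resp <- slowLookup(gp.key)
			}(gp)
		case <-done:
			wg.Wait()
			return
		}
	}
}

// slowLookup stands in for a high-latency datastore lookup.
func slowLookup(key string) []string {
	time.Sleep(50 * time.Millisecond)
	return []string{"provider-for-" + key}
}

func main() {
	getprovs := make(chan getReq)
	done := make(chan struct{})
	go run(getprovs, done)

	resp := make(chan []string, 1)
	getprovs <- getReq{key: "k1", resp: resp}
	fmt.Println(<-resp)
	close(done)
}
```

The semaphore caps how many datastore reads run concurrently, so the pool size becomes the knob for trading resources against response latency.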

@Stebalien (Member)

Also related: #675. But yeah, given high-latency blockstore operations, this should be parallelized.

@Stebalien (Member)

Note: parallelism is a problem for gets, not puts (ish). The datastore is an "auto-batching" datastore, so puts are effectively already done in parallel.

I say "ish" because we'd need to have a "put" queue to make this fully parallel, otherwise we'll block every time we flush.

@Stebalien (Member)

I guess the main issue here is that gets are blocking puts.

@aschmahmann (Contributor, Author)

Agreed. Generally, the "event loop" pattern tends to fall apart when the work inside the loop blocks, so it'd be nice to have worker pools to keep the event loop clear.
