ES with Multi-Cluster Search Support #4473

Closed
gseng opened this issue Dec 16, 2013 · 3 comments

Comments

@gseng
Contributor

gseng commented Dec 16, 2013

ES with Multi-Cluster Search Support

It'd be nice to be able to query across multiple clusters and get their aggregated results.

Our initial motivation is to view Kibana results across multiple clusters. See: Enhancement: Allow conntections to multiple ES backends from a single Kibana instance.

However, since we also use/query ES directly, a proxy would work far better than changing Kibana.

In the discussion above, it was decided (by Shay) that the place to do it would be at the ES level.

The following is a proposal to achieve it at the ES level. This is some very early planning and I've decided to post it early to make sure I'm not duplicating work or am on a totally wrong path (which is totally possible). Any feedback/comments are appreciated.

Proposal

The general plan is to make a query-only node (termed 'search load balancer' in elasticsearch.yml) and give it a list of the cluster names that we want to query.

During a search, the node will query from each of the shards in each of the clusters and aggregate the results.
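To make the "aggregate the results" step concrete, here is a minimal sketch of merging per-cluster hits into one global top-N list. Everything in it is an assumption for illustration: `MultiClusterMerge`, `Hit`, and `mergeTopHits` are hypothetical names, not Elasticsearch classes, and in ES itself this merging would presumably go through the SearchPhaseController.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MultiClusterMerge {
    /** Hypothetical hit type for illustration; not an Elasticsearch class. */
    public static final class Hit {
        public final String cluster;
        public final String id;
        public final float score;

        public Hit(String cluster, String id, float score) {
            this.cluster = cluster;
            this.id = id;
            this.score = score;
        }
    }

    /**
     * Merges the hits returned by each cluster and keeps the `size`
     * highest-scoring ones. The sort is stable, so hits with equal scores
     * keep the iteration order of the input map.
     */
    public static List<Hit> mergeTopHits(Map<String, List<Hit>> hitsByCluster, int size) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : hitsByCluster.values()) {
            all.addAll(hits);
        }
        // Highest score first.
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(size, all.size())));
    }
}
```

Note that scores from different clusters are only roughly comparable, since each cluster computes relevance from its own term statistics; the sketch ignores that.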

Details

  • Make a query-only node
    • In elasticsearch.yml
      • node.master: false
      • node.data: false
    • I'm hoping this means that we can isolate code changes to only the search portion.
  • Accommodate multiple clusters
    • Have ZenDiscovery reach out to all the listed clusters to get their state.
    • ClusterService will contain a map of ClusterName -> ClusterState.
      • To maintain the interface, clusterService.state() will just return the state of the first cluster.
      • We can have another interface that will allow us to get the cluster map.
    • MulticastZenPing will have to listen for changes in the listed clusters and update the corresponding cluster state.
  • Searching across multiple clusters
    • I've only looked at TransportSearchTypeAction and TransportSearchQueryThenFetchAction so far.
    • In the BaseAsyncAction, we'll need to get all the relevant shards across each cluster and query them (sendExecuteFirstPhase).
    • We change expectedSuccessfulOps and expectedTotalOps to the multi-cluster counts so that in onFirstPhaseResult we know when to move on (innerMoveToSecondPhase).
    • In moveToSecondPhase, we again use the metadata from each cluster and do the actual fetch of the documents.
    • Finally, in innerFinishHim, we merge all the results with the SearchPhaseController and return the response as usual.
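
As a rough illustration of the bookkeeping described above, here is a sketch of a ClusterName -> state map with a backward-compatible `state()` accessor, plus a counter that signals when every first-phase result has arrived. All class and method names (`MultiClusterSearchSketch`, `ClusterStateStub`, `FirstPhaseTracker`, `allStates`) are hypothetical stand-ins, not the real ES classes; the real ClusterState and BaseAsyncAction carry far more than this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiClusterSearchSketch {
    /** Stand-in for a cluster state: only the shard ids a search would target. */
    public static final class ClusterStateStub {
        public final List<Integer> shardIds;

        public ClusterStateStub(List<Integer> shardIds) {
            this.shardIds = new ArrayList<>(shardIds);
        }
    }

    /** ClusterName -> state, kept in the order the clusters were listed. */
    private final Map<String, ClusterStateStub> states = new LinkedHashMap<>();

    public void putState(String clusterName, ClusterStateStub state) {
        states.put(clusterName, state);
    }

    /** Existing-style accessor: returns the first listed cluster's state (throws if none is registered). */
    public ClusterStateStub state() {
        return states.values().iterator().next();
    }

    /** Additional accessor exposing every cluster's state. */
    public Map<String, ClusterStateStub> allStates() {
        return states;
    }

    /** Multi-cluster count of first-phase operations: one per shard in every cluster. */
    public int expectedTotalOps() {
        int total = 0;
        for (ClusterStateStub state : states.values()) {
            total += state.shardIds.size();
        }
        return total;
    }

    /** Counts first-phase results so the caller knows when to start the second phase. */
    public static final class FirstPhaseTracker {
        private final int expectedOps;
        private final AtomicInteger completedOps = new AtomicInteger();

        public FirstPhaseTracker(int expectedOps) {
            this.expectedOps = expectedOps;
        }

        /** Returns true only for the call that delivers the last expected result. */
        public boolean onFirstPhaseResult() {
            return completedOps.incrementAndGet() == expectedOps;
        }
    }
}
```

The atomic counter matters because shard responses arrive on different threads; exactly one of them should trigger the move to the second phase.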

Thanks!

@brusic
Contributor

brusic commented Jan 13, 2014

Just noticed an interesting commit that addresses this issue: #4708

Definitely a more thought-out approach since it merges the cluster states, but it requires a new "tribe" node. I wonder how it will work for single-cluster actions that do not require a merged cluster state.

@gseng
Contributor Author

gseng commented Jan 13, 2014

Thanks brusic, that sounds like exactly what we want, and as you said, more elegant. I'm working on seeing if it works out for us (comments in #4708).

@gseng gseng closed this as completed Jan 13, 2014
@brusic
Contributor

brusic commented Jan 14, 2014

Glad to see that my comment was useful and that you are already testing out the feature.
