[PoC] Remote data Provider - not to be merged #1801

Open
wants to merge 3 commits into
base: dc-v5

kum-deepak
Collaborator

@kum-deepak kum-deepak commented Dec 14, 2020

It is now possible to send the entire filter state and retrieve data for all the charts in a single call from a remote server.

In this PoC there is a simple Express (NodeJS) based web server application (see folder server). It loads the data, sets up individual data providers, and waits. When it receives a request, it applies all the filters, computes data for each of the providers, serializes, and sends it back.
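The request cycle described above could be sketched roughly as follows. This is only an illustration and not the code in the server folder: the `Row`, `FilterState`, `Provider`, and `computeAll` names and the payload shape are assumptions, plain arrays stand in for crossfilter, and the Express wiring is left out.

```typescript
// Hypothetical shapes: the PoC's actual wire format is not shown in this thread.
type Row = Record<string, string | number>;
type FilterState = Record<string, Array<string | number>>; // dimension -> accepted values
interface Provider {
  chartId: string;   // used to link server data to a client chart
  dimension: string; // field this chart groups by
}
type ChartData = Array<{ key: string | number; value: number }>;

// Apply all filters, then count rows per key for each provider.
// As in crossfilter, a chart does not observe the filter on its own dimension.
function computeAll(
  rows: Row[],
  providers: Provider[],
  filters: FilterState
): Record<string, ChartData> {
  const result: Record<string, ChartData> = {};
  for (const p of providers) {
    const visible = rows.filter((row) =>
      Object.entries(filters).every(([dim, accepted]) => {
        if (dim === p.dimension || accepted.length === 0) return true;
        const v = row[dim];
        return v !== undefined && accepted.includes(v);
      })
    );
    const counts = new Map<string | number, number>();
    for (const row of visible) {
      const key = row[p.dimension];
      if (key === undefined) continue;
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
    result[p.chartId] = Array.from(counts, ([key, value]) => ({ key, value }));
  }
  return result; // JSON.stringify(result) would be the serialized response body
}
```

Because every request carries the full filter state, a function like this keeps no per-user state between calls.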

The client (see folder web-src/remote) fetches data from the remote source through two new callbacks, beforeRenderAll and beforeRedrawAll. The remote data is saved in a local variable, and the RemoteDataAdapters pick up the data appropriate for each chart. The current sample does not cover data capping. The client does not use crossfilter and has no access to the actual underlying data.

The linking of server and client is based on chartId - which is anchorName in the current implementation.
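A rough sketch of the client-side flow, under stated assumptions: `RemoteStore`, `Fetcher`, and `dataFor` are hypothetical names, and the response is assumed to be an object keyed by chartId. The real callbacks and RemoteDataAdapters in web-src/remote may look quite different.

```typescript
type ChartData = Array<{ key: string | number; value: number }>;
type RemoteResponse = Record<string, ChartData>; // keyed by chartId (the anchor name)
type Fetcher = (filters: unknown) => Promise<RemoteResponse>;

// Holds the most recent server response so that per-chart adapters can read from it.
class RemoteStore {
  private latest: RemoteResponse = {};
  private readonly fetcher: Fetcher;

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  // Meant to run before a render or redraw pass,
  // for example from a beforeRenderAll / beforeRedrawAll style hook.
  async refresh(filters: unknown): Promise<void> {
    this.latest = await this.fetcher(filters);
  }

  // What an adapter for a single chart would read; empty if the server sent nothing.
  dataFor(chartId: string): ChartData {
    return this.latest[chartId] ?? [];
  }
}
```

The fetcher is injected so the store can be exercised without a running server; in a browser it would typically wrap a `fetch` call that posts the filter state.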

Currently, DataTable/DataGrid is not supported. It is not yet clear which approach would be optimal. We might go for the protocol that DataGrid uses or create a simple one for our use. Either way, it is workable. We will come back to it at a later stage.

To test it out, you additionally need to run the following for the server (which is an independent npm project):

$ cd server
$ npm ci                                       # only once
$ tsc && node dist/server.js

For the client, with grunt server running, go to http://localhost:8888/web/remote/remote.html

To test it out further, open the example in multiple browsers/tabs. All of these will share the same in-memory instance of the data. Verify that these do not interfere with each other.

@kum-deepak kum-deepak added this to the dc-v5 milestone Dec 14, 2020
@gordonwoodhull
Contributor

It would be great to have a server-side implementation of crossfilter-like interfaces - I don't think anyone has solved this generally, although lots of people have cobbled things together for their purposes.

The problem with using crossfilter itself on the server side is that it is very stateful - its efficiency relies on applying filters incrementally, so it will be slow to clear filters and apply different ones when there is more than one user hitting the server. Its basic design is single-user.

I'm sure it's fine for small examples, but it's exactly when the data gets big that you want a server-side solution.

A more realistic server might use Apache Arrow, Elasticsearch, Nanocubes, or in-memory SQL.

@kum-deepak
Collaborator Author

Wow! That was quick. It would be a great idea to use a more appropriate server-side solution.

In the current phase, I am trying to see whether the client side is ready to work with remote data optimally. I am already finding some gaps: handling errors gracefully, throttling (probably debounce), the ability to pause/resume updates (for example when restoring filters), and optionally blocking the UI while an update is in progress, among others.
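For the throttling gap, a generic trailing-edge debounce is one possible shape. This is a minimal sketch, not something that exists in dc; the `debounce` helper and the wait time are assumptions.

```typescript
// Collapses a burst of calls into one: only the last call in a quiet
// period of `waitMs` milliseconds actually reaches `fn`.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}
```

Wrapping the function that sends the filter state this way would mean that dragging a brush produces one request once the user pauses, rather than one request per mouse move.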

Once the client is adequately done, I expect the communication protocol will have stabilized. At that stage, I intend to build a sample server using Ruby on Rails and Elastic (what we mostly use). It will likely have a configuration system, translation from dc filter conventions to Elastic query conventions, querying, and packaging of results in dc conventions.
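To illustrate the translation step, here is a hedged sketch in TypeScript (for consistency with the PoC server; the planned server is Ruby). The `DcFilter` input shape and `toElasticQuery` are hypothetical. The output uses the Elasticsearch `bool`/`filter`, `terms`, and `range` query clauses, with the range taken as low-inclusive and high-exclusive to match dc's ranged filter.

```typescript
// Hypothetical wire shape for filters sent by the client.
type DcFilter =
  | { kind: "values"; field: string; values: Array<string | number> }
  | { kind: "range"; field: string; low: number; high: number };

type EsClause = Record<string, unknown>;

// Builds an Elasticsearch bool query whose filter clauses must all match.
function toElasticQuery(filters: DcFilter[]): { bool: { filter: EsClause[] } } {
  const clauses = filters.map((f): EsClause =>
    f.kind === "values"
      ? { terms: { [f.field]: f.values } }
      : { range: { [f.field]: { gte: f.low, lt: f.high } } }
  );
  return { bool: { filter: clauses } };
}
```

A real server would also need to leave out a chart's own dimension when computing that chart's aggregation, which this sketch does not attempt.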

I am planning to merge only dc related changes as of now. The current sample server may remain as a (possibly external) sample.

@gordonwoodhull
Contributor

Good plan.

Yeah, I didn't test it, but those sorts of problems are to be expected. Glad to help test / troubleshoot if you get stuck.

I should be a bit more responsive through the end of year since everyone at work is on vacation. (I always vacation earlier in the year so I can have this quiet time to get work done.)

@kum-deepak
Collaborator Author

I have a PoC server working with Ruby and Elastic. Please see https://github.com/kum-deepak/elastic-dc (branch develop). Basic running instructions are in the README.

The main config is https://github.com/kum-deepak/elastic-dc/blob/develop/src/conf.rb; it defines dimensions, charts, custom aggregations, and custom value accessors.

It currently implements everything needed for the stocks example, which may not be sufficient for other examples. However, the code quality and approach are reasonable. It uses Elastic bulk APIs and is fast. It currently does not use any framework other than Rack, but it should be usable from any framework (like Rails, Grape, etc.).

I can notice the following differences from the Crossfilter version:

I will add more technical notes soon.

@kum-deepak
Collaborator Author

It turns out that there is an easy way to tell Elastic to return empty buckets - https://www.elastic.co/guide/en/elasticsearch/reference/7.11/search-aggregations-bucket-terms-aggregation.html#_minimum_document_count_4 - setting it to 0 (default 1) works.
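As an illustration, a search body with that setting might be built like this. The `termsAggregation` helper and its parameters are hypothetical; `size`, `aggs`, `terms`, `field`, and `min_doc_count` are standard Elasticsearch request fields.

```typescript
// Builds a search body that returns only aggregation buckets (no hits).
// With includeEmpty = true, terms that match no documents under the
// current query still come back as buckets with doc_count 0.
function termsAggregation(name: string, field: string, includeEmpty: boolean) {
  return {
    size: 0,
    aggs: {
      [name]: {
        terms: { field, min_doc_count: includeEmpty ? 0 : 1 },
      },
    },
  };
}
```

Returning empty buckets keeps the result closer to crossfilter, where a group keeps its keys (with zero values) when other filters exclude all of its records.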

@gordonwoodhull
Contributor

Hi @kum-deepak. I am eager to try this, just buried in other projects for the next couple weeks.

(I know, it won’t take that long, but I’m a bit obsessive and can’t change course without losing momentum. I can respond to the question, however.)

I think the default sort for group.all() is what is called “natural order” throughout the documentation - alphabetical (sometimes called lexicographic) for strings, numeric for numbers.

If you find something else it’s usually an input bug such as a mixture of the two, or falsy values which can really screw things up.
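If a remote server returns buckets in some other order, the client or server could re-sort them to mimic that natural order. A minimal sketch, where `Bucket`, `naturalOrder`, and `sortLikeGroupAll` are hypothetical names and not crossfilter APIs:

```typescript
type Bucket = { key: string | number; value: number };

// Ascending comparison using the language's own < and > operators:
// lexicographic for strings, numeric for numbers.
function naturalOrder(a: Bucket, b: Bucket): number {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
}

// Returns a sorted copy and leaves the input untouched.
function sortLikeGroupAll(buckets: Bucket[]): Bucket[] {
  return [...buckets].sort(naturalOrder);
}
```

This only behaves well when all keys share one type; with a mixture of strings and numbers the comparisons are inconsistent, which is the kind of input bug mentioned above.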

If you find something else, please post an issue on the new repo and I will follow up.

Again, apologies for the delay in review. This is incredible news!

@kum-deepak
Collaborator Author

Thanks, @gordonwoodhull!

I have updated documentation in https://github.com/kum-deepak/elastic-dc/, have a look at it when you get time. No rush!

@kum-deepak
Collaborator Author

I am merging the changes to dc itself - the server and the sample will remain in this PR for now.

@kum-deepak
Collaborator Author

Rebased and updated the code to work with the latest in the dc-v5 branch. It now uses node16 modules, so it will only work with Node 16 and newer.

@kum-deepak kum-deepak modified the milestones: dc-v5, dc-next Mar 1, 2023
2 participants