
Unipop Elasticsearch plugin working only on 10000 documents #135

Open
gaurav6041 opened this issue Aug 28, 2018 · 30 comments

@gaurav6041 commented Aug 28, 2018

I have an index where one field holds the department the employee belongs to.
Below is my current Unipop Elasticsearch configuration:

{
  "class": "org.unipop.elastic.ElasticSourceProvider",
  "addresses": "http://192.168.10.121:6968",
  "vertices": [
    {
      "index": "employee_records",
      "id": "@department",
      "label": "hastags",
      "properties": {
        "value": "@department"
      }
    },
    {
      "index": "employee_records",
      "id": "@unique",
      "label": "unique",
      "properties": {
        "value": "@unique"
      }
    }
  ],
  "edges": [
    {
      "index": "employee_records",
      "id": "@_id",
      "label": "connects",
      "properties": {},
      "outVertex": {
        "ref": true,
        "id": "@unique",
        "label": "unique"
      },
      "inVertex": {
        "ref": true,
        "id": "@department",
        "label": "hastags"
      }
    }
  ]
}

And below are the results:
{v[Tech]=3159, v[Analysis]=3726, v[Admin]=2111, v[Support]=1003}

And here is my Gremlin query: g.V().outE('connects').inV().groupCount()
There are 65,000 employee records in total, but the result above sums to only 9,999.
In the configuration above I have mapped each employee's department to the employee id, so the department counts should add up to 65,000, yet they only add up to 9,999.

This means that Unipop is picking up only 10,000 documents from Elasticsearch.
This might be due to Elasticsearch's limit of fetching 10,000 documents in one request, with the scroll API not being used.
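The suspected behavior can be illustrated offline. The sketch below is a toy model, not Unipop's or Elasticsearch's API: `cappedFetch` and `scrollFetch` are hypothetical names, and the numbers simply mirror the report (65,000 documents, 10,000-hit per-request cap). A single capped request truncates everything past the cap, while scroll-style paging recovers the full count.

```java
// Toy model of the reported truncation: a single search request returns at
// most `maxLimit` hits, so aggregations built on one fetch undercount, while
// a scroll-style loop keeps paging until a partial page signals the end.
public class ScrollSketch {

    // Single request: everything past the per-request cap is silently dropped.
    static int cappedFetch(int totalDocs, int maxLimit) {
        return Math.min(totalDocs, maxLimit);
    }

    // Scroll-style paging: request pages of up to `maxLimit` hits and
    // aggregate them until a page comes back smaller than the cap.
    static int scrollFetch(int totalDocs, int maxLimit) {
        int fetched = 0;
        while (true) {
            int page = Math.min(totalDocs - fetched, maxLimit);
            fetched += page;
            if (page < maxLimit) break; // partial (or empty) page ends the scroll
        }
        return fetched;
    }

    public static void main(String[] args) {
        System.out.println("single request: " + cappedFetch(65000, 10000)); // 10000
        System.out.println("with scrolling: " + scrollFetch(65000, 10000)); // 65000
    }
}
```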

@randanman

@babiy8 @eyalmar100 didn't you recently add support for the scroll API?

@gaurav6041 (Author)

The problem of only 10,000 documents being fetched from Elasticsearch is still there. Can anyone help?

@randanman

@HenShalom is this bug happening to you as well?

@eyalmar100 (Collaborator)

Hi,
My name is Eyal, and I'm one of the developers of Unipop.
Could you send all the necessary details of your project to my email?
I will try to reproduce the issue and see why it's not working for you.
My email is eyalmar100@gmail.com

@eyalmar100 (Collaborator)

This is a long shot, but the DocumentController class has this field:
private final int maxLimit = 10000;
Try changing it and see what happens.

@seanbarzilay (Member) commented Sep 5, 2018 via email

@randanman

@eyalmar100 that's not a good idea, and not a scalable solution.
I know for a fact that you worked on solving this, because I sat with you and @babiy8 to solve the scroll API issues. What happened with that? Did you push the needed fixes?

@eyalmar100 (Collaborator)

I didn't know what this field refers to; now I know, after @seanbarzilay answered.
Sometimes it's good to have some remarks/documentation.
Anyway, we started to implement the scroll API but did not finish it yet (the scroll API is based on a 'polling' mechanism), and we have to think about how to interact with the client to make it work right.
We will finish it soon.

@seanbarzilay (Member)

@eyalmar100 If you want we can talk, before I left I had an idea on how to implement the scroll API.

@eyalmar100 (Collaborator)

@seanbarzilay - Yes, I will be happy to talk :)
My mail is eyalmar100@gmail.com
You can send me a message with your contact information (phone, etc.).
Thanks

@randanman

@eyalmar100 @babiy8 that's very unfortunate. I believe we wrote most of the code when I sat with you. You guys said you'd finish it and push it. What happened?

@gaurav6041 (Author)

Hi all,
Can you tell me when the fixes will be done?
I wish to explore it further and do performance testing on large datasets.

@eyalmar100 (Collaborator)


I'm exploring it, and hope it will be fixed during next week.

@gaurav6041 (Author)

Great, thanks.

@eyalmar100 (Collaborator)

Hi @gaurav6041,
So I checked the code; in the DocumentController class there is this method (please make sure you are using it):

private <E extends Element, S extends DocumentSchema> Iterator search(SearchQuery query, Map<S, QueryBuilder> schemas) {
    if (schemas.size() == 0) return EmptyIterator.instance();
    logger.debug("Preparing search. Schemas: {}", schemas);

    client.refresh();

    // Group the per-schema queries so that identical searches are executed only once.
    Map<Search, List<Pair<S, Search>>> groupedQueries = schemas.entrySet().parallelStream()
            .filter(Objects::nonNull)
            .map(kv -> createSearchBuilder(kv, query))
            .map(kv -> createSearch(kv, query))
            .collect(Collectors.groupingBy(Pair::getValue1));

    return groupedQueries.entrySet().stream().flatMap(entry -> {
        Search search = entry.getKey();
        List<S> searchSchemas = entry.getValue().stream().map(Pair::getValue0).collect(Collectors.toList());
        JestResult results = client.execute(search);
        if (results == null || !results.isSucceeded()) return Stream.empty();
        JsonElement scrollId = results.getJsonObject().get("_scroll_id");
        List<JsonElement> resultsList = new LinkedList<>();
        results.getJsonObject().get("hits").getAsJsonObject().get("hits").getAsJsonArray().forEach(resultsList::add);

        // A full first page means there may be more hits: keep scrolling until the
        // requested limit is reached (-1 means unlimited) or a partial page arrives.
        if (scrollId != null && resultsList.size() == maxLimit) {
            while (resultsList.size() < query.getLimit() || query.getLimit() == -1) {
                results = client.execute(new SearchScroll.Builder(scrollId.getAsString(), "6m").build());
                scrollId = results.getJsonObject().get("_scroll_id");
                JsonArray currentPage = results.getJsonObject().get("hits").getAsJsonObject().get("hits").getAsJsonArray();
                currentPage.forEach(resultsList::add);

                if (currentPage.size() != maxLimit)
                    break;
            }
        }
        return searchSchemas.stream().map(s -> s.parseResultsOptimized(resultsList, query));
    }).flatMap(Collection::stream).iterator();
}

It's in the current version.
I ran it on my dev machine (I changed maxLimit to 500 for a query that is supposed to return 800 records) and it worked.
Please try running a query that returns a smaller number of records (200 or so), change maxLimit to 30/40/50 or whatever, and see if you eventually get the right number of records (aggregate the iterations).
It should work.
Please let me know if it works for you, and then we'll continue.
Thanks
P.S.
Right now it's not the best solution, but it SHOULD return the right number of records.
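The suggested check (a small maxLimit against a query returning a couple hundred records, then aggregating the iterations) can be mirrored without Elasticsearch. Below is a hedged sketch with hypothetical names (`ScrollLoopCheck`, `page`, `search`) that reproduces the loop structure of the snippet above over an in-memory list: scroll only when the first page came back full, stop on a partial page or once the query limit (-1 meaning unlimited) is reached.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for the scroll loop above: pages of up to maxLimit hits are
// pulled from an in-memory "index" and aggregated until a partial page
// arrives, or until the query limit (-1 = unlimited) is crossed.
public class ScrollLoopCheck {

    // Returns one page of document ids starting at `from`, at most `maxLimit` long.
    static List<Integer> page(int totalDocs, int from, int maxLimit) {
        List<Integer> hits = new ArrayList<>();
        for (int i = from; i < Math.min(from + maxLimit, totalDocs); i++) hits.add(i);
        return hits;
    }

    // Mirrors the snippet: fetch the first page; if it came back full, keep
    // scrolling while under the limit (or forever, when limit == -1).
    static List<Integer> search(int totalDocs, int maxLimit, int limit) {
        List<Integer> results = new ArrayList<>(page(totalDocs, 0, maxLimit));
        if (results.size() == maxLimit) {
            while (results.size() < limit || limit == -1) {
                List<Integer> next = page(totalDocs, results.size(), maxLimit);
                results.addAll(next);
                if (next.size() != maxLimit) break; // partial page: scroll exhausted
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // 200 documents, per-request cap of 40, unlimited query: all 200 come back.
        System.out.println(search(200, 40, -1).size());  // 200
        // With limit 100 the loop stops after the page that crosses the limit.
        System.out.println(search(200, 40, 100).size()); // 120
    }
}
```

Note that with a finite limit the loop overshoots to a page boundary (120 instead of 100 here), which matches the snippet's behavior of checking the limit only between pages.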

@gaurav6041 (Author)

Hey @eyalmar100 - is it fixed, so that all the documents matching the query are picked up from Elasticsearch?
Actually, I am not a developer on this project and am not familiar with its code.

@eyalmar100 (Collaborator)

Hi,
Yes, it's working. I tested it a few times, and it seems to work correctly.

@gaurav6041 (Author)

@eyalmar100 - The problem is still there; it's still only fetching 10,000 documents.
On which branch did you update the code? I am using the master branch.

@eyalmar100 (Collaborator)

Hi,
Yes, it's on the master branch.
You can verify that you are using the correct code by opening the file I mentioned in my comment above, DocumentController.java, and checking for the code snippet.
Anyway, later on (today or by the end of the week) I'm going to add some console output to make sure you are synchronized with the correct code.
P.S.
You don't have to be a developer to open the file and read the code :)

@gaurav6041 (Author)

Hey @eyalmar100 - The last release was 0.2.1. Can you tell me how to install the unipop-elastic plugin not from a release but from the master branch, and also add a new release? That would make it easier for everyone else who uses this.

@eyalmar100 (Collaborator)

Hi @gaurav6041,
You can download all the files again (the same way you downloaded them before, using "Download ZIP" or git clone),
build the elastic module (mvn clean install), and after that copy the output jar (unipop-elastic-0.2.2-SNAPSHOT.jar) to your TinkerPop folder (../name_tinkerpop_folder/ext/unipop).

@gaurav6041 (Author) commented Oct 3, 2018

Hey @eyalmar100, thanks.
I mentioned this because, according to the Unipop wiki, a new plugin can be installed directly from the TinkerPop console with the command :install com.github.unipop-graph unipop-elastic 0.2.1
Anyway, I tried building the project, but I am getting the exception below:

Caused by: org.apache.maven.project.DependencyResolutionException: Could not resolve dependencies for project com.github.unipop-graph:unipop-elastic:jar:0.2.2-SNAPSHOT: Failed to collect dependencies for [pl.allegro.tech:embedded-elasticsearch:jar:2.1.0 (compile), org.elasticsearch.client:transport:jar:5.3.1 (compile), org.apache.logging.log4j:log4j-api:jar:2.9.1 (compile), org.elasticsearch.plugin:transport-netty4-client:jar:5.3.1 (compile), com.esotericsoftware.yamlbeans:yamlbeans:jar:1.09 (compile), com.googlecode.json-simple:json-simple:jar:1.1.1 (compile), com.github.unipop-graph:unipop-core:jar:0.2.2-SNAPSHOT (compile), org.elasticsearch:elasticsearch:jar:5.3.1 (compile), io.searchbox:jest:jar:5.3.3 (compile), org.json:json:jar:20090211 (compile)]
    at org.apache.maven.project.DefaultProjectDependenciesResolver.resolve(DefaultProjectDependenciesResolver.java:158)
    at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.getDependencies(LifecycleDependencyResolver.java:185)
    ... 22 more
Caused by: org.sonatype.aether.collection.DependencyCollectionException: Failed to collect dependencies for [pl.allegro.tech:embedded-elasticsearch:jar:2.1.0 (compile), org.elasticsearch.client:transport:jar:5.3.1 (compile), org.apache.logging.log4j:log4j-api:jar:2.9.1 (compile), org.elasticsearch.plugin:transport-netty4-client:jar:5.3.1 (compile), com.esotericsoftware.yamlbeans:yamlbeans:jar:1.09 (compile), com.googlecode.json-simple:json-simple:jar:1.1.1 (compile), com.github.unipop-graph:unipop-core:jar:0.2.2-SNAPSHOT (compile), org.elasticsearch:elasticsearch:jar:5.3.1 (compile), io.searchbox:jest:jar:5.3.3 (compile), org.json:json:jar:20090211 (compile)]
    at org.sonatype.aether.impl.internal.DefaultDependencyCollector.collectDependencies(DefaultDependencyCollector.java:258)
    at org.sonatype.aether.impl.internal.DefaultRepositorySystem.collectDependencies(DefaultRepositorySystem.java:308)
    at org.apache.maven.project.DefaultProjectDependenciesResolver.resolve(DefaultProjectDependenciesResolver.java:150)
    ... 23 more
Caused by: org.sonatype.aether.resolution.ArtifactDescriptorException: Failed to read artifact descriptor for com.github.unipop-graph:unipop-core:jar:0.2.2-SNAPSHOT
    at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.loadPom(DefaultArtifactDescriptorReader.java:331)
    at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.readArtifactDescriptor(DefaultArtifactDescriptorReader.java:186)
    at org.sonatype.aether.impl.internal.DefaultDependencyCollector.process(DefaultDependencyCollector.java:412)
    at org.sonatype.aether.impl.internal.DefaultDependencyCollector.collectDependencies(DefaultDependencyCollector.java:240)
    ... 25 more
Caused by: org.apache.maven.model.resolution.UnresolvableModelException: Could not find artifact com.github.unipop-graph:unipop:pom:0.2.2-SNAPSHOT
    at org.apache.maven.repository.internal.DefaultModelResolver.resolveModel(DefaultModelResolver.java:126)
    at org.apache.maven.model.building.DefaultModelBuilder.readParentExternally(DefaultModelBuilder.java:813)
    at org.apache.maven.model.building.DefaultModelBuilder.readParent(DefaultModelBuilder.java:664)
    at org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:310)
    at org.apache.maven.model.building.DefaultModelBuilder.build(DefaultModelBuilder.java:232)
    at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.loadPom(DefaultArtifactDescriptorReader.java:322)
    ... 28 more
Caused by: org.sonatype.aether.resolution.ArtifactResolutionException: Could not find artifact com.github.unipop-graph:unipop:pom:0.2.2-SNAPSHOT
    at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:538)
    at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:216)
    at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolveArtifact(DefaultArtifactResolver.java:193)
    at org.apache.maven.repository.internal.DefaultModelResolver.resolveModel(DefaultModelResolver.java:122)
    ... 33 more

Can you help me with this?

@randanman

@eyalmar100 you can publish a new version to maven, so users won't have to build it themselves.
Ask @seanbarzilay for help if needed.

@eyalmar100 (Collaborator)

OK, I will do it soon, thanks.

@gaurav6041 (Author)

@randanman @eyalmar100 - Thanks. Please let me know when you guys publish a new version.

@eyalmar100 (Collaborator)

Hi
Sure. I will update you soon :)

@gaurav6041 (Author) commented Oct 11, 2018

@randanman @eyalmar100 - Can you please deploy the new version that we talked about earlier?

@eyalmar100 (Collaborator)

Hi @gaurav6041,
Sorry for the delay; I had some problems.
I will deploy it next week (hopefully).
Thanks

@gaurav6041 (Author) commented Nov 6, 2018

@randanman @eyalmar100 - Can you please tell me when it will be deployed? I have tried building it on my own, but I am still getting some errors.

@eyalmar100 (Collaborator)

Hi @gaurav6041,
Sorry about the delay. I found a problem in the scroll API when running queries that return large amounts of data, so I fixed it. I am now testing it, and hopefully by the end of this week (maybe sooner, tomorrow or so) I will deploy it and leave you instructions on how to use it.
Thanks.
