Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(dql): add keyword "every" to DQL #8976

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

EnricoMi
Copy link

@EnricoMi EnricoMi commented Aug 25, 2023

This adds the keyword every to DQL, which picks equidistant nodes. This provides a deterministic sample of the result that is proportional in size. It can be combined with first, offset, and after. Pagination is applied before every.

This supersedes #8107

Docs PR: None

@dgraph-bot dgraph-bot added area/testing Testing related issues area/querylang Issues related to the query language specification and implementation. area/core internal mechanisms go Pull requests that update Go code labels Aug 25, 2023
Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating the PR. This looks good to me, I have just one minor comment. I am also thinking that we should add test where every and after are used together. Thanks

query/query.go Outdated Show resolved Hide resolved
@mangalaman93 mangalaman93 changed the title feat(DQL): Add keyword every to DQL feat(dql): add keyword "every" to DQL Aug 25, 2023
@EnricoMi EnricoMi force-pushed the branch-add-every-to-dql-2 branch 2 times, most recently from f93b115 to b3a2e9f Compare August 25, 2023 10:00
@EnricoMi
Copy link
Author

we should add test where every and after are used together.

Added in https://github.com/dgraph-io/dgraph/pull/8976/files#diff-99902c080bca80ed1ad52da34ea4041069d1c64a52317fc17f2285e8170ae6ddR1085

mangalaman93
mangalaman93 previously approved these changes Aug 25, 2023
Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor comment, LGTM otherwise. Is it possible for you to raise a corresponding docs PR on https://github.com/dgraph-io/dgraph-docs

query/query.go Outdated Show resolved Hide resolved
@EnricoMi
Copy link
Author

I'd love to run tests locally, but the instructions in CONTRIBUTING.md seem outdated and following the instructions from ci-dgraph-tests.yml don't work either:

cd t; ./t
testutil: "" localhost:9080 localhost:5080
...
Running tests for 121 packages.
Sent 1/121 packages for processing.
Bringing up cluster test-013-1 for package: ../dgraph/docker-compose.yml ...
CLUSTER UP: test-013-1. Package: ../dgraph/docker-compose.yml
Health for test-013-1_alpha6_1 failed: Get "http://localhost:49272/health": read tcp 127.0.0.1:35362->127.0.0.1:49272: read: connection reset by peer. Response: "". Retrying...
Health for test-013-1_alpha6_1 failed: Get "http://localhost:49272/health": dial tcp 127.0.0.1:49272: connect: connection refused. Response: "". Retrying...
Health for test-013-1_alpha6_1 failed: Get "http://localhost:49272/health": dial tcp 127.0.0.1:49272: connect: connection refused. Response: "". Retrying...
...

Running dgraph version in the dgraph/dgraph:local image returns:

dgraph: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by dgraph)
dgraph: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by dgraph)

@EnricoMi
Copy link
Author

EnricoMi commented Aug 25, 2023

I have reworked the notion of every in conjunction with first, after and offset. The every is applied before pagination, so that the result of every can be paginated, not the paginated result is every-ed.

This required to change the way every works. It now picks the last uid from every-sized batches, so that paginating with after works: 2e4f90f

@EnricoMi
Copy link
Author

I have managed to get tests working by setting contrib/Dockerfile to ubuntu:22.04 to match my dev machine.

Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better too. One question before it is ready to go.

query/query4_test.go Outdated Show resolved Hide resolved
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
harshil-goel
harshil-goel previously approved these changes Aug 25, 2023
@mangalaman93
Copy link
Contributor

@EnricoMi Could you look at the CI failures?

Also, I only have an incomplete version of this PR #8977. If you like, you could take over a complete it. It should be a very similar PR.

@EnricoMi
Copy link
Author

@mangalaman93 happy to finish #8977 once I understand the notion of having multiple sg.uidMatrix.

@EnricoMi
Copy link
Author

Tests should all pass now, will look into docs PR next.

@mangalaman93
Copy link
Contributor

@EnricoMi upgrade tests are failing but don't worry about those. They are failing because the older versions of Dgraph do not support "every" keyword. I will fix those.

}

// Re-merge the UID matrix.
sg.DestUIDs = algo.MergeSorted(sg.uidMatrix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also write unit test for applyEvery function?
I think you need to first MergeSort the entire uidMatrix and then take every ith element? Right now I don't think we will get the correct result.

@@ -988,6 +989,142 @@ func TestCascadeSubQuery1(t *testing.T) {
}`, js)
}

func TestHasEvery(t *testing.T) {
query := `{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this line as the first line of the test in each new test that you have added. This should fix the upgrade tests. Thanks

dgraphtest.ShouldSkipTest(t, "071c85b59fdf80ade88f4571e9cf8c867fae1f6a", dc.GetVersion())

@mangalaman93
Copy link
Contributor

@EnricoMi one more question, what is the use case of "every" word? Why can't we just use "random"?

@EnricoMi
Copy link
Author

@mangalaman93 the use case is to partition the result set (the entire graph) into equi-sized partitions with one query to get boundary uids that are then used with after to fetch results in parallel: https://discuss.dgraph.io/t/sample-your-result/9055/1 With the random key the size of partitions would not be as useful.

Using offset is not an option as it has O(N) time complexity, where after has constant time: https://discuss.dgraph.io/t/pagination-with-offset-should-scale-like-after/8577

Looking at the code, the every keyword seems of O(N) complexity as well. Maybe the sampling should happen remotely in worker.ProcessTaskOverNetwork to retrieve only the sample of uids (e.g. every 1,000,000th uid):

result, err := worker.ProcessTaskOverNetwork(ctx, taskQuery)

@mangalaman93
Copy link
Contributor

Looking at the code, the every keyword seems of O(N) complexity as well. Maybe the sampling should happen remotely in worker.ProcessTaskOverNetwork to retrieve only the sample of uids (e.g. every 1,000,000th uid):

that does seem like an interesting idea. I just wonder whether it is feasible to do given that results may get filtered out later sometimes. If you want, you can look into it. Thanks for the explanation for adding 'every' to the query language.

@EnricoMi
Copy link
Author

EnricoMi commented Sep 5, 2023

that does seem like an interesting idea

I may need some pointer to the code where that worker.ProcessTaskOverNetwork is being executed.

@EnricoMi
Copy link
Author

@mangalaman93 ping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core internal mechanisms area/querylang Issues related to the query language specification and implementation. area/testing Testing related issues go Pull requests that update Go code
Development

Successfully merging this pull request may close these issues.

None yet

4 participants