[How to use regexp in @filter]: <regexp use> #8996

Open
WSC998 opened this issue Sep 9, 2023 · 3 comments
Labels
kind/question Something requiring a response.

Comments

WSC998 commented Sep 9, 2023

Question.

I have about 50 million nodes. If I use `regexp` as the root function, like this:

    {
      resources(func: regexp(name, /abc/i)) @filter(eq(workspace_key, "def")) {
        id
        name
        resource_key
      }
    }

the query is fast and returns within 1 second. However, if I use it in the filter, like this:

    {
      resources(func: eq(workspace_key, "def")) @filter(regexp(name, /abc/i)) {
        id
        name
        resource_key
      }
    }

the query is unusable and times out. Most of the time I have to express a complex logical combination, so I cannot put the regexp in `func`; it has to go in `@filter`.
So I would really like to know whether the index is ignored when I use `regexp` in `@filter`.
Does anybody have a good idea for solving this problem?

@WSC998 WSC998 added the kind/question Something requiring a response. label Sep 9, 2023
@mangalaman93
Contributor

We don't have a query planner yet, so Dgraph may not do a good job of figuring out the order in which to execute the filters. Is it possible for you to share the data? It may still be worth looking into this.

@WSC998
Author

WSC998 commented Sep 11, 2023

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"strconv"
	"sync"

	"github.com/dgraph-io/dgo"
	"github.com/dgraph-io/dgo/protos/api"
	"google.golang.org/grpc"
)

// nodeExists reports whether a node with the given id is already stored.
func nodeExists(dg *dgo.Dgraph, id string) (bool, error) {
	query := fmt.Sprintf(`{ data(func: eq(id, "%s")) { uid } }`, id)
	resp, err := dg.NewTxn().Query(context.Background(), query)
	if err != nil {
		return false, err
	}
	var result struct {
		Data []struct {
			UID string `json:"uid"`
		} `json:"data"`
	}
	if err := json.Unmarshal(resp.Json, &result); err != nil {
		return false, err
	}
	return len(result.Data) > 0, nil
}

// saveNode creates one test node whose id and name are derived from j.
func saveNode(dg *dgo.Dgraph, j int) error {
	id := "test_text" + strconv.Itoa(j)
	name := "name" + id
	node := map[string]interface{}{
		"name":          name,
		"workspace_key": "_xGWV7",
		"create_by":     "xxw" + strconv.Itoa(j),
		"qa":            "wws",
		"id":            id,
	}

	ctx := context.Background()

	nodeJSON, err := json.Marshal(node)
	if err != nil {
		return err
	}

	mutation := &api.Mutation{
		CommitNow: true,
		SetJson:   nodeJSON,
	}

	if _, err = dg.NewTxn().Mutate(ctx, mutation); err != nil {
		return err
	}
	fmt.Printf("Created  node %s \n", id)
	return nil
}

// saveRelation links two existing nodes with a test_parent_to_child edge.
func saveRelation(dg *dgo.Dgraph, srcID, dstID string) error {
	addRelationQuery := `{ left as var(func: eq(id, "%s")) right as var(func: eq(id, "%s")) }`
	addQuads := fmt.Sprintf(`uid(left) <%s> uid(right) . `, "test_parent_to_child")
	addReq := &api.Request{
		CommitNow: true,
		Query:     fmt.Sprintf(addRelationQuery, srcID, dstID),
		Mutations: []*api.Mutation{
			{
				SetNquads: []byte(addQuads),
			},
		},
	}
	if _, err := dg.NewTxn().Do(context.Background(), addReq); err != nil {
		return err
	}
	return nil
}

func main() {
	conn, err := grpc.Dial("127.0.0.1:9080", grpc.WithInsecure())
	if err != nil {
		fmt.Println("Error connecting to Dgraph server:", err)
		return
	}
	defer conn.Close()

	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	concurrency := 100
	totalNodes := 1000000
	startIndex := 0
	nodeBatchSize := (totalNodes - startIndex) / concurrency

	var wg sync.WaitGroup
	wg.Add(concurrency)

	for i := 0; i < concurrency; i++ {
		go func(workerID int) {
			defer wg.Done()

			// Each worker writes its own contiguous range of node numbers.
			start := startIndex + workerID*nodeBatchSize
			end := start + nodeBatchSize
			if workerID == concurrency-1 {
				end = totalNodes
			}

			for j := start; j < end; j++ {
				//id := "wsc_test_text" + strconv.Itoa(j)
				//if isExist, _ := nodeExists(dg, id); !isExist {
				// Use a goroutine-local err to avoid racing on the outer variable.
				if err := saveNode(dg, j); err != nil {
					fmt.Println("Error saving node:", err)
					return
				}
				//}
			}
		}(i)
	}

	wg.Wait()
	fmt.Println("Node creation completed!")
}
```
This is my simple demo for creating the data; you can change `totalNodes` to decide how many nodes to generate.
I set the `workspace_key` predicate to use the `exact` index, and the others use `trigram` and `term`.
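
For readers who want to reproduce the index setup, the description above might correspond to a schema like the one this sketch prints. It rests on assumptions: the post only says "others", so `name` is assumed to be the predicate carrying `trigram` and `term`, all predicates are assumed to be strings, and `indexSpec` and `buildSchema` are hypothetical names. Applying the schema (for example through dgo's `Alter` call) is left out so the example runs without a cluster.

```go
package main

import (
	"fmt"
	"strings"
)

// indexSpec is a hypothetical pairing of a predicate with its index tokenizers.
type indexSpec struct {
	predicate  string
	tokenizers []string
}

// buildSchema renders one Dgraph schema line per predicate.
// Assumption: every predicate is of type string.
func buildSchema(specs []indexSpec) string {
	var b strings.Builder
	for _, s := range specs {
		fmt.Fprintf(&b, "%s: string @index(%s) .\n",
			s.predicate, strings.Join(s.tokenizers, ", "))
	}
	return b.String()
}

func main() {
	fmt.Print(buildSchema([]indexSpec{
		{"workspace_key", []string{"exact"}},
		// Assumed: "name" is one of the "others" indexed with trigram and term.
		{"name", []string{"trigram", "term"}},
	}))
}
```

The `trigram` tokenizer is the one `regexp` relies on, so it has to be present on `name` for either query shape in the question to work.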

@damonfeldman

Definitely a problem. Regexp in a filter should use the index. See also this discuss thread for the same issue: https://discuss.dgraph.io/t/how-to-use-regexp-in-filter/18901/8.

My guess is that when regexp was allowed in the filter, someone mistakenly or naively thought it would not need to be indexed. So it's a feature that you can post-filter with regexp, but it is misleading that it does not use the index. I suspect GraphQL will build this underlying (non-optimized) DQL structure under the hood, making this more impactful for those users.
