Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mixed index not used for equality checks if not all included property keys are used #3977

Open
FlorianHockmann opened this issue Sep 13, 2023 · 0 comments

Comments

@FlorianHockmann
Copy link
Member

  • Version: 0.6.3 / master (tried with the Docker image tag 1.0.0-20230910-132431.0663c29)
  • Storage Backend: Berkeley
  • Mixed Index Backend: Lucene
  • Link to discussed bug: Could not find a suitable index to answer graph query and graph scans are disabled #3973
  • Expected Behavior: The mixed index should be used.
  • Current Behavior: Mixed index is not used.
  • Steps to Reproduce: Create a mixed index with 2 property keys and try to query with just one, without using any predicates.

Detailed steps for reproduction

Create a minimal schema with two property keys and a mixed index which uses both property keys:

gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7dfa0a09
gremlin> mgmt.makePropertyKey('name').dataType(String.class).make()
==>name
gremlin> mgmt.makePropertyKey('age').dataType(Integer.class).make()
==>age
gremlin> mgmt.buildIndex('nameAndAge', Vertex.class).addKey(mgmt.getPropertyKey('name')).addKey(mgmt.getPropertyKey('age')).buildMixedIndex('search')
==>nameAndAge
gremlin> mgmt.commit()
==>null

Verify that the schema was successfully created (especially that the index is ENABLED):

gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@4a019f70
gremlin> mgmt.printSchema()
==>------------------------------------------------------------------------------------------------
Vertex Label Name              | Partitioned | Static                                             |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Edge Label Name                | Directed    | Unidirected | Multiplicity                         |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Property Key Name              | Cardinality | Data Type                                          |
---------------------------------------------------------------------------------------------------
name                           | SINGLE      | class java.lang.String                             |
age                            | SINGLE      | class java.lang.Integer                            |
---------------------------------------------------------------------------------------------------
Graph Index (Vertex)           | Type        | Unique    | Backing        | Key:           Status |
---------------------------------------------------------------------------------------------------
nameAndAge                     | Mixed       | false     | search         | name:         ENABLED |
                               |             |           |                | age:          ENABLED |
---------------------------------------------------------------------------------------------------
Graph Index (Edge)             | Type        | Unique    | Backing        | Key:           Status |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Relation Index (VCI)           | Type        | Direction | Sort Key       | Order    |     Status |
---------------------------------------------------------------------------------------------------

Search for vertices by using just one of the two property keys, without using any search predicates (so an equality check):

gremlin> g.V().has('name', 'test').profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[name.eq(test)])                                                             4.068   100.00
  constructGraphCentricQuery                                                                   0.175
  constructGraphCentricQuery                                                                   0.061
  GraphCentricQuery                                                                            3.666
    \_condition=(name = test)
    \_orders=[]
    \_isFitted=false
    \_isOrdered=true
    \_query=[]
    scan                                                                                       3.181
    \_query=[]
    \_fullscan=true
    \_condition=VERTEX
                                            >TOTAL                     -           -           4.068        -

As can be seen, this does not use an index and instead performs a full graph scan.

However, it works if both property keys are used:

gremlin> g.V().has('name', 'test').has('age', 12).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[name.eq(test), age.eq(12)])                                                41.993   100.00
  constructGraphCentricQuery                                                                   3.957
  GraphCentricQuery                                                                           32.800
    \_condition=(name = test AND age = 12)
    \_orders=[]
    \_isFitted=false
    \_isOrdered=true
    \_query=[(age = 12)]:nameAndAge
    \_index=nameAndAge
    \_index_impl=search
    backend-query                                                      0                      32.410
    \_query=nameAndAge:[(age = 12)]:nameAndAge
                                            >TOTAL                     -           -          41.993        -

or if a search predicate like textContains is used:

gremlin> g.V().has('name',textContains('test')).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[name.textContains(test)])                                                  17.027   100.00
  constructGraphCentricQuery                                                                   0.272
  GraphCentricQuery                                                                           16.520
    \_condition=(name textContains test)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=[(name textContains test)]:nameAndAge
    \_index=nameAndAge
    \_index_impl=search
    backend-query                                                      0                      16.433
    \_query=nameAndAge:[(name textContains test)]:nameAndAge
                                            >TOTAL                     -           -          17.027        -

This seems to be a bug as the Mixed Index docs mention that a) any combination of index property keys can be used and that b) equality checks are also supported:

Mixed indexes retrieve vertices or edges by any combination of previously added property keys. Mixed indexes provide more flexibility than composite indexes and support additional condition predicates beyond equality. On the other hand, mixed indexes are slower for most equality queries than composite indexes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant