perf(neo4j): improve neo4j query performance by using node labels #10415
base: master
@david-leifker @RyanHolstien
There are broken tests with ./gradlew :metadata-io:test
@RyanHolstien @david-leifker Could you please approve this? The build and the unit tests were failing because of a formatting issue, which I have now fixed.
@pashashaik-mms, as mentioned here, the error below occurs because the test cases here pass "sourceEntityFilter" as null. As a result, the variable "srcNodeLabel" in method "findRelatedEntities" is never set and keeps its default blank value, which produces the query below, which is not correct.
Please also handle the case where the variable "sourceEntityFilter" in method "findRelatedEntities" is null or empty.
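The null-or-empty handling requested above could look roughly like the following. This is a minimal sketch, not the code from the PR: the class `LabelledMatchSketch` and the helper `sourcePattern` are hypothetical names, and it assumes the label is spliced directly into the source node pattern.

```java
class LabelledMatchSketch {

  // Hypothetical helper (not part of the PR): builds the source node pattern.
  // When no label is known, the label is left out entirely so the pattern
  // stays valid instead of ending up with a blank label.
  static String sourcePattern(String srcNodeLabel) {
    if (srcNodeLabel == null || srcNodeLabel.isEmpty()) {
      return "(src {urn: $urn})";
    }
    return "(src:" + srcNodeLabel + " {urn: $urn})";
  }

  public static void main(String[] args) {
    // Null or empty filter: fall back to the unlabelled pattern.
    System.out.println(sourcePattern(null));
    System.out.println(sourcePattern(""));
    // Known entity type: restrict the match to that label.
    System.out.println(sourcePattern("dataset"));
  }
}
```

With this shape, a test that passes a null filter still gets a well-formed query, only without the label-based narrowing.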
Fixed now.
@deepgarg-visa Fixed now. Could you please check and approve it? It's been waiting a while now.
@pashashaik-mms are all the metadata-io test cases passing?
@deepgarg-visa I handled your scenario as well, and the tests now run fine. Could you please check? removeEdgesFromNode() might be in balance.
@@ -648,18 +666,34 @@ public void removeEdgesFromNode(

// build node label from entity type
final String srcNodeLabel = urn.getEntityType();
String matchTemplate = "";
This code can be refactored as below:
```java
final RelationshipDirection relationshipDirection = relationshipFilter.getDirection();
final String srcNodeLabel = urn.getEntityType();

// Add the label to the source node pattern only when one is available.
final String srcPattern =
    (srcNodeLabel != null && !srcNodeLabel.isEmpty())
        ? String.format("(src:%s {urn: $urn})", srcNodeLabel)
        : "(src {urn: $urn})";

// The "%s" after "r" is deliberately left unformatted here; it is presumably
// filled in later with the relationship type filter, not with the node label.
String matchTemplate = "MATCH " + srcPattern + "-[r%s]-(dest) RETURN type(r), dest, 2";
if (relationshipDirection == RelationshipDirection.INCOMING) {
  matchTemplate = "MATCH " + srcPattern + "<-[r%s]-(dest) RETURN type(r), dest, 0";
} else if (relationshipDirection == RelationshipDirection.OUTGOING) {
  matchTemplate = "MATCH " + srcPattern + "-[r%s]->(dest) RETURN type(r), dest, 1";
}
```
PR created and contributed by: MediamarktSaturn Technology GmbH, Analytics-Services Team. Special thanks to @raudzis for the finding and idea proposed.
PR Introduction:
This PR optimizes how DataHub queries Neo4j. Previously, our Neo4j queries did not specify node labels in the MATCH clause, so every node in the database was a candidate for the match. This was inefficient, especially for large datasets. By adding dynamic node labels to our match queries, we significantly improve query performance, because Neo4j can use its indexes more effectively.
Node Label Integration: Modified the Neo4j queries wherever applicable so that each query explicitly targets nodes with the specified label, reducing the search space and improving performance.
Performance: By applying node labels directly in our match clauses, the database engine can optimize node lookups using existing indexes, thus speeding up the query execution by reducing the number of nodes scanned.
Scalability: These improvements make our database queries more scalable, handling larger datasets more efficiently.
Maintainability: This change also enhances the clarity of our queries, making them more understandable at a glance, which benefits new contributors and maintainers alike.
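To illustrate the points above, here is a minimal sketch of the same lookup with and without a node label. The class `UrnLabelSketch` and the helper `entityTypeOf` are hypothetical stand-ins for illustration only (the PR itself derives the label with `urn.getEntityType()`), and the speed-up assumes an index exists on the `urn` property for the label in question.

```java
class UrnLabelSketch {

  // Hypothetical helper: extracts the entity type from a URN of the form
  // urn:li:<entityType>:<id>. The limit of 4 keeps colons inside the id intact.
  static String entityTypeOf(String urn) {
    String[] parts = urn.split(":", 4);
    if (parts.length < 4 || !parts[0].equals("urn") || !parts[1].equals("li")) {
      throw new IllegalArgumentException("Not a urn:li URN: " + urn);
    }
    return parts[2];
  }

  // Before: no label, so the match is not restricted to one kind of node.
  static String unlabelledMatch() {
    return "MATCH (src {urn: $urn}) RETURN src";
  }

  // After: the label narrows the match to nodes of a single entity type.
  static String labelledMatch(String urn) {
    return "MATCH (src:" + entityTypeOf(urn) + " {urn: $urn}) RETURN src";
  }

  public static void main(String[] args) {
    String urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)";
    System.out.println(unlabelledMatch());
    System.out.println(labelledMatch(urn));
  }
}
```

The URN value is still passed as the `$urn` query parameter in both variants; only the label is written into the query text.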
Checklist
[ ] Links to related issues (if applicable)
[ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
[ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub