Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fn is_inversed, equivalent to typeid(e) < 0 #5692

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

wey-gu
Copy link
Contributor

@wey-gu wey-gu commented Aug 26, 2023

this is handy to check if an edge is scanned
from its dst end or not.

What type of PR is this?

  • bug
  • feature
  • enhancement

What problem(s) does this PR solve?

Issue(s) number:

Description:

The same edge could be treated as two because we persist one instance of edge in two places, see:

(root@nebula) [basketballplayer]> 
MATCH (v1:player{name:"Tim Duncan"})-[e]-(v2:player{name:"Tony Parker"}) 
RETURN e
+----------------------------------------------------+
| e                                                  |
+----------------------------------------------------+
| [:follow "player100"->"player101" @0 {degree: 95}] |
| [:follow "player101"->"player100" @0 {degree: 95}] |
+----------------------------------------------------+
Got 2 rows (time spent 5.819ms/105.497167ms)

With is_inversed, we could detect those edges that are not expected in many cases.

How do you solve it?

Just yet another function returns bool: typeid(e) < 0.

Special notes for your reviewer, ex. impact of this fix, design document, etc:

Document is needed.

Checklist:

Tests:

  • Unit test(positive and negative cases)
  • Function test
  • Performance test
  • N/A

Affects:

  • Documentation affected (Please add the label if documentation needs to be modified.)
  • Incompatibility (If it breaks the compatibility, please describe it and add the label.)
  • If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Release notes:

Please confirm whether to be reflected in release notes and how to describe:

ex. Fixed the bug .....

@wey-gu wey-gu added the ready-for-testing PR: ready for the CI test label Aug 26, 2023
@wey-gu
Copy link
Contributor Author

wey-gu commented Aug 26, 2023

@yixinglu , any chance we could bypass this lint error, please?

.linters/cpp/cpplint.py src/common/function/FunctionManager.cpp 2>&1 | grep error
src/common/function/FunctionManager.cpp:2047:  Small and focused functions are preferred: FunctionManager::FunctionManager() has 1521 non-comment lines (error triggered by exceeding 500 lines).  [readability/fn_size] [2]

@yixinglu
Copy link
Contributor

@yixinglu , any chance we could bypass this lint error, please?

.linters/cpp/cpplint.py src/common/function/FunctionManager.cpp 2>&1 | grep error
src/common/function/FunctionManager.cpp:2047:  Small and focused functions are preferred: FunctionManager::FunctionManager() has 1521 non-comment lines (error triggered by exceeding 500 lines).  [readability/fn_size] [2]

Maybe you could bypass it by splitting the function into some small functions, or disabling the 'readability/fn_size' rule in linters/cpp/cpplint.py

@wey-gu wey-gu force-pushed the fn_is_inversed branch 6 times, most recently from 0e780fa to 8e56f27 Compare September 21, 2023 10:32
this is handy to check if an edge is scanned
from its dst end or not.
split the huge function
@wey-gu wey-gu added the doc affected PR: improvements or additions to documentation label Sep 22, 2023
yixinglu
yixinglu previously approved these changes Sep 22, 2023
Copy link
Contributor

@yixinglu yixinglu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wey-gu
Copy link
Contributor Author

wey-gu commented Sep 22, 2023

The failure now seems to be related to the instability of the test cases themself.

yixinglu
yixinglu previously approved these changes Sep 22, 2023
Copy link
Contributor

@czpmango czpmango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, the edge type id is an implementation level concept and cannot be used to simply indicate query semantics.

@czpmango
Copy link
Contributor

A possible approach is to use both the node id and the edge type id to determine the edge direction. The syntax might look like this:

MATCH (v1:A)-[e]-(v2:B) RETURN v1, is_source_of(v1, e) AS edge_direction, v2

@czpmango
Copy link
Contributor

Minor suggestion: It is better to commit the refactoring-related codes to another pr, which is also more review friendly.

@wey-gu
Copy link
Contributor Author

wey-gu commented Sep 27, 2023

IMHO, the edge type id is an implementation level concept and cannot be used to simply indicate query semantics.

A possible approach is to use both the node id and the edge type id to determine the edge direction. The syntax might look like this:

MATCH (v1:A)-[e]-(v2:B) RETURN v1, is_source_of(v1, e) AS edge_direction, v2

Yeah, agreed, mixing thing here is a little bit twisted, the way we treated e in MATCH (v1:player{name:"Tim Duncan"})-[e]-(v2:player{name:"Tony Parker"}) RETURN e as two instances, it's kind of implementation exposure twisted with graph semantics, too? So we need such functions as a mitigation on top of that implementation?

is_source_of(v, e) is better and is a more scrupulous semantic definition, but sometimes we just don't have the v in the pattern to be referred :(.

Maybe we could add both is_source_of(v_or_vid, e) and this function?

Maybe we come out with a name that's more scrupulous than is_inversed?

Minor suggestion: It is better to commit the refactoring-related codes to another pr, which is also more review-friendly.

Agreed! I'll create a separate PR for the refactor to be merged before this change!

@czpmango
Copy link
Contributor

czpmango commented Sep 27, 2023

is_source_of(v, e) is better and is a more scrupulous semantic definition, but sometimes we just don't have the v in the pattern to be referred :(.
Maybe we could add both is_source_of(v_or_vid, e) and this function?
Maybe we come out with a name that's more scrupulous than is_inversed?

I dont think you can judge the edge direction simply by the sign of edge type id.

@wey-gu
Copy link
Contributor Author

wey-gu commented Sep 27, 2023

is_source_of(v, e) is better and is a more scrupulous semantic definition, but sometimes we just don't have the v in the pattern to be referred :(.
Maybe we could add both is_source_of(v_or_vid, e) and this function?
Maybe we come out with a name that's more scrupulous than is_inversed?

I dont think you can judge the edge direction simply by the sign of edge type id.

This is how I may need this sugar, here

WITH map{`true`: "-[", `false`: "<-["} AS arrow_l,
     map{`true`: "]->", `false`: "]-"} AS arrow_r
MATCH (s)-[e:follow*..2]-() WHERE id(s) IN ["player100", "player101"]
  WITH id(s) AS subj, [rel in e | [
     arrow_l[tostring(typeid(rel) > 0)] +
        tostring(rel.degree)+
     arrow_r[tostring(typeid(rel) > 0)],
     CASE typeid(rel) > 0
        WHEN true THEN dst(rel)
        WHEN false THEN src(rel)
     END
     ]
  ] AS rels
  WITH
      subj,
      REDUCE(acc = collect(NULL), l in rels | acc + l) AS flattened_rels
RETURN
  subj,
  REDUCE(acc = subj,l in flattened_rels|acc + ', ' + l) AS flattened_rels

Where do I need to construct a knowledge sequence from path/edges, without getting the sign of typeid, there were some bad cases.

@czpmango
Copy link
Contributor

This is how I may need this sugar, here

WITH map{`true`: "-[", `false`: "<-["} AS arrow_l,
     map{`true`: "]->", `false`: "]-"} AS arrow_r
MATCH (s)-[e:follow*..2]-() WHERE id(s) IN ["player100", "player101"]
  WITH id(s) AS subj, [rel in e | [
     arrow_l[tostring(typeid(rel) > 0)] +
        tostring(rel.degree)+
     arrow_r[tostring(typeid(rel) > 0)],
     CASE typeid(rel) > 0
        WHEN true THEN dst(rel)
        WHEN false THEN src(rel)
     END
     ]
  ] AS rels
  WITH
      subj,
      REDUCE(acc = collect(NULL), l in rels | acc + l) AS flattened_rels
RETURN
  subj,
  REDUCE(acc = subj,l in flattened_rels|acc + ', ' + l) AS flattened_rels

Where do I need to construct a knowledge sequence from path/edges, without getting the sign of typeid, there were some bad cases.

How about that(probably be better performance):

MATCH p=(s)-[e:follow*..2]-() WHERE id(s) IN ["player100", "player101"]
RETURN id(s) AS subj, pathToString(p)
std::string pathToString(Path p, std::vector<std::string> propNames={}) {
   ...
}

@wey-gu
Copy link
Contributor Author

wey-gu commented Sep 27, 2023

This is how I may need this sugar, here

WITH map{`true`: "-[", `false`: "<-["} AS arrow_l,
     map{`true`: "]->", `false`: "]-"} AS arrow_r
MATCH (s)-[e:follow*..2]-() WHERE id(s) IN ["player100", "player101"]
  WITH id(s) AS subj, [rel in e | [
     arrow_l[tostring(typeid(rel) > 0)] +
        tostring(rel.degree)+
     arrow_r[tostring(typeid(rel) > 0)],
     CASE typeid(rel) > 0
        WHEN true THEN dst(rel)
        WHEN false THEN src(rel)
     END
     ]
  ] AS rels
  WITH
      subj,
      REDUCE(acc = collect(NULL), l in rels | acc + l) AS flattened_rels
RETURN
  subj,
  REDUCE(acc = subj,l in flattened_rels|acc + ', ' + l) AS flattened_rels

Where do I need to construct a knowledge sequence from path/edges, without getting the sign of typeid, there were some bad cases.

How about that(probably be better performance):

MATCH p=(s)-[e:follow*..2]-() WHERE id(s) IN ["player100", "player101"]
RETURN id(s) AS subj, pathToString(p)
std::string pathToString(Path p, std::vector<std::string> propNames={}) {
   ...
}

make sense!

@wey-gu
Copy link
Contributor Author

wey-gu commented Dec 11, 2023

@czpmango I think I need to implement pathToString! But for now, I still consider is_invsersed helpful, it's a self-explanation for our such behavior as again hit by #5779 :-D What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc affected PR: improvements or additions to documentation ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants