Support Nebula Graph #48

wey-gu · 2022-05-13T07:50:19Z

Add backend support for Nebula Graph, an open-source, distributed graph database.

Issue in Amundsen Repo: RFC/Feature: Nebula Graph as Backend Storage amundsen#1816
PR in Amundsen Repo: feat: Introduce Nebula Metadata Proxy and Databuilder amundsen#1817

Signed-off-by: wey-gu <weyl.gu@gmail.com>

mgorsk1 · 2022-05-16T18:04:07Z

Thanks for this RFC @wey-gu

After reading through this, I got the impression that this proposal is similar to adding the RDS or Atlas integration. These have their pluses, but a serious downside is keeping feature coverage aligned across different backends, which has not been great so far.

From what I've gathered from this and from reading through the Amundsen PR with the actual implementation, maybe there is an opportunity here: a refactor of the neo4j code that would let us use the same components (but differently configured) with both neo4j and nebula (as they both speak openCypher)? This way the nebula integration would always have exactly the same feature coverage as neo4j and be a nice OSS alternative to it.

wey-gu · 2022-05-17T10:22:50Z

Thanks @mgorsk1 for taking the time to look into the proposal!

Indeed, while implementing the reference PR for this proposal, I also saw that yet another backend storage increases the burden of introducing new features. I just told myself to keep an eye on all PRs after it's merged and port them to nebula through my own efforts.

However, as you pointed out, that doesn't scale at all, and this is very likely a good opportunity to build a cypher-based backend with some level of abstraction, so that code is shared where possible.

I will keep this context and purpose in mind and see what could be done in the refactor.

There are some challenges: nebula supports openCypher only as a dialect, so reusing the query strings themselves isn't directly possible (see here). However, the approach for each read function is similar, so finding a way to move the cypher-speaking DB implementation from code into configuration looks possible (and worth it).

Thanks.

wey-gu · 2022-05-24T11:40:24Z

Dear @mgorsk1,

After revisiting everything, I think the refactor that best reuses the cypher-based GDB code is to break the functions in a CypherAbstract proxy class into one or more pairs of steps, "querying GDB" and "postprocessing results", with the query string/template placed in a separate Python file (defaulting to the current neo4j ones) as a variable of the function.

I put some details below. What do you think? If it's good to go, I could then prepare a PoC implementation.

Thanks!

There is also an appendix at the end on the main differences between nebula and neo4j regarding cypher read queries and drivers.

CypherAbstract Proxy

Neo4j Proxy and Nebula Proxy will inherit from it, each with its own driver and other unique things.

Some things will be different compared to the current Neo4j proxy:

The read query string/template will be a variable of the functions, to enable more cypher-dialect-speaking backend proxies
- we will put cypher query templates in separate files
- we will use the current neo4j cypher queries as the default, with some format changes to make them easier to port to nebula (ideally with a translator); when the translator isn't enough, the query string can be overridden, too
Break the read-query functions into multiple steps to enable more reuse/override across different proxies
- step of executing the query
  - could be simply one or more self._execute_cypher_query() calls, or some reusable self._exec_col_query() ones
- step of postprocessing results
  - this new function enables most of the query-execution parts to be reused
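
The split described above could look roughly like the following. This is only a sketch under assumptions: the class names, the `queries` dictionary, the `table_name` template and the in-memory stand-in proxy are hypothetical and are not taken from the Amundsen code base or the PoC PR; only `_execute_cypher_query()` is a name used in this proposal. The Nebula template is illustrative and has not been run against a server.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

# Hypothetical query templates. In the proposal these would live in separate
# Python files, with the current Neo4j queries as the default.
NEO4J_QUERIES: Dict[str, str] = {
    "table_name": "MATCH (t:Table {key: $key}) RETURN t.name AS name",
}
NEBULA_QUERIES: Dict[str, str] = {
    "table_name": "MATCH (t:Table) WHERE id(t) == $key RETURN t.Table.name AS name",
}

Row = Dict[str, Any]


class CypherAbstractProxy(ABC):
    """Shared read logic; subclasses supply the driver and, if needed, queries."""

    queries: Dict[str, str] = NEO4J_QUERIES  # default templates

    @abstractmethod
    def _execute_cypher_query(self, statement: str, params: Dict[str, Any]) -> List[Row]:
        """Step 1: run the statement with the backend's own driver."""

    def _postprocess_table_name(self, rows: List[Row]) -> Optional[str]:
        """Step 2: turn raw rows into the value the caller wants."""
        return rows[0]["name"] if rows else None

    def get_table_name(self, key: str) -> Optional[str]:
        rows = self._execute_cypher_query(self.queries["table_name"], {"key": key})
        return self._postprocess_table_name(rows)


class InMemoryNebulaProxy(CypherAbstractProxy):
    """Stand-in for a Nebula proxy: records statements instead of using a driver."""

    queries = NEBULA_QUERIES  # override only the templates

    def __init__(self) -> None:
        self.executed: List[str] = []

    def _execute_cypher_query(self, statement: str, params: Dict[str, Any]) -> List[Row]:
        self.executed.append(statement)
        return [{"name": "users"}]  # canned row for illustration only
```

With this shape, `get_table_name()` and the postprocessing step are written once, and a backend only overrides the execution step and, where the dialects differ, the template.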

appendix

Major diffs for Query String:

key in properties vs. only in WHERE id() clause
key: $resource_key --> where id(n) == "foo"

- MATCH (f:foo {key: "foo_100"}) RETURN f
+ MATCH (f:foo) WHERE id(f) == "foo_100" RETURN f

RETURN/WITH key vs id()
foo.key --> id(foo)

- MATCH (f:foo) RETURN f.key
+ MATCH (f:foo) RETURN id(f)

The equal sign in WHERE Clause
a = b --> a == b

- MATCH (f:foo) WHERE 1 = 1 RETURN f
+ MATCH (f:foo) WHERE 1 == 1 RETURN f

Properties of a vertex need the tag/label name to be given explicitly
table.name --> table.Table.name

- MATCH (f:foo) RETURN f.name
+ MATCH (f:foo) RETURN f.foo.name

Some keywords need to be escaped with "`"

- MATCH (user:User) RETURN user
+ MATCH (`user`:`User`) RETURN `user`
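
As a rough idea of what the translator mentioned earlier might do, here is a minimal sketch that covers only the first three differences listed above, and only for queries shaped like the examples. The function name `to_nebula` and the regular expressions are assumptions for illustration; the sketch does not parse Cypher, does not add tag names to properties, does not escape keywords, and would also rewrite an `=` that appears inside a string literal after `WHERE`.

```python
import re

# (f:foo {key: "x"})  ->  (f:foo) WHERE id(f) == "x"
_KEY_IN_PATTERN = re.compile(r"\((\w+):(\w+)\s*\{\s*key:\s*([^}]+?)\s*\}\)")
# f.key  ->  id(f)
_KEY_PROPERTY = re.compile(r"\b(\w+)\.key\b")
# a lone "=" (not part of ==, <=, >=, !=, =~)
_SINGLE_EQUALS = re.compile(r"(?<![=<>!])=(?![=~])")
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def to_nebula(query: str) -> str:
    """Naively rewrite a simple Neo4j read query into the Nebula dialect."""
    key_patterns = _KEY_IN_PATTERN.findall(query)
    if key_patterns:
        # Merging into an existing WHERE, or handling several key patterns,
        # needs real parsing; this sketch refuses instead of guessing.
        if len(key_patterns) > 1 or _WHERE.search(query):
            raise ValueError("query shape not supported by this sketch")
        query = _KEY_IN_PATTERN.sub(r"(\1:\2) WHERE id(\1) == \3", query)

    # Only touch "=" after the first WHERE, so path assignments such as
    # "MATCH p = ..." before it are left alone.
    where = _WHERE.search(query)
    if where:
        head, tail = query[: where.end()], query[where.end():]
        query = head + _SINGLE_EQUALS.sub("==", tail)

    return _KEY_PROPERTY.sub(r"id(\1)", query)
```

Applied to the three "-" lines above, this yields the matching "+" lines; anything it cannot handle would fall back to an overridden query string, as proposed.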

Major diffs in Result

single() is not supported in Nebula
record['key_name'] vs record.get_value_by_key('key_name')
record['col']['col_type'] vs record.get_value_by_key('col').as_node().properties("Column")['col_type'] or record['row'][i_col]['Column.col_type'] *
- * Note: nebula supports both execute() and execute_json(); the PoC PR used execute_json(), which may not be a good idea
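
One way the shared postprocessing step could hide these access differences is a pair of small helpers like the ones below. This is a sketch under assumptions: `get_value`, `first_or_none` and `FakeNebulaRecord` are hypothetical names, and the fake record only mimics the `get_value_by_key()` accessor mentioned above rather than using a real driver. Unwrapping the returned value (for example with `.as_node()`, as in the comparison above) is left to the caller, and `first_or_none` is only a stand-in for the role `single()` plays, not a drop-in equivalent.

```python
from typing import Any, Iterable, Optional


def get_value(record: Any, key: str) -> Any:
    """Read one column from a record, whichever access style it offers."""
    getter = getattr(record, "get_value_by_key", None)
    if callable(getter):
        return getter(key)  # Nebula-style accessor
    return record[key]  # Neo4j-style subscript access


def first_or_none(records: Iterable[Any]) -> Optional[Any]:
    """Return the first record, or None when the result is empty."""
    return next(iter(records), None)


class FakeNebulaRecord:
    """Test double that only mimics the get_value_by_key() accessor."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def get_value_by_key(self, key: str) -> Any:
        return self._data[key]
```

For example, `get_value({"key_name": "a"}, "key_name")` and `get_value(FakeNebulaRecord({"key_name": "a"}), "key_name")` both go through the same call site.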

wey-gu · 2022-06-07T02:38:20Z

Dear @mgorsk1, what do you think, please?

PS. In case the refactoring is not feasible, I will try my best to maintain the nebula backend in my free time. I really like Amundsen and am willing to contribute more and more.

mgorsk1 · 2022-06-08T08:04:38Z

Hey @wey-gu, apologies for the late reply & thanks for the very detailed and thorough explanation. It's very much appreciated to have such an eager community member 🙇

I do think it'd be really useful to have Nebula as an alternative, especially given that lots of goodies in neo4j are only available in the commercial version.

As for generalizing more, it makes sense to have an abstract cypher proxy with separate implementations that take these subtle differences into account. Maintenance of the metadata proxy indeed should not be that much of a hassle. What's much more important in my view is whether we can reuse the GraphSerializable models for Nebula. Extractors, Publishers & Loaders are usually a one-time effort to prepare (unless you see a way to also reuse those). The pain starts when you need to catch up with GraphSerializable.

I would still love some input from @feng-tao @dkunitsk or anyone who was involved in the original neo4j implementation to see if this level of refactoring would be acceptable.

Golodhros · 2022-12-08T19:13:49Z

Hey @wey-gu, what is the cost of using this database?
If we agree to have this, would you be open to implement this?

wey-gu · 2022-12-09T02:18:18Z

Dear @Golodhros

Thanks❤!

what is the cost of using this database?

I think the costs (to users, if that is what we are talking about) are:

- NebulaGraph is schema-ful (a trade-off between flexibility and performance), so the mindset has to change when switching from neo4j. This could be covered by the data-builder for NebulaGraph (in the PoC I implemented checking the schema and making the needed changes, which could be refined though).
- NebulaGraph is distributed, and that has costs as well: it's not as lightweight as neo4j (single process), but we get HA, better perf/throughput, etc.
- There is no equivalent of cypher APOC in NebulaGraph for now (though a downstream user is willing to upstream similar capabilities in the near future).
- NebulaGraph is a relatively young project, but the community is quite active and it's gradually maturing. It's already trusted by quite a few teams.

If we agree to have this, would you be open to implement this?

Thanks so much, I'll be happy/honored to implement this.

BR//Wey

Golodhros · 2022-12-09T16:51:28Z

What do you think @feng-tao, @kristenarmes and @allisonsuarez ?

wey-gu requested a review from a team as a code owner May 13, 2022 07:50

wey-gu force-pushed the nebula branch from 328e923 to 43b3a14 Compare May 13, 2022 07:51

wey-gu mentioned this pull request May 13, 2022

RFC/Feature: Nebula Graph as Backend Storage amundsen-io/amundsen#1816

Closed

wey-gu force-pushed the nebula branch from 43b3a14 to 12c8b0e Compare May 13, 2022 08:26

Support Nebula Graph

fb25656

Signed-off-by: wey-gu <weyl.gu@gmail.com>

wey-gu force-pushed the nebula branch from 12c8b0e to fb25656 Compare May 13, 2022 13:12

Golodhros added the Status: Draft label Dec 8, 2022

wey-gu mentioned this pull request Feb 7, 2023

feat: Introduce Nebula Metadata Proxy and Databuilder amundsen-io/amundsen#1817

Closed

4 tasks
