Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling multiple labels where class inheritance is not appropriate #737

Open
alanbuxton opened this issue Aug 21, 2023 · 10 comments
Open

Comments

@alanbuxton
Copy link

This was initially raised as a question in the community forum: https://community.neo4j.com/t/how-to-handle-3-or-more-labels-in-neomodel/63767

Full details below:

I've got a Neo4J database where a node can have multiple labels.

As a toy example, imagine these nodes:

CREATE 
(a:Parent {name: "Foo"}), 
(b:Child1:Parent {name: "Bar"}), 
(c:Child2:Parent {name:"Baz"}), 
(d:Blue:Child1:Parent {name:"Boz"}), 
(e:Orange:Child1:Parent {name:"Qwe"}), 
(f:Blue:Child2:Parent {name:"Asd"}), 
(g:Orange:Child2:Parent {name: "Zxc"})

In neomodel I can define the following classes:

class Parent(StructuredNode):
    name = StringProperty()

class Child1(Parent):
    pass

class Child2(Parent):
    pass

Using this I can do things like:

Child1.nodes.filter(name="Bar")

and get a result because the only matching node happens to have Child1 and Parent1 labels. But if I do Child1.nodes.all() the results also include nodes that are labelled with Blue and Orange so I get the following error:

neomodel.exceptions.NodeClassNotDefined: Node with labels Parent,Blue,Child1 does not resolve to any of the known objects
Parent --> <class '__main__.Parent'>
Parent,Child1 --> <class '__main__.Child1'>
Parent,Child2 --> <class '__main__.Child2'>

I can't simply do:

class BlueChild1(Child1, Parent, Blue):
    pass 

class OrangeChild1(Child1, Parent, Orange):
    pass

etc because in these cases I'd end up with a requirement for BlueChild1 and OrangeChild1 as labels, which is not what I want in the database.

This may sound like a contrived example but I'm seeing it in a real use case where a node could be an Organization or a Person or both (or even have some other labels), but it's not the case that Person or Organization inherit from each other.

I investigated using __optional_labels__ but this is still problematic because the class name is always created as a label.

@alanbuxton
Copy link
Author

I've implemented a solution that works for my use case here: alanbuxton@cb51ba0

Happy to open a PR if you think helpful for the project

@mariusconjeaud
Copy link
Collaborator

This is interesting. I'll have to dive in more to fully understand the difference between this and using __abstract_node__ first, but otherwise it can be interesting I think.

On the other hand, my worry would be that it can get confusing between abstract_node, optional_labels, inheritance ?

@alanbuxton
Copy link
Author

If there is a neater solution using __abstract_node__ then that would be super cool, but my issue was in this kind of multiple inheritance situation which didn't seem a good fit.

@aanastasiou
Copy link
Collaborator

@alanbuxton A key (low level) detail in neomodel is that the labels are used to determine the class that a Node is "paired" with.

Consequently, the labels are determined by the modelling choices. Neomodel returns the correct set of results because given the class hierarchy you described, Blue and Orange ARE ChildX objects too (where X \in {1,2})

I would like to suggest two things:

  1. Can you please post a more realistic example of what corresponds to what given my explanation above? You mention "a node can have mulitple labels". Indeed, in Neomodel a node can have multiple labels but the labels have a specific meaning. Is your inheritance example mirroring how you have modelled your domain exactly or how you would like the labels to work?

  2. Specifically to your problem, I would suggest that you first determine which are the labels that should be used to determine the instantiated class (that is, the set of labels that uniquely identify an entity in your model) and which labels are secondary (that is, they exist for convenience or "speed" or whatever other need they cover in their particular application). You can then begin to manage, which classes would need renaming within your model and which classes would need additional labels defined using one of the available mechanisms.

    1. If your example reflects your modelling hierarchy, that is, if you expect Blue to inherit methods from both Parent::Child1 and Parent::Child2 then it would be interesting to see how we could work from both sides to find a "best practice" for this challenge within Neomodel. But before we start thinking about alternative object resolution algorithms, let's first make sure we understand what we are dealing with.

@alanbuxton
Copy link
Author

The data I've got in Neo4J was loaded from RDF via neosemantics. This tool gives all nodes a "Resource" label and can then create additional labels depending on their RDF "type". Each unique entity can have multiple types in RDF, and so can get multiple labels when loaded into Neo4J.

In some rare cases, a node is treated as both a Person and an Organization. In some others I've got labels for both a SiteAddedActivity (meaning an organization opened in a new geographical location) and a ProductAddedActivity (meaning an organization has launched a new product).

Here is a more realistic example showing the combined labels.

from neomodel import db, StructuredNode, StringProperty

cypher_creation = '''CREATE
(a:Resource:Person {name:"foo"}),
(b:Resource:Organization {name:"bar"}),
(c:Resource:Person:Organization {name:"baz"}),
(d:Resource:SiteAddedActivity {name:"qwe"}),
(e:Resource:ProductAddedActivity {name:"asd"}),
(f:Resource:ProductAddedActivity:SiteAddedActivity {name: "zxc"})
'''

class Resource(StructuredNode):
    name = StringProperty()

class Person(Resource):
    pass

class Organization(Resource):
    pass

class SiteAddedActivity(Resource):
    pass

class ProductAddedActivity(Resource):
    pass

db.cypher_query(cypher_creation)

With this data the following both work:

Person.nodes.get_or_none(name="foo")

and

Resource.nodes.get_or_none(name="foo")

both return:

<Person: {'name': 'foo', 'element_id_property': '4:7a3471f8-2e7f-4f99-8866-ffd9e0752d2a:36858'}>

But

Person.nodes.get_or_none(name="baz")

throws a neomodel.exceptions.NodeClassNotDefined: Node with labels Resource,Organization,Person does not resolve to any of the known objects

@mariusconjeaud
Copy link
Collaborator

This is super clear !

You mentioned in your first post that

I investigated using optional_labels but this is still problematic because the class name is always created as a label.

Here do you mean that your tested solution was something like this :

class Organization(Resource):
    __optional_labels__ = ["Person"]
    pass

... and the "class name that is always created as a label" is Organization, which you don't want ? I'm not sure I understand why this doesn't fit the purpose, so if you could explain :-)

@alanbuxton
Copy link
Author

alanbuxton commented Aug 28, 2023

Thanks for getting back so quickly @mariusconjeaud - really appreciate you looking into this.

In this solution, every time something that has both Organization and Person labels assigned it will be treated as an Organization. It means I can't have one model with logic related to a "Person" and a different model with logic related to an "Organization"., e.g. obviously I can't do:

class Organization(Resource):
    __optional_labels__ = ['Person']

class Person(Resource):
    __optional_labels__ = ['Organization']

Because the combo of "Organization" and "Person" is already taken

Really I want to be able to handle these cases. For a given node:

  • This node is a person (with its own business logic in python)
  • This node is an organization (with its own business logic in python)
  • This node is both an organization and a person

Thoughts?

alanbuxton added a commit to alanbuxton/syracuse-neo that referenced this issue Sep 6, 2023
@alanbuxton
Copy link
Author

Any updates on this issue at all?

I think the real solution would be if Organization.nodes.all() would know to be expanding cypher results to Organization objects rather than figuring out what class to insantiate based only on the labels.

@mariusconjeaud
Copy link
Collaborator

Hey @alanbuxton, sorry for letting that die down, I had less time to spend on neomodel recently, and most of it has been spent on the driver refactoring + py2neo becoming EOL for real - you'll hear more about this later.

I now clearly understand what you want, but will admit to being lost for now, as I haven't worked on the cypher projection to objects yet so I know little about it.

So to sump up, I still think this is important, I just did not have time, and it is not a trivial fix for me (yet)

@tkess
Copy link

tkess commented Dec 22, 2023

Not sure but think this is a related issue and wondering if it's a bug. In core.py the registry is created with this code;

def build_class_registry(cls):
base_label_set = frozenset(cls.inherited_labels())
optional_label_set = set(cls.inherited_optional_labels())

# Construct all possible combinations of labels + optional labels
possible_label_combinations = [
    frozenset(set(x).union(base_label_set))
    for i in range(1, len(optional_label_set) + 1)
    for x in combinations(optional_label_set, i)
]
possible_label_combinations.append(base_label_set)

for label_set in possible_label_combinations:
    if label_set not in db._NODE_CLASS_REGISTRY:
        db._NODE_CLASS_REGISTRY[label_set] = cls
    else:
        raise NodeClassAlreadyDefined(cls, db._NODE_CLASS_REGISTRY)

But if you look at it, it always doing a union. So in my case, I have 2 inherited classes...and as a result it's creating an index that looks like {"A","B"}, not {"A"}, {"B"} and {"A","B"} which is what I thought it was going to do.

Then here in util.py is where its used:

   if isinstance(object_to_resolve, Node):
        return self._NODE_CLASS_REGISTRY[
            frozenset(object_to_resolve.labels)
        ].inflate(object_to_resolve)

The issue is that is always receiving just one of the classes, the one that was used to create the query that needs to be inflated. So either {"A"} or {"B"}, but not {"A", "B"} and the call fails with cannot resolve error because neither {"A"} or {"B"} is in the _NODE_CLASS_REGISTRY. It seems to be happening with all my inherits.

Am I missing something? Was the intention to produce a list of all possible combinations, because I don't see how that code above accomplishes that, and as far as I see in my runs, frozen set only contains the multiple class index's.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants