Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow call get_hash in NodeEncoder #134

Open
Onandon11 opened this issue Jun 30, 2020 · 6 comments
Open

Slow call get_hash in NodeEncoder #134

Onandon11 opened this issue Jun 30, 2020 · 6 comments

Comments

@Onandon11
Copy link
Collaborator

Is your feature request related to a problem? Please describe.
In our project some graphs are time critical, when profiling the code the NodeEncoder arose as a bottleneck.

Describe the solution you'd like
Can we think of something else to serialize nodes without using the sha256 of the hashlib?

Describe alternatives you've considered
Maybe we can use the __repr__, this method describe the object in a single line.

Do you guys have alternatives?

@neuneck
Copy link
Collaborator

neuneck commented Jun 30, 2020

It should not be too hard to make the hashing function used there configurable. There is no specific reason I picked the sha256 except for it being quite common and available in the python standard library. To me, one advantage of the neat representations of flowpipe graphs is to easily tell when to graphs are the same (after execution), so a simple __repr__ which falls back to the in-memory location of python objects that didn't define it would not fit the bill. But it could make sense as a default, since every object supports it!

The NodeEncoder is used to provide something json compatible from generic python objects. A quick look at the code shows me that it should only get called in the Node.list_repr() or Graph.list_repr() methods. I might have missed something, though.

Can you maybe elaborate a little bit on the circumstance in which you find the slowdown from the NodeEncoder? Maybe this is a pointer to something else that needs improvement.

@Onandon11
Copy link
Collaborator Author

I agree on why you picked the sha256 and I also agree that this is a fine way because it indeed shows if objects are the same.

If it should only be called by Node.list_repr() and Graph.list_repr() than indeed something is going on. I inly use the function when I'm debugging, but thereafter I never call them again. However the NodeEncoder shows up many times in my profiler when running whitout these calls.

I will look why they are called and where :) I keep you updated!

@Onandon11
Copy link
Collaborator Author

So I profiled my code and saw that it's being called in:

def _update_value(self, value):
"""Update the internal value."""
old_hash = get_hash(self._value)
new_hash = get_hash(value)
self._value = value
if old_hash is None or new_hash is None or (old_hash != new_hash):
self.is_dirty = True

This function is called every time a value is updated. I don't think this is expected behavior isn't it?

@neuneck
Copy link
Collaborator

neuneck commented Jul 14, 2020

Oh, yes it is. This piece of logic makes sure that a plug/node stays clean if you set the same input again. Thay way, when you re-run the graph and only parts of the inputs change, you can simply set all of the values, but only the altered part of the graph re-runs.

Would a way to override the hash function used here help you out? Alternatively, I think you might be able to get a dirty quick fix by monkey-patching the utilities.get_hash function before importing anything else from fowpipe

import flowpipe.utilities
def null_hash(*args, **kwargs):
    return None
flowpipe.utilities.get_hash = null_hash

import flowpipe....

I haven't tried it though, it might not work at all.

@Onandon11
Copy link
Collaborator Author

Hmm, again, you're right :p I don't use it that frequently (updating an existing graph), so that's why I didn't thought about it I think.. Maybe it's due to my other issue #135, were I frequently re-create Graphs because they are inside Nodes.

@neuneck
Copy link
Collaborator

neuneck commented Jul 14, 2020

It would be easy to make that hash function a constant in the flowpipe.utilities package and make use of python's lazy evaluation for convenient overriding. I'm talking about

## flowpipe/utilitied.py
HASH_FUNC = hash_func
....
def hash_func(...):
    ...

and

## flowpipe/plug.py
import .utilities
...
    def _update_value(self, value):
        old_hash = utilities.HASH_FUNC(self.value)
        new_hash = utilities.HASH_FUNC(value)
...

This way, users like you, that are slowed down by this evaluation (or work with objects where the current hashing function doesn't work), can override the hash function conveniently, by just setting

flowpipe.utilities.HASH_FUNC = my_hash_func
... # rest of the code

Do you think that will help you? It feels pythonic to me, since it's easy to explain and explicit.
Dynamically generating graphs is fine, and should be fine, too. Not nesting them in nodes will not help with this issue, at least not as far as I can see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants