
The distributed example doesn't work #251

Open
nexon33 opened this issue Sep 15, 2022 · 5 comments

Comments

nexon33 commented Sep 15, 2022

The example code at https://github.com/CodeReclaimers/neat-python/blob/master/examples/xor/evolve-feedforward-distributed.py doesn't seem to work and I can't get it to work.

lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_ExtendedManager._get_manager_class.<locals>._EvaluatorSyncManager'

It would be really cool to run this on multiple devices and have it train a lot quicker
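For context, here is a minimal stdlib sketch (no neat-python code; the names are borrowed from the traceback purely for illustration) that reproduces the same class of error: the default pickler, which multiprocessing's ForkingPickler builds on, can't serialize a class defined inside a function.

```python
# Minimal sketch: pickling a class defined inside a function fails, which is
# the same limitation that hits _EvaluatorSyncManager inside _ExtendedManager.
import pickle


def _get_manager_class():
    # Local (nested) class, analogous to
    # _ExtendedManager._get_manager_class.<locals>._EvaluatorSyncManager
    class _EvaluatorSyncManager:
        pass

    return _EvaluatorSyncManager


try:
    pickle.dumps(_get_manager_class())
except Exception as exc:
    # Expected output, analogous to the traceback above:
    # AttributeError Can't pickle local object '_get_manager_class.<locals>._EvaluatorSyncManager'
    print(type(exc).__name__, exc)
```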

nexon33 commented Sep 15, 2022

I did get it to work, but indeed it's a bit unreliable, like the docs say.

I would be really excited to see this being picked up, though. :)

bennr01 commented Sep 15, 2022

Hi,

I am the guy who wrote neat.distributed a couple of years ago. The old version tried to utilize multiprocessing's distributed functionality, but (as you can see) it has some issues.
I rewrote the entire module ~5 years ago (see #125). The changes weren't merged and by now seem to cause some merge conflicts, but they should solve most of the problems. In case you want to check it out: my branch https://github.com/bennr01/neat-python/tree/distributed_socket contains all of the changes, but it's also somewhat outdated compared to the main repository.

Regarding performance: how much neat.distributed can improve performance varies greatly depending on your exact network evaluation function. For most "small" examples (e.g. xor), the cost of serializing and deserializing exceeds the cost of simply evaluating on a single device. If you are looking to improve evaluation speed for such examples, I'd recommend either using neat.parallel or PyPy instead (note: PyPy combined with neat.parallel is slower).
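For a small problem like xor, a neat.parallel setup might look roughly like the sketch below (loosely following examples/xor/evolve-feedforward-parallel.py; the 'config-feedforward' filename and the generation count are placeholders):

```python
# Rough neat.parallel sketch for the xor problem; assumes a NEAT config file
# named 'config-feedforward' sits next to this script.
import multiprocessing
import neat

XOR_INPUTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
XOR_OUTPUTS = [(0.0,), (1.0,), (1.0,), (0.0,)]


def eval_genome(genome, config):
    # ParallelEvaluator expects a function that returns the genome's fitness.
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    error = 0.0
    for xi, xo in zip(XOR_INPUTS, XOR_OUTPUTS):
        output = net.activate(xi)
        error += (output[0] - xo[0]) ** 2
    return 4.0 - error


if __name__ == '__main__':
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         'config-feedforward')
    pop = neat.Population(config)
    pe = neat.ParallelEvaluator(multiprocessing.cpu_count(), eval_genome)
    winner = pop.run(pe.evaluate, 300)
```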

nexon33 commented Sep 15, 2022

> Hi,
>
> I am the guy who wrote neat.distributed a couple of years ago. The old version tried to utilize multiprocessing's distributed functionality, but (as you can see) it has some issues.
> I rewrote the entire module ~5 years ago (see #125). The changes weren't merged and by now seem to cause some merge conflicts, but they should solve most of the problems. In case you want to check it out: my branch https://github.com/bennr01/neat-python/tree/distributed_socket contains all of the changes, but it's also somewhat outdated compared to the main repository.
>
> Regarding performance: how much neat.distributed can improve performance varies greatly depending on your exact network evaluation function. For most "small" examples (e.g. xor), the cost of serializing and deserializing exceeds the cost of simply evaluating on a single device. If you are looking to improve evaluation speed for such examples, I'd recommend either using neat.parallel or PyPy instead (note: PyPy combined with neat.parallel is slower).

In fact, I'm trying to optimize a function that takes a few minutes to complete, so running this distributed would really speed things up; I wouldn't need to wait half an hour for one generation. Is PyPy in combination with distributed computing advised?

I will try and take a look at the code tomorrow.

nexon33 commented Sep 16, 2022

I'm having trouble merging the repositories, as I've almost never done that before. The main problem is figuring out how to merge this so I can start selecting which code should stay and which shouldn't.

Is there any other way I can contact you?

bennr01 commented Sep 16, 2022

> In fact, I'm trying to optimize a function that takes a few minutes to complete, so running this distributed would really speed things up; I wouldn't need to wait half an hour for one generation. Is PyPy in combination with distributed computing advised?

I haven't tested it. In theory it should work as long as you set num_workers=1 on each secondary node and manually start a PyPy process for each core on each secondary node. This is because, IIRC, PyPy loses a lot of its performance benefits when using multiprocessing.Pool, although this may depend on the exact use case and may have changed in the last couple of years. Running a separate PyPy process for each core may allow you to circumvent this.
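A secondary-node script set up that way might look roughly like the sketch below (the constructor arguments mirror the ones used in the distributed xor example; host, port and authkey are placeholders, and eval_genome stands in for your expensive evaluation function). You would then start one such process per core, e.g. four pypy invocations on a 4-core machine.

```python
# Rough sketch of a secondary node with num_workers=1; start one of these
# processes per core on each secondary machine. Host, port and authkey are
# placeholders and must match whatever the primary node uses.
import neat
from neat.distributed import DistributedEvaluator, MODE_SECONDARY


def eval_genome(genome, config):
    # Stand-in for the expensive per-genome evaluation (the one that takes
    # a few minutes per call).
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    fitness = 0.0
    # ... evaluate net here ...
    return fitness


if __name__ == '__main__':
    de = DistributedEvaluator(
        ('primary-host', 8022),    # address of the primary node
        authkey=b'whatever',       # must match the primary's authkey
        eval_function=eval_genome,
        num_workers=1,             # one worker per process, as suggested above
        mode=MODE_SECONDARY,
    )
    de.start(exit_on_stop=True)    # blocks and serves evaluation requests
```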

> I'm having trouble merging the repositories, as I've almost never done that before. The main problem is figuring out how to merge this so I can start selecting which code should stay and which shouldn't.

For anyone else reading this: I've responded to a separate issue in my fork here.
