
The distributed example doesn't work #251

Open
nexon33 opened this issue Sep 15, 2022 · 5 comments

Comments

nexon33 commented Sep 15, 2022

The example code at https://github.com/CodeReclaimers/neat-python/blob/master/examples/xor/evolve-feedforward-distributed.py doesn't seem to work and I can't get it to work.

lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_ExtendedManager._get_manager_class.<locals>._EvaluatorSyncManager'

It would be really cool to run this on multiple devices and have it train a lot quicker
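For context, here is a minimal stdlib sketch (no neat-python code; the names are borrowed from the traceback purely for illustration) that reproduces the same class of error: the default pickler, which multiprocessing's ForkingPickler builds on, can't serialize a class defined inside a function.

```python
# Minimal sketch: pickling a class defined inside a function fails, which is
# the same limitation that hits _EvaluatorSyncManager inside _ExtendedManager.
import pickle


def _get_manager_class():
    # Local (nested) class, analogous to
    # _ExtendedManager._get_manager_class.<locals>._EvaluatorSyncManager
    class _EvaluatorSyncManager:
        pass

    return _EvaluatorSyncManager


try:
    pickle.dumps(_get_manager_class())
except Exception as exc:
    # Expected output, analogous to the traceback above:
    # AttributeError Can't pickle local object '_get_manager_class.<locals>._EvaluatorSyncManager'
    print(type(exc).__name__, exc)
```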

nexon33 commented Sep 15, 2022

I did get it to work, but indeed it's a bit unreliable, like the docs say.

I would be really excited to see this being picked up, though. :)

bennr01 commented Sep 15, 2022

Hi,

I am the guy who wrote neat.distributed a couple of years ago. The old version tried to utilize multiprocessing's distributed functionality, but (as you can see) it has some issues.
I rewrote the entire module ~5 years ago (see #125). The changes weren't merged and by now seem to cause some merge conflicts, but they should solve most of the problems. In case you want to check it out: my branch https://github.com/bennr01/neat-python/tree/distributed_socket contains all of the changes, but it's also somewhat outdated compared to the main repository.

Regarding performance: how much neat.distributed can improve performance varies greatly depending on your exact network evaluation function. For most "small" examples (e.g. xor), the cost of serializing and deserializing exceeds the cost of simply evaluating on a single device. If you are looking to improve evaluation speed for such examples, I'd recommend either using neat.parallel or PyPy instead (note: PyPy combined with neat.parallel is slower).
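For a small problem like xor, a neat.parallel setup might look roughly like the sketch below (loosely following examples/xor/evolve-feedforward-parallel.py; the 'config-feedforward' filename and the generation count are placeholders):

```python
# Rough neat.parallel sketch for the xor problem; assumes a NEAT config file
# named 'config-feedforward' sits next to this script.
import multiprocessing
import neat

XOR_INPUTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
XOR_OUTPUTS = [(0.0,), (1.0,), (1.0,), (0.0,)]


def eval_genome(genome, config):
    # ParallelEvaluator expects a function that returns the genome's fitness.
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    error = 0.0
    for xi, xo in zip(XOR_INPUTS, XOR_OUTPUTS):
        output = net.activate(xi)
        error += (output[0] - xo[0]) ** 2
    return 4.0 - error


if __name__ == '__main__':
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         'config-feedforward')
    pop = neat.Population(config)
    pe = neat.ParallelEvaluator(multiprocessing.cpu_count(), eval_genome)
    winner = pop.run(pe.evaluate, 300)
```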

nexon33 commented Sep 15, 2022

> Hi,
>
> I am the guy who wrote neat.distributed a couple of years ago. The old version tried to utilize multiprocessing's distributed functionality, but (as you can see) it has some issues.
> I rewrote the entire module ~5 years ago (see #125). The changes weren't merged and by now seem to cause some merge conflicts, but they should solve most of the problems. In case you want to check it out: my branch https://github.com/bennr01/neat-python/tree/distributed_socket contains all of the changes, but it's also somewhat outdated compared to the main repository.
>
> Regarding performance: how much neat.distributed can improve performance varies greatly depending on your exact network evaluation function. For most "small" examples (e.g. xor), the cost of serializing and deserializing exceeds the cost of simply evaluating on a single device. If you are looking to improve evaluation speed for such examples, I'd recommend either using neat.parallel or PyPy instead (note: PyPy combined with neat.parallel is slower).

In fact, I'm trying to optimize a function that takes a few minutes to complete, so running this distributed would really speed things up; I wouldn't need to wait half an hour for one generation. Is PyPy in combination with distributed computing advised?

I will try and take a look at the code tomorrow.

nexon33 commented Sep 16, 2022

I'm having trouble merging the repositories, as I've almost never done that before. The main problem is figuring out how to merge this so I can start selecting which code should stay and which shouldn't.

Is there any other way I can contact you?

bennr01 commented Sep 16, 2022

> In fact, I'm trying to optimize a function that takes a few minutes to complete, so running this distributed would really speed things up; I wouldn't need to wait half an hour for one generation. Is PyPy in combination with distributed computing advised?

I haven't tested it. In theory it should work as long as you set num_workers=1 on each secondary node and manually start a PyPy process for each core on each secondary node. This is because, IIRC, PyPy loses a lot of its performance benefits when using multiprocessing.Pool, although this may depend on the exact use case and may have changed in the last couple of years. Running a separate PyPy process for each core may allow you to circumvent this.
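A secondary-node script set up that way might look roughly like the sketch below (the constructor arguments mirror the ones used in the distributed xor example; host, port and authkey are placeholders, and eval_genome stands in for your expensive evaluation function). You would then start one such process per core, e.g. four pypy invocations on a 4-core machine.

```python
# Rough sketch of a secondary node with num_workers=1; start one of these
# processes per core on each secondary machine. Host, port and authkey are
# placeholders and must match whatever the primary node uses.
import neat
from neat.distributed import DistributedEvaluator, MODE_SECONDARY


def eval_genome(genome, config):
    # Stand-in for the expensive per-genome evaluation (the one that takes
    # a few minutes per call).
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    fitness = 0.0
    # ... evaluate net here ...
    return fitness


if __name__ == '__main__':
    de = DistributedEvaluator(
        ('primary-host', 8022),    # address of the primary node
        authkey=b'whatever',       # must match the primary's authkey
        eval_function=eval_genome,
        num_workers=1,             # one worker per process, as suggested above
        mode=MODE_SECONDARY,
    )
    de.start(exit_on_stop=True)    # blocks and serves evaluation requests
```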

> I'm having trouble merging the repositories, as I've almost never done that before. The main problem is figuring out how to merge this so I can start selecting which code should stay and which shouldn't.

For anyone else reading this: I've responded to a separate issue in my fork here.
