How to control the max length of a genome and several other issues? #158

Open
Y1fanHE opened this issue Oct 12, 2021 · 3 comments

@Y1fanHE
Contributor

Y1fanHE commented Oct 12, 2021

Hello @erp12 @lspector,

Thank you very much for your work on PyshGP!
I am doing a research project on PushGP and am trying to use this Python library.

  • It seems there is no parameter controlling the maximum genome length (as opposed to the initial genome length).
    In Clojush, this parameter appears to be called max-points.

  • In pyshgp/push/stack.py, the comment at line 48 says "# Collection sizes and string lengths are bounded to avoid utilizing too many resources." However, in pyshgp/push/types.py, is_collection=True is not set for PushStrType. This results in a MemoryError when I try some string-related tasks, such as "small or large".

  • I would like to ask how long it typically takes you to run experiments on the general program synthesis benchmark problems. Running 1000 x 300 evaluations takes me a long time (about 2.5 h for "small or large" with parallelism on 6 cores).
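For context on the second point, here is a minimal sketch of how a size cap that only applies to collection types behaves. It assumes a simplified stack: SketchPushType, SketchPushStack, CollectionSizeExceeded, and collection_size_cap are hypothetical names, not pyshgp's actual classes, and raising an exception on overflow is an assumption (an interpreter could instead skip the push).

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class SketchPushType:
    """Hypothetical stand-in for a Push type; not pyshgp's PushType."""
    name: str
    is_collection: bool = False


class CollectionSizeExceeded(Exception):
    """Hypothetical error raised when a collection value exceeds the cap."""


@dataclass
class SketchPushStack:
    """Hypothetical stack illustrating a collection size check on push."""
    push_type: SketchPushType
    collection_size_cap: int = 1000
    items: List[Any] = field(default_factory=list)

    def push(self, value: Any) -> None:
        # The cap is only checked for types flagged as collections, so a
        # str type with is_collection=False is never length-checked.
        if self.push_type.is_collection and len(value) > self.collection_size_cap:
            raise CollectionSizeExceeded(
                f"{self.push_type.name} value of length {len(value)} "
                f"exceeds cap {self.collection_size_cap}"
            )
        self.items.append(value)


unbounded_str = SketchPushStack(SketchPushType("str", is_collection=False), collection_size_cap=10)
bounded_str = SketchPushStack(SketchPushType("str", is_collection=True), collection_size_cap=10)

long_string = "ab" * 50  # length 100, well over the cap of 10
unbounded_str.push(long_string)  # accepted: no length check happens

try:
    bounded_str.push(long_string)
except CollectionSizeExceeded as err:
    print("rejected:", err)
```

Because the check is gated on is_collection, a string type that is not flagged as a collection can keep growing, which matches the unbounded string growth described in this bullet.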

@erp12
Owner

erp12 commented Oct 13, 2021

Thanks so much for these bug reports @Y1fanHE!

> It seems there is no parameter controlling the maximum genome length (as opposed to the initial genome length).
> In Clojush, this parameter appears to be called max-points.

Good catch. To address this, we would need to do a few things.

  1. Decide how the user should configure the max genome size. In Clojush, the max-points parameter is confusing because it has multiple meanings. In pyshgp, we should either add an additional parameter to the SearchConfiguration class or maybe tweak the existing initial_genome_size parameter to apply to all genomes.

  2. Implement truncation of child genomes after variation operators are applied. I would need to think a bit more about the best way to make this change.
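As a rough illustration of the first option, a separate cap could sit next to the initial size range and be validated against it. This is a sketch only: GenomeSizeConfig, max_genome_size, and truncate_child are hypothetical names, the default values are illustrative, and keeping the first max_genome_size genes is just one possible truncation policy.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class GenomeSizeConfig:
    """Hypothetical config fragment; field names and defaults are illustrative."""
    initial_genome_size: Tuple[int, int] = (20, 100)  # (min, max) at initialization
    max_genome_size: int = 200  # hard cap applied to every child genome

    def __post_init__(self) -> None:
        low, high = self.initial_genome_size
        if not 0 <= low <= high:
            raise ValueError("initial_genome_size must satisfy 0 <= min <= max")
        # A separate cap lets genomes grow past their initial size, but it
        # should not be smaller than the largest possible initial genome.
        if self.max_genome_size < high:
            raise ValueError("max_genome_size must be >= the initial maximum")


def truncate_child(child: Sequence[T], config: GenomeSizeConfig) -> Tuple[T, ...]:
    """Keep only the first max_genome_size genes of a child genome."""
    return tuple(child[: config.max_genome_size])


cfg = GenomeSizeConfig(initial_genome_size=(2, 4), max_genome_size=5)
print(truncate_child(list(range(8)), cfg))  # prints (0, 1, 2, 3, 4)
```

Another policy would be to discard an oversized child and return a copy of one of its parents instead.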

If you are interested in working on a PR for these changes, I would be happy to assist. Otherwise, I can add it to my list of things to do for this project and try to get around to them at some point.

> In pyshgp/push/stack.py, the comment at line 48 says "# Collection sizes and string lengths are bounded to avoid utilizing too many resources." However, in pyshgp/push/types.py, is_collection=True is not set for PushStrType. This results in a MemoryError when I try some string-related tasks, such as "small or large".

Another good catch! I see you have already opened PR #159 to fix this issue. Thank you! We can discuss this further there.

> I would like to ask how long it typically takes you to run experiments on the general program synthesis benchmark problems. Running 1000 x 300 evaluations takes me a long time (about 2.5 h for "small or large" with parallelism on 6 cores).

I don't have any data on this at the moment. One of the things on the backlog of this project is to add a script for launching multiple runs of common benchmark problems and gathering statistics such as runtime and solution rates. This is a larger task that isn't well defined yet, but I am happy to offer assistance if you feel motivated to take it on.
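A script like that might boil down to a loop that times repeated runs and records whether each one found a solution. The sketch below is hypothetical: run_once stands in for whatever function performs a single PushGP run with a given seed, and the summary keys are arbitrary.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(run_once: Callable[[int], bool], n_runs: int) -> Dict[str, float]:
    """Call run_once(seed) for seeds 0..n_runs-1; summarize runtime and success.

    run_once is a hypothetical callable performing one full PushGP run and
    returning True if a solution was found.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    solved: List[bool] = []
    seconds: List[float] = []
    for seed in range(n_runs):
        start = time.perf_counter()
        solved.append(bool(run_once(seed)))
        seconds.append(time.perf_counter() - start)
    return {
        "runs": n_runs,
        "solution_rate": sum(solved) / n_runs,
        "mean_seconds": statistics.mean(seconds),
        "median_seconds": statistics.median(seconds),
    }


if __name__ == "__main__":
    # Toy stand-in for a real run: "solves" the problem on even seeds only.
    print(benchmark(lambda seed: seed % 2 == 0, n_runs=10))
```

Runs execute sequentially here; they could be spread across processes (for example with concurrent.futures.ProcessPoolExecutor), as long as run_once is a module-level function so it can be pickled.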

@Y1fanHE
Contributor Author

Y1fanHE commented Oct 14, 2021

Hello @erp12, thank you for the reply.

Yes, I would like to work on the max genome size changes.

> In pyshgp, we should either add an additional parameter to the SearchConfiguration class or maybe tweak the existing initial_genome_size parameter to apply to all genomes.

I think this could be an additional parameter, since in GP, genomes can usually grow longer than their initial length.

Regarding the bug report and the test case, I have already left some comments on PR #159.

> One of the things on the backlog of this project is to add a script for launching multiple runs of common benchmark problems and gathering statistics such as runtime and solution rates. This is a larger task that isn't well defined yet, but I am happy to offer assistance if you feel motivated to take it on.

I think this is a good idea. Since I will eventually run experiments using pyshgp, I could simply provide the script I use.

Also, I saw that the GECCO paper on this library lists implementing the stack in another language as future work. Do you plan to do this soon? If not, I could try, although I have no prior experience with it.

@erp12
Owner

erp12 commented Oct 17, 2021

> Yes, I would like to work on the max genome size changes.
> I think this could be an additional parameter, since in GP, genomes can usually grow longer than their initial length.

That's great! I think your decision to use an additional parameter is good.

To guide your work on this feature, I think the best place to implement the limit is in the VariationOperator base class so that all concrete implementations of VariationOperator will produce child genomes that are within the limit.
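One way to read this suggestion is the template-method pattern: the base class owns the public produce method, delegates the actual variation to subclasses, and truncates the result in one place. The class names, the _produce hook, and representing a genome as a list of strings are assumptions for illustration, not pyshgp's existing API.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Sequence

# Hypothetical genome representation: a list of gene labels.
Genome = List[str]


class SizeLimitedVariationOperator(ABC):
    """Hypothetical base class: subclasses supply variation logic in _produce,
    and the base class applies the genome size limit to every child."""

    def __init__(self, num_parents: int, max_genome_size: int) -> None:
        if max_genome_size < 1:
            raise ValueError("max_genome_size must be positive")
        self.num_parents = num_parents
        self.max_genome_size = max_genome_size

    def produce(self, parents: Sequence[Genome], rng: random.Random) -> Genome:
        if len(parents) != self.num_parents:
            raise ValueError(f"expected {self.num_parents} parents, got {len(parents)}")
        child = self._produce(parents, rng)
        # The limit is enforced here, once, so no subclass can bypass it.
        return child[: self.max_genome_size]

    @abstractmethod
    def _produce(self, parents: Sequence[Genome], rng: random.Random) -> Genome:
        """Operator-specific variation logic; may return an oversized child."""


class Concatenation(SizeLimitedVariationOperator):
    """Toy two-parent operator whose children easily exceed the limit."""

    def __init__(self, max_genome_size: int) -> None:
        super().__init__(num_parents=2, max_genome_size=max_genome_size)

    def _produce(self, parents: Sequence[Genome], rng: random.Random) -> Genome:
        # rng is unused by this deterministic toy operator.
        return list(parents[0]) + list(parents[1])


op = Concatenation(max_genome_size=4)
print(op.produce([["a", "b", "c"], ["d", "e"]], random.Random(0)))  # prints ['a', 'b', 'c', 'd']
```

Since concrete operators only override _produce, a new operator cannot forget to apply the limit.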


> I think this is a good idea. Since I will eventually run experiments using pyshgp, I could simply provide the script I use.

Sounds good. If you think your script is generic enough to be usable by a wider set of users and you would like to take the time to contribute it to the project, feel free to open a PR and we can discuss further.


> Also, I saw that the GECCO paper on this library lists implementing the stack in another language as future work. Do you plan to do this soon? If not, I could try, although I have no prior experience with it.

A few years ago I made an attempt to implement the stacks as a lightweight C++ data structure. Unfortunately, I was not knowledgeable enough about how to interface between Python and C++ without sacrificing either performance or flexibility. I am also concerned about the added complexity in the build and distribution processes once the project moves to a multi-language codebase.

I am happy to consider an architecture for a faster Push interpreter, including implementations in other languages, if you have a specific proposal in mind.

Other ways to get faster runtimes could be to run pyshgp on PyPy, or to compile the project to C using mypyc. I have not tried either of these technologies yet.
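If someone does try PyPy, a quick first check is to time the same pure-Python workload under each interpreter. The workload here is a hypothetical push/pop loop, not pyshgp code, so it only hints at how the interpreters might compare on stack-heavy code; the file name bench.py is also just an example.

```python
import platform
import timeit


def toy_stack_workload(n: int = 10_000) -> int:
    """Hypothetical push/pop loop standing in for Push interpreter work."""
    stack = []
    total = 0
    for i in range(n):
        stack.append(i)
        if len(stack) > 2:
            # Combine the top two items, like a binary Push instruction.
            a = stack.pop()
            b = stack.pop()
            stack.append(a + b)
        total += len(stack)
    return total


def time_workload(repeats: int = 5, number: int = 20) -> float:
    """Best-of-`repeats` wall time in seconds for `number` workload calls."""
    return min(timeit.repeat(toy_stack_workload, repeat=repeats, number=number))


if __name__ == "__main__":
    # Run this same file under CPython and PyPy (for example
    # `python bench.py` and `pypy3 bench.py`) and compare the times.
    print(platform.python_implementation(), f"{time_workload():.4f} s")
```

Taking the minimum of several repeats reduces noise from other processes; a real comparison would still need to time actual pyshgp runs.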

Lastly, I should mention that a single PushGP run on the typical benchmark problems used in the literature is known to take hours, regardless of implementation language. A recent paper about a new set of benchmark problems (similar to "small-or-large") informally measured runtime and found that most problems required an average runtime of over 5 hours on modest hardware.
