Replies: 2 comments 1 reply
-
I can see where this might be applicable -- say a staged, or cascaded, or convolutional set of transformations where the previous stage is fed into the next stage. That may not be exactly what you are trying to do, but the effect is the same: the overall complexity does not reduce unless the intermediate result has some plausible meaning.

I am working on a similar problem where the intermediate would be model1 (which can be checked for plausibility) and the final model2 can incorporate model1. This is usually decomposed as a data-flow model, where the model1 output is piped into the model2 stage. So what I do is run model1 in a mode where it minimizes the error against what the intermediate result is expected to be (a plausible result). Then, for model2, I start with the model1 results, which are further modified: symbolic transformations (mainly sin/cos trig functions) are applied to model1 until the error to the target is minimized. When that is done, I can see how much model1 deviated from the expected intermediate result. The important point is that the complexity of model1 doesn't matter as much as the complexity of model2, since model1 serves more as an emulator to get close to the intermediate result, which is then parsimoniously transformed to the target. It's almost as if you need a parametric representation of the intermediate results so that slight phase shifts and amplitude adjustments can be made.

This is one of those scenarios that happen all the time -- you may have an intermediate result, but you want it emulated because you don't know if it's the exact input the next stage is operating on. Or it comes up any time only a short calibration of the intermediate stage is available and you need it extrapolated over a much wider range, so you may want to backtrack in history or make future predictions; in medicine, it may be inverse tomography. I tried asking ChatGPT-4 if it could figure out any clever way to do this in a continuously iterative fashion using PySR.
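For concreteness, here is a minimal sketch of that data-flow arrangement, assuming PySR's scikit-learn-style `PySRRegressor`. The arrays and the intermediate target are placeholders, not my actual setup:

```python
import numpy as np
from pysr import PySRRegressor

# Placeholder data: raw inputs, a plausible intermediate target, and the final target.
X = np.random.randn(200, 5)
z_intermediate = np.random.randn(200)
y_final = np.random.randn(200)

# Stage 1: the "emulator", fit so its output stays close to the plausible intermediate.
model1 = PySRRegressor(niterations=40, binary_operators=["+", "-", "*", "/"])
model1.fit(X, z_intermediate)
z_hat = model1.predict(X)

# Stage 2: a parsimonious symbolic transformation (mainly trig) of the stage-1 output.
model2 = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*"],
    unary_operators=["sin", "cos"],
)
model2.fit(z_hat.reshape(-1, 1), y_final)

# Afterwards, check how far the emulator drifted from the expected intermediate result.
print("stage-1 deviation (MSE):", np.mean((z_hat - z_intermediate) ** 2))
```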
-
Hi @gm89uk,

Thanks very much for your post -- I am delighted to hear you are trying things out! Very exciting to hear about your use-case; I'm happy to help more.
You raise some interesting ideas! PySR does some of these: https://arxiv.org/abs/2305.01582. In particular, the algorithm actually re-introduces the current hall of fame back into the populations at a regular interval. The "migration" parameters control this behavior: https://astroautomata.com/PySR/api/#migration-between-populations. However, these migrations happen at a regular interval rather than based on some heuristic; I think that's an interesting idea. Other things to note that you might try tuning: https://astroautomata.com/PySR/api/#working-with-complexities
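As a rough sketch, these knobs live directly on `PySRRegressor`. The parameter names follow the linked API pages but may differ slightly between PySR versions, and the values below are illustrative rather than recommendations:

```python
from pysr import PySRRegressor

model = PySRRegressor(
    # Migration between populations:
    populations=30,               # number of separate populations
    migration=True,               # allow members to migrate between populations
    hof_migration=True,           # re-introduce hall-of-fame members into populations
    fraction_replaced=0.0005,     # fraction of a population replaced by migrants
    fraction_replaced_hof=0.035,  # fraction replaced by hall-of-fame members
    # Working with complexities:
    maxsize=30,                                    # maximum allowed equation complexity
    parsimony=0.001,                               # penalty per unit of complexity
    warmup_maxsize_by=0.5,                         # grow the allowed size over the first half of the run
    complexity_of_operators={"sin": 2, "cos": 2},  # make particular operators "cost" more
)
```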
The defaults are a reasonable starting point, and there are a few other parameters there you could try tuning as well. Hope this helps a bit. Please follow up with any other questions; happy to help! Best, Miles
-
Thank you Miles for sharing this amazing package.
I am an ophthalmologist who is very interested in using this to improve the accuracy of our outcomes in various types of refractive surgery. I am not a programmer and know just enough Python to get by and run this. So far I have had excellent results, surpassing those of optimised XGBoost models on test databases. As we are dealing with outcomes, accuracy is more important than simplicity (while keeping it generalisable).
I wanted to share some observations and a potential way to improve the PySR search strategy to get more rapid reductions in loss with simpler equations.
In my example I have eight features, so I increased maxsize to 60. After much experimentation, and after running various algorithms on different databases for a couple of days each (6-core Ryzen 7 laptop CPU), I found that there are often diminishing returns after a certain maxsize; the loss plotted against maxsize often appears to follow an exponential decay:
The loss drops very slowly beyond this point (obviously problem-specific, and I understand there is no convergence as such).
My strategy is to take the equation at maxsize / 2 (so in the above example, complexity 30) and use it to generate a new feature, x8.
I then generate a new y variable, y1 = y - x8, which represents the error left over from the equation at size 30. I run PySR again with maxsize = previous_maxsize / 2 (so 60 / 2 = 30), so the total maxsize remains the same as before. The new y variable is y1, with the same x features as the original run.
What happens is that there is a rapid reduction in loss at lower complexity, and new expressions are found quickly. I ran both the original code in IPython locally and the new PySR run on y1 in Google Colab simultaneously. Despite being significantly slower, Google Colab managed to find new expressions very quickly at lower complexity that improve the original equation. Inputting the error as y, rather than as an additional x feature, prevents PySR from using the previous expression as a variable, which would lead to horribly increased complexity and a lack of interpretability. This happens reliably every time I've tried it.
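For reference, here is a rough sketch of the two-run procedure. The dataset is a placeholder, and I'm relying on the `equations_` DataFrame and the `index` argument of `predict` as I understand them from the PySR docs, so treat the details as approximate:

```python
import numpy as np
from pysr import PySRRegressor

X = np.random.randn(500, 8)   # placeholder for the real 8-feature dataset
y = np.random.randn(500)      # placeholder for the real target

# Run 1: the full-size search.
model1 = PySRRegressor(maxsize=60, niterations=100)
model1.fit(X, y)

# Pick the hall-of-fame equation at (roughly) half the maxsize.
eqs = model1.equations_                    # Pareto front as a pandas DataFrame
row = eqs[eqs.complexity <= 30].iloc[-1]   # most complex expression with complexity <= 30
y_hat = model1.predict(X, index=row.name)  # predict with that specific equation

# Run 2: fit the leftover error with the remaining complexity budget.
y1 = y - y_hat
model2 = PySRRegressor(maxsize=30, niterations=100)
model2.fit(X, y1)
```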
I have attached some example screenshots:
This is my IPython run; it has been running for approximately 3 days.
You can see that at complexity = 30, the loss (MSE) is stuck at 1.160e-01 and hasn't budged in the last day, while at complexity = 60 it has very slowly reduced.
The loss units are identical and comparable between the first and second PySR runs, as the new y is the residual error of the first run.
Here is a new run in a Jupyter notebook using the process above.
At the start of model2, at complexity 1, the loss is basically the same as what we had with model1 at complexity maxsize / 2 (i.e. 30).
However, after running model2 for just 5 minutes, we have a rapid reduction in loss with low complexity:
You can see that in model2 at complexity = 4 (i.e. a total true complexity of 30 + 4 = 34), we now have a loss of 1.086e-01. If we compare this to complexity 34 in the first run, 1.125e-01, which had remained stagnant for over a day, we achieved a lower loss in a few minutes.
Therefore, my new equation would be model1 (complexity 30) + model2 (complexity 4), with a reduced loss and a new expression the model could explore.
I have manually calculated the losses of model1 and of model1 + model2 myself to confirm they are comparable.
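Continuing the variables from the sketch above, the manual check is just a few lines of numpy (note that `predict()` without an index uses model2's best-scoring equation; an `index` could be passed, as above, to pin it to complexity 4):

```python
pred2 = model2.predict(X)          # model2's correction term
combined = y_hat + pred2           # model1 (complexity 30) + model2
print("model1 alone    MSE:", np.mean((y - y_hat) ** 2))
print("model1 + model2 MSE:", np.mean((y - combined) ** 2))
```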
Unfortunately, I don't have a means of feeding the new equations found in model2 back to model1; I cannot add them to the hall of fame or let PySR know of the new expressions so that it can play with, mutate, and cross over them to improve the existing model1.
My suggestion would be to somehow add an option that permits PySR to perform this process iteratively above a certain threshold of 'stagnation', feed the new low-complexity terms that reduce the loss back into the original model's equations, and let the mutations and crossovers help from there; a new term of complexity 34 may help at complexity 60, for example. Model2 only needs to run for a few minutes to find new expressions and 'mix things up' for model1.
I thought of trying to do this myself in Python -- generating the new variables and setting up the new model automatically -- but if you know of a means to feed expressions back to model1 and resume, it would be much appreciated.
What would be ideal is to be able to add simple expressions from model2 to model1 (for example at maxsize / 2, complexity 30) and still permit the equation to be modified and improved further within the model1 run.
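In case it's useful, this is roughly what I had in mind as a do-it-yourself loop. It is only a sketch: each stage restarts an independent PySR run on the current residual, rather than feeding expressions back into model1's populations, which is the part I don't know how to do:

```python
import numpy as np
from pysr import PySRRegressor

def staged_fit(X, y, total_maxsize=60, stages=3, niterations=100):
    """Repeatedly fit the leftover error with a shrinking complexity budget.

    A stand-in for the feature request above: nothing is injected back into the
    first model's hall of fame; each stage is an independent run on the residual.
    """
    residual = y.copy()
    budget = total_maxsize // 2
    models = []
    for stage in range(stages):
        model = PySRRegressor(maxsize=budget, niterations=niterations)
        model.fit(X, residual)
        pred = model.predict(X)
        print(f"stage {stage}: residual MSE = {np.mean((residual - pred) ** 2):.4e}")
        residual = residual - pred
        models.append(model)
        budget = max(budget // 2, 5)   # give later stages a smaller budget
    return models
```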
I apologise for the long-winded explanation; I hope it made sense!
Here is my code for reference.