
Simplifying constants in result or during search #128

Open
johanbluecreek opened this issue Sep 10, 2022 · 3 comments
Labels
feature (importance: low) Low-importance feature

Comments

@johanbluecreek (Contributor)

I have used this package in a workflow that is basically: solve a differential equation (DE) using DifferentialEquations.jl, then give that solution to SymbolicRegression.jl to find the analytical expression. From the DE it is quite obvious that the resulting expression should contain only integers or rationals. The constant optimization carried out on the expressions gets very close, but it would be nice if there were either:

  • a function that can, post-search, fix numerical constants to some exact numerical constant, or
  • an option for the search that would act with some such function, as described above, on the candidate expressions during the search

Is there such functionality already in SymbolicRegression.jl that I have missed, or would it be useful to have?

A simple example would be a solution like 5.0000000002 * x1 being rounded to 5.0 * x1; more advanced would be rationals, e.g. 0.200000002 * x2 fixed to x2 / 5.0 or similar; more advanced still, sin(x1 * 0.31830988618454) fixed to sin(x1 / π). SymPy has a function nsimplify that handles the numerical part of such functionality and works quite well, e.g.

>>> from sympy import nsimplify, GoldenRatio, pi, E
>>> nsimplify(0.865255979442955, [GoldenRatio, pi, E], full=True)
E/pi

but I don't know if there is a Julia package that does the same thing.
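For the purely rational part, Base's `rationalize` already goes a long way without any extra package. A minimal sketch of an nsimplify-style "snapper" (the helper name, tolerance, denominator cap, and basis of known constants are all hypothetical choices, not an existing API):

```julia
# Hypothetical post-hoc snapper: try to replace a fitted float constant
# with a small rational times a known constant, nsimplify-style.
# Uses only Base.rationalize; all names/defaults here are made up.
function snap_constant(c; tol=1e-8, max_den=100,
                       basis=["" => 1.0, "π" => float(π), "1/π" => 1 / float(π)])
    for (name, k) in basis
        r = rationalize(c / k; tol=tol)
        # Accept only "simple" rationals; otherwise try the next basis element.
        if denominator(r) <= max_den
            return (r, name)  # i.e. c ≈ r * k
        end
    end
    return nothing  # no simple form found within tolerance
end

snap_constant(5.0000000002)      # (5//1, "")
snap_constant(0.200000002)       # (1//5, "")
snap_constant(0.31830988618454)  # (1//1, "1/π"), i.e. the constant is 1/π
```

After snapping, one would substitute `r * k` back into the expression and re-check the loss to make sure accuracy did not degrade.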

@MilesCranmer (Owner) commented Sep 10, 2022

Interesting question. I see two options:

1. Post-process expressions.

You could use SymbolicRegression.jl like an nsimplify on steroids. For example, let's say we want to find a Ramanujan-like approximation to pi. To do this, you pass the integers as features and set complexity_of_constants=100, which makes real-valued constants too complex to use.

options = Options(
    # Make constants prohibitively expensive:
    complexity_of_constants=100, 
    unary_operators=(sqrt, square), 
    binary_operators=(+, *, /, -), 
    mutationWeights=[0.0, 0.47, 0.79, 5.1, 1.7, 0.0020, 0.00023, 0.21],
    # ^ Set p(mutate_constant)=0.0
    shouldOptimizeConstants=false,
    # ^ Set constant optimization off (so we don't waste cycles)
    parsimony=0.001,
)
# A single data point whose 10 features are the integers 1 to 10:
X = reshape(collect(1.0:10.0), 10, 1)
y = [float(pi)]

EquationSearch(X, y; options=options, multithreading=true, niterations=1000)

This gives me as output:
[screenshot: search output listing several symbolic approximations of π]
which is actually pretty good, recovering many approximations of pi! https://en.wikipedia.org/wiki/Approximations_of_%CF%80 (I wonder if some of these are even known...?)

You could do something similar for any other constants you wish to include. You could also set varMap so that the printout shows the constants by name in such a case.

2. Search directly for integer/rationals.

One option is to basically apply the solution 1., but use it for the search itself. (i.e., concatenate the constants/integers with the data features, at each row)
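Concretely, "concatenating the constants with the data features" could look like the following sketch (shapes assume the features × rows layout used in the example above; the variable names are just for illustration):

```julia
# Fold known constants into the data as extra "features", so the search
# can use them exactly like variables:
X = rand(3, 50)                                  # 3 real features, 50 rows
consts = [float(π), float(ℯ)]                    # constants to expose
X_aug = vcat(X, repeat(consts, 1, size(X, 2)))   # now 5 features × 50 rows
```

Each constant then appears as a feature whose value is identical on every row.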

More generally (say you want to allow any integer), this is a bit tricky, especially because Optim.jl can only optimize float-like constants. But you could basically rely on the mutate_constant function (here) to explore the space of integers.
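As a hypothetical sketch (none of this is existing API), an integer-only analogue of mutate_constant would replace the usual random float rescaling with integer-preserving moves:

```julia
using Random

# Hypothetical integer analogue of mutate_constant: take integer-preserving
# steps instead of multiplying the constant by a random float.
function mutate_integer(c::Int; rng=Random.default_rng())
    op = rand(rng, 1:3)
    op == 1 && return c + rand(rng, (-1, 1))  # nudge by ±1
    op == 2 && return c * rand(rng, (2, 3))   # scale up
    return c ÷ 2                              # integer halving, shrink toward zero
end
```

The search would then explore integers purely via mutation, with no Optim.jl step at all.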

Right now, the library assumes that Node{T} and Dataset{T} share the same type T; in other words, expression constants have the same type as the element type of the dataset. However, this isn't really necessary. One could rewrite the backend to allow Node{T1} and Dataset{T2}, which would let you store integers or rational numbers directly in the expressions while still evaluating at float precision. One could then look at using JuMP.jl in place of Optim.jl for explicit integer/rational optimization, if such a thing is possible.
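The Node{T1}/Dataset{T2} split in miniature (toy type and function names, just to show the idea of exact storage with float evaluation, not the real Node structure):

```julia
# Toy version of the constant-type / data-type split: store an exact
# Rational in the "expression" node, convert to the dataset's float
# element type only at evaluation time.
struct ConstNode{T}
    val::T
end

evaluate(n::ConstNode, ::Type{F}) where {F<:AbstractFloat} = F(n.val)

evaluate(ConstNode(1 // 3), Float64)  # exact 1//3 stored, Float64 produced
```

The printed expression keeps the exact rational, while the evaluation path stays in the dataset's precision.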

Another option is to implement a function

function val(tree::Node{T})::Int where {T}
    return convert(Int, tree.val)
end

and make it so that every time tree.val is accessed, you instead call val(tree). You could then set the output type (here, Int) to some user-specified type. I think the Node{T1}/Dataset{T2} solution is more elegant, though.

@qwertyjl

This comment was marked as off-topic.

@MilesCranmer (Owner)

Hi @qwertyjl - I think this question is unrelated to the use of SymbolicRegression.jl/PySR - apologies but I do not have time to answer general math/science questions.
Best,
Miles

@MilesCranmer MilesCranmer added the feature (importance: low) Low-importance feature label Sep 24, 2022