
Multidimensional equations #262

Open
cmhamel opened this issue Oct 23, 2023 · 4 comments

cmhamel commented Oct 23, 2023

Is there a way to optimize a multidimensional symbolic equation?

From what I can tell from the documentation and from toying with the package, each output is given its own symbolic form. Is this correct?

@MilesCranmer (Owner)

Not currently. However, DynamicExpressions.jl, which forms the expression backend, can indeed handle this: https://github.com/SymbolicML/DynamicExpressions.jl/#tensors. So I just need to find some time to try turning it on and fixing up various type assumptions.

Alternatively, you can implement this manually via a custom loss objective, where the objective splits a single expression into each component of the vector output. See https://astroautomata.com/PySR/examples/#9-custom-objectives for an example. (That page shows the Python API, but it's the same on the Julia side; just convert `full_objective` -> `loss_function` and pass a function rather than a string.)
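
On the Julia side, the wiring might look roughly like this (a minimal sketch: the operator choices are placeholders, and `simple_loss` just stands in for whatever objective you end up writing):

```julia
using SymbolicRegression

# Placeholder objective, only to show the expected signature
# (not yet the multi-output version discussed below):
function simple_loss(tree, dataset::Dataset{T,L}, options) where {T,L}
    prediction, completed = eval_tree_array(tree, dataset.X, options)
    completed || return L(Inf)
    return L(sum(abs2, prediction .- dataset.y) / dataset.n)
end

options = Options(;
    binary_operators=[+, -, *, /],
    loss_function=simple_loss,  # a function here, not a string as in PySR
)
```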


cmhamel commented Oct 24, 2023

Thanks @MilesCranmer!

The custom loss is probably what I need. To make sure I understand: let's say I have a 3D equation I'm trying to fit. Would I just need to ensure the expression is a binary tree, split the root, and then, say, split the left node of the root again to fill out three expressions?

@MilesCranmer (Owner)

Yeah, exactly!!
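
That is, under this scheme the three expressions live in the subtrees `tree.l.l`, `tree.l.r`, and `tree.r`, while the operators at the root and its left child are simply ignored. Roughly:

```
          root (binary op, ignored)
         /                         \
   tree.l (binary op, ignored)    tree.r  ->  expression 3
    /            \
 tree.l.l      tree.l.r
 (expr 1)      (expr 2)
```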

Another tricky part comes from the fact that `Dataset.y` is a 1D vector. Thus, you could put the 1st output into `y`, and the 2nd and 3rd outputs into the last columns of `X`. Then, in your loss function, you could add a check that those features never show up in the expression, like this:

```julia
function my_custom_objective(tree, dataset::Dataset{T,L}, options) where {T,L}
    # Return infinite loss for any violated assumptions:
    tree.degree != 2 && return L(Inf)
    tree.l.degree != 2 && return L(Inf)

    # Say the 2nd output is stored as feature 6 in X, and the 3rd as feature 7.
    # This predicate checks whether a given node is one of those feature nodes:
    is_feature_6_or_7(node) = node.degree == 0 && !node.constant && (node.feature == 6 || node.feature == 7)

    # Iterate through all nodes in the tree; if any match, return infinite loss:
    any(is_feature_6_or_7, tree) && return L(Inf)

    y1 = dataset.y
    y2 = dataset.X[6, :]
    y3 = dataset.X[7, :]
    # [rest of loss function]
end
```
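
To make that concrete, the elided part could be filled in along these lines (a sketch, assuming the subtree layout above and a plain sum of per-output mean-squared errors; `eval_tree_array` returns the prediction together with a flag indicating whether evaluation completed without NaNs or Infs):

```julia
    # Evaluate each subexpression over the full feature matrix:
    p1, ok1 = eval_tree_array(tree.l.l, dataset.X, options)
    p2, ok2 = eval_tree_array(tree.l.r, dataset.X, options)
    p3, ok3 = eval_tree_array(tree.r, dataset.X, options)
    (ok1 && ok2 && ok3) || return L(Inf)

    # Combine the per-output mean-squared errors into one scalar loss:
    mse(prediction, target) = sum(abs2, prediction .- target) / length(target)
    return L(mse(p1, y1) + mse(p2, y2) + mse(p3, y3))
```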

Also note that you will have to manually extract the subexpressions at the very end (since the printing does not know about your scheme).
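
For instance, assuming `best_tree` holds the winning expression from the search, something like this would recover and print the three components (`string_tree` is the standard printer):

```julia
expr1, expr2, expr3 = best_tree.l.l, best_tree.l.r, best_tree.r
for expr in (expr1, expr2, expr3)
    println(string_tree(expr, options))
end
```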


MilesCranmer commented Oct 24, 2023

Also, one other thought: returning `Inf` might be too harsh. What you could do instead is return `L(10000)` if `tree.degree != 2`, but only `L(1000)` (i.e., 10x lower) if `tree.l.degree != 2`, and `L(100)` for the feature violation. That way you are at least telling the genetic algorithm to go in the right direction (otherwise it might never create a tree with a binary node -> binary node, and just get stuck).
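
In code, the guard clauses from the earlier snippet would then become something like:

```julia
    # Graded penalties instead of Inf, so partial progress is rewarded:
    tree.degree != 2 && return L(10000)            # no binary root at all
    tree.l.degree != 2 && return L(1000)           # root is right, but its left child is not binary
    any(is_feature_6_or_7, tree) && return L(100)  # structure is right, but a held-out feature is used
```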
