
Multidimensional equations #262

Open
cmhamel opened this issue Oct 23, 2023 · 4 comments

cmhamel commented Oct 23, 2023

Is there a way to optimize a multidimensional symbolic equation?

From what I can tell from the documentation and from toying with the package, each output is given its own symbolic form. Is this correct?

@MilesCranmer (Owner)

Not currently. However, DynamicExpressions.jl, which forms the expression backend, can indeed handle this: https://github.com/SymbolicML/DynamicExpressions.jl/#tensors. So I just need to find some time to try turning it on and fixing up various type assumptions.

Alternatively, you can implement this manually via a custom loss objective, where the objective splits a single expression into each component of the vector output. See https://astroautomata.com/PySR/examples/#9-custom-objectives for an example. (That page shows the Python API, but it's the same on the Julia side; just convert `full_objective` -> `loss_function` and pass a function rather than a string.)
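
On the Julia side, the wiring might look roughly like this (a minimal sketch: the operator choices are placeholders, and `simple_loss` just stands in for whatever objective you end up writing):

```julia
using SymbolicRegression

# Placeholder objective, only to show the expected signature
# (not yet the multi-output version discussed below):
function simple_loss(tree, dataset::Dataset{T,L}, options) where {T,L}
    prediction, completed = eval_tree_array(tree, dataset.X, options)
    completed || return L(Inf)
    return L(sum(abs2, prediction .- dataset.y) / dataset.n)
end

options = Options(;
    binary_operators=[+, -, *, /],
    loss_function=simple_loss,  # a function here, not a string as in PySR
)
```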


cmhamel commented Oct 24, 2023

Thanks @MilesCranmer!

The custom loss is probably what I need. To make sure I understand: let's say I have a 3D equation I'm trying to fit. Would I just need to ensure the expression is a binary tree, split the root, and then, say, split the left node of the root again to fill out three expressions?

@MilesCranmer (Owner)

Yeah, exactly!!
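
That is, under this scheme the three expressions live in the subtrees `tree.l.l`, `tree.l.r`, and `tree.r`, while the operators at the root and its left child are simply ignored. Roughly:

```
          root (binary op, ignored)
         /                         \
   tree.l (binary op, ignored)    tree.r  ->  expression 3
    /            \
 tree.l.l      tree.l.r
 (expr 1)      (expr 2)
```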

Another tricky part comes from the fact that `Dataset.y` is a 1D vector. Thus, you could put the 1st output into `y`, and the 2nd and 3rd outputs into the last columns of `X`. Then, in your loss function, you could add a check that those features never show up in the expression, like this:

```julia
function my_custom_objective(tree, dataset::Dataset{T,L}, options) where {T,L}
    # Return infinite loss for any violated assumptions:
    tree.degree != 2 && return L(Inf)
    tree.l.degree != 2 && return L(Inf)

    # Say the 2nd output is stored as feature 6 in X, and the 3rd as feature 7.
    # This predicate checks whether a given node is one of those feature nodes:
    is_feature_6_or_7(node) = node.degree == 0 && !node.constant && (node.feature == 6 || node.feature == 7)

    # Iterate through all nodes in the tree; if any match, return infinite loss:
    any(is_feature_6_or_7, tree) && return L(Inf)

    y1 = dataset.y
    y2 = dataset.X[6, :]
    y3 = dataset.X[7, :]
    # [rest of loss function]
end
```
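
To make that concrete, the elided part could be filled in along these lines (a sketch, assuming the subtree layout above and a plain sum of per-output mean-squared errors; `eval_tree_array` returns the prediction together with a flag indicating whether evaluation completed without NaNs or Infs):

```julia
    # Evaluate each subexpression over the full feature matrix:
    p1, ok1 = eval_tree_array(tree.l.l, dataset.X, options)
    p2, ok2 = eval_tree_array(tree.l.r, dataset.X, options)
    p3, ok3 = eval_tree_array(tree.r, dataset.X, options)
    (ok1 && ok2 && ok3) || return L(Inf)

    # Combine the per-output mean-squared errors into one scalar loss:
    mse(prediction, target) = sum(abs2, prediction .- target) / length(target)
    return L(mse(p1, y1) + mse(p2, y2) + mse(p3, y3))
```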

Also note that you will have to manually extract the subexpressions at the very end (since the printing does not know about your scheme).
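
For instance, assuming `best_tree` holds the winning expression from the search, something like this would recover and print the three components (`string_tree` is the standard printer):

```julia
expr1, expr2, expr3 = best_tree.l.l, best_tree.l.r, best_tree.r
for expr in (expr1, expr2, expr3)
    println(string_tree(expr, options))
end
```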


MilesCranmer commented Oct 24, 2023

Also, one other thought: returning `Inf` might be too harsh. What you could do instead is return `L(10000)` if `tree.degree != 2`, but only `L(1000)` (i.e., 10x lower) if `tree.l.degree != 2`, and `L(100)` for the feature violation. That way you are at least telling the genetic algorithm to go in the right direction (otherwise it might never create a tree with a binary node -> binary node, and just get stuck).
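
In code, the guard clauses from the earlier snippet would then become something like:

```julia
    # Graded penalties instead of Inf, so partial progress is rewarded:
    tree.degree != 2 && return L(10000)            # no binary root at all
    tree.l.degree != 2 && return L(1000)           # root is right, but its left child is not binary
    any(is_feature_6_or_7, tree) && return L(100)  # structure is right, but a held-out feature is used
```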
