Help with custom loss function #613

gm89uk · 2024-04-30T13:38:05Z


Summary:
I am trying to create a loss function that incorporates multiple variables that are not used in X for PySR. I'm not sure whether this is possible. Essentially, I am trying to achieve multi-objective optimization through a custom loss function.

Background:
I have several Gaussian optics equations with multiple variables to choose an intraocular lens for surgery.
I have a training database with measured truths AFTER surgery:

Actual_P: lens inserted
Actual_S: resulting refraction following surgery
Actual_P and Actual_S have a complex relationship that is based on Gaussian optics and depends on the final lens position, e.

I also have variables that can be measured BEFORE surgery:

a, k, d, L, w, o and g. These will be used to predict e.

Back_calculated_e is a variable that can be back-calculated from Gaussian optics:

Back_calculated_e = f1(a,k,Actual_P,Actual_S). This is used as the training target (y).

PySR is used to predict e from the pre-operative variables:

Predicted_e = f2(a,k,d,L,w,o), fitted against y. This is the PySR equation.

Following this, we can verify how good Predicted_e is by using it to compare:

Predicted_P to Actual_P, where Predicted_P = f3(a,k,Actual_S,Predicted_e)
Predicted_S to Actual_S, where Predicted_S = f4(a,k,Actual_P,Predicted_e)

Why not just use Back_calculated_e as y and use MSE as the loss function? This is what I'm currently doing, with Back_calculated_e as y and a,k,d,L,w,o as X.

However, because the baseline variables (a in particular) carry measurement error, Back_calculated_e becomes unreliable as a increases, so it is much better to base the loss function on absolute truths, e.g. an average of the MSE between [Predicted_P and Actual_P] and the MSE between [Predicted_S and Actual_S].
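
As a rough sketch of the objective described above (the function names `mse` and `combined_loss` and the toy numbers are assumptions for illustration only), the loss would average the two mean squared errors, optionally with unequal weights:

```python
def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def combined_loss(pred_p, actual_p, pred_s, actual_s, weight_p=0.5):
    """Weighted average of the P error and the S error (the two weights sum to 1)."""
    return weight_p * mse(pred_p, actual_p) + (1.0 - weight_p) * mse(pred_s, actual_s)


# Toy values only: two eyes, with made-up lens powers and refractions
loss = combined_loss([21.0, 19.5], [21.5, 19.5], [-0.25, 0.0], [-0.5, 0.25])
print(loss)
```

Raising `weight_p` above 0.5 would emphasise the lens-power error, which is the more important of the two targets here.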

How can I feed Actual_S and Actual_P into a custom loss function even though they are not included in the X variables?
Actual_S and Actual_P are just two arrays that I can read from an Excel sheet.

Thank you very much in advance to anyone who has ideas on how this could be achieved.
Failing this:

Is it possible to feed in Actual_P (more important than Actual_S) as a weight, and then base the custom loss on Predicted_e?
I will run a separate PySR model that does not use a to predict e when a is large. However, that seems to be a suboptimal solution.

Gaussian optics equations, if of interest:
v is a constant (12).

Equation to calculate P from e, based on a chosen S:
Predicted_P = (1336 / (a - e)) - (1336 / (1336 / ((1000 / (1000 / s - v)) + k) - e)) (function 3)

Equation to calculate S from e, based on a chosen P:
Predicted_S = 1000*(a*e*k*p - 1336*a*k - 1336*a*p - e^2*k*p + 1336*e*p + 1784896)/(a*e*k*p*v - 1000*a*e*p - 1336*a*k*v - 1336*a*p*v + 1336000*a - e^2*k*p*v + 1000*e^2*p + 1336*e*p*v + 1784896*v) (function 4)
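
A minimal Python sketch of the two formulas above may help when sanity-checking them. The function names `predicted_p` and `predicted_s` and the sample values are illustrative assumptions, not taken from the dataset. The check at the end relies on function 4 being the algebraic inverse of function 3 for a fixed e, so feeding the computed P back in should recover the chosen S.

```python
V = 12.0  # the constant v from the equations above


def predicted_p(a, k, s, e, v=V):
    """Function 3: P for a chosen refraction s and lens position e."""
    inner = 1000.0 / (1000.0 / s - v)  # innermost term of function 3
    return 1336.0 / (a - e) - 1336.0 / (1336.0 / (inner + k) - e)


def predicted_s(a, k, p, e, v=V):
    """Function 4: S for a chosen lens power p and lens position e."""
    num = 1000.0 * (a*e*k*p - 1336*a*k - 1336*a*p - e**2*k*p + 1336*e*p + 1784896)
    den = (a*e*k*p*v - 1000*a*e*p - 1336*a*k*v - 1336*a*p*v + 1336000*a
           - e**2*k*p*v + 1000*e**2*p + 1336*e*p*v + 1784896*v)
    return num / den


# Hypothetical inputs: a = 23.5, k = 43.5, chosen S = -0.5, e = 5.0
a, k, s, e = 23.5, 43.5, -0.5, 5.0
p = predicted_p(a, k, s, e)
s_back = predicted_s(a, k, p, e)  # round trip: should reproduce s
print(p, s_back)
```

Note that s = 0 is not usable in this plain form, because function 3 divides by s.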

Cannot get this to work...

function custom_loss(tree, data)
  a, k, d, L, w, o, g = extract_variables(tree) #how to extract these variables?

  # Extract actual P and S from the data structure
  actual_p = ?#how to feed through
  actual_s = ?#how to feed through

  # Predicted e from the symbolic expression (tree)
  predicted_e = evaluate(tree, [a, k, d, L, w, o, g])

  # Calculate predicted P and S using preexisting functions
  predicted_p = calculate_predicted_p(predicted_e, actual_s)
  predicted_s = calculate_predicted_s(predicted_e, actual_p)

  # Combine errors using weighted MSE
  weight_p = 0.5  # Adjust weight for P prediction
  weight_s = 0.5  # Adjust weight for S prediction
  return weight_p * mean((predicted_p - actual_p) .^ 2) + weight_s * mean((predicted_s - actual_s) .^ 2)
end

Current Code:

import numpy as np
from pysr import PySRRegressor
import pandas as pd
from pandas import ExcelFile
df = pd.read_excel("path.xlsx", sheet_name="DataTrain")
x = df[['a', 'k','d', 'L', 'w','g','o']].to_numpy() #'AVAL',
y = df[['e']].to_numpy()

model = PySRRegressor(
    elementwise_loss = "L1DistLoss()",
    model_selection="accuracy",  
    niterations=1000000, #1000000
    #ncycles_per_iteration=1500,
    binary_operators=["+", "*", "-", "/"],
    unary_operators=[
        "cos",
       "tan",
        "exp",
        "sin",
        "sqrt",
        "inv",
        "square",
        "log"
    ],
    maxsize=100,
    warm_start=True,
    populations=18,
    population_size=300,
    fraction_replaced_hof = 0.1,
    parsimony = 0.01,    
    bumper=True,
    nested_constraints={
        "sin": {"sin": 0, "cos": 0, "tan": 0}, 
        "cos": {"sin": 0, "cos": 0, "tan": 0},
        "tan": {"sin": 0, "cos": 0, "tan": 0},
        "exp": {"exp": 0, "log": 1},
        "log": {"exp": 1, "log": 0},
        "square": {"square": 2, "sqrt": 4},
        "sqrt": {"square": 4, "sqrt": 2},
    }
)
model.fit(x, y,variable_names=["a","k","d","L","w","g","o"])

MilesCranmer · 2024-05-02T17:08:25Z

MilesCranmer
May 2, 2024
Maintainer

Happy to help. I am a bit confused about one thing in your question:

Predicted_P to Actual_P, Predicted P=f(a,k,Actual_S,Predicted_e)
Predicted_S to Actual_S, Predicted S=f(a,k,Actual_P,Predicted_e)

Does this mean you want to find the same f for both, but just with the variable (S or P) swapped? If not, could you rewrite your question with a different symbol for functions that should be different? Thanks!

0 replies

MilesCranmer · 2024-05-02T17:17:20Z

MilesCranmer
May 2, 2024
Maintainer

How can I feed in Actual_S and Actual_P to a custom loss function although they are not included in the x variables?

What I normally do here is just feed in the variables as additional columns of X, but then zero them out within the custom loss function. For example:

function my_loss_function(tree, dataset::Dataset{T,L}, options)::L where {T,L}
    X = copy(dataset.X)
    y = copy(dataset.y)  # Don't need to copy if you aren't modifying; but just a safer habit

    # Ordering (depends how you pass to .fit)
    # 'a' - 1; 'k' - 2; 'd' - 3; 'L' - 4; 'w' - 5; 'g' - 6;'o' - 7.
    # Thus, say that 'P' is 8 and 'S' is 9
    X_without_P_and_S = vcat(
        X[1:7, :],  # The actual data
        X[8:9, :] .* 0,  # Pass zeroed version
    )
    # Need to pass the full shape,
    # as the genetic algorithm will still sometimes use features 8 and 9!
    # Thus, we simply hide that information from it.

    prediction, complete = eval_tree_array(tree, X_without_P_and_S, options)
    if !complete
        return L(Inf)
    end

    # Placeholder: plain MSE against y
    loss = sum(i -> (prediction[i] - y[i])^2, eachindex(y)) / length(y)

    # Do something with X[8, :] and X[9, :] here (e.g. add terms to `loss`)?

    return loss
end

Hopefully this helps you get started! Pass this entire function as a string to the loss_function parameter.
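
On the Python side, the approach above could be wired up roughly as follows. This is a sketch under assumptions: `augment_features` and `build_model` are hypothetical helper names, the rows are placeholder values, and the only PySR argument used is the `loss_function` parameter mentioned above. Since the Julia loss indexes the data as X[feature, row], the two appended columns would show up as X[8, :] and X[9, :] there.

```python
def augment_features(rows, actual_p, actual_s):
    """Append Actual_P and Actual_S to each row of pre-operative features."""
    if not (len(rows) == len(actual_p) == len(actual_s)):
        raise ValueError("rows, actual_p and actual_s must have the same length")
    return [list(row) + [p, s] for row, p, s in zip(rows, actual_p, actual_s)]


def build_model(julia_loss):
    """Create a regressor that uses a full Julia objective passed as a string."""
    # Imported lazily so that augment_features can be tried without PySR installed
    from pysr import PySRRegressor
    return PySRRegressor(loss_function=julia_loss)


# Placeholder rows in the order a, k, d, L, w, g, o
rows = [[23.5, 43.5, 3.1, 4.5, 11.8, 0, 70], [24.1, 42.9, 3.3, 4.2, 12.0, 1, 65]]
x_aug = augment_features(rows, actual_p=[21.0, 19.5], actual_s=[-0.5, 0.25])
names = ["a", "k", "d", "L", "w", "g", "o", "Actual_P", "Actual_S"]

# Intended use (not run here; julia_loss_string and y are assumed to exist):
# model = build_model(julia_loss_string)
# model.fit(x_aug, y, variable_names=names)
```

The longer names `Actual_P` and `Actual_S` are a deliberate choice: a single-letter name such as `S` may clash with names PySR reserves for SymPy.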

3 replies

gm89uk May 3, 2024
Author

Thank you so much @MilesCranmer, I'm away for a week but looking forward to trying it out, and will fix my notation then. I will let you know how I get on.

gm89uk May 20, 2024
Author

Thank you very much @MilesCranmer

I have managed to get the custom loss function to work.

x = df[['Axiallength', 'MeanK','ACDepth', 'LensThickness', 'WTWCornealDiameter','Gender','Age', 'AK1336','AK','IOLPower','BestSphEquivPostOp']].to_numpy() #'AVAL',
y = df[['ELP']].to_numpy()
elementwise_loss = """
function my_loss_function(tree, dataset::Dataset{T,L}, options)::L where {T,L}
    X = copy(dataset.X)
    y = copy(dataset.y)  # Don't need to copy if you aren't modifying; but just a safer habit
    # Ordering (depends how you pass to .fit)
    # 'a' - 1; 'k' - 2; 'd' - 3; 'L' - 4; 'w' - 5; 'g' - 6; 'o' - 7; 'akmin' - 8; 'ak' - 9; 'p' - 10; 's' - 11.
    X_without_P_S = vcat(
        X[1:9, :],        # Keep the first 9 rows unchanged
        X[10:11, :] .* 0  # Zero rows 10 and 11 (p and s); for a single row use reshape(X[10, :], 1, :) .* 0
    )
    # Need to pass the full shape,
    # as the genetic algorithm will still sometimes use features 10 and 11!
    # Thus, we simply hide that information from it.
    prediction, complete = eval_tree_array(tree, X_without_P_S, options)
    if !complete
        return L(Inf)
    end
    #Predicted_P_Function = (1336 / (a - e)) - (1336 / (1336 / ((1000 / (1000 / s - v)) + k) - e))
    # Initialize Predicted_P and Actual_P arrays
    Predicted_P = zeros(length(y))
    Predicted_S = zeros(length(y))
    Actual_P = zeros(length(y))
    Actual_S = zeros(length(y))
    v = 12
    for i in eachindex(y)
        Actual_P[i] = X[10,i]
        Actual_S[i] = X[11,i]
        Predicted_P[i] = (1336 / (X[1,i] - prediction[i])) - (1336 / (1336 / ((1000 / (1000 / Actual_S[i] - v)) + X[2,i]) - prediction[i]))
        Predicted_S[i] = 1000*(X[1,i]*prediction[i]*X[2,i]*Actual_P[i] - 1336*X[1,i]*X[2,i] - 1336*X[1,i]*Actual_P[i] - prediction[i]^2*X[2,i]*Actual_P[i] + 1336*prediction[i]*Actual_P[i] + 1784896)/(X[1,i]*prediction[i]*X[2,i]*Actual_P[i]*v - 1000*X[1,i]*prediction[i]*Actual_P[i] - 1336*X[1,i]*X[2,i]*v - 1336*X[1,i]*Actual_P[i]*v + 1336000*X[1,i] - prediction[i]^2*X[2,i]*Actual_P[i]*v + 1000*prediction[i]^2*Actual_P[i] + 1336*prediction[i]*Actual_P[i]*v + 1784896*v)
        if isnan(Predicted_P[i]) || isnan(Predicted_S[i])
            return L(Inf)
        end

    end
    mse = (sum((Predicted_P .- Actual_P) .^ 2) + sum((Predicted_S .- Actual_S) .^ 2)) / (2*length(y))
    return mse
end
"""

This seems to work reasonably well, although it is slower (as expected) than the built-in loss functions. A few points:

Do you have any tips that could speed it up? For example, it doesn't work if I set batching to true.
Occasionally p and s find their way into the hall of fame (although they are set to 0), usually only temporarily, until the algorithm realises that they add complexity unnecessarily. Is it possible to add a penalty to equations that include p (X[10]) and s (X[11])?
I added an isnan check after Predicted_P and Predicted_S because a NaN occasionally came through as a loss; this seems to work well.

Thank you again!

P.S. The features ak and akmin are just feature engineering of a and k. I found that including them helps the loss decrease much more quickly.

gm89uk May 21, 2024
Author

I have been working on optimising my custom loss function and switched to vectorised calculations, which has increased the speed roughly 25-30x: it now takes 2 s/it rather than 57 s. I also used inline functions, defined a const and added multithreading (is this already handled by default by the Julia backend?).

Others writing their custom loss function may find this useful.

If you have any further optimisation advice let me know @MilesCranmer , thank you.

Updated code:

elementwise_loss = """
using Base.Threads

const V = 12  # Define V as a constant

# Define functions for computations
@inline function compute_predicted_p(x1, x2, actual_s, prediction)
    1336 / (x1 - prediction) - 1336 / (1336 / ((1000 / (1000 / actual_s - V)) + x2) - prediction)
end

@inline function compute_predicted_s(x1, prediction, x2, actual_p)
    num = 1000 * (x1 * prediction * x2 * actual_p - 1336 * x1 * x2 - 1336 * x1 * actual_p - prediction^2 * x2 * actual_p + 1336 * prediction * actual_p + 1784896)
    denom = x1 * prediction * x2 * actual_p * V - 1000 * x1 * prediction * actual_p - 1336 * x1 * x2 * V - 1336 * x1 * actual_p * V + 1336000 * x1 - prediction^2 * x2 * actual_p * V + 1000 * prediction^2 * actual_p + 1336 * prediction * actual_p * V + 1784896 * V
    num / denom
end

function my_loss_function(tree, dataset::Dataset{T,L}, options)::L where {T,L}
    X = copy(dataset.X) 
    y = dataset.y
    
    # Store Actual_P and Actual_S before modifying X
    Actual_P = X[10, :]
    Actual_S = X[11, :]
    
    # Modify X to hide the columns as required
    X[10:11, :] .= 0
    
    # Evaluate the tree on the modified X
    prediction, complete = eval_tree_array(tree, X, options)
    if !complete
        return L(Inf)
    end

    # Allocate arrays for Predicted_P and Predicted_S
    Predicted_P = Vector{Float64}(undef, length(y))
    Predicted_S = Vector{Float64}(undef, length(y))

    # Multithreaded computation for Predicted_P and Predicted_S
    @threads for i in eachindex(y)
        Predicted_P[i] = compute_predicted_p(X[1, i], X[2, i], Actual_S[i], prediction[i])
        Predicted_S[i] = compute_predicted_s(X[1, i], prediction[i], X[2, i], Actual_P[i])
    end

    # Check for NaN after the loop: a `return` inside `@threads for` only ends
    # the current task's share of iterations, not my_loss_function itself
    if any(isnan, Predicted_P) || any(isnan, Predicted_S)
        return L(Inf)
    end
    
    # Calculate MSE in a vectorized manner
    mse = sum((Predicted_P .- Actual_P) .^ 2 + (Predicted_S .- Actual_S) .^ 2) / (2 * length(y)) 
    return mse
end
"""

I am a bit stuck choosing between two pieces of code.
This:

    # Vectorized computation for Predicted_P and Predicted_S
    Predicted_P = compute_predicted_p.(X[1, :], X[2, :], Actual_S, prediction)
    Predicted_S = compute_predicted_s.(X[1, :], prediction, X[2, :], Actual_P)

vs.

    @threads for i in eachindex(y)
        Predicted_P[i] = compute_predicted_p(X[1, i], X[2, i], Actual_S[i], prediction[i])
        Predicted_S[i] = compute_predicted_s(X[1, i], prediction[i], X[2, i], Actual_P[i])
    end

Is the first one multithreaded or should the second one run faster?
