[Feature]: Fitting Multiple Datasets with a single expression #316

AnasAbdelR · 2024-05-06T18:43:19Z

Feature Request

One way to prevent over-fitting when trying to find an equation for a particular trace of data is to provide multiple traces for a single equation as examples of the kind of results the equation produces. Is this something already implemented, and if not what would it take to get there?

MilesCranmer · 2024-05-06T20:24:13Z

Great question. This is something I’m really eager to have. Some help would be much appreciated though.

The ongoing effort: PR SymbolicML/DynamicExpressions.jl#73 adds some of the utilities needed for this. It would make defining an expression much more flexible, for example by constraining functional forms or by learning parametric functions with per-‘category’ parameters.
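To illustrate what "one expression, per-category parameters" could mean in practice, here is a minimal Python sketch. It is not the DynamicExpressions.jl API. The helper names (`fit_scale`, `total_loss`) and the toy data are hypothetical. It scores a candidate functional form by fitting one scale parameter per dataset and summing the residuals over all datasets, so a form only scores well if it explains every trace.

```python
# Hypothetical sketch: score a candidate form y = p_k * basis(x), where the
# form is shared across datasets and p_k is fitted separately per dataset.

def fit_scale(basis, xs, ys):
    """Closed-form least squares for y ~ p * basis(x); returns (p, sse)."""
    b = [basis(x) for x in xs]
    denom = sum(v * v for v in b)
    p = sum(v * y for v, y in zip(b, ys)) / denom
    sse = sum((p * v - y) ** 2 for v, y in zip(b, ys))
    return p, sse

def total_loss(basis, datasets):
    """Fit one parameter per dataset and sum the squared errors."""
    params, loss = [], 0.0
    for xs, ys in datasets:
        p, sse = fit_scale(basis, xs, ys)
        params.append(p)
        loss += sse
    return params, loss

# Toy data (assumed): three traces generated by y = p_k * x^2.
xs = [0.5, 1.0, 1.5, 2.0]
true_params = [1.0, 2.0, 3.0]
datasets = [(xs, [p * x ** 2 for x in xs]) for p in true_params]

# Compare two candidate forms over all traces at once.
square_params, square_loss = total_loss(lambda x: x * x, datasets)
linear_params, linear_loss = total_loss(lambda x: x, datasets)
```

In a real search, the candidate form would come from the evolutionary loop rather than being hand-written, and the per-dataset parameters would be optimized alongside the expression's other constants.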

If you are interested in helping, the next step would be to modify SymbolicRegression to use the AbstractExpression type added in that PR, rather than AbstractExpressionNode (a less flexible type), which it currently uses. This should be possible to work on right away because the AbstractExpression interface has matured; I’m just adding more tests at the moment.

Another option, if you don’t wish to do it directly with SymbolicRegression.jl, is to do what we did in https://arxiv.org/abs/2202.02306. We learn a single expression (the force law) while also learning per-planet mass parameters. It’s easier to do this with deep learning; you essentially have the per-system parameters be trainable, and simultaneously fit an MLP. Then, finally, use the method in https://arxiv.org/abs/2006.11287 (basically, fit the inputs and outputs of the MLP with PySR) to get the actual parametric form of the equation.

This approach has some drawbacks compared to a regular genetic algorithm, so I think it would be nice to have a proper implementation directly in SR.jl.
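As a rough illustration of the deep learning route described above (trainable per-system parameters fitted jointly with a shared model), here is a dependency-free Python sketch. To keep it short, the MLP is replaced by a one-parameter stand-in, `exp(-w * x)`. The names `masses`, `w`, and `loss_and_grads`, the toy data, and the learning rate are all assumptions for illustration, not taken from either paper.

```python
import math

def predict(m, w, x):
    # Shared model (stand-in for an MLP) scaled by a per-system parameter m.
    return m * math.exp(-w * x)

def loss_and_grads(masses, w, systems):
    """Sum of squared errors over all systems, with analytic gradients."""
    loss = 0.0
    grad_m = [0.0] * len(masses)
    grad_w = 0.0
    for k, (xs, ys) in enumerate(systems):
        for x, y in zip(xs, ys):
            e = math.exp(-w * x)
            r = masses[k] * e - y                      # residual
            loss += r * r
            grad_m[k] += 2.0 * r * e                   # d loss / d m_k
            grad_w += 2.0 * r * masses[k] * (-x) * e   # d loss / d w
    return loss, grad_m, grad_w

# Toy data (assumed): three systems sharing w = 1.0, each with its own m_k.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
true_w = 1.0
true_masses = [1.0, 2.0, 3.0]
systems = [(xs, [predict(m, true_w, x) for x in xs]) for m in true_masses]

# Jointly train per-system parameters and the shared parameter.
masses = [1.0, 1.0, 1.0]
w = 0.5
initial_loss, _, _ = loss_and_grads(masses, w, systems)
lr = 0.01
for _ in range(2000):
    _, grad_m, grad_w = loss_and_grads(masses, w, systems)
    masses = [m - lr * g for m, g in zip(masses, grad_m)]
    w -= lr * grad_w
final_loss, _, _ = loss_and_grads(masses, w, systems)
```

With a real MLP in place of the stand-in, the final step would be to run symbolic regression on the network's inputs and outputs to recover a closed-form expression.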

sathvikbhagavan · 2024-05-20T12:28:05Z

Hi @MilesCranmer, I wanted to look into this if that's alright, and I had a few questions:

How does the algorithm work with parametric equations? IIUC, with SymbolicML/DynamicExpressions.jl#73 we can pass in a parametric expression along with the parameter values for each dataset we have. How does the search algorithm work in that case? Looking at the test for parametric expressions in https://github.com/SymbolicML/DynamicExpressions.jl/pull/73/files#diff-22d700493bea715bfef1d81576940fb67942b05bbac15c06b86bf549d6af3407R295, does the expression passed to parse_expression act as a starting point from which other expressions are generated and explored?

It would be very helpful to have a concrete example of how parametric equations might be used with SymbolicRegression.jl (for my understanding).

It would be great if you could give some pointers on what to change, specifically where to change AbstractExpressionNode to AbstractExpression, since, IIUC, the expression would still be converted to a normal Node for evaluation, as in https://github.com/SymbolicML/DynamicExpressions.jl/pull/73/files#diff-22d700493bea715bfef1d81576940fb67942b05bbac15c06b86bf549d6af3407R219. I can look into it and make a PR. (I am still getting familiar with the codebase 😅)
