[Feature]: Fitting Multiple Datasets with a single expression #316

Open
AnasAbdelR opened this issue May 6, 2024 · 2 comments

Comments

@AnasAbdelR

Feature Request

One way to prevent over-fitting when searching for an equation that describes a particular trace of data is to provide multiple traces for a single equation, as examples of the kind of results the equation produces. Is this already implemented, and if not, what would it take to get there?
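
To make the idea concrete, here is a minimal Python sketch. It is not taken from SymbolicRegression.jl; the data, the two candidate expressions, and the helper names `mse` and `multi_trace_loss` are all made up for illustration. It assumes every trace comes from the same underlying equation and compares scoring a candidate on one trace with scoring it on the average over several traces:

```python
# Illustrative sketch only: data and candidate expressions are invented.

def mse(expr, xs, ys):
    """Mean squared error of a candidate expression on one trace."""
    return sum((expr(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def multi_trace_loss(expr, traces):
    """Average of the per-trace errors, so every trace counts equally."""
    return sum(mse(expr, xs, ys) for xs, ys in traces) / len(traces)

# Two noisy traces, both assumed to come from roughly y = 2x.
trace1 = ([0.0, 1.0, 2.0], [0.0, 2.3, 3.9])
trace2 = ([3.0, 4.0, 5.0], [6.1, 7.8, 10.2])
traces = [trace1, trace2]

def simple(x):
    return 2.0 * x

def overfit(x):
    # Quadratic chosen to pass exactly through the three points of trace1.
    return -0.35 * x * x + 2.65 * x

# On trace1 alone the overfit candidate looks better ...
single_simple, single_overfit = mse(simple, *trace1), mse(overfit, *trace1)
# ... but scored across both traces the simple candidate wins.
multi_simple, multi_overfit = multi_trace_loss(simple, traces), multi_trace_loss(overfit, traces)
```

The quadratic reproduces the noise in the first trace exactly, so it only looks good while that trace is the sole evidence.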

@MilesCranmer
Owner

Great question. This is something I’m really eager to have. Some help would be much appreciated though.

The ongoing effort: PR SymbolicML/DynamicExpressions.jl#73 adds some of the utilities needed for this. It would let you be much more flexible in how you define an expression, for example by constraining functional forms or by learning parametric functions with per-‘category’ parameters.
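
As a rough illustration of what per-category parameters could look like, here is a small Python sketch. The `ParametricSketch` class and its fields are hypothetical and are not the DynamicExpressions.jl API; the point is only that the functional form is shared while each row of data selects its own parameter set through a category index:

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class ParametricSketch:
    """Hypothetical container: one shared form, one parameter list per category."""
    form: Callable[[float, Sequence[float]], float]
    params: List[List[float]]  # params[c] holds the parameters for category c

    def evaluate(self, xs: Sequence[float], categories: Sequence[int]) -> List[float]:
        # Each row is evaluated with the parameters of its own category.
        return [self.form(x, self.params[c]) for x, c in zip(xs, categories)]

# Shared form p[0] * x + p[1]; the two categories differ only in the offset p[1].
expr = ParametricSketch(
    form=lambda x, p: p[0] * x + p[1],
    params=[[2.0, 0.0], [2.0, 1.0]],
)
preds = expr.evaluate([1.0, 1.0, 3.0], [0, 1, 1])
```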

If you are interested in helping, the next step would be to modify SymbolicRegression to use the AbstractExpression type added in that PR, rather than AbstractExpressionNode (a less flexible type), which it uses currently. This should be possible to work on right away because the AbstractExpression interface has matured; I’m just adding more tests at the moment.


Another option, if you don’t wish to do it directly with SymbolicRegression.jl, is to do what we did in https://arxiv.org/abs/2202.02306. We learn a single expression (the force law) while also learning per-planet mass parameters. It’s easier to do this with deep learning; you essentially have the per-system parameters be trainable, and simultaneously fit an MLP. Then, finally, use the method in https://arxiv.org/abs/2006.11287 (basically, fit the inputs and outputs of the MLP with PySR) to get the actual parametric form of the equation.
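
A toy version of that joint fit can be sketched in Python under heavy assumptions: the MLP is swapped for a two-term polynomial `g(x; w)`, gradient training is swapped for alternating least squares, and each dataset gets a single multiplicative parameter `m_k` (standing in for a per-system parameter such as a mass). The data and all names are invented, and none of this is code from either paper:

```python
# Illustrative sketch: shared function g(x; w) plus one scale m_k per dataset.

def g(x, w):
    """Shared function, common to every dataset (stand-in for the MLP)."""
    return w[0] * x + w[1] * x * x

def fit_shared_and_per_dataset(datasets, n_iter=20):
    w = [1.0, 0.0]                 # shared parameters, initial guess
    m = [1.0] * len(datasets)      # one trainable parameter per dataset
    for _ in range(n_iter):
        # Step 1: with w fixed, each m_k has a closed-form least-squares solution.
        for k, (xs, ys) in enumerate(datasets):
            gs = [g(x, w) for x in xs]
            m[k] = sum(y * gv for y, gv in zip(ys, gs)) / sum(gv * gv for gv in gs)
        # Step 2: with m fixed, solve the 2x2 normal equations for w on all data.
        saa = sab = sbb = say = sby = 0.0
        for k, (xs, ys) in enumerate(datasets):
            for x, y in zip(xs, ys):
                a, b = m[k] * x, m[k] * x * x
                saa += a * a
                sab += a * b
                sbb += b * b
                say += a * y
                sby += b * y
        det = saa * sbb - sab * sab
        w = [(say * sbb - sby * sab) / det, (saa * sby - sab * say) / det]
    # m and w are only determined up to a common factor; fix it by setting m[0] = 1.
    scale = m[0]
    return [w[0] * scale, w[1] * scale], [mk / scale for mk in m]

# Synthetic data: y = m_k * (x + 0.5 x^2) with a different m_k per dataset.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
true_m = [1.0, 2.0, 0.5]
datasets = [(xs, [mk * (x + 0.5 * x * x) for x in xs]) for mk in true_m]

w_fit, m_fit = fit_shared_and_per_dataset(datasets)
```

Only the product of `m_k` and the overall scale of `g` is identifiable, which is why the sketch normalises to the first dataset. In the workflow described above, the fitted shared function would then be handed to symbolic regression to recover its closed form.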

The deep-learning route has some issues compared to a regular genetic algorithm, so I think it would be nice to have a proper implementation directly in SR.jl.

@sathvikbhagavan

Hi @MilesCranmer, I’d like to look into this if that’s alright, and I have a few questions:

How does the algorithm work with parametric equations? If I understand correctly, with SymbolicML/DynamicExpressions.jl#73 we can pass in a parametric expression along with the parameter values for each dataset we have. How does the search algorithm work in that case? Looking at the test for parametric expressions in https://github.com/SymbolicML/DynamicExpressions.jl/pull/73/files#diff-22d700493bea715bfef1d81576940fb67942b05bbac15c06b86bf549d6af3407R295, does the expression passed to parse_expression act as a starting point from which other expressions are generated and explored?

It would be very helpful if you could sketch a concrete example of how parametric equations might be used with SymbolicRegression.jl (for my understanding).

It would be great if you could give some pointers on what to change, specifically where to change AbstractExpressionNode to AbstractExpression. If I understand correctly, for evaluation it would still be converted to a normal Node, as in https://github.com/SymbolicML/DynamicExpressions.jl/pull/73/files#diff-22d700493bea715bfef1d81576940fb67942b05bbac15c06b86bf549d6af3407R219. I can look into it and make a PR. (I am still getting familiar with the codebase 😅)
