
[Feature]: Composite regressors #514

Open
eelregit opened this issue Jan 3, 2024 · 0 comments
Labels
enhancement New feature or request


eelregit commented Jan 3, 2024

Feature Request

A common pattern in applications of symbolic regression is that there exist some "bottleneck" latent parameters that can describe the data in fewer dimensions than the input features. For example, some data can have an underlying universal "shape", in the simplest case a univariate function of some rescaled variable (e.g. time for transition data, or radius for universal radial profiles). In more complicated cases, the universal template can depend on a location parameter, a scale parameter, and very few shape parameters. These latent parameters can then be expressed as functions of the other input features.
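For concreteness, one way to write such a template (illustrative notation, not tied to any specific application) is

$$y(r;\, \mathbf{x}) = g\!\left(\frac{r - \mu(\mathbf{x})}{\sigma(\mathbf{x})};\; \nu(\mathbf{x})\right),$$

where $g$ is the universal shape, $\mu$ and $\sigma$ are the location and scale, and $\nu$ collects the few shape parameters, each a function of the remaining input features $\mathbf{x}$.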

Sometimes we know such patterns from domain knowledge; other times we can find them by looking at the data under some transformation. I think it would be a very useful inductive bias to build these intermediate latent steps into PySR. Assuming a pattern like this can help the search reach more complicated equations, and of course it also helps with interpretability, which a deep-learning bottleneck lacks.

The example page already has one example that assumes a final division operator, predicting $P(\mathbf{x})/Q(\mathbf{x})$. However, more complicated cases like $f(\mu(x_2, x_3, \cdots) + x_1 \cdot \sigma(x_3, x_4, \cdots))$ need some hacking to work: namely, splitting the problem into 3 trees ($f$, $\mu$, and $\sigma$) and zero-padding every input to have the same number of features as the whole dataset. It would be nice to have a better solution. Each of the trees should have its features specified somehow, including the latent variables and possibly subsets of the input features. I wonder if a composite regressor, taking a list of PySRRegressors and a custom composition model, could be implemented? A rough sketch of what I have in mind is below.
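To make the request concrete, here is one possible shape for such an interface. Nothing below exists in PySR today: `CompositeRegressor`, `feature_subsets`, and the `model` callable are all hypothetical names for discussion; only `PySRRegressor` is the real class.

```python
import numpy as np
from pysr import PySRRegressor  # real class; everything else below is hypothetical

# Toy data standing in for the real features and target, with the assumed
# structure y = f(mu(x2, x3) + x1 * sigma(x3, x4)).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = np.tanh(X[:, 1] + X[:, 2] + X[:, 0] * (X[:, 2] - X[:, 3]))

# `exprs` would map each name to a callable evaluating that fitted tree; the
# composite handles slicing X down to each tree's declared feature subset,
# so no zero-padding is needed anywhere.
def model(exprs, X):
    z = exprs["mu"](X) + X[:, 0] * exprs["sigma"](X)
    return exprs["f"](z[:, None])  # f is a univariate template of the latent z

# Hypothetical composite regressor: one PySRRegressor per tree, each searched
# over its own feature subset, all jointly scored against y through `model`.
composite = CompositeRegressor(
    regressors={
        "mu":    PySRRegressor(binary_operators=["+", "*"]),
        "sigma": PySRRegressor(binary_operators=["+", "*"]),
        "f":     PySRRegressor(unary_operators=["exp", "tanh"]),
    },
    feature_subsets={"mu": [1, 2], "sigma": [2, 3], "f": "latent"},
    model=model,
)
composite.fit(X, y)
```

This would replace the zero-padding hack: each sub-regressor declares which columns it may use, and the latent variables flow between trees through the composition instead of through padded inputs.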

Thank you, and happy New Year!!
