
Pareto front #33

Open · ianhbell opened this issue May 2, 2017 · 4 comments
ianhbell commented May 2, 2017

Has there been any thought given to Pareto front optimization? There is always a tradeoff between tree size and model fidelity, which I gather you handle with parsimony. The alternative is to keep every model that is non-dominated, i.e. on the Pareto front. I couldn't see any clear way of hacking that into gplearn.
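To make the non-domination idea concrete, here is a minimal sketch over (error, tree size) pairs; the objective values are assumed to be given, and none of this is gplearn API:

def dominates(a, b):
    # a dominates b if it is at least as good on both objectives
    # (error, size) and strictly better on at least one
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(models):
    # keep every point that no other point dominates
    return [m for m in models
            if not any(dominates(o, m) for o in models if o != m)]

# hypothetical (error, tree size) pairs for four candidate models
candidates = [(0.10, 12), (0.10, 30), (0.05, 30), (0.20, 5)]
print(pareto_front(candidates))  # [(0.1, 12), (0.05, 30), (0.2, 5)]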

@trevorstephens (Owner)
Sounds interesting @ianhbell ... Got a citation in mind?

Ohjeah commented May 6, 2017

This should be a good starting point: https://www.iitk.ac.in/kangal/Deb_NSGA-II.pdf

@remiadon
Hi @ianhbell,

Just out of curiosity: suppose I define a complexity measure (yielding the number of nodes in the tree representation of an expression) and use it inside my custom fitness, a bit like so:

from sklearn.metrics import r2_score

def my_custom_fitness(expr, X, y_true):
    # make_prediction and complexity are assumed helpers: evaluate the
    # expression on X, and count the nodes in its tree representation
    y_pred = make_prediction(expr, X)
    return r2_score(y_true, y_pred) - complexity(expr) / 1000

Therefore:

  • for two expressions yielding the same r2_score, my_custom_fitness would favour the simpler one
  • for two expressions of the same complexity (i.e. equally simple trees), my_custom_fitness would favour the one that yields the better r2_score

Given these properties, the expression found at the end of fit should lie on the Pareto front (at least the front drawn over all evaluated expressions).

Am I missing something?
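For what it's worth, gplearn's built-in parsimony_coefficient already applies this kind of scalarized penalty: it subtracts a multiple of the program's length from its raw fitness. A minimal usage sketch, with purely illustrative toy data:

import numpy as np
from gplearn.genetic import SymbolicRegressor

# toy data, purely illustrative
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, (100, 2))
y = X[:, 0] ** 2 - X[:, 1]

# parsimony_coefficient penalizes fitness in proportion to program
# length: the same scalarization idea as the custom fitness above
est = SymbolicRegressor(population_size=500, generations=10,
                        parsimony_coefficient=0.01, random_state=0)
est.fit(X, y)
print(est._program)  # best program under the penalized fitness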

@remiadon
Copy link

Answering myself with a reference:

Pareto-Front Exploitation in Symbolic Regression

From page 294:

There is, however, a significant difference between using a Pareto front as a post-run analysis tool vs. actively optimizing the Pareto front during a GP run. In the latter case the Pareto front becomes the objective that is being optimized instead of the fitness (accuracy) of the “best” model.

So yes, I was missing something big.
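In other words, a scalar penalty like the one above can at best support a post-run Pareto analysis: log the (error, size) of every expression evaluated during the run and extract the non-dominated set afterwards. A minimal sketch of that extraction, assuming a hypothetical evaluated log of (error, size, expression) triples:

def post_run_front(evaluated):
    # evaluated: hypothetical list of (error, size, expression) triples
    # logged for every expression scored during the run
    front = [e for e in evaluated
             if not any(o[0] <= e[0] and o[1] <= e[1] and
                        (o[0] < e[0] or o[1] < e[1]) for o in evaluated)]
    return sorted(front, key=lambda e: e[1])  # simplest first

Actively optimizing the front, as NSGA-II does, instead uses non-domination rank and crowding distance as the selection criterion at every generation, which a single scalar fitness cannot express.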
