Let Multivariable Regression only fit 1 variable of a "family" #48

JrtPec · 2018-05-29T12:38:22Z

I tried something extreme, and the results were extreme too: I generated weather data with solar orientations, tilts, wind directions, and so on, about 1600 variables in total, which resulted in this formula:

Value ~ HDD_13 + GlobalIrradianceO270T90 + HDD_3 + windComponentSquared180 + GlobalIrradianceO265T80 + precipIntensity + windComponent95 + GlobalIrradianceO265T75 + CDD_22 + GlobalIrradianceO275T20 + GlobalIrradianceO260T50 + GlobalIrradianceO40T60 + windComponentCubed145 + GlobalIrradianceO0T0 + GlobalIrradianceO35T90 + GlobalIrradianceO100T55 + GlobalIrradianceO0T85

And got a miraculous RSquared of 1!

I could obviously fix it by reducing the number of variables. But what might also work is this: define certain "families" of variables (for instance, the heating degree days), and make sure the Analysis only uses 1 of them to make its model.
Could just be a list of lists, like

var_structure = [
    ["HDD_10", "HDD_11", ..., "HDD_24"],  # heating degree days
    ["CDD_10", ...],
    ["GlobalIrradianceO0T0", "GlobalIrradianceO10T10", ...],
    ...
]
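A minimal sketch of how such a family constraint could be enforced, assuming the Analysis picks variables by greedy forward selection. The names `select_with_families` and `score` are hypothetical and not part of the existing code; `score` stands in for whatever fit criterion the Analysis uses (higher is better).

```python
def select_with_families(candidates, families, score):
    """Greedy forward selection that keeps at most one variable per family.

    candidates: list of variable names
    families:   list of lists of variable names (like var_structure)
    score:      hypothetical callable, score(list_of_names) -> float
    """
    # Map each variable to its family index; variables outside any family are unrestricted.
    family_of = {name: i for i, fam in enumerate(families) for name in fam}

    selected = []
    used_families = set()
    best_score = float("-inf")

    while True:
        # Only consider variables whose family has not been used yet.
        allowed = [
            c for c in candidates
            if c not in selected and family_of.get(c) not in used_families
        ]
        if not allowed:
            break
        # Try each remaining candidate and keep the one with the best score.
        trial_score, trial_var = max(
            ((score(selected + [c]), c) for c in allowed), key=lambda t: t[0]
        )
        if trial_score <= best_score:
            break  # no candidate improves the model any further
        selected.append(trial_var)
        best_score = trial_score
        if trial_var in family_of:
            used_families.add(family_of[trial_var])
    return selected


# Toy usage with a made-up additive score (illustration only, not real model statistics).
toy_weights = {"HDD_13": 5, "HDD_3": 4, "CDD_22": 3, "precipIntensity": 1}
toy_families = [["HDD_13", "HDD_3"], ["CDD_22"]]
chosen = select_with_families(
    list(toy_weights), toy_families, lambda names: sum(toy_weights[n] for n in names)
)
# chosen == ["HDD_13", "CDD_22", "precipIntensity"]: HDD_3 is skipped because
# its family is already represented by HDD_13.
```

Note that this only limits how many variables can enter the model; it does not by itself guard against overfitting when there are many families.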

@saroele thoughts?

saroele · 2018-05-29T19:32:05Z

Looks like you had a fun day :-)

This is exactly what @kdebrab mentioned yesterday: with lots of candidate independent variables, you will get a perfect model (R²=1).

Can you post the fit.summary() of the result? I want to have a look at model statistics.

The list-of-lists approach to create groups of related variables should work, but could again lead to an overfitted model. So preferably, I'd like to find a way to avoid overfitting in general, without imposing any limits on the combination of variables.
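One general way to detect overfitting without restricting the variable combinations is to score the model on data it was not fitted on. The sketch below computes an out-of-sample R² with k-fold cross-validation. It is only an illustration of the idea, not something the thread settled on; `cross_validated_r2`, `fit_line` and `fit_memorizer` are hypothetical names, and `fit` stands in for whatever regression is actually used.

```python
def cross_validated_r2(rows, targets, fit, k=5):
    """Out-of-sample R² from k-fold cross-validation.

    fit: hypothetical callable, fit(train_rows, train_targets) -> predict function
    """
    n = len(rows)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    ss_res = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        predict = fit([rows[i] for i in train_idx], [targets[i] for i in train_idx])
        # Accumulate squared errors on the held-out points only.
        ss_res += sum((targets[i] - predict(rows[i])) ** 2 for i in test_idx)
    mean_y = sum(targets) / n
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares with a single explanatory variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def fit_memorizer(xs, ys):
    """Deliberately overfitted model: perfect on training data, useless elsewhere."""
    table = dict(zip(xs, ys))
    fallback = sum(ys) / len(ys)
    return lambda x: table.get(x, fallback)


# Synthetic data (illustration only): an exact linear relation.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

r2_line = cross_validated_r2(xs, ys, fit_line)
r2_memorizer = cross_validated_r2(xs, ys, fit_memorizer)
```

Both models reach an in-sample R² of 1 on this data, but only the linear fit keeps a high score on the held-out folds; the memorizing model scores below zero there.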
