
SMEP D: Blacklisting Results

How much consumer protection do we need?

I'm still trying to figure out what Stata is doing (and it's a lot easier than finding out what is available in R).

From the xtgls postestimation help:

    (1) AIC and BIC are available only if igls and corr(independent) were
        specified at estimation.
    (2) Likelihood-ratio tests are available only if igls and corr(independent)
        were specified at estimation.

The problem here is that some estimation methods are not even asymptotically equivalent to maximum likelihood estimation; in these cases Stata does not provide the likelihood-based results. (xtgls is a Stata command for panel-data GLS that allows for various options and model assumptions.) The likelihood ratio test is theoretically not appropriate in these cases.

Another example: after calling stcox, Stata has this:

    e(marginsnotok) : "CSNell DEViance DFBeta ESR LDisplace LMax MGale SCAledsch SCHoenfel.."

marginsnotok sounds like it prohibits some results, although I haven't looked into it yet.

The same problem shows up in statsmodels when we use a generic class or inherit generic results in a subclass. It does not show up in "single-purpose" classes, where the results are targeted specifically at the model.

As an example, I'm looking at linear models that estimate a covariance structure (similar to xtgls and others in Stata). The estimation returns a generic RegressionResults instance, which might include, depending on the estimation details/options, results that are theoretically not appropriate (not justified, or outright incorrect).

Question

Should we follow Stata's example and prohibit, or at least warn users against, using theoretically incorrect results?

Currently we sometimes just have a warning in the docs: "this inherits from RegressionResults and not all results might be appropriate", or something like that.

It will be quite a bit of work to actually check the theory behind all inherited results, but we could set up the infrastructure for it.

Possible Implementation

Models define a blacklist; the results instance checks the blacklist and raises a warning or exception if the method or attribute is blacklisted. Completely deleting the method or attribute might work in some cases, but it is more difficult to keep track of.
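
As a rough sketch of what this infrastructure could look like (all names here, such as check_blacklist, _results_blacklist, and BlacklistWarning, are hypothetical and not existing statsmodels API):

```python
import functools
import warnings


class BlacklistWarning(UserWarning):
    """A requested result is not theoretically justified for this model."""


def check_blacklist(func):
    """Consult the model's blacklist before computing a results method."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # the model declares which inherited results are not justified
        action = getattr(self.model, "_results_blacklist", {}).get(func.__name__)
        if action == "raise":
            raise NotImplementedError(
                "%s is not theoretically justified for this model"
                % func.__name__)
        if action == "warn":
            warnings.warn(
                "%s might not be theoretically justified for this model"
                % func.__name__, BlacklistWarning)
        return func(self, *args, **kwargs)
    return wrapper
```

A results class would decorate the inherited attributes and methods (for example llf, aic, bic, compare_lr_test), and a model like an FGLS variant would declare, say, _results_blacklist = {"aic": "raise", "compare_lr_test": "raise", "llf": "warn"}.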

Related:

This is similar to the z-values versus t-values discussion that we had several times: for FGLS, Stata chooses one or the other depending on the model details (https://github.com/statsmodels/statsmodels/issues/285).

Approximately: GLS with known weights or covariance has t-values; with an estimated covariance matrix it has z-values.
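
A minimal sketch of this rule (the helper name param_pvalues and its signature are made up for illustration; only numpy and scipy are assumed):

```python
import numpy as np
from scipy import stats


def param_pvalues(params, bse, df_resid, cov_is_estimated):
    """Two-sided p-values for the parameter tests.

    With known weights/covariance, the exact t distribution is justified;
    with an estimated covariance matrix, only the asymptotic normal
    distribution is.
    """
    statistic = np.asarray(params) / np.asarray(bse)
    if cov_is_estimated:
        return 2 * stats.norm.sf(np.abs(statistic))  # z-values
    return 2 * stats.t.sf(np.abs(statistic), df_resid)  # t-values
```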

The main place where we can present "opinionated" results to users right now is summary(), which should be expanded further. For attributes and methods, however, conditional returns, warnings, and exceptions are the only options.

Aside: Misspecification

The issue here is different from users requesting inappropriate results because their model is misspecified. If a user uses OLS standard errors when the residuals are autocorrelated, then it is the user's problem that the results are incorrect. All we can do is provide specification tests and correct methods. If a user insists on fitting the wrong model, then no statistical package can stop them.
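
As a minimal illustration of the "specification tests" part, using simulated data with AR(1) errors and the Durbin-Watson statistic that statsmodels provides (the data-generating process is made up for the example):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.RandomState(12345)
nobs = 200
x = sm.add_constant(rng.standard_normal(nobs))

# AR(1) errors: OLS point estimates are fine, OLS standard errors are not
u = np.zeros(nobs)
for t in range(1, nobs):
    u[t] = 0.8 * u[t - 1] + rng.standard_normal()
y = np.dot(x, [1.0, 0.5]) + u

res = sm.OLS(y, x).fit()
# a value far below 2 flags positive autocorrelation in the residuals;
# whether to act on that warning is still up to the user
print("Durbin-Watson:", durbin_watson(res.resid))
```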

In the case above, we have results that are theoretically inconsistent or unjustified given the model that the user requested.

"All models are wrong, but are they good enough."
