SMEP: Empirical Likelihood (GSOC)

josef-pkt edited this page Apr 28, 2013 · 1 revision

Justin Grana (Skipper Seabold) GSOC 2012

Status : merged, needs more review and advertising

Objective

Implement empirical likelihood estimation in statsmodels.

Abstract

In 1990, Art Owen published “Empirical Likelihood Ratio Confidence Regions” in The Annals of Statistics, which ignited a flurry of research exploring the techniques and possibilities of empirical likelihood estimation. In 2009, statsmodels was released as its own package for statistical computing in the Python language and has subsequently grown to include, among others, linear, nonlinear and time series regression models. In a way, empirical likelihood estimation and statsmodels are similar: both are relatively new in their respective fields and both are packed with unexploited opportunities that can benefit researchers, financial analysts and policymakers alike.

The aim of statsmodels is to provide an open source alternative to statistical computing programs such as MATLAB, SAS, Stata and GAUSS. Over the past three summers, statsmodels received significant contributions from GSoC participants that pushed it toward becoming a one-stop shop for statistical computing. To continue this push, my goal for GSoC is twofold. Primarily, I aim to implement empirical likelihood estimation techniques in statsmodels. Empirical likelihood is a nonparametric method that gives the observed data a louder voice than many parametric models do. It is especially useful when the underlying distribution of the data is unknown or the researcher does not want to specify a distribution for the noise in the data. The applications range from analyzing survival times after a treatment to computing variability in stock market returns. Secondly, since empirical likelihood estimation relies on estimating equations, I hope to assist in creating a class for estimating equations that can be used for the future development of other models such as the generalized method of moments (GMM). In that sense, my goals maintain some flexibility as to the weight that is put on implementing empirical likelihood estimation techniques versus developing an estimating equations class.

Timeline

To remain organized and ensure that I am on track, I find that setting short-term goals is more helpful than setting long-term goals. That said, I have broken my pre-GSoC, midterm and final goals down by the week.

Prior to Start of Program

  • Become acquainted with the style and coding conventions of statsmodels
  • Investigate the theory and computation of the optimization algorithms in SciPy
  • Learn the EM algorithm that will be used for estimation with censored data
  • Familiarize myself with git and other tools/standards for developers

Prior to Midterm Evaluation

May 21-June 4 (2 weeks)

  • Begin by computing empirical likelihood (EL) ratio for uncensored, multivariate means.
  • Use likelihood ratio to code confidence intervals and conduct hypothesis tests.
  • Add an option for Bartlett correction and location adjustment for confidence intervals.
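
As an illustration of the first step above, here is a minimal sketch of the EL ratio test and confidence interval for a univariate mean. It is not the statsmodels implementation; the function names `el_ratio_mean` and `el_confint_mean` are hypothetical, and the sketch assumes only NumPy and SciPy. It solves for the Lagrange multiplier by root finding and calibrates the statistic with a chi-square distribution with one degree of freedom. Bartlett correction and location adjustment are not included.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2


def el_ratio_mean(x, mu0):
    """Return (-2 log EL ratio, p-value) for H0: E[x] = mu0 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - mu0
    # The EL ratio is zero when mu0 is not strictly inside the data range.
    if z.min() >= 0 or z.max() <= 0:
        return np.inf, 0.0
    # The weights 1 / (n * (1 + lam * z_i)) must lie in (0, 1],
    # which brackets the Lagrange multiplier lam.
    lo = (1.0 / n - 1.0) / z.max()
    hi = (1.0 / n - 1.0) / z.min()
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
    stat = max(2.0 * np.sum(np.log1p(lam * z)), 0.0)
    return stat, chi2.sf(stat, df=1)


def el_confint_mean(x, alpha=0.05):
    """Invert the EL ratio test to get a confidence interval for the mean."""
    x = np.asarray(x, dtype=float)
    crit = chi2.ppf(1.0 - alpha, df=1)
    gap = lambda mu: el_ratio_mean(x, mu)[0] - crit
    eps = 1e-8 * (x.max() - x.min())  # stay strictly inside the data range
    lower = brentq(gap, x.min() + eps, x.mean())
    upper = brentq(gap, x.mean(), x.max() - eps)
    return lower, upper
```

Calling `el_confint_mean(data)` returns the two values of the mean at which the statistic equals the chi-square critical value. Unlike a normal-theory interval, the result need not be symmetric around the sample mean.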

June 4-June 18 (2 weeks)

  • Compute confidence intervals and conduct hypothesis tests for variance, skewness, kurtosis, covariance and correlation. This involves writing efficient code to estimate with nuisance parameters.
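
To make the role of nuisance parameters concrete, here is a rough sketch of an EL test for a variance in which the unknown mean is profiled out. Everything in it is an assumption made for illustration: the names `el_statistic` and `el_test_variance`, the damped Newton solver for the multiplier, and the search bracket of one standard deviation around the sample mean. It is not the code that was merged into statsmodels.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2


def el_statistic(g, max_iter=50, tol=1e-10):
    """-2 log EL ratio for moment conditions E[g_i] = 0; g has shape (n, k)."""
    g = np.asarray(g, dtype=float)
    lam = np.zeros(g.shape[1])

    def objective(l):
        d = 1.0 + g @ l
        return np.sum(np.log(d)) if np.all(d > 0) else -np.inf

    current = 0.0  # objective at lam = 0
    for _ in range(max_iter):
        scaled = g / (1.0 + g @ lam)[:, None]
        grad = scaled.sum(axis=0)
        step = np.linalg.solve(scaled.T @ scaled, grad)  # Newton direction
        t = 1.0
        # Halve the step until it is feasible and does not decrease the objective.
        while t > 1e-10 and objective(lam + t * step) < current:
            t *= 0.5
        if t <= 1e-10:
            break
        lam = lam + t * step
        current = objective(lam)
        if np.linalg.norm(t * step) < tol:
            break
    return 2.0 * current


def el_test_variance(x, var0):
    """Test H0: Var[x] = var0, minimizing the statistic over the nuisance mean."""
    x = np.asarray(x, dtype=float)

    def profile(mu):
        return el_statistic(np.column_stack([x - mu, (x - mu) ** 2 - var0]))

    width = x.std()  # assumed search bracket around the sample mean
    res = minimize_scalar(profile, method="bounded",
                          bounds=(x.mean() - width, x.mean() + width))
    stat = max(res.fun, 0.0)
    return stat, chi2.sf(stat, df=1)
```

The same two-level pattern (an inner solve for the multiplier, an outer minimization over the nuisance parameters) would apply to skewness, kurtosis, covariance and correlation by swapping in the corresponding estimating equations.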

June 18-July 9 (3 weeks)

  • Empirical likelihood and linear models
  • Estimation of confidence intervals for regression coefficients and parameters of the linear model
  • Add an option to force the regression through the origin
  • ANOVA
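
For the linear model, one possible sketch uses the estimating equations x_i (y_i - x_i'b) = 0 and tests all coefficients jointly. The function name `el_test_regression` is hypothetical. The sketch assumes a general-purpose SciPy optimizer together with a quadratic extension of the logarithm below 1/n (a device usually credited to Owen) so that the dual objective is defined for every multiplier. Intervals for a single coefficient would additionally require profiling over the remaining coefficients, which is omitted here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2


def _log_star(z, eps):
    """log(z) for z >= eps; second-order Taylor expansion around eps below it."""
    out = np.empty_like(z)
    ok = z >= eps
    out[ok] = np.log(z[ok])
    out[~ok] = np.log(eps) - 1.5 + 2.0 * z[~ok] / eps - 0.5 * (z[~ok] / eps) ** 2
    return out


def _dlog_star(z, eps):
    """Derivative of _log_star with respect to z."""
    out = np.empty_like(z)
    ok = z >= eps
    out[ok] = 1.0 / z[ok]
    out[~ok] = 2.0 / eps - z[~ok] / eps ** 2
    return out


def el_test_regression(y, X, beta0):
    """Joint EL test of H0: beta = beta0 in y = X beta + error."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    g = X * (y - X @ beta0)[:, None]  # estimating equations evaluated at beta0
    eps = 1.0 / n

    def neg_obj(lam):
        return -np.sum(_log_star(1.0 + g @ lam, eps))

    def neg_grad(lam):
        return -g.T @ _dlog_star(1.0 + g @ lam, eps)

    res = minimize(neg_obj, np.zeros(k), jac=neg_grad, method="BFGS")
    stat = max(-2.0 * res.fun, 0.0)
    return stat, chi2.sf(stat, df=k)
```

Forcing the regression through the origin corresponds, in this sketch, to passing a design matrix `X` without a constant column.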

July 9-July 16 (1 week, Midterm Evaluation)

  • Tie up any loose ends from first half of the project including documentation and test cases
  • Make any necessary revisions to the second-half schedule based on feasibility and modifications to the scope of the project.

Prior to Final Evaluation

July 16-July 29 (2 weeks)

  • Empirical Likelihood in estimating parameters and regression coefficients with censored data

July 30-August 12 (2 weeks)

  • Empirical Likelihood model selection criterion
  • This includes EL analogs of BIC, AIC, Hannan Quinn IC

August 12-August 20 (1 week)

  • Clean up documentation and finish examples
  • Tie up any loose ends in the estimating equations framework

Extra Opportunities

If time remains, other possible implementations include

  • Empirical likelihood goodness-of-fit measures
  • Empirical likelihood in nonlinear regression
  • Empirical likelihood and instrumental variable regression