alexandreyc edited this page Jun 6, 2012 · 6 revisions

R: Notes

Some collected notes for using R.

Warning: I don't know much R, so this might not be the best way to do it.

The SUR example and the new helper functions are currently in my branch.

Skipper's original file is in tools/R2nparray.

The files can be sourced in R to make the helper functions available (for example, with Windows path separators):

source("E:\\path_to_repo\\tools\\R2nparray\\R\\R2nparray.R")
source("E:\\path_to_repo\\tools\\topy.R")

Introspection

Assuming we have already called systemfit and assigned the result to SUR:

> names(SUR)
 [1] "eq"           "call"         "coefficients" "coefCov"      "residCovEst"  "residCov"     "method"       "rank"        
 [9] "df.residual"  "iter"         "control"      "panelLike"   
> attributes(SUR)
$names
 [1] "eq"           "call"         "coefficients" "coefCov"      "residCovEst"  "residCov"     "method"       "rank"        
 [9] "df.residual"  "iter"         "control"      "panelLike"   

$class
[1] "systemfit"

> cc = SUR$coefCov
> is.numeric(cc)
[1] TRUE
> class(cc)
[1] "matrix"
> is.matrix(cc)
[1] TRUE
> class(SUR$eq)
[1] "list"

Looping over names - mkarray

A for loop that prints all numeric components of SUR as Python code that creates NumPy arrays.
  • SUR[[name]] or getElement(SUR, name) extracts the component called name from the list object SUR.
  • mkarray is one of our helper functions in tools; it prints the data as an np.array.
  • It's a one-liner, so it was easier to work with in the R shell.
> for (name in names(SUR)) {if (is.numeric(SUR[[name]])) {mkarray(SUR[[name]], name)}}; cat("\n")
coefficients = np.array([0.9979991848420328,0.06886083327936214,...,0.0429020916196108])

coefCov = np.array([157.3943509170185,-0.2165142902938106,...,0.002035467551712387]).reshape(15,15, order='F')

residCovEst = np.array([176.3202565715889,-25.14782439226425,...,104.3078782568039]).reshape(5,5, order='F')

residCov = np.array([180.2786473970981,3.703259980763286,...,111.6549965340746]).reshape(5,5, order='F')

rank = np.array([15])

df.residual = np.array([85])

iter = np.array([1])
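The matrices above are printed with reshape(..., order='F'). R stores a matrix column by column, so the flat list of values must be refilled column-major on the Python side. NumPy's default order='C' fills row by row and silently scrambles any matrix that is not symmetric. Here is a small check with made-up values (not SUR output):

```python
import numpy as np

# In R, matrix(1:6, nrow = 2) is filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
# Its underlying storage order is therefore 1, 2, ..., 6.
r_storage_order = [1, 2, 3, 4, 5, 6]

# NumPy's default reshape is row-major (C order): this does NOT match R.
c_order = np.array(r_storage_order).reshape(2, 3)

# order='F' (Fortran, column-major) reproduces the R matrix.
f_order = np.array(r_storage_order).reshape(2, 3, order='F')

print(c_order)  # [[1 2 3] [4 5 6]]
print(f_order)  # [[1 3 5] [2 4 6]]
```

For vectors such as coefficients or rank, the order argument makes no difference.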

Create named list and save to python module - R2nparray

> aa = list(covparams=SUR$coefCov, rank=SUR$rank)
> R2nparray(aa, fname="temp3.py")    

The content of the temp3.py module is then:

------------ temp3.py ----------
import numpy as np

covparams = np.array([157.3943509170185,-0.2165142902938106,...,0.002035467551712387]).reshape(15,15, order='F')

rank = np.array([15])
--------------------------------
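On the Python side, a module like temp3.py is just a file of assignments, so it can be imported and used as reference values in a test. The sketch below shows one possible approach. It is not code from statsmodels. It writes a stand-in module with made-up values to a temporary directory instead of relying on temp3.py being on sys.path. It then loads that module by path with importlib and compares it against a Python result with np.testing.assert_allclose. The helper name load_generated_module is hypothetical.

```python
import importlib.util
import pathlib
import tempfile

import numpy as np

# Hypothetical stand-in for a file written by R2nparray (values are made up).
MODULE_TEXT = """import numpy as np

covparams = np.array([4.0,1.0,1.0,9.0]).reshape(2,2, order='F')

rank = np.array([2])
"""


def load_generated_module(path, name="r_results"):
    """Import a Python file by path, like `import temp3` but without sys.path changes."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the assignments in the file
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "temp3.py"
    path.write_text(MODULE_TEXT)
    res_r = load_generated_module(path)

# In a real test this would come from the statsmodels estimate.
python_cov = np.array([[4.0, 1.0], [1.0, 9.0]])
np.testing.assert_allclose(python_cov, res_r.covparams, rtol=1e-12)
print(res_r.covparams.shape, int(res_r.rank[0]))  # (2, 2) 2
```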

Saving a dataframe

f is a data frame with fitted values from the SUR model:

> class(f)
[1] "data.frame"
> f
               Chrysler  General.Electric    General.Motors          US.Steel      Westinghouse
X1935 32.98546930516650 34.82254735597956 208.2453286635445 247.5131792455174 12.27690563625844
X1936 61.83516118316266 66.98918588257341 420.2793547553419 300.2827737683187 30.52156144761057
...

Calling another helper function writes the data series of the data frame into a Python module:

> R2nparray(f, fname="temp4.py")

------------ temp4.py ----------
import numpy as np

Chrysler = np.array([32.9854693051665,...,177.371048256085])

General_Electric = np.array([34.82254735597956,...,195.5150518056073])

General_Motors = np.array([208.2453286635445,...,1364.599470457204])

US_Steel = np.array([247.5131792455174,3...,566.277048536767])

Westinghouse = np.array([12.27690563625844,...,77.5688631853628])
--------------------------------

We can also combine the named list aa and the data frame f and save them at the same time:

> R2nparray(c(aa, f), fname="temp5.py")

The resulting Python module contains the merged content:

>>> import temp5
>>> dir(temp5)
['Chrysler', 'General_Electric', 'General_Motors', 'US_Steel', 'Westinghouse', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'covparams', 'np', 'rank']
>>> temp5.covparams.shape
(15, 15)

Save all - cat_items

A new version, cat_items, saves everything that is not blacklisted, although currently mainly the numerical types are useful. (TODO: not committed to statsmodels/tools yet, and no name cleaning.)

> cat_items(SUR, prefix="sur.", blacklist=c("eq", "control"))
sur.call = '''systemfit(formula = formula, method = "SUR", data = panel)'''
sur.coefficients = np.array([0.9979991848420328,...,0.0429020916196108]).reshape(15,1, order='F')
sur.coefCov = np.array([157.3943509170185,...,0.002035467551712387]).reshape(15,15, order='F')
sur.residCovEst = np.array([176.3202565715889,...,104.3078782568039]).reshape(5,5, order='F')
sur.residCov = np.array([180.2786473970981,...,111.6549965340746]).reshape(5,5, order='F')
sur.method = SUR
sur.rank = 15
sur.df.residual = 85
sur.iter = 1
sur.panelLike = '''TRUE'''

Redirecting output to file - sink

Our helper functions use cat to write their output, and cat prints the strings to standard output. The output can be redirected to a file using sink, for example:

fname = "tmp_sur.py"
append = TRUE
sink(file=fname, append=append)
mkarray(SUR$coefficients, "params")
mkarray(SUR$coefCov, "cov_params")
mkarray(SUR$residCovEst, "resid_cov_est")
mkarray(SUR$residCov, "resid_cov")
mkarray(SUR$df.residual, "df_resid")
sink()

The final sink() ends the redirection. If the code raises an error, sink() is never called, and the interpreter shell stops printing any output. Typing sink() once or several times brings the standard output back to the shell.
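Two rough edges remain in the output above. First, names such as df.residual or sur.df.residual are not valid Python identifiers. Second, sink() can leave the output redirected after an error. The sketch below is a hypothetical Python-side illustration of both points. The names clean_name and write_array are made up. The cleaning rule is an assumption, chosen to match the General.Electric to General_Electric renaming seen for the data frame. The sketch writes the same np.array(...).reshape(..., order='F') layout to an explicit file handle. Inside `with open("tmp_sur.py", "w") as fh:` the file is closed even if an exception is raised, so there is nothing to reset afterwards.

```python
import io
import keyword
import re

import numpy as np


def clean_name(name):
    """Turn an R name such as 'df.residual' into a valid Python identifier.

    Assumed rule (not taken from the R helpers): replace every character that
    is not a letter, digit or underscore with '_', and prefix '_' when the
    result is empty, starts with a digit, or is a Python keyword.
    """
    cleaned = re.sub(r"\W", "_", name)
    if not cleaned or cleaned[0].isdigit() or keyword.iskeyword(cleaned):
        cleaned = "_" + cleaned
    return cleaned


def write_array(fh, name, values):
    """Write `values` as Python source in the mkarray-like layout."""
    arr = np.asarray(values, dtype=float)
    # Flatten column-major so the reshape(..., order='F') below restores it.
    flat = ",".join(repr(float(x)) for x in arr.ravel(order="F"))
    line = f"{clean_name(name)} = np.array([{flat}])"
    if arr.ndim == 2:
        line += f".reshape({arr.shape[0]},{arr.shape[1]}, order='F')"
    print(line + "\n", file=fh)  # blank line between arrays, as in the R output


# Usage with an in-memory buffer; a real file would use `with open(...) as fh:`.
buf = io.StringIO()
write_array(buf, "df.residual", [85])
write_array(buf, "coefCov", [[1.0, 2.0], [3.0, 4.0]])
print(buf.getvalue())
```

Numbers are written with repr, so 85 comes out as 85.0, unlike the R output.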
