The noise standard deviation/variance doesn't exist for logistic or Poisson regression. Part of the calculation seems to be done in coef_cov_quad_form, but the result is not used to compute noise_std, whereas the sample_sparse_lin_reg function does use it to calculate the noise variance.
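For context on why the two cases differ, here is a minimal sketch. It assumes the common convention SNR = β^T Σ β / σ² for linear regression, and the helper names (quad_form, linear_noise_std, logistic_conditional_var) are hypothetical; they are not yaglm's coef_cov_quad_form or sampling code. In linear regression σ is a free parameter that can be backed out of the quadratic form β^T Σ β. In logistic regression Var(y | x) = p(x)(1 - p(x)) is fixed by the mean, so there is no separate noise standard deviation (for Poisson regression, Var(y | x) equals the mean).

```python
import numpy as np


def quad_form(coef, cov):
    """Return coef^T cov coef, i.e. Var(x^T coef) when cov = Cov(x)."""
    return float(coef @ cov @ coef)


def linear_noise_std(coef, cov, snr):
    """Hypothetical helper: noise std implied by SNR = coef^T cov coef / sigma^2."""
    return np.sqrt(quad_form(coef, cov) / snr)


def logistic_conditional_var(X, coef):
    """Per-sample Var(y | x) for logistic regression: p * (1 - p)."""
    p = 1.0 / (1.0 + np.exp(-(X @ coef)))
    return p * (1.0 - p)


# Toy setup: corr = 0.5 mirrors the report; the equicorrelated
# covariance structure is an assumption for illustration
n_features, corr = 10, 0.5
cov = np.full((n_features, n_features), corr)
np.fill_diagonal(cov, 1.0)
coef = np.zeros(n_features)
coef[:3] = 1.0

# Linear regression: sigma is chosen to hit a target SNR
sigma = linear_noise_std(coef, cov, snr=2.0)

# Logistic regression: the variance depends on x through p(x); no free sigma
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(n_features), cov, size=5)
cond_var = logistic_conditional_var(X, coef)
```

Here β^T Σ β = 3 + 6 × 0.5 = 6, so a target SNR of 2 gives σ = √3, while the logistic variances vary from sample to sample and never exceed 0.25.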
```python
"""Mini example for training LASSO with information criterion under logistic regression"""
from time import time

import pandas as pd
import numpy as np

from yaglm.GlmTuned import GlmTrainMetric
from yaglm.config.penalty import Lasso
from yaglm.toy_data import sample_sparse_log_reg
from yaglm.metrics.info_criteria import InfoCriteria
from yaglm.infer.Inferencer import Inferencer

# create a python package that supports the simulations
from glm_sims.utils import sample_seeds
from glm_sims.metrics import get_results_log_reg

################ Sample data ################

# sample separate train, validation and test set seeds
# these seeds are used to sample the different data sets
sampling_seeds = sample_seeds(n_seeds=3, random_state=3482)

# note if the true data distribution has a random component
# e.g. if we randomly generate beta, then we will
# need another seed that fixes the distribution to be the same
# for the train, validation and test data

# store high-level information about the simulation
sim_start_time = time()

# keyword arguments passed to each sampling function that specify
# the underlying distribution
data_dis_kws = {'beta_type': 23,
                'beta_random_state': 68,
                'n_features': 10,
                'corr': 0.5}

X_train, y_train, model_info = \
    sample_sparse_log_reg(n_samples=100,
                          random_state=sampling_seeds[0],  # train seed
                          **data_dis_kws)

# pull out the true model data
coef_true = model_info['coef']

X_val, y_val, _ = \
    sample_sparse_log_reg(n_samples=100,
                          random_state=sampling_seeds[1],  # val seed
                          **data_dis_kws)

X_test, y_test, _ = \
    sample_sparse_log_reg(n_samples=1000,
                          random_state=sampling_seeds[2],  # test seed
                          **data_dis_kws)

################ Setup models ################

# Append the validation data to the training data for model fitting
X_train_val = np.append(X_train, X_val, axis=0)
y_train_val = np.append(y_train, y_val)

cv_kws = {'loss': 'log_reg',
          'cv': 5}

est_kws = {'standardize': False, 'fit_intercept': False}

models = {}
models['lasso__tune=AIC'] = GlmTrainMetric(penalty=Lasso(),
                                           scorer=InfoCriteria(crit='aic'),
                                           inferencer=Inferencer(dof='support'),
                                           **est_kws)

results = []
for name, model in models.items():
    print(name)

    # fit model
    start_time = time()
    model.fit(X_train_val, y_train_val)

    pen_val = model.best_tune_params_['penalty__pen_val']
    try:
        mix_val = model.best_tune_params_['penalty__mix_val']
    except KeyError:
        # the Lasso penalty has no mixing parameter
        mix_val = np.nan

    runtime = time() - start_time

    # sklearn saves the coefficient as ndarray of shape (1, n_features)
    # the get_results function assumes the coefficient is an ndarray of shape (n_features,)
    if name in ('sklasso__tune=cv', 'skridge__tune=cv'):
        model.coef_ = np.reshape(model.coef_, (10,))

    # compute evaluation metrics
    # this outputs a dict where each key is the name of a metric
    # e.g. res['L1_to_truth'] = 1.2, res['test_error'] = ...
    res = get_results_log_reg(model,
                              X_train=X_train, y_train=y_train,
                              X_test=X_test, y_test=y_test,
                              coef_true=coef_true, intercept_true=0)
    res['runtime'] = runtime

    # store information identifying this row of the results data frame
    res['model'] = name
    res['mc_idx'] = 1
    res['n_samples_train'] = 100
    res['n_features'] = 10
    res['beta_type'] = 23
    res['n_nonzero'] = 10
    res['best_pen_val'] = pen_val
    res['best_mix_val'] = mix_val
    # possibly other information e.g. n_samples if we are varying
    # the number of samples for each simulation

    results.append(res)

# convert list of dicts to data frame
results = pd.DataFrame(results)
```
I think the problem is that GlmTrainMetric is fitting a linear regression here, i.e., you should specify est_kws = {'standardize': False, 'fit_intercept': False, 'loss': 'log_reg'}
(Reply from Iain Carmichael, Fri, Apr 29, 2022 at 9:33 AM.)
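The loss matters for the AIC-tuned model because the information criterion is computed from the fitted model's log-likelihood; per the reply above, without 'loss': 'log_reg' the estimator fits a linear regression instead. A minimal sketch of AIC for a logistic regression fit, assuming the textbook definition AIC = 2·dof - 2·log-likelihood with dof equal to the number of nonzero coefficients (mirroring Inferencer(dof='support')). This is not yaglm's InfoCriteria implementation, and the function names are hypothetical.

```python
import numpy as np


def logistic_log_lik(X, y, coef, intercept=0.0):
    """Bernoulli log-likelihood: sum_i [y_i * z_i - log(1 + exp(z_i))], z = X coef + b."""
    z = X @ coef + intercept
    # np.logaddexp(0, z) computes log(1 + exp(z)) without overflow
    return float(np.sum(y * z - np.logaddexp(0.0, z)))


def aic_support(X, y, coef, intercept=0.0):
    """Hypothetical helper: AIC = 2 * dof - 2 * loglik, with dof = number of nonzero coefs."""
    dof = int(np.count_nonzero(coef))
    return 2.0 * dof - 2.0 * logistic_log_lik(X, y, coef, intercept)


# Tiny worked example with a sparse coefficient vector (one nonzero entry)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
coef_sparse = np.array([2.0, 0.0])
aic = aic_support(X, y, coef_sparse)
```

Lower AIC is better. For the all-zero coefficient vector every p equals 0.5, so the sketch returns 8·log 2; a Gaussian (squared-error) likelihood would rank the same candidates on a different scale.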
Relevant code: yaglm/yaglm/toy_data.py, line 219 (commit c6b55ea).