Cross validation inside 'data' function #276

Open · 5 tasks done

@Parham1995 opened this issue Apr 10, 2020 · 5 comments

Before filing an issue, please make sure to tick the following boxes.

  • Make sure your issue hasn't been filed already. Use GitHub search or manually check the existing issues, including closed ones, and also check the FAQ section of our readme.

  • Install latest hyperas from GitHub:
    pip install git+git://github.com/maxpumperla/hyperas.git

  • Install latest hyperopt from GitHub:
    pip install git+git://github.com/hyperopt/hyperopt.git

  • We have continuous integration running with Travis to make sure the build stays "green". If, after installing the test utilities with pip install pytest pytest-cov pep8 pytest-pep8, you can't successfully run python -m pytest, there is very likely a problem on your side that should be addressed before creating an issue.

  • Create a gist containing your complete script, or a minimal version of it, that can be used to reproduce your issue. Also, add your full stack trace to that gist. In many cases your error message is enough to at least give some guidance.

I am struggling to implement cross-validation with KFold inside the data function in hyperas. Since the data has to be passed into optim.minimize(), I do not know how to make cross-validation work with hyperas.

If you have example code I could take a look at, that would be a great help.
Thanks in advance.

@MatsPro commented May 8, 2020

Hi, I am also looking for example code showing how to make k-fold cross-validation work in hyperas.

@Parham1995 (Author) commented May 9, 2020

@MatsPro Hi, I figured out how to make it work.
Return the whole dataset from the data function and do the K-Fold split inside the model function.
It is important to define the cross-validation loop after the layers you want to hyper-optimize.
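In sketch form (the file names, layer sizes, and fold count below are placeholders, not my actual code, and the usual hyperas imports are assumed), the pattern looks like this:

def data():
    import numpy as np
    # Return the *entire* training set; splitting happens inside the model function.
    X = np.load('X.npy')   # placeholder files standing in for your own data
    y = np.load('y.npy')
    return X, y

def model(X, y):
    import numpy as np
    from sklearn.model_selection import KFold
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from hyperopt import STATUS_OK

    # Define the searchable layers first, then the cross-validation loop.
    net = Sequential()
    net.add(Dense({{choice([32, 64])}}, activation='relu', input_shape=(X.shape[1],)))
    net.add(Dense(1, activation='sigmoid'))
    net.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    initial_weights = net.get_weights()  # snapshot so every fold starts fresh
    scores = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        net.set_weights(initial_weights)  # reset so the folds train independently
        net.fit(X[train_idx], y[train_idx], epochs=10, verbose=0)
        _, acc = net.evaluate(X[val_idx], y[val_idx], verbose=0)
        scores.append(acc)

    # hyperopt minimizes the loss, so negate the mean validation score
    return {'loss': -np.mean(scores), 'status': STATUS_OK, 'model': net}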

@MatsPro commented May 18, 2020

> @MatsPro Hi, I figured out how to make it work.
> Return the whole dataset from the data function and do the K-Fold split inside the model function.
> It is important to define the cross-validation loop after the layers you want to hyper-optimize.

Do you use KerasClassifier or other classifiers? I'm using plain Keras, and that approach won't work for me because you would need to hand it a model-building function, which I can't do while I'm right in the middle of creating that model.

Any suggestions?

Could you show your code?

@Parham1995 (Author) commented May 20, 2020

> Do you use KerasClassifier or other classifiers? I'm using plain Keras, and that approach won't work for me because you would need to hand it a model-building function, which I can't do while I'm right in the middle of creating that model.
>
> Any suggestions?
>
> Could you show your code?

I am working on a regression problem in Keras, but that should not make a big difference for the hyper-optimization. If you post your code, I can tell you how to fix the problem.

@MatsPro commented May 20, 2020

After some blood, sweat and tears I managed to do it. I also had to implement fold-wise scaling and oversampling, which made it a bit more challenging.

This does the deed:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import KFold
from sklearn.metrics import precision_score
from imblearn.combine import SMOTETomek
from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice, uniform


def data():
    import feather

    # Return the full training set; the K-Fold split happens in hyper_model().
    df_hyper_X = feather.read_dataframe('df_hyper_X_train.feather')
    df_hyper_Y = feather.read_dataframe('df_hyper_Y_train.feather')
    return df_hyper_X, df_hyper_Y


def hyper_model(df_hyper_X, df_hyper_Y):
    # Fold-wise preprocessing: the scalers are fitted on each training fold only.
    ct = ColumnTransformer([('ct_std', StandardScaler(), ['pre_grade', 'math']),
                            ('ct_minmax', MinMaxScaler(), ['time'])],
                           remainder='passthrough')

    metrics = [
        tf.keras.metrics.TruePositives(name='tp'),
        tf.keras.metrics.FalsePositives(name='fp'),
        tf.keras.metrics.TrueNegatives(name='tn'),
        tf.keras.metrics.FalseNegatives(name='fn'),
        tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.AUC(name='auc'),
    ]

    # Define the searchable layers first, then the cross-validation loop.
    model = tf.keras.Sequential()
    model.add(Dense({{choice([2, 4, 8, 16, 32, 64])}},
                    activation={{choice(['relu', 'sigmoid', 'tanh', 'elu', 'selu'])}},
                    kernel_initializer={{choice(['lecun_uniform', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform'])}},
                    input_shape=(20,)))
    model.add(Dropout({{uniform(0, 0.5)}}))

    # Optionally add a second hidden layer.
    if {{choice(['one', 'two'])}} == 'two':
        model.add(Dense({{choice([2, 4, 8, 16, 32, 64])}},
                        activation={{choice(['relu', 'sigmoid', 'tanh', 'elu', 'selu'])}}))
        model.add(Dropout({{uniform(0, 0.5)}}))

    model.add(Dense(1, activation='sigmoid'))

    # Each optimizer gets its own tunable learning rate.
    adam = tf.keras.optimizers.Adam(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    nadam = tf.keras.optimizers.Nadam(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    adamax = tf.keras.optimizers.Adamax(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    adagrad = tf.keras.optimizers.Adagrad(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    adadelta = tf.keras.optimizers.Adadelta(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    sgd = tf.keras.optimizers.SGD(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    rmsprop = tf.keras.optimizers.RMSprop(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})

    opti_choice = {{choice(['adam', 'nadam', 'adamax', 'adagrad', 'adadelta', 'sgd', 'rmsprop'])}}
    if opti_choice == 'adam':
        optimizer = adam
    elif opti_choice == 'nadam':
        optimizer = nadam
    elif opti_choice == 'adamax':
        optimizer = adamax
    elif opti_choice == 'adagrad':
        optimizer = adagrad
    elif opti_choice == 'adadelta':
        optimizer = adadelta
    elif opti_choice == 'sgd':
        optimizer = sgd
    else:
        optimizer = rmsprop

    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=metrics)

    smt = SMOTETomek(sampling_strategy='auto', random_state=2)
    kfold = KFold(n_splits=10, shuffle=True, random_state=3)
    initial_weights = model.get_weights()  # snapshot so every fold starts fresh
    scores = []

    for train_fold_index, val_fold_index in kfold.split(df_hyper_X, df_hyper_Y):
        X_train_fold, y_train_fold = df_hyper_X.iloc[train_fold_index], df_hyper_Y.iloc[train_fold_index]
        X_val_fold, y_val_fold = df_hyper_X.iloc[val_fold_index], df_hyper_Y.iloc[val_fold_index]

        # Fit the scalers on the training fold only, then apply them to the validation fold.
        X_train_fold = ct.fit_transform(X_train_fold)
        X_val_fold = ct.transform(X_val_fold)

        # Oversample the training fold only; the validation fold stays untouched.
        X_train_smtk, y_train_smtk = smt.fit_resample(X_train_fold, np.ravel(y_train_fold))

        # Reset the weights so the folds are trained independently of each other.
        model.set_weights(initial_weights)
        model.fit(X_train_smtk, y_train_smtk,
                  epochs={{choice([20, 30, 40, 50, 60, 70])}},
                  batch_size={{choice([16, 32, 64, 128])}},
                  verbose=0)

        predicts = model.predict(X_val_fold)
        scores.append(precision_score(np.ravel(y_val_fold), predicts.round()))

    avg_score = np.mean(scores)
    print('Precision', avg_score)
    # hyperopt minimizes the loss, so return the negated mean precision.
    return {'loss': -avg_score, 'status': STATUS_OK, 'model': model}


if __name__ == '__main__':
    best_run, best_model = optim.minimize(model=hyper_model,
                                          data=data,
                                          algo=tpe.suggest,
                                          max_evals=200,
                                          trials=Trials(),
                                          notebook_name='drive/My Drive/Colab Notebooks/final_NL_EU_Non-EU')
    df_hyper_X, df_hyper_Y = data()  # reload the full data, e.g. to evaluate best_model
    print("Best performing model chosen hyper-parameters:")
    print(best_run)
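One note on the fold loop: the initial weights are captured once, right after the model is built, and restored at the start of every fold, so each fold trains from the same starting point. Without that reset, the single model instance keeps training across folds and the averaged precision comes out optimistic.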
