Cross validation inside 'data' function #276

Open · 5 tasks done

@Parham1995 opened this issue Apr 10, 2020 · 5 comments

Before filing an issue, please make sure to tick the following boxes.

  • Make sure your issue hasn't been filed already. Use GitHub search or manually check the existing issues, including closed ones, and also check the FAQ section of our readme.

  • Install latest hyperas from GitHub:
    pip install git+git://github.com/maxpumperla/hyperas.git

  • Install latest hyperopt from GitHub:
    pip install git+git://github.com/hyperopt/hyperopt.git

  • We have continuous integration running with Travis to make sure the build stays "green". If, after installing the test utilities with pip install pytest pytest-cov pep8 pytest-pep8, you can't successfully run python -m pytest, there is very likely a problem on your side that should be addressed before creating an issue.

  • Create a gist containing your complete script, or a minimal version of it, that can be used to reproduce your issue. Also, add your full stack trace to that gist. In many cases your error message is enough to at least give some guidance.

I am struggling to implement cross-validation with KFold inside the data function in hyperas. Since the data has to be passed into optim.minimize(), I do not know how to make cross-validation work with hyperas.

If you have example code I could take a look at, that would be a great help.
Thanks in advance.

@MatsPro commented May 8, 2020

Hi, I am also looking for example code showing how to make k-fold cross-validation work in hyperas.

@Parham1995 (Author) commented May 9, 2020

@MatsPro Hi, I figured out how to make it work.
Return the whole dataset from the data function and do the K-Fold split inside the model function.
It is important to define the cross-validation loop after the layers you want to hyper-optimize.
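In sketch form (the file names, layer sizes, and fold count below are placeholders, not my actual code, and the usual hyperas imports are assumed), the pattern looks like this:

def data():
    import numpy as np
    # Return the *entire* training set; splitting happens inside the model function.
    X = np.load('X.npy')   # placeholder files standing in for your own data
    y = np.load('y.npy')
    return X, y

def model(X, y):
    import numpy as np
    from sklearn.model_selection import KFold
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from hyperopt import STATUS_OK

    # Define the searchable layers first, then the cross-validation loop.
    net = Sequential()
    net.add(Dense({{choice([32, 64])}}, activation='relu', input_shape=(X.shape[1],)))
    net.add(Dense(1, activation='sigmoid'))
    net.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    initial_weights = net.get_weights()  # snapshot so every fold starts fresh
    scores = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        net.set_weights(initial_weights)  # reset so the folds train independently
        net.fit(X[train_idx], y[train_idx], epochs=10, verbose=0)
        _, acc = net.evaluate(X[val_idx], y[val_idx], verbose=0)
        scores.append(acc)

    # hyperopt minimizes the loss, so negate the mean validation score
    return {'loss': -np.mean(scores), 'status': STATUS_OK, 'model': net}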

@MatsPro commented May 18, 2020

> @MatsPro Hi, I figured out how to make it work.
> Return the whole dataset from the data function and do the K-Fold split inside the model function.
> It is important to define the cross-validation loop after the layers you want to hyper-optimize.

Do you use KerasClassifier or other classifiers? I'm using plain Keras, and that approach won't work for me because you would need to hand it a model-building function, which I can't do while I'm right in the middle of creating that model.

Any suggestions?

Could you show your code?

@Parham1995 (Author) commented May 20, 2020

> Do you use KerasClassifier or other classifiers? I'm using plain Keras, and that approach won't work for me because you would need to hand it a model-building function, which I can't do while I'm right in the middle of creating that model.
>
> Any suggestions?
>
> Could you show your code?

I am working on a regression problem in Keras, but that should not make a big difference for the hyper-optimization. If you post your code, I can tell you how to fix the problem.

@MatsPro commented May 20, 2020

After some blood, sweat and tears I managed to do it. I also had to implement fold-wise scaling and oversampling, which made it a bit more challenging.

This does the deed:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import KFold
from sklearn.metrics import precision_score
from imblearn.combine import SMOTETomek
from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice, uniform


def data():
    import feather

    # Return the full training set; the K-Fold split happens in hyper_model().
    df_hyper_X = feather.read_dataframe('df_hyper_X_train.feather')
    df_hyper_Y = feather.read_dataframe('df_hyper_Y_train.feather')
    return df_hyper_X, df_hyper_Y


def hyper_model(df_hyper_X, df_hyper_Y):
    # Fold-wise preprocessing: the scalers are fitted on each training fold only.
    ct = ColumnTransformer([('ct_std', StandardScaler(), ['pre_grade', 'math']),
                            ('ct_minmax', MinMaxScaler(), ['time'])],
                           remainder='passthrough')

    metrics = [
        tf.keras.metrics.TruePositives(name='tp'),
        tf.keras.metrics.FalsePositives(name='fp'),
        tf.keras.metrics.TrueNegatives(name='tn'),
        tf.keras.metrics.FalseNegatives(name='fn'),
        tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.AUC(name='auc'),
    ]

    # Define the searchable layers first, then the cross-validation loop.
    model = tf.keras.Sequential()
    model.add(Dense({{choice([2, 4, 8, 16, 32, 64])}},
                    activation={{choice(['relu', 'sigmoid', 'tanh', 'elu', 'selu'])}},
                    kernel_initializer={{choice(['lecun_uniform', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform'])}},
                    input_shape=(20,)))
    model.add(Dropout({{uniform(0, 0.5)}}))

    # Optionally add a second hidden layer.
    if {{choice(['one', 'two'])}} == 'two':
        model.add(Dense({{choice([2, 4, 8, 16, 32, 64])}},
                        activation={{choice(['relu', 'sigmoid', 'tanh', 'elu', 'selu'])}}))
        model.add(Dropout({{uniform(0, 0.5)}}))

    model.add(Dense(1, activation='sigmoid'))

    # Each optimizer gets its own tunable learning rate.
    adam = tf.keras.optimizers.Adam(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    nadam = tf.keras.optimizers.Nadam(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    adamax = tf.keras.optimizers.Adamax(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    adagrad = tf.keras.optimizers.Adagrad(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    adadelta = tf.keras.optimizers.Adadelta(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    sgd = tf.keras.optimizers.SGD(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})
    rmsprop = tf.keras.optimizers.RMSprop(lr={{choice([0.0001, 0.001, 0.01, 0.1])}})

    opti_choice = {{choice(['adam', 'nadam', 'adamax', 'adagrad', 'adadelta', 'sgd', 'rmsprop'])}}
    if opti_choice == 'adam':
        optimizer = adam
    elif opti_choice == 'nadam':
        optimizer = nadam
    elif opti_choice == 'adamax':
        optimizer = adamax
    elif opti_choice == 'adagrad':
        optimizer = adagrad
    elif opti_choice == 'adadelta':
        optimizer = adadelta
    elif opti_choice == 'sgd':
        optimizer = sgd
    else:
        optimizer = rmsprop

    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=metrics)

    smt = SMOTETomek(sampling_strategy='auto', random_state=2)
    kfold = KFold(n_splits=10, shuffle=True, random_state=3)
    initial_weights = model.get_weights()  # snapshot so every fold starts fresh
    scores = []

    for train_fold_index, val_fold_index in kfold.split(df_hyper_X, df_hyper_Y):
        X_train_fold, y_train_fold = df_hyper_X.iloc[train_fold_index], df_hyper_Y.iloc[train_fold_index]
        X_val_fold, y_val_fold = df_hyper_X.iloc[val_fold_index], df_hyper_Y.iloc[val_fold_index]

        # Fit the scalers on the training fold only, then apply them to the validation fold.
        X_train_fold = ct.fit_transform(X_train_fold)
        X_val_fold = ct.transform(X_val_fold)

        # Oversample the training fold only; the validation fold stays untouched.
        X_train_smtk, y_train_smtk = smt.fit_resample(X_train_fold, np.ravel(y_train_fold))

        # Reset the weights so the folds are trained independently of each other.
        model.set_weights(initial_weights)
        model.fit(X_train_smtk, y_train_smtk,
                  epochs={{choice([20, 30, 40, 50, 60, 70])}},
                  batch_size={{choice([16, 32, 64, 128])}},
                  verbose=0)

        predicts = model.predict(X_val_fold)
        scores.append(precision_score(np.ravel(y_val_fold), predicts.round()))

    avg_score = np.mean(scores)
    print('Precision', avg_score)
    # hyperopt minimizes the loss, so return the negated mean precision.
    return {'loss': -avg_score, 'status': STATUS_OK, 'model': model}


if __name__ == '__main__':
    best_run, best_model = optim.minimize(model=hyper_model,
                                          data=data,
                                          algo=tpe.suggest,
                                          max_evals=200,
                                          trials=Trials(),
                                          notebook_name='drive/My Drive/Colab Notebooks/final_NL_EU_Non-EU')
    df_hyper_X, df_hyper_Y = data()  # reload the full data, e.g. to evaluate best_model
    print("Best performing model chosen hyper-parameters:")
    print(best_run)
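One note on the fold loop: the initial weights are captured once, right after the model is built, and restored at the start of every fold, so each fold trains from the same starting point. Without that reset, the single model instance keeps training across folds and the averaged precision comes out optimistic.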
