This repository has been archived by the owner on Nov 14, 2023. It is now read-only.

Duplicated trials #207

Open
qo4on opened this issue May 13, 2021 · 8 comments

Comments


qo4on commented May 13, 2021

I ran your HEBO custom example and saw that it runs the same trials multiple times. Can I skip them and finish when there are no unique hyperparameter configurations left?
Also, with cv=5 I expected to see 5 test scores for each trial, but only 3 of them are shown: split0_test_score, split1_test_score, split2_test_score. Can you clarify how this works?

from ray import tune
from ray.tune.suggest.hebo import HEBOSearch
from sklearn.ensemble import RandomForestClassifier
from tune_sklearn import TuneSearchCV

seed = 0

clf = RandomForestClassifier(random_state=seed)

# Note: tune.randint samples from [low, high) -- the upper bound is
# exclusive -- so this space contains exactly one configuration.
param_distributions = {
    "n_estimators": tune.randint(20, 21),
    "max_depth": tune.randint(2, 3),
}

tune_search = TuneSearchCV(
    clf,
    param_distributions,
    n_trials=5,
    search_optimization=HEBOSearch(),
    cv=5,
    random_state=seed,
    local_dir='ray',
    verbose=2,
)

# x_train and y_train are assumed to be defined earlier in the example.
tune_search.fit(x_train, y_train)

[screenshot: printed cv_results_ dataframe showing only split0_test_score, split1_test_score, and split2_test_score]

Yard1 (Member) commented May 13, 2021

How duplicate trials are handled depends on the search algorithm itself, and it looks like HEBO doesn't account for them.

qo4on (Author) commented May 13, 2021

So we could fix that by feeding the existing results back to HEBO without running additional training.
Also, where are the test scores for the remaining splits, split3_test_score and split4_test_score?

Yard1 (Member) commented May 13, 2021

Isn't that just pandas truncating the dataframe to fit it on screen?
pd.DataFrame(tune_search.cv_results_) should have all the info you need.
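
For example, a minimal sketch assuming the tune_search object from the snippet above:

import pandas as pd

# Widen the display so pandas doesn't truncate columns to fit the screen;
# all five split scores are present in cv_results_.
pd.set_option("display.max_columns", None)

results = pd.DataFrame(tune_search.cv_results_)
print(results[[f"split{i}_test_score" for i in range(5)]])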

qo4on (Author) commented May 13, 2021

Thank you, it works.

Don't you think it would be good to skip all duplicates for all searchers by default? It's a real problem when you think you're tuning hyperparameters but are in fact training the same configuration over and over again.

Yard1 (Member) commented May 13, 2021

It's not straightforward, as duplicates should ideally be handled by the search algorithm itself. For example, if we were to reject duplicates on behalf of a search algorithm that doesn't check for them, we could end up in an infinite loop: the tuner rejects the duplicate suggestion, only for the algorithm to suggest it again, since that is what it considers the best configuration.

In any case, that should be done in Ray Tune itself, not here. @krfricke what do you think?
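
To make the failure mode concrete, here is a hypothetical sketch (not Ray Tune or tune-sklearn API; suggest_unique and searcher.suggest are stand-ins) of rejecting duplicates outside the searcher, and why it needs a retry cap:

def suggest_unique(searcher, seen, max_retries=10):
    # Ask the searcher for a configuration, skipping ones already run.
    for _ in range(max_retries):
        config = searcher.suggest()
        key = tuple(sorted(config.items()))
        if key not in seen:
            seen.add(key)
            return config
        # The searcher doesn't know the point was rejected, so it may keep
        # proposing it as its current best -- hence the retry cap.
    return None  # give up instead of looping indefinitely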

richardliaw (Collaborator) commented

Hmm, so yeah this seems to be a common request.

We've actually implemented something similar in Bayesopt (see ray/python/ray/tune/suggest/bayesopt.py).
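
For reference, a sketch of how that looks, assuming a Ray 1.x install where BayesOptSearch accepts skip_duplicate and patience arguments (check your version's signature):

from ray.tune.suggest.bayesopt import BayesOptSearch

# skip_duplicate drops repeated suggestions instead of re-evaluating them;
# patience bounds how many consecutive duplicates are tolerated.
search = BayesOptSearch(skip_duplicate=True, patience=5)

# It can then be passed to TuneSearchCV the same way as HEBOSearch() above:
# tune_search = TuneSearchCV(clf, param_distributions, search_optimization=search, ...)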

qo4on (Author) commented May 14, 2021

Besides, HEBO counts duplicates as newly tested configurations and throws an error once it has reached the total number of possible combinations, which means not all configurations actually get tested.
huawei-noah/noah-research#28

qo4on (Author) commented May 16, 2021

@richardliaw
Which search_optimization options don't have this issue, besides BayesOpt?
