How can I set class weights in a multiclass classification with an imbalanced dataset? #183

Open · alegarbed opened this issue Aug 21, 2019 · 1 comment

@alegarbed

I had difficulty implementing different class weights in a multiclass classification. The proper way to set class weights is with a dictionary, but I can only use the Real, Integer, and Categorical parameters. Is there any solution? Can you provide a simple example?
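
For reference, this is the kind of dictionary I mean (a minimal plain-SKLearn sketch, mapping class labels to weights):

from sklearn.ensemble import RandomForestClassifier

# Plain SKLearn usage: class 2 is weighted five times as heavily as class 0
clf = RandomForestClassifier(n_estimators=10, class_weight={0: 1, 1: 2, 2: 5})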
Thank you in advance.

@HunterMcGushion (Owner)

Thanks for opening this, @alegarbed! Yes, you can optimize class_weight values! Here's a basic example with SKLearn's RandomForestClassifier and the Iris dataset.

from hyperparameter_hunter import Environment, CVExperiment
from hyperparameter_hunter import BayesianOptPro, Integer, Categorical
from hyperparameter_hunter.utils.learning_utils import get_iris_data
from sklearn.ensemble import RandomForestClassifier

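# Creating the Environment registers it as active; the Experiment and
# OptPro below find and use it automatically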
env = Environment(
    train_dataset=get_iris_data(),
    results_path="HyperparameterHunterAssets",
    target_column="species",
    metrics=["hamming_loss"],
    cv_params=dict(n_splits=5, random_state=32),
)

# Just a reference for normal `class_weight` usage outside of optimization
exp = CVExperiment(
    RandomForestClassifier, {"n_estimators": 10, "class_weight": {0: 1, 1: 1, 2: 1}}
)

opt = BayesianOptPro(iterations=10, random_state=32)
opt.forge_experiment(
    model_initializer=RandomForestClassifier,
    model_init_params=dict(
        #################### LOOK DOWN ####################
        class_weight={
            0: Categorical([1, 3]),
            1: Categorical([1, 4]),
            2: Integer(1, 9),  # You can also use `Integer` for low/high ranges
        },
        #################### LOOK UP ####################
        criterion=Categorical(["gini", "entropy"]),
        n_estimators=Integer(5, 100),
    ),
)
opt.go()
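
Each value in the class_weight dict above gets its own search dimension, so the optimizer tunes the three weights independently. Untested side thought: if you also want to try SKLearn's built-in "balanced" heuristic, treating the whole argument as a single Categorical might work as well:

# Untested sketch: choose between the "balanced" heuristic and the default (None)
opt_2 = BayesianOptPro(iterations=10, random_state=32)
opt_2.forge_experiment(
    model_initializer=RandomForestClassifier,
    model_init_params=dict(
        class_weight=Categorical(["balanced", None]),
        n_estimators=Integer(5, 100),
    ),
)
opt_2.go()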

This should definitely be included in one of our examples, or at least documented, so thanks again for asking!

Side note: I just noticed that the automatic Experiment matching during optimization isn't working for this, which is a bug, so I'll look into that and update you.
