Can't specify list-valued hyperparameters in AutoMLSearch #2015

Closed
freddyaboulton opened this issue Mar 23, 2021 · 2 comments · Fixed by #2028

@freddyaboulton

Repro:

from evalml.demos import load_breast_cancer
from evalml.pipelines import BinaryClassificationPipeline
from evalml.automl import AutoMLSearch

class PipeLine(BinaryClassificationPipeline):
    component_graph = ["Drop Columns Transformer", "Random Forest Classifier"]
    
X, y = load_breast_cancer()

automl = AutoMLSearch(X, y, problem_type="binary", allowed_pipelines=[PipeLine],
                      pipeline_parameters={"Drop Columns Transformer": {"columns": ["mean texture"]}})
automl.search()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/sources/evalml/evalml/pipelines/component_graph.py in instantiate(self, parameters)
     77             try:
---> 78                 new_component = component_class(**component_parameters, random_seed=self.random_seed)
     79             except (ValueError, TypeError) as e:

~/sources/evalml/evalml/pipelines/components/transformers/column_selectors.py in __init__(self, columns, random_seed, **kwargs)
     15         if columns and not isinstance(columns, list):
---> 16             raise ValueError(f"Parameter columns must be a list. Received {type(columns)}.")
     17 

ValueError: Parameter columns must be a list. Received <class 'str'>.

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-21-b4819258a317> in <module>
     10 automl = AutoMLSearch(X, y, problem_type="binary", allowed_pipelines=[PipeLine],
     11                       pipeline_parameters={"Drop Columns Transformer": {"columns": ["mean texture"]}})
---> 12 automl.search()

~/sources/evalml/evalml/automl/automl_search.py in search(self, show_iteration_plot)
    490         logger.info("Allowed model families: %s\n" % ", ".join([model.value for model in self.allowed_model_families]))
    491         self.search_iteration_plot = None
--> 492         if self.plot:
    493             self.search_iteration_plot = self.plot.search_iteration_plot(interactive_plot=show_iteration_plot)
    494 

~/sources/evalml/evalml/automl/automl_algorithm/iterative_algorithm.py in next_batch(self)
     63         next_batch = []
     64         if self._batch_number == 0:
---> 65             next_batch = [pipeline_class(parameters=self._transform_parameters(pipeline_class, {}), random_seed=self.random_seed)
     66                           for pipeline_class in self.allowed_pipelines]
     67 

~/sources/evalml/evalml/automl/automl_algorithm/iterative_algorithm.py in <listcomp>(.0)
     63         next_batch = []
     64         if self._batch_number == 0:
---> 65             next_batch = [pipeline_class(parameters=self._transform_parameters(pipeline_class, {}), random_seed=self.random_seed)
     66                           for pipeline_class in self.allowed_pipelines]
     67 

~/sources/evalml/evalml/pipelines/classification_pipeline.py in __init__(self, parameters, random_seed)
     23         """
     24         self._encoder = LabelEncoder()
---> 25         super().__init__(parameters, random_seed=random_seed)
     26 
     27     def fit(self, X, y):

~/sources/evalml/evalml/pipelines/pipeline_base.py in __init__(self, parameters, random_seed)
     77         else:
     78             self._component_graph = ComponentGraph(component_dict=self.component_graph, random_seed=self.random_seed)
---> 79         self._component_graph.instantiate(parameters)
     80 
     81         self.input_feature_names = {}

~/sources/evalml/evalml/pipelines/component_graph.py in instantiate(self, parameters)
     80                 self._is_instantiated = False
     81                 err = "Error received when instantiating component {} with the following arguments {}".format(component_name, component_parameters)
---> 82                 raise ValueError(err) from e
     83 
     84             component_instances[component_name] = new_component

ValueError: Error received when instantiating component Drop Columns Transformer with the following arguments {'columns': 'mean texture'}

The IterativeAlgorithm selects the first element of the columns list, which is not the intended behavior: Drop Columns Transformer receives the string 'mean texture' instead of the list ['mean texture'].

@freddyaboulton freddyaboulton added the bug Issues tracking problems with existing features. label Mar 23, 2021
@angela97lin

angela97lin commented Mar 23, 2021

This issue arises when IterativeAlgorithm calls _transform_parameters and tries to unpack the parameters. That unpacking code was added to handle the case where the user passes in pipeline_parameters to freeze hyperparameters or to restrict them to a particular subset of values. For example:

    params = {'Imputer': {'numeric_impute_strategy': ['median', 'most_frequent']},
              'Decision Tree Regressor': {'max_depth': [17, 18, 19], 'max_features': Categorical(['auto'])},
              'Elastic Net Regressor': {"alpha": Real(0, 0.5), "l1_ratio": (0.01, 0.02, 0.03)}}
    automl = AutoMLSearch(X_train=X, y_train=y, problem_type='regression', pipeline_parameters=params, n_jobs=1)
    automl.search()

In the first batch, _transform_parameters handles list inputs such as max_depth or numeric_impute_strategy above by simply taking the first element of the list.
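
To make the failure mode concrete, here is a minimal sketch of that unpacking step. It is not the evalml implementation: `transform_parameters_sketch` is a hypothetical stand-in that only mirrors the "take the first element of a list" behavior described above, and it ignores skopt types such as Categorical and Real.

```python
def transform_parameters_sketch(pipeline_parameters):
    """Hypothetical stand-in for the first-batch unpacking described above."""
    transformed = {}
    for component_name, params in pipeline_parameters.items():
        transformed[component_name] = {}
        for name, value in params.items():
            if isinstance(value, (list, tuple)):
                # Assumption: any sequence is read as "candidate values",
                # so the first batch just uses the first one.
                value = value[0]
            transformed[component_name][name] = value
    return transformed


# A list meant as a set of candidate values: unpacking is what the user wants.
search_space = {"Decision Tree Regressor": {"max_depth": [17, 18, 19]}}
# A list meant as the literal parameter value: unpacking corrupts it.
literal_list = {"Drop Columns Transformer": {"columns": ["mean texture"]}}

unpacked_search = transform_parameters_sketch(search_space)
unpacked_literal = transform_parameters_sketch(literal_list)
print(unpacked_search)
print(unpacked_literal)
```

In this sketch the second call turns `["mean texture"]` into the string `"mean texture"`, which matches the `{'columns': 'mean texture'}` arguments shown in the traceback.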

One way around this issue is therefore to remove the line that unpacks lists and to enforce that lists aren't allowed as a way of specifying candidate values.
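
A rough sketch of what that could look like, under the assumption that candidate values must be wrapped in an explicit search-space type while plain lists pass through untouched. `Choices` and `resolve_first_batch_value` are hypothetical names used only for illustration (`Choices` plays the role that a type like Categorical plays in the example above), and this is not necessarily how the eventual fix works.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class Choices:
    """Hypothetical marker meaning 'pick one of these options'."""
    options: Sequence[Any]


def resolve_first_batch_value(value):
    """Return the value to use in the first batch.

    Only explicit Choices are unpacked; a plain list is treated as the
    literal parameter value and returned unchanged.
    """
    if isinstance(value, Choices):
        return value.options[0]
    return value


# A plain list is a literal value, e.g. the columns to drop.
frozen = resolve_first_batch_value(["mean texture"])
# Candidate values have to be marked explicitly.
sampled = resolve_first_batch_value(Choices([17, 18, 19]))
print(frozen, sampled)
```

The trade-off is that existing calls which pass a bare list such as `[17, 18, 19]` as a range would need to switch to the explicit wrapper.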

@dsherry @freddyaboulton @bchen1116 @chukarsten FYI :)

@jeremyliweishih

@dsherry @chukarsten

In #1862 the plan is to add a Drop Columns Transformer in _get_preprocessing_components when an index column exists and then add those columns to self.pipeline_parameters as well, so this issue will block that work too.
