Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message when passing wrong value to node - AttributeError: ‘int’ object has no attribute ‘split’ #2733

Closed
noklam opened this issue Jun 27, 2023 · 4 comments · Fixed by #3715 · May be fixed by #2734
Closed
Assignees
Labels
Help Wanted 🙏 Contribution task, outside help would be appreciated! Issue: Feature Request New feature or improvement to existing feature

Comments

@noklam
Copy link
Contributor

noklam commented Jun 27, 2023

Description

Is your feature request related to a problem? A clear and concise description of what the problem is: "I'm always frustrated when ..."

Node is suppose to take name of datasets (or parmaeters), when users put a value in node, they get an obscure error.

node(
    func=split_data,
    inputs=["example_iris_data", "parameters"],
    outputs={"X_train":1},
    name="split",
),

AttributeError: 'int' object has no attribute 'split'

See full traceback:

> ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
> │ /Users/Nok_Lam_Chan/miniconda3/envs/kedro_core/bin/kedro:8 in <module>                           │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/framework/cli/cli.py:211 in main                          │
> │                                                                                                  │
> │   208 │   """                                                                                    │
> │   209 │   _init_plugins()                                                                        │
> │   210 │   cli_collection = KedroCLI(project_path=Path.cwd())                                     │
> │ ❱ 211 │   cli_collection()                                                                       │
> │   212                                                                                            │
> │                                                                                                  │
> │ ╭───────────── locals ─────────────╮                                                             │
> │ │ cli_collection = <KedroCLI None> │                                                             │
> │ ╰──────────────────────────────────╯                                                             │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/miniconda3/envs/kedro_core/lib/python3.8/site-packages/click/core.py:1130 in │
> │ __call__                                                                                         │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/framework/cli/cli.py:139 in main                          │
> │                                                                                                  │
> │   136 │   │   )                                                                                  │
> │   137 │   │                                                                                      │
> │   138 │   │   try:                                                                               │
> │ ❱ 139 │   │   │   super().main(                                                                  │
> │   140 │   │   │   │   args=args,                                                                 │
> │   141 │   │   │   │   prog_name=prog_name,                                                       │
> │   142 │   │   │   │   complete_var=complete_var,                                                 │
> │                                                                                                  │
> │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
> │ │       __class__ = <class 'kedro.framework.cli.cli.KedroCLI'>                                 │ │
> │ │            args = ['run']                                                                    │ │
> │ │    complete_var = None                                                                       │ │
> │ │           extra = {                                                                          │ │
> │ │                   │   'obj': ProjectMetadata(                                                │ │
> │ │                   │   │                                                                      │ │
> │ │                   config_file=PosixPath('/Users/Nok_Lam_Chan/GitHub/kedro/richlog/pyproject… │ │
> │ │                   │   │   package_name='richlog',                                            │ │
> │ │                   │   │   project_name='richlog',                                            │ │
> │ │                   │   │                                                                      │ │
> │ │                   project_path=PosixPath('/Users/Nok_Lam_Chan/GitHub/kedro/richlog'),        │ │
> │ │                   │   │   project_version='0.18.7',                                          │ │
> │ │                   │   │                                                                      │ │
> │ │                   source_dir=PosixPath('/Users/Nok_Lam_Chan/GitHub/kedro/richlog/src'),      │ │
> │ │                   │   │   kedro_init_version='0.18.7'                                        │ │
> │ │                   │   )                                                                      │ │
> │ │                   }                                                                          │ │
> │ │       prog_name = None                                                                       │ │
> │ │            self = <KedroCLI None>                                                            │ │
> │ │ standalone_mode = True                                                                       │ │
> │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/miniconda3/envs/kedro_core/lib/python3.8/site-packages/click/core.py:1055 in │
> │ main                                                                                             │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/miniconda3/envs/kedro_core/lib/python3.8/site-packages/click/core.py:1657 in │
> │ invoke                                                                                           │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/miniconda3/envs/kedro_core/lib/python3.8/site-packages/click/core.py:1404 in │
> │ invoke                                                                                           │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/miniconda3/envs/kedro_core/lib/python3.8/site-packages/click/core.py:760 in  │
> │ invoke                                                                                           │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/framework/cli/project.py:459 in run                       │
> │                                                                                                  │
> │   456 │   with KedroSession.create(                                                              │
> │   457 │   │   env=env, conf_source=conf_source, extra_params=params                              │
> │   458 │   ) as session:                                                                          │
> │ ❱ 459 │   │   session.run(                                                                       │
> │   460 │   │   │   tags=tag,                                                                      │
> │   461 │   │   │   runner=runner(is_async=is_async),                                              │
> │   462 │   │   │   node_names=node_names,                                                         │
> │                                                                                                  │
> │ ╭──────────────────────────────────────── locals ─────────────────────────────────────────╮      │
> │ │   conf_source = None                                                                    │      │
> │ │        config = None                                                                    │      │
> │ │           env = None                                                                    │      │
> │ │   from_inputs = []                                                                      │      │
> │ │    from_nodes = []                                                                      │      │
> │ │      is_async = False                                                                   │      │
> │ │  load_version = {}                                                                      │      │
> │ │ load_versions = {}                                                                      │      │
> │ │     namespace = None                                                                    │      │
> │ │    node_names = ()                                                                      │      │
> │ │   nodes_names = ()                                                                      │      │
> │ │        params = {}                                                                      │      │
> │ │      pipeline = None                                                                    │      │
> │ │        runner = <class 'kedro.runner.sequential_runner.SequentialRunner'>               │      │
> │ │       session = <kedro.framework.session.session.KedroSession object at 0x7f8c90dfc5e0> │      │
> │ │           tag = ()                                                                      │      │
> │ │          tags = ()                                                                      │      │
> │ │      to_nodes = []                                                                      │      │
> │ │    to_outputs = []                                                                      │      │
> │ ╰─────────────────────────────────────────────────────────────────────────────────────────╯      │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/framework/session/session.py:376 in run                   │
> │                                                                                                  │
> │   373 │   │   name = pipeline_name or "__default__"                                              │
> │   374 │   │                                                                                      │
> │   375 │   │   try:                                                                               │
> │ ❱ 376 │   │   │   pipeline = pipelines[name]                                                     │
> │   377 │   │   except KeyError as exc:                                                            │
> │   378 │   │   │   raise ValueError(                                                              │
> │   379 │   │   │   │   f"Failed to find the pipeline named '{name}'. "                            │
> │                                                                                                  │
> │ ╭────────────────────────────────────────── locals ──────────────────────────────────────────╮   │
> │ │       context = <kedro.framework.context.context.KedroContext object at 0x7f8c907dc2e0>    │   │
> │ │  extra_params = {}                                                                         │   │
> │ │   from_inputs = []                                                                         │   │
> │ │    from_nodes = []                                                                         │   │
> │ │ load_versions = {}                                                                         │   │
> │ │          name = '__default__'                                                              │   │
> │ │     namespace = None                                                                       │   │
> │ │    node_names = ()                                                                         │   │
> │ │ pipeline_name = None                                                                       │   │
> │ │        runner = <kedro.runner.sequential_runner.SequentialRunner object at 0x7f8c907dc1c0> │   │
> │ │  save_version = '2023-06-27T14.44.06.340Z'                                                 │   │
> │ │          self = <kedro.framework.session.session.KedroSession object at 0x7f8c90dfc5e0>    │   │
> │ │    session_id = '2023-06-27T14.44.06.340Z'                                                 │   │
> │ │          tags = ()                                                                         │   │
> │ │      to_nodes = []                                                                         │   │
> │ │    to_outputs = []                                                                         │   │
> │ ╰────────────────────────────────────────────────────────────────────────────────────────────╯   │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/framework/project/__init__.py:136 in inner                │
> │                                                                                                  │
> │   133 │                                                                                          │
> │   134 │   # pylint: disable=protected-access                                                     │
> │   135 │   def inner(self, *args, **kwargs):                                                      │
> │ ❱ 136 │   │   self._load_data()                                                                  │
> │   137 │   │   return func(self._content, *args, **kwargs)                                        │
> │   138 │                                                                                          │
> │   139 │   return inner                                                                           │
> │                                                                                                  │
> │ ╭─────────────────────────── locals ────────────────────────────╮                                │
> │ │   args = ('__default__',)                                     │                                │
> │ │   func = <built-in function getitem>                          │                                │
> │ │ kwargs = {}                                                   │                                │
> │ │   self = <repr-error "'int' object has no attribute 'split'"> │                                │
> │ ╰───────────────────────────────────────────────────────────────╯                                │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/framework/project/__init__.py:181 in _load_data           │
> │                                                                                                  │
> │   178 │   │   register_pipelines = self._get_pipelines_registry_callable(                        │
> │   179 │   │   │   self._pipelines_module                                                         │
> │   180 │   │   )                                                                                  │
> │ ❱ 181 │   │   project_pipelines = register_pipelines()                                           │
> │   182 │   │                                                                                      │
> │   183 │   │   self._content = project_pipelines                                                  │
> │   184 │   │   self._is_data_loaded = True                                                        │
> │                                                                                                  │
> │ ╭───────────────────────────────── locals ──────────────────────────────────╮                    │
> │ │ register_pipelines = <function register_pipelines at 0x7f8cd0879af0>      │                    │
> │ │               self = <repr-error "'int' object has no attribute 'split'"> │                    │
> │ ╰───────────────────────────────────────────────────────────────────────────╯                    │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/richlog/src/richlog/pipeline_registry.py:14 in                  │
> │ register_pipelines                                                                               │
> │                                                                                                  │
> │   11 │   Returns:                                                                                │
> │   12 │   │   A mapping from pipeline names to ``Pipeline`` objects.                              │
> │   13 │   """                                                                                     │
> │ ❱ 14 │   pipelines = find_pipelines()                                                            │
> │   15 │   pipelines["__default__"] = sum(pipelines.values())                                      │
> │   16 │   return pipelines                                                                        │
> │   17                                                                                             │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/framework/project/__init__.py:336 in find_pipelines       │
> │                                                                                                  │
> │   333 │   │   │   │   )                                                                          │
> │   334 │   │   │   )                                                                              │
> │   335 │   else:                                                                                  │
> │ ❱ 336 │   │   pipeline_obj = _create_pipeline(pipeline_module)                                   │
> │   337 │                                                                                          │
> │   338 │   pipelines_dict = {"__default__": pipeline_obj or pipeline([])}                         │
> │   339                                                                                            │
> │                                                                                                  │
> │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
> │ │      pipeline_module = <module 'richlog.pipeline' from                                       │ │
> │ │                        '/Users/Nok_Lam_Chan/GitHub/kedro/richlog/src/richlog/pipeline.py'>   │ │
> │ │ pipeline_module_name = 'richlog.pipeline'                                                    │ │
> │ │         pipeline_obj = None                                                                  │ │
> │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/framework/project/__init__.py:288 in _create_pipeline     │
> │                                                                                                  │
> │   285 │   │   )                                                                                  │
> │   286 │   │   return None                                                                        │
> │   287 │                                                                                          │
> │ ❱ 288 │   obj = getattr(pipeline_module, "create_pipeline")()                                    │
> │   289 │   if not isinstance(obj, Pipeline):                                                      │
> │   290 │   │   warnings.warn(                                                                     │
> │   291 │   │   │   f"Expected the 'create_pipeline' function in the "                             │
> │                                                                                                  │
> │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
> │ │ pipeline_module = <module 'richlog.pipeline' from                                            │ │
> │ │                   '/Users/Nok_Lam_Chan/GitHub/kedro/richlog/src/richlog/pipeline.py'>        │ │
> │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/richlog/src/richlog/pipeline.py:12 in create_pipeline           │
> │                                                                                                  │
> │    9                                                                                             │
> │   10                                                                                             │
> │   11 def create_pipeline(**kwargs) -> Pipeline:                                                  │
> │ ❱ 12 │   return pipeline(                                                                        │
> │   13 │   │   [                                                                                   │
> │   14 │   │   │   node(                                                                           │
> │   15 │   │   │   │   func=split_data,                                                            │
> │                                                                                                  │
> │ ╭── locals ───╮                                                                                  │
> │ │ kwargs = {} │                                                                                  │
> │ ╰─────────────╯                                                                                  │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/pipeline/modular_pipeline.py:210 in pipeline              │
> │                                                                                                  │
> │   207 │   │   # To ensure that we are always dealing with a *copy* of pipe.                      │
> │   208 │   │   pipe = Pipeline([pipe], tags=tags)                                                 │
> │   209 │   else:                                                                                  │
> │ ❱ 210 │   │   pipe = Pipeline(pipe, tags=tags)                                                   │
> │   211 │                                                                                          │
> │   212 │   if not any([inputs, outputs, parameters, namespace]):                                  │
> │   213 │   │   return pipe                                                                        │
> │                                                                                                  │
> │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
> │ │     inputs = None                                                                            │ │
> │ │  namespace = None                                                                            │ │
> │ │    outputs = None                                                                            │ │
> │ │ parameters = None                                                                            │ │
> │ │       pipe = [                                                                               │ │
> │ │              │   Node(split_data, ['example_iris_data', 'parameters'], {'X_train': 1},       │ │
> │ │              'split'),                                                                       │ │
> │ │              │   Node(make_predictions, ['X_train', 'X_test', 'y_train'], 'y_pred',          │ │
> │ │              'make_predictions'),                                                            │ │
> │ │              │   Node(report_accuracy, ['y_pred', 'y_test'], None, 'report_accuracy')        │ │
> │ │              ]                                                                               │ │
> │ │       tags = None                                                                            │ │
> │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/pipeline/pipeline.py:145 in __init__                      │
> │                                                                                                  │
> │   142 │   │   │   │   [[n] if isinstance(n, Node) else n.nodes for n in nodes]                   │
> │   143 │   │   │   )                                                                              │
> │   144 │   │   )                                                                                  │
> │ ❱ 145 │   │   _validate_transcoded_inputs_outputs(nodes)                                         │
> │   146 │   │   _tags = set(_to_list(tags))                                                        │
> │   147 │   │                                                                                      │
> │   148 │   │   nodes = [n.tag(_tags) for n in nodes]                                              │
> │                                                                                                  │
> │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
> │ │ nodes = [                                                                                    │ │
> │ │         │   Node(split_data, ['example_iris_data', 'parameters'], {'X_train': 1}, 'split'),  │ │
> │ │         │   Node(make_predictions, ['X_train', 'X_test', 'y_train'], 'y_pred',               │ │
> │ │         'make_predictions'),                                                                 │ │
> │ │         │   Node(report_accuracy, ['y_pred', 'y_test'], None, 'report_accuracy')             │ │
> │ │         ]                                                                                    │ │
> │ │  self = <repr-error "'Pipeline' object has no attribute '_topo_sorted_nodes'">               │ │
> │ │  tags = None                                                                                 │ │
> │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/pipeline/pipeline.py:875 in                               │
> │ _validate_transcoded_inputs_outputs                                                              │
> │                                                                                                  │
> │   872 │                                                                                          │
> │   873 │   invalid = set()                                                                        │
> │   874 │   for dataset_name in all_inputs_outputs:                                                │
> │ ❱ 875 │   │   name = _strip_transcoding(dataset_name)                                            │
> │   876 │   │   if name != dataset_name and name in all_inputs_outputs:                            │
> │   877 │   │   │   invalid.add(name)                                                              │
> │   878                                                                                            │
> │                                                                                                  │
> │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
> │ │ all_inputs_outputs = {                                                                       │ │
> │ │                      │   1,                                                                  │ │
> │ │                      │   'X_test',                                                           │ │
> │ │                      │   'X_train',                                                          │ │
> │ │                      │   'example_iris_data',                                                │ │
> │ │                      │   'parameters',                                                       │ │
> │ │                      │   'y_test',                                                           │ │
> │ │                      │   'y_pred',                                                           │ │
> │ │                      │   'y_train'                                                           │ │
> │ │                      }                                                                       │ │
> │ │       dataset_name = 1                                                                       │ │
> │ │            invalid = set()                                                                   │ │
> │ │              nodes = [                                                                       │ │
> │ │                      │   Node(split_data, ['example_iris_data', 'parameters'], {'X_train':   │ │
> │ │                      1}, 'split'),                                                           │ │
> │ │                      │   Node(make_predictions, ['X_train', 'X_test', 'y_train'], 'y_pred',  │ │
> │ │                      'make_predictions'),                                                    │ │
> │ │                      │   Node(report_accuracy, ['y_pred', 'y_test'], None,                   │ │
> │ │                      'report_accuracy')                                                      │ │
> │ │                      ]                                                                       │ │
> │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/pipeline/pipeline.py:55 in _strip_transcoding             │
> │                                                                                                  │
> │    52 │   │   ValueError: Raised if more than one transcoding separator                          │
> │    53 │   │   is present in the name.                                                            │
> │    54 │   """                                                                                    │
> │ ❱  55 │   return _transcode_split(element)[0]                                                    │
> │    56                                                                                            │
> │    57                                                                                            │
> │    58 class OutputNotUniqueError(Exception):                                                     │
> │                                                                                                  │
> │ ╭── locals ───╮                                                                                  │
> │ │ element = 1 │                                                                                  │
> │ ╰─────────────╯                                                                                  │
> │                                                                                                  │
> │ /Users/Nok_Lam_Chan/GitHub/kedro/kedro/pipeline/pipeline.py:33 in _transcode_split               │
> │                                                                                                  │
> │    30 │   │   ValueError: Raised if more than one transcoding separator                          │
> │    31 │   │   is present in the name.                                                            │
> │    32 │   """                                                                                    │
> │ ❱  33 │   split_name = element.split(TRANSCODING_SEPARATOR)                                      │
> │    34 │                                                                                          │
> │    35 │   if len(split_name) > 2:                                                                │
> │    36 │   │   raise ValueError(                                                                  │
> │                                                                                                  │
> │ ╭── locals ───╮                                                                                  │
> │ │ element = 1 │                                                                                  │
> │ ╰─────────────╯                                                                                  │
> ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
> AttributeError: 'int' object has no attribute 'split'

Context

Why is this change important to you? How would you use it? How can it benefit other users?
The error is common for beginner. It would be great that if the error message can suggest the fix immediately.

Possible Implementation

(Optional) Suggest an idea for implementing the addition or change.
Raise a KedroError instead of AttributeError

Possible Alternatives

(Optional) Describe any alternative solutions or features you've considered.
Improve the logic how dataset is reference, insert validating logic. The error is now coming from pipeline/pipeline.py:33 _transcode_split, which doesn't seem to be the right place to raise error.

@noklam noklam added Issue: Feature Request New feature or improvement to existing feature Help Wanted 🙏 Contribution task, outside help would be appreciated! labels Jun 27, 2023
@noklam
Copy link
Contributor Author

noklam commented Jun 27, 2023

It's easy if the value is int type, because we can do some simple type checking, but if user pass X_train:'50' it's harder to differentiate whether it's a valid dataset.

    for dataset_name in all_inputs_outputs:
        # Check if `dataset_name` is not `str`
        name = _strip_transcoding(dataset_name)
        if name != dataset_name and name in all_inputs_outputs:
            invalid.add(name)

@ElenaKhaustova
Copy link
Contributor

ElenaKhaustova commented Mar 14, 2024

I looked through the implementation and noted that the following:

  1. At first we validate inputs/outputs at the Node level - check that their type must be one of [String, List, Dict, None];
  2. Then at the Pipeline level - _validate_transcoded_inputs_outputs
  3. And then at the Runner level - check that pipeline inputs are in the DataCatalog
node(
    func=split_data,
    inputs=["example_iris_data", "parameters"],
    outputs={"X_train":1},
    name="split",
),

The reason of getting an AttributeError: 'int' object has no attribute 'split' for the above example is that at Node level we only check inputs/outputs (to the Node constructor) types but not types of names of variables used as inputs/outputs. So we only check if we input [String, List, Dict, None] but not [String, List[str], Dict[str, str], None]. So at the Pipeline level we expect strings but getting ints.

Given the above my suggestion is adding missing validation at the Node level. In this case we'll trigger on

  node(
      func=train_model,
      inputs={"X_train": 10, "y_train": "y_train"},
      outputs="regressor",
      name="train_model_node",
  ),

but not on

 node(
     func=train_model,
     inputs={"X_train": "10", "y_train": "y_train"},
     outputs="regressor",
     name="train_model_node",
 ),

which seems fine, because it will be validated at the Runner level.

Any thoughts? 🙂

@ElenaKhaustova
Copy link
Contributor

Please see the suggestion here: main...feature/2733-validate-node-values

@noklam
Copy link
Contributor Author

noklam commented Mar 14, 2024

@ElenaKhaustova nice analysis, I think if we can validate at node level, we should do this as we can give better error message. Please go ahead and open a PR, I think people may have some comment on the error message, but this looks good enough to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Help Wanted 🙏 Contribution task, outside help would be appreciated! Issue: Feature Request New feature or improvement to existing feature
Projects
Status: Done
3 participants