tree_plot.html() Fails #89

Open
rlcauvin opened this issue Apr 24, 2024 · 8 comments

rlcauvin commented Apr 24, 2024

Retrieving the HTML for a tree plot from my gradient boosted tree model fails:

tree_plot = df_model.plot_tree(tree_idx = 1, max_depth = 16)
html = tree_plot.html()

Output:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[26], line 2
      1 tree_plot = df_model.plot_tree(tree_idx = 1, max_depth = 16)
----> 2 html = tree_plot.html()

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/tree/plot.py:93, in TreePlot.html(self)
     91 def html(self) -> str:
     92   """Returns HTML plot of the tree."""
---> 93   return self._repr_html_()

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/tree/plot.py:100, in TreePlot._repr_html_(self)
     97   # Plotting library.
     98   import pkgutil
--> 100   plotter_js = pkgutil.get_data(__name__, "plotter.js").decode()
    102   container_id = "tree_plot_" + uuid.uuid4().hex
    104   html_content = string.Template("""
    105 <script src='${d3js_url}'></script>
    106 <div id="${container_id}"></div>
   (...)
    116       json_tree_content=json.dumps(self._tree_json),
    117   )

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/pkgutil.py:639, in get_data(package, resource)
    637 parts.insert(0, os.path.dirname(mod.__file__))
    638 resource_name = os.path.join(*parts)
--> 639 return loader.get_data(resource_name)

File <frozen importlib._bootstrap_external>:1073, in get_data(self, path)

FileNotFoundError: [Errno 2] No such file or directory: '/home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/tree/plotter.js'
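
The traceback above shows `pkgutil.get_data` looking for `plotter.js` next to `plot.py` and not finding it. A minimal sketch for checking whether a data file was shipped inside an installed package follows; the helper name `package_resource_exists` is hypothetical and not part of ydf.

```python
import importlib.util
import os


def package_resource_exists(package: str, resource: str) -> bool:
    """Return True if `resource` is a file in the directory of `package`."""
    try:
        spec = importlib.util.find_spec(package)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is not installed.
        return False
    if spec is None or spec.origin is None:
        return False
    package_dir = os.path.dirname(spec.origin)
    return os.path.isfile(os.path.join(package_dir, resource))


# Example with a standard-library package; for this issue the call would be
# package_resource_exists("ydf.model.tree", "plotter.js").
print(package_resource_exists("json", "decoder.py"))
```

If the check returns False for `ydf.model.tree`, the file is missing from the installed wheel rather than misplaced by the notebook environment.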

Here are some relevant package versions I have installed:

tensorflow==2.16.1
tensorflow-datasets==4.9.4
tensorflow-estimator==2.15.0
tensorflow-io-gcs-filesystem==0.36.0
tensorflow-metadata==1.15.0
tensorflow-ranking==0.5.5
tensorflow-recommenders==0.7.3
tensorflow-serving-api==2.15.1
tensorflow_decision_forests==1.9.0
tf_keras==2.16.0
ydf==0.4.2

rlcauvin changed the title from "tree_plot.html() Fails" to "tree_plot.html() Fails" on Apr 25, 2024

Collaborator

achoum commented May 3, 2024

Thanks for the report.
This has been solved in 2299af1 and will be made available in the next release.

Author

rlcauvin commented May 8, 2024

After installing ydf 0.4.3, the output of the same code is now:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[56], line 2
      1 tree_plot = df_model.plot_tree(tree_idx = 1, max_depth = 16)
----> 2 html = tree_plot.html()

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/tree/plot.py:93, in TreePlot.html(self)
     91 def html(self) -> str:
     92   """Returns HTML plot of the tree."""
---> 93   return self._repr_html_()

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/tree/plot.py:116, in TreePlot._repr_html_(self)
    100   plotter_js = pkgutil.get_data(__name__, "plotter.js").decode()
    102   container_id = "tree_plot_" + uuid.uuid4().hex
    104   html_content = string.Template("""
    105 <script src='${d3js_url}'></script>
    106 <div id="${container_id}"></div>
    107 <script>
    108 ${plotter_js}
    109 display_tree(${options}, ${json_tree_content}, "#${container_id}")
    110 </script>
    111 """).substitute(
    112       d3js_url=self._d3js_url,
    113       options=json.dumps(self._options),
    114       plotter_js=plotter_js,
    115       container_id=container_id,
--> 116       json_tree_content=json.dumps(self._tree_json),
    117   )
    118   return html_content

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/json/__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    226 # cached encoder
    227 if (not skipkeys and ensure_ascii and
    228     check_circular and allow_nan and
    229     cls is None and indent is None and separators is None and
    230     default is None and not sort_keys and not kw):
--> 231     return _default_encoder.encode(obj)
    232 if cls is None:
    233     cls = JSONEncoder

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/json/encoder.py:199, in JSONEncoder.encode(self, o)
    195         return encode_basestring(o)
    196 # This doesn't pass the iterator directly to ''.join() because the
    197 # exceptions aren't as detailed.  The list call should be roughly
    198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
    201     chunks = list(chunks)

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/json/encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
    252 else:
    253     _iterencode = _make_iterencode(
    254         markers, self.default, _encoder, self.indent, floatstr,
    255         self.key_separator, self.item_separator, self.sort_keys,
    256         self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o)
    160 def default(self, o):
    161     """Implement this method in a subclass such that it returns
    162     a serializable object for ``o``, or calls the base implementation
    163     (to raise a ``TypeError``).
   (...)
    177 
    178     """
--> 179     raise TypeError(f'Object of type {o.__class__.__name__} '
    180                     f'is not JSON serializable')

TypeError: Object of type RepeatedScalarContainer is not JSON serializable
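
The `TypeError` above can be reproduced without ydf: `json.dumps` only accepts built-in containers, so any other sequence type is rejected unless a `default` hook converts it. The sketch below uses a stand-in class, `RepeatedLike`, in place of protobuf's `RepeatedScalarContainer`; the class and both helper names are hypothetical, and this is not necessarily how the library fixed the problem.

```python
import json
from collections.abc import Iterable


class RepeatedLike:
    """Stand-in for a protobuf repeated field: iterable, but not a list."""

    def __init__(self, values):
        self._values = tuple(values)

    def __iter__(self):
        return iter(self._values)


def to_jsonable(obj):
    """`default` hook: turn unknown iterables into plain lists."""
    if isinstance(obj, Iterable) and not isinstance(obj, (str, bytes)):
        return list(obj)
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


def dumps_tree(tree):
    """Serialize a tree dictionary, converting repeated-field-like values."""
    return json.dumps(tree, default=to_jsonable)


tree = {"weights": RepeatedLike([1.0, -1.0])}
print(dumps_tree(tree))  # json.dumps(tree) alone would raise TypeError
```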

Collaborator

achoum commented May 8, 2024

Thanks for the prompt alert :).

This seems to be a different error. I suspect it is caused by a dependency mismatch between ydf, json, and protobuf, as the error message suggests that json cannot serialize proto arrays.

Do you mind printing your json and protobuf versions?

import google.protobuf
print(google.protobuf.__version__)

import json
print(json.__version__)
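
As an alternative to reading `__version__` attributes, installed distribution versions can be queried with `importlib.metadata`, which does not require importing the package itself. This is a minimal sketch; the helper name `installed_version` is hypothetical, and the distribution names are taken from the package list earlier in this thread. Since `json` ships with Python, its version follows the interpreter.

```python
import sys
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("python", sys.version.split()[0])
for name in ("ydf", "protobuf", "tensorflow_decision_forests"):
    print(name, installed_version(name))
```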

Author

rlcauvin commented May 8, 2024

I just restarted the notebook and am no longer able to import the ydf package. See #87 (comment).

However, despite the unsuccessful ydf package import, I did print the protobuf and json versions (in that order) in case it helps:

3.20.3
2.0.9

Author

rlcauvin commented May 8, 2024

After a fresh install of ydf and tensorflow_decision_forests, I am getting the same error from tree_plot.html(), but only when I build the model using a specific parameter set for ydf.GradientBoostedTreesLearner. With the default parameters, tree_plot.html() works fine.

Meanwhile, I have verified again that the protobuf and json versions are:

3.20.3
2.0.9

Collaborator

achoum commented May 8, 2024

Thanks for the versions.

Can you share the parameters of the GradientBoostedTreesLearner that cause this issue?

Author

rlcauvin commented May 8, 2024

Here is the code I'm using to create the learner and model (that leads to the error calling tree_plot.html()):

import ydf

# "tune" runs a random hyperparameter search; "best_guess" uses the fixed set below.
df_param_method = "best_guess"

df_params = {"compute_permutation_variable_importance": True}
if df_param_method == "tune":
  df_tuner = ydf.RandomSearchTuner(num_trials = 20, automatic_search_space = True, max_trial_duration = None)
  df_params |= {"tuner": df_tuner}
elif df_param_method == "best_guess":
  df_params |= {
    "split_axis": "SPARSE_OBLIQUE",
    "sparse_oblique_projection_density_factor": 3.0,
    "sparse_oblique_normalization": "STANDARD_DEVIATION",
    "sparse_oblique_weights": "BINARY",
    "categorical_algorithm": "RANDOM",
    "growing_strategy": "LOCAL",
    "max_num_nodes": None,
    "sampling_method": "RANDOM",
    "subsample": 0.6,
    "shrinkage": 0.02,
    "min_examples": 20,
    "use_hessian_gain": True,
    "num_candidate_attributes_ratio": 0.2,
    "max_depth": 8,
  }

# target_column_name, cached_train_ds, and cached_test_ds are defined earlier (not shown).
df_learner = ydf.GradientBoostedTreesLearner(label = target_column_name, **df_params)
df_model = df_learner.train(ds = cached_train_ds, valid = cached_test_ds)
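
Since the error only appears with this parameter set and not with the defaults, one way to narrow it down is to apply each override on its own and see which ones still trigger the failure. The sketch below is generic: `culprit_keys` and the `fails` callback are hypothetical names, and in practice `fails` would train a model with the given overrides and report whether `plot_tree(...).html()` raises. Options that only fail in combination would be missed by this approach.

```python
from typing import Any, Callable, Dict, List


def culprit_keys(
    params: Dict[str, Any],
    fails: Callable[[Dict[str, Any]], bool],
) -> List[str]:
    """Return the keys whose override alone makes `fails` return True."""
    return [key for key, value in params.items() if fails({key: value})]


# Illustration with a fake predicate instead of real model training.
example_params = {"split_axis": "SPARSE_OBLIQUE", "shrinkage": 0.02, "max_depth": 8}
fake_fails = lambda overrides: overrides.get("split_axis") == "SPARSE_OBLIQUE"
print(culprit_keys(example_params, fake_fails))
```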

Collaborator

achoum commented May 10, 2024

Thanks.

There is an issue when plotting sparse oblique trees. The fix will be available in the next release.
In the meantime, you can "print" the trees instead of "plotting" them:

Temporarily use model.print_tree() instead of model.plot_tree().

I'll close the issue after the new version is released.
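
The workaround can be wrapped in a small helper that tries the HTML plot first and falls back to the text output when serialization fails. This is a minimal sketch: `show_tree` is a hypothetical name, and it assumes `print_tree` accepts the same `tree_idx` and `max_depth` keyword arguments as `plot_tree`, which this thread does not confirm.

```python
def show_tree(model, tree_idx=0, max_depth=None):
    """Return the tree's HTML, or print the tree as text and return None."""
    kwargs = {"tree_idx": tree_idx}
    if max_depth is not None:
        kwargs["max_depth"] = max_depth
    try:
        return model.plot_tree(**kwargs).html()
    except TypeError:
        # Raised in this issue when the tree JSON cannot be serialized.
        # Assumption: print_tree takes the same keyword arguments.
        model.print_tree(**kwargs)
        return None
```

With the model from this issue, `show_tree(df_model, tree_idx=1, max_depth=16)` would return the HTML once the fix is released and print the tree as text until then.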
