[QUESTION] Error in chapter 15 (Forecasting Multivariate Time Series) #112

Open
michabuehlmann opened this issue Dec 11, 2023 · 5 comments

michabuehlmann commented Dec 11, 2023

To Reproduce
The error occurs in chapter 15, in the section "Forecasting Multivariate Time Series" (page 559 in the book).
Cell 43 contains the following code:

train_mulvar_ds = tf.keras.utils.timeseries_dataset_from_array(
    mulvar_train.to_numpy(),  # use all 5 columns as input
    targets=mulvar_train["rail"][seq_length:],  # forecast only the rail series
    sequence_length=seq_length,
    batch_size=32,
    shuffle=True,
    seed=42
)

Here I get a ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float). See the stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[45], line 3
      1 tf.random.set_seed(42)  # extra code – ensures reproducibility
----> 3 train_mulvar_ds = tf.keras.utils.timeseries_dataset_from_array(
      4     mulvar_train.to_numpy(),  # use all 5 columns as input
      5     targets=mulvar_train["rail"][seq_length:],  # forecast only the rail series
      6     sequence_length=seq_length,
      7     batch_size=32,
      8     shuffle=True,
      9     seed=42
     10 )
     11 valid_mulvar_ds = tf.keras.utils.timeseries_dataset_from_array(
     12     mulvar_valid.to_numpy(),
     13     targets=mulvar_valid["rail"][seq_length:],
     14     sequence_length=seq_length,
     15     batch_size=32
     16 )

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/keras/src/utils/timeseries_dataset.py:245, in timeseries_dataset_from_array(data, targets, sequence_length, sequence_stride, sampling_rate, batch_size, shuffle, seed, start_index, end_index)
    233 # For each initial window position, generates indices of the window elements
    234 indices = tf.data.Dataset.zip(
    235     (tf.data.Dataset.range(len(start_positions)), positions_ds)
    236 ).map(
   (...)
    242     num_parallel_calls=tf.data.AUTOTUNE,
    243 )
--> 245 dataset = sequences_from_indices(data, indices, start_index, end_index)
    246 if targets is not None:
    247     indices = tf.data.Dataset.zip(
    248         (tf.data.Dataset.range(len(start_positions)), positions_ds)
    249     ).map(
    250         lambda i, positions: positions[i],
    251         num_parallel_calls=tf.data.AUTOTUNE,
    252     )

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/keras/src/utils/timeseries_dataset.py:270, in sequences_from_indices(array, indices_ds, start_index, end_index)
    269 def sequences_from_indices(array, indices_ds, start_index, end_index):
--> 270     dataset = tf.data.Dataset.from_tensors(array[start_index:end_index])
    271     dataset = tf.data.Dataset.zip((dataset.repeat(), indices_ds)).map(
    272         lambda steps, inds: tf.gather(steps, inds),
    273         num_parallel_calls=tf.data.AUTOTUNE,
    274     )
    275     return dataset

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py:746, in DatasetV2.from_tensors(tensors, name)
    742 # Loaded lazily due to a circular dependency (dataset_ops ->
    743 # from_tensors_op -> dataset_ops).
    744 # pylint: disable=g-import-not-at-top,protected-access
    745 from tensorflow.python.data.ops import from_tensors_op
--> 746 return from_tensors_op._from_tensors(tensors, name)

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/data/ops/from_tensors_op.py:23, in _from_tensors(tensors, name)
     22 def _from_tensors(tensors, name):  # pylint: disable=unused-private-name
---> 23   return _TensorDataset(tensors, name)

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/data/ops/from_tensors_op.py:31, in _TensorDataset.__init__(self, element, name)
     29 def __init__(self, element, name=None):
     30   """See `tf.data.Dataset.from_tensors` for details."""
---> 31   element = structure.normalize_element(element)
     32   self._structure = structure.type_spec_from_value(element)
     33   self._tensors = structure.to_tensor_list(self._structure, element)

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/data/util/structure.py:133, in normalize_element(element, element_signature)
    130       else:
    131         dtype = getattr(spec, "dtype", None)
    132         normalized_components.append(
--> 133             ops.convert_to_tensor(t, name="component_%d" % i, dtype=dtype))
    134 return nest.pack_sequence_as(pack_as, normalized_components)

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py:183, in trace_wrapper.<locals>.inner_wrapper.<locals>.wrapped(*args, **kwargs)
    181   with Trace(trace_name, **trace_kwargs):
    182     return func(*args, **kwargs)
--> 183 return func(*args, **kwargs)

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1443, in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
   1441 # TODO(b/142518781): Fix all call-sites and remove redundant arg
   1442 preferred_dtype = preferred_dtype or dtype_hint
-> 1443 return tensor_conversion_registry.convert(
   1444     value, dtype, name, as_ref, preferred_dtype, accepted_result_types
   1445 )

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/framework/tensor_conversion_registry.py:234, in convert(value, dtype, name, as_ref, preferred_dtype, accepted_result_types)
    225       raise RuntimeError(
    226           _add_error_prefix(
    227               f"Conversion function {conversion_func!r} for type "
   (...)
    230               f"actual = {ret.dtype.base_dtype.name}",
    231               name=name))
    233 if ret is None:
--> 234   ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    236 if ret is NotImplemented:
    237   continue

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:324, in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    321 def _constant_tensor_conversion_function(v, dtype=None, name=None,
    322                                          as_ref=False):
    323   _ = as_ref
--> 324   return constant(v, dtype=dtype, name=name)

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:263, in constant(value, dtype, shape, name)
    166 @tf_export("constant", v1=[])
    167 def constant(value, dtype=None, shape=None, name="Const"):
    168   """Creates a constant tensor from a tensor-like object.
    169 
    170   Note: All eager `tf.Tensor` values are immutable (in contrast to
   (...)
    261     ValueError: if called on a symbolic tensor.
    262   """
--> 263   return _constant_impl(value, dtype, shape, name, verify_shape=False,
    264                         allow_broadcast=True)

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:275, in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    273     with trace.Trace("tf.constant"):
    274       return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 275   return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    277 const_tensor = ops._create_graph_constant(  # pylint: disable=protected-access
    278     value, dtype, shape, name, verify_shape, allow_broadcast
    279 )
    280 return const_tensor

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:285, in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    283 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
    284   """Creates a constant on the current device."""
--> 285   t = convert_to_eager_tensor(value, ctx, dtype)
    286   if shape is None:
    287     return t

File ~/anaconda3/envs/mlp/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:98, in convert_to_eager_tensor(value, ctx, dtype)
     96     dtype = dtypes.as_dtype(dtype).as_datatype_enum
     97 ctx.ensure_initialized()
---> 98 return ops.EagerTensor(value, ctx.device_name, dtype)

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

Expected behavior
The datasets should be created without error, since they are the basis for the code in cell 45:

# extra code – compiles, fits, and evaluates the model, like earlier
fit_and_evaluate(mulvar_model, train_mulvar_ds, valid_mulvar_ds,
                 learning_rate=0.05)

Screenshots
There is no screenshot.

Versions (please complete the following information):

  • OS: [MacOSX 14.1.2]
  • Python: [3.8.18]
  • TensorFlow: [2.13.0]
  • Scikit-Learn: [1.3.1]

Additional context
The following sections build on each other, so I also get errors in the subsequent blocks.

@michabuehlmann michabuehlmann changed the title Error in chapter 15 (Forecasting Multivariate Time Series) [QUESTION] Error in chapter 15 (Forecasting Multivariate Time Series) Dec 11, 2023
Owner

ageron commented Jan 31, 2024

Thanks for your feedback. I've just tried running this code on Colab, and it ran smoothly.

I'm not sure why it's failing for you. Could you please check which versions of Pandas, TensorFlow, and NumPy you're using? Ideally they should match those on Colab, which are currently Pandas 1.5.3, TensorFlow 2.15.0, NumPy 1.23.5. Please let me know if this helps.
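
One way to collect the versions asked about here, without importing TensorFlow itself, is to read the installed package metadata. This is only a minimal sketch: the helper name `installed_versions` is made up for illustration and is not part of the book's notebooks.

```python
from importlib import metadata


def installed_versions(packages=("pandas", "tensorflow", "numpy")):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The lookup uses the distribution name, which can differ from the import name (for example, some macOS setups install TensorFlow as `tensorflow-macos`), so a `None` result does not necessarily mean the library is missing.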

Author

michabuehlmann commented Feb 1, 2024 via email


mario-ct commented Mar 14, 2024

I was trying out Chapter 15 just now and encountered the same issue. After a bit of bug hunting, I found that the culprit is pandas at the current latest version, 2.1.4. If you print the first argument of the timeseries_dataset_from_array function, i.e. print(mulvar_train.to_numpy()), with pandas 2.1.4, you'll get this:

[[0.303321 0.319835 True False False]
 [0.448859 0.365509 False True False]
 [0.34054 0.287661 False False True]
 ...
 [0.394088 0.307105 False True False]
 [0.31455 0.26531 False False True]
 [0.463165 0.386058 False True False]]

However, if you print it using pandas 1.5.3 (currently used in Colab), you'll get:

[[0.303321 0.319835 1.       0.       0.      ]
 [0.448859 0.365509 0.       1.       0.      ]
 [0.34054  0.287661 0.       0.       1.      ]
 ...
 [0.394088 0.307105 0.       1.       0.      ]
 [0.31455  0.26531  0.       0.       1.      ]
 [0.463165 0.386058 0.       1.       0.      ]]

So I guess that the timeseries function won't work with boolean values.
I think the boolean values come from the pandas API call used for one-hot encoding, pd.get_dummies(df_mulvar).

Run pip install pandas==1.5.3 and the code will run with no problem.
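
The printed arrays above suggest that the float columns and the boolean dummy columns end up in a single object-dtype NumPy array, which is consistent with the "Unsupported object type float" message. The sketch below reproduces that on a toy DataFrame; the column names and values are made up, and `dtype=bool` is passed explicitly so that the pandas 2 default is mimicked on any pandas version.

```python
import pandas as pd

# Toy stand-in for the chapter's data (hypothetical columns and values).
df = pd.DataFrame({
    "bus": [0.303321, 0.448859, 0.340540],
    "rail": [0.319835, 0.365509, 0.287661],
    "day_type": ["A", "U", "W"],
})

# One-hot encode the text column into boolean columns, as pandas 2 does by default.
df_bool = pd.get_dummies(df, dtype=bool)
print(df_bool.dtypes)  # two float64 columns followed by three bool columns

# float64 and bool columns have no common numeric dtype in pandas,
# so the combined array falls back to Python objects.
arr = df_bool.to_numpy()
print(arr.dtype)  # object
```

Checking `mulvar_train.to_numpy().dtype` in the notebook is a quick way to confirm whether the same thing is happening there.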


tlac980 commented Mar 19, 2024

As @mario-ct explained, this is due to a change of behavior in the pd.get_dummies() function with Pandas 2. The default data type for the newly created columns (for the one-hot encoding) is Boolean, which the tf.keras.utils.timeseries_dataset_from_array() function does not like.

You can fix this by specifying np.float32 as the data type of the new columns when creating the one-hot encoding for the df_mulvar DataFrame:
df_mulvar = pd.get_dummies(df_mulvar, dtype=np.float32)
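
To see the effect of this fix in isolation, here is a small sketch on a toy DataFrame (the column names and values are hypothetical, not the book's data). It also shows an alternative that casts a frame that was already encoded, which is an assumption about how one might patch existing code rather than something suggested in the book.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_mulvar (hypothetical columns and values).
df = pd.DataFrame({
    "bus": [0.303321, 0.448859, 0.340540],
    "rail": [0.319835, 0.365509, 0.287661],
    "day_type": ["A", "U", "W"],
})

# The fix from this comment: request float dummy columns up front.
df_float = pd.get_dummies(df, dtype=np.float32)
arr = df_float.to_numpy()
print(arr.dtype)  # float64: float64 columns combined with float32 dummies

# Alternative: cast every column of an already encoded (boolean) frame.
df_cast = pd.get_dummies(df, dtype=bool).astype(np.float32)
print(df_cast.to_numpy().dtype)  # float32
```

Either way the resulting array is purely numeric, which is the form the earlier pandas 1.5.3 output had.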

@michabuehlmann
Author

Thanks a lot for your solution, tlac980, especially since I don't have to install an old version of the pandas library. I tested it and it also worked on my machine.
