tensorflow_datasets v4.9.4 introduces bug that prevents loading datasets #5203

Open
kpertsch opened this issue Dec 21, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@kpertsch

Short description
After upgrading to the most recent release, tensorflow_datasets==4.9.4, I get errors when loading datasets (from the official TFDS catalogue). I have verified that the same datasets load without problems in version 4.9.3.

Environment information

Reproduction instructions

import tensorflow_datasets as tfds
ds = tfds.load("fractal20220817_data", data_dir="gs://gresearch/robotics")

Or in Colab: https://colab.research.google.com/drive/1neCJ3_TnF1tqr8qv4FM5__-v4SwVOOxJ?usp=sharing

Link to logs

FileNotFoundError                         Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_info.py in read_from_json(path)
   1033   try:
-> 1034     json_str = epath.Path(path).read_text()
   1035   except OSError as e:

27 frames
FileNotFoundError: [Errno 2] No such file or directory: 'fractal20220817_data/0.1.0/dataset_info.json'

The above exception was the direct cause of the following exception:

FileNotFoundError                         Traceback (most recent call last)
FileNotFoundError: Could not load dataset info from fractal20220817_data/0.1.0/dataset_info.json

The above exception was the direct cause of the following exception:

FileNotFoundError                         Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/utils/py_utils.py in reraise(e, prefix, suffix)
    383     else:
    384       exception = RuntimeError(f'{type(e).__name__}: {msg}')
--> 385     raise exception from e
    386   # Otherwise, modify the exception in-place
    387   elif len(e.args) <= 1:

FileNotFoundError: Failed to construct dataset "fractal20220817_data", builder_kwargs "{'data_dir': 'gs://gresearch/robotics'}": Could not load dataset info from fractal20220817_data/0.1.0/dataset_info.json

Additional context
Interestingly, constructing a builder with tfds.builder_from_directory still seems to work, even in the most recent TFDS version:
builder = tfds.builder_from_directory("gs://gresearch/robotics/fractal20220817_data/0.1.0")
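
For readers who want to go from that builder to an actual dataset, here is a minimal sketch. The helper names `versioned_dir` and `load_from_directory` are hypothetical (they are not part of TFDS), and the `train` split is an assumption about this dataset. The TFDS import is deferred so the path helper can be used on its own.

```python
def versioned_dir(data_dir: str, name: str, version: str) -> str:
    """Join data_dir, dataset name, and version into one directory path."""
    return "/".join([data_dir.rstrip("/"), name, version])


def load_from_directory(data_dir: str, name: str, version: str, split: str = "train"):
    """Hypothetical helper: build from the versioned directory, then load a split."""
    import tensorflow_datasets as tfds  # deferred: only needed when actually loading

    builder = tfds.builder_from_directory(versioned_dir(data_dir, name, version))
    return builder.as_dataset(split=split)


# Example call (needs network access and a TensorFlow build that can read gs://):
# ds = load_from_directory("gs://gresearch/robotics", "fractal20220817_data", "0.1.0")
```

This only wraps the call shown above; it bypasses the name-based lookup in `tfds.load` that fails in the traceback.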

@tomvdw
Collaborator

tomvdw commented Dec 22, 2023

Thanks for your detailed bug report!

This is caused by _GCS_BUCKET having been made empty in this commit: b78fc27

I'll contact the people who changed it, but with the holidays I don't know how quickly they'll respond.

In the meantime you can also load it by specifying the version:

ds = tfds.load("fractal20220817_data:0.1.0", data_dir="gs://gresearch/robotics")
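
If the same code has to run on both affected and unaffected TFDS versions, the version pin could be applied only when the plain name fails. The following is a rough sketch, not something confirmed by the maintainers: `load_with_version_fallback` is a hypothetical helper, and it assumes the failure surfaces as `FileNotFoundError`, as in the traceback above. The loader is passed in as an argument so the helper does not import TFDS itself.

```python
def load_with_version_fallback(load_fn, name: str, version: str, **kwargs):
    """Try the unpinned dataset name first; on FileNotFoundError retry as 'name:version'."""
    try:
        return load_fn(name, **kwargs)
    except FileNotFoundError:
        # Assumption: this is the error raised by the affected release.
        return load_fn(f"{name}:{version}", **kwargs)


# Example call (needs network access):
# import tensorflow_datasets as tfds
# ds = load_with_version_fallback(
#     tfds.load, "fractal20220817_data", "0.1.0", data_dir="gs://gresearch/robotics"
# )
```

Pinning the version unconditionally, as suggested above, is simpler; the fallback is only useful when the version should stay unpinned where that already works.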

@tomvdw
Collaborator

tomvdw commented Jan 12, 2024

A fix was submitted. Could you test whether it now works with tfds nightly?

@Ericodencoder

Hey, I am facing the same issue. I tried the recommended line in a Jupyter Notebook:
ds = tfds.load("fractal20220817_data:0.1.0", data_dir="gs://gresearch/robotics")
But it still does not work, and I get:
UnimplementedError: File system scheme 'gs' not implemented (file: 'gs://gresearch/robotics/fractal20220817_data/0.1.0/features.json')

The same line runs in Colab without raising errors.

I also have a basic question:
How can I download the dataset (for example, fractal20220817_data) to my local PC?

Thanks a lot!
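
One possible way to get a local copy, sketched under assumptions that nobody in this thread has confirmed: the `gsutil` CLI is installed, the bucket is readable from the local machine, and the `'gs' not implemented` error comes from the local TensorFlow installation lacking GCS filesystem support. The helper names below are hypothetical. The idea is to copy the versioned directory with `gsutil -m cp -r` and then open the local copy with `tfds.builder_from_directory`, which avoids reading `gs://` paths through TensorFlow.

```python
import os
import subprocess


def gsutil_copy_command(gcs_dir: str, local_parent: str) -> list:
    """Build the `gsutil -m cp -r` command that copies gcs_dir under local_parent."""
    return ["gsutil", "-m", "cp", "-r", gcs_dir, local_parent]


def local_version_dir(gcs_dir: str, local_parent: str) -> str:
    """Directory expected after the copy, assuming local_parent already exists."""
    return os.path.join(local_parent, gcs_dir.rstrip("/").split("/")[-1])


def download_and_build(gcs_dir: str, local_parent: str):
    """Hypothetical helper: copy the dataset with gsutil, then open the local copy."""
    import tensorflow_datasets as tfds  # deferred: only needed when actually loading

    os.makedirs(local_parent, exist_ok=True)  # ensure the copy lands in a subdirectory
    subprocess.run(gsutil_copy_command(gcs_dir, local_parent), check=True)
    return tfds.builder_from_directory(local_version_dir(gcs_dir, local_parent))


# Example call (downloads the full dataset directory):
# builder = download_and_build(
#     "gs://gresearch/robotics/fractal20220817_data/0.1.0", "fractal20220817_data"
# )
# ds = builder.as_dataset(split="train")  # "train" is an assumed split name
```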

@Rahulraj0308

@tomvdw, or can we continue using the tfds.builder_from_directory workaround to load datasets from the specified directory?
