
GLDV2 - Google Landmarks Download Issue #3880

Open
sarapieri opened this issue Apr 29, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@sarapieri

GLDV2 Dataset

I am trying to use the load_data function to load the reduced 23k version of the GLDV2 dataset with the following code:

import tensorflow as tf
import tensorflow_federated as tff

gldv2_train, gldv2_test = tff.simulation.datasets.gldv2.load_data(gld23k=True)

Environment:

  • Linux
  • TensorFlow Federated: 0.56.0
  • TensorFlow: 2.12.0
  • Python version: 3.9
  • Not building it from source

Expected behavior
Download the data from landmarks-user-160k.zip

Error
raise ValueError(
ValueError: Incomplete or corrupted file detected. The md5 file hash does not match the provided value of 825975950b2e22f0f66aa8fd26c1f153 images_000.tar

@sarapieri sarapieri added the bug Something isn't working label Apr 29, 2023
@edvinasstaupas

edvinasstaupas commented Oct 22, 2023

I'm having what appears to be the same problem, using the notebook version from 2023/09/22.

gldv2_train, gldv2_test = tff.simulation.datasets.gldv2.load_data() produces:

Incomplete or corrupted file detected. The md5 file hash does not match the provided value of 825975950b2e22f0f66aa8fd26c1f153  images_000.tar

and

gldv2_train, gldv2_test = tff.simulation.datasets.gldv2.load_data(num_worker=5, cache_dir='/cache') produces:

[/usr/local/lib/python3.10/dist-packages/tensorflow_federated/python/simulation/datasets/gldv2.py](https://localhost:8080/#) in load_data(num_worker, cache_dir, gld23k, base_url)
    450     logger.info('Try loading dataset from cache')
--> 451     return vision_datasets_utils.load_data_from_cache(
    452         existing_data_cache, TRAIN_SUB_DIR, TEST_FILE_NAME, LOGGER

11 frames
NotFoundError: Could not find directory /cache/gld160k/train

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
[/usr/local/lib/python3.10/dist-packages/keras/src/utils/data_utils.py](https://localhost:8080/#) in get_file(fname, origin, untar, md5_hash, file_hash, cache_subdir, hash_algorithm, extract, archive_format, cache_dir)
    360         if os.path.exists(fpath) and file_hash is not None:
    361             if not validate_file(fpath, file_hash, algorithm=hash_algorithm):
--> 362                 raise ValueError(
    363                     "Incomplete or corrupted file detected. "
    364                     f"The {hash_algorithm} "

ValueError: Incomplete or corrupted file detected. The md5 file hash does not match the provided value of f62fb3b43041a320d474b6dfbc6696af  images_025.tar

@sarapieri have you found any way to fix this?
