Cannot load EMNIST dataset #5356

Closed
davidshen84 opened this issue Apr 7, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@davidshen84


Short description
Loading the emnist dataset with tfds.load fails with a NonMatchingChecksumError.

Environment information

  • Operating System: Linux

  • Python version: 3.9

  • tensorflow-datasets/tfds-nightly version: 4.9.4

  • tensorflow/tf-nightly version: 12.6.1

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)?

Reproduction instructions

import tensorflow_datasets as tfds

tfds.load("emnist", split=["train"])


Link to logs

Expected behavior
The emnist dataset is loaded successfully.

Additional context

NonMatchingChecksumError: Artifact https://www.itl.nist.gov/iaui/vip/cs_links/EMNIST/gzip.zip, downloaded to /root/tensorflow_datasets/downloads/itl.nist.gov_iaui_vip_cs_links_EMNIST_gzipi4VnNviDSrfd9Zju6qv40flc3wr22t8ldulNStS6tmk.zip.tmp.8cdbd18c3c7144529f0a2a11d1829c60/itl, has wrong checksum:
* Expected: UrlInfo(size=535.73 MiB, checksum='fb9bb67e33772a9cc0b895e4ecf36d2cf35be8b709693c3564cea2a019fcda8e', filename='gzip.zip')
* Got: UrlInfo(size=110.12 KiB, checksum='bfd529724d06f22872f32d6649561a57fd25ec17ea51d6f2ad24b96ea0519c34', filename='itl')
To debug, see: https://www.tensorflow.org/datasets/overview#fixing_nonmatchingchecksumerror
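For anyone who wants to check a downloaded file by hand, the comparison in the error above can be approximated with the standard library. This is only a sketch: it assumes the checksum reported by TFDS is a SHA-256 hex digest (the value is 64 hex characters long), and the helper names `sha256_of_file` and `matches_expected` are made up for illustration, not part of tensorflow-datasets.

```python
import hashlib

# Expected checksum copied from the error message above.
EXPECTED_SHA256 = "fb9bb67e33772a9cc0b895e4ecf36d2cf35be8b709693c3564cea2a019fcda8e"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so a ~500 MiB archive is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Hypothetical helper: True if the file's digest equals the expected one."""
    return sha256_of_file(path) == expected
```

A 110 KiB file where a 535 MiB archive was expected, as reported above, would fail this check, which is consistent with an HTML page having been downloaded instead of gzip.zip.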

I tried to download the file directly using the link https://www.itl.nist.gov/iaui/vip/cs_links/EMNIST/gzip.zip, but I got redirected to the NIST homepage. I think the link is outdated.

@davidshen84 davidshen84 added the bug Something isn't working label Apr 7, 2024
@marcenacp
Collaborator

@davidshen84 Well spotted, thanks for opening the issue! It seems the URL (https://www.itl.nist.gov/iaui/vip/cs_links/EMNIST/gzip.zip) now redirects to https://www.nist.gov/itl, which causes the problem.

Did you find the actual link?
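The redirect described above can be confirmed without a browser. The following is a rough sketch using only the standard library; `resolve_final_url` and `redirected_elsewhere` are hypothetical helper names, and the sketch assumes the server answers HEAD requests (some servers do not, in which case a GET would be needed).

```python
import urllib.request
from urllib.parse import urlsplit


def resolve_final_url(url, timeout=30):
    """Follow redirects with a HEAD request and return the URL finally reached.

    Needs network access; urlopen follows HTTP redirects by default.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def redirected_elsewhere(requested, final):
    """True if the final URL points at a different host or path than requested."""
    a, b = urlsplit(requested), urlsplit(final)
    return (a.netloc, a.path) != (b.netloc, b.path)
```

For example, `redirected_elsewhere(url, resolve_final_url(url))` with the old EMNIST URL would be expected to return True as long as the redirect reported in this thread is still in place.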

@davidshen84
Author

davidshen84 commented Apr 11, 2024 via email

@davidshen84
Author

davidshen84 commented Apr 12, 2024 via email

@davidshen84
Author

This should be the new emnist dataset URL: https://biometrics.nist.gov/cs_links/EMNIST/gzip.zip

@minchan0410

Did you fix the error? If you have solved it, can you tell me how? I also get the same error :(

@davidshen84
Author

davidshen84 commented May 7, 2024 via email

@ccl-core
Collaborator

ccl-core commented May 8, 2024

Hello, #5401, which should solve the issue, is now merged!
The change will be available in tfds-nightly starting tomorrow.
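To see which TFDS package is actually installed after running `pip install --upgrade tfds-nightly`, something like the sketch below may help. It relies only on `importlib.metadata` from the standard library; `installed_version` is a hypothetical helper name, and the sketch does not know which nightly build first contains the fix.

```python
from importlib import metadata


def installed_version(*names):
    """Return (name, version) for the first installed distribution, else None."""
    for name in names:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    # Check the nightly package first, then the stable one.
    print(installed_version("tfds-nightly", "tensorflow-datasets"))
```

Once a build released after the merge is installed, the original reproduction, `tfds.load("emnist", split=["train"])`, can be retried.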

@ccl-core ccl-core closed this as completed May 8, 2024
@minchan0410

Thank you both for letting us know. It was helpful!! :)
