
Unable to obtain rel-stackex because of hash mismatch #134

Open
rohitnayak opened this issue Mar 30, 2024 · 2 comments

@rohitnayak

I am getting this error while trying to download the rel-stackex dataset. I am following the examples in the repo README. rel-amazon does get downloaded fine.

>>> dataset = get_dataset(name="rel-stackex")
Downloading file 'rel-stackex/db.zip' from 'https://relbench.stanford.edu/staging_data/rel-stackex/db.zip' to '/root/.cache/relbench'.
100%|███████████████████████████████████████| 882M/882M [00:00<00:00, 3.58TB/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/relbench/datasets/__init__.py", line 18, in get_dataset
    return dataset_cls_dict[name](*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/relbench/datasets/stackex.py", line 26, in __init__
    super().__init__(process=process)
  File "/usr/local/lib/python3.10/dist-packages/relbench/data/dataset.py", line 67, in __init__
    db_path = _pooch.fetch(
  File "/usr/local/lib/python3.10/dist-packages/pooch/core.py", line 589, in fetch
    stream_download(
  File "/usr/local/lib/python3.10/dist-packages/pooch/core.py", line 808, in stream_download
    hash_matches(tmp, known_hash, strict=True, source=str(fname.name))
  File "/usr/local/lib/python3.10/dist-packages/pooch/hashes.py", line 176, in hash_matches
    raise ValueError(
ValueError: SHA256 hash of downloaded file (db.zip) does not match the known hash: expected dfb84faa4918c6c4ecac791a69a30a477a7bee097d7295d48c78ceb8f59c997c but got deb00ccdf825e569b34935834444429cd1c0074b50226b12d616aab22d36242d. Deleted download for safety. The downloaded file may have been corrupted or the known hash may be outdated.
@rohitnayak
Author

FYI: I was able to proceed by downloading it directly into the local cache directory.

Not sure what is causing it to fail using get_dataset(), since I didn't have to update the hardcoded hashes. I am on the ubuntu:latest Docker image running on a Mac, with Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux.
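
For anyone trying the same workaround, here is a minimal sketch of a manual download that still checks the SHA256 before the file is trusted. The helper names `sha256_of` and `download_and_verify` are hypothetical and not part of relbench or pooch; the sketch uses only the standard library.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_and_verify(url, dest, expected_sha256):
    """Download url to dest and raise ValueError if the SHA256 differs."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    actual = sha256_of(dest)
    if actual != expected_sha256:
        raise ValueError(f"expected {expected_sha256} but got {actual}")
    return dest
```

With the values from the traceback above, the URL would be https://relbench.stanford.edu/staging_data/rel-stackex/db.zip and the expected hash the one beginning dfb84faa. The destination is presumably /root/.cache/relbench/rel-stackex/db.zip, assuming pooch stores each file under its cache path joined with the registry key.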

@xcvil

xcvil commented May 1, 2024

You need to edit /PATH-TO-miniconda3/lib/python3.8/site-packages/relbench/__init__.py, change the corresponding hash in the pooch.create call, and then re-import the package.

import pooch

_pooch = pooch.create(
    path=pooch.os_cache("relbench"),
    base_url="https://relbench.stanford.edu/staging_data/",  # TODO: change
    registry={
        # extremely small dataset only used for testing download functionality
        "rel-amazon-fashion_5_core/db.zip": "27e08bc808438e8619560c54d0a4a7a11e965b90b8c70ef3a0928b44a46ad028",
        "rel-amazon-fashion_5_core/tasks/rel-amazon-churn.zip": "d98f2240aefa0f175dab2fce4a48a1cc595be584d4960cd9eb750d012326117d",
        "rel-amazon-fashion_5_core/tasks/rel-amazon-ltv.zip": "bd2b7b798efad2838a3701def8386dba816b45ef277a8e831052b79f5448aed8",
        "rel-stackex/db.zip": "dfb84faa4918c6c4ecac791a69a30a477a7bee097d7295d48c78ceb8f59c997c",
        "rel-stackex/tasks/rel-stackex-engage.zip": "9afce696507cf2f1a2655350a3d944fd411b007c05a389995fe7313084008d18",
        "rel-stackex/tasks/rel-stackex-votes.zip": "0dab5bebd76a95d689c8a3a62026c1c294a252c561fd940e8d9329d165d98a5a",
        "rel-amazon-books_5_core/db.zip": "2f6bd920bcfe08cbb7d47115f47f8d798a2ec1a034b6c2f3d8d9906e967454b4",
        "rel-amazon-books_5_core/tasks/rel-amazon-churn.zip": "d3890621b1576a9d5b6bc273cdd2ea2084aeaf9c8055c1421ded84be0c48dacb",
        "rel-amazon-books_5_core/tasks/rel-amazon-ltv.zip": "2e91be0ca5d9f591d8e33a40f70b97db346090a8bb9f3a94f49b147f0dc136be",
    },
)
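
As an alternative to editing the installed file, the same change can probably be made at runtime. This is a rough sketch that assumes the object above is reachable as `relbench._pooch` and that its `registry` attribute is a plain dict mapping file names to hashes; `override_registry_hash` is a hypothetical helper, not part of relbench or pooch.

```python
def override_registry_hash(pooch_obj, fname, new_hash):
    """Replace the known hash for fname in a pooch registry.

    Returns the previous hash so the change can be undone.
    """
    if fname not in pooch_obj.registry:
        raise KeyError(f"{fname!r} is not in the registry")
    old_hash = pooch_obj.registry[fname]
    pooch_obj.registry[fname] = new_hash
    return old_hash


# Hypothetical usage, assuming relbench exposes the Pooch instance as `_pooch`:
# import relbench
# override_registry_hash(
#     relbench._pooch,
#     "rel-stackex/db.zip",
#     "<sha256 of a copy you have verified yourself>",
# )
```

Note that a manual download reportedly worked without changing any hash, so the file on the server may still match the registered value. In that case, replacing the hash with the one from the error message would accept a possibly corrupted download, and it is safer to only use a hash taken from a copy you have checked yourself.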
