
Dataset not found when I run "python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]" #45

Open
horacehht opened this issue Aug 1, 2023 · 5 comments


@horacehht

It seems that the file at "https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v2.tar" really doesn't exist. When I entered this URL in my browser, it also told me that the file doesn't exist.

14:43:55   Downloading https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v2.tar to /home/horace/scratch/protein-datasets/alphafold/UP000006548_3702_ARATH_v2.tar
Traceback (most recent call last):
  File "script/pretrain.py", line 50, in <module>
    dataset = core.Configurable.load_config_dict(cfg.dataset)
  File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/core/core.py", line 269, in load_config_dict
    return cls(**new_config)
  File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/core/core.py", line 288, in wrapper
    return init(self, *args, **kwargs)
  File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/datasets/alphafolddb.py", line 122, in __init__
    tar_file = utils.download(self.urls[species_id], path, md5=self.md5s[species_id])
  File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/utils/file.py", line 31, in download
    urlretrieve(url, save_file)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
@horacehht
Author

horacehht commented Aug 1, 2023

Oh, I found that the dataset on the website is now v4 instead of v2. If I use the v4 dataset instead, will it affect the experiments? Additionally, how do I use v4?
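Going by the observation above, the v4 archive appears to differ from the v2 one only in the version suffix of its filename. A minimal sketch, assuming that `_vN.tar` naming pattern holds for every species archive, that rewrites a v2 URL to its v4 counterpart (`to_version` is a hypothetical helper, not part of torchdrug):

```python
import re

# Matches the trailing "_v<digits>.tar" of an AlphaFold DB archive URL.
_VERSION_SUFFIX = re.compile(r"_v\d+\.tar$")


def to_version(url: str, version: str = "v4") -> str:
    """Return `url` with its "_vN.tar" suffix replaced by "_<version>.tar".

    Hypothetical helper; assumes the archive name ends in "_vN.tar".
    """
    if not _VERSION_SUFFIX.search(url):
        raise ValueError(f"no _vN.tar suffix in {url!r}")
    return _VERSION_SUFFIX.sub(f"_{version}.tar", url)


v2_url = "https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v2.tar"
print(to_version(v2_url))
# → https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v4.tar
```

Anchoring the pattern at the end of the string keeps it from touching anything else in the path.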

@Oxer11
Collaborator

Oxer11 commented Aug 1, 2023

I think it's okay to use v4 instead of v2. The pre-training dataset doesn't have a large effect on the final performance.

@horacehht
Author


I have downloaded the v4 dataset and put it in the correct directory. However, when I run "python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]", the program still starts downloading the v2 dataset. I don't know how to handle this.

@Oxer11
Collaborator

Oxer11 commented Aug 2, 2023

Sorry for the inconvenience! This happens because I set the default files to the v2 datasets instead of the v4 datasets. The easiest way to change this is to subclass the datasets.AlphaFoldDB class and override the urls and md5s attributes here. The class looks for downloaded files by the filenames in urls and verifies their md5 values.
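The check described above can be sketched with the standard library alone. This is a simplified illustration, assuming the check keys on the URL's basename plus an MD5 digest as the comment says; it is not torchdrug's actual code, and `archive_path`, `md5_of`, and `needs_download` are hypothetical names. It also shows how to compute the MD5 of a v4 archive you downloaded yourself, which is the value an overridden `md5s` entry would need:

```python
import hashlib
import os
from urllib.parse import urlparse


def archive_path(url: str, directory: str) -> str:
    """Where the archive is expected locally: <directory>/<basename of the URL path>."""
    return os.path.join(directory, os.path.basename(urlparse(url).path))


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large tar files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(url: str, directory: str, expected_md5: str) -> bool:
    """True if the file named after `url` is missing or its MD5 does not match."""
    path = archive_path(url, directory)
    return not os.path.exists(path) or md5_of(path) != expected_md5
```

With the default v2 entries, a v4 archive in the same directory is never matched because its filename differs, so the download is attempted anyway. In torchdrug, the override itself would be a subclass of `datasets.AlphaFoldDB` whose `urls` point at the `_v4.tar` files and whose `md5s` hold the matching digests (for example, computed with `md5_of` on each downloaded archive). Keep both lists in the original order: the traceback above shows that `species_id` indexes both `self.urls` and `self.md5s`.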

@Sajib-006

Sajib-006 commented Apr 3, 2024

I think this URL issue is resolved in the updated version (0.2.1). Installing the updated torchdrug fixed it:

pip install torchdrug==0.2.1


3 participants