Issue training on multiple nodes #550

Open
edwardsp opened this issue Apr 18, 2024 · 2 comments
Labels
type/question An issue that's a question

Comments

@edwardsp

❓ The question

I am trying to run training, and I get this error when starting up:

HfHubHTTPError: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/datasets/glue/paths-info/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c
[2024-04-18 15:55:06] CRITICAL [olmo.util:158, rank=6] Uncaught HfHubHTTPError: 429 Client Error: Too Many Requests for url: https://huggingface.co/api/datasets/glue/paths-info/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c

I am running on 2 nodes, each with 8 GPUs, using the main branch and PyTorch 2.2.2+cu121.

This works with just 1 node using 8 GPUs.

edwardsp added the type/question label Apr 18, 2024
@xijiu9

xijiu9 commented Apr 18, 2024

I have exactly the same problem: 1 node works, but 2 nodes fail. I think this is a problem on the Hugging Face side.

@2015aroras
Contributor

2015aroras commented Apr 19, 2024

We run into issues like that too. We don't have a robust solution yet, but one workaround we use is to cache the datasets locally (once per node, or once per file system if several nodes share one) as follows, and then stop HF from calling the Hub by setting the environment variable HF_DATASETS_OFFLINE=1.

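from olmo.eval.downstream import label_to_task_map
from olmo.tokenizer import Tokenizer

# Run this once per node (or once per file system) while the Hub is reachable.
# Constructing each downstream task loads its dataset, which fills the local HF cache.
tokenizer = Tokenizer.from_file("tokenizers/allenai_gpt-neox-olmo-dolma-v1_5.json")
for task in label_to_task_map.values():
    kwargs = {}
    # Some entries are (task_class, kwargs) tuples rather than bare task classes.
    if isinstance(task, tuple):
        task, kwargs = task
    task(tokenizer=tokenizer, **kwargs)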
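The workaround above has two phases: warm the cache once per node (or per shared file system), then launch training with the Hub disabled. Below is a minimal sketch of the bookkeeping around those phases, assuming a torchrun-style launcher that sets LOCAL_RANK and RANK. `is_cache_leader` and `offline_env` are hypothetical helpers, not part of OLMo. Setting HF_DATASETS_OFFLINE in the launch environment rather than mid-run is a precaution, on the assumption that `datasets` reads the variable when it is imported.

```python
import os
from typing import Dict, Mapping


def is_cache_leader(env: Mapping[str, str], shared_fs: bool = False) -> bool:
    """Decide whether this process should run the cache warm-up step.

    Hypothetical helper; assumes a torchrun-style launcher that sets LOCAL_RANK and RANK.
    - One cache per node: the process with LOCAL_RANK == 0 on each node warms it.
    - One file system shared by all nodes: only global RANK == 0 warms it.
    Missing variables default to "0", i.e. a single-process run is the leader.
    """
    key = "RANK" if shared_fs else "LOCAL_RANK"
    return int(env.get(key, "0")) == 0


def offline_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `base` with HF_DATASETS_OFFLINE=1 for the training launch.

    Hypothetical helper. The variable goes into the child process environment so it
    is already set when `datasets` is imported; `base` itself is not modified.
    """
    env = dict(base)
    env["HF_DATASETS_OFFLINE"] = "1"
    return env


if __name__ == "__main__":
    # Example: the first process on the second node of a 2 x 8 GPU job.
    worker = {"LOCAL_RANK": "0", "RANK": "8"}
    print(is_cache_leader(worker))                     # prints True: warms this node's cache
    print(is_cache_leader(worker, shared_fs=True))     # prints False: global rank 0 does it
    print(offline_env(worker)["HF_DATASETS_OFFLINE"])  # prints 1
    # A real launcher might pass offline_env(os.environ) as `env=` to subprocess.run.
```

With per-node disks, the LOCAL_RANK check runs the caching loop once on each node; with a shared file system, `shared_fs=True` keeps several nodes from downloading into the same cache at once.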

3 participants