Tasks
One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)
Reproduction
The main issue comes from here. If `inputs` is originally not on the device but on the CPU, a RuntimeError is raised:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cpu!
import pandas as pd
import torch
import torch.distributed as dist
from accelerate import Accelerator, notebook_launcher
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig


def grade(inputs):
    distributed_state = Accelerator()
    model_path = "mistralai/Mistral-7B-Instruct-v0.2"
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, padding_side="left")
    tokenizer.pad_token = tokenizer.unk_token
    model = AutoModelForCausalLM.from_pretrained(model_path).eval().to(distributed_state.device)
    # The tokenized prompts stay on the CPU; this is what triggers the error below.
    prompts = tokenizer(
        list(inputs["prompt"]),
        add_special_tokens=True,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=756,
    ).data
    generation_config = GenerationConfig(
        max_new_tokens=5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    txt_outs = []
    batch_size = 10
    # RuntimeError is raised here when apply_padding=True and the tensors are still on the CPU.
    with distributed_state.split_between_processes(prompts, apply_padding=True) as inputs:
        for i in range(0, len(inputs["input_ids"]), batch_size):
            with torch.no_grad():
                res = model.generate(
                    input_ids=inputs["input_ids"][i : i + batch_size].to(distributed_state.device),
                    attention_mask=inputs["attention_mask"][i : i + batch_size].to(distributed_state.device),
                    generation_config=generation_config,
                )
            txt_out = tokenizer.batch_decode(res, skip_special_tokens=True, clean_up_tokenization_spaces=True)
            txt_out_across_devices = [None for _ in range(distributed_state.num_processes)]
            dist.gather_object(
                txt_out,
                txt_out_across_devices if distributed_state.is_main_process else None,
                dst=0,
            )
            if distributed_state.is_main_process:
                txt_outs.extend(txt_out_across_devices)


# 9 identical prompts over 8 processes, so split_between_processes needs apply_padding.
test_samples = pd.DataFrame({"prompt": ["Who am i?"] * 9})
notebook_launcher(grade, args=[test_samples], num_processes=8)
Expected behavior
No error, and the padding is applied successfully.
There is no good solution to this other than changing the source code. What I'm doing now is pre-padding the samples myself, which is also needed when you are running multi-process/multi-node inference: do the padding before calling split_between_processes.
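For reference, a minimal sketch of that workaround, reusing prompts and distributed_state from the reproduction script above (the helper name pad_to_multiple is hypothetical, not part of the library): repeat the last row of each tensor until the batch size is divisible by the number of processes, so each process gets an equal slice and apply_padding is no longer needed.

import torch
from accelerate import Accelerator

distributed_state = Accelerator()

def pad_to_multiple(batch, multiple):
    # Repeat the last row of each tensor until its length is divisible by `multiple`.
    size = batch["input_ids"].shape[0]
    remainder = (-size) % multiple
    if remainder == 0:
        return batch
    return {
        key: torch.cat([tensor, tensor[-1:].repeat(remainder, *([1] * (tensor.dim() - 1)))], dim=0)
        for key, tensor in batch.items()
    }

prompts = pad_to_multiple(prompts, distributed_state.num_processes)

# Every process now receives an equal slice, so apply_padding is unnecessary.
with distributed_state.split_between_processes(prompts) as inputs:
    ...  # run generation on `inputs` as in the reproduction script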