Tasks
One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)
Reproduction
The main issue comes from here. If `inputs` is originally not on the device but on the CPU, a RuntimeError is raised:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cpu!
import pandas as pd
import torch
import torch.distributed as dist
from accelerate import Accelerator, notebook_launcher
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig


def grade(inputs):
    distributed_state = Accelerator()
    model_path = "mistralai/Mistral-7B-Instruct-v0.2"
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, padding_side="left")
    tokenizer.pad_token = tokenizer.unk_token
    model = AutoModelForCausalLM.from_pretrained(model_path).eval().to(distributed_state.device)
    # The tokenized prompts stay on the CPU; this is what triggers the error below.
    prompts = tokenizer(
        list(inputs["prompt"]),
        add_special_tokens=True,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=756,
    ).data
    generation_config = GenerationConfig(
        max_new_tokens=5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    txt_outs = []
    batch_size = 10
    # RuntimeError is raised here when apply_padding=True and the tensors are still on the CPU.
    with distributed_state.split_between_processes(prompts, apply_padding=True) as inputs:
        for i in range(0, len(inputs["input_ids"]), batch_size):
            with torch.no_grad():
                res = model.generate(
                    input_ids=inputs["input_ids"][i : i + batch_size].to(distributed_state.device),
                    attention_mask=inputs["attention_mask"][i : i + batch_size].to(distributed_state.device),
                    generation_config=generation_config,
                )
            txt_out = tokenizer.batch_decode(res, skip_special_tokens=True, clean_up_tokenization_spaces=True)
            txt_out_across_devices = [None for _ in range(distributed_state.num_processes)]
            dist.gather_object(
                txt_out,
                txt_out_across_devices if distributed_state.is_main_process else None,
                dst=0,
            )
            if distributed_state.is_main_process:
                txt_outs.extend(txt_out_across_devices)


# 9 identical prompts over 8 processes, so split_between_processes needs apply_padding.
test_samples = pd.DataFrame({"prompt": ["Who am i?"] * 9})
notebook_launcher(grade, args=[test_samples], num_processes=8)
Expected behavior
No error, and the padding is applied successfully.
There is no good solution to this other than changing the source code. What I'm doing now is pre-padding the samples myself, which is also needed when you are running multi-process/multi-node inference: do the padding before calling split_between_processes.
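For reference, a minimal sketch of that workaround, reusing prompts and distributed_state from the reproduction script above (the helper name pad_to_multiple is hypothetical, not part of the library): repeat the last row of each tensor until the batch size is divisible by the number of processes, so each process gets an equal slice and apply_padding is no longer needed.

import torch
from accelerate import Accelerator

distributed_state = Accelerator()

def pad_to_multiple(batch, multiple):
    # Repeat the last row of each tensor until its length is divisible by `multiple`.
    size = batch["input_ids"].shape[0]
    remainder = (-size) % multiple
    if remainder == 0:
        return batch
    return {
        key: torch.cat([tensor, tensor[-1:].repeat(remainder, *([1] * (tensor.dim() - 1)))], dim=0)
        for key, tensor in batch.items()
    }

prompts = pad_to_multiple(prompts, distributed_state.num_processes)

# Every process now receives an equal slice, so apply_padding is unnecessary.
with distributed_state.split_between_processes(prompts) as inputs:
    ...  # run generation on `inputs` as in the reproduction script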