How to generate outputs from the PPOTrainer of ChatGPT? #2906
Unanswered
huliangbing
asked this question in Community | Q&A
Replies: 2 comments 1 reply
-
Thanks for your feedback. We now support actor inference in our newly updated PR.
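For readers landing here later, a minimal sketch of what actor inference can look like once training has produced a checkpoint. The tiny model, the `actor.pt` path, and the raw-token-id prompt are all illustrative assumptions, not the project's actual checkpoint format or API:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# A tiny randomly initialized GPT-2 stands in for the trained actor.
config = GPT2Config(
    vocab_size=100, n_layer=2, n_head=2, n_embd=32,
    bos_token_id=0, eos_token_id=0,
)
model = GPT2LMHeadModel(config)
model.eval()

# Simulate restoring trained weights via a state-dict round trip;
# with a real checkpoint you would do something like:
#   model.load_state_dict(torch.load("actor.pt", map_location="cpu"))
model.load_state_dict(model.state_dict())

# Prompt as raw token ids; a real setup would use the matching tokenizer.
input_ids = torch.tensor([[1, 2, 3, 4]])
with torch.no_grad():
    outputs = model.generate(
        input_ids, max_new_tokens=8, do_sample=False, pad_token_id=0
    )
print(outputs.shape)  # (1, prompt length + up to 8 generated tokens)
```

Since the actor wraps an ordinary Hugging Face causal LM, the same `.generate()` call works once the trained weights are loaded into the underlying model.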
1 reply
-
How do I run inference with a reward model (RM) checkpoint such as rm_checkpoint.pt?
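A minimal sketch of the usual RM inference pattern: a transformer body plus a scalar value head, with the state dict restored from the checkpoint. The architecture below is a stand-in for illustration, not the repo's actual RewardModel class:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: embedding -> encoder -> scalar value head."""

    def __init__(self, vocab_size=100, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=2, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, input_ids):
        h = self.encoder(self.embed(input_ids))
        # Score the final token's hidden state as the sequence reward.
        return self.value_head(h[:, -1]).squeeze(-1)

model = RewardModel()
# With a real checkpoint you would restore the trained weights, e.g.:
#   model.load_state_dict(torch.load("rm_checkpoint.pt", map_location="cpu"))
model.eval()

input_ids = torch.tensor([[1, 2, 3, 4, 5]])  # tokenized prompt + response
with torch.no_grad():
    reward = model(input_ids)
print(reward.shape)  # one scalar reward per sequence in the batch
```

The key point is that an RM is scored, not sampled from: you run a forward pass over the full prompt-plus-response sequence and read off a scalar, rather than calling `.generate()`.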
0 replies
-
How do I generate outputs from the PPOTrainer of ChatGPT? Can we generate outputs from the reward_model or the initial_model?
Can you show me code like this:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
outputs = model.generate(**inputs, num_beams=5, num_beam_groups=5, max_new_tokens=30)
tokenizer.decode(outputs[0], skip_special_tokens=True)
Thanks!