my own style model #2

Open
dickkky opened this issue May 15, 2024 · 3 comments

Comments

@dickkky

dickkky commented May 15, 2024

How do I train with my own text to get my own style model?

@annacomnena

Same question. I'd also like to know how this dataset was constructed.

@Tirpitz-z

Same question.

@stylellm
Owner

Here are the steps used to build the models released in this repository. Anyone who wants to train a model for a different style can use them as a reference:

  1. Prepare parallel dataset
    • Obtain the text of the novel whose style you want to learn.
    • Segment the full text of the novel into sentences.
    • Use an LLM such as GPT to alter the style of each sentence. Given the prompt "change the style of this sentence. \ninput: … \noutput:", you will receive a new sentence with a distinct style but the same meaning (a minimal data-generation sketch follows this list).
    • Pair the new sentence with the original novel sentence, thereby creating a parallel dataset.
  2. SFT
    • Train the SFT model on the parallel data, with the new sentence as input and the original sentence as output (see the SFT sketch after this list).
  3. RM
    • Sample multiple SFT model outputs using different checkpoints and temperatures (see the sampling sketch after this list).
    • Rank the outputs to create a preference dataset.
    • Train the RM model using the preference dataset.
  4. PPO
    • Train the final PPO model using the SFT and RM models.
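
Below is a minimal sketch of step 1 (building the parallel dataset), assuming the OpenAI Python SDK, a plain-text file with one novel sentence per line, and placeholder model and file names; none of these reflect what was actually used for the released models:

```python
# Step 1 sketch: restyle each novel sentence with an LLM to build parallel pairs.
# "gpt-4o-mini", "novel_sentences.txt" and "parallel.jsonl" are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def restyle(sentence: str) -> str:
    """Ask the LLM to rewrite one sentence in a different (plain) style."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"change the style of this sentence. \ninput: {sentence} \noutput:",
        }],
        temperature=0.7,
    )
    return resp.choices[0].message.content.strip()

with open("novel_sentences.txt", encoding="utf-8") as src, \
     open("parallel.jsonl", "w", encoding="utf-8") as out:
    for line in src:
        original = line.strip()
        if not original:
            continue
        rewritten = restyle(original)
        # SFT direction: input = restyled sentence, output = original styled sentence.
        out.write(json.dumps({"input": rewritten, "output": original},
                             ensure_ascii=False) + "\n")
```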
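
For step 2, one way to run the SFT stage is a standard causal-LM fine-tune with Hugging Face transformers; the base model, prompt template, and hyperparameters below are assumptions, not the settings behind the released models:

```python
# Step 2 sketch: supervised fine-tuning on the parallel pairs
# (restyled sentence -> original styled sentence).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE_MODEL = "Qwen/Qwen2-1.5B"  # placeholder base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

dataset = load_dataset("json", data_files="parallel.jsonl", split="train")

def to_text(example):
    # Same simple prompt template as in the data-generation sketch.
    return {"text": f"input: {example['input']}\noutput: {example['output']}{tokenizer.eos_token}"}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

tokenized = dataset.map(to_text).map(
    tokenize, remove_columns=dataset.column_names + ["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-style", num_train_epochs=3,
                           per_device_train_batch_size=8, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```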
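
For step 3, the sampling sketch below only covers generating the candidate outputs that later get ranked into the preference dataset; the checkpoint paths, temperatures, and file names are assumptions that mirror the SFT sketch above:

```python
# Step 3 sketch: sample several candidate outputs per input from different
# SFT checkpoints and temperatures; the ranked results become the preference data.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = ["sft-style/checkpoint-1000", "sft-style/checkpoint-2000"]  # placeholders
TEMPERATURES = [0.7, 1.0]

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINTS[0])

def sample_output(model, prompt, temperature):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, do_sample=True, temperature=temperature,
                             max_new_tokens=128)
    # Keep only the newly generated continuation, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

with open("parallel.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

candidates = {r["input"]: [] for r in records}
for ckpt in CHECKPOINTS:
    model = AutoModelForCausalLM.from_pretrained(ckpt)
    for r in records:
        prompt = f"input: {r['input']}\noutput: "
        for t in TEMPERATURES:
            candidates[r["input"]].append(sample_output(model, prompt, t))

# Rank the candidates for each input (manually or with an LLM judge) into
# chosen/rejected pairs before training the reward model.
with open("candidates.json", "w", encoding="utf-8") as out:
    json.dump(candidates, out, ensure_ascii=False, indent=2)
```

From there, the reward model of step 3 and the PPO training of step 4 can be run with an RLHF library; Hugging Face TRL, for example, provides RewardTrainer and PPOTrainer for exactly these two stages.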

The parallel data and preference dataset mentioned above will also be published in the future for reference.
