my own style model #2

Open
dickkky opened this issue May 15, 2024 · 3 comments

Comments

@dickkky

dickkky commented May 15, 2024

How do I train with my own text to get my own style model?

@annacomnena

Same question. I'd also like to know how this dataset was constructed.

@Tirpitz-z

Same question.

@stylellm
Owner

Here are the steps used to build the models released in this repository. Anyone who wants to train a model for a different style can use them as a reference:

  1. Prepare parallel dataset
    • Obtain the text of the novel whose style you want to learn.
    • Segment the full text of the novel into sentences.
    • Use an LLM such as GPT to alter the style of each sentence. Given the prompt "change the style of this sentence. \ninput: … \noutput:", you will receive a new sentence with a distinct style but the same meaning (a minimal data-generation sketch follows this list).
    • Pair the new sentence with the original novel sentence, thereby creating a parallel dataset.
  2. SFT
    • Train the SFT model on the parallel data, with the new sentence as input and the original sentence as output (see the SFT sketch after this list).
  3. RM
    • Sample multiple SFT model outputs using different checkpoints and temperatures (see the sampling sketch after this list).
    • Rank the outputs to create a preference dataset.
    • Train the RM model using the preference dataset.
  4. PPO
    • Train the final PPO model using the SFT and RM models.
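
Below is a minimal sketch of step 1 (building the parallel dataset), assuming the OpenAI Python SDK, a plain-text file with one novel sentence per line, and placeholder model and file names; none of these reflect what was actually used for the released models:

```python
# Step 1 sketch: restyle each novel sentence with an LLM to build parallel pairs.
# "gpt-4o-mini", "novel_sentences.txt" and "parallel.jsonl" are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def restyle(sentence: str) -> str:
    """Ask the LLM to rewrite one sentence in a different (plain) style."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"change the style of this sentence. \ninput: {sentence} \noutput:",
        }],
        temperature=0.7,
    )
    return resp.choices[0].message.content.strip()

with open("novel_sentences.txt", encoding="utf-8") as src, \
     open("parallel.jsonl", "w", encoding="utf-8") as out:
    for line in src:
        original = line.strip()
        if not original:
            continue
        rewritten = restyle(original)
        # SFT direction: input = restyled sentence, output = original styled sentence.
        out.write(json.dumps({"input": rewritten, "output": original},
                             ensure_ascii=False) + "\n")
```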
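
For step 2, one way to run the SFT stage is a standard causal-LM fine-tune with Hugging Face transformers; the base model, prompt template, and hyperparameters below are assumptions, not the settings behind the released models:

```python
# Step 2 sketch: supervised fine-tuning on the parallel pairs
# (restyled sentence -> original styled sentence).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE_MODEL = "Qwen/Qwen2-1.5B"  # placeholder base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

dataset = load_dataset("json", data_files="parallel.jsonl", split="train")

def to_text(example):
    # Same simple prompt template as in the data-generation sketch.
    return {"text": f"input: {example['input']}\noutput: {example['output']}{tokenizer.eos_token}"}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

tokenized = dataset.map(to_text).map(
    tokenize, remove_columns=dataset.column_names + ["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-style", num_train_epochs=3,
                           per_device_train_batch_size=8, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```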
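
For step 3, the sampling sketch below only covers generating the candidate outputs that later get ranked into the preference dataset; the checkpoint paths, temperatures, and file names are assumptions that mirror the SFT sketch above:

```python
# Step 3 sketch: sample several candidate outputs per input from different
# SFT checkpoints and temperatures; the ranked results become the preference data.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = ["sft-style/checkpoint-1000", "sft-style/checkpoint-2000"]  # placeholders
TEMPERATURES = [0.7, 1.0]

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINTS[0])

def sample_output(model, prompt, temperature):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, do_sample=True, temperature=temperature,
                             max_new_tokens=128)
    # Keep only the newly generated continuation, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

with open("parallel.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

candidates = {r["input"]: [] for r in records}
for ckpt in CHECKPOINTS:
    model = AutoModelForCausalLM.from_pretrained(ckpt)
    for r in records:
        prompt = f"input: {r['input']}\noutput: "
        for t in TEMPERATURES:
            candidates[r["input"]].append(sample_output(model, prompt, t))

# Rank the candidates for each input (manually or with an LLM judge) into
# chosen/rejected pairs before training the reward model.
with open("candidates.json", "w", encoding="utf-8") as out:
    json.dump(candidates, out, ensure_ascii=False, indent=2)
```

From there, the reward model of step 3 and the PPO training of step 4 can be run with an RLHF library; Hugging Face TRL, for example, provides RewardTrainer and PPOTrainer for exactly these two stages.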

The parallel data and preference dataset mentioned above will also be published in the future for reference.
