Can I use a real dataset for GPT2-gemini training? #2649

Hi, of course you can use a real dataset for training. The code under the example directory is there to demonstrate how to use Gemini in your applications. Because data loading usually depends on each user's own setup, we use dummy data there.

You can refer to the following instructions to prepare a Webtext dataset:
https://github.com/hpcaitech/ColossalAI-Examples/blob/main/language/gpt/README.md#how-to-prepare-webtext-dataset
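
As a rough illustration of what swapping the dummy data for real text involves, here is a minimal sketch in plain Python. It assumes the training loop consumes batches of fixed-length `input_ids` with a matching `attention_mask`; that shape is an assumption here, so check it against the example script you are using. The whitespace tokenizer and the helper names (`build_vocab`, `pack_sequences`, `iter_batches`) are hypothetical stand-ins, not part of ColossalAI; in practice you would likely use a GPT-2 tokenizer and convert the lists to tensors.

```python
from typing import Dict, Iterator, List


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Toy whitespace vocabulary (hypothetical stand-in for a real GPT-2 tokenizer)."""
    vocab = {"<eos>": 0}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def pack_sequences(texts: List[str], vocab: Dict[str, int], seq_len: int) -> List[List[int]]:
    """Concatenate all documents into one token stream and cut it into seq_len blocks."""
    stream: List[int] = []
    for text in texts:
        stream.extend(vocab[word] for word in text.split())
        stream.append(vocab["<eos>"])  # mark the document boundary
    n_blocks = len(stream) // seq_len  # the incomplete tail is dropped
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_blocks)]


def iter_batches(blocks: List[List[int]], batch_size: int) -> Iterator[Dict[str, List[List[int]]]]:
    """Yield full batches; every position is a real token, so the mask is all ones."""
    for start in range(0, len(blocks) - batch_size + 1, batch_size):
        input_ids = blocks[start:start + batch_size]
        attention_mask = [[1] * len(row) for row in input_ids]
        yield {"input_ids": input_ids, "attention_mask": attention_mask}


# Tiny in-memory corpus; a real run would read the prepared Webtext file instead.
texts = ["the quick brown fox", "jumps over the lazy dog"]
vocab = build_vocab(texts)
blocks = pack_sequences(texts, vocab, seq_len=4)
batches = list(iter_batches(blocks, batch_size=2))
```

Packing documents into one stream avoids padding, which is why the attention mask can stay all ones, just as it would for randomly generated dummy tokens.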

Answer selected by yurishin929