WikiText 103 evaluation #246

Closed
karpathy opened this issue Apr 24, 2024 · 1 comment

Labels
good first issue (Good for newcomers)

Comments

@karpathy
Owner

I've seen some repos use WikiText-103 as the dataset for evaluating GPT-like models, e.g.:

https://github.com/tysam-code/hlb-gpt/tree/main

Add a prepro script to download, preprocess, and tokenize WikiText-103, just like tiny shakespeare / tiny stories, following this repo. Adapt the mainline training script train_gpt2.cu to report validation performance on this set.
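
A minimal sketch of what such a prepro script could look like, assuming the data is pulled from the HuggingFace `wikitext` / `wikitext-103-raw-v1` hub entry, tokenized with GPT-2 BPE via tiktoken, and dumped as a raw uint16 token stream (filenames and the absence of a header are my assumptions here; the real script would need to match the .bin format the repo's data loader expects):

```python
# Sketch only: download, tokenize, and dump WikiText-103 as uint16 token streams.
# Assumptions: HuggingFace `datasets` hub entry for wikitext, tiktoken GPT-2 BPE,
# headerless .bin output (adjust to the repo's actual .bin format).
import numpy as np
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")
eot = enc.eot_token  # <|endoftext|>, id 50256, fits in uint16

def tokenize_split(split):
    ds = load_dataset("wikitext", "wikitext-103-raw-v1", split=split)
    # single <|endoftext|> at the start of the stream; whether/where to insert
    # more delimiters (e.g. per article heading) is an open choice
    tokens = [eot]
    for row in ds:
        text = row["text"]
        if text.strip():  # skip the many blank lines in the raw dump
            tokens.extend(enc.encode_ordinary(text))
    return np.array(tokens, dtype=np.uint16)

def write_bin(filename, tokens):
    # plain uint16 dump, no header; the llm.c data loader may expect a header,
    # so this part would need to be adapted
    with open(filename, "wb") as f:
        f.write(tokens.tobytes())

if __name__ == "__main__":
    for split, fname in [("train", "wikitext103_train.bin"),
                         ("validation", "wikitext103_val.bin")]:
        toks = tokenize_split(split)
        print(f"{split}: {len(toks):,} tokens")
        write_bin(fname, toks)
```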

Add Python code that does the same: evaluate on WikiText-103 and report performance for all the GPT-2 model sizes. This is our baseline to reach when training from scratch init.
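
A minimal sketch of that Python-side baseline, assuming the pretrained HuggingFace transformers GPT-2 checkpoints and a sliding-window perplexity over the raw validation split (the window length, stride, and joining of the text are arbitrary choices here, and published WikiText numbers are sensitive to exactly these details):

```python
# Sketch only: WikiText-103 validation perplexity for the pretrained GPT-2
# checkpoints, using a sliding-window evaluation via HuggingFace transformers.
import math
import torch
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def wikitext_ppl(model_name, device="cuda", ctx_len=1024, stride=512):
    tok = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name).to(device).eval()
    ds = load_dataset("wikitext", "wikitext-103-raw-v1", split="validation")
    ids = tok("\n\n".join(ds["text"]), return_tensors="pt").input_ids.to(device)

    nll_sum, n_tokens, prev_end = 0.0, 0, 0
    for begin in range(0, ids.size(1), stride):
        end = min(begin + ctx_len, ids.size(1))
        trg_len = end - prev_end  # only score tokens not already scored
        input_ids = ids[:, begin:end]
        target_ids = input_ids.clone()
        target_ids[:, :-trg_len] = -100  # mask the overlapping context
        with torch.no_grad():
            loss = model(input_ids, labels=target_ids).loss
        nll_sum += loss.item() * trg_len
        n_tokens += trg_len
        prev_end = end
        if end == ids.size(1):
            break
    return math.exp(nll_sum / n_tokens)

if __name__ == "__main__":
    for name in ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]:
        print(f"{name}: ppl {wikitext_ppl(name):.2f}")
```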

Optionally, help research other ways that people have evaluated GPT-2 models or attempted to reproduce them in the past.

@karpathy
Owner, Author

We are abandoning WikiText-103 because it's a total mess. We'll instead look at one or a few of ARC Easy / Challenge, SQuAD, HellaSwag, TriviaQA, LAMBADA. Closing.
