WikiText 103 evaluation #246

Closed
karpathy opened this issue Apr 24, 2024 · 1 comment

Labels
good first issue (Good for newcomers)

Comments

@karpathy
Owner

I've seen some repos use WikiText-103 as the dataset for evaluating GPT-like models, e.g.:

https://github.com/tysam-code/hlb-gpt/tree/main

Add a prepro script to download, preprocess, and tokenize WikiText-103, just like tiny shakespeare / tiny stories, following this repo. Adapt the mainline training script train_gpt2.cu to report validation performance on this set.
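
A minimal sketch of what such a prepro script could look like, assuming the data is pulled from the HuggingFace `wikitext` / `wikitext-103-raw-v1` hub entry, tokenized with GPT-2 BPE via tiktoken, and dumped as a raw uint16 token stream (filenames and the absence of a header are my assumptions here; the real script would need to match the .bin format the repo's data loader expects):

```python
# Sketch only: download, tokenize, and dump WikiText-103 as uint16 token streams.
# Assumptions: HuggingFace `datasets` hub entry for wikitext, tiktoken GPT-2 BPE,
# headerless .bin output (adjust to the repo's actual .bin format).
import numpy as np
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")
eot = enc.eot_token  # <|endoftext|>, id 50256, fits in uint16

def tokenize_split(split):
    ds = load_dataset("wikitext", "wikitext-103-raw-v1", split=split)
    # single <|endoftext|> at the start of the stream; whether/where to insert
    # more delimiters (e.g. per article heading) is an open choice
    tokens = [eot]
    for row in ds:
        text = row["text"]
        if text.strip():  # skip the many blank lines in the raw dump
            tokens.extend(enc.encode_ordinary(text))
    return np.array(tokens, dtype=np.uint16)

def write_bin(filename, tokens):
    # plain uint16 dump, no header; the llm.c data loader may expect a header,
    # so this part would need to be adapted
    with open(filename, "wb") as f:
        f.write(tokens.tobytes())

if __name__ == "__main__":
    for split, fname in [("train", "wikitext103_train.bin"),
                         ("validation", "wikitext103_val.bin")]:
        toks = tokenize_split(split)
        print(f"{split}: {len(toks):,} tokens")
        write_bin(fname, toks)
```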

Add Python code that does the same: evaluate on WikiText-103 and report performance for all the GPT-2 model sizes. This is our baseline to reach when training from scratch init.
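
A minimal sketch of that Python-side baseline, assuming the pretrained HuggingFace transformers GPT-2 checkpoints and a sliding-window perplexity over the raw validation split (the window length, stride, and joining of the text are arbitrary choices here, and published WikiText numbers are sensitive to exactly these details):

```python
# Sketch only: WikiText-103 validation perplexity for the pretrained GPT-2
# checkpoints, using a sliding-window evaluation via HuggingFace transformers.
import math
import torch
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def wikitext_ppl(model_name, device="cuda", ctx_len=1024, stride=512):
    tok = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name).to(device).eval()
    ds = load_dataset("wikitext", "wikitext-103-raw-v1", split="validation")
    ids = tok("\n\n".join(ds["text"]), return_tensors="pt").input_ids.to(device)

    nll_sum, n_tokens, prev_end = 0.0, 0, 0
    for begin in range(0, ids.size(1), stride):
        end = min(begin + ctx_len, ids.size(1))
        trg_len = end - prev_end  # only score tokens not already scored
        input_ids = ids[:, begin:end]
        target_ids = input_ids.clone()
        target_ids[:, :-trg_len] = -100  # mask the overlapping context
        with torch.no_grad():
            loss = model(input_ids, labels=target_ids).loss
        nll_sum += loss.item() * trg_len
        n_tokens += trg_len
        prev_end = end
        if end == ids.size(1):
            break
    return math.exp(nll_sum / n_tokens)

if __name__ == "__main__":
    for name in ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]:
        print(f"{name}: ppl {wikitext_ppl(name):.2f}")
```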

Optionally, help research other ways that people have evaluated GPT-2 models or attempted to reproduce them in the past.

@karpathy
Owner, Author

We are abandoning WikiText-103 because it's a total mess. We'll instead look at one or a few of ARC Easy / Challenge, SQuAD, HellaSwag, TriviaQA, LAMBADA. Closing.
