Releases: minimaxir/gpt-2-simple

v0.8.1: TensorFlow 2 support

18 Oct 02:38

Thanks to https://github.com/YaleDHLab via #275, gpt-2-simple now supports TensorFlow 2 by default, and the minimum TensorFlow version is now 2.5.1! The Colab Notebook has also been updated to no longer use TensorFlow 1.X.

Note: Development on gpt-2-simple has mostly been superseded by aitextgen, which has similar AI text generation capabilities with more efficient training time and resource usage. If you do not need TensorFlow, I recommend using aitextgen instead. Checkpoints trained using gpt-2-simple can be loaded using aitextgen as well.

Fix model URL

14 Feb 21:13
  • Switched the model URL from GCP to Azure. (#253)
  • Pin TensorFlow 1.15 (#200)
  • Add checkpoint loading from other checkpoints (#175)

Remove finetuning asserts

28 Dec 04:05

Some have successfully finetuned 774M/1558M, so the assert has been removed.

Multi-GPU support + TF 2.0 assert

01 Dec 18:39
  • Multi-GPU support (#127) (not fully tested; will add some docs when done)
  • Fixed checkpoint dir bug (#134)
  • Added a hard assert if a TensorFlow version >= 2.0 is used (#137)

Handle 774M (large)

28 Aug 17:11
e6afb28
  • 774M is explicitly blocked from being fine-tuned and will trigger an assert if attempted. If a way to finetune it without being super-painful is added, the ability to finetune it will be restored.
  • Allow generating text from the default pretrained models by passing model_name to gpt2.load_gpt2() and gpt2.generate() (this will work with 774M).
  • Add sgd as an optimizer parameter to finetune (default: adam)
  • Support for changed model names, w/ changes more prominent in the README.
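
As a rough illustration of the model_name change above, the following sketch generates text directly from a pretrained model without finetuning. The wrapper name generate_from_pretrained is hypothetical, and the sketch assumes gpt-2-simple and a compatible TensorFlow are installed and that the model files can be downloaded into the default models folder.

```python
def generate_from_pretrained(model_name="774M", prefix=None):
    """Generate text from a default pretrained model (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded without TensorFlow installed.
    import gpt_2_simple as gpt2

    # Fetch the pretrained model files for the requested model size.
    gpt2.download_gpt2(model_name=model_name)

    sess = gpt2.start_tf_sess()

    # Passing model_name loads the pretrained weights instead of a
    # finetuned checkpoint.
    gpt2.load_gpt2(sess, model_name=model_name)

    # return_as_list=True returns the generated texts instead of printing them.
    return gpt2.generate(
        sess, model_name=model_name, prefix=prefix, return_as_list=True
    )
```

Calling generate_from_pretrained("774M", prefix="The secret of life is") would return a list of generated strings; expect a large download and high memory use for the bigger models.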

Polish before TF 2.0

29 Jul 00:07

Merged a few PRs:

  • Fixed generate cmd run name: #78
  • Resolved most deprecation warnings: #83
  • Optional model parameters: #90

This does not make the package fully TF 2.0 compatible, but it's a big step!

Remove assertion

19 Jun 05:35

Assertion was triggering false positives, so removing it.

Prevent OOB + Cap Gen Length

18 Jun 04:00

Minor fix to prevent issue hit with gpt-2-cloud-run.

A goal of the release was to allow a graph reset without resetting the parameters; that did not seem to work, so holding off on that release.

Fixed prefix + miscellaneous bug fixes

16 Jun 03:16

Merged PRs, including a fix for the prefix issue (see commits for more info).

A bunch of highly-requested features

20 May 03:53
7dc8210

Adapted a few functions from Neil Shepperd's fork:

  • Nucleus sampling (top_p) when generating text, which produces surprisingly different output (setting top_p=0.9 works well). Supersedes top_k when used. (#51)
  • An encode_dataset() function to preencode and compress a large dataset before loading it for finetuning. (#19, #54)
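
To make the top_p idea concrete, here is a minimal pure-Python sketch of nucleus sampling over an already-normalized probability list. It is not the library's TensorFlow implementation; the function names top_p_filter and sample_top_p are hypothetical and exist only for illustration.

```python
import random


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, and renormalize their probabilities.

    Returns a dict mapping token index -> renormalized probability.
    """
    # Rank token indices from most to least likely.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)

    kept = []
    cumulative = 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete

    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}


def sample_top_p(probs, top_p=0.9, rng=random):
    """Draw one token index from the nucleus of the distribution."""
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

With probabilities [0.5, 0.3, 0.15, 0.05] and top_p=0.9, the least likely token falls outside the nucleus and can never be sampled, which is how top_p trims the unreliable tail while still adapting the number of candidates to each distribution.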

Improvements to continuing model training:

  • overwrite argument for finetune: with restore_from="latest", this continues model training without creating a duplicate copy of the model, and is therefore good for transfer learning using multiple datasets (#20)
  • You can continue to finetune a model without having the original GPT-2 model present.

Improvements with I/O involving Colaboratory

  • Checkpoint folders are now packaged into a .tar file when copying to Google Drive, and when copying from Google Drive, the .tar file is automatically unpacked into the correct checkpoint format. You can pass copy_folder=True to the copy_checkpoint function to revert to the old behavior. (#37: thanks @woctezuma !)
  • copy_checkpoint_to_gdrive and copy_checkpoint_from_gdrive now take a run_name argument instead of a checkpoint_folder argument.
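
The .tar round trip described above can be sketched with the standard library alone. This is an illustration of the packaging idea, not the library's code: pack_checkpoint, unpack_checkpoint, and roundtrip_demo are hypothetical names, and the folder layout (checkpoint/run1) is assumed for the example.

```python
import os
import tarfile
import tempfile


def pack_checkpoint(checkpoint_dir, tar_path):
    """Bundle a checkpoint folder into a single uncompressed .tar file."""
    folder_name = os.path.basename(os.path.normpath(checkpoint_dir))
    with tarfile.open(tar_path, "w") as tar:
        # arcname keeps only the folder name, not the full source path.
        tar.add(checkpoint_dir, arcname=folder_name)
    return tar_path


def unpack_checkpoint(tar_path, dest_dir):
    """Extract a packed checkpoint folder under dest_dir."""
    with tarfile.open(tar_path, "r") as tar:
        tar.extractall(path=dest_dir)


def roundtrip_demo():
    """Pack and unpack a dummy run folder, returning the restored file text."""
    with tempfile.TemporaryDirectory() as work:
        run_dir = os.path.join(work, "checkpoint", "run1")
        os.makedirs(run_dir)
        with open(os.path.join(run_dir, "model.txt"), "w") as f:
            f.write("weights")

        tar_path = pack_checkpoint(run_dir, os.path.join(work, "run1.tar"))

        restored_root = os.path.join(work, "restored")
        os.makedirs(restored_root)
        unpack_checkpoint(tar_path, restored_root)

        with open(os.path.join(restored_root, "run1", "model.txt")) as f:
            return f.read()
```

Copying one archive instead of many small files is typically much faster on network-backed storage such as a mounted Google Drive, which is the likely motivation for the change.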

Miscellaneous

  • Added CLI arguments for top_k, top_p, overwrite.
  • Cleaned up redundant function parameters (#39)