This project is adapted from mymusise/gpt2-quickly. The main changes are a BPE tokenizer in place of the SentencePiece tokenizer (which works better for English with GPT-2) and a custom tokenizer wrapper. Additionally, the data preprocessing and storage have been heavily reworked for simplicity and for English-language text. Finally, the code includes a simple-to-use terminal interface, along with the option to use a model I trained for 100 epochs.
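The tokenizer wrapper itself isn't shown here, but as a rough sketch of the byte-level BPE approach, a tokenizer can be trained on the raw corpus with the Hugging Face tokenizers library. The file paths (datasets/raw.txt, models/), vocabulary size, and special tokens below are illustrative assumptions, not necessarily what this repo uses.

```python
# Rough sketch of training a byte-level BPE tokenizer on the raw corpus.
# Illustrative only -- the actual wrapper in this repo may differ.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["datasets/raw.txt"],        # assumed path, see folder structure below
    vocab_size=30_000,                 # assumed vocabulary size
    min_frequency=2,
    special_tokens=["<|endoftext|>"],  # GPT-2's end-of-text token
)
tokenizer.save_model("models")         # writes vocab.json and merges.txt
```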
Install all requirements from requirements.txt, then start the program:
- pip install -r requirements.txt
- python main.py
To use the pre-trained model instead:
- Download the weights from Google Drive (see the trained_weights readme) and save them in the trained_weights folder
- Then run:
- python main.py -wtr
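For reference, here is a minimal sketch of how the -wtr flag could be wired up, assuming a TensorFlow GPT-2 model from the transformers library. Only the flag name comes from the command above; the argument parsing, model configuration, and checkpoint file name are assumptions, so check main.py and the trained_weights readme for the actual details.

```python
# Minimal sketch of handling the -wtr ("with trained weights") flag.
# The real main.py may differ; model config and weight file name are assumptions.
import argparse
from transformers import GPT2Config, TFGPT2LMHeadModel

parser = argparse.ArgumentParser()
parser.add_argument("-wtr", action="store_true",
                    help="load the pre-trained weights from trained_weights/")
args = parser.parse_args()

model = TFGPT2LMHeadModel(GPT2Config())  # assumed default GPT-2 configuration
if args.wtr:
    # hypothetical checkpoint name -- see the trained_weights readme for the real one
    model.load_weights("trained_weights/model_weights")
```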
The folder structure is as follows:
- Working Directory
  - datasets
    - raw.txt
    - trainng_data
  - models
  - datasets
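The layout above suggests a preprocessing step that turns datasets/raw.txt into stored training data. Below is a rough, assumption-laden sketch of what such a step could look like (fixed-length token blocks saved as a NumPy array); the block size, output format, and file names are illustrative and not necessarily what this repo produces.

```python
# Rough sketch of the preprocessing implied by the folder layout:
# read datasets/raw.txt, tokenize it, and store fixed-length training blocks.
# Paths, block size, and the .npy output format are assumptions.
import os
import numpy as np
from tokenizers import ByteLevelBPETokenizer

BLOCK_SIZE = 512  # assumed context length

tokenizer = ByteLevelBPETokenizer("models/vocab.json", "models/merges.txt")

with open("datasets/raw.txt", encoding="utf-8") as f:
    ids = tokenizer.encode(f.read()).ids

# Drop the ragged tail and reshape into (n_blocks, BLOCK_SIZE) examples.
n_blocks = len(ids) // BLOCK_SIZE
blocks = np.array(ids[: n_blocks * BLOCK_SIZE]).reshape(n_blocks, BLOCK_SIZE)

os.makedirs("datasets/trainng_data", exist_ok=True)
np.save("datasets/trainng_data/blocks.npy", blocks)
```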