This is an example of how to fine-tune a Hugging Face GPT-2 model on your own dataset.
- git clone this repository
- pip install -r requirements.txt
FYI: GPT-2's context window is limited to 1024 tokens, so every training example has to fit in 1024 tokens and the data must be chunked first.
Run prepare_data.py to split got.txt into chunks; a minimal sketch of the idea follows.
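prepare_data.py itself is not reproduced in this README, so the following is only a sketch of the chunking idea, assuming got.txt is plain UTF-8 text. The output file name got_chunked.txt and the tokenizer choice are assumptions, not the script's actual contents:

```python
# Sketch of the chunking step: split got.txt into <=1024-token pieces.
# File names and tokenizer choice are assumptions, not the repo's exact code.
from transformers import GPT2TokenizerFast

MAX_TOKENS = 1024  # GPT-2's maximum context length, in tokens
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

with open("got.txt", encoding="utf-8") as f:
    text = f.read()

# Tokenize the whole corpus once, then slice the ids into fixed-size windows.
ids = tokenizer(text).input_ids
chunks = [ids[i:i + MAX_TOKENS] for i in range(0, len(ids), MAX_TOKENS)]

# Decode each window back to text, one chunk per output line.
with open("got_chunked.txt", "w", encoding="utf-8") as f:
    for chunk in chunks:
        f.write(tokenizer.decode(chunk).replace("\n", " ") + "\n")
```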
Run models.py to fine-tune the pretrained Hugging Face GPT-2 model on the chunked data; a sketch of such a training run is below.
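models.py is likewise not shown here; this is a minimal sketch of what fine-tuning GPT-2 with the Hugging Face Trainer API can look like. The input file got_chunked.txt, the output directory gpt2-finetuned, and all hyperparameters are illustrative assumptions:

```python
# Sketch of a GPT-2 fine-tuning run using the Hugging Face Trainer API.
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load the chunked text produced by prepare_data.py (assumed file name).
dataset = load_dataset("text", data_files={"train": "got_chunked.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    # mlm=False selects causal language modeling, which is what GPT-2 uses.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("gpt2-finetuned")
```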
Turn on the API by running:
uvicorn main:app --reload
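As a rough sketch (not the repository's actual main.py), a FastAPI endpoint matching the payload shown below could look like this; the fine-tuned model path, the default values, and the sampling settings are assumptions:

```python
# Sketch of main.py: a FastAPI endpoint wrapping GPT-2 text generation.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

app = FastAPI()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2-finetuned")  # assumed output dir

class GenerateRequest(BaseModel):
    prefix: str
    max_length: int = 100
    top_k: int = 50

@app.post("/generate")
def generate(req: GenerateRequest):
    inputs = tokenizer(req.prefix, return_tensors="pt")
    output_ids = model.generate(
        inputs.input_ids,
        max_length=req.max_length,
        top_k=req.top_k,
        do_sample=True,  # top_k only takes effect when sampling is enabled
        pad_token_id=tokenizer.eos_token_id,
    )
    return {"text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}
```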
To hit the API, open Postman or any other API client, POST to localhost:8000/generate, and send this JSON payload:

    {
      "prefix": "The young knight",
      "max_length": 800,
      "top_k": 5
    }
If your machine has CUDA support, please uncomment the CUDA-specific configuration and optimization lines in models.py and runner.py. I do not have access to a CUDA-enabled machine, so those lines are commented out in my version of the code.
To ensure the project runs on non-CUDA machines, I have tested it without the CUDA dependencies; the functionality is unchanged, although training and generation on CPU are slower than on a GPU.
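Note that commenting lines in and out is this repository's approach; a common alternative, sketched here only as an assumption and not as what models.py or runner.py actually do, is to detect the device at runtime so the same code runs on both CPU-only and CUDA machines:

```python
# Runtime device detection: use the GPU when available, fall back to CPU.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)

# Inputs must live on the same device as the model.
inputs = tokenizer("The young knight", return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_length=50,
                            pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

This avoids maintaining two variants of the same scripts.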