Final Project for ADL 2023 Fall
R11922101 Chia-Hung Huang
- data/
  - variant.csv: Chinese variant character list.
  - train_data.json: Training data.
  - val_data.json: Validation data.
- model/: Saved checkpoint.
- utils/
  - clean_variant.py: Replace Chinese variant characters with standard characters.
  - opanai_ner.py: Call the OpenAI GPT-4-turbo API and generate the NER data in Python dictionary format.
  - labeling.py: Label all the tokens based on the dictionary data generated in the previous step.
- generate_dataset.py: Use functions in utils/ to generate the training and validation dataset.
- run_ner.py: The training script.
- run_ner_test.py: The testing (predicting) script.
- plot.py: Plot the training curve (loss, f1 score) on the validation set.
- app.py: UI.
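
The two preprocessing steps in utils/ (variant replacement, then token labeling) can be illustrated with a minimal sketch. It is not the code from clean_variant.py or labeling.py. The function names, the (variant, standard) pair layout for variant.csv, the {label: [mentions]} dictionary shape, and the character-level BIO tagging scheme are all assumptions made for illustration.

```python
def build_variant_map(rows):
    """Build a lookup from (variant, standard) pairs.

    Assumption: variant.csv can be read as pairs like this; the real
    column layout is not documented here.
    """
    return {variant: standard for variant, standard in rows}


def clean_variants(text, variant_map):
    """Replace every variant character with its standard form."""
    return "".join(variant_map.get(ch, ch) for ch in text)


def label_tokens(text, entities):
    """Assign one BIO tag per character.

    Assumption: `entities` maps a label to a list of surface strings,
    e.g. {"PER": ["王小明"]}. Spans that overlap an earlier match are skipped.
    """
    tags = ["O"] * len(text)
    for label, mentions in entities.items():
        for mention in mentions:
            if not mention:
                continue
            start = text.find(mention)
            while start != -1:
                end = start + len(mention)
                # Only tag the span if none of it has been labeled yet.
                if all(tag == "O" for tag in tags[start:end]):
                    tags[start] = f"B-{label}"
                    for i in range(start + 1, end):
                        tags[i] = f"I-{label}"
                start = text.find(mention, end)
    return tags


if __name__ == "__main__":
    variant_map = build_variant_map([("羣", "群"), ("峯", "峰")])
    print(clean_variants("羣山之峯", variant_map))  # 群山之峰

    entities = {"PER": ["王小明"], "LOC": ["台北"]}
    print(label_tokens("王小明住在台北", entities))
    # ['B-PER', 'I-PER', 'I-PER', 'O', 'O', 'B-LOC', 'I-LOC']
```

Matching entity strings back onto the text is one simple way to turn a dictionary of entities into per-token labels. If run_ner.py tokenizes differently (for example into subwords), the tags would need to be aligned to those tokens instead.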
- Install Dependencies
pip install -r requirements.txt
- Download the Model
bash download.sh
- Train
bash run.sh
- Run the App
streamlit run app.py
- If you want to generate datasets with generate_dataset.py, please create a .env file. You can use .env-default as a template.
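
For reference, a .env file is usually a list of KEY=VALUE lines that a script reads into environment variables at startup. The sketch below shows one way to do that with the standard library only. It is not taken from generate_dataset.py. The helper names are hypothetical, and the key OPENAI_API_KEY is an assumed example; check .env-default for the keys the project actually expects.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path=".env"):
    """Copy entries from the file into os.environ, keeping existing values."""
    with open(path, encoding="utf-8") as f:
        for key, value in parse_env(f.read()).items():
            os.environ.setdefault(key, value)


if __name__ == "__main__":
    # OPENAI_API_KEY is an assumed key name, used here for illustration only.
    sample = '# credentials\nOPENAI_API_KEY="sk-placeholder"\n'
    print(parse_env(sample))  # {'OPENAI_API_KEY': 'sk-placeholder'}
```

Using setdefault means a variable already exported in the shell takes precedence over the file, which is a common convention but a design choice rather than a requirement.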