This project aims to perform extractive question answering in Chinese.
- Python 3.10.12
- download.sh: Script to install Python dependencies and download necessary files.
- requirements.txt: List of Python packages required for this project.
- download.py: Python script to download necessary files from Google Drive.
- run.sh: Bash script to run the inference code.
- main.py: Python script containing the inference code.
- train_src: Folder containing additional resources for training your own paragraph selection and span selection models.
- report.pdf: Explanations of the data processing and model selection.
git clone https://github.com/your_username/Chinese-Extractive-QA.git
cd Chinese-Extractive-QA
Run the download.sh script to install the Python packages listed in requirements.txt and download the necessary files.
./download.sh
To run the inference code, execute the run.sh script with the following arguments:
- ${1}: Path to context.json
- ${2}: Path to test.json
- ${3}: Path to the output prediction file, named prediction.csv
./run.sh /path/to/context.json /path/to/test.json /path/to/pred/prediction.csv
Note: Replace /path/to/context.json, /path/to/test.json, and /path/to/pred/prediction.csv with the actual paths to your files. To run on the example data in this repo, use:
./run.sh ./ADL_HW1/datasets/context.json ./ADL_HW1/datasets/test.json ./prediction.csv
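Before launching run.sh, it can help to confirm that the three paths are usable. The following is a minimal sketch, not part of this repo: check_inputs is a hypothetical helper that only verifies that the two input files exist and parse as JSON and that the output directory exists. It makes no assumption about the schema of context.json or test.json.

```python
import json
from pathlib import Path


def check_inputs(context_path, test_path, output_path):
    """Return a list of problems found; an empty list means the paths look usable.

    Hypothetical helper for illustration only. It checks existence and JSON
    validity, not the structure the inference code expects.
    """
    problems = []
    for label, raw in (("context", context_path), ("test", test_path)):
        path = Path(raw)
        if not path.is_file():
            problems.append(f"{label} file not found: {path}")
            continue
        try:
            # The data is Chinese text, so read it explicitly as UTF-8.
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            problems.append(f"{label} file is not valid UTF-8 JSON: {exc}")

    # The prediction file itself need not exist yet, but its folder should.
    out_dir = Path(output_path).parent
    if not out_dir.is_dir():
        problems.append(f"output directory does not exist: {out_dir}")
    return problems
```

Calling check_inputs("./ADL_HW1/datasets/context.json", "./ADL_HW1/datasets/test.json", "./prediction.csv") and printing any returned messages gives a quick sanity check before running the full inference.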
In the notebooks paragraph_selection.ipynb and span_selection.ipynb, you can fine-tune existing models or train from scratch. The code is adapted from the following sources:
- Paragraph Selection: Transformers model on a multiple-choice dataset, like SWAG
- Span Selection: Transformers model on a question-answering dataset, like SQuAD
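As background on the span selection stage: SQuAD-style models typically produce one start score and one end score per token, and the answer is decoded by choosing the highest-scoring valid (start, end) pair. The sketch below shows that decoding step in plain Python. It is not taken from span_selection.ipynb; best_span and the max_answer_len default are illustrative assumptions, and the notebook's actual post-processing may differ.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick the token span with the highest start + end score.

    Illustrative sketch only. Returns (start, end, score) with
    start <= end and a span length of at most max_answer_len tokens,
    or None when the inputs are empty.
    """
    if len(start_logits) != len(end_logits):
        raise ValueError("start_logits and end_logits must have equal length")

    best = None
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    return best
```

Given the returned indices, the answer text is the slice of the context tokens from start to end inclusive. The length cap keeps the search cheap and prevents implausibly long answers from winning on score alone.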
This project is licensed under the MIT License - see the LICENSE.md file for details.