ouwei2013/medqa_colbert

ColBERT trained on the Chinese cmedqa-v2

This repo is a copy of the original ColBERT repo by the Stanford NLP research team.

I made some minor changes to the original code to make it compatible with Chinese BERT.

I trained ColBERT on the cmedqa-v2 dataset, and it performed well.

I deployed the model on a HuggingFace Space: https://huggingface.co/spaces/diagaiwei/ir_chinese_medqa

If you want to train it on your own data, organize your data in the following format:

  • Three files are required: queries.tsv, doc.tsv, and triplets.jsonl
    • queries.tsv should look like this (no header row):
      query_1_id \t 高血压吃什么药?
      query_2_id \t 糖尿病吃什么药?

    • doc.tsv should look like this (no header row):
      doc_1_id \t 高血压应该吃药A
      doc_2_id \t 糖尿病应该吃药B

    • each line of triplets.jsonl is a JSON list of the form [query_id, pos_doc_id, neg_doc_id], for example:
      [query_1_id, doc_1_id, doc_2_id]
      [query_1_id, doc_1_id, doc_3_id]
      [...]
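The three files above can be generated from any existing QA pairs. A minimal sketch (the ids and texts below are made-up examples, not part of cmedqa-v2):

```python
# Write queries.tsv, doc.tsv, and triplets.jsonl in the expected format:
# headerless TSVs of "id \t text", and one JSON list per line for triplets.
import json

queries = {"q1": "高血压吃什么药?", "q2": "糖尿病吃什么药?"}
docs = {"d1": "高血压应该吃药A", "d2": "糖尿病应该吃药B"}
# Each triplet is [query_id, positive_doc_id, negative_doc_id].
triplets = [["q1", "d1", "d2"], ["q2", "d2", "d1"]]

with open("queries.tsv", "w", encoding="utf-8") as f:
    for qid, text in queries.items():
        f.write(f"{qid}\t{text}\n")

with open("doc.tsv", "w", encoding="utf-8") as f:
    for did, text in docs.items():
        f.write(f"{did}\t{text}\n")

with open("triplets.jsonl", "w", encoding="utf-8") as f:
    for triplet in triplets:
        # ensure_ascii=False keeps Chinese text readable in the file
        f.write(json.dumps(triplet, ensure_ascii=False) + "\n")
```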

After preparing the data, run python colbert_train.py to train the model.
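Before launching training, it can save time to sanity-check that every triplet references ids that actually exist in queries.tsv and doc.tsv. A minimal sketch (file names follow the format above; `validate` is a hypothetical helper, not part of this repo):

```python
# Check that every triplet in triplets.jsonl references valid query/doc ids.
import json

def load_ids(tsv_path):
    """Return the set of ids in the first column of a headerless TSV."""
    with open(tsv_path, encoding="utf-8") as f:
        return {line.split("\t", 1)[0] for line in f if line.strip()}

def validate(queries_path="queries.tsv", docs_path="doc.tsv",
             triplets_path="triplets.jsonl"):
    """Raise AssertionError on the first bad triplet; return triplet count."""
    qids, dids = load_ids(queries_path), load_ids(docs_path)
    count = 0
    with open(triplets_path, encoding="utf-8") as f:
        for n, line in enumerate(f, 1):
            qid, pos, neg = json.loads(line)
            assert qid in qids, f"line {n}: unknown query id {qid}"
            assert pos in dids, f"line {n}: unknown positive doc id {pos}"
            assert neg in dids, f"line {n}: unknown negative doc id {neg}"
            count = n
    return count
```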

After training, run python colbert_index.py to build the search index.

After indexing, run python colbert_search.py to test the model.
