Loading discontinuous-entity data with the W2NER model #122

Open
Bureaux-Tao opened this issue Apr 19, 2023 · 3 comments
Labels
todo list (New feature or request)

Comments

@Bureaux-Tao

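Hello, how can the W2NER model load Chinese discontinuous entities? I mean data in the same format as the CADEC dataset from the original paper, where each entity's position is given as an array of the indices of all the tokens it covers: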

{
    "sentence": ["For", "all", "of", "you", "who", "now", "have", "extremely", "low", "LDL", "and", "a", "bad", "case", "of", "joint", "pain", "to", "the", "extent", "that", "it", "is", "very", "arthritic", "or", "having", "bad", "muscle", "cramps", "that", "you", "never", "got", "prior", "to", "the", "drug", ",", "it", "is", "from", "the", "statins", "."],
    "ner": [{
        "index": [15, 16, 23, 24],
        "type": "ADR"
    }, {
        "index": [28, 29],
        "type": "ADR"
    }]
}
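For Chinese text, one common approach (an assumption here, not something confirmed in this thread) is to treat each character as a "word", so `sentence` becomes the list of characters and `index` holds character positions. The sketch below converts hypothetical fragment annotations, given as half-open `(start, end)` character offsets, into a record in the CADEC-style format shown above. The helper `to_w2ner_record`, the fragment format, the example sentence, and the `SYMPTOM` type are all made up for illustration.

```python
def to_w2ner_record(text, fragments):
    """Build a W2NER-style record at character level (hypothetical helper).

    fragments: list of (spans, type), where spans is a list of half-open
    (start, end) character offsets into text. All spans of one entity are
    merged into a single sorted, de-duplicated index list.
    """
    ner = []
    for spans, ent_type in fragments:
        index = sorted({i for start, end in spans for i in range(start, end)})
        ner.append({"index": index, "type": ent_type})
    # One "word" per Chinese character, so indices are character positions.
    return {"sentence": list(text), "ner": ner}


# Hypothetical example: "腰部疼痛" is discontinuous and shares "疼痛"
# with the contiguous entity "腿部疼痛".
text = "患者腰部及腿部疼痛"
record = to_w2ner_record(
    text,
    [([(2, 4), (7, 9)], "SYMPTOM"), ([(5, 9)], "SYMPTOM")],
)
for ent in record["ner"]:
    print("".join(record["sentence"][i] for i in ent["index"]), ent["index"])
# 腰部疼痛 [2, 3, 7, 8]
# 腿部疼痛 [5, 6, 7, 8]
```

Since BERT's basic tokenizer splits CJK characters individually, a Chinese BERT encoder will usually map each of these character "words" to a single word piece.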
@Tongjilibo
Owner

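Sorry, I've been busy with large-language-model work recently and haven't had time to look into this. Have you already solved it on your end?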

@Bureaux-Tao
Author


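I made some changes based on someone else's modified and annotated version of the W2NER source code. Starting from the paper's original code, it can now load and predict discontinuous entities like the one given above. For your reference: https://github.com/Bureaux-Tao/discontinuous-ner/blob/main/data_loader.py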
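To make the load/predict idea concrete: as I understand the W2NER paper, an entity is encoded in a word-pair grid with NNW (next-neighboring-word) links between consecutive words of the entity, even across gaps, plus a THW (tail-head-word) link from its last word back to its first word, labeled with the entity type. Prediction then follows NNW links from each head to its tail. The following is a simplified sketch of that encode/decode round trip, not the code from the linked repository or the official W2NER implementation; the label ids (0 = none, 1 = NNW, 2 and up = entity types) and the names `encode` and `decode` are assumptions.

```python
import numpy as np

# Hypothetical label ids: 0 = no relation, 1 = NNW, >= 2 = THW entity types.
NNW = 1
LABEL2ID = {"ADR": 2}
ID2LABEL = {v: k for k, v in LABEL2ID.items()}


def encode(length, ner):
    """Write each entity into a length x length word-pair label grid."""
    grid = np.zeros((length, length), dtype=np.int64)
    for entity in ner:
        index = sorted(entity["index"])
        # NNW links consecutive words of the entity; a gap is just a longer jump.
        for i, j in zip(index, index[1:]):
            grid[i, j] = NNW
        # THW: row = tail word, column = head word, value = entity type id.
        grid[index[-1], index[0]] = LABEL2ID[entity["type"]]
    return grid


def decode(grid):
    """Recover entities by following NNW links from each head to its tail."""
    n = grid.shape[0]
    # NNW cells live in the upper triangle (i < j), so paths only move forward.
    successors = {i: [j for j in range(i + 1, n) if grid[i, j] == NNW] for i in range(n)}
    entities = []
    for tail in range(n):
        for head in range(tail + 1):
            label = int(grid[tail, head])
            if label < 2:
                continue
            # Depth-first search over every NNW path from head to tail.
            stack = [[head]]
            while stack:
                path = stack.pop()
                if path[-1] == tail:
                    entities.append({"index": path, "type": ID2LABEL[label]})
                    continue
                stack.extend(path + [j] for j in successors[path[-1]] if j <= tail)
    return entities


# Round trip on the CADEC-style example from the question (45 tokens).
ner = [{"index": [15, 16, 23, 24], "type": "ADR"}, {"index": [28, 29], "type": "ADR"}]
grid = encode(45, ner)
print(decode(grid) == ner)  # True
```

Because `decode` enumerates every NNW path between a head and a tail, entities that share words can yield extra candidate paths in this simplified version.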

@Tongjilibo
Owner

OK, thanks. I'll take a look at your code later.

@Tongjilibo added the "todo list" label (New feature or request) on May 12, 2023