
The answer returned is the entire context #51

Open
yehx1 opened this issue Dec 22, 2022 · 5 comments

Comments

@yehx1

yehx1 commented Dec 22, 2022

from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline

tokenizer = AutoTokenizer.from_pretrained("luhua/chinese_pretrain_mrc_roberta_wwm_ext_large")
model = AutoModelForQuestionAnswering.from_pretrained("luhua/chinese_pretrain_mrc_roberta_wwm_ext_large")
QA = pipeline('question-answering', model=model, tokenizer=tokenizer)
QA_input = {
    'question': "著名诗歌《假如生活欺骗了你》的作者是",
    'context': "普希金从那里学习人民的语言,吸取了许多有益的养料,这一切对普希金后来的创作产生了很大的影响。这两年里,普希金创作了不少优秀的作品,如《囚徒》、《致大海》、《致凯恩》和《假如生活欺骗了你》等几十首抒情诗,叙事诗《努林伯爵》,历史剧《鲍里斯·戈都诺夫》,以及《叶甫盖尼·奥涅金》前六章。"
}
QA(QA_input)
Run this way, the answer that comes back is the entire context. I have tried many different inputs and they all behave the same.
Could you please take a look? Thanks!

@KyleLeith-007

I have run into the same thing: for some inputs, the whole context gets returned as the answer, and I cannot figure out why...

@juaner09

metrics.py, line 456:

tokenizer = BasicTokenizer(do_lower_case=do_lower_case)
tok_text = " ".join(tokenizer.tokenize(orig_text))

The tokenization here is inconsistent with the tokenization used during training, and that is what causes this.
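To make the mismatch concrete, here is a minimal sketch (not part of the comment above) comparing the two tokenizers on a context that mixes Chinese and Latin text. The model name is the one from the report above, and the exact output pieces depend on its vocabulary:

# Illustration of the mismatch, assuming metrics.py rebuilds the answer text with
# transformers' BasicTokenizer while the model was trained on a WordPiece
# BertTokenizer vocabulary.
from transformers import BasicTokenizer, BertTokenizer

orig_text = "《假如生活欺骗了你》的作者是Pushkin"

# What metrics.py does around line 456: split with BasicTokenizer.
basic = BasicTokenizer(do_lower_case=True)
print(" ".join(basic.tokenize(orig_text)))
# Roughly: 《 假 如 生 活 欺 骗 了 你 》 的 作 者 是 pushkin

# What the model saw at training/prediction time: WordPiece tokens, where rare
# words can be split into '##' pieces or replaced by '[UNK]'.
bert = BertTokenizer.from_pretrained("luhua/chinese_pretrain_mrc_roberta_wwm_ext_large")
print(" ".join(bert.tokenize(orig_text)))
# Roughly: 《 假 如 生 活 欺 骗 了 你 》 的 作 者 是 pu ##sh ##kin

# When the two sequences no longer line up, the post-processing in metrics.py
# cannot map the predicted tokens back to a clean span of orig_text, which is
# presumably how the whole context ends up being returned as the answer.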

@KyleLeith-007

(quoting @juaner09's comment above on metrics.py, line 456)

Could you give some guidance on how to modify it? I'm too much of a beginner to work it out myself...

@juaner09

(replying to @KyleLeith-007 above)

Unify the tokenization on BertTokenizer, then deal with the '[UNK]' tokens that BertTokenizer produces and with the words that get split apart by WordPiece tokenization, and the problem is solved.
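A rough sketch of what that change could look like. This is an assumption about how the post-processing in metrics.py might be patched, not the author's actual fix; the helper name rebuild_tok_text is illustrative, and the model name is the one used above:

# Rebuild tok_text with the BertTokenizer the model was trained with, instead of
# BasicTokenizer, and glue WordPiece '##' continuations back together so the
# rebuilt string can be aligned with orig_text. '[UNK]' pieces still need to be
# mapped back to the original characters (e.g. via character offsets) by the caller.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("luhua/chinese_pretrain_mrc_roberta_wwm_ext_large")

def rebuild_tok_text(orig_text: str) -> str:
    pieces = []
    for tok in tokenizer.tokenize(orig_text):
        if tok.startswith("##") and pieces:
            pieces[-1] += tok[2:]   # undo the WordPiece split
        else:
            pieces.append(tok)
    return " ".join(pieces)

# Drop-in replacement for the two lines quoted earlier (metrics.py, around line 456):
# tok_text = rebuild_tok_text(orig_text)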

@KyleLeith-007

(replying to @juaner09 above)

Mm-hm, thank you very much!
