entities must span whole tokens. Wrong entity end. #122

Open
xyiiinexg3 opened this issue Aug 18, 2022 · 1 comment

@xyiiinexg3

Symptom
After running the following command: rasa train -c config/config.yml --data data/training_dataset_1660793545.json data/stories.md --out models/movie --domain config/domain.yml --num-threads 5 --augmentation 100 -vv
warnings like the following are printed:
C:\Users\26282\miniconda3\envs\rasa2formovieQA\lib\site-packages\rasa\shared\utils\io.py:93: UserWarning: Failed to use example '郭富城表演过哪些喜剧电影' to train MITIE entity extractor. Example will be skipped.Error: Invalid entity {'end': 10, 'entity': 'genre', 'start': 8, 'value': '喜剧'} in example '郭富城表演过哪些喜剧电影': entities must span whole tokens. Wrong entity end.
As a result, once the model is running it fails to recognize genre entities (喜剧 "comedy", 动画 "animation", and so on).
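The warning says the entity's character span (start 8, end 10, i.e. 喜剧) does not end on a token boundary. Below is a minimal sketch of that condition, assuming tokens are (text, start, end) character offsets. The helper names and both segmentations are hypothetical illustrations; they are not Rasa's code or jieba's actual output.

```python
from typing import List, Tuple

Token = Tuple[str, int, int]  # (token text, start offset, end offset)


def tokens_with_offsets(words: List[str]) -> List[Token]:
    """Attach character offsets to a list of segmented words (hypothetical helper)."""
    tokens, pos = [], 0
    for w in words:
        tokens.append((w, pos, pos + len(w)))
        pos += len(w)
    return tokens


def entity_aligned(tokens: List[Token], start: int, end: int) -> bool:
    """True only if the entity starts at some token start and ends at some token end."""
    starts = {s for _, s, _ in tokens}
    ends = {e for _, _, e in tokens}
    return start in starts and end in ends


text = "郭富城表演过哪些喜剧电影"
# Assumed segmentation in which 喜剧电影 stays one token.
merged = tokens_with_offsets(["郭富城", "表演", "过", "哪些", "喜剧电影"])
# Assumed segmentation in which 喜剧 and 电影 are separate tokens.
split = tokens_with_offsets(["郭富城", "表演", "过", "哪些", "喜剧", "电影"])

print(text[8:10])                     # 喜剧
print(entity_aligned(merged, 8, 10))  # False: no token ends at offset 10
print(entity_aligned(split, 8, 10))   # True
```

If the tokenizer really keeps 喜剧电影 as a single token (offsets 8 to 12), no token ends at 10, which would match the "Wrong entity end" message and the example being skipped.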

Training data
{"text":"方中信表演动画电影有哪些","intent":"search_person_genre_movie","entities":[{"end":3,"entity":"person","start":0,"value":"方中信"},{"end":7,"entity":"genre","start":5,"value":"动画"}]}
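To rule out bad offsets in the data itself, each entity's `start`/`end` can be checked against its `value`. A minimal sketch, where the helper `offset_errors` is hypothetical, applied to the example above:

```python
import json

# The training example from the issue, copied verbatim.
raw = '{"text":"方中信表演动画电影有哪些","intent":"search_person_genre_movie","entities":[{"end":3,"entity":"person","start":0,"value":"方中信"},{"end":7,"entity":"genre","start":5,"value":"动画"}]}'


def offset_errors(example: dict) -> list:
    """Return the entities whose text[start:end] differs from their 'value'."""
    text = example["text"]
    return [
        ent
        for ent in example.get("entities", [])
        if text[ent["start"]:ent["end"]] != ent["value"]
    ]


example = json.loads(raw)
print(offset_errors(example))  # [] means every span matches its value
```

It returns an empty list for this example, so the character offsets look correct, and the problem more likely lies in how the text is tokenized.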

config.yml
A jieba user dictionary is configured for word segmentation:
pipeline:
  - name: "MitieNLP"
    model: "data/total_word_feature_extractor_zh.dat"
  - name: "JiebaTokenizer"
    dictionary_path: "jieba_userdict"
  - name: "MitieEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "RegexFeaturizer"
  - name: "MitieFeaturizer"
  - name: "SklearnIntentClassifier"


@xyiiinexg3 (Author)

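I counted, and within the genre dictionary only these four fail to be recognized: 动画 (animation), 恐怖 (horror), 喜剧 (comedy), and 科幻 (sci-fi). Why is that?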
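One plausible explanation, not confirmed in this thread, is that jieba's built-in dictionary may contain compounds such as 喜剧电影 or 动画电影. In that case these four genre words end up inside a longer token, which would explain why only they fail. jieba provides `jieba.suggest_freq(('喜剧', '电影'), True)` to tune its dictionary so such a pair is segmented separately. The same idea, re-splitting already segmented tokens at known genre words, is sketched below. `split_on_lexicon` and the token list are hypothetical, and this is not a Rasa component:

```python
from typing import List, Tuple

Token = Tuple[str, int, int]  # (token text, start offset, end offset)


def split_on_lexicon(tokens: List[Token], lexicon: List[str]) -> List[Token]:
    """Split tokens so every occurrence of a lexicon word becomes its own token."""
    result = []
    for text, start, end in tokens:
        cuts = set()
        for word in lexicon:
            if not word:
                continue  # ignore empty entries
            i = text.find(word)
            while i != -1:
                cuts.update((i, i + len(word)))  # cut before and after the word
                i = text.find(word, i + 1)
        points = sorted(cuts | {0, len(text)})
        for a, b in zip(points, points[1:]):
            result.append((text[a:b], start + a, start + b))
    return result


# Assumed segmentation in which 喜剧电影 was kept as a single token.
tokens = [("郭富城", 0, 3), ("表演", 3, 5), ("过", 5, 6), ("哪些", 6, 8), ("喜剧电影", 8, 12)]
out = split_on_lexicon(tokens, ["动画", "恐怖", "喜剧", "科幻"])
print(out[-2:])  # [('喜剧', 8, 10), ('电影', 10, 12)]
```

Because the cuts come from the genre list rather than from annotated entity offsets, the same splitting would apply at both training and inference time.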
