Building a Chinese Natural Language Understanding (NLU) System with RASA NLU

README written in English

This repository provides an up-to-date, detailed and complete guide to building a Chinese natural language understanding system.

Online demo

TODO

Features

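Provides a Chinese corpus
Provides a corpus conversion tool to help users migrate their training data
Provides several Chinese language processing pipelines based on RASA NLU
Provides model evaluation tools to help select and tune models automatically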
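The conversion tool itself (convert_dataset_format.py) is not reproduced here. As a rough illustration of what such a conversion involves, the sketch below turns one DialogFlow-style training phrase into an entry of RASA NLU's JSON `common_examples` list. It assumes a simplified DialogFlow export in which each phrase is a list of `data` chunks and entity chunks carry an `alias` key naming the slot; the function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def dialogflow_phrase_to_rasa(phrase: Dict[str, Any], intent: str) -> Dict[str, Any]:
    """Convert one DialogFlow-style phrase into a RASA NLU common_example.

    Assumed (simplified) input shape:
        {"data": [{"text": "..."}, {"text": "...", "alias": "slot_name"}, ...]}
    Chunks that have an "alias" key are treated as entities.
    """
    text = ""
    entities: List[Dict[str, Any]] = []
    for chunk in phrase["data"]:
        start = len(text)
        text += chunk["text"]
        if "alias" in chunk:
            # Character offsets into the concatenated text; "end" is exclusive.
            entities.append({
                "start": start,
                "end": len(text),
                "value": chunk["text"],
                "entity": chunk["alias"],
            })
    return {"text": text, "intent": intent, "entities": entities}


def build_rasa_training_data(examples: List[Dict[str, Any]]) -> str:
    """Wrap converted examples in the RASA NLU JSON training-data envelope."""
    payload = {"rasa_nlu_data": {"common_examples": examples}}
    # ensure_ascii=False keeps Chinese characters readable in the output file.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

For example, the chunks 明天 (alias date), 上海 (alias address) and 的天气怎么样 yield the text 明天上海的天气怎么样 with two entities covering characters 0 to 2 and 2 to 4.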

Requirements

Python 3 (Python 2 may work, but has not been well tested)

Workflow

For details, see workflow.md

Available pipelines

MITIE+jieba

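Description

jieba provides Chinese word segmentation
MITIE provides the features for intent classification (intent_classifier_sklearn) and handles slot filling (ner_mitie)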
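Chinese text has no spaces between words, so a segmenter such as jieba has to split each message into words before MITIE can extract features. jieba itself builds a word graph from a prefix dictionary and picks the most probable path, using an HMM for unknown words. The sketch below substitutes the much simpler forward maximum matching algorithm, with a tiny hypothetical dictionary, purely to show what segmentation produces; it is not jieba's algorithm.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching segmentation (a stand-in for jieba).

    At each position, take the longest dictionary word of at most max_len
    characters that starts there; otherwise fall back to a single character.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

With the toy dictionary {"明天", "上海", "天气", "怎么样"}, `forward_max_match("明天上海的天气怎么样", ...)` returns ["明天", "上海", "的", "天气", "怎么样"]; the unknown character 的 becomes a single-character token.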

Install dependencies

pip install git+https://github.com/mit-nlp/MITIE.git
pip install jieba

Download the required model data

MITIE needs a model file: download total_word_feature_extractor.dat.tar.gz from the releases of my other project, MITIE_Chinese_Wikipedia_corpus, extract it, and put total_word_feature_extractor.dat into the data directory.
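The extraction step can also be scripted with the Python standard library. This is a minimal sketch assuming the archive has already been downloaded; since the directory layout inside the archive is not documented here, it searches the archive for the model file by name. `extract_feature_extractor` is a hypothetical helper, not part of this repository.

```python
import shutil
import tarfile
from pathlib import Path

ARCHIVE_NAME = "total_word_feature_extractor.dat.tar.gz"
MODEL_NAME = "total_word_feature_extractor.dat"


def extract_feature_extractor(archive: Path, data_dir: Path) -> Path:
    """Copy the MITIE feature extractor out of the archive into data_dir."""
    data_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Find the model file anywhere in the archive (assumed layout is unknown).
        member = next(
            (m for m in tar.getmembers()
             if m.isfile() and Path(m.name).name == MODEL_NAME),
            None,
        )
        if member is None:
            raise FileNotFoundError(f"{MODEL_NAME} not found in {archive}")
        source = tar.extractfile(member)
        target = data_dir / MODEL_NAME
        # Stream the data so the large model file is never held in memory at once.
        with source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
    return target
```

For example, `extract_feature_extractor(Path(ARCHIVE_NAME), Path("data"))` writes data/total_word_feature_extractor.dat and returns that path. Copying only the one matching member, rather than calling `extractall`, also avoids writing unexpected paths from the archive to disk.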

pipeline

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"

Training script

trainer/MITIE+jieba.bash

Evaluation script

cross_validation/MITIE+jieba.bash
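Judging by their location under cross_validation/, the evaluation scripts measure performance with cross-validation. The exact procedure lives in cross_validation.py and is not reproduced here; as a generic illustration, this sketch shows how k-fold splitting partitions the examples so that each one is used for testing exactly once. The function name `k_fold_splits` and the fixed seed are illustrative choices.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(examples: Sequence[T], k: int,
                  seed: int = 0) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs for k-fold cross-validation.

    The examples are shuffled once with a fixed seed and divided into k
    nearly equal folds; each fold serves as the test set exactly once.
    """
    if not 2 <= k <= len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for test_idx in folds:
        test_set = set(test_idx)
        train = [examples[j] for j in indices if j not in test_set]
        test = [examples[j] for j in test_idx]
        yield train, test
```

A model is then trained on each `train` list and scored on the matching `test` list, and the k scores are averaged.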

tensorflow_embedding

Description

jieba provides Chinese word segmentation
tensorflow_embedding handles intent classification
MITIE handles slot filling

Install dependencies

pip install git+https://github.com/mit-nlp/MITIE.git
pip install jieba
pip install tensorflow

Download the required model data

MITIE needs a model file: download total_word_feature_extractor.dat.tar.gz from the releases of my other project, MITIE_Chinese_Wikipedia_corpus, extract it, and put total_word_feature_extractor.dat into the data directory.

pipeline

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor.dat"
- name: "tokenizer_jieba"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
- name: "ner_mitie"
- name: "ner_synonyms"
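In this pipeline, intent_featurizer_count_vectors turns each tokenized message into a bag-of-words count vector, which intent_classifier_tensorflow_embedding then uses to classify the intent. RASA NLU's component is built on scikit-learn's CountVectorizer; the standard-library sketch below only illustrates the idea on already segmented tokens, and its function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(corpus: List[List[str]]) -> Dict[str, int]:
    """Map each distinct token in the training corpus to a column index (sorted order)."""
    tokens = sorted({tok for doc in corpus for tok in doc})
    return {tok: idx for idx, tok in enumerate(tokens)}


def count_vector(doc: List[str], vocab: Dict[str, int]) -> List[int]:
    """Count occurrences of known tokens; tokens outside the vocabulary are dropped."""
    vec = [0] * len(vocab)
    for tok, n in Counter(doc).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec
```

With the segmented corpus [["明天", "天气", "怎么样"], ["上海", "天气"]], the vocabulary is ordered 上海, 天气, 怎么样, 明天 by code point, so `count_vector(["天气", "天气", "好"], vocab)` gives [0, 2, 0, 0]; the unseen token 好 is ignored.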

Training script

trainer/tensorflow_embedding.bash

Evaluation script

cross_validation/tensorflow_embedding.bash

spacy

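Description

Chinese_models_for_SpaCy provides the Chinese spaCy model, which supplies tokenization and features for intent classification (intent_classifier_sklearn) and slot filling (ner_crf)

Install dependencies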

pip install https://github.com/howl-anderson/Chinese_models_for_SpaCy/releases/download/v2.0.3/zh_core_web_sm-2.0.3.tar.gz
./spacy_model_link.bash

pipeline

language: "zh"

pipeline:
- name: "nlp_spacy"
  model: "zh"
- name: "tokenizer_spacy"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_spacy"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_classifier_sklearn"

Training script

trainer/spacy.bash

Evaluation script

cross_validation/spacy.bash

Performance evaluation

DialogFlow > weather

	Intent						Entity
	train			test			train			test
No	ACC	F1	PRC	ACC	F1	PRC	ACC	F1	PRC	ACC	F1	PRC
1	0.986	0.986	0.986	0.665	0.631	0.648	0.987	0.987	0.988	0.967	0.968	0.973
2	0.990	0.990	0.990	0.434	0.406	0.432	0.987	0.987	0.988	0.968	0.970	0.975
3	0.992	0.992	0.992	0.657	0.598	0.587	0.987	0.987	0.988	0.939	0.934	0.947
ACC: accuracy; F1: F1 score; PRC: precision. No: pipeline number, as listed in the Model List below.
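For reference, ACC, F1 and PRC can be computed from gold and predicted labels as sketched below. This standard-library version assumes per-class precision and F1 are averaged with weights proportional to each class's frequency in the gold labels (the same scheme as scikit-learn's `average="weighted"`); whether the numbers in the table above were produced exactly this way is an assumption.

```python
from collections import Counter
from typing import Dict, Sequence


def evaluate(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Return accuracy, support-weighted precision and support-weighted F1."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    support = Counter(gold)      # how often each label appears in the gold data
    predicted = Counter(pred)    # how often each label was predicted
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    precision = f1 = 0.0
    for label, n_gold in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / n_gold
        f = 2 * p * r / (p + r) if p + r else 0.0
        # Weight each class by its share of the gold labels.
        weight = n_gold / len(gold)
        precision += weight * p
        f1 += weight * f
    return {"ACC": accuracy, "F1": f1, "PRC": precision}
```

For gold labels [weather, weather, greet, greet] and predictions [weather, greet, greet, greet], this gives ACC = 0.75, PRC = 5/6 ≈ 0.833 and F1 = 11/15 ≈ 0.733.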

Model List

No	Pipeline	Configure
1	MITIE+jieba	Uses `total_word_feature_extractor.dat` from the `MITIE_Chinese_Wikipedia_corpus` project
2	tensorflow_embedding	Uses `total_word_feature_extractor.dat` from the `MITIE_Chinese_Wikipedia_corpus` project
3	spacy	Uses the Chinese spaCy model from the `Chinese_models_for_SpaCy` project

Contributing

Please read CONTRIBUTING.md, then submit pull requests to us.

Versioning

We use SemVer for versioning. See the tags for all available versions.

Authors

Xiaoquan Kong - Initial work - howl-anderson

For more information about contributors, see contributors.

License

MIT License - see LICENSE.md for details

Acknowledgments

TODO
