
Chinese Open Information Extraction (Tree-based Triple Relation Extraction Module)


tim5go/zhopenie


Chinese Open Information Extraction (Zhopenie)

Installation

This module makes heavy use of pyltp.

  1. Install pyltp:
    pip install pyltp

  2. Download the LTP NLP models from Baidu Cloud (百度雲).
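
Before running the extractor, it can help to confirm that the downloaded models are where you expect them. The following is a minimal sketch, not part of this module: the helper `missing_models` is hypothetical, and the file names `cws.model`, `pos.model` and `parser.model` are assumptions based on the usual LTP model bundle, so adjust them to match your download.

```python
from pathlib import Path

# Assumed file names (segmentation, POS tagging, dependency parsing).
# Check them against the contents of the model archive you downloaded.
EXPECTED_MODELS = ("cws.model", "pos.model", "parser.model")


def missing_models(model_dir, expected=EXPECTED_MODELS):
    """Return the expected model file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_models("ltp_data")` returns an empty list when every expected file is present; the directory name `ltp_data` is only an example.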

Why use LTP?

LTP has an excellent semantic parsing module. [Image: LTP semantic parsing example]

In general, LTP also performs better than other open-source Chinese NLP libraries such as Jieba. Here is a comparison of word tokenization on the SIGHAN Bakeoff 2005 PKU dataset (510 KB): [Image: tokenization comparison]
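
Word tokenization is usually scored by word-level precision, recall and F1 against a gold segmentation. The sketch below shows one way to compute these scores; it is not the script behind the comparison above, the function names are hypothetical, and the `gold` and `predicted` lists are made-up illustrations rather than real LTP or Jieba output.

```python
def to_spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    spans, start = set(), 0
    for tok in tokens:
        spans.add((start, start + len(tok)))
        start += len(tok)
    return spans


def segmentation_prf(gold, predicted):
    """Word-level precision, recall and F1 of predicted against gold."""
    gold_spans, pred_spans = to_spans(gold), to_spans(predicted)
    # A predicted word counts as correct only if both boundaries match.
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: the prediction wrongly splits one gold word in two.
gold = ["业务", "遍及", "18", "个", "市场"]
predicted = ["业务", "遍", "及", "18", "个", "市场"]
precision, recall, f1 = segmentation_prf(gold, predicted)
```

Comparing character spans rather than token strings means a word is credited only when it appears at the same position in both segmentations.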

Usage

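The extractor module tries to break a Chinese sentence down into triple relations (e1, e2, r), where e1 and e2 are entities and r is the relation between them, a form that a computer can process directly.
For example, the sentence 星展集团是亚洲最大的金融服务集团之一, 拥有约3千5百亿美元资产和超过280间分行, 业务遍及18个市场。 ("DBS Group is one of the largest financial services groups in Asia, with about US$350 billion in assets and more than 280 branches, and operations in 18 markets.")
is parsed as follows: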

e1:星展集团, e2:亚洲最大的金融服务集团之一, r:是
e1:星展集团, e2:约3千5百亿美元资产, r:拥有
e1:业务, e2:18个市场, r:遍及
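
To illustrate the general idea of tree-based triple extraction, here is a minimal sketch that walks a dependency parse and pairs each verb's subject (SBV) with its object (VOB). It is not this module's actual algorithm: the function `extract_triples` is hypothetical, and the `arcs` below are a hand-written parse of a shortened version of the example sentence, not real LTP output. The relation labels (SBV, VOB, ATT, COO, WP and so on) follow LTP's dependency tag set.

```python
from collections import defaultdict


def extract_triples(words, arcs):
    """words: tokens; arcs: (head, relation) per token, head is 1-based, 0 = root."""
    children = defaultdict(list)
    for idx, (head, rel) in enumerate(arcs, start=1):
        children[head].append((idx, rel))

    def child_with(idx, wanted):
        for child, rel in children[idx]:
            if rel == wanted:
                return child
        return None

    def phrase(idx):
        # Join the whole subtree under idx in sentence order, skipping punctuation.
        stack, seen = [idx], []
        while stack:
            node = stack.pop()
            seen.append(node)
            stack.extend(c for c, rel in children[node] if rel != "WP")
        return "".join(words[i - 1] for i in sorted(seen))

    def subject_of(idx):
        # A coordinated verb (COO) without its own subject borrows its head's subject.
        while idx:
            subj = child_with(idx, "SBV")
            if subj is not None:
                return subj
            head, rel = arcs[idx - 1]
            idx = head if rel == "COO" else 0
        return None

    triples = []
    for idx in range(1, len(words) + 1):
        obj, subj = child_with(idx, "VOB"), subject_of(idx)
        if obj is not None and subj is not None:
            triples.append((phrase(subj), phrase(obj), words[idx - 1]))
    return triples


# Hand-written, illustrative parse (an assumption, not LTP output).
words = ["星展集团", "是", "亚洲", "最大", "的", "金融", "服务", "集团", "之一",
         ",", "拥有", "约", "3千5百亿", "美元", "资产"]
arcs = [(2, "SBV"), (0, "HED"), (4, "ATT"), (8, "ATT"), (4, "RAD"),
        (7, "ATT"), (8, "ATT"), (9, "ATT"), (2, "VOB"), (2, "WP"),
        (2, "COO"), (13, "ADV"), (14, "ATT"), (15, "ATT"), (11, "VOB")]

for e1, e2, r in extract_triples(words, arcs):
    print(f"e1:{e1}, e2:{e2}, r:{r}")
```

With this hand-written parse, the second verb 拥有 has no subject of its own, so the sketch reuses the subject of the verb it is coordinated with.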

However, this extractor is only about 70% accurate and is still being improved. Feel free to comment or open a pull request.

Credits

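LTP (Language Technology Platform, 语言技术平台), developed by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology (哈工大社会计算与信息检索研究中心)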
