Skip to content

hetao29/scws

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

说明

基于scws的golang的开源的中文分词系统,特点:

  1. 支持编译安装,纯golang代码,无需要第3方的静态和动态库(和scws已经混合编码)
  2. docker / docker swarm / k8s 部署
  3. 支持多个自定义词库,格式可以和txt也可以是xdb格式(参考 scws 的词典生成工具scws-gen-dict)
  4. 支持规则文件
  5. etc/dict下面可以把百度输入法的词库转换成txt分词词库的工具
  6. rest api接口进行分词,也支持实时reload词库操作 具体可以参考etc下的配置文件,或者Makefile的示例

Docker & Swarm/Composer

docker pull hetao29/scws:latest

编译

make build

运行

make start

自定义字典(格式)

  1. <词> <词频(TF)> <词重(IDF)> <词性(北大标注)>
  2. 用空格或者制表符分开,tf,idf,attr不是必须
  3. 词性参考 http://www.xunsearch.com/scws/docs.php#attr
#<word>[\t<tf>[\t<idf>[\t<attr>]]]
新词
新地 5 1 n

测试

make test

结果

{"message":"pong","words":["关于","幼教","体系","组织","结构调整","结构","调整","","","通知"]}curl "http://127.0.0.1:8020/words?key=外国钱币硬币银铌世界纸钞爱藏"
{"message":"pong","words":["外国","钱币","硬币","","","世界","纸钞","",""]}curl "http://127.0.0.1:8020/words?key=矮人火枪地狱兽残酷角斗士的军刺"
{"message":"pong","words":["矮人","火枪","地狱兽","地狱","残酷","角斗士","角斗","","",""]}

About

golang中文分词服务(基于scws的c语言分词算法),提供http和docker服务

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published