Commit

update
wainshine committed Jul 27, 2019
1 parent e9b9866 commit e4c4986
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions README.md
@@ -15,13 +15,15 @@

Added gender labels.

---

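<strong>Ancient Chinese names (Ancient_Names_Corpus)</strong>

250,000 entries.

Compiled from multiple name dictionaries. Names with rare surnames or rare characters have been removed.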

---

<strong>Japanese names (Japanese_Names_Corpus)</strong>

@@ -31,6 +33,7 @@

For details of the data-cleaning process, see "[Japanese name data cleaning notes](https://github.com/wainshine/Chinese-Names-Corpus/issues/4)".

---

<strong>Translated names (English_Cn_Name_Corpus)</strong>

@@ -40,13 +43,15 @@

A small number of bad cases remain after cleaning, especially English place names.

---

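<strong>Chinese family names (Chinese_Family_Name)</strong>

1,000 entries.

Extracted from a name corpus on the scale of hundreds of millions of entries. Rare surnames have been removed.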

---

<strong>Chinese forms of address (Chinese_Relationship)</strong>

@@ -59,6 +64,7 @@

Compiled from multiple name dictionaries. A large number of bad cases remain after cleaning.

---

<strong>Chinese idiom dictionary (ChengYu_Corpus)</strong>
