Chinese Constituency Parsing

Background

Given a sentence, constituency parsing produces a parse tree whose internal nodes are constituents and whose leaf nodes are words.

Example

Input:

柴犬是一种像精灵一样的犬种。

Output:

(IP (NP-SBJ (NN 柴犬)) (VP (VC 是) (NP-PRD (QP (CD 一) (CLP (M 种))) (DVP (IP (VP (PP (P 像) (NP (NN 精灵))) (VP (VA 一样)))) (DEV 的)) (VP (VA 犬种)))) (PU 。))
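The bracketed output above can be read into a nested structure with a few lines of code. The following is a minimal sketch, not part of any particular parser; the function names `parse_tree` and `leaves` are hypothetical and chosen only for illustration. It assumes one well-formed tree per string and that words contain no parentheses or whitespace.

```python
def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples.

    A leaf child is a plain string (the word); an internal child is
    another (label, children) tuple.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    tree = read()
    assert pos == len(tokens), "trailing tokens after the tree"
    return tree


def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


example = (
    "(IP (NP-SBJ (NN 柴犬)) (VP (VC 是) (NP-PRD (QP (CD 一) (CLP (M 种))) "
    "(DVP (IP (VP (PP (P 像) (NP (NN 精灵))) (VP (VA 一样)))) (DEV 的)) "
    "(VP (VA 犬种)))) (PU 。))"
)
tree = parse_tree(example)
print(tree[0])                 # root label
print("".join(leaves(tree)))   # words joined back into the sentence
```

Joining the leaves recovers the input sentence, which is a quick sanity check that a predicted tree covers exactly the words it was given.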

Standard Metrics

  • Exact match (EM): the percentage of predicted parse trees that match the ground truth exactly.
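  • F1 score: the harmonic mean of labeled precision and labeled recall over constituents.
  • Labeled precision (LP): the percentage of constituents in the predicted parse tree that also appear, with the same label and span, in the ground truth.
  • Labeled recall (LR): the percentage of ground-truth constituents that also appear, with the same label and span, in the predicted parse tree.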
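To make these definitions concrete, the sketch below scores a single predicted tree against its gold tree. It is an illustration under stated assumptions, not a substitute for a standard scorer: the helper names `labeled_spans` and `score` are hypothetical, part-of-speech nodes are not counted as constituents, no label or punctuation normalization is applied, and exact match is taken to mean identical sets of labeled spans. The two trees are made-up toy examples. Scores are returned as fractions rather than percentages, and reported results are normally aggregated over the whole test set rather than computed per sentence.

```python
from collections import Counter


def labeled_spans(bracketed):
    """Collect (label, start, end) for every constituent above the POS level."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    spans = Counter()
    stack = []      # open nodes as [label, start_index, has_subtree_child]
    n_words = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if stack:
                stack[-1][2] = True   # the parent has a subtree as a child
            stack.append([tokens[i + 1], n_words, False])
            i += 2                    # skip '(' and the label
        elif tok == ")":
            label, start, has_subtree = stack.pop()
            if has_subtree:           # skip POS nodes such as (NN 柴犬)
                spans[(label, start, n_words)] += 1
            i += 1
        else:
            n_words += 1              # a word
            i += 1
    return spans


def score(gold, pred):
    """Return (LP, LR, F1, exact_match) for one sentence."""
    g, p = labeled_spans(gold), labeled_spans(pred)
    matched = sum((g & p).values())   # multiset intersection
    lp = matched / sum(p.values()) if p else 0.0
    lr = matched / sum(g.values()) if g else 0.0
    f1 = 2 * lp * lr / (lp + lr) if lp + lr else 0.0
    return lp, lr, f1, g == p


# Toy trees for illustration only; they differ in one constituent label.
gold = "(IP (NP (NN 柴犬)) (VP (VC 是) (NP (NN 犬种))))"
pred = "(IP (NP (NN 柴犬)) (VP (VC 是) (VP (VA 犬种))))"
lp, lr, f1, em = score(gold, pred)
print(lp, lr, f1, em)
```

Here three of the four predicted constituents match the gold tree, so LP, LR, and F1 all come out at 0.75 and the exact-match flag is false.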

Chinese Treebank Datasets

| Dataset | # sentences (train) | # sentences (dev) | # sentences (test) |
| --- | --- | --- | --- |
| CTB 5.1 | 17,544 | 352 | 348 |

Metrics

EM, F1, LP, and LR can be calculated using the EVALB tool.

Results

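| System | EM | F1 | LP | LR | Code |
| --- | --- | --- | --- | --- | --- |
| Liu and Zhang (2017) | 44.94 | 91.81 | - | - | GitHub |
| Zhou and Zhao (2019) | - | 92.18 | 92.33 | 92.03 | GitHub |
| Mrini et al. (2020) | - | 92.64 | 93.45 | 91.85 | GitHub |
| Yang and Deng (2020) | 49.72 | 93.59 | 93.80 | 93.40 | GitHub |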

Suggestions? Changes? Please send email to chinesenlp.xyz@gmail.com