thu-coai/CDConv

CDConv

Data and codes for EMNLP 2022 paper "CDConv: A Benchmark for Contradiction Detection in Chinese Conversations"

If you use our code or your research is related to our paper, please cite it:

@inproceedings{zheng-etal-2022-cdconv,
  title={CDConv: A Benchmark for Contradiction Detection in Chinese Conversations},
  author={Zheng, Chujie  and 
    Zhou, Jinfeng  and 
    Zheng, Yinhe  and 
    Peng, Libiao  and 
    Guo, Zhen  and 
    Wu, Wenquan  and 
    Niu, Zhengyu  and 
    Wu, Hua  and 
    Huang, Minlie},
  booktitle={EMNLP},
  year={2022}
}

We also provide an out-of-the-box PyTorch classifier (2-class) fine-tuned on CDConv, hosted on Hugging Face; check here. Note that the PyTorch version may not perform as well as the PaddlePaddle version in this repo.

Data

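Each line of cdconv.txt is one conversation session. The fields are:

  • u1, b1, u2, b2: the conversation between the user and the bot (two utterances each, speaking in alternation)

  • file: the annotation batch; there are 5 batches in total

  • model: the model used as the bot (eva or plato)

    • eva: the EVA 2.0 model (an encoder-decoder model with 24 layers each and 2.8B parameters in total; project page: https://github.com/thu-coai/EVA/)
    • plato: the 32-layer version of the model (1.6B parameters in total)
  • method: how u2 was constructed:

    • 短句 (short): u2 is a short, uninformative utterance
    • 设问-bot (inquiring, bot): u2 asks about entity information in b1
    • 设问-user(-v2) (inquiring, user): u2 asks about entity information in u1
    • 同义-回译 (paraphrase by back-translation): u1 is translated into English and then back into Chinese
    • 同义-同义词 (synonym): words in u1 are replaced with synonyms
    • 反义-反义词 (antonym): words in u1 are replaced with antonyms
    • 反义-否定词 (negation): negation words are inserted into u1
  • label: the annotated contradiction type (0: no contradiction, 1: intra-sentence contradiction in b2, 2: role confusion in b2, 3: b2 contradicts the conversation history)

    • persona: for history contradictions, the contradicted content annotated from a persona perspective (1: attributes, 2: opinions and preferences, 3: experiences, 0: other)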
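The field list above can be turned into a small loader. The following is a minimal sketch, not the repository's own loading code. It ASSUMES that each line of cdconv.txt is a JSON object keyed by the field names above; the README does not state the on-disk format, so adjust `parse_session` if the file is delimited differently. The function names, the English label names, and the sample record are hypothetical and for illustration only.

```python
import json

# English names for the label ids described in the field list above.
LABEL_NAMES = {
    0: "no contradiction",
    1: "intra-sentence contradiction",
    2: "role confusion",
    3: "history contradiction",
}


def parse_session(line):
    """Parse one line of cdconv.txt.

    ASSUMPTION: the line is a JSON object with the keys
    u1, b1, u2, b2, model, method, label.
    """
    record = json.loads(line)
    label = int(record["label"])
    return {
        "turns": [record["u1"], record["b1"], record["u2"], record["b2"]],
        "model": record["model"],
        "method": record["method"],
        "label": label,
        "label_name": LABEL_NAMES[label],
        # Binary view: any non-zero label is a contradiction.
        "is_contradiction": label != 0,
    }


def load_sessions(path="cdconv.txt"):
    """Read every non-empty line of the file as one session."""
    with open(path, encoding="utf-8") as f:
        return [parse_session(line) for line in f if line.strip()]


# Made-up record (placeholder text, not taken from the dataset).
sample_line = json.dumps(
    {
        "u1": "user turn 1",
        "b1": "bot turn 1",
        "u2": "user turn 2",
        "b2": "bot turn 2",
        "model": "eva",
        "method": "短句",
        "label": 3,
    },
    ensure_ascii=False,
)
parsed = parse_session(sample_line)
print(parsed["label_name"], parsed["is_contradiction"])
```

Collapsing the four label values into `is_contradiction` is one way to obtain a two-class target from the four-class annotation; whether this matches the two classes of the released classifier is not stated here.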

Dataset statistics:

| | EVA | PLATO | Total |
|---|---|---|---|
| # Conversations | 5,458 | 6,202 | 11,660 |
| # Positive | 3,233 | 4,076 | 7,309 |
| # Negative | 2,225 | 2,126 | 4,351 |
| Trigger Methods (Positive / Negative Samples) | | | |
| # Short | 429 / 91 | 692 / 304 | 1,121 / 395 |
| # Inquiring (Bot) | 764 / 577 | 845 / 406 | 1,609 / 983 |
| # Inquiring (User) | 127 / 116 | 131 / 106 | 258 / 222 |
| # Inquiring (User-M) | 251 / 552 | 477 / 541 | 728 / 1,093 |
| # Paraphrasing | 962 / 448 | 846 / 389 | 1,808 / 837 |
| # Synonym | 288 / 145 | 376 / 147 | 664 / 292 |
| # Antonym | 185 / 143 | 319 / 103 | 504 / 246 |
| # Negative | 227 / 153 | 390 / 130 | 617 / 283 |
| Contradiction Categories (of Negative Samples) | | | |
| Intra-sentence | 17.3% | 6.8% | 12.2% |
| Role | 5.8% | 29.9% | 17.6% |
| History | 76.9% | 63.3% | 70.2% |
| Persona Labels (of History Contradiction) | | | |
| Attributes | 48.8% | 46.2% | 47.7% |
| Opinions | 22.2% | 20.7% | 21.5% |
| Experiences | 26.3% | 31.5% | 28.6% |
| Unrelated | 2.7% | 1.6% | 2.2% |
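As a quick sanity check on the statistics above, the per-method counts should add up to the "# Positive" and "# Negative" totals for each model. The sketch below hard-codes the numbers from the statistics; the variable and function names are illustrative and not part of the repository.

```python
# (positive, negative) counts per trigger method, copied from the statistics above.
TRIGGER_COUNTS = {
    "EVA": {
        "Short": (429, 91),
        "Inquiring (Bot)": (764, 577),
        "Inquiring (User)": (127, 116),
        "Inquiring (User-M)": (251, 552),
        "Paraphrasing": (962, 448),
        "Synonym": (288, 145),
        "Antonym": (185, 143),
        "Negative": (227, 153),
    },
    "PLATO": {
        "Short": (692, 304),
        "Inquiring (Bot)": (845, 406),
        "Inquiring (User)": (131, 106),
        "Inquiring (User-M)": (477, 541),
        "Paraphrasing": (846, 389),
        "Synonym": (376, 147),
        "Antonym": (319, 103),
        "Negative": (390, 130),
    },
}

# Reported (# Positive, # Negative) totals per model.
REPORTED_TOTALS = {"EVA": (3233, 2225), "PLATO": (4076, 2126)}


def column_sums(counts):
    """Sum the positive and negative counts over all trigger methods."""
    positive = sum(pos for pos, _ in counts.values())
    negative = sum(neg for _, neg in counts.values())
    return positive, negative


for model, counts in TRIGGER_COUNTS.items():
    positive, negative = column_sums(counts)
    assert (positive, negative) == REPORTED_TOTALS[model]
    print(model, positive, negative, positive + negative)
# EVA 3233 2225 5458
# PLATO 4076 2126 6202
```

The final column of each printed line matches the "# Conversations" row, and the two models together give 11,660 conversations.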

Codes

See the codes folder.
