COPEN

Dataset and code for the EMNLP 2022 paper "COPEN: Probing Conceptual Knowledge in Pre-trained Language Models". COPEN is a COnceptual knowledge Probing bENchmark that aims to analyze the conceptual understanding capabilities of Pre-trained Language Models (PLMs). Specifically, COPEN consists of three tasks:

Conceptual Similarity Judgment (CSJ). Given a query entity and several candidate entities, PLMs need to select the candidate entity that is most conceptually similar to the query entity.
Conceptual Property Judgment (CPJ). Given a statement describing a property of a concept, PLMs need to judge whether the statement is true.
Conceptualization in Contexts (CiC). Given a sentence, an entity mentioned in the sentence, and several concept chains of the entity, PLMs need to select the most appropriate concept according to the context for the entity.

Extensive experiments on different sizes and types of PLMs show that existing PLMs systematically lack conceptual knowledge and suffer from various spurious correlations. We believe this is a critical bottleneck for realizing human-like cognition in PLMs. More concept-aware objectives or architectures are needed to develop conceptually knowledgeable PLMs.
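
To make the task format concrete, here is a minimal sketch that frames CSJ as multiple choice: score every candidate against the query and pick the highest-scoring one. The `CSJInstance` fields, the toy entities, and `toy_score` are hypothetical illustrations, not the benchmark's actual data schema or any model from the paper. A real probe would replace `toy_score` with a PLM-based score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CSJInstance:
    """Hypothetical container for one CSJ example (not the real COPEN schema)."""
    query: str
    candidates: List[str]
    answer: int  # index of the conceptually most similar candidate


def predict(instance: CSJInstance, score: Callable[[str, str], float]) -> int:
    """Return the index of the candidate with the highest score against the query."""
    scores = [score(instance.query, c) for c in instance.candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(instances: List[CSJInstance], score: Callable[[str, str], float]) -> float:
    """Fraction of instances whose predicted index equals the gold index."""
    correct = sum(predict(x, score) == x.answer for x in instances)
    return correct / len(instances)


# Toy stand-in for a PLM scorer: a hand-written concept lookup (illustration only).
TOY_CONCEPTS = {"Paris": "city", "Berlin": "city", "Danube": "river", "Mozart": "person"}


def toy_score(query: str, candidate: str) -> float:
    """Score 1.0 when both entities are known and share a concept, else 0.0."""
    concept = TOY_CONCEPTS.get(query)
    return 1.0 if concept is not None and concept == TOY_CONCEPTS.get(candidate) else 0.0


example = CSJInstance(query="Paris", candidates=["Danube", "Berlin", "Mozart"], answer=1)
```

Calling `predict(example, toy_score)` selects "Berlin" here because it is the only candidate that shares the query's concept in the toy lookup.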

CodaLab

To get results on the test set, you need to submit your results to CodaLab.

1. Quick Start

The code repository is based on PyTorch and Transformers. Use the following command to install all the necessary dependencies.

pip install -r requirements.txt

2. Download Datasets

The COPEN benchmark is hosted on Tsinghua Cloud. Use the following commands to download the datasets and place them in the proper paths.

cd data/
# Quote the URL so the shell does not treat "?" as a glob character.
wget --content-disposition "https://cloud.tsinghua.edu.cn/f/f0b33fb429fa4575aa7f/?dl=1"
unzip copen_data.zip
# -p avoids an error if a directory already exists.
mkdir -p task1/data
mkdir -p task2/data
mkdir -p task3/data
mv copen_data/task1/* task1/data
mv copen_data/task2/* task2/data
mv copen_data/task3/* task3/data
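
Before pre-processing, it can help to confirm that the files landed where the commands above put them. The following is a small sketch, not part of the repository: `missing_task_dirs` is a hypothetical helper that only checks that `task1/data`, `task2/data`, and `task3/data` exist and are non-empty under a given root. It does not validate file names or contents, since those are not listed here.

```python
from pathlib import Path
from typing import List, Sequence


def missing_task_dirs(data_root: str,
                      tasks: Sequence[str] = ("task1", "task2", "task3")) -> List[str]:
    """Return the <task>/data directories that are absent or empty under data_root."""
    missing = []
    for task in tasks:
        task_dir = Path(data_root) / task / "data"
        # A directory counts as populated if it exists and has at least one entry.
        if not task_dir.is_dir() or not any(task_dir.iterdir()):
            missing.append(str(task_dir))
    return missing
```

Run from the repository root, `missing_task_dirs("data")` should return an empty list once the download and `mv` steps have succeeded.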

3. Pre-processing Datasets

Probing

cd task1
python probing_data_processor.py
cd ../
cd task2
python probing_data_processor.py
cd ../
cd task3
python probing_data_processor.py
cd ../
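
The same three steps can be driven from Python without changing directories by hand. This is a sketch under one assumption: `base_dir` is whichever directory contains `task1`, `task2`, and `task3` with their `probing_data_processor.py` scripts, which the commands above do not name explicitly. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

Job = Tuple[Path, List[str]]


def probing_preprocess_jobs(base_dir: str) -> List[Job]:
    """Build one (working directory, argv) pair per task, mirroring `cd taskN; python ...`."""
    return [
        (Path(base_dir) / task, [sys.executable, "probing_data_processor.py"])
        for task in ("task1", "task2", "task3")
    ]


def run_jobs(jobs: List[Job]) -> None:
    """Run each job in its own working directory and stop at the first failure."""
    for cwd, argv in jobs:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, `run_jobs(probing_preprocess_jobs("."))` runs the three processors in order from the current directory and raises as soon as one of them fails, which the plain `cd`/`python` sequence does not do.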

Fine-tuning

python processor_utils.py task1 mc 
python processor_utils.py task2 sc
python processor_utils.py task3 mc

4. Run

Probing

cd code/probing
bash task1/run.sh 0 bert bert-base-uncased
bash task2/run.sh 0 bert bert-base-uncased
bash task3/run.sh 0 bert bert-base-uncased

Fine-tuning

cd code/finetuning
cd task1/
bash ../run.sh 0 bert bert-base-uncased task1 mc 42
cd ../task2/
bash ../run.sh 0 bert bert-base-uncased task2 sc 42
cd ../task3/
bash ../run.sh 0 bert bert-base-uncased task3 mc 42
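
To repeat fine-tuning over several seeds, the command lines above can be generated rather than typed. The sketch below only builds argument lists. The parameter names (`device`, `model_type`, `model_name`, `seed`) are guesses at what the positional arguments of `run.sh` mean, inferred from the example values and not confirmed by the script. The task-to-formulation pairing (`mc`, `sc`, `mc`) is copied from the commands above.

```python
from typing import Dict, List

# Task-to-formulation pairing copied from the fine-tuning commands.
FORMULATION: Dict[str, str] = {"task1": "mc", "task2": "sc", "task3": "mc"}


def finetune_argv(task: str, seed: int = 42, device: str = "0",
                  model_type: str = "bert",
                  model_name: str = "bert-base-uncased") -> List[str]:
    """Build the argv for `bash ../run.sh ...`, meant to run inside code/finetuning/<task>/."""
    if task not in FORMULATION:
        raise ValueError(f"unknown task: {task!r}")
    return ["bash", "../run.sh", device, model_type, model_name,
            task, FORMULATION[task], str(seed)]


def sweep(seeds: List[int]) -> List[List[str]]:
    """One command per (task, seed) pair, grouped by task."""
    return [finetune_argv(task, seed) for task in FORMULATION for seed in seeds]
```

`" ".join(finetune_argv("task1"))` reproduces the first command, `bash ../run.sh 0 bert bert-base-uncased task1 mc 42`. Each list can be passed to `subprocess.run` with `cwd` set to the matching task directory.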

5. Cite

If our code or benchmark helps you, please cite us:

@inproceedings{peng2022copen,
  title={COPEN: Probing Conceptual Knowledge in Pre-trained Language Models},
  author={Peng, Hao and Wang, Xiaozhi and Hu, Shengding and Jin, Hailong and Hou, Lei and Li, Juanzi and Liu, Zhiyuan and Liu, Qun},
  booktitle={Proceedings of EMNLP},
  year={2022}
}
