WWW 2021

This is the official repository of the paper:

Mining Dual Emotion for Fake News Detection. [PDF] [Code] [Slides] [Video] [Video explanation in Chinese]

Xueyao Zhang, Juan Cao, Xirong Li, Qiang Sheng, Lei Zhong, and Kai Shu. Proceedings of the 30th Web Conference (WWW 2021).

An Overall Framework

An overall framework of using Dual Emotion Features for fake news detection. Dual Emotion Features consist of three components:

a) Publisher Emotion extracted from the content;

b) Social Emotion extracted from the comments;

c) Emotion Gap representing the similarity and difference between publisher emotion and social emotion.

Dual Emotion Features are concatenated with the features from d) Fake News Detector (here, BiGRU as an example) for the final prediction of veracity.
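The concatenation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the Emotion Gap is approximated here by an element-wise difference, which is a stand-in and not necessarily the formulation used in the paper, and the function names (`emotion_gap`, `dual_emotion_features`, `classifier_input`) are hypothetical and do not come from this repository.

```python
from typing import List


def emotion_gap(publisher: List[float], social: List[float]) -> List[float]:
    """Assumed stand-in for the Emotion Gap: element-wise difference."""
    if len(publisher) != len(social):
        raise ValueError("publisher and social emotion must have the same length")
    return [p - s for p, s in zip(publisher, social)]


def dual_emotion_features(publisher: List[float], social: List[float]) -> List[float]:
    """Concatenate publisher emotion, social emotion, and the emotion gap."""
    return list(publisher) + list(social) + emotion_gap(publisher, social)


def classifier_input(detector_features: List[float],
                     publisher: List[float],
                     social: List[float]) -> List[float]:
    """Append the Dual Emotion Features to the detector's own features."""
    return list(detector_features) + dual_emotion_features(publisher, social)
```

In this sketch, `detector_features` would be the representation produced by the fake news detector (for example, the BiGRU output), and the returned vector is what a final classification layer would consume.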

Datasets

The datasets are available at https://drive.google.com/drive/folders/1pjK0BYiiJt0Ya2nRIrOLCVo-o53sYRBV?usp=sharing. The downloaded datasets (i.e., the dataset folder) need to be moved into the root path of this project.

RumourEval-19

The raw dataset is released by SemEval-2019 Task 7:

Genevieve Gorrell, Ahmet Aker, Kalina Bontcheva, Elena Kochkina, Maria Liakata, Arkaitz Zubiaga, Leon Derczynski (2019). SemEval-2019 Task 7: RumourEval, Determining Rumour Veracity and Support for Rumours. Proceedings of the 13th International Workshop on Semantic Evaluation, ACL.

Our experimental dataset is in the folder dataset/RumourEval-19, which contains three json files. In every json file,

the id is the unique identifier of the post.
the label is the veracity of the post; its value is one of [fake, real, unverified].
the content is the text of the post.
the comments field is the list of user comments on the post.
the content_emotions_labels and content_emotions_probs are the Emotion Category features of the content. The comments100_emotions_labels_mean_pooling, comments100_emotions_labels_max_pooling, comments100_emotions_probs_mean_pooling, and comments100_emotions_probs_max_pooling are the Emotion Category features of the earliest 100 comments. How to use these features is described here.
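To get a quick look at one of these files, a small loader like the following may help. It is a sketch that assumes each json file holds a JSON array of post objects with the fields listed above; the actual top-level layout is not stated here, so check one file before relying on it. The helper names `load_posts` and `label_counts` are hypothetical.

```python
import json
from collections import Counter


def load_posts(path):
    """Read one dataset file; assumed to be a JSON array of post objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def label_counts(posts):
    """Count how many posts carry each veracity label."""
    return Counter(post["label"] for post in posts)
```

Calling `label_counts(load_posts(path))` on a file under dataset/RumourEval-19 would then give the class balance of that split.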

Weibo-16

The original dataset was first proposed in:

Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J Jansen, Kam-Fai Wong, and Meeyoung Cha. 2016. Detecting rumors from microblogs with recurrent neural networks. In IJCAI 2016. 3818–3824.

As described in Section 4.1.2 and Appendix A of our paper, the original dataset contains many duplicated fake news items. The original version of Weibo-16 is in the folder dataset/Weibo-16-original, and our experimental dataset (a deduplicated version) of Weibo-16 is in the folder dataset/Weibo-16. In every json file in these folders,

the label is the veracity of the post; its value is one of [fake, real].
the content is the text of the post.
the comments field is the list of user comments on the post.
the content_emotions are the Emotion Category features of the content. The comments100_emotions_mean_pooling and comments100_emotions_max_pooling are the Emotion Category features of the earliest 100 comments. How to use these features is described here.

Weibo-20

Weibo-20 is our newly proposed dataset, and it is in the folder dataset/Weibo-20. In addition, Section 4.4.3 of the paper reports experiments under a real-world scenario simulation. The temporal version of Weibo-20 used there is in the folder dataset/Weibo-20-temporal. In every json file in these folders,

the label is the veracity of the post; its value is one of [fake, real].
the content is the text of the post.
the comments field is the list of user comments on the post.
the content_emotions are the Emotion Category features of the content. The comments100_emotions_mean_pooling and comments100_emotions_max_pooling are the Emotion Category features of the earliest 100 comments. How to use these features is described here.
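The mean-pooled and max-pooled comment features named above can be illustrated with a short sketch. It assumes that the comments are already in chronological order and that each comment has an emotion vector of the same length; the function name `pool_comment_emotions` is hypothetical, and the stored features in the dataset files were produced by the authors' own pipeline, not by this code.

```python
def pool_comment_emotions(comment_vectors, max_comments=100):
    """Mean- and max-pool the emotion vectors of the earliest comments.

    comment_vectors: list of equal-length lists of floats, one per comment,
    assumed to be sorted from earliest to latest.
    """
    earliest = comment_vectors[:max_comments]
    if not earliest:
        raise ValueError("at least one comment vector is required")
    # Transpose so that each tuple holds one emotion dimension across comments.
    columns = list(zip(*earliest))
    mean_pooled = [sum(column) / len(column) for column in columns]
    max_pooled = [max(column) for column in columns]
    return mean_pooled, max_pooled
```

Both outputs have the same length as a single comment vector, so posts with different numbers of comments end up with fixed-size social emotion features.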

Emotion Resources

| Type | Language | Resources |
| --- | --- | --- |
| Emotion Category | English | https://github.com/NVIDIA/sentiment-discovery |
| Emotion Category | Chinese | https://ai.baidu.com/tech/nlp_apply/emotion_detection |
| Emotion Lexicon | English | `resources/English/NRC` |
| Emotion Lexicon | Chinese | `/resources/Chinese/大连理工大学情感词汇本体库` |
| Emotional Intensity | English | `resources/English/NRC` |
| Emotional Intensity | Chinese | `/resources/Chinese/大连理工大学情感词汇本体库` |
| Sentiment Score | English | nltk.sentiment.vader.SentimentIntensityAnalyzer |
| Sentiment Score | Chinese | `resources/Chinese/BosonNLP` |
| Other Auxiliary Features | English | Wiki: List of emoticons, `resources/English/HowNet`, `resources/English/others` |
| Other Auxiliary Features | Chinese | `resources/Chinese/HowNet`, `resources/English/others` |
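As a rough illustration of how an emotion lexicon can be turned into a feature vector, the sketch below counts lexicon hits per emotion category and normalizes by text length. Everything in it is an assumption made for illustration: the toy lexicon, the category list, and the function name `lexicon_features` are hypothetical, the real NRC and Chinese lexicon files have their own formats, and the actual extraction logic lives in the preprocessing code of this repository.

```python
from collections import Counter

# Hypothetical toy lexicon mapping a word to one emotion category.
TOY_LEXICON = {"happy": "joy", "glad": "joy", "angry": "anger", "afraid": "fear"}
TOY_CATEGORIES = ["anger", "fear", "joy"]


def lexicon_features(tokens, lexicon, categories):
    """Return the fraction of tokens that fall in each emotion category."""
    if not tokens:
        return [0.0] * len(categories)
    hits = Counter(lexicon[token] for token in tokens if token in lexicon)
    return [hits[category] / len(tokens) for category in categories]
```

For Chinese text the tokens would come from a word segmenter rather than whitespace splitting, which is outside the scope of this sketch.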

Code

Requirements

Python==3.6.10
Keras==2.1.2
Tensorflow==1.13.1
Tensorflow-GPU==1.14.0

Usage

Step 1: Preprocess

Step 1.1: Get the `labels`

cd code/preprocess
python output_of_labels.py

Step 1.2: Get the `emotion features`

cd code/preprocess
python input_of_emotions.py

Note that the Emotion Category features depend on external resources (NVIDIA sentiment-discovery for English and Baidu AI for Chinese). They are already saved in the dataset files (e.g., content_emotions, comments100_emotions_mean_pooling, content_emotions_probs, comments100_emotions_labels_max_pooling, etc.).

To extract emotion features for a custom dataset, you need to access these external resources and prepare the Emotion Category features yourself. Alternatively, you can leave Emotion Category unused and extract the other features with input_of_emotions.py.

Step 1.3: Get the `semantic features`

In this repo, the semantic features are word embeddings. Download the pretrained word embeddings (see here for more details) before running the following code:

cd code/preprocess
python input_of_semantics.py

Now, the preprocessed data are stored in preprocess/data.

Step 2: Configuration

Configure the experimental dataset, the model, and other hyperparameters in code/train/config.py.

Step 3: Training and Testing

cd code/train
python master.py

Now, the results are stored in train/results.
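If you prefer to run all of the steps above in one go, a small driver script along these lines could work. It is a convenience sketch and not part of this repository: the names `STEPS`, `build_commands`, and `run_pipeline` are hypothetical, and it assumes the datasets and pretrained word embeddings are already in place and that code/train/config.py has been edited beforehand.

```python
import subprocess
import sys
from pathlib import Path

# (working directory, script) pairs, in the order given in the Usage section.
STEPS = [
    ("code/preprocess", "output_of_labels.py"),
    ("code/preprocess", "input_of_emotions.py"),
    ("code/preprocess", "input_of_semantics.py"),
    ("code/train", "master.py"),
]


def build_commands(repo_root, python=sys.executable):
    """Return (command, working_directory) pairs without running anything."""
    root = Path(repo_root)
    return [([python, script], root / workdir) for workdir, script in STEPS]


def run_pipeline(repo_root):
    """Run each step in its own directory and stop at the first failure."""
    for command, workdir in build_commands(repo_root):
        subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_pipeline(".")` from the project root mirrors the `cd` and `python` commands shown above; `check=True` makes a failing step raise instead of silently continuing to training.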

Citation

@inproceedings{10.1145/3442381.3450004,
    author = {Zhang, Xueyao and Cao, Juan and Li, Xirong and Sheng, Qiang and Zhong, Lei and Shu, Kai},
    title = {Mining Dual Emotion for Fake News Detection},
    year = {2021},
    url = {https://doi.org/10.1145/3442381.3450004},
    doi = {10.1145/3442381.3450004},
    booktitle = {Proceedings of the Web Conference 2021},
    pages = {3465--3476},
    series = {WWW '21}
}
