
comp-journalism/predicting_newsworthiness


Predicting Newsworthiness

This repository contains the data needed to replicate the findings of our article From Crowd Ratings to Predictive Models of Newsworthiness to Support Science Journalism, published in the Proceedings of the ACM on Human-Computer Interaction and presented at CSCW 2022.

train.json

Crowdsourced dataset of ratings for the news values of different arXiv articles (n=500). Used to train the Extra Trees model. Please refer to Section 5.1 of our paper for details about model training with this data. Contains the following fields:

arxiv_id: Unique identifiers for arXiv articles, sourced from arXiv API.
arxiv_url: URLs for arXiv articles, sourced from arXiv API.
title: Titles for arXiv articles, sourced from arXiv API.
summary: Abstracts for arXiv articles, sourced from arXiv API.
published: Date of publication for arXiv articles, sourced from arXiv API.
authors: Authors for arXiv articles, sourced from arXiv API.
arxiv_primary_category: Author-provided primary category for arXiv articles, sourced from arXiv API.
readability: Readability score for the article's summary field, assigned by De-Jargonizer and scaled to the range 0-1.
actuality: Score for the actuality news value, assigned by MTurk crowdworkers, range 1-5.
controversy: Score for the controversy news value, assigned by MTurk crowdworkers, range 1-5.
relevance_magnitude: Score for the relevance_magnitude news value, assigned by MTurk crowdworkers, range 1-5.
relevance_valence: Score for the relevance_valence news value, assigned by MTurk crowdworkers, range 1-5.
newsworthiness_crowd_sum: Average of the four news values (actuality, controversy, relevance_magnitude, relevance_valence), range 1-5. Binarized at a value of 3 to train the newsworthiness classification model.
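
The averaging and binarization described for newsworthiness_crowd_sum can be sketched as follows. This is a minimal illustration and not the code used in the paper: the sample records are hypothetical, the helper names crowd_score and binarize are made up for this example, and treating a score of exactly 3 as newsworthy is an assumption, since this README does not say which side of the boundary is inclusive.

```python
from statistics import mean

# The four crowd-rated news values listed above.
NEWS_VALUES = (
    "actuality",
    "controversy",
    "relevance_magnitude",
    "relevance_valence",
)

def crowd_score(record):
    """Average the four crowd-rated news values for one article record."""
    return mean(record[name] for name in NEWS_VALUES)

def binarize(score, threshold=3.0):
    """Turn a 1-5 score into a 0/1 label.

    Assumption: scores at or above the threshold count as newsworthy (1).
    """
    return 1 if score >= threshold else 0

# Hypothetical records for illustration only; not taken from train.json.
sample = [
    {"arxiv_id": "example-1", "actuality": 4, "controversy": 3,
     "relevance_magnitude": 4, "relevance_valence": 3},
    {"arxiv_id": "example-2", "actuality": 2, "controversy": 1,
     "relevance_magnitude": 3, "relevance_valence": 2},
]

labels = [binarize(crowd_score(record)) for record in sample]
```

If train.json parses to a list of objects keyed by the field names above (an assumption about the file layout), the same helpers could be applied to the output of json.load instead of the sample list.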

validate.json

Crowdsourced dataset of ratings for the news values of different arXiv articles (n=55). Also contains expert evaluations of newsworthiness for these articles. Used to evaluate the Extra Trees model. Please refer to Section 5.2 of our paper for details on the findings.

In addition to the fields found in train.json, this data also contains the following:

nw_expert1: Score for newsworthiness assigned by expert 1, range 1-5.
nw_expert2: Score for newsworthiness assigned by expert 2, range 1-5.
newsworthiness_expert: Average of both experts' ratings for newsworthiness, range 1-5.
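
As a rough sketch of how the expert fields relate to each other and to the crowd score, the snippet below recomputes the expert average from nw_expert1 and nw_expert2 and measures its distance from newsworthiness_crowd_sum. The records are hypothetical and the helper name expert_average is made up for this example. The absolute gap is only an illustrative comparison, not the evaluation procedure from Section 5.2 of the paper.

```python
from statistics import mean

def expert_average(record):
    """Mean of the two expert newsworthiness ratings for one article."""
    return mean([record["nw_expert1"], record["nw_expert2"]])

# Hypothetical records for illustration only; not taken from validate.json.
sample = [
    {"arxiv_id": "example-1", "nw_expert1": 4, "nw_expert2": 3,
     "newsworthiness_crowd_sum": 3.5},
    {"arxiv_id": "example-2", "nw_expert1": 2, "nw_expert2": 2,
     "newsworthiness_crowd_sum": 2.25},
]

# Recompute the expert average for each record.
expert_scores = [expert_average(record) for record in sample]

# Absolute difference between the expert average and the crowd score.
gaps = [
    abs(expert - record["newsworthiness_crowd_sum"])
    for expert, record in zip(expert_scores, sample)
]
```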
