
webis-de/ACL-18


A Stylometric Inquiry into Hyperpartisan And Fake News

This repository contains the code for reproducing results of the paper:

Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. A Stylometric Inquiry into Hyperpartisan and Fake News. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 18), July 2018.

Resources

  • Download the dataset, place the archive in the data directory, and extract it there.
  • Get the required libraries, aitools4-ie-uima.jar and jsoup-1.6.1.jar, from the resources page and place them under lib.
  • Download the TreeTagger binaries that match your operating system and place them in the directory structure detailed below (the directory names must match exactly). Please visit the TreeTagger homepage beforehand to view the license terms (and the instructions for installing on Windows).
    • Linux to lib/thirdparty-tt4j-1.1.0/tree-tagger-Linux-3.2
    • Windows to lib/thirdparty-tt4j-1.1.0/tree-tagger-Win-3.2
    • MacOSX to lib/thirdparty-tt4j-1.1.0/tree-tagger-MacOSX-3.2-intel
  • In all cases, there should be a bin directory directly inside the operating-system-specific directory. Next to this bin directory, create a lib directory and put the English parameter file that you extract from this archive into it as english.par.
  • Get the TeX hyphenation patterns ZIP, place it next to the ACL-18 directory, and extract it there. This should create a directory called thirdparty next to the ACL-18 directory of this project.
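Before building, it can help to confirm that everything landed where the steps above expect it. The following is a minimal sketch that is not part of the repository: the class CheckResources and its methods are hypothetical, it assumes it is run from the ACL-18 directory, and it assumes the extracted dataset provides data/articles (the input directory of the preprocessing command below). The mapping from Java's os.name property to the TreeTagger directory names is likewise an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: reports which of the resources described above are missing. */
public class CheckResources {

    /** Maps a Java os.name value to the TreeTagger directory name required above (assumed mapping). */
    static String treeTaggerDir(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) return "tree-tagger-Win-3.2";
        if (os.contains("mac")) return "tree-tagger-MacOSX-3.2-intel";
        if (os.contains("linux")) return "tree-tagger-Linux-3.2";
        throw new IllegalArgumentException("No TreeTagger directory listed for: " + osName);
    }

    /** Returns the expected paths (relative to the ACL-18 directory) that do not exist. */
    static List<Path> missing(Path projectDir, String osName) {
        Path tt = projectDir.resolve("lib/thirdparty-tt4j-1.1.0").resolve(treeTaggerDir(osName));
        List<Path> expected = List.of(
            projectDir.resolve("data/articles"),          // assumed location of the extracted dataset
            projectDir.resolve("lib/aitools4-ie-uima.jar"),
            projectDir.resolve("lib/jsoup-1.6.1.jar"),
            tt.resolve("bin"),                             // TreeTagger binaries
            tt.resolve("lib/english.par"),                 // English parameter file
            projectDir.resolve("../thirdparty"));          // extracted TeX hyphenation patterns
        List<Path> result = new ArrayList<>();
        for (Path p : expected) {
            if (!Files.exists(p)) result.add(p.normalize());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Path> absent = missing(Paths.get("."), System.getProperty("os.name"));
        if (absent.isEmpty()) {
            System.out.println("All resources found.");
        } else {
            absent.forEach(p -> System.out.println("Missing: " + p));
        }
    }
}
```

Compile and run it from the ACL-18 directory with javac CheckResources.java and java CheckResources (Java 9 or later, because of List.of); it prints every expected path that is not there yet.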

Building

Run ant in this directory. This creates a single JAR file, acl18-bundle.jar, that contains everything you need.

Classification experiments

Split the data into three folds (by portal/publisher) and convert it to UIMA XMI.

java -cp acl18-bundle.jar de.aitools.ie.articles.DataPreprocessor data/articles data/xmi
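Splitting by portal/publisher means that all articles of one publisher end up in the same fold, so no publisher appears on both the training and the test side; the usual motivation for such a grouped split is that a classifier should not be rewarded for recognizing publishers instead of style. The sketch below illustrates one way to build such a split. It is a hypothetical illustration, not the logic of DataPreprocessor: the Article class, the method names, and the greedy balancing strategy (largest publishers first, each assigned to the currently smallest fold) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of a publisher-grouped k-fold split (not the DataPreprocessor code). */
public class PublisherFolds {

    /** Minimal stand-in for an article: its id and the portal/publisher it came from. */
    static final class Article {
        final String id;
        final String publisher;
        Article(String id, String publisher) { this.id = id; this.publisher = publisher; }
    }

    /** Assigns whole publishers to k folds, always filling the currently smallest fold. */
    static List<List<Article>> split(List<Article> articles, int k) {
        // Group by publisher; the TreeMap keeps the result deterministic.
        Map<String, List<Article>> byPublisher = new TreeMap<>();
        for (Article a : articles) {
            byPublisher.computeIfAbsent(a.publisher, p -> new ArrayList<>()).add(a);
        }
        // Place large publishers first so the folds end up roughly balanced.
        List<List<Article>> groups = new ArrayList<>(byPublisher.values());
        groups.sort((x, y) -> Integer.compare(y.size(), x.size()));

        List<List<Article>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (List<Article> group : groups) {
            int smallest = 0;
            for (int f = 1; f < k; f++) {
                if (folds.get(f).size() < folds.get(smallest).size()) smallest = f;
            }
            folds.get(smallest).addAll(group);
        }
        return folds;
    }

    /** Training data for a test fold: all articles from the other folds. */
    static List<Article> trainingFor(List<List<Article>> folds, int testFold) {
        List<Article> training = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) training.addAll(folds.get(f));
        }
        return training;
    }

    public static void main(String[] args) {
        List<Article> articles = List.of(
            new Article("a1", "pubA"), new Article("a2", "pubA"), new Article("a3", "pubA"),
            new Article("b1", "pubB"), new Article("b2", "pubB"),
            new Article("c1", "pubC"), new Article("c2", "pubC"),
            new Article("d1", "pubD"));
        List<List<Article>> folds = split(articles, 3);
        for (int f = 0; f < folds.size(); f++) {
            StringBuilder ids = new StringBuilder();
            for (Article a : folds.get(f)) ids.append(a.id).append(' ');
            System.out.println("fold" + (f + 1) + ": " + ids.toString().trim());
        }
    }
}
```

For the toy data in main, this prints fold1: a1 a2 a3, fold2: b1 b2 d1, and fold3: c1 c2; trainingFor(folds, i) then returns the training side for test fold i.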

Next, extract the features with UIMA and generate WEKA ARFF files for each task. Note that this step extracts all features; the feature set actually used is selected in the next step.

java -cp acl18-bundle.jar de.aitools.ie.articles.FeatureExtractor VERACITY data/xmi data/veracity
java -cp acl18-bundle.jar de.aitools.ie.articles.FeatureExtractor ORIENTATION data/xmi data/orientation
java -cp acl18-bundle.jar de.aitools.ie.articles.FeatureExtractor HYPERPARTISANSHIP data/xmi data/hyperpartisanship
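If you prefer to drive the three extraction runs from one program instead of typing them one by one, a hypothetical sketch like the following rebuilds exactly the command lines shown above. By default it only prints them; only when given the made-up --run argument does it execute them in sequence with ProcessBuilder. The class name and the --run flag are assumptions for illustration, and the sketch assumes it is started from the directory that contains acl18-bundle.jar.

```java
import java.io.IOException;
import java.util.List;
import java.util.Locale;

/** Hypothetical driver for the three FeatureExtractor commands listed above. */
public class ExtractAllTasks {

    static final List<String> TASKS = List.of("VERACITY", "ORIENTATION", "HYPERPARTISANSHIP");

    /** Same command line as above; the output directory is data/ plus the task name in lower case. */
    static List<String> command(String task) {
        return List.of("java", "-cp", "acl18-bundle.jar",
            "de.aitools.ie.articles.FeatureExtractor",
            task, "data/xmi", "data/" + task.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        boolean run = args.length > 0 && args[0].equals("--run");  // hypothetical flag
        for (String task : TASKS) {
            List<String> cmd = command(task);
            if (!run) {
                System.out.println(String.join(" ", cmd));  // dry run: only show the command
                continue;
            }
            // Forward the extractor's output to this console and stop at the first failure.
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IllegalStateException(task + " extraction failed with exit code " + exit);
            }
        }
    }
}
```

Running the tasks sequentially and aborting on the first non-zero exit code keeps a failed extraction from silently leaving one task directory without ARFF files.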

You can then train and test the classifier. The available feature sets are TOPIC, TEXT_STYLE, HYPERTEXT_STYLE, STYLE (= TEXT_STYLE + HYPERTEXT_STYLE), and ALL (= TOPIC + STYLE). The following command trains a classifier on the TOPIC features for the VERACITY task using the training set of the first fold and evaluates it on the test set of the first fold.

java -cp acl18-bundle.jar de.aitools.ie.articles.RandomForestClassifier TOPIC data/veracity/*-fold1-training.arff data/veracity/*-fold1-test.arff
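To cover every feature set on all three folds, the training and test files have to be located for each fold. The sketch below is hypothetical (the class, enum, and method names are assumptions): it encodes the feature-set composition described above in an enum and resolves the shell globs with Files.newDirectoryStream, which accepts the same kind of glob pattern. It assumes that each glob matches exactly one file, and it only prints the resulting commands for one task directory (data/veracity unless another directory is passed as the first argument).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical driver: prints the classifier command for every feature set and fold of one task. */
public class RunAllFolds {

    /** Feature sets named above; STYLE and ALL are unions of the basic groups. */
    enum FeatureSet {
        TOPIC, TEXT_STYLE, HYPERTEXT_STYLE, STYLE, ALL;

        /** Expands a feature set into the basic groups it consists of. */
        Set<FeatureSet> groups() {
            switch (this) {
                case STYLE: return EnumSet.of(TEXT_STYLE, HYPERTEXT_STYLE);
                case ALL:   return EnumSet.of(TOPIC, TEXT_STYLE, HYPERTEXT_STYLE);
                default:    return EnumSet.of(this);
            }
        }
    }

    /** Resolves a glob such as *-fold1-training.arff inside the task directory. */
    static Path findArff(Path taskDir, int fold, String split) throws IOException {
        String glob = "*-fold" + fold + "-" + split + ".arff";
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(taskDir, glob)) {
            for (Path p : stream) matches.add(p);
        }
        if (matches.size() != 1) {  // assumption: exactly one file per fold and split
            throw new IllegalStateException("Expected one match for " + glob + " in " + taskDir
                + ", found " + matches.size());
        }
        return matches.get(0);
    }

    public static void main(String[] args) throws IOException {
        Path taskDir = Paths.get(args.length > 0 ? args[0] : "data/veracity");
        if (!Files.isDirectory(taskDir)) {
            System.err.println("No such task directory: " + taskDir);
            return;
        }
        for (FeatureSet set : FeatureSet.values()) {
            for (int fold = 1; fold <= 3; fold++) {
                System.out.println(String.join(" ", "java", "-cp", "acl18-bundle.jar",
                    "de.aitools.ie.articles.RandomForestClassifier", set.name(),
                    findArff(taskDir, fold, "training").toString(),
                    findArff(taskDir, fold, "test").toString()));
            }
        }
    }
}
```

Failing loudly when a glob matches zero or several files mirrors what would otherwise go unnoticed in the shell, where an unmatched glob is passed through literally and multiple matches become extra arguments.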