Skip to content

joebrew/vilaweb

Repository files navigation

vilaweb: The R package of Joe Brew's VilaWeb data analysis

This package contains all the code and data used for reproducing the analyses underlying the articles published in VilaWeb's "La dada d'en Joe Brew" column. It is publicly available for the purposes of reproducibility and transparency.

Installation

To install this package, run the below from within the R console.

if(!require(devtools)) install.packages("devtools")
install_github('joebrew/vilaweb')

Developer details

In order to reproduce the entire package, raw data will need to be downloaded from various sources.

Downloading raw data

1. Download CIS data

Download into data-raw/cis/fichero_integrado the "Barómetro Mensual - 2000-2018 all data" file from http://analisis.cis.es/fid/fidHistorico.jsp. This requires an account. Following log-in, go to the "Ficheros Integrados de Datos" page. Download the full data file. This will be downloaded as FID_637_06bcee7b-ea6b-4f41-a74e-37a941519966.zip (or similar, depending on date downloaded). Extract the data as is in the folder in which it was downloaded.

For month specific files, download from the CIS page into data-raw/cis/monthly into a folder with the following format: YYYY-MM. Extract the file .The final path of each monthly survey will resemble something like this: 'data-raw/cis/monthly/2018-10/MD3226/3226.sav'

2. Download CEO data

Download into data-raw/ceo the "Matriu de dades fusionada a partir de 2014 (presdencial)" file as 2014_Microdades_anonimitzades_fusio_cine_pres.rar from http://ceo.gencat.cat/ca/barometre/matrius-fusionada-BOP/. Extract the data as is in the folder in which it was downloaded.

3. Download ICPS data

Download into data-raw/icps the "Sondeig d'opinió Catalunya 2018" data from https://www.icps.cat/recerca/sondeigs-i-dades/sondeigs/sondeigs-d-opinio-catalunya. This will require creating an account and password. Download both the 2017 and 2018 data into the 'data' sub-folder.

4. Download sentiment data

Download the following into data-raw/sentiment: http://www.saifmohammad.com/WebDocs/NRC-Emotion-Lexicon-v0.92-InManyLanguages-web.xlsx

Generating R data objects

After downloading the above data, go into the data-raw directory and run from the linux command line: Rscript create_data_files.R.

Twitter data

Many of the analyses in this package are based on data from twitter. Data are retrieved from twitter using the python twint package (install as per instructions at https://github.com/twintproject/twint). Data are stored locally in a Posgresql database. What follows are instructions for building the database so as to reproduce analyses with twitter data.

  1. Create a psql database named twitter.
  2. Within twitter, create a table called twitter.
  3. Go through the code in analyses/set_up_database/set_up_database.R to get the database set up.
  4. Within R, run the update_database() function specifying the accounts via the people argument.
  5. To keep the database updated, run the update_database() function.
  6. Generate periodic data dumps for back-up purposes: pg_dump twitter > backup.sql

Building the package

Having done the above, run Rscript build_package.R from within the main directory to compile the package.

About

Data analysis for VilaWeb

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages