EarTimeWrangler

Tabular Document Wrangler. The code parses data on meetings between ministers and lobbyists from several poorly formatted tabular formats, including pdf and csv files. It forms a central part of 'Ear-time with the Cabinet: Ministerial meetings as vehicles for lobbying', a collaboration between the Department of Sociology, University of Oxford, and Transparency International UK.

Prerequisites

Running wrangler.py requires Python 3.6 or greater and an installation of pdfminer3k. You might consider setting up a virtual environment first. pdfminer3k can be installed with the command pip install pdfminer3k. A full tutorial can be found here.
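To confirm that the environment meets these requirements before running the script, a small check along the following lines may help. This is a minimal sketch and not part of the repository: the function name check_prerequisites is hypothetical, and it assumes that pdfminer3k is importable under the module name pdfminer.

```python
import importlib.util
import sys


def check_prerequisites(module_name="pdfminer", min_version=(3, 6)):
    """Return a list of human-readable problems; an empty list means all clear.

    Assumption: pdfminer3k installs a top-level module called "pdfminer".
    """
    problems = []

    # Compare the running interpreter against the minimum version.
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        problems.append("Python {} or greater is required".format(wanted))

    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        problems.append("Module '{}' not found; try: pip install pdfminer3k".format(module_name))

    return problems
```

Calling check_prerequisites() from within the activated virtual environment and printing the returned list shows which requirement, if any, is missing.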

Running the Code

Download a zip of this repository, or clone it with git clone https://github.com/ianknowles/EarTimeWrangler, then run python wrangler.py from the src folder at the command line. For a step-by-step setup guide aimed at beginners, see beginners.md.

Input/output file folders

The script looks for input files in the data folder, with subfolders expected to match the government department code. The SQLite database and csv exports will be placed in the output folder.
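The folder layout described above can be illustrated with a short sketch that groups input files by department subfolder. This is an illustration only and does not reproduce how wrangler.py discovers its inputs: the function name list_inputs is hypothetical, and the sketch assumes that inputs are pdf or csv files sitting directly inside each department subfolder.

```python
from pathlib import Path

# Assumption: only files with these extensions count as inputs.
INPUT_SUFFIXES = {".pdf", ".csv"}


def list_inputs(data_dir):
    """Map each department subfolder name to a sorted list of its input files."""
    inputs = {}
    for dept_dir in sorted(Path(data_dir).iterdir()):
        # Files placed directly in the data folder have no department code, so skip them.
        if not dept_dir.is_dir():
            continue
        inputs[dept_dir.name] = sorted(
            path
            for path in dept_dir.iterdir()
            if path.is_file() and path.suffix.lower() in INPUT_SUFFIXES
        )
    return inputs
```

Running list_inputs("data") from the repository root gives a quick overview of which departments have files waiting to be processed.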

License

This work is free. You can redistribute it and/or modify it under the terms of the MIT license. This license does not apply to any input or output data processed.

Acknowledgments

The project was funded by an ESRC IAA Kick-Start (1609-KICK-244) grant.
