
PPPD

The main purpose of PPPD (Projekt-Polizei-Presse-Daten) is to scrape press releases from Presseportal-Blaulicht and extract the relevant data for use in research projects.

Installation

  1. Clone or download this repository.
  2. Populate config.ini (see config.ini[EXAMPLE]; set DEVEL_MODE=True to restrict the web scraping to a small subset as a proof of concept).
  3. Install the conda environment from the file "env.yaml", as sketched below.
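
A minimal sketch for step 3; the environment name "pppd" is an assumption — use whatever the name: field in env.yaml defines:

# Create and activate the conda environment from env.yaml
conda env create -f env.yaml
# The environment name "pppd" is assumed; check the "name:" field in env.yaml
conda activate pppd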

Usage

The simplest way to scrape press releases from Presseportal-Blaulicht is the function get_blaulicht_data() from the module ppRunner. It downloads and processes every press release from every newsroom in the given federal states and years of interest.

In the following example, the function is used to download all press releases from 2020 (years=2020) posted by police departments (dept_type="police") in Baden-Württemberg (states="baden-württemberg"). A folder named "ppp_bw" (output_folder_name="ppp_bw") will be created within the project folder and all data will be stored in it.

from src import ppRunner as ppr

ppr.get_blaulicht_data(
    states="baden-württemberg",   # federal state of interest
    years=2020,                   # year of interest
    dept_type="police",           # only police departments
    output_folder_name="ppp_bw",  # output folder created within the project folder
)

Multiple states and years at once

The arguments states and years each accept either a single value or a list of values. In the following example, multiple federal states and multiple years are specified. Caution: running the code below may take a few days.

from src import ppRunner as ppr

ppr.get_blaulicht_data(
    states=["baden-württemberg", "hessen", "niedersachsen"],  # multiple federal states
    years=[2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021],   # multiple years
    dept_type="police",
    output_folder_name="example_project",
)

Database usage

If you want to use PostgreSQL as the database, start a Docker environment, e.g. the one provided in docker-compose.yml:

sudo docker-compose -f docker-compose.yml --env-file config.ini up -d
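
The repository ships its own docker-compose.yml; the sketch below only illustrates how the --env-file values typically flow into such a file. The service name, image tag, and port mapping are assumptions:

# Hypothetical docker-compose.yml sketch; the ${...} placeholders are
# substituted from the --env-file (here: config.ini)
version: "3"
services:
  db:
    image: postgres:14
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    ports:
      - "5432:5432"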

Don't forget to provide the database credentials in config.ini (see the sketch below).
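
A hedged sketch of the credential entries; the exact key names depend on docker-compose.yml — the official postgres image conventionally reads POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB:

# Hypothetical credential entries in config.ini (key names assumed)
POSTGRES_USER=pppd
POSTGRES_PASSWORD=change_me
POSTGRES_DB=pppd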

To import legacy data (the csv and txt files from the initial web scraping, stored in ./output_data/ppp_bw/), run the script 01-load_basic_data.py from the folder scripts/init_db. It expects two CLI arguments: the first specifies whether the database should be initialized from scratch (existing data will be deleted), the second specifies the year to import.

# First run to initialize the db and to import 2015 data:
python 01-load_basic_data.py init 2015

# Subsequent runs (data will be appended) for other years, e.g. 2019:
python 01-load_basic_data.py append 2019
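
To check that an import succeeded, you can connect to the database and list its tables. A minimal sketch using psycopg2; host, database name, user, and password are hypothetical placeholders — substitute the values from your config.ini:

import psycopg2

# Connect with the credentials from config.ini
# (the values below are hypothetical placeholders)
conn = psycopg2.connect(
    host="localhost",
    dbname="pppd",
    user="pppd",
    password="change_me",
)

# List the public tables created by the import script
with conn.cursor() as cur:
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public';"
    )
    for (name,) in cur.fetchall():
        print(name)

conn.close()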
