
PPPD

The main purpose of PPPD (Projekt-Polizei-Presse-Daten) is to scrape press releases from Presseportal-Blaulicht and extract the relevant data for use in research projects.

Installation

  1. Clone or download this repository.
  2. Populate config.ini (see config.ini[EXAMPLE]; set DEVEL_MODE=True to restrict the web scraping to a small subset as a proof of concept).
  3. Install the conda environment from the file "env.yaml", as sketched below.
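
A minimal sketch for step 3; the environment name "pppd" is an assumption — use whatever the name: field in env.yaml defines:

# Create and activate the conda environment from env.yaml
conda env create -f env.yaml
# The environment name "pppd" is assumed; check the "name:" field in env.yaml
conda activate pppd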

Usage

The simplest way to scrape press releases from Presseportal-Blaulicht is the function get_blaulicht_data() from the module ppRunner. It downloads and processes every press release from every newsroom in the given federal states and years of interest.

In the following example, the function is used to download all press releases from 2020 (years=2020) posted by police departments (dept_type="police") in Baden-Württemberg (states="baden-württemberg"). A folder named "ppp_bw" (output_folder_name="ppp_bw") will be created within the project folder and all data will be stored in it.

from src import ppRunner as ppr

ppr.get_blaulicht_data(
    states="baden-württemberg",   # federal state of interest
    years=2020,                   # year of interest
    dept_type="police",           # only police departments
    output_folder_name="ppp_bw",  # output folder created within the project folder
)

Multiple states and years at once

The arguments states and years each accept either a single value or a list of values. In the following example, multiple federal states and multiple years are specified. Caution: running the code below may take a few days.

from src import ppRunner as ppr

ppr.get_blaulicht_data(
    states=["baden-württemberg", "hessen", "niedersachsen"],  # multiple federal states
    years=[2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021],   # multiple years
    dept_type="police",
    output_folder_name="example_project",
)

Database usage

If you want to use PostgreSQL as the database, start a Docker environment, e.g. the one provided in docker-compose.yml:

sudo docker-compose -f docker-compose.yml --env-file config.ini up -d
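
The repository ships its own docker-compose.yml; the sketch below only illustrates how the --env-file values typically flow into such a file. The service name, image tag, and port mapping are assumptions:

# Hypothetical docker-compose.yml sketch; the ${...} placeholders are
# substituted from the --env-file (here: config.ini)
version: "3"
services:
  db:
    image: postgres:14
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    ports:
      - "5432:5432"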

Don't forget to provide the database credentials in config.ini (see the sketch below).
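
A hedged sketch of the credential entries; the exact key names depend on docker-compose.yml — the official postgres image conventionally reads POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB:

# Hypothetical credential entries in config.ini (key names assumed)
POSTGRES_USER=pppd
POSTGRES_PASSWORD=change_me
POSTGRES_DB=pppd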

To import legacy data (the csv and txt files from the initial web scraping, stored in ./output_data/ppp_bw/), run the script 01-load_basic_data.py from the folder scripts/init_db. It expects two CLI arguments: the first specifies whether the database should be initialized from scratch (existing data will be deleted), the second specifies the year to import.

# First run to initialize the db and to import 2015 data:
python 01-load_basic_data.py init 2015

# Subsequent runs (data will be appended) for other years, e.g. 2019:
python 01-load_basic_data.py append 2019
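
To check that an import succeeded, you can connect to the database and list its tables. A minimal sketch using psycopg2; host, database name, user, and password are hypothetical placeholders — substitute the values from your config.ini:

import psycopg2

# Connect with the credentials from config.ini
# (the values below are hypothetical placeholders)
conn = psycopg2.connect(
    host="localhost",
    dbname="pppd",
    user="pppd",
    password="change_me",
)

# List the public tables created by the import script
with conn.cursor() as cur:
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public';"
    )
    for (name,) in cur.fetchall():
        print(name)

conn.close()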
