Local Development

Joshua Essex edited this page Oct 4, 2019 · 1 revision

Environment Setup

The instructions below should cover both OS X and Linux, at least on recent distributions/versions of each. We cannot guarantee that the steps below have clean analogs in a local Windows environment.

Option 1: Local Python installation

Install Python

If you can install python3.7 locally, do so. For local Python development, you will also need to install the libpq PostgreSQL client library and openssl.

On a Mac with Homebrew, you can install python3.7, libpq, and openssl with:

$ brew install python3 postgresql openssl

On Ubuntu 18.04, openssl is installed by default; you can install python3.7 and libpq with:

$ apt update -y && apt install -y python3.7-dev python3-pip libpq-dev

You do not need to change your default python version, as pipenv will look for 3.7.
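Pipenv discovers the required interpreter from the Pipfile's [requires] section. Assuming the repository follows the standard pipenv convention, its Pipfile will contain a block like:

```toml
[requires]
python_version = "3.7"
```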

Install Pip and Pipenv

If you do not already have pip installed, you can install it on a Mac with these commands:

$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
$ python get-pip.py --user

On Ubuntu 18.04, you can install pip with:

$ sudo apt-get install python-pip

Upgrade your pip to the latest version:

$ pip install -U pip

NOTE: if you get ImportError: cannot import name 'main' after upgrading pip, follow the suggestions in this issue.

Install pipenv:

$ pip install pipenv --user

Set up the project

Fork this repository, clone it locally, and enter its directory:

$ git clone git@github.com:your_github_username/pulse-data.git
$ cd pulse-data

Create a new pipenv environment and install all project and development dependencies.

On a Mac, run the initial_pipenv_setup_mac script.

NOTE: Installation of one of our dependencies (psycopg2) requires OpenSSL, and as OpenSSL is not linked on Macs by default, this script temporarily sets the necessary compiler flags and then runs pipenv sync --dev. After this initial installation all pipenv sync/installs should work without this script.

$ ./initial_pipenv_setup_mac.sh

On a Linux machine, run the following:

$ pipenv sync --dev

NOTE: if you get pipenv: command not found, add the binary directory to your PATH as described here.
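The usual fix (assuming pipenv was installed with pip install --user, which places binaries under ~/.local/bin on Linux) is to add that directory to your PATH:

```shell
# add pip's per-user binary directory to the current shell's PATH
export PATH="$HOME/.local/bin:$PATH"
```

Add the same line to your ~/.bashrc or ~/.profile to make the change permanent.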

To activate your pipenv environment, run:

$ pipenv shell

Finally, run pytest. If no tests fail, you are ready to develop!

NOTE: If some recidiviz/tests/ingest/aggregate tests fail, you may need to install the Java Runtime Environment (JRE) version 7 or higher.

You can ignore those tests with:

$ pytest --ignore=recidiviz/tests/ingest/aggregate

On a Mac with Homebrew, you can install the JRE with:

$ brew cask install java

On Ubuntu 18.04, you can install the JRE with:

$ apt update -y && apt install -y default-jre

Option 2: Docker container

If you can't install python3.7 locally, you can use Docker instead.

Install Docker by following the official instructions for your platform: package-based installation on Linux, or the Docker Desktop installer for Mac and Windows.

Once Docker is installed, fork this repository, clone it locally, and enter its directory:

$ git clone git@github.com:your_github_username/pulse-data.git
$ cd pulse-data

Build the image:

$ docker build -t recidiviz-image . --build-arg DEV_MODE=True

Stop and delete previous instances of the image if they exist:

$ docker stop recidiviz && docker rm recidiviz

Run a new instance, mounting the local working directory within the image:

$ docker run --name recidiviz -d -t -v $(pwd):/app recidiviz-image

Open a bash shell within the instance:

$ docker exec -it recidiviz bash

Once in the instance's bash shell, update your pipenv environment:

$ pipenv sync --dev

To activate your pipenv environment, run:

$ pipenv shell

Finally, run pytest. If no tests fail, you are ready to develop!

Using this Docker container, you can edit your local repository files and use git as usual within your local shell environment, but execute code and run tests within the Docker container's shell environment.

Running the build

Running tests

Individual tests can be run via pytest filename.py. To run all tests, go to the root directory and run pytest recidiviz.
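For example (the file path below is illustrative, not a real file in the repository):

```shell
# run the tests in a single file
$ pytest recidiviz/tests/some_module_test.py

# run only the tests whose names match a substring
$ pytest recidiviz -k "some_pattern"
```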

The configuration in setup.cfg and .coveragerc will ensure the right code is tested and the proper code coverage metrics are displayed.

A few tests (such as those for sessions.py) depend on running emulators (e.g. the Cloud Datastore emulator). These tests are skipped by default when run locally, but will always be run by Travis. If you are modifying code covered by these tests, you can run them locally. First install both emulators via gcloud components install cloud-datastore-emulator and gcloud components install cloud-pubsub-emulator; these depend on the Java JRE (>=8). You will also need the beta command to execute the emulators, installed with gcloud components install beta. Then run the tests, telling pytest to bring up the emulators and include these tests:

$ pytest recidiviz --with-emulator
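For reference, the three emulator-related components mentioned above can be installed in one pass:

```shell
$ gcloud components install beta
$ gcloud components install cloud-datastore-emulator
$ gcloud components install cloud-pubsub-emulator
```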

A bug in the google client requires that you have default application credentials; this should not be necessary in the future. For now, make sure that you have run both gcloud config set project recidiviz and gcloud auth application-default login.
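The two credential-setup commands together:

```shell
$ gcloud config set project recidiviz
$ gcloud auth application-default login
```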

Linting

Run Pylint across the main body of code: pylint recidiviz. This may take a few minutes.

The output will include individual lines for all style violations, followed by a handful of reports, and finally a general code score out of 10. Fix any new violations in your commit. If you believe there is cause for a rule change, e.g. if you believe a particular rule is inappropriate in the codebase, then submit that change as part of your inbound pull request.

Static type checking

Run Mypy across all code to check for static type errors: mypy recidiviz. This should take only a few seconds.

Running the app

There are two ways to run the app: on your local machine, or deployed to the cloud. In practice, the former is very limited, as there are no locally installable equivalents to most of the managed services we rely on.

Local

For scrape-based ingest development, a scraper can be run locally using the run_scraper.py script. See that file for instructions on how to run it. By default the scraped entities are only logged, not persisted to any database. To persist data during a local run, set the PERSIST_LOCALLY environment variable to true.
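A hypothetical invocation might look like the following; the script location and the --region flag value are assumptions, so check run_scraper.py itself for the real interface:

```shell
# persist scraped entities locally instead of only logging them
$ PERSIST_LOCALLY=true python run_scraper.py --region us_xx_yyy
```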

The full application layer runtime can also be run locally using flask run, connecting to the local emulators for GCP services (as described in Running tests above). The App Engine documentation has more information about running locally.
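For example (the FLASK_APP module path below is an assumption; check the repository's server entry point for the real one):

```shell
# point flask at the application module and start the local dev server
$ FLASK_APP=recidiviz.server flask run
```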

Deployed

Our own deployment has duplicate environment setups for staging and production. We liberally deploy to staging to test release candidates and experimental builds. Deploys to staging happen automatically from our continuous integration pipeline when a new release is created. A deploy of your local build to staging can be triggered manually with the deploy_local_to_staging.sh script.