Reporting

Reporting Analysis and Context

As discussed previously, our calculation system produces a large number of metrics about the performance of the criminal justice system, from the highest vantage points down to the lowest, with a variety of units of analysis and comparisons. However, those metrics have limited value on their own: the right numbers need to reach the right person, at the right time, with the right context, to guide the system towards improved outcomes.

We provide reporting in a variety of channels, including inbound channels such as our web apps and outbound channels such as email reports sent to your inbox. This page briefly documents these channels, how they work, and how they consume from our data platform.

Web Dashboards

Recidiviz provides a number of web dashboards for different purposes and user groups. Our first and primary app, Pulse Dashboard, provides a variety of views for different user groups that help them set and track goals for outcome improvement and explore different metrics to learn what is driving trends in certain directions.

The app consumes metrics from our calculation pipeline through JSON files generated by a process known as the "Dashboard Export Manager." The overall workflow looks like this:

  1. Calculation pipeline jobs are executed in serial or in parallel, as configuration dictates.
  2. When all expected pipelines for a day have completed successfully, the Dashboard Export Manager ("export") is invoked.
  3. The export queries/materializes a set of BigQuery views into BigQuery tables, each of which is responsible for producing a single JSON file to be consumed by the app. BigQuery views can rely on other BigQuery views, producing a graph of queries that BigQuery executes in the correct order. That is, if there are 10 JSON files we expect to produce for the app, there should be 10 BigQuery tables, produced by 10 "final" BigQuery views, each of which may in turn rely on further views that compose or format data in ways conducive to the final view's needs. (A simplified sketch of this step and the next follows the list.)
  4. Each JSON file is uploaded to a secure bucket in Google Cloud Storage once its table is successfully materialized. The files are written in place, overwriting the previous version of a given file with the same name. (The bucket has versioning enabled, with a retention policy that destroys the oldest versions.)
  5. The backend server for the app, a Node.js server running in Google App Engine, has a metric cache with a configurable TTL. When that TTL expires, the app automatically re-fetches the latest metric files from Google Cloud Storage to refresh the cache.
  6. Finally, the frontend of the app, a React bundle, makes secure API calls to the backend server as the user navigates through the app. As these are static requests, caching is handled automatically by the browser.
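
To make steps 3 and 4 concrete, here is a minimal sketch of the export flow for a single metric file, using the BigQuery and Cloud Storage Python client libraries. The project, dataset, view, table, bucket, and file names are hypothetical placeholders (not the identifiers the actual Dashboard Export Manager uses), and the sketch writes newline-delimited JSON, which may not match the real file layout.

```python
# Minimal sketch of materializing one "final" view and uploading its JSON
# file. All names below are hypothetical placeholders.
import json

from google.cloud import bigquery, storage

bq_client = bigquery.Client()
gcs_client = storage.Client()

# Step 3: materialize the final view into a table, replacing any prior copy.
destination = bigquery.TableReference.from_string(
    "my-project.dashboard_views.revocations_by_month_table"
)
job_config = bigquery.QueryJobConfig(
    destination=destination,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
query_job = bq_client.query(
    "SELECT * FROM `my-project.dashboard_views.revocations_by_month`",
    job_config=job_config,
)
rows = query_job.result()  # Blocks until the materialization finishes.

# Step 4: serialize the table contents and upload the file in place. The
# bucket's versioning and retention policy take care of older versions.
lines = "\n".join(json.dumps(dict(row), default=str) for row in rows)
bucket = gcs_client.bucket("my-dashboard-metrics-bucket")
bucket.blob("US_XX/revocations_by_month.json").upload_from_string(
    lines, content_type="application/json"
)
```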

Email Reports

Recidiviz provides a growing set of email reports, delivered to users' inboxes. These reports are targeted to specific user groups' needs, promoting actions and policies which analysis shows will lead to improved outcomes. For example, our PO Monthly Report is sent to parole and probation officers with the twin goals of increasing the use of early discharges from supervision and reducing the use of revocations back to prison.

The data consumption setup for the reporting pipeline is the same as described above, except that the metric files consumed by the reporting pipeline are uploaded to a different bucket in Google Cloud Storage. The workflow of the reporting pipelines follows (simplified sketches of the generation and delivery steps appear after the list):

  1. The "report_start_new_batch" function is invoked, either by a human operator, on a schedule, or through some programmatic trigger. The request includes two required parameters: state_code and report_type.
  2. The function directly invokes the Data Retrieval step, which fetches the correct metric file for the given state and report type. That is, there should be exactly one "live" JSON file for each combination of state and report type. Each JSON file should include exactly one object for each intended recipient in that batch.
  3. For each recipient in the fetched JSON file, a ReportContext is constructed, which contains the key metadata requirements for that recipient (e.g. email address and state code) and the data required to generate the report.
  4. Each ReportContext object is passed to the Data Preparation step, which performs a series of report type-specific operations that convert the "recipient data" into "prepared data." These range from actions as simple as rounding values meant to be displayed as whole integers, to actions as complex as setting a variety of color and image attributes based on comparisons between values in the recipient data (e.g. to display different icons based on how an individual recipient compares to district and state averages).
    1. This step is aided by a properties.json file which exists for each combination of state code and report type, and provides static property values required by Data Preparation. The file is retrieved from a bucket in Google Cloud Storage at runtime.
  5. Each ReportContext object is passed to the Generation step, where the contents of the "prepared data" are injected into a template.html file which exists for each combination of state code and report type, fetched from the same bucket as the accompanying properties.json file. Once the recipient-specific values have been injected into the template, an HTML file is generated for the user and uploaded to a secure Google Cloud Storage bucket.
  6. Once Generation has completed for each intended recipient in the original metric file, the function returns the id of the batch that was just successfully generated.
  7. A second Cloud Function can be invoked: "report_deliver_emails_for_batch." This function takes in the batch_id to be delivered, fetches all HTML files for that batch from Google Cloud Storage, and delivers each to its intended recipient, as noted in the filename itself. The function can optionally take in a test_address parameter, which will send each email to the given email address for review purposes.
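
As a rough illustration of steps 1 through 6, the sketch below walks through one generation batch in Python. All function names, bucket names, file layouts, and data fields are hypothetical, and a simple $-placeholder substitution stands in for whatever templating the real Cloud Function uses; the sketch exists only to show how retrieval, preparation, and generation fit together.

```python
# Minimal sketch of one report-generation batch, with hypothetical names.
import json
import uuid
from string import Template

from google.cloud import storage

gcs = storage.Client()
config_bucket = gcs.bucket("my-report-config-and-data-bucket")  # metric/properties/template files
archive_bucket = gcs.bucket("my-generated-reports-bucket")      # generated HTML, grouped by batch

def start_new_batch(state_code: str, report_type: str) -> str:
    batch_id = uuid.uuid4().hex
    prefix = f"{state_code}/{report_type}"

    # Data Retrieval: exactly one "live" metric file per (state, report type),
    # assumed here to be a JSON array with one object per intended recipient.
    recipients = json.loads(
        config_bucket.blob(f"{prefix}/metrics.json").download_as_text()
    )
    properties = json.loads(
        config_bucket.blob(f"{prefix}/properties.json").download_as_text()
    )
    template = Template(
        config_bucket.blob(f"{prefix}/template.html").download_as_text()
    )

    for recipient in recipients:
        # Data Preparation: turn recipient data into display-ready values,
        # layered on top of the static properties for this state/report type.
        prepared = dict(properties)
        prepared["pos_discharges"] = round(recipient["pos_discharges"])
        prepared["discharge_icon"] = (
            "arrow_up.png"
            if recipient["pos_discharges"] >= recipient["district_average_discharges"]
            else "arrow_down.png"
        )

        # Generation: inject the prepared values into the template and store
        # the resulting HTML, named after its intended recipient.
        html = template.safe_substitute(prepared)
        archive_bucket.blob(
            f"{batch_id}/{recipient['email_address']}.html"
        ).upload_from_string(html, content_type="text/html")

    return batch_id
```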
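
A correspondingly simplified sketch of the delivery step (step 7) follows. Again the names are hypothetical, and Python's standard smtplib stands in for whatever email provider the real report_deliver_emails_for_batch function uses.

```python
# Minimal sketch of delivering a previously generated batch.
import smtplib
from email.mime.text import MIMEText
from typing import Optional

from google.cloud import storage

gcs = storage.Client()
archive_bucket = gcs.bucket("my-generated-reports-bucket")

def deliver_emails_for_batch(batch_id: str, test_address: Optional[str] = None) -> None:
    with smtplib.SMTP("localhost") as server:
        # Each generated file is named after its intended recipient.
        for blob in archive_bucket.list_blobs(prefix=f"{batch_id}/"):
            recipient_address = blob.name.split("/")[-1].removesuffix(".html")
            message = MIMEText(blob.download_as_text(), "html")
            message["Subject"] = "Your Monthly Report"
            message["From"] = "reports@example.com"
            # If a test_address was supplied, the email goes there for review
            # instead of to the real recipient.
            message["To"] = test_address or recipient_address
            server.send_message(message)
```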