Hello. My name is Ben Welsh.

I’m an Iowan living in New York City. I work as a journalist, albeit an unconventional one. I specialize in what some people call data journalism, some call computational journalism and some others call computer-assisted reporting.

This README includes a directory of my open-source computer code on GitHub and other platforms. It does not include the dozens of apps, stories and graphics I've published as part of my journalism career. For that, visit palewi.re to find my résumé, a database of my news clips and an archive of my public-speaking engagements.

Products

Websites

repo description
amsat-satellite-index A searchable, sortable table listing all the ham satellites in space
californiacivicdata.org The online home of the California Civic Data Coalition
cummings.ee A collection of the work of Edward Estlin Cummings, as it enters the public domain
news-homepages An open-source archive that gathers, archives and shares news homepages
palewi.re My blog
savemy.news A personal, permanent clipping service
studs-terkel-podcast Selections from WFMT's Studs Terkel Radio Archive delivered to your podcatcher

Lesson plans

repo description
first-automated-chart Learn how you can use Python and the Datawrapper API to create a limitless number of charts and maps
first-django-admin A step-by-step guide to creating a simple web application that empowers you to enlist reporters in data entry and refinement
first-github-scraper An introduction to free, automated web scraping with GitHub Actions
first-pull-request How to propose changes to open-source software using GitHub pull requests
first-python-notebook A step-by-step guide to analyzing data with Python and the Jupyter Notebook
first-visual-story A step-by-step guide to publishing a standalone story from a dataset

Bots

repo description
old-la-photos A bot that posts photographs from the Los Angeles Public Library’s digital collection
metar-weather-bot A bot that posts the latest METAR weather report for LAX airport
muckrockbot A bot that posts the latest public records requests filed and completed at muckrock.com
nyc-open-data-monitor Automated monitoring of new and updated datasets posted to New York City's data portal
reuters-jobs A bot that posts the latest open jobs at Reuters
random-pigeon-gpt A bot that posts AI-generated images of New York City pigeons created from random adjectives
sanborn-maps-bot A bot that posts random images from the Library of Congress collection of Sanborn Fire Insurance Company maps

Data

Computational notebooks

repo description
baseball-notebooks Python notebooks exploring Major League Baseball data
california-crop-production-wages-analysis Crop worker pay in California
california-electricity-capacity-analysis California's costly power glut
california-fire-zone-analysis California buildings within fire hazard zones
california-h2a-visas-analysis Temporary visas granted to foreign agricultural workers
census-hard-to-map-analysis A census undercount could cost California billions — and L.A. is famously hard to track
cfb-gap-analysis College football's most imbalanced teams
chicago-regions-map Creates a regional map of Chicago based on the city's official designations
chicago-trees-analysis How many trees has Chicago planted? And where?
construction-jobs-analysis Demographics and pay of construction workers
cubs-opening-day-analysis Analysis of the Opening Day starters for the Chicago Cubs baseball team
deadspin-scraper Scrape posts from Deadspin
deleon-district-election-results-analysis How former state Sen. Kevin de León fared in his own district
drudge-domain-analysis A simple example of using storytracker and the PastPages API to conduct a link analysis
faa-drone-license-analysis Who can fly commercially?
ferc-enforcement-analysis Civil penalties issued by FERC
helicopter-accident-analysis A Los Angeles Times analysis of helicopter accident rates
hollister-ranch-analysis Agricultural property tax breaks in Hollister Ranch
houston-flood-zone-analysis Geospatial analysis of Houston homes after Hurricane Harvey
hsr-document-analysis How California’s faltering high-speed rail project was ‘captured’ by costly consultants
judge-home-run-analysis How the Yankee slugger's 2022 pace compares to the past
la-settlements-analysis Legal payouts by L.A. city
la-vacant-building-complaints-analysis Vacant building complaints filed with L.A. city
la-weedmaps-analysis Black market cannabis shops thrive in L.A. even as city cracks down
literary-notebooks Python notebooks exploring Project Gutenberg texts
native-american-census-analysis The 2020 census is coming. Will Native Americans be counted?
promenade-west-sales-report An analysis of downtown Los Angeles housing prices
street-racing-analysis Street racing fatalities in L.A. County
swana-census-analysis Are Arabs and Iranians white? Census says yes, but many disagree
washingtonpost-newswhip-analysis How many pieces does the Washington Post publish?

Git scrapers

repo description
amateur-satellite-database The amateur satellites in space. A machine-readable mirror of JE9PEL's website and the SatNOGS database.
aphis-inspection-reports Scrapes inspection data and PDFs from the USDA's Animal and Plant Health Inspection Service
california-coronavirus-data The Los Angeles Times' open-source archive of California coronavirus data
california-coronavirus-scrapers The open-source web scrapers that feed the Los Angeles Times California coronavirus tracker
fed-dot-plot-scraper Extracting the "dot plot" economic projections posted online by the Federal Open Market Committee
noaa-hurricane-gis-scraper Automated downloads of geographic information system data posted by the National Oceanic and Atmospheric Administration's National Hurricane Center and Central Pacific Hurricane Center

Public records requests

repo description
california-business-entities Corporations and limited-liability companies registered with the California Secretary of State
california-house-members A simple machine-readable list of the 53 men and women California sends to Congress
california-topojson-atlas Simple maps of California's 58 counties
cedar-rapids-buildings-unsafe-after-derecho-2020 Buildings marked as unsafe to occupy by the Cedar Rapids city government following the 2020 derecho storms
la-county-2016-primary-precinct-maps Maps of the consolidated precincts used in Los Angeles County's 2016 primary election
la-county-election-precincts-2018 Final election precincts used by the Los Angeles County Registrar-Recorder/County Clerk in the 2018 elections
la-county-election-precincts-2020 Final election precincts used by the Los Angeles County Registrar-Recorder/County Clerk in the 2020 general elections
la-county-trail-maps Geospatial data of trails managed or planned by the Los Angeles County Department of Parks and Recreation
la-magnets-2016-test-scores A database of test scores for roughly 200 L.A. Unified magnet schools obtained by the Los Angeles Times
la-metro-maps Geospatial data from L.A. Metro's public transportation system
lausd-school-campus-polygons The areas of school campuses at the Los Angeles Unified School District
los-angeles-county-tsunami-hazard-areas California Geological Survey maps of the flooding that tsunamis could produce in Los Angeles County
noaa-hurricane-hunters-logo An official logo of NOAA's Hurricane Hunters released via FOIA
nrol-39-logo A vector PDF of the official mission logo of NROL-39 released via FOIA
nyc-parks-logo The official logos of NYC Parks released via FOIL
regional-connector-art Public art created for light rail stations on the Los Angeles Metro's Regional Connector line
san-francisco-campaign-contributions Itemized monetary campaign contributions compiled by San Francisco's Ethics Commission
space-force-emblems The official logos of 83 US Space Force units
union-station-site-map The glossy map on display in the Los Angeles transit hub
us-ca-butte_county-addresses_parcels_roads-shp SHP files of addresses, parcels and roads received in a public record request from Butte County, California
us-ca-el_dorado_county-currprcl-shp SHP file of parcels with situs address attached provided via public records request by local government in El Dorado County California
us-ca-lake_county-situs_parcels-shp SHP file of parcels with situs address attached provided via public records request by local government in Lake County California
us-ca-lassen_county-situs_parcels-shp SHP file of parcels with situs address attached provided via public records request by local government in Lassen County California
us-ca-madera_county-situs-shp SHP file of address points provided via public records request by local government in Madera County California
us-ca-orange_county-situs_parcels-shp A SHP file of parcel polygons downloaded from Orange County California's public website
us-ca-san_joaquin_county-situs_parcels-shp SHP file of parcels with situs address attached provided via public records request by local government in San Joaquin County California
us-ca-santa_clara_county-gis-shp SHP files downloaded from Santa Clara County California's password-protected GIS repository
us-ca-santa_cruz_county-PointAddress_SC-shp SHP file of address points provided via public records request by local government in Santa Cruz County California
us-ca-shasta_county-ShastaCountySitusPoints-shp SHP file of address points provided via public records request by Shasta County government
us-ca-sonoma-county-sc_base_adr_addresses-shp SHP file of parcels with situs address attached provided via public records request by local government in Sonoma County California
us-ca-yuba_county-AddressPoints-shp SHP file of address points provided via public records request by local government in Yuba County California
usa-style-guides U.S. government style guides acquired via the Freedom of Information Act
usgs-anss-logo The logo for the U.S. Geological Survey's Advanced National Seismic System
usgs-hawaii-volcano-drone-survey-october-2022 Photography and a digital elevation model from an October 2022 USGS drone mission over the Kilauea volcano's Halema‘uma‘u pit crater.

Python

Templates

repo description
django-heroku-template A template for Django projects hosted by Heroku
python-open-source-template A template for open-source Python software repositories

Packages

repo description
air-quality-index Download air quality index data from AirNow
altair Declarative statistical visualization library for Python
atcf-data-parser Parser for the a-deck data posted online by the Automated Tropical Cyclone Forecasting System
archiveis A simple Python wrapper for the archive.is capturing service
calfire-wildfires Download wildfires data from CalFire
census-data-aggregator Combine U.S. census data responsibly
census-data-downloader Download U.S. census data and reformat it for humans
census-error-analyzer Analyze the margin of error in U.S. census data
census-map-consolidator Combine Census blocks into new shapes
census-map-downloader Easily download U.S. census maps
cpi Quickly adjust U.S. dollars for inflation using the Consumer Price Index (CPI)
django-anss-archive Archive real-time earthquake notifications from the USGS's Advanced National Seismic System
django-bakery A set of helpers for baking your Django site out as flat files
django-calaccess-downloads-website An open-source archive of campaign finance and lobbying disclosure data from the California Secretary of State’s CAL-ACCESS database
django-calaccess-raw-data A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
django-calaccess-processed-data A Django app to transform and refine campaign-finance data from the California Secretary of State’s CAL-ACCESS database
django-calaccess-scraped-data A Django app to scrape campaign-finance data from the California Secretary of State’s CAL-ACCESS website
django-calaccess-technical-documentation Technical documentation for our pipeline of Django apps that download, extract, load and process the CAL-ACCESS database
django-greeking Django template tools for printing filler, a technique from the days of hot type known as greeking
django-internetarchive-storage A custom Django storage system for Internet Archive collections
django-postgres-copy Quickly import and export delimited data with Django support for PostgreSQL's COPY command
django-yamlfield A Django database field for storing YAML data
inciweb-wildfires Download wildfire data from inciweb
install-python-pipenv-pipfile Easily install Python, pipenv and Pipfile packages in your GitHub Action
ipsos-credibility-interval Calculate Bayesian credibility intervals for online polling using the Ipsos method
mlbcolors Easy access to the official colors of every team in Major League Baseball
nasa-wildfires Download wildfire data from NASA satellites
nifc-wildfires Download wildfires data from NIFC
noaa-wildfires Download wildfires data from NOAA satellites
nws-aurora Download forecast data for Aurora Borealis and Aurora Australis from the National Weather Service
nws-wwa Download watch, warning and advisory data from the National Weather Service
python-censusbatchgeocoder A simple Python wrapper for the batch service of the U.S. Census Geocoding Services API
python-googlegeocoder A simple Python wrapper for version three of Google's geocoder API
python-muckrock A simple Python wrapper for the MuckRock API
reuters-style A Python library that formats dates, numbers and text to conform with the Reuters Style Guide, the standards that guide the world's largest independent newsroom
savepagenow A simple Python wrapper for archive.org's "Save Page Now" capturing service
sphinx-palewire-theme A Sphinx theme for sites hosted at palewi.re
storysniffer Inspect a URL and estimate whether it links to a news story

Examples

repo description
altair-column-sort-example An example of how to sort the columns in a bar chart created by the Altair data visualization library
altair-election-maps-example An experiment in creating precinct-level election results maps using Python's Altair library
altair-interactive-scatterplot-example An example of how to add a tooltip to a scatterplot in Python's Altair charting library
dorling-cartogram-example How to calculate a dorling cartogram with Python
geopandas-intersection-area-example How to use geopandas' overlay method to find the area of intersections between two datasets
geopandas-spatial-join-example An example of how to join point to polygon data with geopandas and Python
git-scraper-example An example of a git scraper that downloads, lints, commits and archives a data set
jupyter-notebook-execution-examples Examples of how to remotely execute Jupyter Notebooks from other contexts
pandas-combine-workbooks-example How to use Python's pandas library to combine tabs from multiple Microsoft Excel workbooks into a single CSV
pandas-squarify-example How to use the squarify extension to matplotlib to visualize a pandas DataFrame as a treemap
random-tract A Python hack to respond to a Twitter challenge to "select a random geographic point in the US, with the probability weighted by population."

JavaScript

Examples

repo description
10 PRINT CHR$(205.5+RND(1)); : GOTO 10 RUN A popular one-line script for the Commodore 64
2018-year-in-review A streamgraph of the Los Angeles Times' master branch
analyzing-color Experiments in manipulating color data
baseball-visualizations Abstracting America's pastime
california-fire-zones Maps developed as part of a Los Angeles Times geospatial analysis of fire risk zones
california-in-mercator A base map of the state
california-poppy-generator A randomized generator of California poppies
covid-19-prototypes Experiments developed as part of the Los Angeles Times’ coronavirus tracking effort
delaunay-headline-hero This modification of Mike Bostock's Delaunay Dual diagram was drafted for consideration as lead art on a gallery of data analysis pieces
earthquake-intensity-map A "shakemap" of the 7.1 magnitude earthquake that struck Searles Valley, Calif., on July 5, 2019
election-2013 Results of the October 2013 Buenos Aires elections
election-results-by-education-treemap A treemap of Iowa's 2016 presidential election results by education
election-results-challenge Got a better idea? Here's your chance to prove it
first-observable-notebook A course taught at the 2020 conference of the National Institute for Computer Assisted Reporting
hexbin-headline-hero The first draft of the diagram that serves as the lead art on a gallery of data analysis pieces
how-iowa-voted Mapping election results from the state of Iowa
inglewood-inventory Lunches in the "City of Champions" with the Los Angeles Times Data Desk
iowa-dorling An example of a Dorling cartogram mapping the population of Iowa's 99 counties
load-d3-data-incrementally-using-sorted-value Gradually loads all the cities of California from north to south, and then removes them from south to north.
numbers-in-the-newsroom Calculators for common newsroom needs, including those featured in Sarah Cohen’s book “Numbers in the Newsroom: Using Math and Statistics in News”
observable-helpers Utilities to help do things
spike-chart This variation on a standard bar chart substitutes in a proportionally sized spike for the traditional rectangle
the-ichiro-bet The Ichiro Bet
the-many-voices-of-the-other-americans Visualizing the rotating narrators of Laila Lalami's novel "The Other Americans"
tinting-a-canvas-image How to overlay a color filter on top of a canvas image
trump-tweets Techniques for working with President Donald Trump's posts on twitter.com
us-census-data A variety of methods for visualizing data from the United States Census Bureau
vega-visualizations Examples of using the Vega data visualization toolkit
voronoi-husband-and-wife A photograph of my wife and me stippled using a Voronoi diagram
web-dubois Digital recreations of data visualizations made for W.E.B. Du Bois’ presentation at the “Paris Exposition Universelle” in 1900

Other stuff

repo description
dotfiles My configuration files
ebook-exports Export the e.e. cummings free poetry archive to a variety of ebook formats
internet-archive-upload Upload files to an archive.org item
is-5 Page scans of E.E. Cummings’ 1926 book of verse
tulips-and-chimneys Page scans of E.E. Cummings’ first published book of verse

Inactive projects

Websites

repo description
boundaries.latimes.com An API that serves up local GIS data
documentstacker Use DocumentCloud to publish PDFs for humans
nicar18-datadesk-family-reunion Los Angeles Times Data Desk Reunion @ NICAR 2018
nicar19-datadesk-family-reunion Los Angeles Times Data Desk Reunion @ NICAR 2019
orchestral-motion.github.io L.A. Phil hackday website
pastpages.org The news homepage archive
tablestacker Publish spreadsheets as interactive tables. And do it on deadline.

Lesson plans

repo description
first-news-app A step-by-step guide to publishing a simple news application
first-web-scraper A step-by-step guide to writing a web scraper with Python

Bots

repo description
checkbook-la-watchdog A periodically updated archive of financial data published by the city of Los Angeles' Checkbook LA data portal
everytractcount Statistics about every U.S. Census tract mapped by @everytract
mistadobalina A script that posts raps by Del Tha Funkee Homosapien to @MISTADOBALINA on Twitter
mlb-postseason-bot Twitter bot that posts daily updates on a team’s chance to make the Major League Baseball postseason
questionheds A feed of headlines with question marks in them
trump-tweets All @RealDonaldTrump tweets stored at trumptwitterarchive.com in a single JSON

Templates

repo description
appengine-template Bootstrap a Google App Engine project with Django and other goodies
django-project-template A custom template for initializing a new Django project the Data Desk way
django-calaccess-project-template A custom template for initializing a new Django project with the California Civic Data Coalition's applications for analyzing the California Secretary of State’s CAL-ACCESS database

Packages

repo description
altair-latimes A Los Angeles Times theme for Python's Altair statistical visualization library
calculate Some simple math we use to do journalism
django-a-matter An app for authoring background biographical matter on newsworthy people
django-autoarchive Django helpers for automatically archiving URLs
django-calaccess-campaign-browser A Django app to refine, review and republish campaign finance data drawn from the California Secretary of State’s CAL-ACCESS database
django-calaccess-cookbook A Chef cookbook and Fabfile for deploying the California Civic Data Coalition's applications for analyzing the California Secretary of State’s CAL-ACCESS database on Amazon Web Services
django-calaccess-docker A standalone Docker stack serving the California Civic Data Coalition's applications for analyzing the California Secretary of State’s CAL-ACCESS database
django-calaccess-lobbying-browser A simple Django app to browse California lobbying activity data from CAL-ACCESS
django-correx A set of models and template tags for pulling in lists of content changes across applications
django-memento-framework A set of helpers for Django web sites to enable the Memento framework for time-based access
django-orchestral-motion-db A Django channels app for receiving live motion data from an accelerometer
django-rapture An archive of the Rapture Index at raptureready.com
django-swineflu A quick and dirty data dump of the H1N1 flu vaccine locations that LA County public health currently buries in a PDF
django-urlarchivefield A custom Django model field that automatically archives a URL
lametro-api A simple Python wrapper for the L.A. Metro’s API for bus stops, routes and vehicles
mappingla A Python wrapper for accessing the Mapping L.A. Boundaries API
pastpages2gif Create an animated GIF from the PastPages news homepage archive
pluggablemaps A pluggable GeoDjango app with the boundaries of all states in the United States of America. Geography, loosely coupled
pluggablemaps-hackshackers A GeoDjango app that maps unemployment, meant to demonstrate concepts from my talk
pluggablemaps-lametrorail A pluggable GeoDjango app mapping the Los Angeles Metro Rail system
pluggablemaps-uscounties A pluggable GeoDjango app with the boundaries of United States counties
pyplacefinder A very simple wrapper for Yahoo PlaceFinder
python-elections A Python wrapper for the Associated Press' U.S. election data service
qiklog A simplified wrapper for Python's logging module
scrapy-calaccess-crawler A Scrapy app to scrape campaign-finance data from the California Secretary of State’s CAL-ACCESS website
statestyle A Python library that standardizes the names of U.S. states
storytracker Tools for tracking stories on news homepages
webcitation A simple Python wrapper for the webcitation.org capturing service
wordpress-memento-plugin A plugin for Wordpress web sites to enable the Memento framework for time-based access

Other stuff

repo description
california-k12-notebooks Scripts to download and process California K12 schools data
chirp-ham-radio-channels Channels formatted for the CHIRP amateur radio programming system
first-python-notebook-binder A template for deploying "First Python Notebook" with Binder
ire2010 Class materials for the Django bootcamp at the IRE 2010 conference in Las Vegas
nicar2010 Materials from the Django bootcamp at NICAR 2010
nicar2011 An example Django app for a class at the NICAR 2011 conference
python-calaccess-notebooks Python notebooks analyzing campaign finance and lobbying activity data from California Secretary of State’s CAL-ACCESS database
osm-quiet-la A street-centric base layer for overlaying point data about Southern California
osm-silent-la A template for a black base layer about Southern California
sopr-activity A quick and dirty script for pulling down lobby disclosure docs filed with the Senate Office of Public Records
sopr-contribs Scripts for processing and analyzing federal lobbyist disclosure data reporting contributions to political campaigns
the-mondesi-bet The Mondesi Bet
