mta bus archive

Download archived NYC MTA bus position data, and scrape GTFS-Realtime data from the MTA.

Bus position data from July 2017 onward is archived at https://s3.amazonaws.com/nycbuspositions. Archive files follow the pattern https://s3.amazonaws.com/nycbuspositions/YYYY/MM/YYYY-MM-DD-bus-positions.csv.xz, e.g. https://s3.amazonaws.com/nycbuspositions/2017/07/2017-07-14-bus-positions.csv.xz.
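
For example, one way to fetch and decompress a single day's file from the command line (assuming curl and xz are available on your machine):

curl -O https://s3.amazonaws.com/nycbuspositions/2017/07/2017-07-14-bus-positions.csv.xz
xz -d 2017-07-14-bus-positions.csv.xz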

Requirements:

  • Python 3.x
  • PostgreSQL 9.5+

Set up

Specify your connection parameters using the standard Postgres environment variables:

PGDATABASE=dbname
PGUSER=myuser
PGHOST=myhost.com

You may skip this step if you're using a socket connection to your user's database.
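
As an optional sanity check, psql reads the same environment variables, so a trivial query will confirm the connection works:

psql -c 'SELECT 1;'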

Initialization

This command will create a number of tables whose names begin with rt_, notably rt_vehicle_positions, rt_alerts, and rt_trip_updates. It will also install the Python requirements, including the Google Protobuf library.

make install
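
To confirm the tables were created, you can list everything matching the rt_ prefix in psql:

psql -c '\dt rt_*'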

Download an MTA Bus Time archive file

Download a (UTC) day from data.mytransit.nyc, and import it into the Postgres database dbname:

make -f download.mk download DATE=2016-12-31
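
To backfill a range of days, a small shell loop works; this sketch assumes GNU date (the -d syntax differs on BSD/macOS):

for i in $(seq 0 6); do
    make -f download.mk download DATE=$(date -d "2016-12-25 + $i days" +%F)
done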

Scraping

Scrapers have been tested with Python 3.4 and above. Earlier versions of Python (e.g. 2.7) won't work.

Scrape

The scraper assumes that an environment variable, BUSTIME_API_KEY, contains an MTA BusTime API key. Get a key from the MTA.

export BUSTIME_API_KEY=xyz123

Download the current positions from the MTA API and save them to a local PostgreSQL database named mtadb:

make positions

Download current trip updates:

make tripupdates

Download current alerts:

make alerts

Scheduling

The included crontab shows an example setup for downloading data from the MTA API. It assumes that this repository is saved in ~/mta-bus-archive. Fill in the PG_DATABASE and BUSTIME_API_KEY variables before using it. A minimal sketch of what such a crontab can look like follows; the schedule and values here are illustrative, not the repository's actual settings:
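
PG_DATABASE=mtadb
BUSTIME_API_KEY=xyz123
* * * * * cd ~/mta-bus-archive && make positions
*/10 * * * * cd ~/mta-bus-archive && make tripupdates
*/10 * * * * cd ~/mta-bus-archive && make alerts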

Uploading files to Google Cloud

Setup

Create a project in the Google API Console. Make sure to enable the "Google Cloud Storage API" for your application. Then set up a service account; this will download a credentials file named something like myprojectname-3e1f812da9ac.json.

Then run the following (on the machine you'll be using to scrape and upload) and follow the instructions:

gsutil config -e

Next, create a bucket for the data using the Google Cloud Console.
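
If you prefer the command line, gsutil can create the bucket as well (the bucket name here is hypothetical):

gsutil mb gs://mta-bus-data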

You've now authenticated yourself to the Google API, and you'll be able to run a command like:

make -e gcloud DATE=2017-07-14 PG_DATABASE=mydbname

By default, the Google Cloud bucket will have the same name as the database. Use the variable GOOGLE_BUCKET to customize it.
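
For example, to send the same day's data to a differently named bucket (mta-bus-data is a hypothetical name):

make -e gcloud DATE=2017-07-14 PG_DATABASE=mydbname GOOGLE_BUCKET=mta-bus-data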

License

Available under the Apache License.