mta bus archive

Download archived NYC MTA bus position data, and scrape GTFS-Realtime data from the MTA.

Bus position data from July 2017 onward is archived at https://s3.amazonaws.com/nycbuspositions. Archive files follow the pattern https://s3.amazonaws.com/nycbuspositions/YYYY/MM/YYYY-MM-DD-bus-positions.csv.xz, e.g. https://s3.amazonaws.com/nycbuspositions/2017/07/2017-07-14-bus-positions.csv.xz.
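
For example, one way to fetch and decompress a single day's file from the command line (assuming curl and xz are available on your machine):

curl -O https://s3.amazonaws.com/nycbuspositions/2017/07/2017-07-14-bus-positions.csv.xz
xz -d 2017-07-14-bus-positions.csv.xz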

Requirements:

  • Python 3.x
  • PostgreSQL 9.5+

Set up

Specify your connection parameters using the standard Postgres environment variables:

PGDATABASE=dbname
PGUSER=myuser
PGHOST=myhost.com

You may skip this step if you're using a socket connection to your user's database.
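
As an optional sanity check, psql reads the same environment variables, so a trivial query will confirm the connection works:

psql -c 'SELECT 1;'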

Initialization

This command will create a number of tables whose names begin with rt_, notably rt_vehicle_positions, rt_alerts, and rt_trip_updates. It will also install the Python requirements, including the Google Protobuf library.

make install
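
To confirm the tables were created, you can list everything matching the rt_ prefix in psql:

psql -c '\dt rt_*'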

Download an MTA Bus Time archive file

Download a (UTC) day from data.mytransit.nyc, and import it into the Postgres database dbname:

make -f download.mk download DATE=2016-12-31
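
To backfill a range of days, a small shell loop works; this sketch assumes GNU date (the -d syntax differs on BSD/macOS):

for i in $(seq 0 6); do
    make -f download.mk download DATE=$(date -d "2016-12-25 + $i days" +%F)
done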

Scraping

Scrapers have been tested with Python 3.4 and above. Earlier versions of Python (e.g. 2.7) won't work.

Scrape

The scraper assumes that an environment variable, BUSTIME_API_KEY, contains an MTA BusTime API key. Get a key from the MTA.

export BUSTIME_API_KEY=xyz123

Download the current positions from the MTA API and save them to a local PostgreSQL database named mtadb:

make positions

Download current trip updates:

make tripupdates

Download current alerts:

make alerts

Scheduling

The included crontab shows an example setup for downloading data from the MTA API. It assumes that this repository is saved in ~/mta-bus-archive. Fill in the PG_DATABASE and BUSTIME_API_KEY variables before using it. A minimal sketch of what such a crontab can look like follows; the schedule and values here are illustrative, not the repository's actual settings:
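
PG_DATABASE=mtadb
BUSTIME_API_KEY=xyz123
* * * * * cd ~/mta-bus-archive && make positions
*/10 * * * * cd ~/mta-bus-archive && make tripupdates
*/10 * * * * cd ~/mta-bus-archive && make alerts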

Uploading files to Google Cloud

Setup

Create a project in the Google API Console. Make sure to enable the "Google Cloud Storage API" for your application. Then set up a service account; this will download a credentials file named something like myprojectname-3e1f812da9ac.json.

Then run the following (on the machine you'll be using to scrape and upload) and follow the instructions:

gsutil config -e

Next, create a bucket for the data using the Google Cloud Console.
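
If you prefer the command line, gsutil can create the bucket as well (the bucket name here is hypothetical):

gsutil mb gs://mta-bus-data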

You've now authenticated yourself to the Google API, and you'll be able to run a command like:

make -e gcloud DATE=2017-07-14 PG_DATABASE=mydbname

By default, the Google Cloud bucket will have the same name as the database. Use the variable GOOGLE_BUCKET to customize it.
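
For example, to send the same day's data to a differently named bucket (mta-bus-data is a hypothetical name):

make -e gcloud DATE=2017-07-14 PG_DATABASE=mydbname GOOGLE_BUCKET=mta-bus-data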

License

Available under the Apache License.