# Data USA ACS Bamboo ETL

This repository contains a series of Bamboo pipelines that process and ingest data from the American Community Survey (ACS) used in Data USA.

Note: the API for the year 2014, 1-year estimate, at the nation geographic level does not work correctly, so this repository includes a folder with those files downloaded manually.
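For context, the pipelines request data from the Census ACS API using the `API_KEY` configured below. The sketch that follows is a hypothetical helper (not code from this repository) showing the general shape of such a request URL; the exact path for older vintages like 2014 may differ from the current `data/<year>/acs/acs<N>` convention, and `B19083_001E` is the ACS variable for the Gini index used here only as an illustration.

```python
from urllib.parse import urlencode

def build_acs_url(year, estimate, variables, geo, api_key):
    """Build a Census ACS API request URL (hypothetical helper, not from this repo)."""
    base = f"https://api.census.gov/data/{year}/acs/acs{estimate}"
    query = urlencode({"get": ",".join(variables), "for": geo, "key": api_key})
    return f"{base}?{query}"

# The combination the note above flags as broken: 2014, 1-year estimate, nation level.
url = build_acs_url(2014, 1, ["B19083_001E"], "us:1", "DEMO_KEY")
print(url)
```

A request built this way for 2014/1-year/nation is the case covered by the manually downloaded files.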

## Local Setup

1. Create a new virtual environment and activate it:

   ```sh
   python -m venv venv
   source venv/bin/activate
   ```

2. Install the requirements:

   ```sh
   pip install -r requirements.txt
   ```

3. Create an environment variables file (e.g. `.env`) following this structure:

   ```sh
   export API_KEY='<Your API KEY goes here>';
   export DATAUSA_DB_PW='monetdb'; # The default password for MonetDB is monetdb.
   export DATAUSA_DB_HOST='localhost'; # Assuming you're ingesting into a local MonetDB container.
   export PYTHONPATH=$PYTHONPATH:<path to repository>/datausa-acs-bamboo-etl;
   ```

4. Run a pipeline, for example the Gini pipeline:

   ```sh
   source .env
   cd acs/acs_yg_gini
   python acs_yg_gini_pipeline.py
   ```
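The pipelines read the variables exported in `.env` from the process environment. As a minimal sketch (the function and variable names here are assumed for illustration, not taken from this repository), a pipeline can validate its configuration at startup like this:

```python
import os

# Variables the .env file above is expected to export.
REQUIRED_VARS = ["API_KEY", "DATAUSA_DB_PW", "DATAUSA_DB_HOST"]

def load_config():
    """Read required settings from the environment, failing fast if any is missing."""
    missing = [name for name in REQUIRED_VARS if name not in os.environ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Failing fast with a clear message avoids a pipeline running halfway before discovering the database password or API key was never sourced.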

## Dockerized Setup

1. Create a Python 3.7.9 container:

   ```sh
   docker run -it -v <path to repository>/datausa-acs-bamboo-etl:/datausa-acs-bamboo-etl --name=python3-local python:3.7.9 bash
   ```

2. Inside the container, use the previously created `.env` file and run a pipeline, for example the Gini pipeline:

   ```sh
   cd datausa-acs-bamboo-etl
   pip install -r requirements.txt
   source .env
   cd acs/acs_yg_gini
   python acs_yg_gini_pipeline.py
   ```
