greatexpectationslabs/airflow_meetup_demo

Setup

This demo builds on the Apache Airflow quick start (https://airflow.apache.org/start.html).

Set AIRFLOW_HOME to point to the airflow/ directory in this repository.
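
A quick way to confirm that AIRFLOW_HOME is set as described is a small check like the one below. This is only a sketch: the function name `airflow_home_matches` and its parameters are hypothetical and not part of this repository or of Airflow, and it assumes the `airflow/` directory sits directly under the repository root.

```python
import os
from pathlib import Path


def airflow_home_matches(repo_root, environ=None):
    """Return True if AIRFLOW_HOME resolves to <repo_root>/airflow.

    Hypothetical helper for illustration; not part of the demo or of Airflow.
    """
    environ = os.environ if environ is None else environ
    configured = environ.get("AIRFLOW_HOME")
    if not configured:
        # AIRFLOW_HOME is unset or empty.
        return False
    expected = Path(repo_root).resolve() / "airflow"
    return Path(configured).expanduser().resolve() == expected


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    print(airflow_home_matches(Path.cwd()))
```

If the check prints False, export AIRFLOW_HOME in the shell before running any `airflow` command.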

Then run:

docker-compose up -d
airflow initdb  # Only necessary on the first run
airflow webserver -p 8080  # Runs in the foreground; keep this terminal open
airflow scheduler  # Run in a second terminal

Analysis

We will evaluate NPI (National Provider Identifier) data from cms.gov. For the original raw data, see: .

  1. Install Great Expectations (GE) in our project, and profile the datasource.
  2. Review the Data Docs built for the NPI data.
  3. Identify some columns to investigate further:
    • "Provider Other Organization Name Type Code"
    • "Provider Enumeration Date"
  4. Run the create expectations notebook.

data_asset_name = 'demo__dir/default/npidata_pfile'
data_asset_name = 'demo__dir/default/npidata_pfile_transformed'

ge.dataset.util.build_categorical_partition_object(batch, column='Provider Other Organization Name Type Code')
batch.expect_column_distinct_values_to_be_in_set(column='Provider Other Organization Name Type Code', value_set=[3, 4, 5])
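
To make the intent of these two calls concrete, here is a plain-Python sketch of roughly what they compute: the relative frequency of each category, and whether every distinct value falls inside an allowed set. The function names `categorical_partition` and `distinct_values_in_set` are hypothetical, the sample codes are invented for illustration (they are not real NPI data), and GE's own implementation works on a batch and may differ in details such as null handling and ordering.

```python
from collections import Counter


def categorical_partition(values):
    """Return categories and their relative frequencies, ignoring None."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    ordered = sorted(counts)  # sort categories by value
    total = len(non_null)
    return {
        "values": ordered,
        "weights": [counts[v] / total for v in ordered],
    }


def distinct_values_in_set(values, value_set):
    """Return True if every distinct non-null value is in value_set."""
    distinct = {v for v in values if v is not None}
    return distinct <= set(value_set)


# Invented sample of type codes, for illustration only.
codes = [3, 3, 4, 5, None]
partition = categorical_partition(codes)
ok = distinct_values_in_set(codes, [3, 4, 5])
```

Under this sketch, a value such as 6 in the column would make the check fail, which is the kind of surprise the expectation is meant to surface.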
