GitHub - alanbalugu/Cancer-and-Carcinogens-COSC287: Data Science GitHub Repo Fall 2019

netids: agb76, pnl8, kie3, akh70

Twitter:

to download twitter data as csv files:
    run GetTwitterDataByHandle.py
    this script reads twitterhandles.txt to fetch each listed handle's data

to view original twitter data in csv format:
    open UncleanedTwitterData.zip
    contains separate csv files for data from all twitter handles in twitterhandles.txt

to clean twitter data:
    run twitterDataCleaning.py
    also run combineCSV.py
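
The combining step could look roughly like the sketch below, which concatenates per-handle csv files into one file with a single header. The function name `combine_csv_files` and the assumption that all files share the same header are illustrative only; the actual combineCSV.py may work differently.

```python
import csv


def combine_csv_files(input_paths, output_path):
    """Concatenate CSV files that share a header into one file.

    Hypothetical helper, not taken from combineCSV.py.
    Returns the number of data rows written.
    """
    rows_written = 0
    header = None
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in input_paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    # first non-empty file defines the shared header
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Raising on a header mismatch is a design choice for the sketch: it surfaces files with a different column layout instead of silently misaligning them.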

to view cleaned twitter data in csv format:
    open CleanedTwitterData.zip
    contains separate csv files for cleaned data from all twitter handles in twitterhandles.txt
    also contains one larger csv with compiled cleaned data from all handles

EPA Data:

to download epa data as csv files:
    run epa_downloader.py

to view original epa data in csv format:
    open 'original' folder
    contains two files:
        epa_data_state_releases.csv
        epa_data_state_chems_and_releases.csv

to clean epa data:
    run clean_trim_EPA.py

to view cleaned epa data in csv format:
    open 'cleaned' folder
    contains two files:
        epa_data_state_releases_cleaned.csv
        epa_data_state_chems_and_releases_cleaned.csv

to generate a frame with states as rows and years as columns:
    run epa_generate_state_year_frame.py

to view the frame with states as rows and years as columns:
    open epa_state_year_frame.csv
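
The state-by-year frame described above can be illustrated with a small pivot. This is a minimal sketch using only the standard library; the function names, the `(state, year, value)` record shape, and the choice to sum repeated entries are assumptions, not the logic of epa_generate_state_year_frame.py.

```python
from collections import defaultdict


def build_state_year_frame(records):
    """Pivot (state, year, value) records into {state: {year: total}}.

    Values that share a state and year are summed (an assumption).
    """
    frame = defaultdict(lambda: defaultdict(float))
    for state, year, value in records:
        frame[state][year] += value
    return {state: dict(years) for state, years in frame.items()}


def frame_to_rows(frame):
    """Flatten the frame into a header row plus one row per state."""
    years = sorted({y for per_state in frame.values() for y in per_state})
    rows = [["state"] + years]
    for state in sorted(frame):
        # missing state/year combinations are left as empty cells
        rows.append([state] + [frame[state].get(y, "") for y in years])
    return rows
```

The rows returned by `frame_to_rows` can be passed directly to `csv.writer(...).writerows(...)` to produce a csv with one row per state and one column per year.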

to view EPA statistics:
    run epa_stats.py

to view EPA graphs:
    open 'epa_graphs' folder

CDC Data:

to view aggregated cancer stats from 1999-2016 by state:
    open USCS_CancerTrends_OverTime_ByState.csv

to generate csv files from the CDC dataset on chronic disease indicators for cancer:
    run cdc_download_clean.py
    this writes out the original, uncleaned CDC data, then cleans it to create a new dataset
    creates two files:
        CDC_API.csv, which contains the original, uncleaned data from CDC.gov
        CDC_API_Clean.csv, which contains the dataframe after cleaning, sorting, and removing rows containing null data
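
The cleaning described above (removing rows with null data, then sorting) can be sketched as follows. The helper name `clean_rows`, the dict-per-row representation, and the example column names are assumptions for illustration; cdc_download_clean.py may apply additional or different steps.

```python
def clean_rows(rows, sort_keys):
    """Drop rows with any missing value, then sort by the given keys.

    rows is a list of dicts, as produced by csv.DictReader.
    A value counts as missing if it is None or blank.
    """
    def is_missing(value):
        return value is None or str(value).strip() == ""

    kept = [row for row in rows if not any(is_missing(v) for v in row.values())]
    return sorted(kept, key=lambda row: tuple(row[k] for k in sort_keys))


# Hypothetical usage with made-up column names:
# cleaned = clean_rows(raw_rows, sort_keys=("state", "year"))
```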

to view CDC API statistics:
    run cdc_api_statistics.py

to generate NaiveBayes and RandomForest classification and other analysis for USCS:
    run FINAL_CDC_P2.py

to generate heat maps comparing the regions for USCS:
    run CDC_regions_heatmaps.py

to run association rule mining for USCS data:
    run AssociationRuleMining.py

to generate clusters (DBSCAN, hierarchical, and k-means) for USCS data:
    run CdcClustering.py

Merged Data:

to view our final merged data set containing both CDC and EPA data for every year and state:
    open merged_data.csv

to merge EPA and CDC data into the final data set:
    run merge_data.py
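
Since the merged data set pairs CDC and EPA data for every year and state, the merge is presumably a join on those two fields. The sketch below shows one way to do such a join; the function name and the column names (`state`, `year`, `releases`, `cancer_rate`) are assumptions and are not taken from merge_data.py.

```python
def merge_on_state_year(epa_rows, cdc_rows):
    """Inner-join two lists of dicts on the ('state', 'year') pair.

    Only state/year combinations present in both inputs are kept.
    """
    cdc_index = {(r["state"], r["year"]): r for r in cdc_rows}
    merged = []
    for epa in epa_rows:
        cdc = cdc_index.get((epa["state"], epa["year"]))
        if cdc is not None:
            # CDC fields are added alongside the EPA fields
            merged.append({**epa, **cdc})
    return merged
```

An inner join is used here so that every merged row has both EPA and CDC values; whether the actual script keeps unmatched rows is not stated in this README.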

to run KNN classification on merged data:
    run kNearestNeighbors.py
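
For readers unfamiliar with KNN classification, the idea can be shown in a few lines of standard-library Python. This is a generic illustration of the technique, not the code in kNearestNeighbors.py; the function name, the Euclidean distance metric, and the default `k=3` are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    Distance is Euclidean (math.dist, Python 3.8+).
    """
    distances = sorted(
        ((math.dist(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Features on very different scales would dominate the distance, so in practice the inputs are usually normalized before a classifier like this is applied.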

to run decision tree classification on merged data:
    run decision_tree.py
