Scrapes data from a website; cleans, filters, and analyzes the data; and maps it.

Webscraping data from a website and mapping it

The code in this repository scrapes data from a publicly available website and displays key information on a map so it is easier to understand. Specifically, the code scrapes data from a Department of Defense (DOD) website; cleans, filters, and compares the data to proposed environmental standards; and then maps it so exceedances of these standards can be easily visualized.

This repository contains two Jupyter notebooks:

A. Creating and cleaning data for mapping - GH.ipynb.

Inputs: Two URLs that return responses from a Department of Defense (DOD) website. One contains metadata, specifically the geospatial locations of military bases in the United States. The other contains per- and polyfluoroalkyl substance (PFAS) concentrations measured at those bases, in JSON format.

Outputs: Multiple data tables that could be exported. The most valuable are two data tables that report the highest concentration of PFAS for each military base, whether or not that concentration exceeds the Environmental Protection Agency's (EPA) proposed standard or guideline (4 parts per trillion (ppt) or 70 ppt, respectively), and the associated geospatial data for each base.
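For context, the two exceedance flags can be sketched in pandas. The column names below are hypothetical stand-ins, not the DOD dataset's actual fields:

```python
import pandas as pd

# Hypothetical sample of PFAS results in ppt; the real data comes from the DOD API.
df = pd.DataFrame({
    "base": ["Base A", "Base B", "Base C"],
    "analyte": ["PFOA", "PFOS", "PFOA"],
    "result_ppt": [2.5, 85.0, 12.0],
})

# Flag results against the EPA's proposed 4 ppt standard
# and the older 70 ppt health-advisory guideline.
df["exceeds_4ppt"] = df["result_ppt"] > 4
df["exceeds_70ppt"] = df["result_ppt"] > 70
```

Each flag column holds True/False per sample, which is what the notebook later filters and maps on.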

The code does the following:

  1. Sends an API request to a public DOD website and pulls in the geospatial locations of military bases in the United States and the PFAS concentrations measured at those bases, in JSON format.
  2. Accesses JSON sublists if needed and converts the lists to a pandas DataFrame.
  3. Cleans the data by removing NaN values and then filters the data in a variety of ways, such as selecting only PFOA and PFOS (the forms of PFAS for which the EPA has proposed standards).
  4. Compares the concentrations of PFOA and PFOS to the EPA's proposed standard of 4 ppt and identifies samples that exceed it by marking them with True in a new column.
  5. Filters the data again in a variety of ways (e.g., selecting only bases that exceed this standard, counting the bases with exceedances, and counting the bases with exceedances and no treatment system).
  6. Filters the data so that only the highest concentration of PFOA or PFOS is selected for each military base. This allows for easy viewing on a map.
  7. Merges the geospatial locations (i.e., latitudes and longitudes) with the highest PFAS concentration for each military base to allow for mapping.
  8. Repeats steps 4-7 using 70 ppt instead of 4 ppt; 70 ppt is an older guidance threshold that the EPA had published previously.
  9. Cleans and sorts date fields to determine the span of dates over which the PFAS samples were collected.
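The core of steps 2-7 can be sketched as below. The JSON payloads and field names are simplified assumptions standing in for the DOD API responses, not the site's actual schema:

```python
import pandas as pd

# Hypothetical JSON payloads standing in for the two DOD API responses.
locations_json = [
    {"base": "Base A", "lat": 35.1, "lon": -79.0},
    {"base": "Base B", "lat": 30.5, "lon": -86.5},
]
results_json = [
    {"base": "Base A", "analyte": "PFOA", "result_ppt": 2.5},
    {"base": "Base A", "analyte": "PFOS", "result_ppt": 9.0},
    {"base": "Base B", "analyte": "PFOA", "result_ppt": 12.0},
    {"base": "Base B", "analyte": "PFOS", "result_ppt": None},
]

locations = pd.DataFrame(locations_json)
results = pd.DataFrame(results_json)

# Clean and filter: drop missing results, keep only PFOA/PFOS.
results = results.dropna(subset=["result_ppt"])
results = results[results["analyte"].isin(["PFOA", "PFOS"])]

# Flag exceedances of the proposed 4 ppt standard.
results["exceeds_4ppt"] = results["result_ppt"] > 4

# Keep only the highest PFOA/PFOS concentration per base for mapping.
highest = results.loc[results.groupby("base")["result_ppt"].idxmax()]

# Merge in latitude/longitude so each row can be placed on a map.
mapped = highest.merge(locations, on="base", how="left")
```

Swapping 4 for 70 in the comparison reproduces step 8 against the older guidance threshold.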

B. Mapping PFAS webscraped data GH.ipynb.

Inputs:

  1. A CSV file generated by the "Creating and cleaning data for mapping - GH.ipynb" notebook. This CSV file contains a data table reporting the military bases that exceed the EPA's proposed standard of 4 ppt.
  2. A basemap shapefile of state outlines from the USGS.

Outputs: A map displaying the military bases in the continental US with PFAS samples that exceed the EPA's proposed standard of 4 ppt.

The code does the following:

  1. Reads the CSV file generated by the first Jupyter notebook and converts it to a pandas DataFrame. This CSV file contains a data table reporting the military bases that exceed the EPA's proposed standard of 4 ppt.
  2. Filters the DataFrame down to only the information necessary for mapping.
  3. Uses geopandas to create a geometry column that combines the latitude and longitude of each military base into a point.
  4. Filters the data so it only includes the continental US, which makes the map easier to view.
  5. Pulls in the basemap from the USGS and ensures the basemap and the military-base data use the same projection (WGS84).
  6. Maps the military bases on top of the basemap. Run locally, this notebook produces a pop-out window with the map that lets the user zoom in to see the labels more clearly.
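Step 4's continental-US filter can be approximated with a latitude/longitude bounding box, sketched here in plain pandas. The bounds are rough assumptions, and the actual notebook works on a geopandas GeoDataFrame with a WGS84 CRS rather than raw columns:

```python
import pandas as pd

# Hypothetical per-base rows with coordinates; the notebook derives these
# from the merged CSV produced by the first notebook.
bases = pd.DataFrame({
    "base": ["Base A", "Base B", "Base C"],
    "lat": [35.1, 21.3, 64.8],     # Base B ~ Hawaii, Base C ~ Alaska
    "lon": [-79.0, -157.8, -147.7],
})

# Rough bounding box for the contiguous (continental) United States.
conus = bases[
    bases["lat"].between(24.5, 49.5) & bases["lon"].between(-125.0, -66.9)
]
```

Dropping Alaska and Hawaii this way keeps the plotted extent tight, so base labels in the continental US remain legible.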
