r-barnes/waterviz


WaterViz

By Richard Barnes.
See the live map and the source code.

Introduction

WaterViz aims to provide a high-level view of water availability and factors affecting its quantity and quality by providing a visual overview of real-time river conditions in the conterminous United States. All U.S. rivers and all active U.S. gauge stations are colored and sized based on how their current discharge rate ranks against a thirty-year history. Optionally, current land use can be displayed, as well as an analysis of how land use has changed in proximity to water. Historic hurricane tracks are provided to better explain river surges due to high rain volumes.

Details

WaterViz.com aims to provide an intuitive view of the current and historic state of water in the conterminous United States. To do so, it presents all of the relevant information in a mapped form.

A walk-through video is here and a video showing how WaterViz.com can be used to facilitate analysis is here.

Hydrographic data drawn from the National Hydrography Dataset Plus allows the map to display all rivers and active U.S. gauge stations. These are colored and sized based on how their current discharge rate ranks against a thirty-year history. Hovering over rivers and stations will display their current discharge rate and stage height, along with links for more information. Current discharge rates are drawn from the National Water Information System.
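As a rough illustration of the ranking idea, the sketch below computes where a current discharge value falls within a station's historical record, expressed as a percentile. The function name and the sample values are hypothetical, and it uses only the Python standard library; the project lists Scipy for flow rankings, so the actual method may differ.

```python
from bisect import bisect_left, bisect_right


def discharge_percentile(history, current):
    """Return the percentile rank (0-100) of `current` within `history`.

    Ties are handled by averaging the strict and weak ranks.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    ordered = sorted(history)
    below = bisect_left(ordered, current)         # observations strictly less
    at_or_below = bisect_right(ordered, current)  # observations <= current
    return 50.0 * (below + at_or_below) / len(ordered)


# Hypothetical historical discharges (cubic feet per second) for one gauge.
history_cfs = [120.0, 95.0, 300.0, 210.0, 180.0, 150.0, 400.0, 130.0]
print(discharge_percentile(history_cfs, 250.0))
```

A percentile like this could then drive the colour and width of a river or station marker, with low percentiles indicating unusually low flow and high percentiles unusually high flow.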

To facilitate understanding of current conditions, the 2011 edition of the National Land Cover Database has been used to generate a base layer showing 20 land-use classes across 13 distinct zoom levels.

To facilitate understanding of the relationships between land-use and water, the National Hydrography Dataset Plus was burned into raster format and proximity-to-water rasters were developed. These were used in conjunction with the 2001 and 2011 editions of the National Land Cover Database to quantify how land-use has changed with regard to proximity to water over the past decade.
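The idea of combining a proximity-to-water raster with two land-cover rasters can be sketched on a toy grid. This is a minimal illustration, not the project's actual pipeline: the function name, the tiny 3x3 grids, and the 30-unit distance threshold are all assumptions. The codes 21 to 24 are the NLCD "developed" classes.

```python
# NLCD class codes for developed land (open space through high intensity).
DEVELOPED = {21, 22, 23, 24}


def near_water_development(lc_old, lc_new, dist_to_water, max_dist):
    """Count cells within `max_dist` of water that became developed.

    Returns (newly_developed_cells, total_near_water_cells).
    """
    near = changed = 0
    for row_old, row_new, row_dist in zip(lc_old, lc_new, dist_to_water):
        for old, new, dist in zip(row_old, row_new, row_dist):
            if dist > max_dist:
                continue  # too far from water to count as near-shore
            near += 1
            if old not in DEVELOPED and new in DEVELOPED:
                changed += 1
    return changed, near


# Hypothetical 3x3 rasters: the left column is open water (class 11).
lc_2001 = [[11, 41, 41], [11, 81, 41], [11, 81, 82]]
lc_2011 = [[11, 22, 41], [11, 23, 41], [11, 81, 21]]
distance = [[0, 30, 60], [0, 30, 60], [0, 30, 60]]

changed, near = near_water_development(lc_2001, lc_2011, distance, max_dist=30)
print(changed, near)
```

Dividing the first number by the second gives a per-area fraction of near-shore development, which is the kind of statistic that can be aggregated by county.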

This information is displayed on a per-county basis (using the Census TIGER/Line files) where counties are coloured from white to deep red depending on how much near-shore development they have incurred.
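A white-to-deep-red colour ramp of this kind can be sketched as a linear interpolation between two RGB endpoints. The endpoint colours, the function name, and the assumption that the development measure has been normalized to the range 0 to 1 are illustrative choices, not taken from the project.

```python
def development_color(fraction, low=(255, 255, 255), high=(139, 0, 0)):
    """Linearly interpolate from white to deep red and return '#rrggbb'.

    `fraction` is assumed to be normalized to [0, 1]; values outside
    that range are clamped.
    """
    t = min(max(fraction, 0.0), 1.0)
    rgb = [round(lo + (hi - lo) * t) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


for value in (0.0, 0.5, 1.0):
    print(value, development_color(value))
```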

Put another way, the foregoing provides an analysis of riparian buffer zones. The lack of such zones has much to do with poor water quality and dead zones downstream, and policy action to expand the size and number of buffer zones has recently become a hot topic.

Hurricane data has been overlaid in order to facilitate understanding of the causes for high water volumes.

Future improvements to the project include optimizing server-side database queries and caching. While the project is set up to update data in real-time, updating has been disabled for the next couple of weeks as a proof-of-concept to ensure speedy performance. Additionally, improvements will be made to the user interface to facilitate deeper exploration of the data.

In terms of project stats, the server holds 50 gigabytes of explorable data. To perform the county-level buffer strip analysis, over 11 terabytes of intermediate products were generated and a few hundred hours of supercomputing time were expended. Being a student can have its benefits!

Technical details are available on the project's GitHub page.

Technology

Server Side

  1. PostgreSQL and PostGIS for a geospatial database. PostgreSQL 9.1 or later and PostGIS 2 are recommended for ease of installing the PostGIS extension. This database is moderately large; you may want to [tune Postgres settings](http://nelsonslog.wordpress.com/2011/10/12/quick-postgresql-tuning-notes/) to use more memory.
  2. TileStache for the Python web app that serves map tiles. TileStache has an undocumented dependency on Shapely that you can install via pip.
  3. Gunicorn: A Python web server container
  4. Flask: A Python microframework web server
  5. Numpy: Used for number crunching in determining county statistics
  6. Scipy: Used to determine flow rankings
  7. Psycopg2: A Python library for interfacing with PostGIS
  8. Nginx: A light-weight, secure, fast web server through which we will proxy all the other servers to gain security and caching
  9. pip: Used to install the latest Python packages
  10. p7zip for unpacking NHDPlus and NLCD data. Ubuntu users be sure to install p7zip-full.
  11. shp2pgsql, part of PostGIS, for importing ESRI shapefiles into PostGIS
  12. pgdbf for importing DBF databases into PostgreSQL. Note you need at least version 0.6.2 for the -s flag.
  13. requests: Used to retrieve real-time hydrographic data
  14. gdal: For creating NLCD tiles and performing statistics on them

Client side

  1. Leaflet: A simple, blazin' fast map handlin' library
  2. D3.js: Data-driven documents, allows quick loading of river networks
  3. Underscore.js: A JS functional programming library
  4. jQuery: A library for expediting JS DOM manipulations
  5. jQuery DatePicker: A light-weight JS date picker
  6. Moment.js: For handling time
  7. Turf.js: Makes hurricane tracks smooth with bezier curves

Data Sources

  1. NHDPlus: Source for river flowlines, gauge locations, gauge information, and gauge history
  2. NLCD 2011: Source for the land use information
  3. Census TIGER/Line: Source for county outlines
  4. National Water Information System: Source of real-time hydrography data
  5. IBTrACS-WMO Hurricane Data: Source of historic hurricane tracks and windspeeds

Getting started

This project contains everything you need from start to finish to make a vector-based web map of American rivers in the contiguous 48 states. There are three parts to the project: data preparation, HTTP serving of vector tiles, and clients that render maps.

Quick start

  • Install the aforementioned software.
  • Run dataprep/downloadNhd.sh to download data to a directory named "NHD".
  • Run dataprep/importNhd.sh to bring NHD data into a PostGIS database named "rivers".
  • Run serve.sh from inside the server directory to start TileStache in Gunicorn at http://localhost:8000/.
  • Load a sample tile on localhost to verify GeoJSON tiles are being served.
  • Run server/gauges.py from within its directory to serve up information needed for styling rivers and counties.
  • Set up a cron job to run server/gauges_backend.py to keep new data flowing in.
  • Load clients/index.html to view the map.
  • Use server/nginx-rivers.conf to configure the nginx server.
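To pick a sample tile for the verification step above, it helps to convert a longitude and latitude into slippy-map tile coordinates. The sketch below uses the standard Web Mercator tile formula. The URL layout (layer name `rivers`, `.json` extension, z/x/y path order) is an assumption based on the usual TileStache convention and the layer name mentioned in this README, so adjust it to match your configuration.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Convert WGS84 lon/lat to slippy-map tile column and row at `zoom`."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(lon, lat, zoom, base="http://localhost:8000",
             layer="rivers", ext="json"):
    """Build a tile URL; the base, layer, and extension are assumed defaults."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return "{}/{}/{}/{}/{}.{}".format(base, layer, zoom, x, y, ext)


# A point near San Francisco Bay, which falls inside the default California data.
print(tile_url(-122.4, 37.8, 10))
```

Opening the printed URL in a browser, or fetching it with curl, should return GeoJSON if the server is running and the layer is configured as assumed.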

About vector tiles

Vector tiles are an exciting, underutilized idea to make efficient maps. Google Maps revolutionized online cartography with "slippy maps", raster maps that are a mosaic of PNG or JPG images. But a lot of geographic data is intrinsically vector oriented: lines and polygons. Today many map servers render vector data into raster images that are then served to clients. But serving the vector data directly to the user's browser for rendering on the client can make maps that are more flexible and more efficient.

In this project, we use vector tiles to serve up all of the rivers in the United States. We are then able to style these rivers client-side based on recent hydrographic data.

Extra Ubuntu 14.04 details

A partial list of installation instructions:

# Install needed software with apt and pip
sudo apt-get install git p7zip-full python-pip postgresql-server-dev-all python-dev libevent-dev gdal-bin postgis postgresql-client postgresql pgdbf nginx
sudo pip install psycopg2 gunicorn tilestache requests grequests shapely --allow-external PIL --allow-unverified PIL

# Postgres needs to be set up with an appropriate user login.
sudo -u postgres createuser -s -d nelson

# Configure Postgres to let the user connect without a password by specifying the "trust" method
# (or else alter the code to supply a password)
sudoedit /etc/postgresql/9.3/main/pg_hba.conf

# Optionally tune Postgres performance
sudoedit /etc/postgresql/9.3/main/postgresql.conf

Project components

This project consists of several short scripts and configuration files to glue together the software components. There is precious little programming logic here; most of it is integration.

  • dataprep/downloadNhd.sh downloads data from [NHDPlus](http://www.horizon-systems.com/nhdplus/), a nice repository of cleaned up National Hydrographic Data distributed as ESRI shapefiles. This shell script takes care of downloading the files and then extracting the specific data files we're interested in. NHDPlus is a fantastic resource if you're interested in mapping water in the United States. Note by default the script only downloads data for California; edit the script if you want the entire US.

  • dataprep/importNhd.sh imports the NHDPlus data into PostGIS and prepares it for serving. This script borrows ideas from Seth Fitzsimmons' NHD importer. Note that detailed output is logged to a file named /tmp/nhd.log.*; see the first line of script output for details. The steps this script takes are:

    1. Create a database named `rivers`
    2. Import NHDFlowline shapefiles into a table named `nhdflowline`
    3. Import PlusFlowlineVAA DBF files into a table named `plusflowlinevaa`
    4. Run `processNhd.sql` to create a table named `rivers`
    5. Run `mergeRivers.py` to create a table named `merged_rivers`

  • dataprep/processNhd.sql converts the imported data into a format more tailored to our needs. It makes a new table named rivers which joins the geometry from NHDFlowline with metadata such as river name, reach code, and Strahler number from PlusFlowlineVAA. It has about 2.7 million rows for the whole US. (NHDFlowline has nearly 3 million rows; flowlines which have no comid in PlusFlowlineVAA are discarded.)

  • dataprep/mergeRivers.py optimizes the data by merging geometry. NHD data has many tiny little rows for a single river. For efficiency we merge geometries based on river ID and the HUC8 portion of the reach code. The resulting merged_rivers table has about 330,000 rows. This step is complex and not strictly necessary: TileStache can serve the geometry in the rivers table directly. But the resulting GeoJSON is large and slow to render; merging each river into a single LineString or MultiLineString results in vector tiles roughly one tenth the size and time to process.

  • server/serve.sh is a simple shell script to invoke Gunicorn and the TileStache webapp and serve it at http://localhost:8000/. In a real production deployment this should be replaced with a server management framework. (It's also possible to serve TileStache via CGI, but it's terribly slow.)

  • server/gunicorn.cfg.py is the Gunicorn server configuration. There's very little here in this example, although Gunicorn has many configuration options.

  • server/tilestache.cfg sets up TileStache to serve a single layer named rivers from the merged_rivers table, backed by a cache in /tmp/stache. It uses the VecTiles provider, the magic in TileStache that takes care of doing PostGIS queries and preparing nicely cropped GeoJSON tiles. At this layer we start making significant cartographic decisions.
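The grouping performed by dataprep/mergeRivers.py, as described in the list above, can be sketched in plain Python. This is a simplified illustration under stated assumptions: the function name and row layout are hypothetical, the HUC8 is taken as the first eight characters of the reach code, and segments are only collected per group rather than stitched end to end. The real script works against PostGIS and may differ.

```python
from collections import defaultdict


def merge_flowlines(rows):
    """Group flowline segments by (river ID, HUC8) and collect their geometry.

    `rows` is an iterable of (river_id, reachcode, coords) tuples, where
    `coords` is one segment's list of (lon, lat) points.
    """
    groups = defaultdict(list)
    for river_id, reachcode, coords in rows:
        huc8 = reachcode[:8]  # assumed: HUC8 is the reach code's first 8 digits
        groups[(river_id, huc8)].append(coords)

    merged = []
    for (river_id, huc8), parts in sorted(groups.items()):
        if len(parts) == 1:
            geometry = {"type": "LineString", "coordinates": parts[0]}
        else:
            geometry = {"type": "MultiLineString", "coordinates": parts}
        merged.append({"river_id": river_id, "huc8": huc8, "geometry": geometry})
    return merged


# Hypothetical segments: two belong to the same river and HUC8, one does not.
rows = [
    ("r1", "18050002000123", [(-122.0, 37.0), (-122.1, 37.1)]),
    ("r1", "18050002000124", [(-122.1, 37.1), (-122.2, 37.2)]),
    ("r2", "18050003000001", [(-121.0, 38.0), (-121.1, 38.1)]),
]
for feature in merge_flowlines(rows):
    print(feature["river_id"], feature["huc8"], feature["geometry"]["type"])
```

Fewer, longer features mean fewer GeoJSON objects per tile, which is where the size and rendering savings described above come from.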

Cartographic decisions

Some cartographic decisions are made on the server side. The TileStache VecTiles configuration contains an array of queries that return results at different zoom levels. At low zoom levels (say z=4), where the map is zoomed far out, we only return rivers which are relatively big, those with a Strahler number of 6 or higher. At finer-grained zoom levels we return more and smaller rivers. This per-zoom filtering both limits the bandwidth used on zoomed-out maps and prevents the display from being overcluttered. Rendering zillions of tiny streams can be quite beautiful, but it is also resource intensive.

VecTiles also simplifies the geometry, serving only the precision needed at the zoom level. You can see this in action if you watch it re-render as you navigate; rivers will start to grow more bends and detail as you zoom in. TileStache does that for us automatically.
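The per-zoom filtering can be sketched as a lookup from zoom level to a minimum Strahler number. Only the z=4 threshold of 6 comes from the description above; the other thresholds, the names, and the in-memory filtering are hypothetical. In the project the filtering is done by per-zoom queries in the TileStache configuration, not in Python.

```python
# (maximum zoom, minimum Strahler number); only the first pair is from the text.
ZOOM_THRESHOLDS = [(4, 6), (6, 5), (8, 4), (10, 3), (12, 2)]


def min_strahler_for_zoom(zoom):
    """Return the smallest Strahler number shown at the given zoom level."""
    for max_zoom, min_strahler in ZOOM_THRESHOLDS:
        if zoom <= max_zoom:
            return min_strahler
    return 1  # beyond the last threshold, show every stream


def rivers_for_zoom(rivers, zoom):
    """Keep only the rivers large enough to display at `zoom`."""
    threshold = min_strahler_for_zoom(zoom)
    return [river for river in rivers if river["strahler"] >= threshold]


# Hypothetical rivers with made-up Strahler numbers.
sample = [
    {"name": "big river", "strahler": 7},
    {"name": "tributary", "strahler": 4},
    {"name": "creek", "strahler": 1},
]
for zoom in (4, 8, 13):
    print(zoom, [river["name"] for river in rivers_for_zoom(sample, zoom)])
```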

Credits

Nelson Minar developed the original river visualization and wrote some or much of the text above describing it. I have added all the real-time connections, colour and size styling, county buffer analysis, and NLCD tiles.
