dhhs-cooling-towers

Tools to extract cooling tower locations from aerial imagery

Prerequisites

  1. Create WMTS index
    • Run prerequisites\polars_build_offet_index.py
    • This creates a parquet file (imagery_index.parquet) that will be loaded into BigQuery by terraform (a sketch of this kind of index build follows this list)
  2. Create the processing footprint
    • This was done manually in ArcGIS Pro with the following steps:
      1. Buffer Utah Census Places 2020 by 800m
      2. Query Utah Buildings down to those larger than 464.5 sq m (5000 sq ft)
      3. Select by Location the queried buildings that are more than 800m from the census places buffer
      4. Export selected buildings to a new layer
      5. Buffer the new buildings layer by 800m
      6. Combine the buffered census places and buffered buildings into a single polygon layer
      7. Simplify the combined polygon layer to remove vertices
      8. Project the simplified polygon layer to WGS84 (EPSG: 4326)
      9. Export the projected polygon layer to shapefile (processing_footprint.shp)
      10. Convert the processing footprint from shapefile to CSV with geometries represented as GeoJSON using GDAL
        • Use the process outlined in this Blog Post about loading geographic data into BigQuery
        • ogr2ogr -f csv -dialect sqlite -sql "select AsGeoJSON(geometry) AS geom, * from processing_footprint" footprints_in_4326.csv processing_footprint.shp
      11. The footprints_in_4326.csv file will be loaded into BigQuery by terraform
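
The index build referenced in step 1 is driven by polars. Below is a minimal sketch of that kind of row/column tile index, assuming a web mercator WMTS tile scheme, an approximate Utah extent, an assumed zoom level, and made-up column names; the actual logic lives in prerequisites\polars_build_offet_index.py.

    # Hypothetical sketch of building a WMTS row/col tile index as parquet with polars.
    # The zoom level, extent, and column names here are assumptions, not the real script.
    import polars as pl

    ZOOM = 17                            # assumed zoom level
    ORIGIN = 20_037_508.342787           # web mercator origin offset in metres
    TILE_SIZE_M = 2 * ORIGIN / 2**ZOOM   # tile width at the assumed zoom level

    # Approximate Utah extent in EPSG:3857 metres (assumed values).
    X_MIN, X_MAX = -12_700_000, -12_100_000
    Y_MIN, Y_MAX = 4_400_000, 5_200_000

    # Translate the extent into inclusive tile row/col ranges (rows count down from the top).
    col_start = int((X_MIN + ORIGIN) // TILE_SIZE_M)
    col_end = int((X_MAX + ORIGIN) // TILE_SIZE_M)
    row_start = int((ORIGIN - Y_MAX) // TILE_SIZE_M)
    row_end = int((ORIGIN - Y_MIN) // TILE_SIZE_M)

    # Cross join every row with every column and compute each tile's bounding box.
    rows = pl.DataFrame({"row_num": list(range(row_start, row_end + 1))})
    cols = pl.DataFrame({"col_num": list(range(col_start, col_end + 1))})
    index = rows.join(cols, how="cross").with_columns(
        (pl.col("col_num") * TILE_SIZE_M - ORIGIN).alias("x_min"),
        (ORIGIN - (pl.col("row_num") + 1) * TILE_SIZE_M).alias("y_min"),
        ((pl.col("col_num") + 1) * TILE_SIZE_M - ORIGIN).alias("x_max"),
        (ORIGIN - pl.col("row_num") * TILE_SIZE_M).alias("y_max"),
    )

    index.write_parquet("imagery_index.parquet")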

Data preparation

  1. Run the tower scout terraform

    • This is a private GitHub repository
  2. Execute the two data transfers in order

  3. Execute the two scheduled queries in order

  4. Export {PROJECT_ID}.indices.images_within_habitat to GCS

    There is a terraform item for this, but it is not clear how it will work since the data transfers are manual and the table may not exist yet (a sketch of scripting this export follows this list)

    • GCS Location: {PROJECT_ID}.images_within_habitat.csv
    • Export format: CSV
    • Compression: None
  5. Using the Cloud SQL proxy

    1. Create a Cloud SQL table for task tracking

      CREATE TABLE public.images_within_habitat (
         row_num int NULL,
         col_num int NULL,
         processed bool NULL DEFAULT false
      );
      
      CREATE UNIQUE INDEX idx_images_within_habitat_all ON public.images_within_habitat USING btree (row_num, col_num, processed);

    2. Create a Cloud SQL table for the results

      CREATE TABLE public.cooling_tower_results (
         envelope_x_min decimal NULL,
         envelope_y_min decimal NULL,
         envelope_x_max decimal NULL,
         envelope_y_max decimal NULL,
         confidence decimal NULL,
         object_class int NULL,
         object_name varchar NULL,
         centroid_x_px decimal NULL,
         centroid_y_px decimal NULL,
         centroid_x_3857 decimal NULL,
         centroid_y_3857 decimal NULL
      );

    3. Grant access to users

         GRANT pg_read_all_data TO "cloud-run-sa@ut-dts-agrc-dhhs-towers-dev.iam";
         GRANT pg_write_all_data TO "cloud-run-sa@ut-dts-agrc-dhhs-towers-dev.iam";
  6. Import the CSV into the images_within_habitat table (see the import sketch after this list)
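
Step 4 above can also be scripted instead of run through the console. A minimal sketch with the google-cloud-bigquery client is below; the project id and bucket layout are assumptions taken from the grants and GCS location shown above.

    # Hypothetical sketch of exporting indices.images_within_habitat to GCS as CSV.
    # The project id and bucket name are assumptions; match them to the terraform outputs.
    from google.cloud import bigquery

    PROJECT_ID = "ut-dts-agrc-dhhs-towers-dev"  # assumed from the GRANT statements above
    client = bigquery.Client(project=PROJECT_ID)

    job_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.CSV,
        compression=bigquery.Compression.NONE,
    )

    extract_job = client.extract_table(
        f"{PROJECT_ID}.indices.images_within_habitat",
        f"gs://{PROJECT_ID}/images_within_habitat.csv",  # assumed bucket layout
        job_config=job_config,
    )
    extract_job.result()  # block until the export finishes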
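
Step 6's import can then be done through the Cloud SQL proxy with any Postgres client. A minimal sketch using psycopg2's COPY support is below; the connection details, database name, and CSV column order are assumptions.

    # Hypothetical sketch of loading the exported CSV into the task-tracking table
    # through the Cloud SQL proxy. Connection details and file path are assumptions.
    import psycopg2

    connection = psycopg2.connect(
        host="127.0.0.1",  # the Cloud SQL proxy listening locally (assumed)
        port=5432,
        dbname="towers",   # assumed database name
        user="postgres",
        password="change-me",
    )

    with connection, connection.cursor() as cursor, open("images_within_habitat.csv") as csv_file:
        # Assumes the exported CSV has a header row and row_num, col_num columns in that order.
        cursor.copy_expert(
            "COPY public.images_within_habitat (row_num, col_num) "
            "FROM STDIN WITH (FORMAT csv, HEADER true)",
            csv_file,
        )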

To work with the CLI locally

  1. Download the PyTorch model weights file and place it in the tower_scout directory

    • Add URL
  2. Clone the YOLOv5 repository from the parent directory

    git clone https://github.com/ultralytics/yolov5
  3. Create a virtual environment from the parent directory with Python 3.10

    python -m venv .env
    .env\Scripts\activate.bat
    pip install -r requirements.dev.txt
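
With the weights file in tower_scout and the yolov5 clone alongside it, the model can be loaded from the local clone to confirm the environment works. A minimal sketch follows; the weights file name and the sample image path are assumptions.

    # Hypothetical smoke test: load the TowerScout weights through the local yolov5 clone.
    # The weights file name and the sample image path are assumptions.
    import torch

    model = torch.hub.load(
        "./yolov5",                        # the repository cloned in step 2
        "custom",
        path="tower_scout/towerscout.pt",  # assumed name for the weights downloaded in step 1
        source="local",                    # load from the local clone instead of GitHub
    )

    results = model("sample_tile.jpg")     # any local aerial image tile
    results.print()                        # prints detected objects and confidences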

CLI

To work with the CLI

  1. Create a Python environment and install requirements.dev.txt into that environment
  2. Execute the CLI to see the commands and options available
    • python cool_cli.py

Testing

Cloud Run Job

To test with a small amount of data

  1. Set the number of tasks to 1
  2. Set the environment variables
    • SKIP: int e.g. 1106600
    • TAKE: int e.g. 50
    • JOB_NAME: string e.g. alligator

To run a batch job

  1. Set the number of tasks to your desired value e.g. 10000
  2. Set the concurrency to your desired value e.g. 35
  3. Set the environment variables
    • JOB_NAME: string e.g. alligator
    • JOB_SIZE: int e.g. 50 (this value needs to be processable within the timeout)

Our metrics show that we can process 10 jobs a minute. The default Cloud Run timeout is 10 minutes.
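
Below is a minimal sketch of how a task might turn those environment variables into a slice of images_within_habitat rows. The CLOUD_RUN_TASK_INDEX variable is set automatically for Cloud Run job tasks; the partitioning logic itself is an assumption, not the job's actual code.

    # Hypothetical sketch of deriving a task's row slice from the job environment variables.
    import os

    task_index = int(os.getenv("CLOUD_RUN_TASK_INDEX", "0"))  # provided by Cloud Run jobs
    job_name = os.getenv("JOB_NAME", "alligator")

    if os.getenv("SKIP") is not None:
        # Small test run: a single task processes TAKE rows starting at SKIP.
        skip = int(os.environ["SKIP"])
        take = int(os.environ["TAKE"])
    else:
        # Batch run: each task claims its own JOB_SIZE-row window.
        take = int(os.environ["JOB_SIZE"])
        skip = task_index * take

    print(f"{job_name}: task {task_index} will process rows {skip} through {skip + take - 1}")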

References for Identifying Cooling Towers in Aerial Imagery