scarp-reduce

Sandbox for map-reduce style distributed template matching. Very much a work in progress, not for general consumption yet.

Requirements

  • Python
  • The usual suspects (pip install -r requirements.txt)
  • scarplet
  • AWS EC2, S3, EFS, CloudWatch

Core functionality:

  • Worker.py: Classes for matcher and reducer instances
  • match.py: Start and maintain a template matching worker
  • reduce.py: Initialize a reducer that reduces the working results directory, then exits
  • reduce_loop.py: Start a reducer that reduces all intermediate results until the working data directory is empty (see the sketch after this list)
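
The reduce loop is not documented in detail here, so the following is only a minimal sketch of what reduce_loop.py's behavior could look like, assuming partial results are NumPy arrays in a shared EFS directory whose first band is SNR and that reduction keeps the pixelwise best-SNR values. The directory paths, file format, and merge rule are all assumptions, not the repository's actual implementation.

```python
# Hypothetical sketch only: the paths, .npy format, and best-SNR merge rule
# are assumptions, not scarp-reduce's actual code.
import time
from pathlib import Path

import numpy as np

DATA_DIR = Path("/efs/data")        # input tiles waiting to be matched (assumed)
RESULTS_DIR = Path("/efs/results")  # intermediate results written by matchers (assumed)
POLL_INTERVAL = 30                  # seconds to wait when no results are ready


def merge(a, b):
    """Combine two partial results, keeping each pixel with the higher SNR.

    Assumes each result is a (nbands, ny, nx) array whose first band is SNR.
    """
    mask = b[0] > a[0]
    out = a.copy()
    out[:, mask] = b[:, mask]
    return out


def reduce_once():
    """Merge the first two partial results found; return True if work was done."""
    files = sorted(RESULTS_DIR.glob("*.npy"))
    if len(files) < 2:
        return False
    np.save(files[0], merge(np.load(files[0]), np.load(files[1])))
    files[1].unlink()
    return True


if __name__ == "__main__":
    # Keep merging intermediate results; exit once the working data
    # directory is empty and no pair of results remains to reduce.
    while True:
        if not reduce_once():
            if not any(DATA_DIR.iterdir()):
                break
            time.sleep(POLL_INTERVAL)
```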

Utilities:

  • launch_instaces.py: Various convenience functions for AWS EC2 instance management
  • monitor.py: Monitors and restarts idle instances
  • manage_data.py: Fetches tiles from an S3 bucket in batches, either the entire bucket or a subset beginning at a specified starting file (see the sketch after this list)
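
As a rough illustration of a batched fetch like the one manage_data.py performs, the snippet below pages through an S3 listing with boto3 and an optional start key. The bucket name, page size, and destination directory are placeholders, not the repository's configuration.

```python
# Hypothetical sketch only: bucket, prefix, page size, and destination
# directory are placeholders.
import os

import boto3


def fetch_tiles(bucket, prefix="", start_after=None, batch_size=100, dest="data"):
    """Download tiles from an S3 bucket in pages of batch_size objects,
    optionally starting after a given key (the "starting file")."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    kwargs = {
        "Bucket": bucket,
        "Prefix": prefix,
        "PaginationConfig": {"PageSize": batch_size},
    }
    if start_after is not None:
        kwargs["StartAfter"] = start_after
    for page in paginator.paginate(**kwargs):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            s3.download_file(bucket, key, os.path.join(dest, os.path.basename(key)))


# e.g. fetch_tiles("my-tile-bucket", start_after="tiles/fg396_4508.tif")
```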

Input data

The repository also contains various utilities for copying files within a bounding box, tiling a large GeoTIFF dataset, and padding tiles.
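
As an illustration of the tiling step, here is a minimal sketch using rasterio windows (the TODO below mentions moving to rasterio; the current utilities use GDAL). The tile size, output naming, and profile handling are assumptions for the example.

```python
# Hypothetical sketch only: tile size, output naming, and profile handling
# are illustrative; the repository's tiling utilities currently use GDAL.
import os

import rasterio
from rasterio.windows import Window, transform as window_transform


def tile_geotiff(path, tile_size=1000, out_dir="tiles"):
    """Write tile_size x tile_size windows of a large GeoTIFF as separate files."""
    with rasterio.open(path) as src:
        profile = src.profile.copy()
        # Drop internal tiling options that may not fit small edge tiles
        profile.pop("blockxsize", None)
        profile.pop("blockysize", None)
        profile["tiled"] = False
        for row in range(0, src.height, tile_size):
            for col in range(0, src.width, tile_size):
                window = Window(col, row,
                                min(tile_size, src.width - col),
                                min(tile_size, src.height - row))
                profile.update(width=window.width, height=window.height,
                               transform=window_transform(window, src.transform))
                out_path = os.path.join(out_dir, f"tile_{row}_{col}.tif")
                with rasterio.open(out_path, "w", **profile) as dst:
                    dst.write(src.read(window=window))
```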

All processing currently relies on a filename convention based on the UTM coordinates of the tile bounding box, e.g. fg396_4508.tif. This follows EarthScope survey naming conventions:

ccXXX_YYYY.fmt

where:

  • cc is a data code (u for unfiltered, fg for filtered ground returns only, etc.)
  • XXX and YYYY are the most significant digits of the dataset's lower left corner (XXX000, YYYY000) in the UTM coordinate system. In this case, I work in UTM zone 10N (see the parsing example below).
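
For example, the convention above can be parsed as follows; the regular expression and return format are illustrative and not part of the repository.

```python
# Illustrative parser for the ccXXX_YYYY.fmt convention described above.
import re


def parse_tile_name(filename):
    """Return (data_code, easting, northing) from a ccXXX_YYYY.fmt filename."""
    code, x, y = re.match(r"([a-z]+)(\d+)_(\d+)\.\w+$", filename).groups()
    return code, int(x) * 1000, int(y) * 1000


parse_tile_name("fg396_4508.tif")  # -> ('fg', 396000, 4508000)
```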

TODO

See the scarplet-python issue tracker for all tasks related to the scarplet project

  • Refactor to use rasterio instead of GDAL bindings, remove various hacky subprocess calls
  • Tests, ack
  • Add SQS task management from private repo