Mpop 2.0 specification

Mpop 2.0 functional specification

Scenarios

Scenario 1: Interactive usage for scientific prototyping

The installation of mpop 2 should be as easy as pip install mpop2. All the necessary dependencies should be installed automatically, and a list of possible extensions (i.e. optional dependencies), with instructions for installing them, would be nice to have. The scientific user explores the data and prototypes new algorithms. This user needs access not only to the calibrated data, but also to the raw data and probably most of the metadata. The user works with data locally, so it has to be easy to tell mpop 2 where the data is. Providing filename templates or editing a config file before starting work is a pain and should be avoided.

Loading the data should be a simple one-step procedure. At load time, the user specifies the data and metadata needed, and if some items are unavailable or inaccessible, the user should be informed in a gentle but clear way (i.e. no crash). The data and metadata available from the file have to be explorable, so that the user does not need to guess what the (meta)data is called.
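As an illustration of the "gentle but clear" behaviour, here is a minimal sketch. The `load` and `available` helpers and the dictionary standing in for a file are hypothetical and not part of any mpop API; the point is only that a missing item produces a warning and a report, not an exception.

```python
import warnings

# Hypothetical stand-in for the (meta)data found in a file.
FAKE_FILE = {"0.6": [0.1, 0.2], "0.8": [0.3, 0.4], "start_time": "2015-01-01T12:00"}


def available(source):
    """Return the names that can be requested, so nothing has to be guessed."""
    return sorted(source)


def load(source, wanted):
    """Load the requested items; warn about missing ones instead of raising."""
    loaded = {name: source[name] for name in wanted if name in source}
    missing = [name for name in wanted if name not in source]
    if missing:
        warnings.warn("Could not load: " + ", ".join(missing))
    return loaded, missing


# Record the warning here so the example can inspect it afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loaded, missing = load(FAKE_FILE, ["0.6", "10.8"])
```

Returning the list of missing names alongside the data lets interactive users and calling scripts decide for themselves how to react.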

Loading should allow reading small excerpts at a time, so that only a given area is covered.

The data container (scene) should be able to have different versions of the same band (different resolutions, calibrations, satellite/sensors).
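One possible way to hold several versions of the same band is to key the container on more than the band name. The sketch below assumes a hypothetical `DatasetID` tuple and a toy `Scene` class; the field names and sample values are illustrative only.

```python
from collections import namedtuple

# Hypothetical identifier: the same band name may exist in several versions.
DatasetID = namedtuple("DatasetID", "name resolution calibration")


class Scene(dict):
    """Dictionary of datasets keyed by DatasetID (illustrative only)."""

    def select(self, **criteria):
        """Return all identifiers whose fields match the given criteria."""
        return [
            ds_id for ds_id in self
            if all(getattr(ds_id, key) == value for key, value in criteria.items())
        ]


scene = Scene()
scene[DatasetID("0.6", 1000, "reflectance")] = [0.1, 0.2]
scene[DatasetID("0.6", 250, "reflectance")] = [0.1, 0.1, 0.2, 0.2]
scene[DatasetID("0.6", 1000, "counts")] = [100, 200]

versions = scene.select(name="0.6")              # all three versions
fine = scene.select(name="0.6", resolution=250)  # only the 250 m version
```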

In order to process the data, the basic array operations (e.g. arithmetic operations) need to be available to the user in a non-cumbersome way. When combining datasets, the relevant metadata should be kept, while the irrelevant metadata should be removed (e.g. wavelength should not be present in the result of band 0.6um + band 0.8um).
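The metadata rule can be sketched as follows. This toy `Dataset` class is hypothetical and uses plain lists instead of arrays. It assumes one simple definition of "relevant": a metadata item is kept only if both operands carry it with the same value. Other rules are of course possible.

```python
class Dataset:
    """Toy dataset: a list of values plus a metadata dictionary."""

    def __init__(self, values, **info):
        self.values = list(values)
        self.info = info

    def __add__(self, other):
        values = [a + b for a, b in zip(self.values, other.values)]
        # Keep only the metadata on which both operands agree.
        shared = {
            key: value for key, value in self.info.items()
            if key in other.info and other.info[key] == value
        }
        return Dataset(values, **shared)


vis06 = Dataset([1.0, 2.0], sensor="seviri", wavelength=0.6)
vis08 = Dataset([3.0, 4.0], sensor="seviri", wavelength=0.8)
total = vis06 + vis08  # the sensor is kept, the wavelength is dropped
```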

The list of composites available for a given sensor/platform should be accessible from the command line.

Custom composites must be easy to add, either at the prompt or in a file that mpop loads on the fly, so that users can save their work.
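A registry is one way to make composites both listable and easy to add. The sketch below is an assumption about how this could look, not a description of mpop: the `composite` decorator, the `COMPOSITES` dictionary and the composite, sensor and band names are all hypothetical.

```python
COMPOSITES = {}  # hypothetical registry: name -> (sensor, function)


def composite(name, sensor):
    """Decorator registering a composite recipe under a name for a sensor."""
    def register(func):
        COMPOSITES[name] = (sensor, func)
        return func
    return register


def available_composites(sensor):
    """List the composite names registered for one sensor."""
    return sorted(n for n, (s, _) in COMPOSITES.items() if s == sensor)


@composite("three_band", sensor="seviri")
def three_band(bands):
    # Band names are illustrative; a real recipe would also scale the data.
    return [bands["0.6"], bands["0.8"], bands["10.8"]]


@composite("difference", sensor="seviri")
def difference(bands):
    return [b - a for a, b in zip(bands["0.6"], bands["0.8"])]
```

Because registration is just a function call, the same recipe can be typed at the prompt or kept in a file that is imported later.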

Visualization of the data is important and should be an easy one-liner, e.g. show(my_dataset). In a similar way, saving the data to disk should be simple, for example save(dataset, filename), with sensible defaults provided depending on the filename extension (e.g. geotiff for .tif, netcdf for .nc). Saving several datasets at once would be nice to have.
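Choosing the output format from the filename extension could be sketched like this. The `guess_format` helper and its lookup table are hypothetical; only the two extension defaults named above are included.

```python
import os

# Hypothetical defaults: filename extension -> output format name.
DEFAULT_FORMATS = {".tif": "geotiff", ".nc": "netcdf"}


def guess_format(filename, fmt=None):
    """Return the explicit format if given, else a default from the extension."""
    if fmt is not None:
        return fmt
    ext = os.path.splitext(filename)[1].lower()
    try:
        return DEFAULT_FORMATS[ext]
    except KeyError:
        raise ValueError("No default format for extension %r" % ext)
```

An explicit `fmt` argument always wins, so the default never gets in the way of a user who knows what they want.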

Resampling of the dataset should be easily available, with sensible defaults. The list of supported resampling methods, and possibly the list of predefined areas, should be accessible from the command line. Defining new areas must be possible both from the Python prompt and in a custom file.

Coastline/graticule overlay should also be easy.

Memory footprint should be as small as possible, since the user will most likely run on a desktop computer.

Mpop 2 must support at least the same data formats as mpop 1, namely:

H/L-RIT format as provided by Eumetsat for the different geostationary satellites
MODIS l1b
HRPT-AAPP l1b
EPS l1b
VIIRS SDR l1b
NOAA GAC and LAC l1b
Sentinel 1 l1b
NWCSAF MSG and PPS l2 (at least CMa, CT, CTTH, CPP)

This will include concatenation of granule based data.

Scenario 2: as a library in Polar2Grid

Polar2Grid is an all-in-one pre-compiled 64-bit Linux tarball that provides individual bash scripts to create gridded images of satellite imagery data. Mpop 2.0 will act as the internal library used by a future version of Polar2Grid. Eventually Mpop 2.0 will replace all of the core functionality of Polar2Grid. Mpop 2.0 must be able to replace the existing features of Polar2Grid. Not all features must be implemented for the initial versions of Mpop 2.0, but they must not be prohibited by the design.

Polar2Grid is primarily a command line interface on top of Python and may be executed by "non-programmers" or users who are unfamiliar with the details of file formats, projections, resampling, or programming language concepts or syntax. Polar2Grid is usually called via bash scripts wrapping Python, where the user typically provides the frontend (reader) name, the backend (writer) name, and a series of input file paths or directories. Optionally the user can provide a list of grids/areas to be remapped to. Other command line flags allow overriding various component settings, such as remapping parameters or output compression algorithms, or specifying which products (datasets) should be processed. The command line interface of Polar2Grid also allows users to print a list of available datasets that can be loaded based on what files were provided. For example, if you provide VIIRS SVI01 files to the command line script with the --list-products flag, Polar2Grid will print out products that can be created from data in the SVI01 files, but not other VIIRS products that could have been generated if other files were provided. Although Polar2Grid handles this decision making in the usual cases, the current design limits it to the file type: it does not detect whether a variable is actually available in the files provided, but assumes the contents of each file type to be static. Polar2Grid also requires that geolocation files be included in the list of files provided via the command line, although this has been debated among the Polar2Grid team since in some situations these geolocation files can be determined at run time.
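The product listing described above can be sketched as a lookup from file type to products. The table and the product names below are hypothetical, and the code is not taken from Polar2Grid. Like the current design, it decides per file type and does not inspect the file contents.

```python
import os

# Hypothetical table: file-type prefix -> products derivable from that file type.
PRODUCTS_BY_FILE_TYPE = {
    "SVI01": ["i01_reflectance"],
    "SVI04": ["i04_brightness_temperature"],
    "SVM05": ["m05_reflectance"],
}


def list_products(paths):
    """Return the products that can be created from the given files."""
    products = set()
    for path in paths:
        # Assumes the file type is the part of the filename before the first "_".
        prefix = os.path.basename(path).split("_")[0]
        products.update(PRODUCTS_BY_FILE_TYPE.get(prefix, []))
    return sorted(products)
```

Lifting the per-file-type limitation would mean replacing the static table with a check of the variables actually present in each file.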

Polar2Grid is primarily used by the direct broadcast community to provide imagery to forecasters. This usually requires that Polar2Grid provide as much data as possible from what was requested. It logs error messages when products cannot be created due to a failed calculation, missing data, or some other issue, but all other requested products are created and provided to the user as if nothing went wrong. Some products in Polar2Grid also have conditions that must be met before they are produced. For example, reflectance products from VIIRS and MODIS must be part of a scene that is at least 10% daytime and should not be processed otherwise. Similarly, if the user requests that data be mapped to two Areas in a single execution, but the data does not fall in one of the Areas, the products for the "empty" Area should not be written to image files, while the products in the "filled" Area should be provided. This way the forecasters have as much information as possible, no extra time is spent processing unsuccessful products, and no extra bandwidth is used to send empty or mostly empty output images.
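A minimal sketch of this "produce as much as possible" policy follows. The `create_products` function, the recipe layout and the boolean day mask are assumptions made for illustration; how daytime pixels are actually determined is outside the scope of the example.

```python
import logging

LOG = logging.getLogger("sketch")


def day_fraction(day_mask):
    """Fraction of pixels flagged as daytime (sequence of booleans)."""
    return sum(day_mask) / len(day_mask)


def create_products(recipes, day_mask, min_day_fraction=0.10):
    """Create every product possible; log and skip the ones that cannot be made."""
    created = {}
    for name, (needs_daylight, func) in recipes.items():
        if needs_daylight and day_fraction(day_mask) < min_day_fraction:
            LOG.info("Skipping %s: less than %.0f%% of the scene is daytime",
                     name, 100 * min_day_fraction)
            continue
        try:
            created[name] = func()
        except Exception as err:  # one failure must not stop the other products
            LOG.error("Could not create %s: %s", name, err)
    return created


def broken():
    raise IOError("missing data")


recipes = {
    "reflectance": (True, lambda: "refl"),
    "brightness_temperature": (False, lambda: "bt"),
    "broken_product": (False, broken),
}
night_scene = [False] * 19 + [True]  # 5% daytime
result = create_products(recipes, night_scene)
```

Here only the brightness temperature is returned: the reflectance is skipped by its daylight condition and the broken product is logged as an error.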

Due to this command line method of execution, Mpop 2.0 must allow core features to be configurable via either a configuration file or high-level arguments that can be passed via a command line interface. Mpop 2.0 must provide logical built-in defaults (file configurable where possible) so that the number of parameters users must pass is limited. Parameters passed should not require deep knowledge of scientific or programming concepts. For example, due to the command line interface of Polar2Grid, passing a dictionary to a library function/method cannot be required. Informative "INFO" log messages must provide progress information that can be understood by users of any experience level as processing is stepped through. This is especially important before long-running steps such as resampling.
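The layering of built-in defaults, configuration file and command line flags could look like the sketch below, which uses only the standard `argparse` and `configparser` modules. The option names, the `[settings]` section and the default values are hypothetical.

```python
import argparse
import configparser

BUILTIN_DEFAULTS = {"resampler": "nearest", "compression": "lzw"}  # hypothetical


def resolve_settings(argv, config_text=""):
    """Merge settings: built-in defaults < config file < command line flags."""
    settings = dict(BUILTIN_DEFAULTS)

    config = configparser.ConfigParser()
    config.read_string(config_text)
    if config.has_section("settings"):
        settings.update(dict(config["settings"]))

    parser = argparse.ArgumentParser()
    parser.add_argument("--resampler")
    parser.add_argument("--compression")
    args = parser.parse_args(argv)
    # Only flags the user actually passed override the lower layers.
    settings.update({k: v for k, v in vars(args).items() if v is not None})
    return settings
```

Every value is a plain string, so nothing more complex than a flag or a line in a file is ever required from the user.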

Mpop 2.0 must provide comparable performance to the previous version of Polar2Grid. Comparable performance means that, on the same hardware, creating the same images from the same input data files takes about the same execution time. Memory usage is also a concern, but most users should not notice a difference in run time between the non-Mpop and Mpop versions of Polar2Grid.

Mpop 2.0 must allow for remapping to multiple areas in a simple interface (a "for" loop is acceptable) regardless of what areas are being mapped to. Similarly, multiple output formats should be allowed in the same processing instance, although this is not currently supported by Polar2Grid. It should also be simple to remap all datasets that were requested by the user in one or two calls from the highest level interface. This is important for the Elliptical Weighted Averaging (EWA) resampling algorithm that Polar2Grid uses, because remapping multiple datasets at the same time can have performance advantages. Polar2Grid often needs to remap both radiance data and categorical data, so Mpop 2.0 must have configurable remapping parameters based on sensor or data type. For example, a cloud mask should not be resampled with an interpolation algorithm, but rather with something like nearest neighbor.
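The combination of per-type resampling defaults and batched remapping can be sketched as a planning step. The function, the kind labels and the dataset and area names are hypothetical; no resampling is performed here, only the grouping that would let each group be remapped in a single call.

```python
# Hypothetical defaults: kind of data -> resampling method name.
RESAMPLER_BY_KIND = {"continuous": "ewa", "categorical": "nearest"}


def plan_remapping(datasets, areas):
    """Group datasets per (area, resampler) so each group is remapped in one call."""
    plan = {}
    for area in areas:  # a simple "for" loop over the target areas
        for name, kind in datasets.items():
            key = (area, RESAMPLER_BY_KIND[kind])
            plan.setdefault(key, []).append(name)
    return plan


datasets = {"i01": "continuous", "i04": "continuous", "cloud_mask": "categorical"}
plan = plan_remapping(datasets, ["area_a", "area_b"])
```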

One important feature that has been requested for Polar2Grid, but is not currently implemented, is the ability to define areas/grids from various sets of parameters. For example, some users only know their Area's center longitude and latitude and a radius (meters). Some users know a bounding box and a number of pixels or a pixel resolution. Others know the upper-left origin, the number of pixels and the pixel resolution. Some users may know the geographic bounds of their Area, but know nothing about projections or which projections produce minimally distorted images for their Area. Default projections for an Area based on its location will allow inexperienced users to view their data in a good-looking image. Mpop 2.0 grid configurations should not prohibit these as input area definitions.
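The arithmetic behind these alternative definitions is small. The sketch below works in projection coordinates (meters) and uses hypothetical helper names; extents are ordered (x_min, y_min, x_max, y_max). The center-and-radius case assumes a projection centered on the Area, so that the center maps to (0, 0). An azimuthal equidistant projection is used as one possible default, not as a recommendation from the Polar2Grid team.

```python
def extent_from_center(radius):
    """Extent for a projection centred on the Area centre, which maps to (0, 0)."""
    return (-radius, -radius, radius, radius)


def extent_from_origin(x_ul, y_ul, width, height, resolution):
    """Extent from the upper-left outer corner, the size in pixels and the pixel size."""
    return (x_ul, y_ul - height * resolution, x_ul + width * resolution, y_ul)


def resolution_from_extent(extent, width, height):
    """Pixel size (x, y) from a bounding box and a number of pixels."""
    x_min, y_min, x_max, y_max = extent
    return ((x_max - x_min) / width, (y_max - y_min) / height)


def default_projection(lon, lat):
    """One possible default: azimuthal equidistant centred on the Area."""
    return "+proj=aeqd +lat_0=%s +lon_0=%s +units=m" % (lat, lon)
```

Each helper reduces its input to the same pair of an extent and a size, which is what a single internal area definition would need.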

Other features not mentioned above that Polar2Grid currently provides that should be reproduced:

Read data for the following input formats:
- VIIRS SDR
- VIIRS EDR
- MODIS L1B
- Corrected Reflectance (both from external corrected reflectance software or runtime correction)
- AVHRR AAPP L1B
Remap to built in and user defined and configurable grids/areas
Remap to static areas where all parameters are specified before execution
Remap to dynamic areas where some parameters may be determined at runtime to fit the data being remapped
Remap with various algorithms (nearest neighbor, EWA, etc)
Write enhanced image data to various output formats:
- AWIPS NetCDF
- Geotiff
- Ninjotiff
- HDF5 (containing all datasets for a scene)
Geotiff writing that allows both fill value luminance band geotiff and alpha band masking
Configurable enhancements with enhancement configurations having defaults per format being written
Enhancements that may include linear, square root, histogram, piece-wise functions (multiple parts), or look up tables
True color and false color composites with more possible in the future

Scenario 3: as a library in trollduction

For the installation, mpop should be automatically installable and available from a standard place, like PyPI. Fresh releases with new features should be available shortly after they come out.

The memory footprint of mpop has to be stable since the Python process can be running for long periods of time.

Reading the data should close the files properly as soon as the data is read.

The metadata of the datasets needs to be available at save time in order to be able to save files according to given file patterns. The resulting filename should be available for easy copying or linking to other places.
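Filling a file pattern from metadata can be sketched with the standard `str.format` method, which also accepts `strftime` codes for `datetime` values. The pattern, the metadata keys and the sample values are hypothetical.

```python
from datetime import datetime


def compose_filename(pattern, info):
    """Fill a filename pattern with dataset metadata and return the result."""
    return pattern.format(**info)


info = {
    "platform": "npp",
    "start_time": datetime(2015, 3, 1, 12, 30),
    "area": "euro4",
}
filename = compose_filename("{platform}_{start_time:%Y%m%d_%H%M}_{area}.tif", info)
# filename is "npp_20150301_1230_euro4.tif"
```

Returning the composed name to the caller is what makes later copying or linking straightforward.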

Updates to non-core parts (e.g. product configs, new composites) should be possible without restarting, so that operational production is not interrupted.
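One simple way to pick up configuration changes in a long-running process is to re-read a file when its modification time changes. The `ReloadingConfig` class below is a hypothetical sketch that uses JSON for brevity; the actual configuration format is not specified here.

```python
import json
import os


class ReloadingConfig:
    """Re-read a JSON file whenever its modification time changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._data = {}

    def get(self):
        """Return the current configuration, reloading it if the file changed."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            with open(self.path) as fobj:
                self._data = json.load(fobj)
            self._mtime = mtime
        return self._data
```

Calling `get()` before each production run keeps the running process current without a restart, at the cost of one `stat` call per run.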

Scenario 4: migration from mpop 1

Documentation/script on how to migrate custom readers to mpop 2 should be available.
Documentation/script on how to migrate custom composites to mpop 2 should be available.

Non goals

This version will not support the following:

backward compatibility with mpop 1
