
Refactor ImageMosaic for extensibility (Defunct)

Devon Tucker edited this page Feb 23, 2017 · 1 revision

Description

This proposal is no longer being developed. It was eventually broken into a proposal with a much smaller scope and completed in 16.x

The gt-mosaic plugin provides an infrastructure to mosaic raster data both spatially and across other dimensions such as time and elevation. This proposal isolates and documents several gt-imagemosaic classes for reuse, so that they can be used to build additional coverage formats.

This refactoring introduces and documents interfaces (and default implementations) to configure behaviour:

  • granule index creation and harvesting refactor
    • customized index creation
    • customized index attribute generation (used during index creation or when individual granules are added)
    • alternative harvest strategies (for example mask generation, navigating deep folder structures)
  • support reprojection of rasters into common CRS for mosaic operation
  • support on-the-fly processing on mosaic granules (e.g. artifact filtering)

These API updates open up gt-imagemosaic for additional use-cases that may not be appropriate for the mainstream plugin, but are of interest to users:

  • Use Case 1: Support on-the-fly reprojection allowing granules in different spatial reference systems to be combined
  • Use Case 2: Customized granule harvesting allowing recursive folder traversal during harvesting
  • Use Case 3: Customized index generation allowing additional attributes to be generated during harvesting

References:

Break this into three

  1. One refactor indexing - to control index schema generation (and property generation to collect the values).

    Additional properties harvested by code (GeoTIFF metadata, sidecar files, filename naming conventions, etc.). May consider some ease-of-use changes such as determining the schema based on the configured property collectors.

  2. Refactor to support dynamic processing of granular information (for reprojecting, mix data with different characteristics, color models).

    This is a tradeoff against the preprocessing time needed. Lots of good suggestions about grouping granules together for processing and then mosaicking the results together.

  3. Harvest flexibility.

    Additional options for harvesting granules (traverse multiple directories, deep traversal of a directory tree). Rescanning a directory may already be handled.

    Out of scope but interesting - consider combining forces with the async tasks work to schedule harvesting tasks (if that goes ahead).
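
The deep traversal mentioned under item 3 could look roughly like the following. This is a minimal sketch using only java.nio.file, not the gt-imagemosaic harvester: the class name DeepHarvestSketch, the method findGranules and the extension list are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: collects candidate granule files from a directory tree. */
class DeepHarvestSketch {

    /** Assumed raster extensions; a real harvester would ask the available formats instead. */
    static final Set<String> EXTENSIONS = Set.of("tif", "tiff", "png", "jp2");

    /** Walk root down to maxDepth levels and return matching regular files, sorted by path. */
    static List<Path> findGranules(Path root, int maxDepth) throws IOException {
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths.filter(Files::isRegularFile)
                    .filter(DeepHarvestSketch::hasRasterExtension)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Case-insensitive extension check against the assumed list above. */
    static boolean hasRasterExtension(Path path) {
        String name = path.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot >= 0
                && EXTENSIONS.contains(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path granule : findGranules(root, Integer.MAX_VALUE)) {
            System.out.println(granule);
        }
    }
}
```

A maxDepth of 1 reproduces a flat, single-directory scan, so the same entry point can serve both the existing behaviour and the deep traversal.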

Not in scope - existing functionality:

  • GUI feedback during initial harvesting (which takes five to ten mins).

  • Preprocessing (overviews, artifact detection, reproject during ingest, compress, retile) prior to harvesting.

    We have a design question: does data prep happen during harvesting, or outside of GeoServer before harvesting?

    The REST API can be used to work with the index (similar to WFS) and to ask GeoServer to harvest a directory or an individual file. This assumes data prep happens before harvesting.

    The alternative is to ask the GeoServer Importer (GUI or REST API) to do the preprocessing step (you can use gdaladdo and others to prep the granules). The original assumption was that this could be used only once, when preparing the initial image mosaic - it turns out it can also be used to add granules to an existing mosaic.

Status

Choose one of:

  • Under Discussion
  • In Progress
  • Completed
  • Rejected
  • Deferred

Voting:

  • Andrea Aime
  • Ben Caradoc-Davies
  • Christian Mueller
  • Ian Turton
  • Justin Deoliveira
  • Jody Garnett
  • Simone Giannecchini

Tasks

This section is used to make sure your proposal is complete (did you remember documentation?) and has enough paid or volunteer time lined up to be a success. Use initials to indicate volunteer, or ⚠️ where volunteer is needed. Proposals that lack resources to be successful are unlikely to be approved.

  1. DT: Introduce API on feature branch image_mosaic_api

    • JG: Initial review
    • ⚠️ Final review and merge (subject to this proposal being accepted)
  2. DT: Refactor image mosaic internals to api.

    • DT: Integration testing with full geoserver build
    • ⚠️ Review and merge
  3. DT: Rework gt-dem into gt-dynamic-mosaic

    • Rename as gt-dynamic-mosaic
    • migrate functionality from RnD branches to community module
    • test coverage and documentation requirements

API Change

The following API changes have been identified:

Index Generation

Refactor to completely encapsulate the index creation configuration class (inside the default implementation of MosaicIndexConfiguration) so that an alternative way of creating the index schema and content can be provided.

  • Before: The responsibility of index creation is split across several classes ImageMosaicConfigHandler, CatalogManager, CatalogBuilderConfiguration, Indexer and RasterLayerResponse (creation of SortBy) and ImageMosaicWalker (creation of the reader).

    • Creation of index schema was handled by CatalogManagerImpl.createDefaultSchema
    • The list of PropertiesCollectors was provided by a service provider plugin (SPI) or by configuration in the store properties file.
    • GridCoverage2DReader is created using AbstractGridFormat.findFormat method.
    • Sort by order was handled as GeneralParameterValue for the reader
  • After: This refactor gathers up index creation responsibilities into MosaicIndexConfiguration:

    • createSchema(name,crs): allows customization of index schema
    • getPropertiesCollectors(): allows greater flexibility in providing PropertiesCollectors to the index creation
    • getReader(reader): gives clients the ability to customize coverage reader creation
    • getOrder(name): provides sort order
    /**
     * Configuration that tells us how to index a mosaic store.
     * Implementations are often responsible for a configuration file (for example xml or properties) on disk to which they record their settings.
     */
    public interface MosaicIndexConfiguration {
        /**
         * Create a schema given a coverage name and CRS. 
         */
        SimpleFeatureType createSchema(String name, CoordinateReferenceSystem crs);
        /**
         * Used to access custom attributes during index generation.
         */
        List<PropertiesCollector> getPropertiesCollectors();
        /** 
         * Create a reader from a granule.
         */
        GridCoverage2DReader getReader(SimpleFeature reader);
        /**
         * Return the order of the granules
         */
        SortBy[] getOrder(String name);
        /**
         * Access configuration parameter (see Prop).
         */
        String getParameter(Prop prop);
    }

    CatalogManager configured with MosaicIndexConfiguration (and renamed to GranuleCatalogManager).

    /**
     * Manages GranuleCatalogs. Create them and update them with new information.
     */
    public interface GranuleCatalogManager {
        /**
         * Set the mosaic index configuration used for managing the index.
         */
        public void setMosaicIndexConfiguration(MosaicIndexConfiguration indexer);
    
        /**
         * Load or create a GranuleCatalog (use default datastore such as shapefile)
         *
         * @param coverageName the name of the coverage you want a granule catalog for
         * @param create if true create a new catalog, otherwise an existing one is loaded
         */
        GranuleCatalog loadGranuleCatalog(String coverageName, boolean create);
    
        /**
         * Load or create a GranuleCatalog from a datastore properties file
         * 
         * @param coverageName the name of the coverage you want a granule catalog for
         * @param properties the datastore properties file
         * @param create if true create a new catalog, otherwise an existing one is loaded
         * @param hints for datastore
         */
        GranuleCatalog loadGranuleCatalogFromDataStore(String coverageName, Properties properties, boolean create, Hints hints);
    
        /** 
         * Update the catalog with a new granule (and add to the index).
         *
         * @param store the granulestore this coverage is being added to
         * @param coverageName the name of the coverage the granule is added to
         * @param file the file of the new granule
         * @param reader the reader of the new granule
         * @param transaction a transaction (allows multiple updates to happen in a single transaction)
         */
        void updateCatalog(GranuleStore store, String coverageName, File file, GridCoverageReader reader, Transaction transaction);
    }
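
To make the role of getPropertiesCollectors() more concrete, here is a minimal sketch of deriving index attributes from a filename naming convention. It is a standalone stand-in and does not use the GeoTools PropertiesCollector API: the class FilenameCollectorSketch, the method collect, the attribute names and the naming convention itself are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a properties collector; not the GeoTools PropertiesCollector API. */
class FilenameCollectorSketch {

    /** Assumed convention: <name>_<yyyyMMdd>_<elevation>.tif, e.g. temp_20170223_100.tif */
    private static final Pattern CONVENTION =
            Pattern.compile("(?<name>[A-Za-z]+)_(?<date>\\d{8})_(?<elevation>\\d+)\\.tif");

    /** Extract index attributes from a granule file name; empty map if the name does not match. */
    static Map<String, Object> collect(String fileName) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        Matcher matcher = CONVENTION.matcher(fileName);
        if (matcher.matches()) {
            attributes.put("coverage", matcher.group("name"));
            attributes.put("date", matcher.group("date"));
            attributes.put("elevation", Integer.valueOf(matcher.group("elevation")));
        }
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(collect("temp_20170223_100.tif"));
    }
}
```

An index configuration could, in principle, derive its schema from the attribute names such collectors produce, which is the ease-of-use idea raised in the description.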

Harvesting

We want to separate the harvesting process from the mosaic creation logic, refactoring so that direct use of ImageMosaicReader is not required:

  • Before: RasterManager received a 'parent' ImageMosaicReader to retrieve data. This had the side effect of forcing the use of the ImageMosaicReader (and all its specifics about harvesting etc).
  • After: RasterManager receives an ImageMosaic class, keeping the reader out of the way

With this refactor the following interfaces and classes will take over functionality that is currently inside the ImageMosaicReader.

/**
 * Stores rastermanagers (includes GranuleStores)
 */
public class ImageMosaic {
    GranuleCatalogManager catalogManager;
    String defaultName = null;
    Map<String, RasterManager> rasterManagers = new ConcurrentHashMap<String, RasterManager>();

    /**
     *  Get the name of the default coverage
     */
    public String getDefaultCoverageName();

    /**
     * Get the rastermanager for a particular coverage
     *
     * @param name the name of the coverage
     */ 
    public RasterManager getRasterManager(String name);

    /**
     * Remove a particular coverage: remove the rastermanager and data on disk associated with this coverage
     *
     * @param name the name of the coverage to be removed
     * @param forceDelete delete any data (granules) left on disk for this coverage
     * @param checkForReferences check if any granule is referred by other coverages, and prevent their complete deletion
     */
    public void removeRaster(String name, boolean forceDelete, boolean checkForReferences);

    /**
     * Add a new coverage and create a rasterManager for it
     * 
     * @param configuration configuration for this coverage
     * @param init if the Manager should be initialized.
     */
    public RasterManager addRaster(final MosaicConfigurationBean configuration, final boolean init);

}

/**
 * Decides how to harvest a new file into the store
 */
public interface MosaicHarvester {
     /**
      * Harvest a source to a certain imagemosaic
      * 
      * @param mosaic the ImageMosaic
      * @param defaultCoverage name of the default coverage
      * @param source the source (a file, directory, ...)
      * @param hints
      */
     public void harvest(ImageMosaic mosaic, String defaultCoverage, Object source, Hints hints);
}

/**
 * Reader. Implements the proper interfaces and delegates different tasks to other classes
 */
public class ImageMosaicReader extends AbstractGridCoverage2DReader implements StructuredGridCoverage2DReader {
     private MosaicHarvester harvester;
     private ImageMosaic mosaic;
     
     ...
}

Delegate coverage acceptance/rejection to a predicate object

Introduce CoverageInclusionPredicate to allow plugins to control which rasters can be harvested:

  • Before: ImageMosaicConfigHandler can reject rasters from the mosaic for a number of reasons (eg. incompatible color model, different CRSs).

      ImageMosaicConfigHandler {
        void updateConfiguration() {
          ...
          ///just an example of current code
          if (Utils.checkColorModels(colorModel, palette, actualCM)) {
            eventHandler.fireFileEvent(Level.INFO, fileBeingProcessed, false, "Skipping image "
                    + fileBeingProcessed + " because color models do not match.",
                    (((fileIndex + 1) * 99.0) / numFiles));
            return;
          }
        }
      }
  • After: Whether a raster should be included or excluded from a mosaic is delegated to a predicate object.

      public interface CoverageInclusionPredicate {
        public boolean shouldAccept(GridCoverage2DReader coverage, String coverageName, File fileBeingProcessed, MosaicConfigurationBean config);
      }
      
      ImageMosaicConfigHandler {
        void updateConfiguration() {
          ...
          //delegate acceptance/rejection
          if (!shouldAccept(coverage, coverageName, fileBeingProcessed, config)) {
            return;
          }
        }
      }

    With these API changes future clients may be able to overcome these limitations or otherwise work around them during the mosaic process.

The CoverageInclusionPredicate will then be part of the new ImageMosaic API for mosaic creation.
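
As a rough illustration of why a predicate object helps, the sketch below composes small acceptance rules. It is self-contained and does not use the GeoTools types: GranuleInfo is a hypothetical stand-in for the reader and configuration arguments of shouldAccept, and the names sameCrs and sameBands are assumptions.

```java
import java.util.function.Predicate;

/** Hypothetical sketch; GranuleInfo stands in for the reader/config arguments of shouldAccept. */
class InclusionPredicateSketch {

    /** Minimal stand-in for the metadata a predicate would inspect. */
    static final class GranuleInfo {
        final String crsCode;
        final int bands;

        GranuleInfo(String crsCode, int bands) {
            this.crsCode = crsCode;
            this.bands = bands;
        }
    }

    /** Accept only granules in the given CRS (the strict behaviour described under "Before"). */
    static Predicate<GranuleInfo> sameCrs(String crsCode) {
        return granule -> crsCode.equals(granule.crsCode);
    }

    /** Accept only granules with the expected number of bands. */
    static Predicate<GranuleInfo> sameBands(int bands) {
        return granule -> granule.bands == bands;
    }

    public static void main(String[] args) {
        Predicate<GranuleInfo> strict = sameCrs("EPSG:4326").and(sameBands(3));
        // A reprojecting mosaic could drop the CRS check and keep only the band check.
        Predicate<GranuleInfo> relaxed = sameBands(3);

        GranuleInfo utm = new GranuleInfo("EPSG:32633", 3);
        System.out.println("strict accepts: " + strict.test(utm));
        System.out.println("relaxed accepts: " + relaxed.test(utm));
    }
}
```

The point of the design is that the handler no longer hard-codes the rejection rules; a plugin swaps in a different predicate without touching the harvesting loop.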

Pre-process Granule Footprint Before Indexing

Since one of the motivations of this API change is to introduce on-the-fly reprojection, we may need to do things such as homogenize the footprints of the granules for indexing.

  • Before: Footprints were taken directly from the coverage reader.
  • After: This interface provides a means of pre-processing the granule footprint before indexing:
  public interface FootprintProcessor {
    public Geometry getFinalFootprint(MosaicConfigurationBean config, GridCoverage2DReader coverage);
  }

This will be used in ImageMosaicConfigHandler.updateCatalog.

Generalize Mosaicking per GranuleCollector

To support something like on-the-fly reprojection we need a mechanism to collect granules that may need to be processed together before final mosaicking (for example, to mosaic all granules in a certain projection before mosaicking in the target projection). ImageMosaic already does this to a certain extent with multidimensional coverages, mosaicking for each single dimension and then mosaicking the result.

In the refactored API each GranuleCollector will contain knowledge about how to pre-mosaic its own elements (inverting the current behaviour).

  • Before: Currently RasterLayerResponse.MosaicProducer.produce() iterates through GranuleCollectors to pre-mosaic granules:

    for (GranuleCollector collector : granuleCollectors) {
      ...
      final MosaicElement preparedMosaic = new Mosaicker(collector.collectGranules(),
              MergeBehavior.FLAT).createMosaic();
      ...
    }
  • After: The refactored ImageMosaic will invert this responsibility:

      final MosaicElement finalMosaic = granuleCollector.createMosaic();

This will allow a GranuleCollector to be configured with a pre-mosaicking strategy to allow things like the current multidimensional stack mosaicking or future reprojecting mosaicking.
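
The inversion can be pictured with a small strategy-style sketch. Everything here is a stand-in: granules are plain strings, a "mosaic" is just a description string, and the names Collector, flat and reprojectTo are hypothetical rather than part of gt-imagemosaic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a collector that owns its pre-mosaicking strategy. */
class CollectorStrategySketch {

    /** Stand-in collector: granules are plain strings, a "mosaic" is a description string. */
    static final class Collector {
        private final List<String> granules = new ArrayList<>();
        private final Function<List<String>, String> strategy;

        Collector(Function<List<String>, String> strategy) {
            this.strategy = strategy;
        }

        void add(String granule) {
            granules.add(granule);
        }

        /** The caller no longer picks a Mosaicker; the collector applies its own strategy. */
        String createMosaic() {
            return strategy.apply(granules);
        }
    }

    /** Flat merge: combine the granules as they are. */
    static String flat(List<String> granules) {
        return "flat(" + String.join(",", granules) + ")";
    }

    /** Reprojecting merge: flat-merge first, then warp the result to the target CRS. */
    static String reprojectTo(String targetCrs, List<String> granules) {
        return "warp(" + flat(granules) + " -> " + targetCrs + ")";
    }

    public static void main(String[] args) {
        Collector utm = new Collector(g -> reprojectTo("EPSG:4326", g));
        utm.add("a.tif");
        utm.add("b.tif");
        System.out.println(utm.createMosaic());
    }
}
```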

Update GranuleCollector to a tree-like hierarchy

  • Before: Currently GranuleCollectors are just stored in a list, and the MosaicProducer iterates over this list and stops at the first one that accepts the GranuleDescriptor.

    private class MosaicProducer {
        private MosaicOutput visit() {
            ...
            for (GranuleCollector collector : granuleCollectors) {
                if (collector.accept(granuleDescriptor)) {
                  ...
                }
            }
        }
    }
  • After: Change to a GranuleCollector that can delegate to child collectors.

    This will simplify to:

    private class MosaicProducer {
        private MosaicOutput visit() {
            ...
            collector.accept(granuleDescriptor);
        }
    }

    The default GranuleCollector will delegate to a list of child Collectors based on Filters the way the current implementation works.
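
A composite-style sketch of such a collector follows. It is a simplified stand-in, not the proposed API: granules are represented only by a CRS code string, filters are plain java.util.function.Predicate instances, and the choice that a parent keeps any granule none of its children accept is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical composite collector; granules are represented by their CRS code only. */
class CollectorTreeSketch {

    static final class Collector {
        private final Predicate<String> filter;
        private final List<Collector> children = new ArrayList<>();
        private final List<String> granules = new ArrayList<>();

        Collector(Predicate<String> filter) {
            this.filter = filter;
        }

        Collector addChild(Collector child) {
            children.add(child);
            return this;
        }

        /** Offer the granule to the first accepting child; keep it here if no child wants it. */
        boolean accept(String granule) {
            if (!filter.test(granule)) {
                return false;
            }
            for (Collector child : children) {
                if (child.accept(granule)) {
                    return true;
                }
            }
            granules.add(granule);
            return true;
        }

        List<String> granules() {
            return granules;
        }
    }

    public static void main(String[] args) {
        Collector utm = new Collector(crs -> crs.startsWith("EPSG:326"));
        Collector wgs84 = new Collector("EPSG:4326"::equals);
        Collector root = new Collector(crs -> true).addChild(utm).addChild(wgs84);

        // The producer only ever talks to the root collector.
        root.accept("EPSG:32633");
        root.accept("EPSG:4326");
        System.out.println("utm=" + utm.granules() + " wgs84=" + wgs84.granules());
    }
}
```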

Enhance the GranuleDescriptor and GranuleCatalogVisitor interfaces

Allow GranuleDescriptor to pass on arbitrary properties to client code. During index creation or update we could be storing any number of arbitrary properties (via PropertiesCollectors), so downstream code needs a way to make use of them. For example, a future implementation could store the raster's collection date in the mosaic index and later use this information to make decisions at mosaic time. GranuleDescriptor will be updated to handle arbitrary properties:

  • After:

    public class GranuleDescriptor {
       Map<String, Object> getProperties();
    }
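
The collection-date example could be sketched as follows. Descriptor is a hypothetical stand-in exposing only the proposed getProperties() accessor, and the property name "collected" and its LocalDate type are assumptions for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of downstream code reading arbitrary granule properties. */
class GranulePropertiesSketch {

    /** Stand-in descriptor exposing only the proposed getProperties() accessor. */
    static final class Descriptor {
        private final Map<String, Object> properties;

        Descriptor(Map<String, Object> properties) {
            this.properties = properties;
        }

        Map<String, Object> getProperties() {
            return properties;
        }
    }

    /** Read the assumed "collected" property; the cast fails if the index stored another type. */
    static LocalDate collected(Descriptor descriptor) {
        return (LocalDate) descriptor.getProperties().get("collected");
    }

    /** Order granules newest first, e.g. so the most recent imagery ends up on top. */
    static List<Descriptor> newestFirst(List<Descriptor> granules) {
        List<Descriptor> sorted = new ArrayList<>(granules);
        sorted.sort(Comparator.comparing(GranulePropertiesSketch::collected).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Descriptor> granules = new ArrayList<>();
        granules.add(new Descriptor(Map.of("collected", LocalDate.of(2016, 5, 1))));
        granules.add(new Descriptor(Map.of("collected", LocalDate.of(2017, 2, 23))));
        for (Descriptor descriptor : newestFirst(granules)) {
            System.out.println(collected(descriptor));
        }
    }
}
```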

The GranuleCatalogVisitor interface could also be tightened up.

  • Before: Currently GranuleCatalogVisitor takes an arbitrary object along with the GranuleDescriptor. In practice this "companion" object is only ever null or the Feature in question.

  • After: This interface will be updated to take a Feature instead of an arbitrary object, giving stronger static guarantees.

    public interface GranuleCatalogVisitor {
      public void visit(GranuleDescriptor descriptor, Feature feature);
    }