Restructure GeoTools into Jigsaw modules

Ian Turton edited this page Oct 25, 2018 · 24 revisions

Description

One of the most difficult tasks identified for Java 9 Compatibility is refactoring the GeoTools library to be compatible with the Jigsaw module system.

This proposal covers:

  1. Providing an automatic module name for each jar

  2. Refactoring the codebase to avoid package conflicts

Automatic Module bridge to CLASSPATH

The new module system is a three lane road:

  • CLASSPATH: This is the unstructured free-for-all we know and love.
  • MODULEPATH automatic modules: This is the middle lane. Jars are handled as modules with a couple of assumptions (a name is assigned, and all their public packages are published). Since they are in the middle lane they can merge left or right, accessing anything on the CLASSPATH and anything on the MODULEPATH.
  • MODULEPATH normal modules: This is the restricted world where jars have a module-info.java file documenting exactly what they publish, what reflection they allow, and importantly what other modules they require. These modules can depend on automatic modules in the middle lane, but cannot access the CLASSPATH.

Using automatic modules as a bridge to the CLASSPATH, the long-term vision for the GeoTools library is to use this three-lane system to our advantage. A normal module like gt-renderer defines an SPI interface, ExternalGraphicFactory, and uses SPI to locate implementations that have been shared using ServiceLoader. The gt-svg automatic module provides an implementation of ExternalGraphicFactory making use of the Apache Batik library. As an automatic module, gt-svg can access anything on the CLASSPATH: the Factory isAvailable() method will check if Batik can be found, with the specifics reported by getAvailableStatus(). It can also see all public MODULEPATH packages, giving it a chance to implement ExternalGraphicFactory.

GeoTools Jars

We propose the following methods for the GeoTools Factory interface:

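import java.awt.RenderingHints;
import java.util.Map;

interface Factory {
   Map<RenderingHints.Key,?> getImplementationHints();
   /** True if the dependencies this factory needs (e.g. on the CLASSPATH) were found. */
   default boolean isAvailable() { return true; }
   /** Specifics of the availability check; null when there is nothing to report. */
   default String getAvailableStatus() { return null; }
}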
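As a rough sketch of how a plugin might implement these methods, the hypothetical ClasspathCheckedFactory below probes for a class with Class.forName and reports the result. The class name, the "null status means available" convention, and the probed class names are assumptions made for illustration, not GeoTools API. A local copy of the proposed interface is included so the block compiles on its own.

```java
import java.awt.RenderingHints;
import java.util.Collections;
import java.util.Map;

/** Local copy of the proposed interface so this sketch compiles on its own. */
interface Factory {
    Map<RenderingHints.Key, ?> getImplementationHints();
    default boolean isAvailable() { return true; }
    default String getAvailableStatus() { return null; }
}

/** Hypothetical factory that needs an optional library from the CLASSPATH. */
class ClasspathCheckedFactory implements Factory {
    private final String requiredClass;

    ClasspathCheckedFactory(String requiredClass) {
        this.requiredClass = requiredClass;
    }

    @Override
    public Map<RenderingHints.Key, ?> getImplementationHints() {
        return Collections.emptyMap();
    }

    /** Assumption: a null status means the dependency was found. */
    @Override
    public boolean isAvailable() {
        return getAvailableStatus() == null;
    }

    /** Returns null when the dependency is present, otherwise a short explanation. */
    @Override
    public String getAvailableStatus() {
        try {
            // initialize=false: only test for presence, do not run static initializers
            Class.forName(requiredClass, false, getClass().getClassLoader());
            return null;
        } catch (ClassNotFoundException | LinkageError e) {
            return "Missing dependency: " + requiredClass;
        }
    }
}
```

A plugin such as gt-svg could pass the name of a class from its optional library; which class to probe is left open here.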

Use of Java 8 MANIFEST.MF Automatic Module Name

There is a catch-22 in adopting the module system, known as the module hell problem. Although steps have been taken to make the transition easier, it is important that Java libraries such as GeoTools make a release as early as possible that includes an automatic module name (even if they still build with Java 8). This allows downstream projects to transition to Jigsaw without being held up by the GeoTools project.

In the worst-case scenario the Java open source community is looking at a Python 2 vs Python 3 split, where old projects are stuck on Java 8 and cannot be used by modern code.

Given this landscape, projects have three approaches to being used on the MODULEPATH:

Do nothing. This works for single-jar projects: the jar can be added to the module path and required using a module name derived from the jar filename. This is how untouched jars built with Java 8 appear when placed on the MODULEPATH.

To use GeoTools, an example application module-info.java lists:

  module example.application {
    requires gt.opengis;
    requires gt.main;
    requires gt.jdbc.postgis;
  }

As shown above, this approach ties the dependency to the jar filename: the module system derives the name by dropping a trailing version such as -21.0 and replacing hyphens with dots, so renaming or repackaging a jar changes its module name. As module dependencies are transitive, this makes subsequent updates very difficult to orchestrate.

Provide an automatic module name using a MANIFEST.MF entry. This provides a stable module name for use in Java 11, while still being built with Java 8.

To use GeoTools, an example application module-info.java lists:

  module example.application {
    requires org.geotools.api;
    requires org.geotools;
    requires org.geotools.jdbc.postgis;
  }

This is the approach we intend to use for the restructure. It is also the approach we intend to use long term for plugins (since an automatic module can act as a bridge between the module system and a JDBC driver on the CLASSPATH).
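To illustrate how the manifest entry is picked up, the sketch below writes an otherwise empty jar with an Automatic-Module-Name attribute and asks java.lang.module.ModuleFinder (Java 9 or later) how it would name that jar on the module path. The class AutomaticModuleNameDemo, the temporary directory and the jar file name are assumptions for illustration only; a real GeoTools jar would of course contain classes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class AutomaticModuleNameDemo {

    /** Writes a jar containing only a manifest with the given Automatic-Module-Name. */
    static Path writeJar(Path dir, String fileName, String moduleName) throws IOException {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the attributes are not written out
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Automatic-Module-Name", moduleName);
        Path jar = dir.resolve(fileName);
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out, manifest)) {
            // no further entries: the manifest alone is enough for this demonstration
        }
        return jar;
    }

    /** Asks the module system how it would describe the jar on the module path. */
    static ModuleDescriptor describe(Path jar) {
        ModuleReference ref = ModuleFinder.of(jar).findAll().iterator().next();
        return ref.descriptor();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("gt-modules");
        Path jar = writeJar(dir, "gt-opengis-21.0.jar", "org.geotools.api");
        ModuleDescriptor descriptor = describe(jar);
        System.out.println(descriptor.name() + " automatic=" + descriptor.isAutomatic());
    }
}
```

The name reported comes from the manifest rather than from the file name, which is what lets downstream module-info.java files keep the same requires line across releases.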

Make a selective build that produces a normal Java 8 jar as gt-main-21.0-java8.jar, and a Java 11 jar as gt-main-21.jar which includes a module-info.java file.

To use GeoTools, an example application module-info.java lists:

  module example.application {
    requires org.geotools.api;
    requires org.geotools;
    requires org.geotools.jdbc.postgis;
  }

The use of module-info.java would result in less disruption to the codebase, as it allows some packages to remain unpublished (only accessible internally or via an SPI factory). This is a tradeoff for consideration by the PSC.

We propose the following changes to pom.xml files:

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifestEntries>
                            <Automatic-Module-Name>org.geotools.api</Automatic-Module-Name>
                        </manifestEntries>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>

Refactor Core GeoTools Library

Our architecture diagram shows a split between the jars providing key public API and the implementing modules.

GeoTools Jars

This may be further shown as Maven dependencies:

GeoTools Dependencies

This approach will not work on the MODULEPATH, as no two jars can "publish" the same public package.

The first layer of our architecture shows several conflicts between the API defining a package and the implementation making use of the same package. As an example, gt-metadata publishes org.geotools.metadata.iso.spatial, which prevents gt-referencing from loading.

GeoAPI Implementation Conflicts

The second layer shows additional conflicts between gt-api publishing packages and implementations trying to provide abstract classes to help implementors and facade classes to assist with ease of use. As an example, gt-api publishes org.geotools.data, preventing gt-main from loading.

GeoTools Implementation Conflicts

In the above diagrams noted conflicts are marked in bold.
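The kind of conflict described above can be checked mechanically. The following is a minimal sketch, assuming the packages published by each jar are already known; the class SplitPackageCheck and the package lists in main are illustrative and not a full inventory of any GeoTools jar.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SplitPackageCheck {

    /** Maps each package published by more than one jar to the jars that publish it. */
    static Map<String, Set<String>> findSplitPackages(Map<String, List<String>> packagesByJar) {
        Map<String, Set<String>> owners = new TreeMap<>();
        packagesByJar.forEach((jar, packages) -> {
            for (String pkg : packages) {
                owners.computeIfAbsent(pkg, k -> new TreeSet<>()).add(jar);
            }
        });
        // keep only the packages that appear in two or more jars
        owners.values().removeIf(jars -> jars.size() < 2);
        return owners;
    }

    public static void main(String[] args) {
        Map<String, List<String>> jars = new LinkedHashMap<>();
        // Illustrative package lists only
        jars.put("gt-api", List.of("org.geotools.data", "org.geotools.filter"));
        jars.put("gt-main", List.of("org.geotools.data", "org.geotools.styling"));
        jars.put("gt-shapefile", List.of("org.geotools.data.shapefile"));
        System.out.println(findSplitPackages(jars));
        // prints {org.geotools.data=[gt-api, gt-main]}
    }
}
```

Note that org.geotools.data.shapefile is not reported: packages are compared by exact name, so a sub-package in another jar is not a conflict.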

Considered: Removing the gt-opengis and gt-api jars and distributing the interfaces and abstract classes alongside the implementations. The goal is to cause the least disruption, and not change class names, so that "organize imports" can fix any problems. Ideally we can record the refactor in an IDE and replay it for each affected downstream codebase.

GeoTools Refactored

Redistribute gt-api packages to avoid split-modules

gt-main:

  • org.geotools.data
  • org.geotools.data.simple
  • org.geotools.factory
  • org.geotools.feature
  • org.geotools.feature.collection --> org.geotools.feature (for deprecated RandomFeatureAccess)
  • org.geotools.filter
  • org.geotools.filter.expression
  • org.geotools.geometry --> org.geotools.geometry.iso
  • org.geotools.geometry.jts
  • org.geotools.styling --> org.geotools.style

gt-metadata:

  • org.geotools.util (for deprecated ProgressListener)
  • org.geotools.util --> org.geotools.util.convert (for geotools converters api)

Restructure and repackage core library to avoid split-modules

restructure gt-metadata into:

  • gt-util - focused on java helper classes and integration work like Factory and LazySet
  • gt-metadata - implementing org.opengis.metadata interfaces

Refactor packages to avoid split-modules (just listing changes):

  • gt-util: org.geotools.util

    Java helper classes and integration utilities, including our Factory SPI support from gt-metadata. Can gather up non-metadata api like console, math, etc...

    • org.geotools.factory --> org.geotools.util.factory
    • org.geotools.factory --> org.geotools.util
    • org.geotools.resources --> org.geotools.util
    • org.geotools.util --> org.geotools.util
  • gt-referencing: org.geotools.referencing

    No major changes; utility classes move to util packages.

    • org.geotools.resources --> org.geotools.referencing.util
    • org.geotools.resources.geometry --> org.geotools.geometry.util
  • gt-main: org.geotools

    • org.geotools.geometry.coordinatesequence --> org.geotools.geometry.jts.coordinatesequence
    • org.geotools.geometry.text --> org.geotools.geometry.jts.text
    • org.geotools.renderer --> org.geotools.data.util (can move to gt-data)
    • org.geotools.renderer.crs --> org.geotools.renderer.crs (can move to gt-referencing)
  • gt-data: org.geotools.data

    no change

  • gt-jdbc: org.geotools.jdbc

    • org.geotools.sql --> org.geotools.jdbc.util

  • gt-coverage: org.geotools.coverage

    • org.geotools.resources.coverage --> org.geotools.coverage.util
    • org.geotools.resources.image --> org.geotools.image.util
  • gt-renderer: org.geotools.renderer

    • org.geotools.renderer.i18n --> org.geotools.renderer.resources
    • org.geotools.renderer.style.shape --> org.geotools.renderer.util
    • org.geotools.referencing.piecewise --> org.geotools.renderer.lite.gridcoverage2d
    • org.geotools.legend --> org.geotools.map.legend
    • org.geotools.map.direct --> org.geotools.map
    • org.geotools.map.event --> org.geotools.map
    • org.geotools.renderer.windbarbs --> org.geotools.renderer.style.windbarbs
    • org.geotools.renderer.markwkt --> org.geotools.renderer.style.markwkt
  • gt-cql: org.geotools.cql

    • org.geotools.filter.function --> (moves to gt-main)
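For the one-to-one moves in the lists above, the mapping could be captured as a simple lookup table for tooling that rewrites fully qualified names. This is only a sketch: the class PackageRenames is hypothetical, the table holds a handful of the listed moves rather than all of them, and packages that are split across several targets (for example org.geotools.factory) need per-class decisions that a table like this cannot make.

```java
import java.util.Map;

class PackageRenames {

    // A few of the one-to-one moves listed above; not the complete table.
    static final Map<String, String> MOVES = Map.of(
            "org.geotools.resources.coverage", "org.geotools.coverage.util",
            "org.geotools.resources.image", "org.geotools.image.util",
            "org.geotools.sql", "org.geotools.jdbc.util",
            "org.geotools.legend", "org.geotools.map.legend",
            "org.geotools.renderer.markwkt", "org.geotools.renderer.style.markwkt");

    /**
     * Rewrites a fully qualified top-level class name if its package has moved.
     * Packages are matched by exact name, so sub-packages are not affected.
     */
    static String rewrite(String className) {
        int dot = className.lastIndexOf('.');
        if (dot < 0) {
            return className; // default package, nothing to rewrite
        }
        String target = MOVES.get(className.substring(0, dot));
        return target == null ? className : target + className.substring(dot);
    }

    public static void main(String[] args) {
        // "Example" is a placeholder class name, not a real GeoTools class
        System.out.println(rewrite("org.geotools.sql.Example"));
        // prints org.geotools.jdbc.util.Example
    }
}
```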

Strictly repackage plugins and extensions

Individual plugins may also run into split-package conflicts, most often when they provide their own utility class or filter. In each case the implementation in the core library gets priority and the plugin is repackaged appropriately.

The vast majority of plugins do not require repackaging:

  • gt-shapefile: org.geotools.data.shapefile
  • gt-geotiff: org.geotools.gce.geotiff

A few exceptions do exist:

  • gt-epsg-hsql: org.geotools.referencing.epsg (locked!)

    In these cases we may need to adjust the package visibility used to provide privileged access to these implementations. If we were using normal modules we could grant additional access; as it is, we are using automatic modules and will need to make any required API public.

Restructure core library into smaller modules

gt-util split into:

  • gt-util - Java creature comforts like LazySet and null-safe equals
  • gt-logging - the logging redirection code
  • gt-factory - the service locator / service registry code representing the GeoTools "plugin" system

gt-referencing split into:

  • gt-geometry-iso - may be possible to split these out (although geometry CoordinateReferenceSystem may tie them together)
  • gt-referencing - coordinate reference system care and feeding

gt-main split into:

  • gt-main: common factory finder, depends on others (due to transitive dependencies downstream code unaffected by split)
  • gt-filter
  • gt-feature
  • gt-geometry-jts
  • gt-ows - open web service data model used by gt-wms, gt-wmts, etc...
  • gt-xml - some code moves to gt-xml
  • gt-util - some code moves to gt-util

gt-coverage:

  • gt-image - all the org.geotools.image packages
  • gt-coverage - all the org.geotools.coverage packages

Assigned to Release

GeoTools 21

Status

Choose one of:

  • Under Discussion
  • In Progress
  • Completed
  • Rejected
  • Deferred

Voting:

  • Andrea Aime:
  • Ian Turton: +1
  • Jody Garnett: +1
  • Nuno Oliveira: +1
  • Simone Giannecchini:
  • Torben Barsballe: +1

Tasks

  1. Adjust definition of Factory isAvailable() and getAvailableStatus()

  2. redistribute gt-api classes to gt-metadata and gt-main

  3. restructure gt-metadata into gt-metadata and gt-util

  4. core library refactor packages to avoid split-modules

  5. plugins and extensions packages refactor to avoid split-modules

  6. restructure remaining library packages into smaller modules

  7. Limited documentation restructure.

    Moving pages to match the revised library structure. Goal is to maintain working code-examples showing library functionality (even if the text is out of date).

  8. Documentation update instructions.

    This refactor goes beyond search and replace; provide IDE-specific update instructions to fix imports.

Out of scope:

  • refactor gt-opengis into org.geotools packages and redistribute across library modules.

API Change

MANIFEST.MF Automatic module name

Changes to pom.xml:

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifestEntries>
                            <Automatic-Module-Name>org.geotools.api</Automatic-Module-Name>
                        </manifestEntries>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>

Factory isAvailable and getAvailableStatus

To better act as a bridge with the CLASSPATH the following change is proposed to the GeoTools factory interface.

The Factory interface is extended with default isAvailable() and getAvailableStatus() methods:

interface Factory {
   Map<RenderingHints.Key,?> getImplementationHints();
   default boolean isAvailable() { return true; }
   default String getAvailableStatus() { return null; }
}

This allows modules to both check and report back whether they found their required dependencies on the CLASSPATH.
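On the calling side, client code might use the two methods to skip factories whose dependencies are missing while keeping a record of why. The sketch below assumes a hypothetical helper class, AvailabilityReport, with a nested local copy of the proposed interface so it compiles on its own; how GeoTools would actually wire this into its factory lookup is not specified here.

```java
import java.awt.RenderingHints;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AvailabilityReport {

    /** Local copy of the proposed interface so this sketch compiles on its own. */
    interface Factory {
        Map<RenderingHints.Key, ?> getImplementationHints();
        default boolean isAvailable() { return true; }
        default String getAvailableStatus() { return null; }
    }

    /**
     * Returns the factories that report themselves available. For each one
     * that does not, a line with its class name and status is added to skipped.
     */
    static <T extends Factory> List<T> usable(List<T> candidates, List<String> skipped) {
        List<T> result = new ArrayList<>();
        for (T factory : candidates) {
            if (factory.isAvailable()) {
                result.add(factory);
            } else {
                skipped.add(factory.getClass().getName() + ": " + factory.getAvailableStatus());
            }
        }
        return result;
    }
}
```

Because both methods have defaults, existing factories that do not override them are always treated as available.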

Out of scope: Distribute gt-opengis interfaces across library in org.geotools packages

Before:

import org.opengis.filter.Filter;
import org.opengis.feature.Feature;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.type.FeatureType;

After:

import org.geotools.filter.api.Filter;
import org.geotools.feature.api.Feature;
import org.geotools.feature.api.SimpleFeature;
import org.geotools.feature.api.FeatureType;

As shown, during this transition the package hierarchy will be flattened if appropriate. This change is an alternative to [Resolve-GeoAPI-3.0.0-Incompatibilities], which has not attracted a contributor and is likely to cause considerably more conflict on the MODULEPATH.

Update: This has been marked out of scope due to disruption to downstream projects. Planning for this activity has been completed, as it affects our ability to restructure the core library modules.
