Skip to content

SQL Encoding of Vendor‐Provided Custom Functions

Carsten Klein edited this page Jan 16, 2024 · 2 revisions
  • Contact: cklein05
  • Tracker: none yet (need one?)
  • TLDR: Simple way to enable SQL encoding for custom vendor provided functions

Description

GeoTools has a factory and SPI-based way to extend its set of filter functions described in Function Tutorial and also in Factory Tutorial. This extension point greatly helps software system vendors to implement project specific functions without needing too much knowledge of GeoTools' internals.

Goal

However, there's currently no simple way to make these custom functions being encoded in SQL, that is, these functions cannot be run directly on the data store's database (e.g. if the store's encode functions option is enabled). The goal of this proposal is to provide an easy way to contribute custom filter functions that can be encoded in SQL, and so can be run directly on the store's database.

Use Cases

Use cases are features like full text search, fuzzy or similarity search (e.g. tri-gram based search algorithms) and many other project specific SQL functions.

Internal Core Functions

GeoTools provides a fixed set of general core filter functions that are available with all data stores (like greaterEqualThan, env, atan, strEndsWith, etc.). All these internal core functions always have a Java implementation. Depending on the used data store, some of these functions can be encoded in SQL and then run directly on the store's database. Currently it is hard-coded in each store's implementation, whether a function can be encoded in SQL (and how) or not. Typically this is done in a class named ´XXXFilterToSQL´ (XXX indicating the store type).

For example, the DB2 data store does this in class DB2FilterToSQL in method createFilterCapabilities. The PostGIS data store has moved this method (and others) into helper class FilterToSqlHelper.

In all cases, the createFilterCapabilities method returns a FilterCapabilities instance, which is responsible to determine which Filters support SQL encoding. For that, the FilterCapabilities class maintains a list of Filter classes, that can be encoded in SQL. Its supports(Filter) method simply checks whether the passed Filter class or any of its implemented interfaces is contained in that list.

Each data store's createFilterCapabilities then just has to register those core functions, it wants to enable SQL encoding for, by passing their implementing class to the FilterCapabilities.addType() method.

That way, each data store has its own central place to pick only those functions for SQL encoding, that the underlying database can handle. This concept works quite well for the internal core functions and shall not be changed by this proposal.

Custom Vendor-Provided Functions

By using the environment and structures present for the core functions already, a custom function could quite easily be marked for SQL encoding with the help of a new tag interface SQLEncodableFunction (or likewise). The new interface must unconditionally be added to the FilterCapabilities class in each participating data store's createFilterCapabilities method. Then, if a software vendor implements a custom function that should be encoded in SQL, the function's class must just implement that new tag interface. Basically, that's it.

On the devel mailing list I was discussing with Andrea about that interface. The preferred option was to create distinct tag interfaces, one for each data store. This will end in having interfaces org.geotools.data.db2.DB2SQLEncodableFunction, org.geotools.data.postgis.PostGISSQLEncodableFunction, etc. Actually, since such custom functions are often very project specific and target a single store/database only, a single core interface org.geotools.data.jdbc.SQLEncodableFunction may be sufficient as well.

Limitations

SQL encoding assumes that the function must be called like function_name(arg1, arg2, ..., argN) in SQL. There is no custom syntax transformation available (as it is for the core functions).

Coverage

The proposed changes should be applied to all SQL database based data stores, which are: (in opinion based order of importance)

Optional Extensions and Features

Hide those functions from GetCapabilities response

Andrea also suggested to prevent such custom functions from being advertised in the WFS GetCapabilities response. This could make sense, since those functions typically are no (as he called them) first class functions: likely, they do not work with all stores and most of them won't have a proper Java implementation (e.g. can't implement PostgreSQL's Full Text Search in pure Java).

As an example, Andrea mentioned interface org.geotools.api.filter.NativeFilter, which is quite similar to this proposal's new tag interface(s)). However, I did not yet find an existing mechanism that would prevent NativeFilter instances from being exposed in the GetCapabilities response.

Since, in general, a client may always require that filter functions are advertised in the GetCapabilities, I recommend letting this decision up to the vendor/implementer of the custom function. Maybe another tag interface org.geotools.api.filter.NotExposedFilterFunction (or the like) could do the trick.

Perform SQL encoding independently from the store's encode functions option

In FilterToSqlHelper, Function classes are only added to the FilterCapabilities, if the store's encode functions option is enabled. Especially for custom functions that have no Java implementation but only work when encoded in SQL (e.g. PostgreSQL's Full Text Search, etc.), it could be desirable to register them with the FilterCapabilities, even if the store's encode functions option is not enabled. There are two options to achieve this:

  1. Function classes implementing the proposed tag interface(s) always get encoded in SQL, no matter what the store's encode functions option is set to (that is, the registration of the tag interface happens out of the if (encodeFunctions) { branch).

    Actually, I guess, that's not the best idea...

  2. Using two distinct tag interfaces, SQLEncodableFunction and ForcedSQLEncodedFunction. The first is registered inside the if (encodeFunctions) { branch, the latter one out of it.

References:

Status

Choose one of:

  • Under Discussion
  • In Progress
  • Completed
  • Rejected,
  • Deferred

Voting:

  • Andrea Aime:
  • Ian Turton:
  • Jody Garnett:
  • Nuno Oliveira:
  • Simone Giannecchini:
  • Torben Barsballe:

Tasks

  1. Update implementation
  2. Verify with test case
  3. Documentation changes

API Change

One or more new interfaces (still to be discussed)

Clone this wiki locally