SQL Encoding of Vendor‐Provided Custom Functions
- Contact: cklein05
- Tracker: none yet (need one?)
- TLDR: Simple way to enable SQL encoding for custom vendor provided functions
GeoTools has a factory and SPI-based way to extend its set of filter functions described in Function Tutorial and also in Factory Tutorial. This extension point greatly helps software system vendors to implement project specific functions without needing too much knowledge of GeoTools' internals.
However, there's currently no simple way to make these custom functions being encoded in SQL, that is, these functions cannot be run directly on the data store's database (e.g. if the store's encode functions option is enabled). The goal of this proposal is to provide an easy way to contribute custom filter functions that can be encoded in SQL, and so can be run directly on the store's database.
Use cases are features like full text search, fuzzy or similarity search (e.g. tri-gram based search algorithms) and many other project specific SQL functions.
GeoTools provides a fixed set of general core filter functions that are available with all data stores (like greaterEqualThan
, env
, atan
, strEndsWith
, etc.). All these internal core functions always have a Java implementation. Depending on the used data store, some of these functions can be encoded in SQL and then run directly on the store's database. Currently it is hard-coded in each store's implementation, whether a function can be encoded in SQL (and how) or not. Typically this is done in a class named ´XXXFilterToSQL´ (XXX indicating the store type).
For example, the DB2 data store does this in class DB2FilterToSQL
in method createFilterCapabilities
. The PostGIS data store has moved this method (and others) into helper class FilterToSqlHelper
.
In all cases, the createFilterCapabilities
method returns a FilterCapabilities
instance, which is responsible to determine which Filters support SQL encoding. For that, the FilterCapabilities
class maintains a list of Filter classes, that can be encoded in SQL. Its supports(Filter)
method simply checks whether the passed Filter class or any of its implemented interfaces is contained in that list.
Each data store's createFilterCapabilities
then just has to register those core functions, it wants to enable SQL encoding for, by passing their implementing class to the FilterCapabilities.addType()
method.
That way, each data store has its own central place to pick only those functions for SQL encoding, that the underlying database can handle. This concept works quite well for the internal core functions and shall not be changed by this proposal.
By using the environment and structures present for the core functions already, a custom function could quite easily be marked for SQL encoding with the help of a new tag interface SQLEncodableFunction
(or likewise). The new interface must unconditionally be added to the FilterCapabilities
class in each participating data store's createFilterCapabilities
method. Then, if a software vendor implements a custom function that should be encoded in SQL, the function's class must just implement that new tag interface. Basically, that's it.
On the devel mailing list I was discussing with Andrea about that interface. The preferred option was to create distinct tag interfaces, one for each data store. This will end in having interfaces org.geotools.data.db2.DB2SQLEncodableFunction
, org.geotools.data.postgis.PostGISSQLEncodableFunction
, etc. Actually, since such custom functions are often very project specific and target a single store/database only, a single core interface org.geotools.data.jdbc.SQLEncodableFunction
may be sufficient as well.
SQL encoding assumes that the function must be called like function_name(arg1, arg2, ..., argN)
in SQL. There is no custom syntax transformation available (as it is for the core functions).
The proposed changes should be applied to all SQL database based data stores, which are: (in opinion based order of importance)
Andrea also suggested to prevent such custom functions from being advertised in the WFS GetCapabilities response. This could make sense, since those functions typically are no (as he called them) first class functions: likely, they do not work with all stores and most of them won't have a proper Java implementation (e.g. can't implement PostgreSQL's Full Text Search in pure Java).
As an example, Andrea mentioned interface org.geotools.api.filter.NativeFilter
, which is quite similar to this proposal's new tag interface(s)). However, I did not yet find an existing mechanism that would prevent NativeFilter instances from being exposed in the GetCapabilities response.
Since, in general, a client may always require that filter functions are advertised in the GetCapabilities, I recommend letting this decision up to the vendor/implementer of the custom function. Maybe another tag interface org.geotools.api.filter.NotExposedFilterFunction
(or the like) could do the trick.
In FilterToSqlHelper
, Function classes are only added to the FilterCapabilities
, if the store's encode functions option is enabled. Especially for custom functions that have no Java implementation but only work when encoded in SQL (e.g. PostgreSQL's Full Text Search, etc.), it could be desirable to register them with the FilterCapabilities
, even if the store's encode functions option is not enabled. There are two options to achieve this:
-
Function classes implementing the proposed tag interface(s) always get encoded in SQL, no matter what the store's encode functions option is set to (that is, the registration of the tag interface happens out of the
if (encodeFunctions) {
branch).Actually, I guess, that's not the best idea...
-
Using two distinct tag interfaces,
SQLEncodableFunction
andForcedSQLEncodedFunction
. The first is registered inside theif (encodeFunctions) {
branch, the latter one out of it.
References:
Choose one of:
- Under Discussion
- In Progress
- Completed
- Rejected,
- Deferred
Voting:
- Andrea Aime:
- Ian Turton:
- Jody Garnett:
- Nuno Oliveira:
- Simone Giannecchini:
- Torben Barsballe:
- Update implementation
- Verify with test case
- Documentation changes
- API change make a note upgrading page.
- Update the user guide with code example
- Update Function Tutorial
One or more new interfaces (still to be discussed)