New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GEOS-11284] Promote community module "datadir catalog loader" to core #7421
base: main
Are you sure you want to change the base?
Conversation
c2eb0ae
to
26877f3
Compare
@@ -1035,7 +1035,7 @@ public void syncTo(CatalogFacade dao) { | |||
if (dao instanceof DefaultCatalogFacade) { | |||
// do an optimized sync | |||
DefaultCatalogFacade other = (DefaultCatalogFacade) dao; | |||
|
|||
Catalog catalog = other.getCatalog(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this fixes a pre-existing issue by which all loaded StoreInfo
s and ResourceInfo
s end up with a reference to the discarded Catalog
, which is then never garbage collected.
Everything kept working as expected by accident, due to the fact that CatalogImpl.sync()
will take ownership of the discarded catalog internals (resource pool, facade's CatalogInfoLookup's, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started the review on Monday but could not complete it. Since I'm not in a position to complete it this week (some emergencies popped up) I'm leaving the few comments I've gathered so far, as a starting point, but will need to finish the work later.
If you prefer to work through the feedback in a single go, you can safely ignore the comments for the moment, hopefully I'll manage to complete this next Monday.
@@ -218,7 +219,7 @@ protected void resolve(StoreInfo store) { | |||
protected void resolve(ResourceInfo resource) { | |||
setId(resource); | |||
ResourceInfoImpl r = (ResourceInfoImpl) resource; | |||
|
|||
r.setCatalog(getCatalog()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential inconsistency: the catalog has been set in the StorInfo and in the ResourceInfo, but not in the resolve
for all other CatalogInfo(s). Is there a good reason to keep it this way? Or is it a leftover from the catalog sync work?
@@ -1035,7 +1035,7 @@ public void syncTo(CatalogFacade dao) { | |||
if (dao instanceof DefaultCatalogFacade) { | |||
// do an optimized sync | |||
DefaultCatalogFacade other = (DefaultCatalogFacade) dao; | |||
|
|||
Catalog catalog = other.getCatalog(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense.
static final String SYSPROP_KEY = "datadir.loader.enabled"; | ||
static final String ENVVAR_KEY = "DATADIR_LOADER_ENABLED"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GeoServer code normally checks for the same property, written in the same way, in both system and environment (and servlet context). Here you're allowing for a different syntax for the ENV case.
That's not something we have done anywhere else in GeoServer.
I'll also note that DATADIR_LOAD_PARALLELISM
comes in only one syntactical form (which is fine, it's having it in two that's uncommon).
All these properties should receive documentation, we cannot expect users to hunt down their value and usage in the source code. I'd note them in the upgrade guide for starters, but also somewhere in the "running a production environment" section, probably in "data considerations"
// GeoServerExtensions.getProperty() doesn't check the Environment property | ||
// doing it here, with lower priority than env variables and system properties, | ||
// so it also works with GeoServer Cloud's externalized configuration | ||
value = context.getEnvironment().getProperty(prop); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadoc above says or {@code DATADIR_LOADER_ENABLED=false} System Property or environment variable.
But looking at the Spring javadoc, it seems the properties can be coming from a wider spectrum, covering all of GeoServerExtension capabilites, and in addition, JNDI as well: https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/core/env/Environment.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some more notes.
I've also tested interactively with a data directory that's really a dumpster, and managed to just lock up GeoServer for good, the parallel load never completes.
Setting up -Ddatadir.loader.enabled=false
instead, the classic loader reports the same errors, but manages to complete the startup without further issues (end result, a working geoserver where a few layers are not functioning).
Here is some logs and a jstack:
|
||
@Before | ||
public void setup() { | ||
GeoServerExtensionsHelper.setIsSpringContext(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the tests setting up configuration, but not clearing it. Isn't there a chance of the configurations being set causing unintended interactions among tests? I was expecting to see also a GeoServerServerExtensionsHelper.clear()
here.
* Test suite for {@link org.geoserver.config.DataDirectoryGeoServerLoader}, first creates the test | ||
* data using the {@link org.geoserver.config.DefaultGeoServerLoader} and then verifies {@link | ||
* org.geoserver.config.DataDirectoryGeoServerLoader} produces the same results. | ||
*/ | ||
@TestSetup(run = TestSetupFrequency.REPEAT) | ||
public class DataDirectoryGeoServerLoaderTest extends GeoServerSystemTestSupport { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the test compares the two catalogs only by contents size. Wondering if it would make sense to compare contents by equality, to check it's actually the same? Or is there some reason why that would not work?
return Optional.ofNullable(info); | ||
} | ||
|
||
private static XStreamPersister getXstream(Catalog catalog) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static XStreamPersister getXstream(Catalog catalog) { | |
private static XStreamPersister getXStream(Catalog catalog) { |
/** | ||
* Holds per-thread {@link XStreamPersister}s. No need to call {@link ThreadLocal#remove()} | ||
* because the calling thread dies with the managed {@link ForkJoinPool} used to call {@link | ||
* #depersist}. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good comment, I was just wondering about cleanup 🤣
@@ -3,7 +3,7 @@ | |||
* application directory. | |||
*/ | |||
|
|||
package org.geoserver.catalog.datadir.internal; | |||
package org.geoserver.config.internal; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code in this class, all the "save" methods look odd... saving is normally related to permanently persisting information on disk, while this is adding it into the catalog. I would find it more intuitive if all these methods were called "add" instead (that's what they end up calling in the catalog in the end).
import org.geoserver.config.util.XStreamServiceLoader; | ||
import org.geoserver.ows.util.OwsUtils; | ||
import org.geoserver.platform.resource.FileSystemResourceStore; | ||
import org.geoserver.platform.resource.Resource; | ||
import org.geoserver.platform.resource.Resources; | ||
import org.geotools.util.logging.Logging; | ||
|
||
/** @since 2.25 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note as with the catalog config loader, regarding the "save" method.
…gLoader subclass Retrofit DataDirectoryGeoServerLoader as a DefaultCatalogLoader subclass, now it's no longer a community module and can make protected GeoServerLoader.readCatalog(XStreamPersister):Catalog
…rProxy Let GeoServerLoaderProxy decide whether to instantiate a DefaultGeoServerLoader or a DataDirectoryGeoServerLoader based on externalized configuration, and remove the spring @configuration class for DataDirectoryGeoServerLoader. This preserves the behavior of allowing only one GeoServerLoader bean contributed by an external plugin. Otherwise the jdcconfig one would fail.
…rverLaoder and improve javadocs
* Add missing StyleInfo.setCatalog(Catalog) call in AbsractCatalogFacade.resolve(StyleInfo) * Rename config property datadir.load.parallelism as datdir.loader.parallelism for consistency with datadir.loader.enabled, and make it either a config property (i.e. lower.case and dot-separated) or an environment variable (i.e. UPPER_CASE) * Fix logic to dispose the catalog loader when both the Catalog and GeoServer have been loaded. * Rename CatalogConfigLoader.save(...) to CatalogConfigLoader.add(...) for correct semantics. * Rename method getXstream() to getXStream() * Clear GeoServerExtensionsHelper (init(null) does it all) after tests that use it.
I see there has been a recent push, and some test failures:
Do you think the lockups during startup are gone? Should I retry? |
Oh dear, I thought that this had already been added to 2.25.x and wrote this for the release note. Parking it here for when the work is completed. datadir Catalog loader promoted to coreOne such community module has just been promoted to the core GeoServer codebase. It was created in 2022 for the GeoServer Cloud project and then incubated in GeoServer as a community module. It is now being moved to GeoServer's main module as the new default catalog and config loader for data directories as it is much faster that the original loader. It can still be disabled (falling back to DefaultGeoServerLoader), through the Thanks to Gabriel Roldan for contributing this work. |
DataDirectoryGeoServerLoader
is a faster alternative toDefaultGeoServerLoader
, especially over network drives like NFS shares.It was created in 2022 for the GeoServer Cloud project and then incubated in GeoServer as a community module.
Since then it's been used in several GeoServer Cloud production deployments, and now it's being moved to GeoServer's
main
module as the new default catalog and config loader for data directories.This loader parallelizes both I/O calls and parsing of Catalog and Config info objects,
minimizing I/O calls as much as possible, trying to make a single pass over the
workspaces
directory tree, and loading both catalog and config files in one pass.
The point is that large Catalogs contain several thousand small XML files that need to be read
and parsed, and network shares (NFS in particular) are really bad at serving lots of small files.
Though this is the default data directory loader since GeoServer 2.25, it can be disabled (falling
back to
DefaultGeoServerLoader
), through thedatadir.loader.enabled=false
orDATADIR_LOADER_ENABLED=false
System Property or environment variable.The loading process is multi-threaded, and will take place in an work-stealing
Executor
whoseparallelism is determined by an heuristic resolving to the minimum between
16
and thenumber of available processors as reported by
Runtime#availableProcessors()
.The parallelism level can also be overridden through the environment variable or system
property
DATADIR_LOAD_PARALLELISM
. A value of zero or less will produce a warning andfall back to the default value heuristic mentioned above.
Checklist
main
branch (backports managed later; ignore for branch specific issues).For core and extension modules:
[GEOS-XYZWV] Title of the Jira ticket
.