Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GEOS-11284] Promote community module "datadir catalog loader" to core #7421

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

groldan
Copy link
Member

@groldan groldan commented Feb 12, 2024

GEOS-11284 Powered by Pull Request Badge

DataDirectoryGeoServerLoader is a faster alternative to DefaultGeoServerLoader, especially over network drives like NFS shares.

It was created in 2022 for the GeoServer Cloud project and then incubated in GeoServer as a community module.

Since then it's been used in several GeoServer Cloud production deployments, and now it's being moved to GeoServer's main module as the new default catalog and config loader for data directories.

This loader parallelizes both I/O calls and parsing of Catalog and Config info objects,
minimizing I/O calls as much as possible, trying to make a single pass over the workspaces
directory tree, and loading both catalog and config files in one pass.

The point is that large Catalogs contain several thousand small XML files that need to be read
and parsed, and network shares (NFS in particular) are really bad at serving lots of small files.

Though this is the default data directory loader since GeoServer 2.25, it can be disabled (falling
back to DefaultGeoServerLoader), through the datadir.loader.enabled=false or
DATADIR_LOADER_ENABLED=false System Property or environment variable.

The loading process is multi-threaded, and will take place in an work-stealing Executor whose
parallelism is determined by an heuristic resolving to the minimum between 16 and the
number of available processors as reported by Runtime#availableProcessors().

The parallelism level can also be overridden through the environment variable or system
property DATADIR_LOAD_PARALLELISM. A value of zero or less will produce a warning and
fall back to the default value heuristic mentioned above.

Checklist

For core and extension modules:

  • New unit tests have been added covering the changes.
  • Documentation has been updated (if change is visible to end users).
  • The REST API docs have been updated (when changing configuration objects or the REST controllers).
  • There is an issue in the GeoServer Jira (except for changes that do not affect administrators or end users in any way).
  • Commit message(s) must be in the form [GEOS-XYZWV] Title of the Jira ticket.
  • Bug fixes and small new features are presented as a single commit.
  • Each commit has a single objective (if there are multiple commits, each has a separate JIRA ticket describing its goal).

@groldan groldan force-pushed the GEOS-11284 branch 6 times, most recently from c2eb0ae to 26877f3 Compare February 15, 2024 03:00
@groldan groldan marked this pull request as ready for review February 15, 2024 04:18
@@ -1035,7 +1035,7 @@ public void syncTo(CatalogFacade dao) {
if (dao instanceof DefaultCatalogFacade) {
// do an optimized sync
DefaultCatalogFacade other = (DefaultCatalogFacade) dao;

Catalog catalog = other.getCatalog();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this fixes a pre-existing issue by which all loaded StoreInfos and ResourceInfos end up with a reference to the discarded Catalog, which is then never garbage collected.
Everything kept working as expected by accident, due to the fact that CatalogImpl.sync() will take ownership of the discarded catalog internals (resource pool, facade's CatalogInfoLookup's, etc)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense.

Copy link
Member

@aaime aaime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started the review on Monday but could not complete it. Since I'm not in a position to complete it this week (some emergencies popped up) I'm leaving the few comments I've gathered so far, as a starting point, but will need to finish the work later.

If you prefer to work through the feedback in a single go, you can safely ignore the comments for the moment, hopefully I'll manage to complete this next Monday.

@@ -218,7 +219,7 @@ protected void resolve(StoreInfo store) {
protected void resolve(ResourceInfo resource) {
setId(resource);
ResourceInfoImpl r = (ResourceInfoImpl) resource;

r.setCatalog(getCatalog());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential inconsistency: the catalog has been set in the StorInfo and in the ResourceInfo, but not in the resolve for all other CatalogInfo(s). Is there a good reason to keep it this way? Or is it a leftover from the catalog sync work?

@@ -1035,7 +1035,7 @@ public void syncTo(CatalogFacade dao) {
if (dao instanceof DefaultCatalogFacade) {
// do an optimized sync
DefaultCatalogFacade other = (DefaultCatalogFacade) dao;

Catalog catalog = other.getCatalog();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense.

Comment on lines 75 to 76
static final String SYSPROP_KEY = "datadir.loader.enabled";
static final String ENVVAR_KEY = "DATADIR_LOADER_ENABLED";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GeoServer code normally checks for the same property, written in the same way, in both system and environment (and servlet context). Here you're allowing for a different syntax for the ENV case.
That's not something we have done anywhere else in GeoServer.
I'll also note that DATADIR_LOAD_PARALLELISM comes in only one syntactical form (which is fine, it's having it in two that's uncommon).

All these properties should receive documentation, we cannot expect users to hunt down their value and usage in the source code. I'd note them in the upgrade guide for starters, but also somewhere in the "running a production environment" section, probably in "data considerations"

// GeoServerExtensions.getProperty() doesn't check the Environment property
// doing it here, with lower priority than env variables and system properties,
// so it also works with GeoServer Cloud's externalized configuration
value = context.getEnvironment().getProperty(prop);
Copy link
Member

@aaime aaime Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc above says or {@code DATADIR_LOADER_ENABLED=false} System Property or environment variable.
But looking at the Spring javadoc, it seems the properties can be coming from a wider spectrum, covering all of GeoServerExtension capabilites, and in addition, JNDI as well: https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/core/env/Environment.html

Copy link
Member

@aaime aaime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more notes.

I've also tested interactively with a data directory that's really a dumpster, and managed to just lock up GeoServer for good, the parallel load never completes.
Setting up -Ddatadir.loader.enabled=false instead, the classic loader reports the same errors, but manages to complete the startup without further issues (end result, a working geoserver where a few layers are not functioning).

Here is some logs and a jstack:


@Before
public void setup() {
GeoServerExtensionsHelper.setIsSpringContext(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the tests setting up configuration, but not clearing it. Isn't there a chance of the configurations being set causing unintended interactions among tests? I was expecting to see also a GeoServerServerExtensionsHelper.clear() here.

* Test suite for {@link org.geoserver.config.DataDirectoryGeoServerLoader}, first creates the test
* data using the {@link org.geoserver.config.DefaultGeoServerLoader} and then verifies {@link
* org.geoserver.config.DataDirectoryGeoServerLoader} produces the same results.
*/
@TestSetup(run = TestSetupFrequency.REPEAT)
public class DataDirectoryGeoServerLoaderTest extends GeoServerSystemTestSupport {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the test compares the two catalogs only by contents size. Wondering if it would make sense to compare contents by equality, to check it's actually the same? Or is there some reason why that would not work?

return Optional.ofNullable(info);
}

private static XStreamPersister getXstream(Catalog catalog) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static XStreamPersister getXstream(Catalog catalog) {
private static XStreamPersister getXStream(Catalog catalog) {

Comment on lines +48 to +52
/**
* Holds per-thread {@link XStreamPersister}s. No need to call {@link ThreadLocal#remove()}
* because the calling thread dies with the managed {@link ForkJoinPool} used to call {@link
* #depersist}.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment, I was just wondering about cleanup 🤣

@@ -3,7 +3,7 @@
* application directory.
*/

package org.geoserver.catalog.datadir.internal;
package org.geoserver.config.internal;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code in this class, all the "save" methods look odd... saving is normally related to permanently persisting information on disk, while this is adding it into the catalog. I would find it more intuitive if all these methods were called "add" instead (that's what they end up calling in the catalog in the end).

import org.geoserver.config.util.XStreamServiceLoader;
import org.geoserver.ows.util.OwsUtils;
import org.geoserver.platform.resource.FileSystemResourceStore;
import org.geoserver.platform.resource.Resource;
import org.geoserver.platform.resource.Resources;
import org.geotools.util.logging.Logging;

/** @since 2.25 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note as with the catalog config loader, regarding the "save" method.

…gLoader subclass

Retrofit DataDirectoryGeoServerLoader as a DefaultCatalogLoader
subclass, now it's no longer a community module and can make protected
GeoServerLoader.readCatalog(XStreamPersister):Catalog
…rProxy

Let GeoServerLoaderProxy decide whether to instantiate a
DefaultGeoServerLoader or a DataDirectoryGeoServerLoader based on
externalized configuration, and remove the spring @configuration class
for DataDirectoryGeoServerLoader.

This preserves the behavior of allowing only one GeoServerLoader bean
contributed by an external plugin. Otherwise the jdcconfig one would
fail.
* Add missing StyleInfo.setCatalog(Catalog) call in
  AbsractCatalogFacade.resolve(StyleInfo)
* Rename config property datadir.load.parallelism as
  datdir.loader.parallelism for consistency with datadir.loader.enabled,
  and make it either a config property (i.e. lower.case and
  dot-separated) or an environment variable (i.e. UPPER_CASE)
* Fix logic to dispose the catalog loader when both the Catalog and
  GeoServer have been loaded.
* Rename CatalogConfigLoader.save(...) to CatalogConfigLoader.add(...)
  for correct semantics.
* Rename method getXstream() to getXStream()
* Clear GeoServerExtensionsHelper (init(null) does it all) after tests
  that use it.
@aaime
Copy link
Member

aaime commented Mar 18, 2024

I see there has been a recent push, and some test failures:

Error: 8,790 [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.563 s <<< FAILURE! - in org.geoserver.config.DataDirectoryGeoServerLoaderEnablementTest
Error: 8,790 [ERROR] testDisabledWithEnvVariable(org.geoserver.config.DataDirectoryGeoServerLoaderEnablementTest)  Time elapsed: 0 s  <<< FAILURE!
java.lang.AssertionError
	at org.geoserver.config.DataDirectoryGeoServerLoaderEnablementTest.testDisabledWithEnvVariable(DataDirectoryGeoServerLoaderEnablementTest.java:61)

Do you think the lockups during startup are gone? Should I retry?

@petersmythe
Copy link
Contributor

petersmythe commented Mar 19, 2024

Oh dear, I thought that this had already been added to 2.25.x and wrote this for the release note. Parking it here for when the work is completed.

datadir Catalog loader promoted to core

One such community module has just been promoted to the core GeoServer codebase.

It was created in 2022 for the GeoServer Cloud project and then incubated in GeoServer as a community module. It is now being moved to GeoServer's main module as the new default catalog and config loader for data directories as it is much faster that the original loader.

It can still be disabled (falling back to DefaultGeoServerLoader), through the datadir.loader.enabled=false or
DATADIR_LOADER_ENABLED=false System Property or environment variable.

Thanks to Gabriel Roldan for contributing this work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants