Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-4952 - Introduce FedXConfig overrides for FedXRepositoryConfig #4953

Conversation

vtermanis
Copy link
Contributor

@vtermanis vtermanis commented Apr 17, 2024

GitHub issue resolved: #4952

Briefly describe the changes proposed in this PR:

Introduce FedXConfig option overrides for FedXRepositoryConfig so that repo templates can also access options which previously were only available programmatically.

We've raised the PR against the main branch because from our understanding it's not clear yet when v5.0.0 will be released and we'd be looking to use this feature in production ourselves. Looking at what's changed between 4 latest & 5 (against the files affected by this change) I think it should be trivial to port.


PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
    • I will squash all the fixup commits once review is complete.
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@@ -187,6 +203,14 @@ public void parse(Model m, Resource implNode) throws RepositoryConfigException {
Models.objectLiteral(m.getStatements(implNode, DATA_CONFIG, null))
.ifPresent(value -> setDataConfig(value.stringValue()));

Models.objectResource(m.getStatements(implNode, FEDX_CONFIG, null))
.ifPresent(res -> {
if (getConfig() == null) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I chose to do this so the previous behaviour is preserved:

  1. If no FedXConfig has been provided (programmatically or via template) then it will be null
  2. When exporting, FedXConfig is not included (when null) rather than being expanded into defaults (see export above)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good considerations! Please see my comments for the exporting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed with the other suggestion: 5a160df

@vtermanis
Copy link
Contributor Author

Note: Integration test pipeline has failed due to an unrelated reason:

compliance/rio/src/test/java/org/eclipse/rdf4j/rio/n3/N3ParserTest.java:28 | Expected '.', found '3' [line 2]

@hmottestad
Copy link
Contributor

I've set it to rerun the integration test. I've not seen this test being flaky before, but maybe it is.

@hmottestad
Copy link
Contributor

It's not an issue with your code. It's a test that is marked as disabled, but has somehow started to run again. Not sure how that happened.

@hmottestad hmottestad changed the base branch from main to develop April 19, 2024 15:04
@hmottestad hmottestad changed the base branch from develop to main April 19, 2024 20:04
Copy link
Contributor

@aschwarte10 aschwarte10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the initiative and contribution. I see the value of making certain settings of the federation easier configurable via the repository configuration file (e.g. managed from the RDF4J workbench). Actually I was also looking for such feature.

Overall the functional change looks clean and good. Thanks for the unit test coverage.

I have one comment though inline on compatibility (at least with the use FedX in our product). Can you please have a look at that?

In addition, users would benefit from a small documentation adjustment of https://rdf4j.org/documentation/programming/federation/#fedx-configuration (stating with a small example that settings can be added in the repository configuration). Likely the example from the javadoc is sufficient already. As far as I remember the documentation is also built from git, the location should be rdf4j/site/content/documentation/programming/federation.md

Regarding the contribution procedures I'd also like to see a confirmation from @hmottestad

/**
* IRI of the property populating {@link FedXConfig#getEnforceMaxQueryTime()}
*/
public static final IRI CONFIG_ENFORCE_MAX_QUERY_TIME = vf.createIRI(NAMESPACE, "enforceMaxQueryTime");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f.y.i: nowadays in RDF4J instead of ValueFactory one could simply use Values.iri(NAMESPACE, "enforceMaxQueryTime")

For me OK to stick with the ValueFactory

Copy link
Contributor Author

@vtermanis vtermanis Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know 👍

Happy to update or, based on your comment, do you prefer it "looks" the same as FedXRepositoryConfig does?

Edit: Right - with the other change this will be N/A since they'll all use the ValueFactory approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -152,6 +163,11 @@ public Resource export(Model m) {
m.add(implNode, DATA_CONFIG, vf.createLiteral(getDataConfig()));
}

if (getConfig() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will mean: any externally set FedXConfig (e.g. via code) will be serialized into the repository configuration, e.g. when saving it.

I think overall (from history) there is a bit of technical design issue in the FedXRepositoryConfig and FedXConfig: the FedXConfig mixes "string" like options and things that can only be set via code (e.g. the TaskWrapper).

For our product use-case: we are having an extension of FedXConfig (with some proprietary options, and also a TaskWrapper) and we are setting this config via code. We generally would not want to see it exported to the repository config.

So maybe to retain compatibility on that level, we can use one of the following options:

a) introduce another boolean field (say isExternalConfig) and only export if it is not externally provided
b) put the handling of the FedXConfig for both parsing and exporting to a protected method, which can be overridden in customizations of the repository config

I think option b) is cleaner an will allow most flexibility and still retain compatibility.

So the idea would be to move your new exporting/parsing part of the code from this class to new protected methods

  • protected void exportFedXConfig(Model model, Resource implNode)
  • protected void parseFedXConfig(Model model, Resource implNode)

and call those in the current places.

Does this make sense to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - that sounds good. I'll have a go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought - the proposed method signatures allow sub-classes to do whatever they want (be it config-related or not). Would it not make more sense to pass them the node which contains the config (like is done in the separate class) - separation of concerns and all that.
And e.g. allow new FedXConfig(ConfigReader) to inject a custom or sub-classed implementation?

(But I also appreciate that this probably doesn't need to be over-engineered.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -187,6 +203,14 @@ public void parse(Model m, Resource implNode) throws RepositoryConfigException {
Models.objectLiteral(m.getStatements(implNode, DATA_CONFIG, null))
.ifPresent(value -> setDataConfig(value.stringValue()));

Models.objectResource(m.getStatements(implNode, FEDX_CONFIG, null))
.ifPresent(res -> {
if (getConfig() == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good considerations! Please see my comments for the exporting

@aschwarte10
Copy link
Contributor

@vtermanis did you have a chance to look at the above feedback? I think it would be ready with a minimal technical change of moving some pieces of code to protected methods.

@vtermanis
Copy link
Contributor Author

@aschwarte10 - sorry about the delay, we've been busy. I will update the PR based your recommendations. Thank you for the initial review.
Follow up Q: It occurred to me that it'll be useful to us to also be able to tweak the join block size and maybe the worker thread counts. To that effect: Is it worth exposing all (apart from from TaskWrapper) options? Are there any which you don't think people should touch?

@vtermanis
Copy link
Contributor Author

I've added the remaining conf options: f72c49a

@vtermanis
Copy link
Contributor Author

In addition, users would benefit from a small documentation adjustment of https://rdf4j.org/documentation/programming/federation/#fedx-configuration (stating with a small example that settings can be added in the repository configuration).

I've added that in a separate commit - see what you think.

Copy link
Contributor

@aschwarte10 aschwarte10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments.

Overall the change now looks clean and solid. I am fine to expose all configuration settings also as part of the repository config.

I have one comment / question inline, where I'd like to hear @hmottestad 's feedback.


BNode confNode = Values.bnode();

model.add(confNode, CONFIG_JOIN_WORKER_THREADS, vf.createLiteral(config.getJoinWorkerThreads()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the downside of the current implementation is that exporting will also write the FedXConfig default values, i.e. values that may not have been present in the repository configuration ttl before.

Ideally the parse / export logic would be consistent, i.e. running parse and export produces the same result.

I think I could live with this, but I would like hear @hmottestad's opinion whether this is in line with the RDF4J design and other repo configurations.

Copy link
Contributor Author

@vtermanis vtermanis May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the parse / export logic would be consistent, i.e. running parse and export produces the same result.

One could check against the default values for each option, but ...

If the user has specified explicit configuration including default values in the input RDF and we choose to ignore defaults - then the export will not be the same since it'll be missing options previously included that were set to default values. (that information is lost on import.)
.. so unless all options were Optional to show whether they've been set at any point we can't be sure (unless I'm mistaken).

So I'm not sure whether it's worth going for a halfway house. (But if you decide you'd like the check-whether-default-before-write - I can add it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave it up to @hmottestad as he has likely the broadest overview at this point.

As a side note: we are usually solving the parsing / export consistency by having "null" state for the field (i.e. Boolean, Integer, ...). In the current design this does not really fit as it uses the API level FedXConfig. To achieve this properly something similar one would need a repository level FedXConfig and some translation to the api level FedXConfig.

@@ -299,9 +299,10 @@ FedX provides various means for configuration. Configuration settings can be def
|Property | Description |
|---------|-------------|
|prefixDeclarations | Path to prefix declarations file, see [PREFIX Declarations](#prefix-declarations) |
|cacheLocation | Location where the memory cache gets persisted at shutdown, default _cache.db_ |
|sourceSelectionCacheSpec | Cache specification for the `SourceSelectionMemoryCache`, default _maximumSize=1000,expireAfterWrite=6h_ |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing includeInferredDefault / consumingIterationMax

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hmottestad hmottestad changed the base branch from main to develop May 10, 2024 08:25
@hmottestad
Copy link
Contributor

@aschwarte10 I've created a CQ for the license issue, but you don't need to wait for that before you merge this PR.

Copy link
Contributor

@aschwarte10 aschwarte10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the chance to have a conversation with @hmottestad and we are both fine with the exporting of defaults. Sorry for the delay

Hence, for me the change looks good to me. Thanks for the clean and concise contribution.

We will see this change as part of the 5.0 release of rdf4j.

@aschwarte10 aschwarte10 merged commit 03aefb9 into eclipse-rdf4j:develop May 13, 2024
8 of 9 checks passed
@vtermanis
Copy link
Contributor Author

Sorry for the delay

No, problem. Thanks for the review!

@vtermanis vtermanis deleted the GH-4952-fedx-config-template-overrides branch May 14, 2024 07:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FedX - Allow for FedXConfig options to be overriden in repo config templates
4 participants