
PUT /_snapshot/{repository} with invalid repository configuration returns an error response but still successfully changes the config #107840

Open

DiannaHohensee opened this issue Apr 24, 2024 · 8 comments

Labels: :Distributed/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs) · >enhancement · good first issue (low hanging fruit) · Team:Distributed (Meta label for distributed team)

Comments

@DiannaHohensee (Contributor) commented Apr 24, 2024

Description

In a support case, it was possible to set an invalid repository config (missing required S3 settings) even though the API request returned a 500 error. We should at least verify that the expected request fields are present, and perhaps verify that a connection to the remote repository can be established, before storing the new repository config in the cluster state.

PUT _snapshot/found-snapshots
{
  "type": "s3",
  "settings": {
    "compress": false
  }
}
Response:
{
  "error": {
    "root_cause": [
      {
        "type": "repository_verification_exception",
        "reason": "[found-snapshots] path  is not accessible on master node"
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[found-snapshots] path  is not accessible on master node",
    "caused_by": {
      "type": "i_o_exception",
      "reason": "Unable to upload object [tests-kdwUK-MWR-mwL1Bhev2agA/master.dat] using a single upload",
      "caused_by": {
        "type": "sdk_client_exception",
        "reason": "Failed to connect to service endpoint: ",
        "caused_by": {
          "type": "connect_exception",
          "reason": "Connection refused"
        }
      }
    }
  },
  "status": 500
}

Yet GET _snapshot/found-snapshots shows that the invalid config was stored anyway:

{
  "found-snapshots": {
    "type": "s3",
    "settings": {
      "compress": "false"
    }
  }
}
@DiannaHohensee added the >enhancement, needs:triage, :Distributed/Snapshot/Restore, and Team:Distributed labels, and removed the needs:triage label, on Apr 24, 2024
@elasticsearchmachine (Collaborator) commented

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner (Contributor) commented Apr 24, 2024

I think the only required setting for an S3 repo is bucket, which appears in the code to have the validation you suggest, but there's a bug: the value is never null, even when the setting is not set:

diff --git a/modules/repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3Repository.java b/modules/repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3Repository.java
index 26b1b1158de..a777f846ca7 100644
--- a/modules/repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3Repository.java
+++ b/modules/repository-s3/src/main/java/org/elasticsearch/repositories/s3/S3Repository.java
@@ -223,7 +223,7 @@ class S3Repository extends MeteredBlobStoreRepository {

         // Parse and validate the user's S3 Storage Class setting
         this.bucket = BUCKET_SETTING.get(metadata.settings());
-        if (bucket == null) {
+        if (Strings.hasText(bucket) == false) {
             throw new RepositoryException(metadata.name(), "No bucket defined for s3 repository");
         }

The other settings, including base_path and client, have reasonable defaults and do not need to be present in the request.
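
For context, here's a minimal sketch of why the original `bucket == null` check can never fire. It assumes BUCKET_SETTING is a plain Setting.simpleString (which is what S3Repository appears to use); string settings fall back to the empty string when absent, not to null:

import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;

public class BucketSettingSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in for S3Repository's BUCKET_SETTING.
        Setting<String> bucketSetting = Setting.simpleString("bucket");
        // Read the setting from an empty Settings object, i.e. "bucket" unset.
        String bucket = bucketSetting.get(Settings.EMPTY);
        System.out.println(bucket == null);   // false: get() never returns null
        System.out.println(bucket.isEmpty()); // true: hence the Strings.hasText check
    }
}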

@DiannaHohensee (Contributor, Author) commented

The request also returned a 500 error, so it seems like that wasn't sufficient to stop the config from being set. Would we want to confirm the config works before setting it in the cluster state, or are there reasons, like performance or something, not to do so?

@DaveCTurner (Contributor) commented

There's no great reason IMO, it's just not how it was originally implemented and would be a little awkward to achieve. At the moment, first we add the repository metadata to the cluster state, which triggers the creation of the repository instance on all the nodes, and then we ask all the nodes to try using the repository we just added. It's this last step which fails with a 500-code exception, by which point the repo is already in the cluster state. Ideally we'd do the verification step first and only if that succeeds would we add the repo to the cluster state, but that'd mean creating a temporary repository instance on each node just for the verification (and worrying about overlapping requests etc etc).
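
In rough pseudocode (the method names here are illustrative, not the actual RepositoriesService API), the current sequence is:

// Sketch of the order of operations described above; names are hypothetical.
void putRepository(PutRepositoryRequest request, ActionListener<AcknowledgedResponse> listener) {
    // Step 1: commit the new RepositoryMetadata to the cluster state.
    // Every node reacts to the state update by instantiating the repository.
    submitClusterStateUpdate(request, ignored -> {
        // Step 2: only after the metadata is committed, ask each node to
        // exercise the repository. A failure here surfaces as the 500 in
        // the issue description, but step 1 has already taken effect.
        if (request.verify()) {
            verifyRepositoryOnAllNodes(request.name(), listener);
        } else {
            listener.onResponse(AcknowledgedResponse.TRUE);
        }
    });
}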

@DiannaHohensee (Contributor, Author) commented Apr 24, 2024

> but that'd mean creating a temporary repository instance on each node just for the verification

Is there anything stopping us from doing an initial verification only on the node running the request? A best-effort check to verify the config before sending it out to all nodes via a cluster state update (after which all nodes verify as usual, too).

@DaveCTurner (Contributor) commented

Yeah that'd work too I reckon. Indeed we do already create a temporary repository instance on the master first; we just don't try to use it at that point:

// Trying to create the new repository on master to make sure it works
closeRepository(createRepository(newRepositoryMetadata));

@DaveCTurner (Contributor) commented

so maybe something like this would be enough:

diff --git a/server/src/main/java/org/elasticsearch/repositories/RepositoriesService.java b/server/src/main/java/org/elasticsearch/repositories/RepositoriesService.java
index f7a2a605a18..0fa7a0abead 100644
--- a/server/src/main/java/org/elasticsearch/repositories/RepositoriesService.java
+++ b/server/src/main/java/org/elasticsearch/repositories/RepositoriesService.java
@@ -349,7 +349,14 @@ public class RepositoriesService extends AbstractLifecycleComponent implements C
         final RepositoryMetadata newRepositoryMetadata = new RepositoryMetadata(request.name(), request.type(), request.settings());

         // Trying to create the new repository on master to make sure it works
-        closeRepository(createRepository(newRepositoryMetadata));
+        final var repository = createRepository(newRepositoryMetadata);
+        try {
+            if (request.verify()) {
+                repository.endVerification(repository.startVerification());
+            }
+        } finally {
+            closeRepository(repository);
+        }
     }

     private void submitUnbatchedTask(@SuppressWarnings("SameParameterValue") String source, ClusterStateUpdateTask task) {
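
If that's right, a PUT like the one in the description would fail before the cluster state update, so the invalid config would never be stored, and a follow-up GET _snapshot/found-snapshots should then return something like:

{
  "error": {
    "root_cause": [
      {
        "type": "repository_missing_exception",
        "reason": "[found-snapshots] missing"
      }
    ],
    "type": "repository_missing_exception",
    "reason": "[found-snapshots] missing"
  },
  "status": 404
}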

@DiannaHohensee added the good first issue label on Apr 24, 2024
elasticsearchmachine pushed a commit that referenced this issue Apr 25, 2024
Add basic validation to the S3 bucket name: nullity and empty string. This aligns with the public
[docs](https://www.elastic.co/guide/en/elasticsearch/reference/8.13/repository-s3.html#repository-s3-repository),
which list "bucket" as a required field. We might want to add more validations
based on S3 naming rules. This PR should not be a breaking change, because a
missing bucket would eventually throw an exception later in the code with an
obscure error anyway.

I've added a YAML test to
[repository_s3/10_basic.yml](https://github.com/elastic/elasticsearch/compare/main...mhl-b:elasticsearch:s3-bucket-validation?expand=1#diff-08cf26742fe939f5575961254c4d3b4bff6915141cdd6abe4cd28a743d1b70ba);
not sure if it's the right place.

Addresses #107840
@ankanchanda's comment was marked as off-topic.
