Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splunk Operator: controller doesn't detect if statefulset deleted for standalone type #1284

Open
yaroslav-nakonechnikov opened this issue Feb 15, 2024 · 8 comments
Assignees
Labels
documentation Improvements or additions to documentation Q2

Comments

@yaroslav-nakonechnikov
Copy link

Please select the type of request

Bug

Tell us more

Describe the request
if there are some changes done in stateful set for standalone instance - controlled doesn't recreate it as it is done for other types

Expected behavior
controller creates statefulset from crd

@yaroslav-nakonechnikov
Copy link
Author

$ kubectl get standalone
NAME     PHASE   DESIRED   READY   AGE
c-prod                             23m

in description there was warning:
Warning validateStandaloneSpec 20m splunk-standalone-controller validate standalone spec failed invalid Volume Name for App Source: custom. volume: csh, doesn't exist

but, it shouldn't be the case, as before deleting statefulset it was working

@vivekr-splunk
Copy link
Collaborator

can you add CR spec here, we will retry on our end and let you know

@yaroslav-nakonechnikov
Copy link
Author

broken example, which creates warning and blocking creating resources:

 appRepo:
    appInstallPeriodSeconds: 90
    appSources:
    - location: custom/
      name: custom
      scope: local
      volumeName: csh
    - location: base/
      name: base
      scope: local
      volumeName: csh
    appsRepoPollIntervalSeconds: 120
    installMaxRetries: 3
    volumes:
    - endpoint: https://s3-eu-central-1.amazonaws.com
      name: splunkapps
      path: bucketname/splunk-apps/csh
      provider: aws
      region: eu-central-1
      storageType: s3

correct example:

 appRepo:
    appInstallPeriodSeconds: 90
    appSources:
    - location: custom/
      name: custom
      scope: local
      volumeName: splunkapps
    - location: base/
      name: base
      scope: local
      volumeName: splunkapps
    appsRepoPollIntervalSeconds: 120
    installMaxRetries: 3
    volumes:
    - endpoint: https://s3-eu-central-1.amazonaws.com
      name: splunkapps
      path: bucketname/splunk-apps/csh
      provider: aws
      region: eu-central-1
      storageType: s3

so, all in all, expectation that warning doesn't block anything, but feature just not working.
or, it should be error - and in that case it will be expected.

@vivekr-splunk vivekr-splunk added documentation Improvements or additions to documentation Q2 labels Apr 23, 2024
@akondur akondur assigned akondur and unassigned vivekr-splunk Apr 29, 2024
@akondur
Copy link
Collaborator

akondur commented Apr 29, 2024

Hi @yaroslav-nakonechnikov , in every custom resource's reconcile logic we do a validation of its spec and error out if any CR spec is erroneous(same for all CRs).

For standalone we validate its spec here. Another example of an indexer cluster here.

As mentioned here already, Splunk operator records a Warning event in the standalone CR indicating incorrect volume for the app source custom. Currently, only the following event types: {Normal, Warning} are supported as mentioned in the golang documentation here. In the future, if they introduce an {Error} type we will make sure to mark the event as an error.

Additionally we log an error here to indicate an erroneous config.

Please let us know if the explanation above is acceptable.

@akondur
Copy link
Collaborator

akondur commented May 1, 2024

Hi @yaroslav-nakonechnikov , please let us know if the explanation above is acceptable.

@yaroslav-nakonechnikov
Copy link
Author

explanation doesn't fix problem.

i expect that controller overrides stateful set (and or recreate it) in case of any changes in statefulset.

otherwise, behavior is not consistent and leads to additional issues, which are not expected.

@akondur
Copy link
Collaborator

akondur commented May 3, 2024

Hey @yaroslav-nakonechnikov , currently we are looking to set the Phase as Error in the CR status in such scenarios to indicate that the CR spec is invalid i.e

NAME     PHASE   DESIRED   READY   AGE
c-prod   Error                     23m

In the future we are looking to add a Message field in the CR status to indicate such errors i.e

NAME     PHASE   DESIRED   READY   AGE   MESSAGE
c-prod   Error                     23m   validateStandaloneSpec: validate standalone spec failed invalid Volume Name for App Source: custom. volume: csh, doesn't exist

Please let us know if that solution works.

@yaroslav-nakonechnikov
Copy link
Author

yes, adding nice log messages will help a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation Q2
Projects
None yet
Development

No branches or pull requests

5 participants