Splunk Operator: Change the readiness probe for search head clusters to not show instances that are in manual detention as ready #1322

Open
gjanders opened this issue Apr 15, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request shcreadinessprobe

Comments

@gjanders
Contributor

Please select the type of request

Enhancement

Tell us more

Describe the request
Currently, the readiness probe used for a Splunk search head cluster only tests whether port 8089 is responding: if it is, the instance is "ready"; if not, it is not ready. I'd like this to be customizable so that members in manual (or automatic) detention are not reported as ready.

Expected behavior
The probe should check the status of the member. For example, it could hit the endpoint https://localhost:8089/services/shcluster/member/ready and treat a response without errors as successful.

A response such as:

<?xml version="1.0" encoding="UTF-8"?>
<!--This is to override browser formatting; see server.conf[httpServer] to disable. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .-->
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>shclusterready</title>
  <id>https://localhost:8089/services/shcluster/member/ready</id>
  <updated>2024-04-13T15:50:05+10:00</updated>
  <generator build="d95b3299fa65" version="9.1.3"/>
  <author>
    <name>Splunk</name>
  </author>
  <opensearch:totalResults>0</opensearch:totalResults>
  <opensearch:itemsPerPage>30</opensearch:itemsPerPage>
  <opensearch:startIndex>0</opensearch:startIndex>
  <s:messages/>
</feed>

would count as successful, meaning the search head is ready for traffic. A response such as:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <messages>
    <msg type="ERROR">Search Head is in detention</msg>
  </messages>
</response>

would result in that search head not receiving new traffic.
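To make the two cases concrete, here is a minimal sketch of how a probe could classify the response body. It is only an illustration, not the operator's actual probe: the function name `is_member_ready` is hypothetical, the sample bodies are trimmed copies of the responses above, and it assumes that any `msg` element with `type="ERROR"` means "not ready".

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two responses shown in this issue.
READY_BODY = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest">
  <title>shclusterready</title>
  <s:messages/>
</feed>"""

DETENTION_BODY = """<?xml version="1.0" encoding="UTF-8"?>
<response>
  <messages>
    <msg type="ERROR">Search Head is in detention</msg>
  </messages>
</response>"""


def is_member_ready(body: str) -> bool:
    """Return True when the body parses as XML and carries no ERROR message.

    Hypothetical helper: the rule "any ERROR msg means not ready" is an
    assumption based on the two sample responses, not documented behavior.
    """
    try:
        root = ET.fromstring(body.encode("utf-8"))
    except ET.ParseError:
        # An unparseable body is treated as not ready.
        return False
    for elem in root.iter():
        # Strip any "{namespace}" prefix so <msg> and <s:msg> both match.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "msg" and elem.get("type", "").upper() == "ERROR":
            return False
    return True


if __name__ == "__main__":
    print(is_member_ready(READY_BODY))
    print(is_member_ready(DETENTION_BODY))
```

A real probe would also need to fetch the endpoint with valid credentials and handle HTTP status codes, which this sketch leaves out.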

Ideally this would be a switch/parameter in case someone wants to send traffic to members in detention.

Splunk setup on K8S
Splunk search head clusters will have this feature, and only search head clusters...

Reproduction/Testing steps
Any search head cluster has this feature. You can manually put a member into detention as described in "Put a search head cluster member into detention".

K8s environment
N/A

Proposed changes(optional)
Provide either a flag or a new default for the SHC CRD so that the readiness probe checks the search head status and treats members in manual detention as "not ready".
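A rough sketch of the decision logic such a flag could drive is shown here. The function and parameter names (`readiness`, `port_open`, `in_detention`, `ignore_detention`) are made up for illustration and do not correspond to any existing CRD field.

```python
def readiness(port_open: bool, in_detention: bool, ignore_detention: bool = False) -> bool:
    """Decide whether a search head cluster member should be reported as ready.

    ignore_detention is the hypothetical switch: when True, the probe falls
    back to today's behavior and only looks at the management port.
    """
    if not port_open:
        # Current behavior: port 8089 not responding means not ready.
        return False
    if in_detention and not ignore_detention:
        # Proposed behavior: a detained member does not receive new traffic.
        return False
    return True


if __name__ == "__main__":
    print(readiness(port_open=True, in_detention=True))
    print(readiness(port_open=True, in_detention=True, ignore_detention=True))
```

Defaulting `ignore_detention` to `False` corresponds to the "new default" option; defaulting it to `True` would keep the existing behavior unless the flag is set.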

K8s collector data(optional)
N/A

Additional context(optional)
I've raised the related issue #1321

@yaroslav-nakonechnikov

Agreed, it causes issues.

We set the startup probe timeout to 5 minutes. This postpones the checks until just after the search heads are online, so IPs are assigned and the deployer can work with them.
But failureThreshold is set to a really high number (50+, depending on the check period), which allows the deployer to finish all its tasks.

This has a drawback: there will be logs about a non-working deployer, but we ignore them and check only restart reasons.
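For a rough sense of how much time such settings allow, the arithmetic can be sketched as follows. This assumes the "5 mins" refers to the probe's `initialDelaySeconds` and that the check period is 10 seconds; neither value is stated in the comment, and the result is only an approximation of how Kubernetes schedules probe attempts.

```python
def startup_window_seconds(initial_delay_seconds: int,
                           period_seconds: int,
                           failure_threshold: int) -> int:
    """Approximate time a container gets before a startup probe gives up.

    Kubernetes waits initial_delay_seconds, then tolerates failure_threshold
    consecutive failures spaced period_seconds apart.
    """
    return initial_delay_seconds + period_seconds * failure_threshold


if __name__ == "__main__":
    # Assumed values: 5 minute delay, 10 second period, threshold of 50.
    window = startup_window_seconds(300, 10, 50)
    print(window)  # 800 seconds, a little over 13 minutes
```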

vivekr-splunk assigned akondur and unassigned kumarajeet and jryb Apr 24, 2024
@vivekr-splunk vivekr-splunk added Q2 enhancement New feature or request and removed Q2 labels Apr 24, 2024
@gjanders
Contributor Author

gjanders commented May 7, 2024

Now logged as CSPL-2594

6 participants