
Splunk Operator: indexers don't start with 9.1.2 #1260

Closed
yaroslav-nakonechnikov opened this issue Dec 13, 2023 · 15 comments

Comments

@yaroslav-nakonechnikov

Please select the type of request

Bug

Tell us more

Describe the request
All nodes start as expected, but the indexers do not.

Expected behavior
Everything works as it did before the upgrade.

Splunk setup on K8S
EKS

Reproduction/Testing steps

  • Start a cluster with 9.1.1, then replace the Splunk image with 9.1.2 (see the image-swap sketch below)
  • Or just try to start a cluster with 9.1.2
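A minimal sketch of the image swap on an indexer cluster CR, assuming the operator's spec.image field; the CR name, namespace, and clusterManagerRef below are hypothetical placeholders, and the apiVersion may differ between operator releases:

apiVersion: enterprise.splunk.com/v4   # adjust to the apiVersion your operator release serves
kind: IndexerCluster
metadata:
  name: site1            # hypothetical CR name
  namespace: splunk      # hypothetical namespace
spec:
  clusterManagerRef:
    name: cm             # hypothetical ClusterManager CR name
  replicas: 3
  image: splunk/splunk:9.1.2   # was splunk/splunk:9.1.1 before the swap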

K8s environment
1.28

Additional context (optional)

@yaroslav-nakonechnikov
Author

yaroslav-nakonechnikov commented Dec 13, 2023

After several tests, I can pinpoint what breaks.

In the cluster manager definition we had this:

smartstore:
    defaults:
      maxGlobalDataSizeMB: 0
      maxGlobalRawDataSizeMB: 0
      volumeName: smartstore
    indexes:
    - hotlistBloomFilterRecencyHours: 1
      hotlistRecencySecs: 3600
      name: tf-test
      remotePath: tf-test/
      volumeName: smartstore
    volumes:
    - endpoint: https://s3-eu-central-1.amazonaws.com
      name: smartstore
      path: bucket-for-smart-store
      provider: aws
      region: eu-central-1
      storageType: s3

When we removed that block and recreated the cluster manager and indexers, everything started to work.

The behavior is the same with splunk-operator versions 2.4.0 and latest.

@yaroslav-nakonechnikov
Author

Final tests show that the problem is in the defaults section.
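
For reference, a sketch of the same smartstore block with the defaults section removed, i.e. the workaround that made the indexers start; everything else is copied unchanged from the spec above:

smartstore:
    indexes:
    - hotlistBloomFilterRecencyHours: 1
      hotlistRecencySecs: 3600
      name: tf-test
      remotePath: tf-test/
      volumeName: smartstore
    volumes:
    - endpoint: https://s3-eu-central-1.amazonaws.com
      name: smartstore
      path: bucket-for-smart-store
      provider: aws
      region: eu-central-1
      storageType: s3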

@yaroslav-nakonechnikov
Author

Further investigation shows that splunk-operator creates these default settings:

[splunk@splunk-site1-indexer-0 splunk]$ bin/splunk btool indexes  list --debug | grep "\[default\]"
/opt/splunk/etc/peer-apps/splunk-operator/local/indexes.conf                 [default]
[splunk@splunk-site1-indexer-0 splunk]$ cat /opt/splunk/etc/peer-apps/splunk-operator/local/indexes.conf
[default]
repFactor = auto
maxDataSize = auto
homePath = $SPLUNK_DB/$_index_name/db
coldPath = $SPLUNK_DB/$_index_name/colddb
thawedPath = $SPLUNK_DB/$_index_name/thaweddb

[volume:smartstore]
storageType = remote
path = s3://bucket-for-smart-store
remote.s3.endpoint = https://s3-eu-central-1.amazonaws.com
remote.s3.auth_region = eu-central-1

and it does not work with the definition from the CRD.

Also, we had some default settings defined in a custom-built app, and that also breaks indexer startup. So something changed that should not have been touched.
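
For comparison, a sketch of what the CRD defaults block would plausibly map to in that generated [default] stanza; maxGlobalDataSizeMB and maxGlobalRawDataSizeMB are standard SmartStore settings in indexes.conf, but whether the operator is meant to write them into this particular file is an assumption on my side:

[default]
repFactor = auto
maxDataSize = auto
homePath = $SPLUNK_DB/$_index_name/db
coldPath = $SPLUNK_DB/$_index_name/colddb
thawedPath = $SPLUNK_DB/$_index_name/thaweddb
# expected from the CRD defaults block, but absent in the generated file shown above
maxGlobalDataSizeMB = 0
maxGlobalRawDataSizeMB = 0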

@vivekr-splunk
Collaborator

Hello @yaroslav-nakonechnikov, are you using IRSA with PrivateLink?

@yaroslav-nakonechnikov
Author

yaroslav-nakonechnikov commented Dec 18, 2023

@vivekr-splunk, no, we don't use PrivateLink.

The main point is that the same config was working fine with 9.1.1.

@vivekr-splunk
Collaborator

Hello @yaroslav-nakonechnikov, this has been fixed in the upcoming 9.1.3 and 9.0.7 releases, and also in 9.2.1.

@yaroslav-nakonechnikov
Author

Still the same issue with 9.1.3:

FAILED - RETRYING: Restart the splunkd service - Via CLI (5 retries left).
FAILED - RETRYING: Restart the splunkd service - Via CLI (4 retries left).
FAILED - RETRYING: Restart the splunkd service - Via CLI (3 retries left).
FAILED - RETRYING: Restart the splunkd service - Via CLI (2 retries left).
FAILED - RETRYING: Restart the splunkd service - Via CLI (1 retries left).

RUNNING HANDLER [splunk_common : Restart the splunkd service - Via CLI] ********
fatal: [localhost]: FAILED! => {
    "attempts": 60,
    "changed": true,
    "cmd": [
        "/opt/splunk/bin/splunk",
        "restart",
        "--answer-yes",
        "--accept-license"
    ],
    "delta": "0:00:11.173687",
    "end": "2024-01-25 15:06:11.736729",
    "rc": 10,
    "start": "2024-01-25 15:06:00.563042"
}

STDOUT:

splunkd is not running.

Splunk> 4TW

Checking prerequisites...
        Checking mgmt port [8089]: open
        Checking kvstore port [8191]: open
        Checking configuration... Done.


STDERR:

ERROR: pid 5825 terminated with signal 11 (core dumped)
Validating databases (splunkd validatedb) failed with code '-1'.  If you cannot resolve the issue(s) above after consulting documentation, please file a case online at http://www.splunk.com/page/submit_issue


MSG:

non-zero return code
Thursday 25 January 2024  15:06:11 +0000 (0:22:44.302)       0:23:46.336 ******
Thursday 25 January 2024  15:06:11 +0000 (0:00:00.000)       0:23:46.336 ******
Thursday 25 January 2024  15:06:11 +0000 (0:00:00.000)       0:23:46.337 ******

PLAY RECAP *********************************************************************
localhost                  : ok=106  changed=20   unreachable=0    failed=1    skipped=67   rescued=0    ignored=0

Thursday 25 January 2024  15:06:11 +0000 (0:00:00.003)       0:23:46.341 ******
===============================================================================
splunk_common : Restart the splunkd service - Via CLI ---------------- 1364.30s
splunk_common : Restart the splunkd service - Via CLI ------------------ 18.39s
splunk_common : Set options in saml ------------------------------------- 6.26s
splunk_common : Set options in roleMap_SAML ----------------------------- 6.04s
splunk_common : Get Splunk status --------------------------------------- 1.43s
splunk_common : Set node as license slave ------------------------------- 1.17s
splunk_indexer : Update HEC token configuration ------------------------- 1.17s
Gathering Facts --------------------------------------------------------- 1.14s
splunk_indexer : Set current node as indexer cluster peer --------------- 1.12s
splunk_common : Update /opt/splunk/etc ---------------------------------- 0.97s
splunk_indexer : Setup Peers with Associated Site ----------------------- 0.97s
splunk_common : Set options in authentication --------------------------- 0.88s
splunk_common : Test basic https endpoint ------------------------------- 0.79s
splunk_indexer : Setup global HEC --------------------------------------- 0.70s
splunk_indexer : Check for required restarts ---------------------------- 0.68s
Check for required restarts --------------------------------------------- 0.67s
splunk_indexer : Get existing HEC token --------------------------------- 0.67s
splunk_indexer : Check Splunk instance is running ----------------------- 0.67s
splunk_indexer : Check Splunk instance is running ----------------------- 0.66s
splunk_common : Check Splunk instance is running ------------------------ 0.66s

@yaroslav-nakonechnikov
Author

That one looks like it is fixed in 9.2.*, but I am still testing.

@fabiusgoh

Still hitting the same error on 9.2.0 and Splunk Operator 2.5.0.

@yaroslav-nakonechnikov
Author

@fabiusgoh have you raised a ticket with Splunk Support? May I ask for its number?

@fabiusgoh

I have not raised a support ticket yet; I am in the midst of testing it on 9.1.3, as that is the officially supported version for the operator.

@yaroslav-nakonechnikov
Author

I can confirm that 9.2 and 9.2.0.1 start with our config, which was not working with 9.1.2 and 9.1.3.

@vivekr-splunk
Collaborator

@yaroslav-nakonechnikov, As we discussed in our meeting, we now understand the issue. This problem arose due to the upgrade path we followed in the 2.5.0 release. Previously, we expected the search head clusters to be running before starting the indexers (if both indexers and SHC are pointing to the same CM). However, since the SHC had trouble starting, the indexers were never created.
As agreed, we will modify the logic to start the indexers in parallel with the search head. We'll keep you updated on our progress with these changes.

@yaroslav-nakonechnikov
Author

@vivekr-splunk yep, I agree, it was an informative meeting. But this ticket is different, as it is about Splunk logic itself (or splunk-ansible), which was fixed in the Splunk container starting from 9.2.0.

We were discussing #1293.

@yaroslav-nakonechnikov
Author

Also, today I rechecked 9.1.4; it is not working either.
So 9.1.1 is both the last working version and the last supported version.

All others are broken or not supported.
