docs: update getting-started-splunk-setup.md #2417

Open. Wants to merge 12 commits into base: main.
102 changes: 56 additions & 46 deletions docs/gettingstarted/getting-started-splunk-setup.md
@@ -1,49 +1,59 @@
# Splunk setup
## Create Indexes

SC4S is pre-configured to map each sourcetype to a typical index. For new installations, it is best practice to create them in Splunk when
using the SC4S defaults. SC4S can be easily customized to use different indexes if desired.

* email
* epav
* epintel
* infraops
* netauth
* netdlp
* netdns
* netfw
* netids
* netlb
* netops
* netwaf
* netproxy
* netipam
* oswin
* oswinsec
* osnix
* print
* _metrics (Optional opt-in for SC4S operational metrics; ensure this is created as a metrics index)

## Configure the Splunk HTTP Event Collector

- Set up the Splunk HTTP Event Collector with the HEC endpoints behind a load balancer (VIP) configured for https round robin *WITHOUT* sticky
session. Alternatively, a list of HEC endpoint URLs can be configured in SC4S (native syslog-ng load balancing) if no load balancer is in
place. In most scenarios the recommendation is to use an external load balancer, as that makes longer term
maintenance simpler by eliminating the need to manually keep the list of HEC URLs specified in sc4s current. However, if a LB is not
available, native load balancing can be used with 10 or fewer Indexers where HEC is used exclusively for syslog.

In either case, it is _strongly_ recommended that SC4S traffic be sent to HEC endpoints configured directly on the indexers rather than
an intermediate tier of HWFs.
- Create a HEC token that will be used by SC4S and ensure the token has access to place events in main, _metrics, and all indexes used as
event destinations.

* NOTE: It is recommended that the "Selected Indexes" on the token configuration page be left blank so that the token has access to
_all_ indexes, including the `lastChanceIndex`. If this list is populated, extreme care must be taken to keep it up to date, as an attempt to
send data to an index not in this list will result in a `400` error from the HEC endpoint. Furthermore, the `lastChanceIndex` will _not_ be
consulted in the event the index specified in the event is not configured on Splunk. Keep in mind just _one_ bad message will "taint" the
whole batch (by default 1000 events) and prevent the entire batch from being sent to Splunk.
* If you are not using TLS on SC4S, turn off SSL in the global settings for HEC in Splunk.
- Refer to [Splunk Cloud](http://docs.splunk.com/Documentation/Splunk/7.3.1/Data/UsetheHTTPEventCollector#Configure_HTTP_Event_Collector_on_managed_Splunk_Cloud)
or [Splunk Enterprise](http://dev.splunk.com/view/event-collector/SP-CAAAE6Q) for specific HEC configuration instructions based on your Splunk type.
To set up syslog processing with SC4S, perform the following tasks in your Splunk instance:
1. Create indexes within Splunk.
**Contributor:** Please check how mkdocs renders the documentation: https://splunk.github.io/splunk-connect-for-syslog/2417/gettingstarted/getting-started-splunk-setup/ In this case we need a newline before the list, else the numbered steps render incorrectly. [screenshot]

**Collaborator (author):** I added a new line before the steps. I don't see it rendering, though; am I doing it wrong?


2. Configure your HTTP event collector.
3. Create a load balancing mechanism.


## Step 1: Create indexes within Splunk

SC4S maps each sourcetype to the following indexes by default. Make sure to create these indexes in Splunk:
**Contributor:** Customers report problems to us because sometimes they don't create those indexes in Splunk, so it is better to say in the docs that SC4S's default set of indexes must be created in Splunk.

**Collaborator (author):** That makes so much more sense!

**Contributor:** Right, but the call to action for the reader is that they must create those indexes or they will have problems; please compare with the original document.

**Contributor:** Can we make it explicit?

Suggested change:
* Before: SC4S maps each sourcetype to the following indexes by default:
* After: SC4S maps each sourcetype to the following indexes by default. Make sure to create them in Splunk.

* `email`
* `epav`
* `epintel`
* `infraops`
* `netauth`
* `netdlp`
* `netdns`
* `netfw`
* `netids`
* `netlb`
* `netops`
* `netwaf`
* `netproxy`
* `netipam`
* `oswin`
* `oswinsec`
* `osnix`
* `print`
* `_metrics` (Optional opt-in for SC4S operational metrics; ensure this is created as a metrics index)

You can also create your own indexes in Splunk. See [Create custom indexes](https://docs.splunk.com/Documentation/Splunk/9.2.1/Indexer/Setupmultipleindexes) for more information.
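On Splunk Enterprise, the default indexes can also be defined in `indexes.conf`. A minimal sketch, using the standard default paths and two index names from the list above (repeat the event-index pattern for the remaining names):

```ini
# Sketch of indexes.conf stanzas for two of the SC4S defaults.
[netfw]
homePath   = $SPLUNK_DB/netfw/db
coldPath   = $SPLUNK_DB/netfw/colddb
thawedPath = $SPLUNK_DB/netfw/thaweddb

[_metrics]
# Must be created as a metrics index, not an event index.
datatype   = metric
homePath   = $SPLUNK_DB/_metrics/db
coldPath   = $SPLUNK_DB/_metrics/colddb
thawedPath = $SPLUNK_DB/_metrics/thaweddb
```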

## Step 2: Configure your HTTP event collector

See [Use the HTTP event collector](https://docs.splunk.com/Documentation/Splunk/9.2.1/Data/UsetheHTTPEventCollector) for HEC configuration instructions based on your
Splunk type.

Keep in mind the following best practices specific to HEC for SC4S:
* Make sure that the HEC token created for SC4S has permissions to write to `_metrics` and all event destination indexes.
**Contributor:** `_metrics` are not events, so maybe:

Suggested change:
* Before: Make sure that the HEC token created for SC4S has permissions to add events to `main`, `_metrics`, and all other event destination indexes.
* After: Make sure that the HEC token created for SC4S has permissions to write to `_metrics` and all event destination indexes.
* You can leave "Selected Indexes" blank on the token configuration page so that the token has access to
all indexes, including the `lastChanceIndex`. If you do populate this field, take extreme care to keep it up to date; an attempt to
send data to an index that is not in this list results in a `400` error from the HEC endpoint. The `lastChanceIndex` is not
consulted when the index specified in the event is not configured on Splunk; a single bad message then prevents the entire batch (by default 1000 events) from being sent to Splunk.
* If you are not using TLS on SC4S, turn off SSL on global settings for HEC in Splunk.
* SC4S traffic must be sent to HEC endpoints that are configured directly on the indexers.
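Once the token exists, a quick smoke test can catch permission problems early. This is a sketch only: the hostname, token, and index below are placeholder assumptions, and the `curl` call is left commented out so you can substitute your own values before sending.

```shell
# Sketch: smoke-test an SC4S HEC token. HEC_HOST and HEC_TOKEN are placeholders;
# substitute one of your indexers and your real token.
HEC_HOST="idx1.example.com"
HEC_TOKEN="00000000-0000-0000-0000-000000000000"

# A minimal HEC event targeting one of the SC4S default indexes.
payload='{"event": "sc4s hec smoke test", "index": "netops", "sourcetype": "sc4s:probe"}'
echo "$payload"

# Uncomment to send it to the collector endpoint on the indexer. A 400 response
# here usually means the token's "Selected Indexes" list does not include the index.
# curl -k "https://${HEC_HOST}:8088/services/collector/event" \
#   -H "Authorization: Splunk ${HEC_TOKEN}" \
#   -d "$payload"
```

A `{"text":"Success","code":0}` style acknowledgement from the endpoint indicates the token and index are accepted.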

## Step 3: Create a load balancing mechanism
In some configurations, you should ensure output balancing from SC4S to Splunk indexers. To do this, create a load balancing mechanism between SC4S and Splunk indexers. Note that this should not be confused with load balancing between [sources and SC4S](../lb.md).
**Contributor:** Splunk docs on load balancing don't apply to this case. Can we make it more in the style of: "In some situations, it is necessary to ensure balancing of the output from SC4S to Splunk indexers. Note that this should not be confused with load balancing between [sources and SC4S](../lb.md)." What we mean here is:

* If you have Splunk Cloud, no worry: you're already covered and your SC4S output will be automatically load balanced to Splunk indexers.
* If you have Splunk Enterprise and a single indexer, you obviously don't need a load balancer.
* If you have Splunk Enterprise and multiple indexers, you should load balance your SC4S output.

**Collaborator (author):** Thanks! Is there another topic we should link to in case they are new to the product and need some guidance for creating this type of load balancing?

**Contributor:** Unfortunately there's nothing to link to; we don't provide any further recommendations for load balancers at this point.

**Collaborator (author):** "In some configurations, you should ensure output balancing from SC4S to Splunk indexers. To do this, you create a load balancing mechanism between SC4S and Splunk indexers. See Set up load balancing for more information."

**Contributor:** @jenworthington sounds great, but we cannot use this link because it's for heavy forwarders, not SC4S.


When configuring your load balancing mechanism, keep in mind the following:

* Splunk Cloud provides an internal ELB on TCP 443.
* For Splunk Enterprise, set up your Splunk HTTP Event Collector with the HEC endpoints behind a load balancer.
* An external load balancer simplifies long-term maintenance by eliminating the need to manually keep the list of HEC URLs specified in SC4S current. Set up the load balancer using a virtual IP (VIP) configured for HTTPS round robin without sticky sessions.
* If a load balancer is not available, you can configure a list of HEC endpoint URLs with native syslog-ng load balancing. For internal load balancing with syslog-ng you should:
    * Load balance across ten or fewer indexers.
    * Use HEC exclusively for syslog.
    * Have SC4S extract timestamps from messages (the default behavior) rather than use the time of receipt for the message.
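With the native syslog-ng option, the HEC endpoint list is typically supplied through the SC4S environment file. A sketch follows; the variable names match those documented for recent SC4S releases, but verify them against your version, and the URLs and token are placeholders:

```bash
# /opt/sc4s/env_file (sketch; URLs and token are placeholders).
# A space-separated list of HEC URLs enables native syslog-ng load balancing.
SC4S_DEST_SPLUNK_HEC_DEFAULT_URL="https://idx1.example.com:8088 https://idx2.example.com:8088"
SC4S_DEST_SPLUNK_HEC_DEFAULT_TOKEN="00000000-0000-0000-0000-000000000000"
```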
132 changes: 67 additions & 65 deletions docs/gettingstarted/k8s-microk8s.md
@@ -1,126 +1,99 @@

# Install MicroK8s
The SC4S deployment model with MicroK8s uses specific features of this distribution of k8s.
While this may be reproducible with other distributions, such an undertaking requires more advanced
awareness and responsibility from the administrator.
# Install and configure SC4S with MicroK8s
SC4S with MicroK8s leverages the following MicroK8s features:

* (metalLB) ensure source IP is preserved
* Bring any operating system (window/centos/rhel/ubuntu/debian)
* Uses MetalLB to preserve the source IP.
* Works with any of the following operating systems: Windows, CentOS, RHEL, Ubuntu, Debian.

This configuration requires at least 2 IP addresses: one for the host and one for the internal load balancer.
We suggest allocating 3 IP addresses for the host and 5-10 addresses for later use.
Note the following:

# FAQ
Question: How is this deployment model supported?
Answer: Similar to other deployment methods, Splunk supports the container itself and the procedural guidance for implementation but does not directly support
or otherwise provide resolutions for issues within the runtime environment.
* In deployments, Splunk supports the container and the procedural guidance for implementation. Splunk does not directly support
or provide resolutions for issues within the runtime environment.
* If you use a load balancer with one instance per host, traffic is restricted
to the entry node and only one instance of SC4S runs per node. This limits MetalLB to the same function as a cluster manager.

Question: Why is this "load balancer" ok but others are not?
Answer: While we are using a load balancer with one instance per host, the traffic is restricted
to the entry node and one instance of sc4s will run per node. This limits the function of MetalLB to
the same function as a Cluster Manager.
## Install and configure SC4S with MicroK8s
1. Allocate IP addresses. Your configuration requires at least one IP address for the host and one for the internal load balancer. As a best practice, allocate three IP addresses for the host and five to ten additional addresses for later use.

Question: Is this a recommended deployment model?
Answer: Yes, the single-server microk8s model is a recommended option. The use of clustering does have additional tradeoffs and should be carefully considered
on a deployment-specific basis.

```bash
#We need a normal install of kubectl because of the operator scripts
sudo snap install microk8s --classic --channel=1.24
# Basic setup of k8s
sudo usermod -a -G microk8s $USER
sudo chown -f -R $USER ~/.kube

su - $USER
microk8s status --wait-ready
#Note: when installing metallb you will be prompted for one or more IPs to use as entry points
#into the cluster. If you plan to enable clustering, this IP should not be assigned to the host (it floats).
#If you do not plan to cluster, this IP may be the same IP as the host.
#Note 2: a single IP in CIDR format is x.x.x.x/32; use CIDR or range syntax.
microk8s enable dns
microk8s enable community
microk8s enable metallb
microk8s enable rbac
microk8s enable storage
microk8s enable openebs
microk8s enable helm3
microk8s status --wait-ready

```
# Add SC4S Helm repo
2. Add the SC4S Helm repository. Use the following commands to add the repository and update Helm:

```bash
microk8s helm3 repo add splunk-connect-for-syslog https://splunk.github.io/splunk-connect-for-syslog
microk8s helm3 repo update
```

# Create a config file
Depending on whether you want to store the HEC token as a Kubernetes secret, create a `values.yaml` file.
If you wish to provide the HEC token value in plaintext, configure it as in the example below:

The HEC token can be configured either as plain text or as a secret.
3. Create a configuration file. In the `values.yaml` file, you can store the HEC token as a Kubernetes secret or provide its value in plain text.

As Plaintext Configuration:
* To create a plain text configuration file:

```yaml
--8<---- "docs/resources/k8s/values_basic.yaml"
```

As Secret Configuration:
* To create a secret configuration file:

```yaml
--8<---- "docs/resources/k8s/values_basic_no_token.yaml"
```
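For reference, a plain-text `values.yaml` might look like the following sketch. The `splunk.hec_token` key matches the key used by the install command in step 5; the other keys are assumptions to verify against the chart's documentation:

```yaml
# Sketch only; confirm key names against your chart version.
splunk:
  hec_url: "https://splunk.example.com:8088"
  hec_token: "00000000-0000-0000-0000-000000000000"
```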
# Install SC4S
4. Install SC4S:

```bash
microk8s helm3 install sc4s splunk-connect-for-syslog/splunk-connect-for-syslog -f values.yaml
```
HEC token as a kubernetes secret:
5. To provide the HEC token as a Kubernetes secret, install SC4S with the token set on the command line:
```bash
export HEC_TOKEN="00000000-0000-0000-0000-000000000000" # provide your token here!
microk8s helm3 install sc4s --set splunk.hec_token=$HEC_TOKEN splunk-connect-for-syslog/splunk-connect-for-syslog -f values.yaml
```
# Upgrade SC4S

To upgrade SC4S:

```bash
microk8s helm3 upgrade sc4s splunk-connect-for-syslog/splunk-connect-for-syslog -f values.yaml
```

# Setup for HA with multiple nodes
# Set up high availability (HA) with multiple nodes

See https://microk8s.io/docs/high-availability
Three identically-sized nodes are required for HA. See https://microk8s.io/docs/high-availability for more information.

Note: Three identically-sized nodes are required for HA
1. Use the following configuration to set up HA:

```yaml
--8<---- "docs/resources/k8s/values_ha.yaml"
```

Upgrade sc4s to apply the new config
2. Upgrade SC4S to apply the new configuration:

```bash
microk8s helm3 upgrade sc4s splunk-connect-for-syslog/splunk-connect-for-syslog -f values.yaml
```

# Advanced Configuration
# Configure environment variables

Using helm based deployment precludes direct configuration of environment variables and
context files but most configuration can be set via the values.yaml
If your configuration uses a Helm-based deployment, you cannot configure environment variables and
context files directly. Instead, use the `values.yaml` file to update your configuration.

1. Edit the `values.yaml` file:

```yaml
--8<---- "docs/resources/k8s/values_adv.yaml"

```

`config_files` and `context_files` are variables used to specify configuration and context files that need to be passed to the splunk-connect-for-syslog.
2. Use the `config_files` and `context_files` variables to specify configuration and context files that are passed to SC4S.

`config_files`: This variable contains a dictionary that maps the name of the configuration file to its content in the form of a YAML block scalar.
`context_file`: This variable contains a dictionary that maps the name of the context files to its content in the form of a YAML block scalar. The context file named splunk_metadata.csv and host.csv are being passed with the `values.yaml`
* `config_files`: This variable contains a dictionary that maps the names of the configuration files to their content in the form of YAML block scalars.
* `context_files`: This variable contains a dictionary that maps the names of the context files to their content in the form of YAML block scalars. The context files `splunk_metadata.csv` and `host.csv` are passed with `values.yaml`:

```yaml
--8<---- "docs/resources/k8s/values_adv_config_file.yaml"
```
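As an illustration of the shape these variables take, consider the following sketch. The nesting under `sc4s` and the CSV columns shown are assumptions; compare them with the rendered `values_adv_config_file.yaml` and the SC4S documentation before use:

```yaml
# Illustrative sketch only; verify nesting against values_adv_config_file.yaml.
sc4s:
  context_files:
    splunk_metadata.csv: |
      cisco_asa,index,netfw
    host.csv: |
      10.0.0.1,firewall-01
  config_files:
    my-filter.conf: |
      # custom syslog-ng snippet goes here
```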

# Resource Management
# Manage resources

Generally two instances will be provisioned per node adjust requests and limits to
allow each instance to use about 40% of each node presuming no other workload is present
Generally, two instances are provisioned per node. Adjust requests and limits so that each instance can use about 40% of each node, assuming no other workload is present:

```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
```
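As a worked example of the 40% guidance: on an 8-vCPU, 16 GiB node running two SC4S instances, the share per instance works out to roughly the following. The node size and resulting figures are illustrative assumptions, not chart defaults:

```yaml
resources:
  requests:
    cpu: "3200m"    # 40% of 8 vCPU
    memory: "6Gi"   # roughly 40% of 16 GiB
  limits:
    cpu: "3200m"
    memory: "6Gi"
```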


# FAQ

# Editor's Note: Do we need this? Should it be a step? A different topic?

```bash
#We need a normal install of kubectl because of the operator scripts
sudo snap install microk8s --classic --channel=1.24
# Basic setup of k8s
sudo usermod -a -G microk8s $USER
sudo chown -f -R $USER ~/.kube

su - $USER
microk8s status --wait-ready
#Note: when installing metallb you will be prompted for one or more IPs to use as entry points
#into the cluster. If you plan to enable clustering, this IP should not be assigned to the host (it floats).
#If you do not plan to cluster, this IP may be the same IP as the host.
#Note 2: a single IP in CIDR format is x.x.x.x/32; use CIDR or range syntax.
microk8s enable dns
microk8s enable community
microk8s enable metallb
microk8s enable rbac
microk8s enable storage
microk8s enable openebs
microk8s enable helm3
microk8s status --wait-ready

```