
Getting started

Data generation process

The data generation process uses this analogy: generated data flows from source to sink.

To generate data it is therefore necessary to define:

  • source: what data is generated, e.g. a data model
  • sink: where data is sent, e.g. an ES index
  • flow: how data is transmitted, e.g. how fast and how much
  • schema: the field definitions, e.g. ECS 8.2.0

Each of the above is handled by its own REST API endpoint. An arbitrary number of sources, sinks, flows and schemas can be defined on the same server.

Install

Currently Geneve is packaged only for Homebrew. You first need to install the Geneve tap

$ brew tap elastic/geneve

then the tool itself

$ brew install geneve

REST API server

Data is generated by the Geneve server; you start it with

$ geneve serve
2023/01/31 16:40:23 Control: http://localhost:9256

The server keeps the terminal busy with its logs; to stop it, just press ^C. The first line in the log shows where to reach it: this is the base url of the server, and all the API endpoints are reachable (but not browseable) under api/.

For the rest of this document we'll assume that the following shell variables are set:

  • $GENEVE points to the Geneve server, url http://localhost:9256
  • $TARGET_ES is the url of the target Elasticsearch instance
  • $TARGET_KIBANA is the url of the corresponding Kibana instance

Now open a separate terminal to operate on the server with curl.
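
For example, you could set them in that terminal like this (the Elasticsearch and Kibana URLs below are just placeholders, substitute the ones of your deployment):

$ export GENEVE=http://localhost:9256
$ export TARGET_ES=http://localhost:9200
$ export TARGET_KIBANA=http://localhost:5601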

Loading the schema

The schema describes the fields that can be present in a generated document. At the moment it needs to be explicitly loaded into the server.

Download the latest version (or any other, if you prefer) from https://github.com/elastic/ecs/releases and look for the file ecs_flat.yml in the folder ecs-X.Y.Z/generated/ecs/.
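
One possible way is to fetch the file straight from the ECS repository (version 8.2.0 and the URL below are only an example, adjust to the release you picked):

$ curl -sL -o ecs_flat.yml https://raw.githubusercontent.com/elastic/ecs/v8.2.0/generated/ecs/ecs_flat.yml
$ SCHEMA_YAML=$PWD/ecs_flat.yml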

Assuming the path of that file is in the shell variable $SCHEMA_YAML, you load it with

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/schema/ecs" --data-binary "@$SCHEMA_YAML"

The ecs in the endpoint api/schema/ecs is an arbitrary name; it's how the loaded schema is addressed by the server.
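
Nothing prevents you from loading the same file, or another version, under a different name; for instance (the name ecs-bis below is purely illustrative):

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/schema/ecs-bis" --data-binary "@$SCHEMA_YAML"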

Define the data model

In the data model you describe the data that shall be generated. It can be as simple as a list of fields that need to be present, or more complex, also defining the relations among them.

How to write a data model is a separate subject (see Data model); here we focus on how to configure one on the server, using the api/source endpoint.

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
queries:
  - 'network where cidrMatch(destination.ip, "10.0.0.0/8", "192.168.0.0/16")'
EOF

Note the reference to the previously loaded schema ecs and the name of this newly defined source, mydata. Also, queries is a list: you can add as many queries as you need, and at each iteration Geneve will select one randomly.
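
As an illustration, a second source with two queries could look like this (the name moredata and the second query are made up for the example; any supported query works):

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/moredata" --data-binary @- <<EOF
schema: ecs
queries:
  - 'network where cidrMatch(destination.ip, "10.0.0.0/8", "192.168.0.0/16")'
  - 'process where process.name == "cmd.exe"'
EOF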

You can generate some data right in the terminal for early inspection

$ curl -s "$GENEVE/api/source/mydata/_generate?count=1" | jq
[
  {
    "@timestamp": "2023-01-31T18:19:20.197+01:00",
    "destination": {
      "ip": "192.168.130.52"
    },
    "event": {
      "category": [
        "network"
      ]
    }
  }
]

Kibana rules and alerts

If all you need is security alerts, you can use security detection rules as data models; the generated events will make the detection engine create alerts for you. You can select rules by name, tags or (rule) id.

Be sure to direct data to one of the indices monitored by the chosen rule(s).
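
In practice, when you get to Set the destination below, make the sink url point to such an index; a sketch, where packetbeat-geneve is only a placeholder, check the actual index patterns of the rule in Kibana:

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/sink/mydest" --data-binary @- <<EOF
url: $TARGET_ES/packetbeat-geneve/_doc
EOF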

By rule name

Example of source configuration where the rule is selected by name:

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
rules:
  - name: IPSEC NAT Traversal Port Activity
    kibana:
      url: $TARGET_KIBANA
EOF

Note how the queries entry is now replaced by rules, which specifies the rule name and the Kibana URL the rule shall be downloaded from.

By rule tags

Similarly with rule tags; they can be combined with the boolean operators "or" and "and":

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
rules:
  - tags: AWS or Azure or GCP
    kibana:
      url: $TARGET_KIBANA
EOF

By rule id

Once more with rule_id, as defined on a per-rule basis (not to be confused with the id of the rule's Kibana saved object):

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
rules:
  - rule_id: a9cb3641-ff4b-4cdc-a063-b4b8d02a67c7
    kibana:
      url: $TARGET_KIBANA
EOF

Set the destination

Once you're happy with the data model, it's time to configure where the data shall be sent. The api/sink endpoint serves this purpose.

The command is rather unsophisticated:

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/sink/mydest" --data-binary @- <<EOF
url: $TARGET_ES/myindex/_doc
EOF

The generated documents are POSTed to the configured url one by one. The name of this sink is mydest; the destination index is myindex.

Configure the flow

Flow configuration is also quite basic: you just need a source and a sink, both already defined in the server.

Use count to specify how many documents should be generated and sent to the stack. This flow is named myflow.

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/flow/myflow" --data-binary @- <<EOF
source:
  name: mydata
sink:
  name: mydest
count: 1000
EOF

All that's left to do is to initiate the generation with

$ curl -s -XPOST "$GENEVE/api/flow/myflow/_start"

You can also check the progress with

$ curl -s "$GENEVE/api/flow/myflow"
params:
    source:
        name: mydata
    sink:
        name: mydest
    count: 1000
state:
    alive: true
    documents: 250
    documents_per_second: 350

Or stop it with

$ curl -s -XPOST "$GENEVE/api/flow/myflow/_stop"
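
At any point you can also check, with the standard Elasticsearch count API, how many documents have reached the target index (myindex, given the sink defined earlier):

$ curl -s "$TARGET_ES/myindex/_count"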

Extra steps

Geneve assumes the target stack and index are ready to accept documents; it would be pointless and expensive to duplicate the stack and index configuration functionality.

Depending on your needs and the configuration of your stack, you may or may not need extra steps before actually pumping any documents into the stack.

Index mappings

If your target index does not exist and is not managed by any index template, then you may want to create it and configure its mappings.

Geneve can help you with the mappings: the api/source/<name>/_mappings endpoint returns the mappings of all the possible fields that can be encountered in the documents generated by that source.

Use the Elasticsearch index API to create the index

$ curl -s -XPUT -H "Content-Type: application/json" $TARGET_ES/myindex --data @- <<EOF
{
  "mappings": $(curl -fs "$GENEVE/api/source/mydata/_mappings")
}
EOF

Note the embedded Geneve source API call to get the mappings; its output is merged into the index API request.
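
You can then verify the result with the standard Elasticsearch get mapping API:

$ curl -s "$TARGET_ES/myindex/_mapping" | jq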

Kibana data view

If you want to use Kibana Security to analyze the generated data, you need a data view in place. If your target index is not already included in some existing data view, then you need to create one yourself.

Use the following command to create it from the command line

$ curl -s -XPOST -H "Content-Type: application/json" -H "kbn-xsrf: true" $TARGET_KIBANA/api/data_views/data_view --data @- <<EOF
{
  "data_view": {
     "title": "myindex"
  }
}
EOF
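
If you want to double check, Kibana's data views API can list the existing data views:

$ curl -s $TARGET_KIBANA/api/data_views | jq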

GeoIP data

While Geneve is well capable of generating fields with IPv4 and IPv6 addresses, the same does not apply to their geographical location.

As a workaround, you can leverage the stack's geoip processor to enrich the data.

First, create the ingest pipeline (e.g. geoip-info)

$ curl -s -XPUT -H "Content-Type: application/json" $TARGET_ES/_ingest/pipeline/geoip-info --data @- <<EOF
{
  "description": "Add geoip info",
  "processors": [
    {
      "geoip": {
        "field": "client.ip",
        "target_field": "client.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "source.ip",
        "target_field": "source.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "destination.ip",
        "target_field": "destination.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "server.ip",
        "target_field": "server.geo",
        "ignore_missing": true
      }
    },
    {
      "geoip": {
        "field": "host.ip",
        "target_field": "host.geo",
        "ignore_missing": true
      }
    }
  ]
}
EOF

Next, append ?pipeline=geoip-info to the url of your sink (see Set the destination). This instructs the stack to pass the generated data through the newly created geoip-info pipeline.
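
With the sink from Set the destination, the updated definition becomes:

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/sink/mydest" --data-binary @- <<EOF
url: $TARGET_ES/myindex/_doc?pipeline=geoip-info
EOF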

Optionally, ensure that your stack keeps the GeoIP database up to date

$ curl -s -XPUT -H "Content-Type: application/json" $TARGET_ES/_cluster/settings --data @- <<EOF
{
  "transient": {
    "ingest": {
      "geoip": {
        "downloader": {
          "enabled": "true"
        }
      }
    }
  }
}
EOF

Finally, update your data model to include the fields you want the geoip processor to fill in. Geneve will generate them with random content; the ingest pipeline will replace it with better values.

$ curl -s -XPUT -H "Content-Type: application/yaml" "$GENEVE/api/source/mydata" --data-binary @- <<EOF
schema: ecs
queries:
  - 'network where
       cidrMatch(destination.ip, "10.0.0.0/8", "192.168.0.0/16") and
       destination.geo.city_name != null and
       destination.geo.country_name != null and
       destination.geo.location != null
    '
EOF

In case the generated IP does not have any entry in the geoip database, the ingest pipeline will leave the content generated by Geneve as is. This will result in completely bogus random city, country, etc. names. If you read them, you'll know where they come from. We have issue #115 to deal with this.

For more details, read the GeoIP processor documentation.