This repository has been archived by the owner on Nov 28, 2023. It is now read-only.

Limiting or extending Kafka messaging

Andy Slack edited this page Aug 20, 2015 · 3 revisions

By default, Kafka retains messages for 7 days (log.retention.hours=168), which can take up a good deal of disk space in a production system.

To change how long Kafka retains messages, we'll need to update the Kafka config file: kafka/config/server.properties

In datasift-connector, Chef is responsible for building the Kafka config and will rewrite kafka/config/server.properties each time it rebuilds, so manual edits to that file are lost. Instead, we'll need to update the local datasift-connector/chef/nodes/datasift-connector.json, adding a "log_retention_hours" value to the kafka.broker object.

For example:

"kafka": {
  "ulimit_file": 128000,
  "broker": {
    "log_dirs": [
      "/mnt"
    ],
    "log_retention_hours": 48,
    "zookeeper_connect": [
      "localhost:2181"
    ],
    "zookeeper_connection_timeout_ms": 15000
  }
}
....
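If you would rather script the change than edit the file by hand, something along these lines may work. It is a minimal sketch that assumes the "kafka" object sits at the top level of the node JSON, as the fragment above suggests; the helper names `set_retention_hours` and `update_node_file` are hypothetical and not part of datasift-connector.

```python
import json
from pathlib import Path


def set_retention_hours(node: dict, hours: int) -> dict:
    """Set kafka.broker.log_retention_hours on the node attributes.

    Mutates `node` in place and also returns it. Missing "kafka" or
    "broker" objects are created (assumption: "kafka" is a top-level key).
    """
    if hours <= 0:
        raise ValueError("hours must be a positive number of hours")
    broker = node.setdefault("kafka", {}).setdefault("broker", {})
    broker["log_retention_hours"] = hours
    return node


def update_node_file(path: str, hours: int) -> None:
    """Read the Chef node JSON at `path`, set the retention, write it back."""
    node_path = Path(path)
    node = json.loads(node_path.read_text())
    set_retention_hours(node, hours)
    # Note: this rewrites the whole file with 2-space indentation.
    node_path.write_text(json.dumps(node, indent=2) + "\n")
```

For example, `update_node_file("datasift-connector/chef/nodes/datasift-connector.json", 48)` would apply the setting shown above; the connector still needs to be rebuilt for Chef to regenerate the Kafka config.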

When datasift-connector is rebuilt, /opt/kafka/config/server.properties (on your Vagrant or EC2 instance) will contain log.retention.hours=48.
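To confirm the rebuild picked up the new value, a small check like the following could be run on the instance. It is a simplified sketch: it only understands `key=value` lines and comment lines, not the full Java properties syntax, and the function names are illustrative.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' separator
            props[key.strip()] = value.strip()
    return props


def retention_hours(text: str):
    """Return log.retention.hours as an int, or None if it is not set."""
    value = parse_properties(text).get("log.retention.hours")
    return int(value) if value is not None else None
```

Reading the file with `open("/opt/kafka/config/server.properties").read()` and passing the text to `retention_hours` should return 48 after the rebuild described above. A result of `None` means the property is absent, in which case Kafka falls back to its default.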

Other Kafka config properties can be added as well: http://kafka.apache.org/07/configuration.html

Note: we recommend EC2 instances with at least 2GB of memory and 20GB of storage, which should provide sufficient disk space in most cases.
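As a rough way to judge whether a given retention fits on the disk, retained data can be estimated as ingest rate multiplied by retention time. The sketch below assumes a steady ingest rate and ignores index files and the fact that Kafka deletes whole log segments, so real usage can be somewhat higher. The 100 KB/s rate in the usage note is a made-up figure for illustration, not a measurement.

```python
SECONDS_PER_HOUR = 3600


def retained_bytes(bytes_per_second: float, retention_hours: float) -> float:
    """Approximate bytes kept on disk at a steady ingest rate."""
    return bytes_per_second * SECONDS_PER_HOUR * retention_hours


def max_retention_hours(disk_bytes: float, bytes_per_second: float) -> float:
    """Approximate longest retention (in hours) that fits in disk_bytes."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return disk_bytes / (bytes_per_second * SECONDS_PER_HOUR)
```

At a hypothetical 100,000 bytes per second, `retained_bytes(100_000, 48)` comes to 17.28 GB, which fits on a 20GB volume, while the default 168 hours comes to 60.48 GB, which does not.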