Graphite tags DB disappears - maybe related to issues with RabbitMQ? #452

turbopape commented Jan 21, 2021

Hey Guys,
I already posted this in the graphite repo.

I am operating a Kubernetes setup.

Graphite sits behind carbon-relay-ng, and I am using its stock Whisper storage engine.
My storage-schemas.conf and storage-aggregation.conf are as follows:

    # storage-schemas.conf
    [carbon]
    pattern = ^carbon\.
    retentions = 10s:6h,1m:90d

    [default]
    pattern = .*
    retentions = 2m:6h,10m:7d,30m:30d,60m:360d

    # storage-aggregation.conf
    [default]
    pattern = .*
    xFilesFactor = 0.0
    aggregationMethod = average

Carbon-relay-ng consumes tag-enabled messages from a RabbitMQ queue.
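Each message body is a plain carbon line with tags appended to the metric name in Graphite's name;tag=value syntax; the metric name, tags, and values in this example are placeholders, not taken from my setup:

    disk.used;datacenter=dc1;host=node-01 412.5 1611220000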
Here is the config:

    ## Global settings ##
    # instance id's distinguish stats of multiple relays.
    # do not run multiple relays with the same instance id.
    # supported variables:
    #  ${HOST} : hostname
    instance = "${HOST}"

    ## System ##
    # this setting can be used to override the default GOMAXPROCS logic
    # it is ignored if the GOMAXPROCS environment variable is set
    max_procs = 2
    pid_file = "carbon-relay-ng.pid"
    # directory for spool files
    spool_dir = "spool"

    ## Logging ##
    # one of trace debug info warn error fatal panic
    # see docs/logging.md for level descriptions
    # note: if you used to use "notice", you should now use "info".
    log_level = "info"

    ## Admin ##
    admin_addr = "0.0.0.0:2004"
    http_addr = "0.0.0.0:8081"

    ## Inputs ##
    ### plaintext Carbon ###
    listen_addr = "0.0.0.0:2003"
    # close inbound plaintext connections if they've been idle for this long ("0s" to disable)
    plain_read_timeout = "0s"
    ### Pickle Carbon ###
    pickle_addr = "0.0.0.0:2013"
    # close inbound pickle connections if they've been idle for this long ("0s" to disable)
    pickle_read_timeout = "0s"

    ## Validation of inputs ##
    # you can also validate that each series has increasing timestamps
    validate_order = false

    # How long to keep track of invalid metrics seen
    # Useful time units are "s", "m", "h"
    bad_metrics_max_age = "24h"

    [[route]]
    key = 'carbon-default'
    type = 'sendAllMatch'
    # prefix = ''
    # notPrefix = ''
    # sub = ''
    # notSub = ''
    # regex = '.*'
    # notRegex = ''
    destinations = [
      'graphite-statsd.graphite.svc.cluster.local:2003 spool=true pickle=false'
    ]

    ### AMQP ###
    [amqp]
    amqp_enabled = true
    amqp_host = "aRabbitHost"
    amqp_port = 5672
    amqp_user = "SomeUser"
    amqp_password = "SomePassword"
    amqp_vhost = "/"
    amqp_exchange = "messages"
    amqp_queue = ""
    amqp_key = "metrics"
    amqp_durable = false
    amqp_exclusive = true

    ## Instrumentation ##
    [instrumentation]
    # in addition to serving internal metrics via expvar, you can send them to graphite/carbon
    # IMPORTANT: setting this to "" will disable flushing, and metrics will pile up and lead to OOM
    # see https://github.com/graphite-ng/carbon-relay-ng/issues/50
    # so for now you MUST send them somewhere. sorry.
    # (Also, the interval here must correspond to your setting in storage-schemas.conf if you use Grafana Cloud)
    graphite_addr = "graphite-statsd.graphite.svc.cluster.local:2003"
    graphite_interval = 10000  # in ms

For no apparent reason, from time to time, I lose all my tag-related information. All gone.

I first thought this was related to k8s upgrading hosts (and somehow reinitializing volumes), then suspected a problem with carbon-relay-ng dropping tags when it loses its connection to RabbitMQ, etc.
But all the other "normal" (untagged) series are still there and working fine.
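If I understand the graphite-web docs correctly, the tag index is not stored in the whisper files at all, but in a separate tag database; with stock settings that is graphite-web's local Django database. The snippet below is just my reading of the defaults, and the path is an assumption based on the graphite-statsd image layout:

    TAGDB = 'graphite.tags.localdatabase.LocalDatabaseTagDB'   # graphite-web default
    # LocalDatabaseTagDB keeps tags in the Django DB, by default the SQLite file
    # graphite.db under STORAGE_DIR (assumed here: /opt/graphite/storage/graphite.db)

So if only the whisper directory sits on a persistent volume, a pod restart could wipe the tag index while the plain series survive, but I have not been able to confirm that this is what happens in my case.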
Sorry if this is a noob-ish question, but I have really explored every possible idea to no avail.
Has anyone here experienced the same thing before?
Would it be better to use a Redis tag backend instead?
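If I read the graphite-web tags docs correctly, switching the tag database to Redis would be roughly the following local_settings.py change; this is only a sketch, and the Redis host/port/db values are placeholders:

    TAGDB = 'graphite.tags.redis.RedisTagDB'
    TAGDB_REDIS_HOST = 'some-redis-host'   # placeholder
    TAGDB_REDIS_PORT = 6379
    TAGDB_REDIS_DB = 0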
Thank you so much :)
