Remote storage #10

juliusv · 2013-01-04T16:24:05Z

Prometheus needs to be able to interface with a remote and scalable data store for long-term storage/retrieval.

johann8384 · 2015-02-05T06:29:58Z

Is there anyone planning to work on this? Is the work done in the opentsdb-integration branch still valid or has the rest of the code-base moved past that?

beorn7 · 2015-02-05T08:22:49Z

The opentsdb-integration branch is indeed completely outdated (still using the old storage backend etc.). Personally, I'm a great fan of the OpenTSDB integration, but where I work, there is not an urgent enough requirement to justify a high priority from my side...

juliusv · 2015-02-05T11:50:32Z

To be clear, the outdated "opentsdb-integration" was only for the
proof-of-concept read-back support (querying OpenTSDB through Prometheus).

Writing into OpenTSDB should be experimentally supported in master, but
the last time we tried it was a year ago on a single-node OpenTSDB.

You initially asked on #10:

"I added the storage.remote.url command line flag, but as far as I can tell
Prometheus doesn't attempt to store any metrics there."

A couple of questions:

- Did you enable the OpenTSDB option "tsd.core.auto_create_metrics"? Otherwise OpenTSDB won't auto-create metrics for you, as the option is false by default. See http://opentsdb.net/docs/build/html/user_guide/configuration.html
- If you run Prometheus with -logtostderr, do you see any relevant log output? If there is an error sending samples to TSDB, it should be logged (glog.Warningf("error sending %d samples to TSDB: %s", len(s), err)).
- Prometheus also exports metrics itself about sending to OpenTSDB. On /metrics of your Prometheus server, you should find the counter metrics "prometheus_remote_storage_sent_errors_total" and "prometheus_remote_storage_sent_samples_total". What do these say?
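For anyone who wants to check those two counters without reading the whole /metrics page by eye, here is a minimal sketch in Python. It assumes the page is in the plain-text exposition format; the function name `read_counters` and the sample input (including the `type` label) are made up for illustration and do not come from Prometheus itself.

```python
def read_counters(metrics_text, names):
    """Return {metric_name: summed value} for the requested metric names.

    Values are summed across label sets, so one total per name comes back.
    """
    totals = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # "<name>{labels} <value> [timestamp]"
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # "<name> <value> [timestamp]"
            name, _, rest = line.partition(" ")
        if name not in names:
            continue
        fields = rest.split()
        if fields:
            totals[name] = totals.get(name, 0.0) + float(fields[0])
    return totals


# Hypothetical sample input; a real page would be fetched from /metrics.
SAMPLE = """\
# TYPE prometheus_remote_storage_sent_samples_total counter
prometheus_remote_storage_sent_samples_total{type="opentsdb"} 1200
prometheus_remote_storage_sent_errors_total{type="opentsdb"} 3
go_goroutines 42
"""

WANTED = {
    "prometheus_remote_storage_sent_errors_total",
    "prometheus_remote_storage_sent_samples_total",
}
```

If the errors counter keeps growing while the samples counter stays flat, the problem is likely on the sending side rather than in the OpenTSDB configuration.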

Cheers,
Julius

sammcj · 2015-02-11T03:05:40Z

I cannot +1 this enough

mwitkow · 2015-03-05T08:44:03Z

Is InfluxDB on the cards in any way? :)

beorn7 · 2015-03-05T09:22:45Z

Radio Yerevan: "In principle yes." (Please forgive that Eastern European digression... ;)

mwitkow · 2015-03-05T09:31:31Z

:D That was slightly before my time ;)

juliusv · 2015-03-05T12:08:49Z

See also: https://twitter.com/juliusvolz/status/569509228462931968

We're just waiting for InfluxDB 0.9.0, which has a new data model which
should be more compatible with Prometheus's.

pires · 2015-04-07T14:32:38Z

> We're just waiting for InfluxDB 0.9.0, which has a new data model which
> should be more compatible with Prometheus's.

Can I say awesome more than once? Awesome!

fabxc · 2015-04-07T14:49:49Z

Unfortunately, @juliusv ran some tests with 0.9 and InfluxDB consumed 14x more storage than Prometheus.

Before, it was an overhead of 11x, but Prometheus has reduced its own storage size significantly since then, so in reality InfluxDB has apparently improved in that regard.
Nonetheless, InfluxDB has not turned out to be the answer for long-term storage yet.

beorn7 · 2015-04-07T15:02:01Z

At least experimental write support is in master, as of today, so anybody can play with InfluxDB receiving Prometheus metrics. It is quite possible that somebody finds the reason for the blow-up in storage space and everything will be unicorns and rainbows in the end...

pires · 2015-04-07T15:38:38Z

@beorn7 that's great. TBH I'm not concerned about disk space, it's the cheapest resource on the cloud after all. Not to mention, I'm expecting to hold data with a very small TTL, i.e. a few weeks.

beorn7 · 2015-04-07T16:00:19Z

@pires In that case, why not just run two identically configured Prometheis with a reasonably large disk?
A few weeks or months is usually fine as retention time for Prometheus. (Default is 15d for a reason... :) The only problem is that if your disk breaks, your data is gone, but for that, you have the other server.

fabxc · 2015-04-07T16:00:22Z

@pires do you have a particular reason to hold the data in another database for that time? "A few weeks" does not seem to require a long-term storage solution. Prometheus's default retention time is 15 days - increasing that to 30 or even 60 days should not be a problem.

pires · 2015-04-07T16:37:31Z

@beorn7 @fabxc I am currently using a proprietary & very specific solution that writes monitoring metrics into InfluxDB. This can eventually be replaced with Prometheus.

Thing is I have some tailored apps that read metrics from InfluxDB in order to reactively scale up/down, that would need to be rewritten to read from Prometheus instead. Also, I use continuous queries. Does Prometheus deliver such a feature?

brian-brazil · 2015-04-07T16:41:01Z

http://prometheus.io/docs/querying/rules/#recording-rules are the equivalent to InfluxDB's continuous queries.

dever860 · 2015-07-01T12:14:21Z

+1

drawks · 2015-07-31T22:48:42Z

👍

blysik · 2015-10-08T23:46:19Z

How does remote storage as currently implemented interact with PromDash or grafana?

I have a use case where I want to run Prometheus in a 'heroku-like' environment, where the instances could conceivably go away at any time.

Then I would configure a remote, traditional influxdb cluster to store data in.

Could this configuration function normally?

matthiasr · 2015-10-09T00:01:10Z

This depends on your definition of "normally", but mostly, no.

Remote storage as it is is write-only; from Prometheus you would only get what it has locally.

To get at older data, you need to query OpenTSDB or InfluxDB directly, using their own interfaces and query languages. With PromDash you're out of luck in that regard; AFAIK Grafana knows all of them.

You could build your dashboards fully based on querying them and leave Prometheus to be a collection and rule evaluation engine, but you would miss out on its query language for ad hoc drilldowns over extended time spans.

matthiasr · 2015-10-09T00:03:09Z

Also note that both InfluxDB and OpenTSDB support are somewhat experimental, under-exercised on our side, and in flux.

mattkanwisher · 2015-10-21T17:05:09Z

We're kicking around the idea of a flat file exporter, so that we can start storing long-term data now and then load it back once the bulk import issue (#535) is done. Would you guys be open to a PR around this?

juliusv · 2015-10-21T21:07:24Z

For #535 take a look at my way outdated branch import-api, where I once added an import API as a proof-of-concept: https://github.com/prometheus/prometheus/commits/import-api. It's from March, so it doesn't apply to master anymore, but it just shows that in principle adding such an API using the existing transfer formats would be trivial. We just need to agree that we want this (it's a contentious issue, /cc @brian-brazil) and whether it should use the same sample transfer format as we use for scraping. The issue with this transfer format is that it's optimized for the many-series-one-sample (scrape) case, while with batch imports you often care more about importing all samples of a series at once, without having to repeat the metric name and labels for each sample (massive overhead). But maybe we don't care about efficiency in the (rare?) bulk import case, so the existing format could be fine.
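To make the overhead argument concrete, here is a small Python sketch of the two layouts. It is only an illustration: the tuple and dict shapes and the name `group_by_series` are assumptions, not the actual Prometheus transfer format. The flat layout repeats the full label set for every sample, while the grouped layout states the labels once per series.

```python
from collections import defaultdict


def group_by_series(flat_samples):
    """Regroup scrape-style samples into one batch per series.

    flat_samples: iterable of (labels_dict, timestamp_ms, value), where the
    labels (including "__name__") are repeated for every single sample.
    Returns a list of {"labels": ..., "samples": [(timestamp_ms, value), ...]}.
    """
    series = defaultdict(list)
    for labels, timestamp_ms, value in flat_samples:
        # A sorted tuple of label pairs is a hashable identity for the series.
        key = tuple(sorted(labels.items()))
        series[key].append((timestamp_ms, value))
    return [
        {"labels": dict(key), "samples": sorted(points)}
        for key, points in series.items()
    ]


# Hypothetical input: three samples, two distinct series.
FLAT = [
    ({"__name__": "up", "job": "a"}, 1000, 1.0),
    ({"__name__": "up", "job": "b"}, 1000, 0.0),
    ({"__name__": "up", "job": "a"}, 2000, 1.0),
]
```

With a month of samples per series, the grouped form carries each label set once instead of once per sample, which is the saving that matters for bulk imports.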

For the remote storage part, there was this discussion
https://groups.google.com/forum/#!searchin/prometheus-developers/json/prometheus-developers/QsjXwQDLHxI/Cw0YWmevAgAJ about decoupling the remote storage in some generic way, but some details haven't been resolved yet. The basic idea was that Prometheus could send all samples in some well-defined format (JSON, protobuf, or whatever) to a user-specified endpoint which could then do anything it wants with it (write it to a file, send it to another system, etc.).

So it might be ok to add a flat file exporter as a remote storage backend directly to Prometheus, or resolve that discussion above and use said well-defined transfer format and an external daemon.
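As a rough sketch of the external-daemon idea, the Python below takes one batch of samples as JSON and turns it into lines for a flat file. The payload shape (a JSON array of objects with `labels`, `timestamp_ms`, and `value`) and both function names are assumptions made for illustration; the thread had not settled on any format.

```python
import json


def encode_lines(payload):
    """Decode one JSON batch and return one JSON line per sample.

    payload: bytes or str holding a JSON array of sample objects
    (hypothetical shape, see the lead-in).
    """
    batch = json.loads(payload)
    lines = []
    for sample in batch:
        record = {
            "labels": sample["labels"],
            "timestamp_ms": sample["timestamp_ms"],
            "value": sample["value"],
        }
        # sort_keys gives stable output that is easy to diff and grep.
        lines.append(json.dumps(record, sort_keys=True))
    return lines


def append_samples(path, payload):
    """Append the encoded batch to a flat file; returns the sample count."""
    lines = encode_lines(payload)
    with open(path, "a", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")
    return len(lines)
```

A daemon built this way would call `append_samples` from whatever HTTP handler receives the batches. Nothing here reads the data back, which matches the point that a flat file is write-only.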

brian-brazil · 2015-10-21T21:13:15Z

I think for flat file we'd be talking the external daemon, as it's not something we can ever read back from.

mattkanwisher · 2015-10-26T11:44:57Z

So the more I think about it, it would be nice to have this /import-api (a raw data API), so we can have backup nodes mirroring the data from the primary Prometheus. Would there be appetite for a PR for this and the corresponding piece inside Prometheus to import the data, so you can essentially have read slaves?

brian-brazil · 2015-10-26T12:14:33Z

For that use case we generally recommend running multiple identical Prometheus servers. Remote storage is about long term data, not redundancy or scaling.

mattkanwisher · 2015-10-26T12:16:08Z

I think running multiple scrapers is not a good solution cause the data won't match, also there is no way to backfill data. So we have issue where I need to spin up some redundant nodes and now they are missing a month of data. If you have an api to raw import the data you could at least catch them up. Also the same interface could be used for backups

brian-brazil · 2015-10-26T12:20:44Z

> So we have issue where I need to spin up some redundant nodes and now they are missing a month of data. If you have an api to raw import the data you could at least catch them up. Also the same interface could be used for backups

This is the use case for remote storage: you pull the older data from remote storage rather than depending on Prometheus being stateful. Similarly, in such a setup there's no need for backups, as Prometheus doesn't have any notable state.

juliusv · 2017-03-10T15:12:09Z

@brian-brazil Oh yeah, I have multiple vector selector sets, but great point about different offsets!

pilhuhn · 2017-03-10T15:42:31Z

> A simple static duration is not sufficient, as the remote storage may not be caught up that far yet or Prometheus may have retention going further back. I think this is something we'll have to figure out

I don't think Prometheus having retention going further back is really an issue here, as long as the remote can (already) provide the data. Worst case is with downsampling that you lose granularity.

juliusv · 2017-03-10T15:49:01Z

@pilhuhn I meant it the other way around: if you have a Prometheus retention of 15d and you query only data older than 15d from the remote storage, it doesn't necessarily mean that Prometheus will already have all data younger than 15d (due to storage wipe or whatever).

Well, for a first iteration we're just going to query all time ranges from everywhere.
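One way to picture "query all time ranges from everywhere" is a merge of the local and remote results for a series. The Python below is a sketch under stated assumptions, not how Prometheus implements remote read: the name `merge_series` is hypothetical, and letting the local sample win on a timestamp collision is an arbitrary choice made for the example.

```python
def merge_series(local, remote):
    """Merge two lists of (timestamp_ms, value) samples for one series.

    Samples present in only one source are kept. When both sources have
    a sample at the same timestamp, the local one is used (assumption).
    Returns the merged samples sorted by timestamp.
    """
    merged = dict(remote)   # start from the remote samples
    merged.update(local)    # local samples overwrite on equal timestamps
    return sorted(merged.items())


# Hypothetical data: the sources overlap at t=2000.
LOCAL = [(2000, 2.0), (3000, 3.0)]
REMOTE = [(1000, 1.0), (2000, 9.0)]
```

Querying both sides for the full range costs more than splitting by retention, but it avoids gaps when the local storage was wiped or the remote has not caught up.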

juliusv · 2017-03-15T19:36:59Z

There's a WIP PR for the remote read integration here for anyone who would like to take a look early: #2499

ghost · 2017-04-15T15:05:33Z

I'm trying to use the remote_storage_adapter to send metrics from prometheus to opentsdb. But I'm getting these errors in the logs:

WARN[0065] cannot send value NaN to OpenTSDB, skipping sample &model.Sample{Metric:model.Metric{"instance":"localhost:9090", "job":"prometheus", "monitor":"codelab-monitor", "location":"archived", "quantile":"0.5", "__name__":"prometheus_local_storage_maintain_series_duration_seconds"}, Value:NaN, Timestamp:1492267735191}  source=client.go:78

WARN[0065] Error sending samples to remote storage       err=invalid character 'p' after top-level value num_samples=100 source=main.go:281 storage=opentsdb

I've also tried using influxdb instead of opentsdb, with similar results:

DEBU[0001] cannot send value NaN to InfluxDB, skipping sample &model.Sample{Metric:model.Metric{"job":"prometheus", "instance":"localhost:9090", "scrape_job":"ns1-web-pinger", "quantile":"0.99", "__name__":"prometheus_target_sync_length_seconds", "monitor":"codelab-monitor"}, Value:NaN, Timestamp:1492268550191}  source=client.go:76

Here's how I'm starting the remote_storage_adapter:

# this is just for influxdb, i make the appropriate changes if trying to use opentsdb
./remote_storage_adapter -influxdb-url=http://138.197.107.211:8086 -influxdb.database=prometheus -influxdb.retention-policy=autogen -log.level debug

Here's the Prometheus config:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

remote_write:
  url: "http://localhost:9201/write"

Is there something I'm misunderstanding about how to configure the remote_storage_adapter?

juliusv · 2017-04-15T16:41:10Z

@tjboring Neither OpenTSDB nor InfluxDB support float64 NaN (not a number) values, so these samples are skipped when sending samples to them. We have mentioned this problem to InfluxDB, and if we're lucky, they will support NaN values sometime in the future, or maybe we can find another workaround.

OpenTSDB issue: OpenTSDB/opentsdb#183
InfluxDB issue: influxdata/influxdb#4089

I am not sure where the invalid character 'p' after top-level value error comes from though.
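The skipping behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the adapter's code: the dict shape of a sample and the name `split_sendable` are assumptions.

```python
import math


def split_sendable(samples):
    """Split samples into those a NaN-intolerant backend accepts and the rest.

    samples: iterable of dicts with at least a float "value" key
    (hypothetical shape). Returns (sendable, skipped).
    """
    sendable, skipped = [], []
    for sample in samples:
        if math.isnan(sample["value"]):
            skipped.append(sample)  # would only be logged, never sent
        else:
            sendable.append(sample)
    return sendable, skipped


# Hypothetical batch: one regular sample and one NaN sample.
BATCH = [
    {"name": "http_requests_total", "value": 1.5},
    {"name": "request_duration_seconds", "value": float("nan")},
]
```

Only the skipped samples produce a log line, so a stream of "skipping sample" warnings does not mean that nothing is being written.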

ghost · 2017-04-15T22:51:49Z

@juliusv Thanks for the pointers to the opentsdb/influxdb issues. I was just seeing the error messages on the console and thought nothing was being written, not realizing those are just samples that are being skipped. I've since confirmed that samples are indeed making it to the remote storage db. :)

mattbostock · 2017-04-17T17:07:18Z

Now that remote read and write APIs are in place (albeit experimental), should this issue be closed in favour of raising more specific issues as they arise?

https://prometheus.io/docs/operating/configuration/#<remote_write>
https://prometheus.io/docs/operating/configuration/#<remote_read>

prasenforu · 2017-04-21T07:41:11Z

Has anybody tried this with a container? If so, please paste your Dockerfile.

I am not able to find the "remote_storage_adapter" executable in the "prom/prometheus" Docker image, version 1.6:

/prometheus # find / -name remote_storage_adapter
/prometheus #

sorrowless · 2017-04-21T17:50:00Z

@prasenforu I have built a docker image with remote_storage_adapter from current master code: gra2f/remote_storage_adapter, feel free to use it.

@juliusv I have a problem similar to @tjboring's:

time="2017-04-21T17:45:00Z" level=warning msg="cannot send value NaN to Graphite,skipping sample &model.Sample{Metric:model.Metric{"__name__":"prometheus_target_sync_length_seconds", "monitor":"codelab-monitor", "job":"prometheus", "instance":"localhost:9090", "scrape_job":"prometheus", "quantile":"0.9"}, Value:NaN, Timestamp:1492796695772}" source="client.go:90"

but I am using Graphite. Is it okay?

ghost · 2017-04-21T20:32:12Z

@sorrowless

Do you see other metrics in Graphite that you know came from Prometheus?

In my case I verified this by connecting to the Influxdb server I was using, and running a query. It gave me back metrics, which confirmed that Prometheus was indeed writing metrics; it's just that some were being skipped, per the log message.

sorrowless · 2017-04-21T20:50:24Z

@tjboring yes, I can see some of the metrics in Graphite, and what's stranger to me is that I cannot understand why some are there and some are not. For example, sy and us per CPU are stored in Graphite but load average is not.

prasenforu · 2017-04-22T00:37:51Z

@sorrowless

Not able to find the image, can you please share the url.

Thanks in advance.

sorrowless · 2017-04-22T08:59:56Z

@prasenforu just run
$ docker pull gra2f/remote_storage_adapter
in your command line, that's all you need

prasenforu · 2017-04-22T09:20:17Z

@sorrowless

Thanks.

juliusv · 2017-04-24T19:21:48Z

@mattbostock As you suggested, I'm closing this issue. We should open more specific remote-storage related issues in the future.

Further usage questions are best asked on our mailing lists or IRC (https://prometheus.io/community/).

prasenforu · 2017-04-27T10:41:32Z

@sorrowless

I looked at the image and found the remote_storage_adapter file in /usr/bin, but the rest of the Prometheus files and the volume are not there:

~ # find / -name remote_storage_adapter
/usr/bin/remote_storage_adapter
~ # find / -name prometheus.yml
~ # find / -name prometheus

Anyway, can you please send me the Dockerfile of "gra2f/remote_storage_adapter"?

sorrowless · 2017-04-30T08:23:55Z

@prasenforu
You do not need the main Prometheus executable to use the remote storage adapter. Use the prom/prometheus image for that.
As for the Dockerfile: all it does is copy the prebuilt remote_storage_adapter into the image and run it, that's all.

gdmelloatpoints · 2017-08-16T13:35:53Z

If anyone wants to test it out (like I need to), I wrote a small docker-compose based setup to get this up and running locally - https://github.com/gdmello/prometheus-remote-storage.

lock · 2019-03-23T10:45:48Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

bernerdschaefer added a commit that referenced this issue Apr 9, 2014

Rename test helper files to helpers_test.go

af56a93

This ensures that these files are properly included only in testing. [Fixes #10]

fabxc removed this from the Small Scale Mission Critical Monitoring Use Cases milestone Sep 21, 2015

juliusv mentioned this issue Mar 15, 2017

Remote Read #2499

Merged

ghost mentioned this issue Apr 20, 2017

Prometheus failed sending data to openTSDB #2639

Closed

juliusv closed this as completed Apr 24, 2017

SamSaffron mentioned this issue Aug 11, 2017

Add note about long term metric storage prometheus/docs#806

Closed

iksaif mentioned this issue Sep 25, 2017

Make a DB aware of the first timestamp stored in it prometheus-junkyard/tsdb#134

Merged

simonpasquier referenced this issue in simonpasquier/prometheus Oct 12, 2017

Merge pull request #10 from brian-brazil/absent

5560f42

Make documentation for absent() not, uhm, absent

cofyc added a commit to cofyc/prometheus that referenced this issue Jun 5, 2018

Merge pull request prometheus#10 from cofyc/revert_shared_informers

4e6f216

Revert "Share kubernetes informers in kubernetes discovery to improve performance."

bobmshannon pushed a commit to bobmshannon/prometheus that referenced this issue Nov 19, 2018

Update Go version in circle.yml (prometheus#10)

acd2f4a

doi-t mentioned this issue Mar 21, 2019

manage Prometheus Persistent metrics storage doi-t/gbookshelf#27

Open

lock bot locked and limited conversation to collaborators Mar 23, 2019
