
PodMonitor vs ServiceMonitor what is the difference? #3119

Closed
mandreasik opened this issue Apr 3, 2020 · 29 comments

@mandreasik

Hi, I'm new to prometheus-operator and I don't see the difference between ServiceMonitor and PodMonitor.

Documentation describes them as:

ServiceMonitor, which declaratively specifies how groups of services should be monitored. The Operator automatically generates Prometheus scrape configuration based on the definition.

PodMonitor, which declaratively specifies how groups of pods should be monitored. The Operator automatically generates Prometheus scrape configuration based on the definition.

But this does not explain when to use which one.

I played with them a little bit. Both of them use selector rules to find pods.

So you can either define a PodMonitor that searches for pods with e.g. the label app: myAPP, or, if your app is behind a Service labeled e.g. app: myAPP, you can create a ServiceMonitor that will scrape metrics from all pods behind the Service with this label. Both of them will produce the same result in the Targets tab of the Prometheus UI.
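For illustration, a minimal sketch of the two variants described above (the label app: myAPP, the port name web, and the resource names are assumptions for the example):

```yaml
# PodMonitor: selects pods directly by their labels.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myAPP
  podMetricsEndpoints:
    - port: web   # name of the container port that exposes /metrics
---
# ServiceMonitor: selects a Service by its labels; the pods behind it are scraped.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myAPP
  endpoints:
    - port: web   # name of the Service port
```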

So I would really appreciate it if someone could explain to me when one should use a PodMonitor and when a ServiceMonitor.

@pgier
Contributor

pgier commented Apr 3, 2020

I think ServiceMonitor should be the default choice unless you have some reason to use a PodMonitor. For example, you may want to scrape a set of pods which all share a certain label that is not consistent between different services.

There is some helpful discussion in the original podmonitor issue #38, but I agree that the docs could be improved here.

@brancz
Contributor

brancz commented Apr 7, 2020

The way I see it, ServiceMonitors are the perfect fit if you already have a Service for your pods anyway. However, if it doesn't make sense for other reasons to have a Service for your component, then the PodMonitor is the right choice. It's essentially about avoiding (failure-prone) duplication of pod label selectors where possible.

@mandreasik
Author

@pgier @brancz Thank you for the explanation :) (now it's more clear to me when to use them)

I hope this issue will help other prometheus-operator newbies understand this matter better :)

@winningjai

@pgier @brancz,
I have a question about this.

1. Suppose I need to collect metrics from all the pods behind a particular service. Which should I prefer: PodMonitor or ServiceMonitor?

2. If I use a ServiceMonitor, will the Prometheus server scrape all the pods behind the service, or will it scrape just the service endpoint alone?

Thanks in advance for your time.

@brancz
Contributor

brancz commented May 20, 2020

It will scrape all pods behind the service, because the Service maintains an Endpoints object. That's where Prometheus discovers the targets from.
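Roughly speaking, the scrape configuration generated from a ServiceMonitor uses the endpoints role, along these lines (a simplified sketch; the real generated config contains much more relabeling, and the label and port values are assumptions):

```yaml
scrape_configs:
  - job_name: serviceMonitor/default/myapp/0   # illustrative job name
    kubernetes_sd_configs:
      - role: endpoints                        # one target per endpoint address, i.e. per pod
    relabel_configs:
      - action: keep
        source_labels: [__meta_kubernetes_service_label_app]
        regex: myAPP
      - action: keep
        source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: web
```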

@yann-soubeyrand
Contributor

Hello,

I must say that I'm a little bit confused too: when talking about a ServiceMonitor, I would expect my Service to be monitored, not the associated Endpoints (otherwise I would have been looking for an EndpointsMonitor). This would be particularly useful when one has a highly available exporter and wants only one of the exporter's pods to be scraped each time, to avoid duplicate metrics and to avoid putting unnecessary load on the service that the exporter queries.

Without using the Prometheus Operator, one would use a kubernetes_sd_config with the service role in this case. Prometheus would then scrape the service IP, which would randomly hit one of the exporter's pods. That's precisely what I thought the ServiceMonitor did in the first place due to its naming (the name ServiceMonitor lets one think that it will be translated to a kubernetes_sd_config with the service role).
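For reference, a sketch of the plain Prometheus configuration being described here (the job name and the app label value are assumptions):

```yaml
scrape_configs:
  - job_name: exporter-via-service-ip   # assumed name
    kubernetes_sd_configs:
      - role: service                   # discovers Service ClusterIPs instead of pod IPs
    relabel_configs:
      - action: keep
        source_labels: [__meta_kubernetes_service_label_app]
        regex: my-exporter              # assumed label value; each scrape goes through the ClusterIP
```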

How can we achieve this with the current ServiceMonitor and PodMonitor? Is it too late to rename the current ServiceMonitor to EndpointsMonitor and create a “real” ServiceMonitor?

@paulfantom
Member

@yann-soubeyrand scraping metrics endpoints via a load balancer is an anti-pattern in Prometheus. The best practice is to scrape targets directly, and this is why a ServiceMonitor examines all Endpoints of a Service and scrapes them individually.

@yann-soubeyrand
Contributor

yann-soubeyrand commented Jul 9, 2020

@paulfantom so how do you achieve high availability for your exporters? Do you scrape all the replicas of your exporters and then get duplicated metrics and unnecessary load on your applications (Elasticsearch or PostgreSQL exporters can be quite demanding on the server they get the metrics from)?

@yann-soubeyrand
Contributor

Hi @paulfantom, that's a genuine question: I'd really like to know what the right approach is for scraping highly available exporters if scraping them through a load balancer is an anti-pattern.

@paulfantom
Member

A highly available exporter is an anti-pattern in itself, and that has the consequence described in #3119 (comment). The best practice is to have the exporter next to the instance that is being monitored.

However, if you need to do this, then additionalScrapeConfigs can be helpful.
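For anyone landing here, a rough sketch of how that is wired up on the Prometheus custom resource (the Secret name and key are placeholders):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  # ... the rest of the Prometheus spec ...
  additionalScrapeConfigs:
    name: additional-scrape-configs   # Secret in the same namespace as the Prometheus resource
    key: prometheus-additional.yaml   # key inside the Secret containing raw scrape_configs entries
```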

@yann-soubeyrand
Contributor

Thanks for your answer @paulfantom. In case one cannot have the exporter near the instance being monitored (like a PostgreSQL instance on AWS RDS), what's the best practice for deploying the exporter? Should one deploy a single instance of it and use a PodMonitor (and accept that there can be holes in the metrics when the exporter becomes unavailable, which can happen for various reasons, like a k8s node being drained in response to cluster scale-in)?

@paulfantom
Member

In such a case, you need to deploy at least 2 instances and deduplicate the data at the Prometheus level.

@jon-walton

Hi @paulfantom, I'm in the same situation. I have a Redis exporter that cannot be close to Redis itself (Azure managed Redis). What is the best practice for deduping data at the Prometheus level without Thanos or Cortex?

(sorry, I've searched but only came up with thanos/cortex as a solution. I'm hopeful for a short term solution while I plan out a cortex deployment)

@brancz
Contributor

brancz commented Jul 30, 2020

No matter whether you are using Cortex or Thanos, you will always want two Prometheus instances for high availability reasons. If you don't have control over the targets themselves, then I recommend still running 1 exporter per process of a system.

For query time deduplication, you can just use the Thanos querier and none of the other components, which would simplify your setup a lot if that's all you are looking for.
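As a rough sketch of that query-time deduplication (assuming both Prometheus replicas expose the Thanos Store API via sidecars and carry an external label named replica; the endpoint addresses are placeholders), the querier Deployment would carry arguments like:

```yaml
# Container args of a thanos-query Deployment (excerpt).
args:
  - query
  - --query.replica-label=replica           # external label that distinguishes the HA replicas
  - --store=prometheus-0.example.svc:10901  # assumed Thanos sidecar gRPC endpoints
  - --store=prometheus-1.example.svc:10901
```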

@yxyx520

yxyx520 commented Aug 4, 2020

> I hope this issue will help other prometheus-operator newbies understand this matter better :)

yep, it helps me a lot.

@gmintoco

gmintoco commented Nov 3, 2020

While this is an older issue, I would make the argument that exporters that Prometheus scrapes (like the SNMP exporter or the Blackbox exporter) are almost anti-patterns in themselves, in that the scraping occurs via a proxy. However, these are in widespread use because of their important functionality (actually the only functionality that makes Prometheus viable for us).

In these cases, it makes a lot of sense for these exporters to be highly available and load-balanced, as they are effectively proxies. While I agree that this load balancing is an anti-pattern, I would not consider such exporters a "service" (in the sense of monitoring other stuff via them; it makes sense to monitor the exporter itself, but that is already covered by the ServiceMonitor).

I think it would make sense to support these common use cases simply, without having to set up multiple Prometheus instances and deduplication (this also enables independent scaling of exporters and Prometheus, as duplicating Prometheus instances is quite demanding in a lot of environments), with something like a ServiceMonitor that actually scrapes the service IP.

Happy to hear feedback if that makes sense, but just my 2 cents :)

@coderanger
Contributor

> While this is an older issue, I would make the argument that exporters that Prometheus scrapes (like the SNMP exporter or the Blackbox exporter) are almost anti-patterns in themselves, in that the scraping occurs via a proxy. However, these are in widespread use because of their important functionality (actually the only functionality that makes Prometheus viable for us).
>
> In these cases, it makes a lot of sense for these exporters to be highly available and load-balanced, as they are effectively proxies. While I agree that this load balancing is an anti-pattern, I would not consider such exporters a "service" (in the sense of monitoring other stuff via them; it makes sense to monitor the exporter itself, but that is already covered by the ServiceMonitor).
>
> I think it would make sense to support these common use cases simply, without having to set up multiple Prometheus instances and deduplication (this also enables independent scaling of exporters and Prometheus, as duplicating Prometheus instances is quite demanding in a lot of environments), with something like a ServiceMonitor that actually scrapes the service IP.
>
> Happy to hear feedback if that makes sense, but just my 2 cents :)

I don't think that's related to the original question, and I think you misunderstand how ServiceMonitors work. They use the Service's Endpoints directly, not the kube-proxy load balancer itself.

@gmintoco

gmintoco commented Nov 3, 2020

> > While this is an older issue, I would make the argument that exporters that Prometheus scrapes (like the SNMP exporter or the Blackbox exporter) are almost anti-patterns in themselves, in that the scraping occurs via a proxy. However, these are in widespread use because of their important functionality (actually the only functionality that makes Prometheus viable for us).
> >
> > In these cases, it makes a lot of sense for these exporters to be highly available and load-balanced, as they are effectively proxies. While I agree that this load balancing is an anti-pattern, I would not consider such exporters a "service" (in the sense of monitoring other stuff via them; it makes sense to monitor the exporter itself, but that is already covered by the ServiceMonitor).
> >
> > I think it would make sense to support these common use cases simply, without having to set up multiple Prometheus instances and deduplication (this also enables independent scaling of exporters and Prometheus, as duplicating Prometheus instances is quite demanding in a lot of environments), with something like a ServiceMonitor that actually scrapes the service IP.
> >
> > Happy to hear feedback if that makes sense, but just my 2 cents :)
>
> I don't think that's related to the original question, and I think you misunderstand how ServiceMonitors work. They use the Service's Endpoints directly, not the kube-proxy load balancer itself.

I am talking about a ServiceMonitor-like resource that uses the service IP instead of the endpoint IPs, for the use case outlined by yann-soubeyrand and discussed with brancz. Given paulfantom's stance that this is an anti-pattern and therefore wouldn't make sense to support, I wanted to explain why I thought it might still be worth considering. I can see, however, that this might make more sense as a feature request rather than a comment here, but I thought it was relevant given the discussion above.

@v-pap

v-pap commented Nov 18, 2020

I think that one way to monitor a service without scraping data from all its pods is to use the URI of the service <service-name>.<namespace>.svc.cluster.local:<service-port> in an additional-scrape-configs secret instead of using a ServiceMonitor.
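A sketch of what such an entry in the additional-scrape-configs Secret could look like (the job name, service name, namespace, and port are placeholders):

```yaml
# Raw scrape_configs entries appended to the generated Prometheus configuration.
- job_name: postgres-exporter-via-service
  metrics_path: /metrics
  static_configs:
    - targets:
        - postgres-exporter.monitoring.svc.cluster.local:9187  # <service-name>.<namespace>.svc.cluster.local:<service-port>
```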

@lfrittelli

Although one can make a very reasonable argument that Prometheus should aim to scrape endpoints and not services, would you not say that it is at least confusing to have a CRD called "ServiceMonitor" that monitors endpoints, when the Prometheus configuration does support monitoring both endpoints and services independently? At least for educational purposes, I'd recommend renaming it, or perhaps at least providing further clarification of this fact in the ServiceMonitor documentation.

@michael1011101

> I think that one way to monitor a service without scraping data from all its pods is to use the URI of the service <service-name>.<namespace>.svc.cluster.local:<service-port> in an additional-scrape-configs secret instead of using a ServiceMonitor.

Hi @v-pap, I am confused about why Prometheus would need to monitor a service without scraping data. Why do we need to monitor a service without data scraping? :(

@joshuasimon-taulia

> I think that one way to monitor a service without scraping data from all its pods is to use the URI of the service <service-name>.<namespace>.svc.cluster.local:<service-port> in an additional-scrape-configs secret instead of using a ServiceMonitor.

Is it possible to abuse the Probe CRD to accomplish the same thing?
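For what it's worth, an untested sketch of that idea (all names are placeholders). A Probe is meant to point at a prober such as the blackbox exporter, so pointing prober.url at the exporter Service itself and overriding the default /probe path is very much a hack:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: postgres-exporter-via-service
spec:
  prober:
    url: postgres-exporter.monitoring.svc.cluster.local:9187  # the Service itself, abused as the "prober"
    path: /metrics                                            # override the default /probe path
  targets:
    staticConfig:
      static:
        - postgres-exporter  # only used for the target/instance labels; the exporter ignores it
```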

@costimuraru

> The way I see it, ServiceMonitors are the perfect fit if you already have a Service for your pods anyway. However, if it doesn't make sense for other reasons to have a Service for your component, then the PodMonitor is the right choice. It's essentially about avoiding (failure-prone) duplication of pod label selectors where possible.

What happens when pods are registered in multiple Services, each with its own ServiceMonitor? The pods will be scraped multiple times, leading to duplicated metrics. In that scenario a PodMonitor might make more sense.

@brancz
Contributor

brancz commented Mar 12, 2021

This can potentially happen, yes. The recommendation is to narrow down the selection with labels when this happens.
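For example (a sketch; the monitoring: "true" label is an assumption), you could add a dedicated label to only one of the Services and match on it in the ServiceMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myAPP
      monitoring: "true"   # set on only one of the Services that select these pods
  endpoints:
    - port: web
```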

@stale

stale bot commented Jun 14, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.

@stale stale bot added the stale label Jun 14, 2021
@kong62

kong62 commented Jun 25, 2021

PodMonitor - no Service needed
ServiceMonitor - requires a Service

@stale stale bot removed the stale label Jun 25, 2021
@coderanger
Contributor

Going to close this out as I think this issue has outlived its usefulness.

@xianyuLuo

A PodMonitor doesn't need a Service ^--^

@kamazee

kamazee commented Jun 21, 2023

I have a case that hasn't been covered in the discussion, so I wonder what the best practice is for it.

There is a service which is scaled to hundreds of instances; there is a business metric that is exposed as a gauge and (under the hood) is stored in a database that all of the instances of the service can access, and the metric relates to the service as a whole, not to any specific instance. So I want Prometheus to understand that the scraper should make a single request to my service within a specified interval (there are different metrics that I plan to scrape hourly and others that I'd like to scrape daily). Although I read that scraping via a load balancer is bad practice (and I kind of accept that it is when we're talking about monitoring specific instances), are there any better options for my case?
