
Add prometheus metric #57

Open · wants to merge 5 commits into main
Conversation

@tliefheid

No description provided.

@alexliesenfeld (Owner)

Thanks! I'll look into it soon!

@alexliesenfeld (Owner) left a comment

Thanks again for this PR. Your approach seems to be to add the Prometheus-related functionality right into the core library itself. Because Prometheus is only one tool out of many out there, I like the idea of separating this functionality out into its own handler extension. Do you think we could use the ResultWriter interface for this and allow exporting Prometheus-specific health check statuses using the WithResultWriter config option instead?

@tliefheid (Author)

I'm not sure how you imagine this working. Who will register the metrics (gauges/counters)? That's how Prometheus works. If you expect it to send back some details, then it is still up to the user to set the gauges and counters. IMHO, Prometheus is one of the most commonly used metrics tools out there.
I'm happy to update this, don't get me wrong. I'm just not as familiar as you are with the whole concept of the interfaces you have, so I'm getting some more background info and context from you :)

@akhy commented Oct 25, 2022

Agreed, adding a bunch of extra deps to go.mod is not a good idea for this small lib. I also like @alexliesenfeld's idea of having the metrics lib implement the ResultWriter interface in a separate project.

About the metrics format, I don't feel that mapping the discrete up/down/unknown states to a 0/1/2 value is right. I'm still not sure, but I think I'm going to use my own format, which looks more or less like this:

```
# if service1 is up
app_name_health{name="service1", status="up"} 1
app_name_health{name="service1", status="down"} 0
app_name_health{name="service1", status="unknown"} 0

# if service1 is down
app_name_health{name="service1", status="up"} 0
app_name_health{name="service1", status="down"} 1
app_name_health{name="service1", status="unknown"} 0

# if service1 is unknown
app_name_health{name="service1", status="up"} 0
app_name_health{name="service1", status="down"} 0
app_name_health{name="service1", status="unknown"} 1
```

This way, we can create fine-grained and unambiguous Prometheus queries and aggregations.
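The one-hot mapping above can be rendered without any Prometheus client dependency. Here is a minimal, dependency-free Go sketch of the text exposition format; the helper name `statusSeries` and the metric name are illustrative, not part of any library:

```go
package main

import "fmt"

// statusSeries renders one health check as three Prometheus series,
// one per status label. Exactly one series carries the value 1 (one-hot),
// which is what makes queries over the result unambiguous.
func statusSeries(metric, check, current string) string {
	out := ""
	for _, s := range []string{"up", "down", "unknown"} {
		v := 0
		if s == current {
			v = 1
		}
		out += fmt.Sprintf("%s{name=%q, status=%q} %d\n", metric, check, s, v)
	}
	return out
}

func main() {
	fmt.Print(statusSeries("app_name_health", "service1", "down"))
}
```

Running this prints the "service1 is down" triple from the example above.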

@tliefheid (Author)

A format like that results in more time series overall (3 for each check), and you also need to reset all the gauges before you update the correct one, so it's a bit more complex to implement.

I get that you don't want to inject this into the lib itself, but I'm not sure how else to do it.

@alexliesenfeld (Owner) commented Oct 26, 2022

> not sure how you imagine this to work. Cause who will register the metrics (gauges/counters) cause that is how prometheus works. if you expect it so send back some details, then it is still up to the user to set the gauges and counters. imho prometheus is one of the most commonly used metrics tools out there. i'm happy to update this, don't get me wrong. i'm not that familiar as you with the whole concept of the interfaces you have. just getting some more background info and context from you :)

Sure, I can provide some more details. What I was referring to is that there is a ResultWriter interface that the library uses to write a check result in a specific format. For example, the default JSONResultWriter takes the current check status and transforms it into a JSON HTTP response.

health/handler.go, lines 49 to 64 at 09f6eab:

```go
// Write implements ResultWriter.Write.
func (rw *JSONResultWriter) Write(result *CheckerResult, statusCode int, w http.ResponseWriter, r *http.Request) error {
	jsonResp, err := json.Marshal(result)
	if err != nil {
		return fmt.Errorf("cannot marshal response: %w", err)
	}
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	w.WriteHeader(statusCode)
	_, err = w.Write(jsonResp)
	return err
}

// NewJSONResultWriter creates a new instance of a JSONResultWriter.
func NewJSONResultWriter() *JSONResultWriter {
	return &JSONResultWriter{}
}
```

In the same way, we could probably have a PrometheusResultWriter that takes the overall check result and transforms it into a format that can be processed by Prometheus.

The user can then configure the library to use the PrometheusResultWriter rather than the default one, similar to this example:

```go
handler := health.NewHandler(checker,
	// A result writer writes a check result into an HTTP response.
	// JSONResultWriter is used by default.
	health.WithResultWriter(health.NewJSONResultWriter()),
)
```
@TomL-dev What do you think?

@akhy commented Oct 29, 2022

I've just taken a look at WithResultWriter. It turns out it doesn't seem appropriate for my use case, because it directly produces HTTP responses. I want to be able to mix the health metrics in with metrics from other libraries.

So I created a lib myself: https://github.com/chickenzord/go-health-prometheus

Please let me know what you think, thanks

@alexliesenfeld (Owner) commented Oct 30, 2022

Looks nice! As of now, it seems to only expose the aggregated/overall system status. I wonder if it would help if health.CheckState actually contained all the component check results as well. AFAICS, that way you could also expose each component individually.

@akhy commented Oct 31, 2022

@alexliesenfeld it already shows per-component health...

Actually, I haven't found a way to get the overall status 😅 For the aggregated/overall status, instead of only up/down/unknown, I'd love to add one more degraded status for when any non-critical component is down. But I think that should be out of scope for this lib, because the real implementation can differ for each use case 🤔

@tliefheid (Author)

> @alexliesenfeld it's already showing per-component healths...
>
> actually I haven't found a way to get the overall status 😅 for the aggregated/overall, instead of only up/down/unknown, I'd love to add one more degraded status if any non-critical component is down.. but I think that should be out of the scope of this lib because the real implementation can be different for each use cases 🤔

You can solve this by writing Prometheus queries and generating alerts when the result (a count or similar) drops below some percentage x.

@fr3fou commented Feb 26, 2023

Any updates on this? :D One more thing I think would be useful: keeping track of how long each probe takes to respond.

5 participants