Default 404 server with metrics #709
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Hi @vbannai. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
CLA signed. Please check.
@BenTheElder do you mind looking at the bot that is blocking the pull request? I have already signed the CLA. Thanks.
Sorry @vbannai, I think you'll have to contact the help desk: https://github.com/kubernetes/community/blob/master/CLA.md#troubleshooting kubernetes/kubernetes#27796 (comment)
/ok-to-test
I signed it
@rramkumar1 removing the label manually will not work.
@BenTheElder: I think I was missing being a member of the Google corp. I have taken care of it now. Hopefully this should work now.
I think I am now authorized to contribute to CNCF.
/check-cla
CLA is green 🎉 There is a test failure remaining:
First round of comments.
cmd/404-server-with-metrics/README
@@ -0,0 +1,374 @@
Nit: Empty line
limitations under the License.
*/

// A webserver that only serves a 404 page. Used as a default backend for ingress gce object for kubernetes cluster.
Nit: "A webserver that only serves a 404 page. Used as a default backend for ingress-gce"
done
@@ -0,0 +1,210 @@
/*
Copyright 2017 The Kubernetes Authors.
Nit: 2019
done
readHeaderTimeout = flag.Duration("read header timeout", 10*time.Second, "Time in seconds to read the request header before timing out.")
writeTimeout = flag.Duration("write timeout", 10*time.Second, "Time in seconds to write response before timing out.")
idleTimeout = flag.Duration("idle timeout", 10*time.Second, "Time in seconds to wait for the next request when keep-alives are enabled.")
maxJobs = flag.Int("max workers", 100, "Number of parallel/concurrent jobs to run.")
Nit: Should this be maxWorkers and also don't see it used anywhere?
I was originally planning to use maxJobs to restrict the number of simultaneous connections, but it turns out that won't help much, as goroutines are spun up for each connection in ListenAndServe().
I will remove it.
// command line flags/arguments
port = flag.Int("port", 8080, "Port number to serve default backend 404 page.")
serverTimeout = flag.Duration("timeout", 5*time.Second, "Time in seconds to wait before forcefully terminating the server.")
readTimeout = flag.Duration("read timeout", 10*time.Second, "Time in seconds to read the entire request before timing out.")
I think all the flags here and below should have a "-" instead of the spaces right?
I missed that. I have changed the flag names with a "_" instead of "-" as that is what is used in Google3.
fmt.Fprintf(os.Stderr, "server shutting down or received shutdown: %v\n", err)
os.Exit(0)
case http.ErrHandlerTimeout:
fmt.Fprintf(os.Stderr, "handler timedout: %v\n", err)
Nit: timed out
done
path := r.URL.Path
w.WriteHeader(http.StatusNotFound)
// We log 1 out of 4 requests to the logs (make it configurable by a flag??)
fmt.Fprintf(w, "reached NotFound backend, service rules not setup correctly for %s \n", path)
Since this will be visible in customer clusters, I'm not sure we should log that "service rules not setup correctly". It's possible (but probably highly unlikely) that they are using this backend in a meaningful way.
Re-worded the response to be more meaningful.
path := r.URL.Path
w.WriteHeader(http.StatusNotFound)
// We log 1 out of 4 requests to the logs (make it configurable by a flag??)
fmt.Fprintf(w, "reached NotFound backend, service rules not setup correctly for %s \n", path)
Nit: reached 404 backend
Done
Force-pushed from 693f9be to 2dcee04
Some more comments but in general, LGTM.
Adding @bowei for final review.
idleTimeout = flag.Duration("idle_timeout", 10*time.Second, "Time in seconds to wait for the next request when keep-alives are enabled.")
idleLogTimer = flag.Duration("idle_log_timeout", 1*time.Hour, "Timer for keep alive logger.")
logSampleRequests = flag.Float64("log_percent_requests", 0.1, "Fraction of http requests to log [0.0 to 1.0].")
isProd = flag.Bool("is_prod", true, "Indicates if the server is running in production.")
I don't know how useful this flag is going to be in relation to the shutdown handler. Is there a compelling use case for it?
This is useful for testing a graceful shutdown in a non-prod environment. Also, I need an extra handler in addition to the default handler for "/" to make sure that the counters were not incrementing.
I set the default value to "true" so the shutdown handler will not be active unless you go out of your way to pass the flag to disable it.
hostName, err := os.Hostname()
if err != nil {
fmt.Fprintf(os.Stderr, "could not get the hostname: %v\n", err)
klog.Errorf("could not get the hostname: %v\n", err)
I think this should be a Fatalf since we do not want to proceed further.
Okay, the behavior of the klog module is a little inconsistent with the standard golang logger, as log.Fatal() also exits by calling os.Exit(1), whereas klog.Fatal() does not do that.
I will add os.Exit(1) after klog.Fatalf.
os.Exit(0)
case http.ErrHandlerTimeout:
fmt.Fprintf(os.Stderr, "handler timedout: %v\n", err)
klog.Warningf("handler timed out: %v\n", err)
default:
// Should we Fatal() ?
Remove since it's now a Fatalf
Yeah, but I need to add os.Exit(1), as klog does not exit, unlike the standard golang log package.
https://golang.org/src/log/log.go?s=9437:9465#L302
vs
https://sourcegraph.com/github.com/kubernetes/ingress-gce/-/blob/vendor/k8s.io/klog/klog.go#L1208
Performance testing consisted of using Apache "ab" and "curl" to send lots of requests and monitor the metrics on the Prometheus UI.

### Testing iterations
* **Testing with curl command**
Can you make each testing iteration a sub-heading of "Performance tests" rather than a bullet? So, for example, it would look like:
Performance tests
blah
Testing with curl command
blah
Testing with ab
and so on...
Sure, sounds good.
I made the changes as suggested.
Can you squash the commits and separate it into two commits? One commit with the actual code changes. Easy way to do this: `git remote update`, edit to squash everything together (change `pick` to `squash` except for the first line), then `git reset HEAD^` (soft reset) and add the text to the commit.
Rebuild it with newer Go:
* Supports graceful shutdown
* Add metrics serving
  * How many requests it is serving
  * Serving latency
* Add logging
  * Respond with a 404 status code and relevant message to every request
  * Configurable sampling of requests to a max # of logs/sec [0.0 to 1.0]
  * Periodically log if no traffic, just to say "I am alive"

Tested the setup on a local desktop with model name: Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz, 12 processor cores, 64GB RAM. Prometheus version 2.8.0; includes yml file for setting alerts and rates. Benchmark results: tested with "ab" generating 20M packets over 2000 connections.
Squashed the commits as per the request into two commits:
LGTM, will leave to @bowei for final approval.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bowei, vbannai. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing