New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compute/metadata: OnGCE
returns true if request to metadata server returns ECONNREFUSED
#5381
Comments
@wperron Thanks for the detailed report, I will try to take a look later today. |
Hey @codyoss thanks for the quick response! I've got the code open right next to me, if you agree with that assessment, just lemme know I can have a PR up by the end of the day 👍 |
@wperron I will have to take a deeper look later but if you think you have a solution feel free to open a PR. I will say we do intentionally race both checks because in some environments different options will resolve faster. The code tries to limit the start-up cost associated detecting environment details. |
I was thinking of following up the DNS lookup with an HTTP request to see if it responds without errors, but you're right it would add a step to that check. I wish there was a way to globally say "hey this isn't on GCE, trust me" and have the |
@wperron Can you elaborate more on your environment? You mention that you are running on GCE? If so what environment(vm, functions, cloud run, app engine). I am curious to why we dectect you are on GCE, but you get a connection refused. |
Our setup is a bit convoluted, the specific case causing an issue is during our CI jobs; we run Buildkite on top of GKE, so each job is a docker container started by the buildkite agent running in a k8s pod. |
In that case is this just a misconfigured docker network perhaps?(I am not an expert in docker networking by any means) I feel like to the question on am I on GCE, the answer should be yes in this case, but networking is not allowing that traffic to flow. What do you think? |
You probably have a lot more context that I do here so I'll defer to your judgement, but I can provide maybe a little more context into the issue we're facing: We're running the Being able to successfully query the metadata server is a soft dependency for the profiler to be able to start, so I would say if that request results in a connection refused, the profiler probably shouldn't be trying to access it later. If the behavior of I'll try to see on my end if we can tweak the Docker networking to allow access to the metadata server, or blackhole the IP, but I don't how much wiggle room I have here as this would impact every CI run in the org. |
I think the best solution, for the time being, is to either let traffic flow through the docker network or as you mentioned just make sure things don't resolve. In the not too distant future we do plan on tweaking our ongce check in a way that I think would be beneficial to you. I know we have had at least one other request in the past for a variable to disable the check and force it to return false.(Can't seem to find the issue right now though) Let me know if you get it working, and please share if you find the solution as others might have run into this too 😄 |
So for now I applied a |
For anyone else who might be reading this, adding a quick |
Thanks for sharing, I will bring this back to my team and discuss. If we were to make a change here I think we would want to make sure it happens for all langs. But like I mentioned in an earlier comment as well this is going to all be refactored in the not too distant future and at that point this should also no longer be an issue. |
Closing this issue in favor of #4920 as the solution would be the same if we implement it. |
Client
Metadata & Profiler
Environment
Buildkite runners running on GCE. The metadata server responds to a ping, but not to a curl (returns ECONNREFUSED error). I can also reproduce locally by simply adding the hostname and IP to my
/etc/hosts
file (obviously pointing to nothing). I'm using theprofiler
package, previously didn't have any issue until #5063 was merged. When the profiler starts, theOnGCE
call returns true, and before the bugfix, the call toZone()
would simply return an empty string, now it's returning an error (as it should). I think the issue stems from theOnGCE
call, since the metadata server can't be reached it should probably returnfalse
.Go Environment
$ go version
go version go1.17.3 linux/amd64
Expected behavior
I expect
OnGCE
to returnfalse
so that the profiler doesn't attempt to get instance metadata when it obviously can't.Actual behavior
OnGCE
returnstrue
, causing the profiler to attempt retrieving instance metadata.Additional Context:
this check here is likely the issue; The test above it will not only fail in the IP doesn't resolve, but also if the connection is refused, whereas the
LookupHost
test only fails if the DNS lookup operation fails, it doesn't check to see if the server responds correctly. Since it only takes one of the two tests to pass for this function to return true, we get into this case where theOnGCE
call returns true even if the server doesn't respond.The text was updated successfully, but these errors were encountered: