-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: implement Ip::Address::asText() #1739
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are three primary problems with existing Ip::Address::toStr():
-
The API does a poor job distinguishing its output format from output format of three other similar Ip::Address-to-text conversion methods: toStr(), toUrl(), toHostStr(), and operator "<<". Developers should read all four function descriptions to understand which method applies to their specific use case. Many developers naturally don't, especially when copying code, resulting in surprises, review iterations, bugs, etc.
-
Give-me-a-large-enough-buffer API is usually awkward to use, especially in modern C++ code. Its usage also results in a lot of code duplication.
-
The "force IP version" part of API is a bad idea with worse implementation, resulting in ugly code and bugs, possibly including security vulnerabilities.
When improving this API and (eventually) modifying all its callers, we should aim at solving all these problems. The current PR comes short because it only attempts to solve the second toStr() problem. The current PR is arguably worse than simply replacing existing SBuf-needing ip.toStr(...) calls with ToSBuf(ip)!
Ip::AddressText asText(const Ip::Address &)
To address the above three (and other) problems, I propose to introduce an asText() function that returns an object of an Ip::AddressText class. Existing toStr(), toUrl(), and toHostStr() methods will be deprecated and eventually removed from Ip::Address. The new AddressText class will control the following Ip::Address-to-text conversion aspects (at least):
- whether port information is included: withPort() and/or withoutPort()
- whether IPv6 addresses are bracketed: bracketed() and/or bracketless()
AddressText::withPort() may imply AddressText::bracketed().
I suspect there will also be methods or method parameters to skip printing of certain port numbers that caller considers default for the given address (e.g., port 80 for plain HTTP and port 443 for HTTPS URLs).
All proposed names are subject to review and improvements, of course. Please do not focus on their spelling when deciding whether to support this proposal. The same can be said about the exact Ip::AddressText method composition and default behavior. All these aspects are secondary and can be adjusted as needed when this proposal is detailed/implemented.
@kinkie, I removed undocumented |
src/peer_sourcehash.cc
Outdated
|
||
if (SourceHashPeers().empty()) | ||
return nullptr; | ||
|
||
assert(ps); | ||
HttpRequest *request = ps->request; | ||
|
||
key = request->client_addr.toStr(ntoabuf, sizeof(ntoabuf)); | ||
const auto key = request->client_addr.toStrAsSBuf(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use-case does not need any new API. It can just:
const auto key = request->client_addr.toStrAsSBuf(); | |
const auto key = ToSBuf(request->client_addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to guarantee that two different Squid instances produce the same hash for the same request?
- If yes, then we may want to avoid changing format/hash to avoid breaking that guarantee for instances running different Squid versions. We could still change if there is a compelling reason to do so, of course, but I do not see such a reason here.
- Otherwise, we should not covert the IP address to text just to compute a hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point.
I think we should try to preserve behaviour in a major version,
but I wouldn't feel constrained to do the same across
major versions.
At the same time, I would keep this change as its own PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use-case does not need any new API. It can just:
Adopted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't feel constrained to do the same across major versions.
Is this hash computation (de facto) standardized across different products (including non-Squid-based caches) or can we assume that this is a Squid-specific concern?
If it is a Squid-specific concern: The "we can change things across versions" logic is sound, but it does not work as well when dealing with distributed cache hierarchies where nodes may run different versions. I would not object to a breaking change like that if others are sure that it is warranted, but in that case, again, we should not covert the IP address to text to compute a hash.
The non-obvious sensitivity of this code to address formatting warrants a C++ comment, especially if we migrate to something implicit like ToSBuf().
I would keep this change as its own PR
Thank you. I assume "this change" is "possible change of peer hash values".
src/ip/Address.h
Outdated
char* toStr(char *buf, const unsigned int blen, int force = AF_UNSPEC) const; | ||
SBuf toStrAsSBuf(int force = AF_UNSPEC) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the AsSBuf
portion of the naming is more harm than good. We should just be deprecating the char*
API and otherwise keeping the naming the same.
[EDIT: ... assuming scope is kept at updating methods, not moving to the alternative API @rousskov proposed]
char* toStr(char *buf, const unsigned int blen, int force = AF_UNSPEC) const; | |
SBuf toStrAsSBuf(int force = AF_UNSPEC) const; | |
SBuf toStr(int force = AF_UNSPEC) const; | |
/// Deprecated. Use the SBuf method instead. | |
char* toStr(char *buf, const unsigned int blen, int force = AF_UNSPEC) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that toStrAsSBuf() is a non-starter as far as spelling is concerned. However, FWIW, following the above renaming suggestion will not address my primary concerns and, hence, will not move this PR forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now gone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now gone
Still here and still used. I cannot provide requested feedback until this is gone.
...
Er, these are APIs for use in old C-code. They have never been true C++ APIs. It looks to me like this update to use SBuf is part of fixing that.
FTR; That is an issue forced on us by:
[1] long-term we should use the
Falls foul of your issue (1). ToSBuf() uses the stream operator which was designed solely for
I support this plan. |
Glad to hear that! The next step is to decide how to address problem 1 (i.e. Current APIs do a poor job distinguishing various output formats). I see two basic options/directions: 1a. Default to "minimal" (for the lack of a better word) format (i.e. no brackets and no port). Optional method calls add bells and whistles. We use this approach with AsList and, to some degree, AsHex. Cons: This "minimal" format is arguably the most dangerous one in this case; copy-pasted code does not bring reviewer attention to wrong format for the new location. 1b. Default to "bracketed with port" format. Method calls adjust as needed. Cons: This default is wrong for some use cases; copy-pasted code does not bring reviewer attention to wrong format for the new location. 2 . Require each caller to make formatting choices. Cons: Should use an uglier enum-based interfaces illustrated below: os << asText(ip); // Compile error because the caller forgot to make two required formatting choices
os << asText(ip, Ip::Text::withoutPort); // Compile error because the caller forgot to make bracketing choice
os << asText(ip, Ip::Text::withoutBrackets, Ip::Text::withPort); // implicitly throws exception for IPv6 addresses
os << asText(ip, Ip::Text::throwOnIpv6, Ip::Text::withPort); // explicitly throws exception for IPv6 addresses
os << asText(ip, Ip::Text::withBrackets, Ip::Text::withPort); // OK: all required explicit choices were made I think we should use option 2 for safety sake. It is not too ugly1. You might be able to convince me that option 1b is good enough if it uses "bracketed with port" default. Which option do you prefer? P.S. Again, the spelling of all these new names is a separate/secondary issue. Do not worry about it for now!
Why "Er"? Seems like we agree that this give-me-a-large-enough-buffer API is a problem for the current Squid code.
I strongly disagree: None of the specific things you have mentioned force this part of toStr() API and its implementation.
I disagree that IPvFuture format should be used for IPv4 and IPv6 addresses today and in the foreseeable future. I also doubt we should add support for that format today and in the foreseeable future -- we need to fix serious, fundamental problems with the existing IPv4/IPv6 code before throwing a third (and very rarely used) format into the mix.
Yes, among other problems. I did not say that "simply replacing" is a good solution. We do not need to evaluate it because there is a better one. Footnotes
|
I don't like it much, I'd prefer modifier functions like done in AsList. |
Why? Do you disagree that it is safer? Or do you think safety (i.e., getting the right text for the given context) is less important in this context than some other (hopefully to be disclosed) considerations?
That AsList approach (i.e. optional formatting configuration methods) looks nicer but leads to bugs like the one fixed in 0dc9f3b. These bugs would be difficult to detect in some Ip::Address contexts and some of them will have security implications (e.g., wrong port due to port defaults or perhaps IPv6 addresses that will be misinterpreted as IPv4 addresses with ports). |
The most important thing is verbosity; this approach turns a relatively simple operation into an
That is a fair point |
For the known use cases, the implementation does not need varargs AFAICT: There is a finite (and small!) number of valid combinations of required formatting choices. |
Detected by using AddressSanitizer.
ERROR: Connection to [such-and-such-cache_peer] failed TCP_TUNNEL/503 CONNECT nxdomain.test:443 FIRSTUP_PARENT Squid does not alert an admin about (and decrease health level of) a cache_peer that responded with an error to a GET request. Just like GET responses from a cache_peer, CONNECT responses may (and often do!) reflect client or origin server failures. We should not penalize cache_peers (and alert admins) until we can distinguish these frequent client/origin failures from (relatively rare) cache_peer problems. This change absolves cache_peers of CONNECT problems, restoring parity with GETs and restoring v4 behavior changed (probably by accident) in v5. Also removed Http::StatusCode parameter from failure notification functions because it became essentially unused after the primary Http::Tunneler changes. Tunneler was the only source of status code information that (in some cases) used received HTTP response to compute that status code. All other cases extracted that status code from Squid-generated errors. Those errors were arguably never meant to supply status code information for "this failure is not our fault" decision, and they do not supply 4xx status codes driving that decision. ### Problem evolution 2019 commit f5e1794 effectively started blaming cache_peer for all FwdState CONNECT errors. That functionality change was probably accidental, likely influenced by the names of noteConnectFailure() and peerConnectFailed() functions that abbreviated "Connection", making the functions look as applicable to CONNECT failures. Prior to that commit, the functions were never used for CONNECT errors. After it, FwdState started calling peerConnectFailed() for all CONNECT failures. In 2020 commit 25b0ce4, TunnelStateData started blaming cache_peers as well (by moving that FwdState-only error handling code into Tunneler). The same "accidental functionality change" speculations apply here. In 2022 commit 022dbab, we made an exception for 4xx CONNECT errors as folks deploying newer code started complaining about cache_peers getting blamed for client-caused errors (e.g., HTTP 403 Forbidden replies). We did not realize that the blaming code itself was an unwanted accident. Now we are getting complaints about cache_peers getting blamed for 502 and 503 CONNECT errors caused by, for example, domain names without IPs: As these CONNECT error responses are propagated from parent to child caches, every child cache in the chain logs ERRORs and every cache_peer in the chain gets its health counter decreased!
…cache#1771) ACLHTTPHeaderData::match() required a pointer to non-const HttpHeader but does not (and should not) modify the supplied HttpHeader. Also removed support for nil HttpHeader in that method. All callers already require HttpHeader presence. If that changes, it is the _caller_ that should decide what HttpHeader absence means (match, mismatch, exception/dunno, etc.); ACLHTTPHeaderData does not have enough information to make the right choice and, hence, should not be asked to choose. Also polished related #includes to remove unnecessary ones.
Squid style guidelines require .h files to be wrapped with HAVE_*_H protection and placed after all Squid internal file includes. Add the missing ./configure header checks to generate the needed wrappers and refactor the include sequences to meet current guidelines.
Removing a lot of duplicated code and further simplifying library detection.
Detected by Coverity. CID 1554656: Initialization or destruction ordering is unspecified (GLOBAL_INIT_ORDER). Also update MemBlobStats initialization.
Rework the internals for generating output in ErrorState::Dump, used for expanding the '%W' token in error page templates. Also fix a bug with excessive html-quoting of the output.
Detected by Coverity. CID 1554655: Initialization or destruction ordering is unspecified (GLOBAL_INIT_ORDER). Also switched to compile-time checks for table initialization records.
|
||
char buf[INET6_ADDRSTRLEN]; | ||
if (ip_.isIPv6()) { | ||
inet_ntop(AF_INET6, &ip_.mSocketAddr_.sin6_addr, buf, INET6_ADDRSTRLEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check for errors here? Legacy code doesn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check for errors here?
We should, eventually.
Legacy code doesn't
This PR may not have to check, but it is too early to tell. For example, if this code does not fix any of the (many) other problems with legacy printing code, then it does not have to fix this problem either.
I cannot offer a formula that will tell us which legacy problems should be solved in this PR, but it feels like this PR should solve most if not all of them (eventually). For now, I recommend adding an XXX comment to mark this problem in the new code and focus on this code callers/users/API. We will polish the implementation details later (but probably in this PR).
@rousskov , @yadij , this seems like a nice checkpoint for collecting feedback about the API. Current state:
Next steps, if there are no significant objections to the current state,
Once that is done, I'd start the cleanup, porting callsites to the AddressText API; this might offer the opportunity for deeper changes to move away from the "pass me a large enough buffer" paradigm. What do you think? |
src/ip/Address.h
Outdated
char* toStr(char *buf, const unsigned int blen, int force = AF_UNSPEC) const; | ||
SBuf toStrAsSBuf(int force = AF_UNSPEC) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now gone
Still here and still used. I cannot provide requested feedback until this is gone.
|
||
char buf[INET6_ADDRSTRLEN]; | ||
if (ip_.isIPv6()) { | ||
inet_ntop(AF_INET6, &ip_.mSocketAddr_.sin6_addr, buf, INET6_ADDRSTRLEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check for errors here?
We should, eventually.
Legacy code doesn't
This PR may not have to check, but it is too early to tell. For example, if this code does not fix any of the (many) other problems with legacy printing code, then it does not have to fix this problem either.
I cannot offer a formula that will tell us which legacy problems should be solved in this PR, but it feels like this PR should solve most if not all of them (eventually). For now, I recommend adding an XXX comment to mark this problem in the new code and focus on this code callers/users/API. We will polish the implementation details later (but probably in this PR).
explicit AddressText(const Address &ip); | ||
|
||
/// if the Address has a port, output it (default: no) | ||
/// if set to true, also print brackets for IPv6 addresses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "if set to true..." feature introduces a method call order dependency (e.g., the second XXX below) and other problems (e.g., the third XXX below). Please avoid that.
Eventually, it would be nice to address both problems:
os << AddressText(ip).withPort().bracketed(false); // XXX: No brackets, as expected, but breaks IPv6 printing.
os << AddressText(ip).bracketed(false).withPort(); // XXX: Brackets added, surprising code reader.
// XXX: The following two code lines produce different output:
os << AddressText(ip).withPort(false);
os << AddressText(ip).withPort(true).withPort(false);
@@ -52,15 +52,13 @@ ProxyProtocol::Header::getValues(const uint32_t headerType, const char sep) cons | |||
return SBuf(); | |||
auto logAddr = sourceAddress; | |||
logAddr.applyClientMask(Config.Addrs.client_netmask); | |||
char ipBuf[MAX_IPSTRLEN]; | |||
return SBuf(logAddr.toStr(ipBuf, sizeof(ipBuf))); | |||
return ToSBuf(logAddr.asText()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both new and the old code suffer from the same problem that this PR should address: A reader cannot easily tell whether this code formats the address correctly. For example, I cannot easily tell whether this code prints the port number. The same problem is present in other converted code as well.
To fix this, I suggest adjusting AddressText API so that callers must use
- either withPort() // implies bracketed IPv6 addresses; bans bracketed() calls
- or withoutPort() and bracketed(bool) // explicit on both counts
Any better ideas?
Progressively migrate all users of Ip::Address::toStr, toUrl, toHostStr to
a new manipulator that clarifies how to format the output and simplifies