Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: implement Ip::Address::asText() #1739

Draft
wants to merge 34 commits into
base: master
Choose a base branch
from

Conversation

kinkie
Copy link
Contributor

@kinkie kinkie commented Mar 15, 2024

Progressively migrate all users of Ip::Address::toStr, toUrl, toHostStr to
a new manipulator that clarifies how to format the output and simplifies

@kinkie kinkie added S-waiting-for-feedback M-ignored-by-merge-bots https://github.com/measurement-factory/anubis/blob/master/README.md#pull-request-labels labels Mar 15, 2024
@rousskov rousskov self-requested a review March 15, 2024 11:59
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three primary problems with existing Ip::Address::toStr():

  1. The API does a poor job distinguishing its output format from output format of three other similar Ip::Address-to-text conversion methods: toStr(), toUrl(), toHostStr(), and operator "<<". Developers should read all four function descriptions to understand which method applies to their specific use case. Many developers naturally don't, especially when copying code, resulting in surprises, review iterations, bugs, etc.

  2. Give-me-a-large-enough-buffer API is usually awkward to use, especially in modern C++ code. Its usage also results in a lot of code duplication.

  3. The "force IP version" part of API is a bad idea with worse implementation, resulting in ugly code and bugs, possibly including security vulnerabilities.

When improving this API and (eventually) modifying all its callers, we should aim at solving all these problems. The current PR comes short because it only attempts to solve the second toStr() problem. The current PR is arguably worse than simply replacing existing SBuf-needing ip.toStr(...) calls with ToSBuf(ip)!

Ip::AddressText asText(const Ip::Address &)

To address the above three (and other) problems, I propose to introduce an asText() function that returns an object of an Ip::AddressText class. Existing toStr(), toUrl(), and toHostStr() methods will be deprecated and eventually removed from Ip::Address. The new AddressText class will control the following Ip::Address-to-text conversion aspects (at least):

  • whether port information is included: withPort() and/or withoutPort()
  • whether IPv6 addresses are bracketed: bracketed() and/or bracketless()

AddressText::withPort() may imply AddressText::bracketed().

I suspect there will also be methods or method parameters to skip printing of certain port numbers that caller considers default for the given address (e.g., port 80 for plain HTTP and port 443 for HTTPS URLs).

All proposed names are subject to review and improvements, of course. Please do not focus on their spelling when deciding whether to support this proposal. The same can be said about the exact Ip::AddressText method composition and default behavior. All these aspects are secondary and can be adjusted as needed when this proposal is detailed/implemented.

@rousskov
Copy link
Contributor

@kinkie, I removed undocumented S-waiting-for-feedback label. We already have S-waiting-for-more-reviewers that is better because it is compatible (as in "does not overlap") with existing S-waiting-for-reviewer.


if (SourceHashPeers().empty())
return nullptr;

assert(ps);
HttpRequest *request = ps->request;

key = request->client_addr.toStr(ntoabuf, sizeof(ntoabuf));
const auto key = request->client_addr.toStrAsSBuf();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use-case does not need any new API. It can just:

Suggested change
const auto key = request->client_addr.toStrAsSBuf();
const auto key = ToSBuf(request->client_addr);

Copy link
Contributor

@rousskov rousskov Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to guarantee that two different Squid instances produce the same hash for the same request?

  • If yes, then we may want to avoid changing format/hash to avoid breaking that guarantee for instances running different Squid versions. We could still change if there is a compelling reason to do so, of course, but I do not see such a reason here.
  • Otherwise, we should not covert the IP address to text just to compute a hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point.
I think we should try to preserve behaviour in a major version,
but I wouldn't feel constrained to do the same across
major versions.

At the same time, I would keep this change as its own PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use-case does not need any new API. It can just:

Adopted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't feel constrained to do the same across major versions.

Is this hash computation (de facto) standardized across different products (including non-Squid-based caches) or can we assume that this is a Squid-specific concern?

If it is a Squid-specific concern: The "we can change things across versions" logic is sound, but it does not work as well when dealing with distributed cache hierarchies where nodes may run different versions. I would not object to a breaking change like that if others are sure that it is warranted, but in that case, again, we should not covert the IP address to text to compute a hash.

The non-obvious sensitivity of this code to address formatting warrants a C++ comment, especially if we migrate to something implicit like ToSBuf().

I would keep this change as its own PR

Thank you. I assume "this change" is "possible change of peer hash values".

src/ip/Address.h Outdated
Comment on lines 214 to 215
char* toStr(char *buf, const unsigned int blen, int force = AF_UNSPEC) const;
SBuf toStrAsSBuf(int force = AF_UNSPEC) const;
Copy link
Contributor

@yadij yadij Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the AsSBuf portion of the naming is more harm than good. We should just be deprecating the char* API and otherwise keeping the naming the same.
[EDIT: ... assuming scope is kept at updating methods, not moving to the alternative API @rousskov proposed]

Suggested change
char* toStr(char *buf, const unsigned int blen, int force = AF_UNSPEC) const;
SBuf toStrAsSBuf(int force = AF_UNSPEC) const;
SBuf toStr(int force = AF_UNSPEC) const;
/// Deprecated. Use the SBuf method instead.
char* toStr(char *buf, const unsigned int blen, int force = AF_UNSPEC) const;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that toStrAsSBuf() is a non-starter as far as spelling is concerned. However, FWIW, following the above renaming suggestion will not address my primary concerns and, hence, will not move this PR forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now gone

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now gone

Still here and still used. I cannot provide requested feedback until this is gone.

@rousskov rousskov added the S-waiting-for-author author action is expected (and usually required) label Mar 15, 2024
@yadij
Copy link
Contributor

yadij commented Mar 15, 2024

There are three primary problems with existing Ip::Address::toStr():

...

2. Give-me-a-large-enough-buffer API is usually awkward to use, especially in modern C++ code. Its usage also results in a lot of code duplication.

Er, these are APIs for use in old C-code. They have never been true C++ APIs. It looks to me like this update to use SBuf is part of fixing that.

3. The "force IP version" part of API is a bad idea with worse implementation, resulting in ugly code and bugs, possibly including security vulnerabilities.

FTR; That is an issue forced on us by:

  • kernel folks insisting on Dual-Stack TCP implementations, and
  • sysadmin insisting on IPv4-only networks, and
  • ASCII-based protocols requiring specific address representations [1]
  • and a number of other more uncommon use-cases.

[1] long-term we should use the IPvFuture representation from RFC3986 whenever possible. I do not think the real world is ready for that yet, case in point being our own IP parser.

When improving this API and (eventually) modifying all its callers, we should aim at solving all these problems. The current PR comes short because it only attempts to solve the second toStr() problem. The current PR is arguably worse than simply replacing existing SBuf-needing ip.toStr(...) calls with ToSBuf(ip)!

Falls foul of your issue (1). ToSBuf() uses the stream operator which was designed solely for debugs() uses and is equivalent to toUrl() not toStr(). It is only good for strings which can cope with URL-format ("[]" wrappers).

Ip::AddressText asText(const Ip::Address &)

To address the above three (and other) problems, I propose to introduce an asText() function that returns an object of an Ip::AddressText class.

I support this plan.

@rousskov
Copy link
Contributor

rousskov commented Mar 15, 2024

To address the above three (and other) problems, I propose to introduce an asText() function that returns an object of an Ip::AddressText class.

I support this plan.

Glad to hear that! The next step is to decide how to address problem 1 (i.e. Current APIs do a poor job distinguishing various output formats). I see two basic options/directions:

1a. Default to "minimal" (for the lack of a better word) format (i.e. no brackets and no port). Optional method calls add bells and whistles. We use this approach with AsList and, to some degree, AsHex. Cons: This "minimal" format is arguably the most dangerous one in this case; copy-pasted code does not bring reviewer attention to wrong format for the new location.

1b. Default to "bracketed with port" format. Method calls adjust as needed. Cons: This default is wrong for some use cases; copy-pasted code does not bring reviewer attention to wrong format for the new location.

2 . Require each caller to make formatting choices. Cons: Should use an uglier enum-based interfaces illustrated below:

os << asText(ip); // Compile error because the caller forgot to make two required formatting choices
os << asText(ip, Ip::Text::withoutPort); // Compile error because the caller forgot to make bracketing choice
os << asText(ip, Ip::Text::withoutBrackets, Ip::Text::withPort); // implicitly throws exception for IPv6 addresses
os << asText(ip, Ip::Text::throwOnIpv6, Ip::Text::withPort); // explicitly throws exception for IPv6 addresses
os << asText(ip, Ip::Text::withBrackets, Ip::Text::withPort); // OK: all required explicit choices were made

I think we should use option 2 for safety sake. It is not too ugly1.

You might be able to convince me that option 1b is good enough if it uses "bracketed with port" default.

Which option do you prefer?

P.S. Again, the spelling of all these new names is a separate/secondary issue. Do not worry about it for now!


  1. Give-me-a-large-enough-buffer API is usually awkward to use, especially in modern C++ code. Its usage also results in a lot of code duplication.

Er, these are APIs for use in old C-code. They have never been true C++ APIs.

Why "Er"? Seems like we agree that this give-me-a-large-enough-buffer API is a problem for the current Squid code.

  1. The "force IP version" part of API is a bad idea with worse implementation, resulting in ugly code and bugs, possibly including security vulnerabilities.

FTR; That is an issue forced on us by: ...

I strongly disagree: None of the specific things you have mentioned force this part of toStr() API and its implementation.

long-term we should use the IPvFuture representation from RFC3986 whenever possible.

I disagree that IPvFuture format should be used for IPv4 and IPv6 addresses today and in the foreseeable future. I also doubt we should add support for that format today and in the foreseeable future -- we need to fix serious, fundamental problems with the existing IPv4/IPv6 code before throwing a third (and very rarely used) format into the mix.

The current PR is arguably worse than simply replacing existing SBuf-needing ip.toStr(...) calls with ToSBuf(ip)!

Falls foul of your issue (1).

Yes, among other problems. I did not say that "simply replacing" is a good solution. We do not need to evaluate it because there is a better one.

Footnotes

  1. It is possible to implement compile-time guarantees of option 2 while still using backeted(), withPort(), and similar methods from options 1a and 1b (instead of enum constants), but that requires a lot more IoManip code and does not scale well with the number of required formatting choices. I am against that option and am not showing its details here.

@kinkie
Copy link
Contributor Author

kinkie commented Mar 18, 2024

I think we should use option 2 for safety sake. It is not too ugly

I don't like it much, I'd prefer modifier functions like done in AsList.
If we will go this way, I'd rather use or-ed (|) arguments to avoid having to use varargs (or template varargs).
I'm okay to default to "bracketed with port"; we can also add convenience shortcuts (e.g. "bare()" to signify a combination of flags that is unbracketed, no port)

@rousskov
Copy link
Contributor

I don't like it much,

Why? Do you disagree that it is safer? Or do you think safety (i.e., getting the right text for the given context) is less important in this context than some other (hopefully to be disclosed) considerations?

I'd prefer modifier functions like done in AsList.

That AsList approach (i.e. optional formatting configuration methods) looks nicer but leads to bugs like the one fixed in 0dc9f3b. These bugs would be difficult to detect in some Ip::Address contexts and some of them will have security implications (e.g., wrong port due to port defaults or perhaps IPv6 addresses that will be misinterpreted as IPv4 addresses with ports).

@kinkie
Copy link
Contributor Author

kinkie commented Mar 18, 2024

I don't like it much,

Why? Do you disagree that it is safer? Or do you think safety (i.e., getting the right text for the given context) is less important in this context than some other (hopefully to be disclosed) considerations?

The most important thing is verbosity; this approach turns a relatively simple operation into an
an unwieldy syntax, and the proposal seems to rely on varargs as well

I'd prefer modifier functions like done in AsList.

That AsList approach (i.e. optional formatting configuration methods) looks nicer but leads to bugs like the one fixed in 0dc9f3b. These bugs would be difficult to detect in some Ip::Address contexts and some of them will have security implications (e.g., wrong port due to port defaults or perhaps IPv6 addresses that will be misinterpreted as IPv4 addresses with ports).

That is a fair point

@rousskov
Copy link
Contributor

the proposal seems to rely on varargs as well

For the known use cases, the implementation does not need varargs AFAICT: There is a finite (and small!) number of valid combinations of required formatting choices.

kinkie and others added 11 commits April 2, 2024 14:27
    ERROR: Connection to [such-and-such-cache_peer] failed
    TCP_TUNNEL/503 CONNECT nxdomain.test:443 FIRSTUP_PARENT

Squid does not alert an admin about (and decrease health level of) a
cache_peer that responded with an error to a GET request. Just like GET
responses from a cache_peer, CONNECT responses may (and often do!)
reflect client or origin server failures. We should not penalize
cache_peers (and alert admins) until we can distinguish these frequent
client/origin failures from (relatively rare) cache_peer problems. This
change absolves cache_peers of CONNECT problems, restoring parity with
GETs and restoring v4 behavior changed (probably by accident) in v5.

Also removed Http::StatusCode parameter from failure notification
functions because it became essentially unused after the primary
Http::Tunneler changes. Tunneler was the only source of status code
information that (in some cases) used received HTTP response to compute
that status code. All other cases extracted that status code from
Squid-generated errors. Those errors were arguably never meant to supply
status code information for "this failure is not our fault" decision,
and they do not supply 4xx status codes driving that decision.

### Problem evolution

2019 commit f5e1794 effectively started blaming cache_peer for all
FwdState CONNECT errors. That functionality change was probably
accidental, likely influenced by the names of noteConnectFailure() and
peerConnectFailed() functions that abbreviated "Connection", making the
functions look as applicable to CONNECT failures. Prior to that commit,
the functions were never used for CONNECT errors. After it, FwdState
started calling peerConnectFailed() for all CONNECT failures.

In 2020 commit 25b0ce4, TunnelStateData started blaming cache_peers as
well (by moving that FwdState-only error handling code into Tunneler).
The same "accidental functionality change" speculations apply here.

In 2022 commit 022dbab, we made an exception for 4xx CONNECT errors as
folks deploying newer code started complaining about cache_peers getting
blamed for client-caused errors (e.g., HTTP 403 Forbidden replies). We
did not realize that the blaming code itself was an unwanted accident.

Now we are getting complaints about cache_peers getting blamed for 502
and 503 CONNECT errors caused by, for example, domain names without IPs:
As these CONNECT error responses are propagated from parent to child
caches, every child cache in the chain logs ERRORs and every cache_peer
in the chain gets its health counter decreased!
…cache#1771)

ACLHTTPHeaderData::match() required a pointer to non-const HttpHeader
but does not (and should not) modify the supplied HttpHeader.

Also removed support for nil HttpHeader in that method. All callers
already require HttpHeader presence. If that changes, it is the _caller_
that should decide what HttpHeader absence means (match, mismatch,
exception/dunno, etc.); ACLHTTPHeaderData does not have enough
information to make the right choice and, hence, should not be asked to
choose.

Also polished related #includes to remove unnecessary ones.
Squid style guidelines require .h files to be wrapped with
HAVE_*_H protection and placed after all Squid internal file
includes.

Add the missing ./configure header checks to generate the
needed wrappers and refactor the include sequences to meet
current guidelines.
Removing a lot of duplicated code and further
simplifying library detection.
Detected by Coverity. CID 1554656: Initialization or destruction
ordering is unspecified (GLOBAL_INIT_ORDER).

Also update MemBlobStats initialization.
Rework the internals for generating output in ErrorState::Dump,
used for expanding the '%W' token in error page templates.

Also fix a bug with excessive html-quoting of the output.
Detected by Coverity. CID 1554655: Initialization or destruction
ordering is unspecified (GLOBAL_INIT_ORDER).

Also switched to compile-time checks for table initialization records.
@kinkie kinkie changed the title WIP: use SBuf for Ip::Address::toStr WIP: implement Ip::Address::asText() Apr 10, 2024

char buf[INET6_ADDRSTRLEN];
if (ip_.isIPv6()) {
inet_ntop(AF_INET6, &ip_.mSocketAddr_.sin6_addr, buf, INET6_ADDRSTRLEN);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for errors here? Legacy code doesn't

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for errors here?

We should, eventually.

Legacy code doesn't

This PR may not have to check, but it is too early to tell. For example, if this code does not fix any of the (many) other problems with legacy printing code, then it does not have to fix this problem either.

I cannot offer a formula that will tell us which legacy problems should be solved in this PR, but it feels like this PR should solve most if not all of them (eventually). For now, I recommend adding an XXX comment to mark this problem in the new code and focus on this code callers/users/API. We will polish the implementation details later (but probably in this PR).

@kinkie
Copy link
Contributor Author

kinkie commented Apr 10, 2024

@rousskov , @yadij , this seems like a nice checkpoint for collecting feedback about the API.

Current state:

  • implemented Ip::AddressText . Format can be specified with booleans to the constructor or with functions.
  • the text emitting function (Ip::AddressText::print) is a simplified port of Ip::Address::toStr()
  • I've implemented unit tests to check for feature parity; please advise if they should cover additional cases

Next steps, if there are no significant objections to the current state,

  • reimplement toStr(), toUrl() and toHostStr() on top of AddressText, extending it as needed

Once that is done, I'd start the cleanup, porting callsites to the AddressText API; this might offer the opportunity for deeper changes to move away from the "pass me a large enough buffer" paradigm.

What do you think?

@kinkie kinkie added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Apr 10, 2024
src/ip/Address.h Show resolved Hide resolved
src/ip/Address.h Outdated Show resolved Hide resolved
src/ip/Address.h Outdated Show resolved Hide resolved
src/ip/Address.h Outdated
Comment on lines 214 to 215
char* toStr(char *buf, const unsigned int blen, int force = AF_UNSPEC) const;
SBuf toStrAsSBuf(int force = AF_UNSPEC) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now gone

Still here and still used. I cannot provide requested feedback until this is gone.

src/ip/Address.h Outdated Show resolved Hide resolved
@rousskov rousskov added S-waiting-for-author author action is expected (and usually required) and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Apr 10, 2024
@kinkie kinkie added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) M-ignored-by-merge-bots https://github.com/measurement-factory/anubis/blob/master/README.md#pull-request-labels labels Apr 11, 2024

char buf[INET6_ADDRSTRLEN];
if (ip_.isIPv6()) {
inet_ntop(AF_INET6, &ip_.mSocketAddr_.sin6_addr, buf, INET6_ADDRSTRLEN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for errors here?

We should, eventually.

Legacy code doesn't

This PR may not have to check, but it is too early to tell. For example, if this code does not fix any of the (many) other problems with legacy printing code, then it does not have to fix this problem either.

I cannot offer a formula that will tell us which legacy problems should be solved in this PR, but it feels like this PR should solve most if not all of them (eventually). For now, I recommend adding an XXX comment to mark this problem in the new code and focus on this code callers/users/API. We will polish the implementation details later (but probably in this PR).

explicit AddressText(const Address &ip);

/// if the Address has a port, output it (default: no)
/// if set to true, also print brackets for IPv6 addresses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "if set to true..." feature introduces a method call order dependency (e.g., the second XXX below) and other problems (e.g., the third XXX below). Please avoid that.

Eventually, it would be nice to address both problems:

os << AddressText(ip).withPort().bracketed(false); // XXX: No brackets, as expected, but breaks IPv6 printing.
os << AddressText(ip).bracketed(false).withPort(); // XXX: Brackets added, surprising code reader.

// XXX: The following two code lines produce different output:
os << AddressText(ip).withPort(false); 
os << AddressText(ip).withPort(true).withPort(false); 

@@ -52,15 +52,13 @@ ProxyProtocol::Header::getValues(const uint32_t headerType, const char sep) cons
return SBuf();
auto logAddr = sourceAddress;
logAddr.applyClientMask(Config.Addrs.client_netmask);
char ipBuf[MAX_IPSTRLEN];
return SBuf(logAddr.toStr(ipBuf, sizeof(ipBuf)));
return ToSBuf(logAddr.asText());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both new and the old code suffer from the same problem that this PR should address: A reader cannot easily tell whether this code formats the address correctly. For example, I cannot easily tell whether this code prints the port number. The same problem is present in other converted code as well.

To fix this, I suggest adjusting AddressText API so that callers must use

  • either withPort() // implies bracketed IPv6 addresses; bans bracketed() calls
  • or withoutPort() and bracketed(bool) // explicit on both counts

Any better ideas?

@rousskov rousskov added the S-waiting-for-author author action is expected (and usually required) label May 1, 2024
@kinkie kinkie added the M-ignored-by-merge-bots https://github.com/measurement-factory/anubis/blob/master/README.md#pull-request-labels label May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-ignored-by-merge-bots https://github.com/measurement-factory/anubis/blob/master/README.md#pull-request-labels S-waiting-for-author author action is expected (and usually required) S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box
Projects
None yet
4 participants