New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EAI support for notqmail #197
base: main
Are you sure you want to change the base?
Conversation
This adds EAI support to notqmail. Adapted from dapted from Arnt Gulbrandsen unicode address support [here](http://arnt.gulbrandsen.priv.no/qmail/qmail-smtputf8.patch). 1. Use conf-smtputf8 to compile the EAI support 2. If conf-smtputf8 has -DSMTPUTF8, use tryidn2 to see if idn2_lookup_u8() from libidn2 can be used. if Yes, create hassmtputf8.h and #define SMTPUTF8 3. include hassmtputf8.h in qmail-smtpd.c, qmail-remote.c 4. The utf8read() function has been adapted from utf8received() by Erwin Hoffman 5. Following unit tests cases have been added - smtpcode() - in qmail-remote.c (test 3 digit codes, valid and junk codes) - mailfrom_parms() in qmail-smtpd.c (test SMTPUTF8 in the MAIL FROM parameter) - utf8read() in qmail-remote.c (test utf8, non-utf8) - get_capability() in qmail-remote.c to test for presence of EHLO capability
Thank you in particular for taking care to write tests for this functionality, especially since we've seen bugs in the currently available implementations. When and how should this PR land? My thoughts: since SMTPUTF8 is a relatively small elaboration of qmail's existing 8-bit cleanliness, I'd like to see it built and enabled by default in some future notqmail release. Since the feature introduces a dependency on libidn2, I'd like to defer on-by-default until some later release that brings its own breaking changes. That leaves us merging EAI initially as off-by-default (and perhaps also autoconfigured-if-found). This can happen whenever the PR is ready, so it could potentially be part of 1.09 but shouldn't block that release if not. |
On Thu, 17 Dec 2020 at 02:21, Amitai Schleier ***@***.***> wrote:
When and how should this PR land? My thoughts: since SMTPUTF8 is a
relatively small elaboration of qmail's existing 8-bit cleanliness, I'd
like to see it built and enabled by default in some future notqmail
release. Since the feature introduces a dependency on libidn2, I'd like to
defer on-by-default until some later release that brings its own breaking
changes. That leaves us merging EAI initially as off-by-default (and
perhaps also autoconfigured-if-found). This can happen whenever the PR is
ready, so it could potentially be part of 1.09 but shouldn't block that
release if not.
It can be disabled by default by commenting out `-DSMTPUTF8` in
conf-smtputf8. But the man page will continue to describe the EAI feature.
|
qmail-remote.c
Outdated
#ifdef SMTPUTF8 | ||
#include <idn2.h> | ||
#include "utf8read.h" | ||
#include "env.h" | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing a lot of new ifdefs and it makes me a little uneasy about how "glanceable" our code is. We need some ifdefs for EAI because it's compile-time optional, but they can be (1) more isolated and (2) fewer in number. I'll give some examples of changes I'd like to see.
qmail-remote.c
Outdated
#ifdef SMTPUTF8 | ||
static stralloc idnhost = { 0 }; | ||
static int smtputf8 = 0; /*- if remote has SMTPUTF8 capability */ | ||
static char *enable_utf8 = 0; /*- enable utf8 */ | ||
int flagutf8; | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These variables don't cost much to have, so we could lose this ifdef. Another function of the ifdef is to document that these are EAI-specific; we could accomplish that by putting them in a new .c
file (eai_remote.c
?) and linking with the new .o
.
qmail-remote.c
Outdated
@@ -118,36 +131,50 @@ substdio smtpfrom = SUBSTDIO_FDBUF(saferead,-1,smtpfrombuf,sizeof(smtpfrombuf)); | |||
|
|||
stralloc smtptext = {0}; | |||
|
|||
static void get(unsigned char *uc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't checked, just guessing we should probably retain the old name as a synonym in case patches call it.
qmail-remote.c
Outdated
#ifdef SMTPUTF8 | ||
if (enable_utf8 && (flagutf8 || utf8read())) | ||
substdio_puts(&smtpto,"> SMTPUTF8\r\n"); | ||
else | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's cheap to check these variables even in the case where we built without EAI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we #ifdef enable_utf8 to 0 then this comes for free as the compiler would optimize it away.
qmail-remote.c
Outdated
#ifdef SMTPUTF8 | ||
if (enable_utf8 && header.len) substdio_put(&smtpto, header.s, header.len); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as previous
qmail-remote.c
Outdated
|
||
#ifdef SMTPUTF8 | ||
if (enable_utf8 && !flagutf8) | ||
flagutf8 = containsutf8(s,str_len(s)); | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as previous
qmail-remote.c
Outdated
#ifdef SMTPUTF8 | ||
enable_utf8 = env_get("UTF8"); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as previous. Regarding the environment variable, I'm familiar with setting env vars to control my locale settings, but is this an idiomatic way for a sysadmin to control mail server config? Assuming we're built with EAI, I'm wondering whether EAI should be enabled/disabled systemwide, per domain (incoming and outgoing), or some other level of granularity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And even if all of this is fine this variable name is a bit too generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should SMTPUTF8 be ok instead of just UTF8. We can have a control file to have the setting per domain. We could add a line in getcontrols() to read a control file.
qmail-remote.c
Outdated
#ifdef SMTPUTF8 | ||
} else | ||
if (enable_utf8) { | ||
char *asciihost = 0; | ||
if (!stralloc_0(&host)) temp_nomem(); | ||
switch (idn2_lookup_u8(host.s,(uint8_t**)&asciihost,IDN2_NFC_INPUT)) { | ||
case IDN2_OK: break; | ||
case IDN2_MALLOC: temp_nomem(); | ||
default: perm_dns(); | ||
} | ||
if (!stralloc_copys(&idnhost,asciihost)) temp_nomem(); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ifdef is a little extra tricky! It's only really needed around the idn2_lookup_u8()
logic. That could be moved into eai_remote.c
(along with an ifdef), wrapped in a descriptive function name, and called from here. Then we wouldn't need an ifdef here.
qmail-remote.c
Outdated
#ifdef SMTPUTF8 | ||
switch (relayhost ? dns_ip(&ip,&host) : dns_mxip(&ip,enable_utf8 ? &idnhost : &host,random)) { | ||
#else | ||
switch (relayhost ? dns_ip(&ip,&host) : dns_mxip(&ip,&host,random)) { | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 472 is enough all by itself :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I would put it on a line distinct from the switch to make the code less convoluted.
qmail-smtpd.c
Outdated
#ifdef SMTPUTF8 | ||
int smtputf8 = 0, flagutf8 = 0; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just always define these, in a separate eai_smtpd.c
if that helps readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And every one on it's own line. Since they are global variables they are also automatically initialized to 0. Adding an explicit initializer is actually less efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do away with flagutf8.
qmail-smtpd.c
Outdated
#ifdef SMTPUTF8 | ||
stralloc mfparms = {0}; | ||
void mailfrom_parms(const char *arg) | ||
{ | ||
int i; | ||
int len; | ||
|
||
len = str_len(arg); | ||
mfparms.len=0; | ||
i = byte_chr(arg,len,'>'); | ||
if (i > 4 && i < len) { | ||
while (len) { | ||
arg++; len--; | ||
if (*arg == ' ' || *arg == '\0' ) { | ||
if (case_starts(mfparms.s,"SMTPUTF8")) smtputf8 = 1; | ||
mfparms.len=0; | ||
} else | ||
if (!stralloc_catb(&mfparms,arg,1)) die_nomem(); | ||
} | ||
} | ||
} | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, it appears the only use of this function (which I recognize from Arnt's patch) is to check whether the peer is trying to request SMTPUTF8
for this delivery. Let's name it accordingly and mark it static
. If we see other uses for it later, we can generalize then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same function can be used for getting the ESMTP AUTH and SIZE parameters. I will change this to static for the moment.
qmail-smtpd.c
Outdated
#ifdef SMTPUTF8 | ||
if (env_get("UTF8")) { | ||
flagutf8 = 1; | ||
out("250-SMTPUTF8\r\n"); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheap to check whether the admin wants to advertise SMTPUTF8
-- let's just always do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let us remove flagutf8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand what flagutf8
is doing here, I'd keep it as a file global, making sure it gets set in exactly one place (probably in the same place where control files get read). I'd keep all the flagutf8
checks, too -- just expressed in terms of the global boolean rather than the mechanics of how the config gets applied. And then I might call it utf8_available
, or something that more precisely expresses this is whether the server offers the capability, as distinct from client-requested states like utf8_requested
or utf8_negotiated
.
If I don't understand what flagutf8
is doing here, let's fix that first :-)
qmail-smtpd.c
Outdated
#ifdef SMTPUTF8 | ||
if (flagutf8) mailfrom_parms(arg); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Boolean checks are cheap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as previous comment. Let us remove flagutf8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to say let's keep flagutf8
and get rid of the #ifdef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
qmail-smtpd.c
Outdated
#ifdef SMTPUTF8 | ||
received(&qqt,smtputf8 ? "UTF8SMTP" : "SMTP",local,remoteip,remotehost,remoteinfo,fakehelo); | ||
#else | ||
received(&qqt,"SMTP",local,remoteip,remotehost,remoteinfo,fakehelo); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 413 is enough by itself :-)
qmail-remote.c
Outdated
#ifdef SMTPUTF8 | ||
} else | ||
if (enable_utf8) { | ||
char *asciihost = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be NULL as it's a pointer variable
qmail-remote.c
Outdated
#ifdef SMTPUTF8 | ||
switch (relayhost ? dns_ip(&ip,&host) : dns_mxip(&ip,enable_utf8 ? &idnhost : &host,random)) { | ||
#else | ||
switch (relayhost ? dns_ip(&ip,&host) : dns_mxip(&ip,&host,random)) { | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I would put it on a line distinct from the switch to make the code less convoluted.
qmail-smtpd.c
Outdated
#ifdef SMTPUTF8 | ||
int smtputf8 = 0, flagutf8 = 0; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And every one on it's own line. Since they are global variables they are also automatically initialized to 0. Adding an explicit initializer is actually less efficient.
qmail-smtpd.c
Outdated
void mailfrom_parms(const char *arg) | ||
{ | ||
int i; | ||
int len; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t
while (len) { | ||
arg++; len--; | ||
if (*arg == ' ' || *arg == '\0' ) { | ||
if (case_starts(mfparms.s,"SMTPUTF8")) smtputf8 = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the case thing is actually right here. I have to reread the RfC, but I would bet this is not case insensitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC 5321 2.4. General Syntax Principles and Transaction Model
Verbs and argument values (e.g., "TO:" or "to:" in the RCPT command and extension name keywords) are not case sensitive, with the sole exception in this specification of a mailbox local-part.
SMTP Extensions may explicitly specify case-sensitive elements. But RFC 6531 doesn't explicitly mention any case requirement for SMTPUTF8 keyword. So I assume it inherits the definition from RFC 5321 that the keyword should be case insenstive.
Should UTF-8 support everywhere really be an option? It is backward compatible with ASCII from the very byte level, so also being able to read UTF-8 might not break anyone's setup... On the other hand, having the libidn2 dependency optional as we are pulling https://www.gnu.org/software/libunistring/ ? My ideal would be to provide a tiny idn.c that implements punny code encoding (is that the aim?), but I yet have to provide this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there really any need to parse the "with UTF8SMTP" along with checking that the other host is supporting UTF8SMTP ?
If we have an email featuring UTF8 names and we need sending it we are screwed and end-up in the same result: the mail is not delivered. Does failing in a different way worth extra code?
And if we have UTF8 in a mail that lacks "with UTF8SMTP" in a Received
header, shall we drop the mail, even though it shall also be well-encoded?
UTF8 email (and content in general) does not have problem with software without support for it (excepted for graphical programs that needs to fetch the codepoints), only with those that religiously reject content with the then-be-damned extra bit.
If anyone is having the a reference to some RFC for "with UTF8SMTP", I am highly interested. So far it looks like some indication for debugging which host scrambled the email rather than carrying any useful information.
Hopefully I am not misunderstanding everything.
echo "extern int smtputf8;"; \ | ||
echo "extern void temp_nomem();"; \ | ||
./get_function "void mailfrom_parms" \ | ||
../qmail-smtpd.c|sed s}die_nomem}temp_nomem}g) > mailfrom_parms.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C code into the makefile and grepping functions from other files looks a bit sudden by looking at the rest of the code.
There is some interesting idea for maintenance, but it would be easier to integrate such changes onto another changes, and globally for all code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I don't like this. Looks very crude and can easily break when changes are mode to the original c code. But if it breaks, the test will fail during make itself and alert the maintainer to fix it.
# from the source code | ||
# | ||
start_line=`grep -n "^$1" $2|cut -d: -f1` | ||
sed -n "$start_line,$"p $2 | while IFS= read -r "line" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would sed -n "/^$1/,/}/ p" "$2"
not do the same? Also, pointers would not work, but we admittedly have to encounter this case before working around it...
tests/unittest_utf8read.c
Outdated
"To: root\n" | ||
"\n" | ||
"error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)\n" | ||
"error: /tmp/skype.gpgsig.zgIdKM: key 1 import failed.\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this count as veeeery indirect advertising for microsoft? ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove this. I just cut pasted the latest email in my Maildir/new folder without reliazing the skype thing.
utf8read.c
Outdated
if (case_starts(receivedline.s, "Received: from")) received++; /* found Received header */ | ||
if (received) { | ||
for (i = 0; i < receivedline.len; i++) | ||
if (*(receivedline.s + i) != ' ' && *(receivedline.s + i) != '\t') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually like this *(variable + i)
notation without syntactic sugar for being very clear about what indices are, but the rest of the source goes with variable[i].
Might as well not introduce multiple coding style into a same daemon.
} | ||
return 0; | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation from the patch was using a state machine, rather fun and creative, but plain dumb if/for code like yours feels more straightforward IMHO.
Here it is:
+void checkutf8message()
+{
+ int pos;
+ int i;
+ int r;
+ char ch;
+ int state;
+
+ if (containsutf8(sender.s, sender.len)) { utf8message = 1; return; }
+ for (i = 0;i < reciplist.len;++i)
+ if (containsutf8(reciplist.sa[i].s, reciplist.sa[i].len)) {
+ utf8message = 1;
+ return;
+ }
+
+ state = 0;
+ pos = 0;
+ for (;;) {
+ r = substdio_get(&ssin,&ch,1);
+ if (r == 0) break;
+ if (r == -1) temp_read();
+
+ if (!stralloc_append(&firstpart,&ch)) temp_nomem();
+
+ if (ch == '\r')
+ continue;
+ if (ch == '\t')
+ ch = ' ';
+
+ switch (state) {
+ case 6: /* in Received, at LF but before WITH clause */
+ if (ch == ' ') { state = 3; pos = 1; continue; }
+ state = 0;
+ /* FALL THROUGH */
+
+ case 0: /* start of header field */
+ if (ch == '\n') return;
+ state = 1;
+ pos = 0;
+ /* FALL THROUGH */
+
+ case 1: /* partway through "Received:" */
+ if (ch != "RECEIVED:"[pos] && ch != "received:"[pos]) { state = 2; continue; }
+ if (++pos == 9) { state = 3; pos = 0; }
+ continue;
+
+ case 2: /* other header field */
+ if (ch == '\n') state = 0;
+ continue;
+
+ case 3: /* in Received, before WITH clause or partway though " with " */
+ if (ch == '\n') { state = 6; continue; }
+ if (ch != " WITH "[pos] && ch != " with "[pos]) { pos = 0; continue; }
+ if (++pos == 6) { state = 4; pos = 0; }
+ continue;
+
+ case 4: /* in Received, having seen with, before the argument */
+ if (pos == 0 && (ch == ' ' || ch == '\t')) continue;
+ if (ch != "UTF8"[pos] && ch != "utf8"[pos]) { state = 5; continue; }
+ if(++pos == 4) { utf8message = 1; state = 5; continue; }
+ continue;
+
+ case 5: /* after the RECEIVED WITH argument */
+ /* blast() assumes that it copies whole lines */
+ if (ch == '\n') return;
+ state = 1;
+ pos = 0;
+ continue;
+ }
+ }
+}
Cannot most of it be summarized by strncasestr(fieldstart, fieldlen, " with UTF8")
? Case-insensitive searching for the string " with utf8" within the "Received:" header.
Thank you @mbhangui for bringing this starting point, that puts on the table everything to do. |
The relevant rfc is RFC 6531 Section 3.6 The MAIL command parameter SMTPUTF8 asserts that a message is an
We shouldn't be dropping the mail. Rather qmail-remote shouldn't use the SMTPUTF8 extenstion in the MAIL FROM command
RFC 6531 (latest) and RFC 5336 (old one)
While going through the commands I realize that the original patch was written as per RFC 5336, hence the header with UTF8SMTP. RFC 5336 has been obsoleted by RFC6531. And the keyword UTF8SMTP should be changed to SMTPUTF8.
Good that you asked those questions and I see that there are minor changes to be made. |
I misread the above. The protocol type still remans UTF8SMTP. RFC 5336 had just the description worded differently
|
I think I understand what you think is a problem. I'll explain below
I see a problem. Let us say we are running SMTP service for clients to relay emails to external domains. qmail-smtpd advertises SMTPUTF8 and a client uses EAI and uses internationalized domain names in the FROM or the RCPT for a domain example.com (for example). The mail goes into the queue and qmail-remote finds that example.com doesn't support EAI. We should actually be dropping the mail with error like One can argue that the original sender will get a bounce and then correct the message, but I'm not sure what we should do. Will read the RFC again to clear up this issue |
From <https://docs.github.com/en/free-pro-team@latest/github/creating-cloning-and-archiving-repositories/about-code-owners#codeowners-file-location>: "Code owners are automatically requested for review when someone opens a pull request that modifies code that they own." I've not listed everyone in our GitHub org, just the folks who are usually doing the reviewing (and from whom I usually request reviews). No exclusion is intended, just sensible defaults. We can always manually request additional reviewers on any given PR, and anyone who wants to get more generally involved can of course add themselves here.
"Multiplication result may overflow 'int' before it is converted to 'long'." Make n a datetime_sec (synonym for long) to begin with.
920d49e
to
44a00c9
Compare
4295ceb
to
9764a86
Compare
83070f9
to
dae3b88
Compare
68c4a37
to
6b6f5fe
Compare
358654a
to
aac711a
Compare
This adds EAI support to notqmail. Adapted from Arnt Gulbrandsen unicode address
support here.
if Yes, create hassmtputf8.h and #define SMTPUTF8
Erwin Hoffman
and junk codes)
MAIL FROM parameter)
capability