WIP EAI (SMTPUTF8 for SMTP, UTF8=ACCEPT for IMAP) #190

Open: arnt wants to merge 4 commits into main

@arnt arnt commented Nov 24, 2022

Hi,

this is a WIP patch to support EAI. Some remarks:

  1. Dovecot generally accepts just-send-8 and passes it on. This does the same, because IMO a patch should agree with the project. Therefore this does no downgrading or anything like that.
  2. I haven't implemented the UTF8 syntax for APPEND. The parser in cmd-append.c confused me and now a high-priority task prevents me from spending more time on this bit. This definitely should be done.
  3. I seem to remember that Dovecot had a nice automatic test suite back in hg days, is that in another repo?
  4. LMTP always accepts EAI messages, Submission accepts them if and only if the upstream relay does.
  5. Dovecot has some clever logic to choose format for strings it emits via IMAP. This doesn't extend that logic, which doesn't harm correctness but means that Dovecot may send something via literal that could be a quoted-string.

I'd appreciate comments.

This adds the capability and accepts UTF8 quoted-strings.

The SEARCH command is changed as required by RFC 6855, section 3, final
paragraph. There are no changes to the searching, as 6855 only changes
the syntax.
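
A minimal sketch of the syntax rule in question, as I read RFC 6855 section 3: once the client has sent ENABLE UTF8=ACCEPT, a SEARCH that carries a CHARSET specification is answered with BAD. The function name and argument shapes are hypothetical and are not taken from the patch or from Dovecot.

```python
def check_search_args(utf8_enabled: bool, args: list[str]) -> str:
    """Return "OK" or "BAD" for the leading arguments of a SEARCH command.

    Hypothetical helper for illustration only; it looks at nothing but
    the optional leading CHARSET specification.
    """
    has_charset = bool(args) and args[0].upper() == "CHARSET"
    if utf8_enabled and has_charset:
        # With UTF8=ACCEPT enabled, strings are already UTF-8, so an
        # explicit charset label conflicts with the session state.
        return "BAD"
    return "OK"
```

Only the accepted syntax depends on the session state here; how the search itself is executed is unchanged.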

This does not add any kind of downgrading. Before this change, Dovecot would
accept an APPEND of a message such as

   From: grå@grå.org
   Subject: grå
   ...

and would just-send-8 that to any IMAP clients. This change maintains that policy.

This adds support for SMTPUTF8 (RFC 6531) to the LMTP and Submission
services. The Submission service advertises this extension only if its
upstream relay does, and takes some care to avoid a misunderstood error
message.
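
The advertising policy described above can be sketched roughly as follows. This is an illustration of the stated rule only (LMTP always offers SMTPUTF8, Submission offers it only when the upstream relay does); the function name and the baseline extension set are hypothetical and do not reflect Dovecot's code.

```python
def advertised_extensions(service: str, upstream_extensions: set[str]) -> set[str]:
    """Return the extensions a service would list in its EHLO/LHLO reply."""
    # Placeholder baseline, chosen only to make the example concrete.
    extensions = {"8BITMIME", "PIPELINING"}
    if service == "lmtp":
        # LMTP always accepts EAI messages.
        extensions.add("SMTPUTF8")
    elif service == "submission" and "SMTPUTF8" in upstream_extensions:
        # Submission can only promise what the relay behind it supports.
        extensions.add("SMTPUTF8")
    return extensions
```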
@cmouse
Contributor

cmouse commented Nov 24, 2022

Thanks, we'll take a look.

@cmouse
Contributor

cmouse commented Nov 24, 2022

Hi, one big issue with this code is that it mainly just makes dovecot accept UTF8, yet it does very little in the way of actually handling it. It also does not deal with unicode normalizations required for recipient handling.

So in short, while we do really appreciate the effort, we would need this to actually take care of the utf-8 we are accepting, and not just wish for the best and store it, which seems to be missing from your TODO list.

@arnt
Author

arnt commented Nov 24, 2022

Thanks for your quick response.

The patch doesn't do anything about the UTF8 because RFCs 589x, 6531 and 6855 demand nothing of a server such as Dovecot. To name three examples:

  1. The RFCs permit fields such as subject to contain UTF8 instead of 2047-encoded UTF8, but do not change anything about the content. Only the encoding of the header field is changed, and I believe Dovecot already handled it since it handles just-send-8 so well.
  2. The RFCs require normalisation on user input, which matters for webmail systems but not for a Submission server. Again, Dovecot escapes untouched. You could also say that MTAs that route mail should normalise, but Dovecot doesn't do that either. (Note that this patch doesn't change the set of destination addresses supported by LMTP.)
  3. I'm not sure what RFC 9051 says about e.g. search. Perhaps a server that supports unicode content should normalise. But RFC 6855 says nothing about that, so normalisation is not relevant to an implementation of RFC 6855. 6855 simply changes the search syntax; it doesn't change the rules for executing the search at all. It's good if a search for "grå" matches 0067 0072 0061 030A as well as 0067 0072 00E5, but that applies to the unicode bodies Dovecot already supports just as much as to addresses.
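
To make the two code-point sequences quoted above concrete, here is a small sketch using Python's standard `unicodedata` module. It shows that the sequences differ as raw strings but compare equal after NFC normalisation. The `matches` helper and the example header are hypothetical; this is not how Dovecot implements SEARCH.

```python
import unicodedata

decomposed = "\u0067\u0072\u0061\u030a"  # g, r, a, COMBINING RING ABOVE
precomposed = "\u0067\u0072\u00e5"       # g, r, å as a single code point


def nfc(text: str) -> str:
    """Normalise text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def matches(needle: str, haystack: str) -> bool:
    """Substring match that ignores composed/decomposed differences."""
    return nfc(needle) in nfc(haystack)


# The raw strings differ, the normalised ones do not.
raw_equal = decomposed == precomposed
normalised_equal = nfc(decomposed) == nfc(precomposed)
```

A search that normalises both the key and the stored text in this way finds the message regardless of which form the sender used.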

@arnt
Author

arnt commented Nov 24, 2022

Actually, let's do it differently. Why don't you just make some imaptest tests that fail, and I'll make them pass. That'll explain concisely what you have in mind. Does that sound good to you?

@arnt
Author

arnt commented Nov 24, 2022

One more question. This PR's goal is limited in scope to EAI support like gmail's: users can receive mail from and send mail to grå@grå.org and deal with that mail as with all mail about grå. It does not aim to host a domain such as grå.org itself. Is that an acceptable scope to you?

I could do a separate PR to support hosting if you want to, but I'd really prefer that to be a separate PR.

@cmouse
Contributor

cmouse commented Nov 24, 2022

The problem is mostly that if I send email to ℌdž@domain.com, it should go to hdž@domain.com. If there is no normalization done, these two are considered different users. This is handled by https://www.rfc-editor.org/rfc/rfc8265, which applies to sender/recipient names too.
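
As a rough illustration of the ℌdž example, the sketch below folds a local part with compatibility normalisation (NFKC) followed by case folding. This is an assumption-laden stand-in, not an implementation of RFC 8265: the profiles defined there differ in detail, for example in which characters they reject rather than map. The function name is hypothetical.

```python
import unicodedata


def fold_localpart(localpart: str) -> str:
    """Map visually/compatibility-equivalent local parts to one form.

    Illustration only: NFKC plus case folding, then NFC to tidy up
    anything the case folding may have decomposed.
    """
    compat = unicodedata.normalize("NFKC", localpart)
    return unicodedata.normalize("NFC", compat.casefold())


# U+210C BLACK-LETTER CAPITAL H folds to a plain lowercase "h".
same_user = fold_localpart("\u210cd\u017e") == fold_localpart("hd\u017e")
```

Without a step of this kind, a byte-wise comparison treats the two local parts as different users.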

That said, this also needs to handle other headers as well. This is governed by https://www.rfc-editor.org/rfc/rfc6532. Before we start coming up with examples, why not take a moment to read these?

@arnt
Author

arnt commented Nov 24, 2022

I know both of them. I should ;)

I believe Dovecot escapes the requirements in 8265 at present. (Being able to host grå.org would change this.) 8265 applies to software that accepts addresses from outside, does some sort of comparison and then does something differently based on the result of the comparison. Dovecot as it stands accepts addresses, but does nothing differently based on the result. For example, the submission relays all addresses to the backend server.

6532 changes a number of limits and lifts a number of restrictions that matter to this PR, but AFAICT Dovecot already lifted those long ago, so no changes were necessary.

6532 has implications for DSN generators. As far as I can tell, none of the code I touched will generate DSNs, but the Sieve code will. Are you saying that you'd require Sieve to be updated as well?

@cmouse
Contributor

cmouse commented Nov 24, 2022

Well we derive the username from the destination address, so at minimum that has to work, as well as SEARCH FROM some@address.

We can consider doing this in increments, as long as we clearly refuse stuff we cannot handle correctly.

@cmouse
Contributor

cmouse commented Nov 24, 2022

Oh and sieve is these days pretty wedded to dovecot, so it should not break either.

@arnt
Author

arnt commented Nov 24, 2022

I could make a followup PR to support local users with UTF8 names. The goal of this PR is more restricted (the same level of support as gmail currently has — foo@gmail.com can send mail to foo@grå.org, but there cannot be a grå@gmail.com).

It would be much easier for me to get management approval for more work if gmail-level support is merged. The followup would then add support for local users with EAI addresses and for Sieve.

This PR is a WIP and I can put more W into it, but I can't put unlimited work into it without an assurance that the work will be merged, see?

@cmouse
Contributor

cmouse commented Nov 24, 2022

ok. I'll have to discuss this internally then.

@cmouse
Contributor

cmouse commented Nov 24, 2022

At minimum SEARCH FROM has to work. Even if we only accept utf8 senders.

@arnt
Author

arnt commented Nov 24, 2022

I absolutely agree that SEARCH FROM has to work. (It's a bit tricky, e.g. if the message contains a DKIM or PGP signature over the bytes as received.) How do you prefer automated tests in PRs such as this?

@cmouse
Contributor

cmouse commented Nov 24, 2022

we have internal ci with tests (provide some shell script / python script) and unit tests if possible. imaptest script is ok too, although not sure if it can test this, esp. the problem cases.

@arnt
Author

arnt commented Nov 24, 2022

Could you possibly link to an example in the style you most prefer?

@cmouse
Contributor

cmouse commented Nov 24, 2022

we'll have to adapt the ci test in any case.

@arnt
Author

arnt commented Dec 13, 2022

Sadly, RFC 8265 is required even in the gmail-level support I had in mind for this PR. Sieve tests such as address "from" :is :all require it.

@arnt
Author

arnt commented Dec 26, 2022

Hi,

I wrote automatic tests for this (and did a little more work on the PR too).

I believe that the optimal way to support EAI in Dovecot is with three PRs; this is one of them, and logically the first.

  1. This PR, which is sufficient to converse with non-ASCII addresses. Much of what's needed already worked. For example, I have automatic tests that show that unicode normalisation already works as necessary for IMAP SEARCH and S/MIME. That already worked, hence there's no code for that in the PR.
  2. A PR for pigeonhole, which needs to support tests such as address "from" :is :all, autoreplies, forwarding, vacation etc.
  3. A second PR for dovecot/core, to support hosting non-ASCII addresses.

I can write the second and third, but I need this merged, or an assurance that it will be merged.

@cmouse
Contributor

cmouse commented Dec 26, 2022

Thanks. We'll take a look and let you know.

@HLFH

HLFH commented Jan 25, 2023

@cmouse Hi. Any updates on your decisions?

I suppose before the SMTPUTF8 support in dovecot, it is best to disable SMTPUTF8 in postfix?

smtputf8_enable=no

@cmouse
Contributor

cmouse commented Jan 25, 2023

@HLFH Yes, you need to disable smtputf8 in postfix.
@arnt we are still looking at this, and we are maybe leaning towards incorporating libunistring & libidn2 to do the unicode work for us. This would make some of the problem cases go away, especially the normalizations needed to make header searches work. Still, I can run your patch through our CI to see what it makes of it and if it spots any bigger issues.

@cmouse
Contributor

cmouse commented Jan 25, 2023

@arnt it seems to pass current tests so that's at least good. There were some boolean issues, but we can take care of those if we merge this.

@arnt
Author

arnt commented Jan 26, 2023

You don't really need to disable SMTPUTF8 in postfix — senders get the same error message in both cases. Disabling and enabling both have advantages, and both are really small advantages ;)

To whom should I send the test? UID SEARCH for the same word normalised and denormalised gave the same result, so something or other already handles normalisation.

arnt added 2 commits June 12, 2023 12:38
This is based on code from someone called 'vk'. It's incomplete (needs
RFC 8265 support at least), but works for him/her.

This stores the Unicode form of domains in all indexes, meaning that
searching uses, and server-side parsing shows, the human-readable form of
all addresses.
@arnt
Author

arnt commented Mar 1, 2024

I updated the PR so that adding a message with From: info@xn--gr-zia.org now returns grå.org in the envelope and search from "grä" matches it.
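
The domain mapping mentioned above (xn--gr-zia.org shown as grå.org) can be reproduced with Python's built-in `idna` codec, which implements the older IDNA 2003 rules. This is only a sketch of the A-label to U-label conversion, with a hypothetical function name; it says nothing about how the PR does the conversion internally.

```python
def display_address(address: str) -> str:
    """Return the address with its domain converted to Unicode form."""
    localpart, _, domain = address.rpartition("@")
    # The "idna" codec decodes each xn-- label via Punycode and leaves
    # plain ASCII labels such as "org" untouched.
    unicode_domain = domain.encode("ascii").decode("idna")
    return localpart + "@" + unicode_domain
```

Storing the Unicode form means a search key typed by a user can be compared against the indexed address without a Punycode step at search time.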

@cmouse
Contributor

cmouse commented Mar 1, 2024

Hi, we'll take a look.

@cmouse
Contributor

cmouse commented Apr 1, 2024

Moved as internal merge request.
