Improve email address normalization #171

Open
RohanNagar opened this issue Sep 1, 2023 · 1 comment
RohanNagar commented Sep 1, 2023

Please describe the feature that you are requesting

Currently, calling the normalized() method on an Email object returns the address with all comments removed. Version 1.6 introduced an option to also strip quotes, provided the address is still valid once they are removed.
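
To make the "strip quotes only if the address stays valid" rule concrete, here is a minimal sketch in plain Java. It does not use or reproduce JMail's actual implementation: the class and method names are hypothetical, and it assumes the input has already been validated and had its comments removed. It treats the quotes as safe to drop only when the quoted content is a valid RFC 5322 dot-atom.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of JMail's API. */
final class QuoteStripSketch {

    // RFC 5322 dot-atom: one or more runs of atext characters separated by single dots.
    private static final Pattern DOT_ATOM = Pattern.compile(
            "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(\\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*");

    /**
     * Removes the quotes around the local-part only when the unquoted
     * content is a valid dot-atom. Otherwise the address is returned unchanged.
     * Assumes the address was already validated (assumption of this sketch).
     */
    static String stripQuotesIfSafe(String address) {
        int at = address.lastIndexOf('@');
        if (at < 2) {
            return address; // no '@', or local-part too short to be quoted
        }
        String local = address.substring(0, at);
        if (!(local.startsWith("\"") && local.endsWith("\""))) {
            return address; // local-part is not a quoted string
        }
        String inner = local.substring(1, local.length() - 1);
        return DOT_ATOM.matcher(inner).matches()
                ? inner + address.substring(at)
                : address;
    }
}
```

With this sketch, `"test.1"@example.com` becomes `test.1@example.com`, while `"test 1"@example.com` is left alone because a space is not allowed outside quotes.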

Several further normalization steps would make the user experience much better. Because they change default behavior, this feature is better suited to a major version upgrade.

Improvements:

  1. Make quote-stripping the default behavior, with an option to disable.
  2. Lowercase all characters in the address by default. While email servers could technically treat the local-part of an address as case-sensitive, this is not typical and is not done by the major email providers (e.g. Gmail). Additionally, according to the RFC, "Name servers and resolvers must compare [domains] in a case-insensitive manner." Add an option to disable.
  3. Add an option to remove . (dot) characters from the local-part. Gmail allows any number of . (dot) characters in the local-part of the address, so a Gmail address could technically even contain two dots in a row. As an aside, an option could be added to the base JMail validation to allow two (or more) consecutive dots.
  4. Add an option to remove any sub-addressing (also called tagged addressing). Many mail servers let the sender append a + sign (or, in rare cases, a - sign, or more rarely still an arbitrary character) followed by further characters to the end of the local-part, and the mail is still delivered to the same mailbox. Normalization should be able to optionally remove these tags, and users should be able to specify the separator character.
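
As a rough illustration of how improvements 2 through 4 might compose, the following plain-Java sketch applies them in a fixed order. Everything here is an assumption made for the example: `NormalizeSketch`, `Options`, and its fields are hypothetical names, not existing or planned JMail API, and the sketch only handles unquoted local-parts.

```java
import java.util.Locale;

/** Hypothetical sketch; none of these names come from JMail. */
final class NormalizeSketch {

    /** Illustrative option holder (assumed, not an existing JMail type). */
    static final class Options {
        final boolean lowercase;
        final boolean removeDots;
        final boolean removeSubAddress;
        final char separator; // e.g. '+' or '-'

        Options(boolean lowercase, boolean removeDots,
                boolean removeSubAddress, char separator) {
            this.lowercase = lowercase;
            this.removeDots = removeDots;
            this.removeSubAddress = removeSubAddress;
            this.separator = separator;
        }
    }

    /** Assumes an already-validated address with an unquoted local-part. */
    static String normalize(String address, Options opts) {
        int at = address.lastIndexOf('@');
        if (at < 1) {
            return address; // nothing sensible to normalize
        }
        String local = address.substring(0, at);
        String domain = address.substring(at + 1);

        if (opts.removeSubAddress) {
            // Drop everything from the first separator onward, but never
            // produce an empty local-part.
            int sep = local.indexOf(opts.separator);
            if (sep > 0) {
                local = local.substring(0, sep);
            }
        }
        if (opts.removeDots) {
            local = local.replace(".", "");
        }
        if (opts.lowercase) {
            // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i.
            local = local.toLowerCase(Locale.ROOT);
            domain = domain.toLowerCase(Locale.ROOT);
        }
        return local + "@" + domain;
    }
}
```

With every option enabled and `+` as the separator, `First.Last+News@Example.COM` normalizes to `firstlast@example.com`. Cutting at the first separator is a design choice of this sketch; with `-` as the separator it would also truncate hyphenated names, which is one reason the separator should stay configurable and the option opt-in.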

Additional context

https://stackoverflow.com/a/9808332
https://support.google.com/mail/answer/7436150
https://en.wikipedia.org/wiki/Email_address#Sub-addressing

RohanNagar added the enhancement label Sep 1, 2023
RohanNagar self-assigned this Sep 1, 2023
RohanNagar commented (Owner, Author)

Additionally, RFC 6530 provides a lot of good detail on the internationalization of email addresses. In particular, regarding the local-part:

In general, it is wise to support addresses in Normalized form,
using at least Normalization Form NFC. Except in circumstances in
which NFKC would map characters together that the parties
responsible for the destination mail server would prefer to be
kept distinguishable, supporting the NFKC-conformant form would
yield even more predictable behavior for the typical user.

and

Unnormalized strings are valid, but sufficiently bad practice
that they may not work reliably on a global basis. Servers
should not depend on clients to send normalized forms but
should be aware that procedures on client machines outside the
control of the MUA may cause normalized strings to be sent
regardless of user intent.

Along these lines, providing an option to normalize the local-part according to Normalization Form NFC or NFKC could be useful.
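
For reference, the core Unicode step is available in the Java standard library through `java.text.Normalizer`. The sketch below shows one way the option could look, under the assumption that only the local-part is normalized and that the address has already been validated. The class and method names are hypothetical, not JMail API.

```java
import java.text.Normalizer;

/** Hypothetical sketch; the class and method names are illustrative only. */
final class UnicodeLocalPartSketch {

    /**
     * Applies the given Unicode normalization form (for example NFC or NFKC)
     * to the local-part and leaves the domain untouched.
     */
    static String normalizeLocalPart(String address, Normalizer.Form form) {
        int at = address.lastIndexOf('@');
        if (at < 1) {
            return address; // no local-part to normalize
        }
        String local = address.substring(0, at);
        return Normalizer.normalize(local, form) + address.substring(at);
    }
}
```

Under NFC, a local-part written as `e` followed by the combining acute accent U+0301 becomes the single code point U+00E9. Under NFKC, compatibility characters are folded as well, so the ligature U+FB01 becomes the two letters `fi`; this is the kind of mapping the RFC warns some server operators may prefer to keep distinguishable.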

The difficulty of implementing this remains to be seen.
