Add ability to get the ASCII only version of an Email #149

Open
RohanNagar opened this issue May 15, 2023 · 5 comments
@RohanNagar
Owner

RohanNagar commented May 15, 2023

Please describe the feature that you are requesting

Some email addresses have internationalized domain names. Mail servers are supposed to handle these by converting them to their ASCII equivalent, but some older mail servers may not do this. It would therefore be helpful if JMail could provide a way to get the ASCII equivalent of a parsed email address.

The goal is to create a method on the Email object that returns an ASCII-only version, like so:

Email parsed = JMail.validator().tryParse("test@faß.de").get();

String asciiOnly = parsed.toAscii();

Additional context

bbottema/simple-java-mail#463

https://gist.github.com/JamesBoon/feeb7428b3558d581c0459f7302bd9a5

Note that the IDN.toASCII() method uses an out-of-date standard, IDNA2003. We need to implement the latest standard, IDNA2008.
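
To illustrate the IDNA2003 concern, here is a minimal sketch that uses only the JDK's java.net.IDN. The class name Idna2003Demo and the helper domainToAscii are hypothetical and are not part of JMail. The sketch converts only the domain and leaves the local-part untouched.

```java
import java.net.IDN;

public class Idna2003Demo {

    // Hypothetical helper: converts only the domain part of an address.
    // The local-part is copied through unchanged.
    static String domainToAscii(String address) {
        int at = address.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("No @ sign in address: " + address);
        }
        String localPart = address.substring(0, at);
        String domain = address.substring(at + 1);
        // java.net.IDN follows IDNA2003, which maps U+00DF (sharp s) to "ss".
        return localPart + "@" + IDN.toASCII(domain);
    }

    public static void main(String[] args) {
        // "\u00DF" is the sharp s, written as an escape to avoid source-encoding issues.
        System.out.println(domainToAscii("test@fa\u00DF.de")); // prints test@fass.de
    }
}
```

Under IDNA2003 the sharp s is mapped to "ss", so the result points at a different domain (fass.de) than the IDNA2008 form xn--fa-hia.de.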

@RohanNagar RohanNagar added the enhancement New feature or request label May 15, 2023
@RohanNagar RohanNagar self-assigned this May 15, 2023
@JamesBoon

JamesBoon commented May 16, 2023

That feature would be a great enhancement!

I am just not sure what the right approach would be. There is the great ICU4J library, but it would add 14MB of dependencies.
And if you are on Android, the required com.ibm.icu.text.IDNA is already available as android.icu.text.IDNA (Reference).

Maybe it could be an optional (non-transitive) dependency? (And maybe fall back to java.net.IDN if it is not present.)

This issue at okhttp may be of interest: square/okhttp#6910
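
A rough sketch of what such an optional dependency with a fallback might look like. It is an illustration under assumptions, not a proposal for JMail's actual code: the class DomainToAscii and its methods are hypothetical, and ICU4J is reached through reflection so that the class compiles and runs without ICU4J on the classpath. The ICU4J calls used are IDNA.getUTS46Instance, IDNA.nameToASCII and IDNA.Info.hasErrors, which implement UTS #46 processing.

```java
import java.lang.reflect.Method;
import java.net.IDN;

public class DomainToAscii {

    // True if ICU4J's IDNA class can be loaded at runtime.
    static boolean icuAvailable() {
        try {
            Class.forName("com.ibm.icu.text.IDNA");
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Prefers ICU4J (UTS #46, nontransitional) and falls back to java.net.IDN (IDNA2003).
    static String toAscii(String domain) {
        if (icuAvailable()) {
            try {
                return viaIcu(domain);
            } catch (ReflectiveOperationException e) {
                // Unexpected ICU4J version or API shape: use the JDK fallback below.
            }
        }
        return IDN.toASCII(domain);
    }

    // Calls IDNA.getUTS46Instance(NONTRANSITIONAL_TO_ASCII).nameToASCII(...) reflectively.
    static String viaIcu(String domain) throws ReflectiveOperationException {
        Class<?> idna = Class.forName("com.ibm.icu.text.IDNA");
        Class<?> infoClass = Class.forName("com.ibm.icu.text.IDNA$Info");
        int options = idna.getField("NONTRANSITIONAL_TO_ASCII").getInt(null);
        Object uts46 = idna.getMethod("getUTS46Instance", int.class).invoke(null, options);
        Object info = infoClass.getConstructor().newInstance();
        Method nameToAscii =
            idna.getMethod("nameToASCII", CharSequence.class, StringBuilder.class, infoClass);
        Object result = nameToAscii.invoke(uts46, domain, new StringBuilder(), info);
        boolean hasErrors = (Boolean) infoClass.getMethod("hasErrors").invoke(info);
        if (hasErrors) {
            throw new IllegalArgumentException("Not a valid domain name: " + domain);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println("ICU4J present: " + icuAvailable());
        System.out.println(toAscii("fa\u00DF.de"));
    }
}
```

One consequence of this design is that the result depends on the classpath: the same input can yield xn--fa-hia.de with ICU4J present and fass.de without it, which would need to be documented.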

@RohanNagar
Owner Author

Thanks for the additional details! Adding the ICU4J library as an optional dependency might be a good first step.

Even better would be to implement the toAscii method ourselves. I was taking a look at the source code for ICU4J and it doesn't look too bad. There would even be some things we could remove (for example ICU4J checks for some invalid domains that have parts that start with hyphens, but JMail would have already checked for those).

@arnt

arnt commented Jun 12, 2023

FWIW, such a function would have both advantages and disadvantages.

Advantages: It would help with sending mail to some addresses, I don't know how many. I suspect few. In my experience (I work with this) most of the users use non-ASCII localparts. Turning faß@faß.de into faß@xn--fa-hia.de doesn't help with anything.

Disadvantages: Some servers, Microsoft Exchange is the most prominent but far from the only one, don't handle the ASCII form while searching. If you use Exchange and search for faß, messages containing test@xn--fa-hia aren't returned. This isn't an easy bug to fix due to interactions with PGP, S/MIME and perhaps DKIM, and Microsoft has said they won't even try to fix it. Exchange isn't the only one with this problem, so conversion to ASCII would be a bit of a footgun feature. Likely to lead users into trouble.

@RohanNagar
Owner Author

@arnt thank you for chiming in with the additional information, this is very useful.

Regarding non-ASCII local-parts: would it then be more beneficial to allow for converting the entire address to ASCII, both the local-part and the domain?

Regarding mail servers not handling the ASCII form in searches - I can see how this might introduce some confusion. I think the intention of JMail is to make working with email addresses easier, so I'm kind of torn on this since I can see some situations where having the ASCII conversion would be useful and some where it might cause confusion. Maybe adding some of these details into a Javadoc would help with potential confusion.

@arnt

arnt commented Jun 12, 2023

Hi,

Having an ASCII representation would certainly make some things simpler, but it doesn't exist.

Converting the localpart to ASCII isn't possible. Converting to ASCII poses some very big problems, and the main goal of the project is to cater to people who can read and write but don't know the Latin alphabet. When the problems turned out to be big, the people who were working on the project decided to drop that feature. This is why RFC 5504 was deprecated.

An example of the kind of problem: Some scripts are written left-to-right and others right-to-left. Email addresses are unambiguously readable as long as both localpart and domain are written in the same direction. But if you use left-to-right ASCII for the localpart and right-to-left for the domain, both parts should be displayed on the same side of the middle @ sign, creating an exciting variety of usability, security and readability problems.
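
The mixed-direction situation described above can be detected with the JDK's java.text.Bidi. This is only a sketch: the class MixedDirectionCheck and the sample addresses are made up for illustration, and it only reports whether an address mixes left-to-right and right-to-left runs. It does not decide how the address should be displayed.

```java
import java.text.Bidi;

public class MixedDirectionCheck {

    // True if the address contains both left-to-right and right-to-left runs.
    static boolean hasMixedDirection(String address) {
        Bidi bidi = new Bidi(address, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isMixed();
    }

    public static void main(String[] args) {
        // ASCII localpart combined with a Hebrew-script domain (escaped code points).
        String mixed = "user@\u05D3\u05D5\u05D2\u05DE\u05D4.\u05E7\u05D5\u05DD";
        String ascii = "user@example.com";
        System.out.println(hasMixedDirection(mixed)); // prints true
        System.out.println(hasMixedDirection(ascii)); // prints false
    }
}
```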
