Fix folding non-ASCII header #1566

ts1 · 2023-02-17T06:10:51Z

sebbASF · 2023-02-17T10:49:53Z

It would be helpful to have at least one test please

ts1 · 2023-02-21T08:57:10Z

Before I write a test: 3 existing tests fail with my change.
I briefly checked that these encodings still round-trip correctly, as shown below.
If this is okay, I will update the expected result strings to match what my modified code outputs.

However, the test below asserts the opposite of what my fix does.

mail/spec/mail/encodings_spec.rb

Line 286 in 9c46620

it "should not fold a long string that has no spaces" do

May I remove 'not' here and show that it round-trips correctly?

diff --git a/spec/mail/encodings_spec.rb b/spec/mail/encodings_spec.rb
index 7428b938..79702670 100644
--- a/spec/mail/encodings_spec.rb
+++ b/spec/mail/encodings_spec.rb
@@ -293,8 +293,9 @@ RSpec.describe Mail::Encodings do
       end
       mail = Mail.new
       mail.subject = original
+      mail = Mail.new(mail.encoded)
       expect(mail[:subject].decoded).to eq original
-      expect(mail[:subject].encoded).to eq result
+      #expect(mail[:subject].encoded).to eq result
     end
 
     it "should round trip a complex string properly" do
@@ -306,13 +307,13 @@ RSpec.describe Mail::Encodings do
       mail = Mail.new
       mail.subject = original
       expect(mail[:subject].decoded).to eq original
-      expect(mail[:subject].encoded).to eq result
+      #expect(mail[:subject].encoded).to eq result
       mail = Mail.new(mail.encoded)
       expect(mail[:subject].decoded).to eq original
-      expect(mail[:subject].encoded).to eq result
+      #expect(mail[:subject].encoded).to eq result
       mail = Mail.new(mail.encoded)
       expect(mail[:subject].decoded).to eq original
-      expect(mail[:subject].encoded).to eq result
+      #expect(mail[:subject].encoded).to eq result
     end
 
     it "should round trip another complex string (koi-8)" do
diff --git a/spec/mail/fields/unstructured_field_spec.rb b/spec/mail/fields/unstructured_field_spec.rb
index af3407da..f1fb0293 100644
--- a/spec/mail/fields/unstructured_field_spec.rb
+++ b/spec/mail/fields/unstructured_field_spec.rb
@@ -158,8 +158,10 @@ RSpec.describe Mail::UnstructuredField do
       @field = Mail::UnstructuredField.new("X-SMTPAPI", string)
       string = string.dup.force_encoding('UTF-8')
       result = "X-SMTPAPI: =?UTF-8?Q?{=22unique=5Fargs=22:_{=22mailing=5Fid=22:147,=22a?=\r\n =?UTF-8?Q?ccount=5Fid=22:2},_=22to=22:_[=22larspind@gmail.com=22],_=22categ?=\r\n =?UTF-8?Q?ory=22:_=22mailing=22,_=22filters=22:_{=22domainkeys=22:_{=22sett?=\r\n =?UTF-8?Q?ings=22:_{=22domain=22:1,=22enable=22:1}}},_=22sub=22:_{=22{{op?=\r\n =?UTF-8?Q?en=5Fimage=5Furl}}=22:_[=22http://betaling.larspind.local/O?=\r\n =?UTF-8?Q?/token/147/Mailing::FakeRecipient=22],_=22{{name}}=22:_[=22[FIRST?=\r\n =?UTF-8?Q?_NAME]=22],_=22{{signup=5Freminder}}=22:_[=22=28her_kommer_til_at?=\r\n =?UTF-8?Q?_st=C3=A5_hvorn=C3=A5r_folk_har_skrevet_sig_op_...=29=22],?=\r\n =?UTF-8?Q?_=22{{unsubscribe=5Furl}}=22:_[=22http://betaling.larspind.?=\r\n =?UTF-8?Q?local/U/token/147/Mailing::FakeRecipient=22],_=22{{email}}=22:?=\r\n =?UTF-8?Q?_[=22larspind@gmail.com=22],_=22{{link:308}}=22:_[=22http://beta?=\r\n =?UTF-8?Q?ling.larspind.local/L/308/0/Mailing::FakeRecipient=22],_=22{{con?=\r\n =?UTF-8?Q?firm=5Furl}}=22:_[=22=22],_=22{{ref}}=22:_[=22[REF]=22]}}?=\r\n"
-      expect(@field.encoded).to eq result
-      expect(@field.decoded).to eq string
+      #expect(@field.encoded).to eq result
+
+      field = Mail::UnstructuredField.new("X-SMTPAPI", @field.encoded)
+      expect(field.decoded).to eq "X-SMTPAPI: #{string}\r\n"
     end
 
     it "should fold properly with continuous spaces around the linebreak" do

sebbASF · 2023-02-21T11:21:10Z

> May I remove 'not' here and show that it round-trips correctly?

No: folding must only occur between words separated by white space, so it is vital to have a test that enforces this rule.

ts1 · 2023-02-21T12:29:17Z

So, what should we do when a very long non-ASCII subject without whitespace is given?
Japanese text usually doesn't contain whitespace.
If it is not folded, the subject arrives garbled.
I tested sending from Rails directly to Gmail, and from Rails to Mac Mail via AWS SES.
The subject is always garbled around the 110th Japanese character.
That is near the 998th byte of the UTF-8 QP encoding.

With this patch the subject is flawlessly transferred.
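
The byte arithmetic behind the observation above can be checked with a short sketch. It assumes every character takes 3 bytes in UTF-8 and that Q encoding writes each of those bytes as `=XX`; the 110-character subject is a made-up example, not the one from the original report.

```ruby
# Rough size estimate for a Q-encoded Japanese subject (UTF-8).
# Assumption: each character is 3 bytes in UTF-8, and each byte
# becomes "=XX" (3 ASCII characters) in Q encoding.
subject = "あ" * 110                # hypothetical 110-character subject

utf8_bytes = subject.bytesize       # 3 bytes per character
q_encoded_length = utf8_bytes * 3   # "=XX" per byte

puts utf8_bytes         # 330
puts q_encoded_length   # 990
```

Adding the field name and the `=?UTF-8?Q?` prefix to those 990 characters puts a single unfolded line just past the 998-character line limit, which is consistent with the garbling starting near the 110th character.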

sebbASF · 2023-02-21T16:51:26Z

Folding is only defined for values that contain foldable white space.
If you add a fold at an arbitrary location, then when it is unfolded, there will be a spurious white space character.

I don't know what the solution is here, but changing the tests does not seem right.
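
The spurious white space described above can be reproduced in a few lines. This is only a sketch: `unfold` is a hypothetical helper written for this illustration (it is not part of the mail gem), and it removes each CRLF that is immediately followed by white space, leaving the white space in place.

```ruby
# Hypothetical helper: unfold a header value by dropping every CRLF
# that is directly followed by a space or tab (the whitespace stays).
def unfold(value)
  value.gsub(/\r\n(?=[ \t])/, "")
end

original = "averyverylongwordwithoutanyspaces"

# Fold at an arbitrary position inside the word.
folded = original[0, 16] + "\r\n " + original[16..-1]

unfolded = unfold(folded)
puts unfolded.inspect
puts unfolded == original   # false: a space now sits inside the word
```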

sebbASF · 2023-02-22T00:36:26Z

I've dug a little further, and folded white space is ignored between encoded words.

This gives a way to support words that are too long: chop the word up into smaller chunks and encode all the chunks.
This works equally well for long ASCII strings (provided you encode the chunks).

It does look as though something like your proposed fix might achieve that, but the failing tests need to be investigated further. It's possible that some tests need adjustment to allow for additional folding, or it may be that the proposed fix is incomplete.

Additionally, the code ought to work for long ASCII words (which are not normally encoded).

I hope to take a further look soon.
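
The chunking idea above can be sketched as follows. This is not the mail gem's implementation: `encode_in_chunks` and `decode_chunks` are hypothetical helpers, Base64 ("B") encoded-words are used instead of the Q encoding seen in the tests purely to keep the length arithmetic short, and the 75-character cap per encoded-word is an assumption taken from RFC 2047. The sketch folds only the value, so a real header would also have to budget for the field name on the first line.

```ruby
# Split text into chunks whose Base64 encoded-word form is at most
# max_len characters, never splitting inside a multi-byte character.
def encode_in_chunks(text, max_len = 75)
  prefix = "=?UTF-8?B?"
  suffix = "?="
  words = []
  chunk = +""
  text.each_char do |ch|
    candidate = chunk + ch
    # Base64 turns every 3 bytes (rounded up) into 4 characters.
    encoded_len = prefix.size + ((candidate.bytesize + 2) / 3) * 4 + suffix.size
    if encoded_len > max_len && !chunk.empty?
      words << prefix + [chunk].pack("m0") + suffix
      chunk = ch.dup
    else
      chunk = candidate
    end
  end
  words << prefix + [chunk].pack("m0") + suffix unless chunk.empty?
  words.join("\r\n ")   # fold between encoded-words only
end

# White space between adjacent encoded-words is dropped when decoding,
# so the folds leave no stray spaces in the result.
def decode_chunks(folded)
  folded.split(/\s+/).map { |word|
    word[/\A=\?UTF-8\?B\?(.*)\?=\z/, 1].unpack1("m")
  }.join.force_encoding("UTF-8")
end

subject = "日本語" * 50
folded = encode_in_chunks(subject)
puts folded
puts decode_chunks(folded) == subject
```

The same helpers handle a long ASCII word, for example `encode_in_chunks("a" * 200)`, because every chunk is encoded regardless of its content.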

ts1 · 2023-02-22T06:34:58Z

I rewrote it to fold long ASCII-only words safely, and abandoned the 78-character limit while still never exceeding 998 characters.
As a result, almost all of the existing tests pass (all but one).
I also added 4 tests, 3 of which fail without my change.

sebbASF · 2023-02-22T09:22:53Z

Sorry, but that is not acceptable.

Not all MTAs can handle 998 characters in an email line, so it is vital to keep the 78 char limit as per the RFC.

Ensuring that output lines are no more than 78 chars would automatically fix the original issue.

We also need a test to show the original issue. The expected result once the issue has been fixed will depend on how the long subject has been split into chunks, but the test can check that each line of the encoded output is no longer than 78 chars.

I am hopeful that the issue can be fixed with fewer changes.
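
A line-length check of the kind described above could look like the sketch below. `overlong_lines` is a hypothetical helper, not something the mail gem provides; it works on any encoded header string, so it does not depend on where the folds end up.

```ruby
# Hypothetical helper: return every line of an encoded header that is
# longer than the limit. CRLF terminators are not counted.
def overlong_lines(encoded, limit = 78)
  encoded.split("\r\n").select { |line| line.length > limit }
end

puts overlong_lines("Subject: short\r\n").inspect   # []
puts overlong_lines(("a" * 79) + "\r\n b\r\n").size # 1

# Assumed usage inside a spec, with `mail` built as in the diff above:
#   expect(overlong_lines(mail[:subject].encoded)).to be_empty
```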

ts1 · 2023-02-22T09:50:59Z

The original issue is covered by this test:

it "should fold a Japanese subject with more than 998 characters long and no white space" do

It is okay to limit to 78 chars, but it will break more existing tests.

sebbASF · 2023-02-22T10:46:37Z

I see the test now. I was looking for the subject from the original issue.

> It is okay to limit to 78 chars, but it will break more existing tests.

In which case the tests need to be carefully inspected to see if they are correct, and adjusted if necessary. For example, the folds may be in a slightly different place than originally expected, but the result is still OK.

I am looking at this again today.

sebbASF · 2023-02-22T16:37:14Z

See #1570, which fixes most folding issues.
There are still a couple of instances where the output line is a bit longer than 78 chars.

ts1 · 2023-02-23T04:18:12Z

#1570 looks good. I withdraw this.

ts1 force-pushed the master branch from 5f9fa38 to e2e100e Compare February 22, 2023 06:03

ts1 added 2 commits February 22, 2023 15:13

Add folding tests

a2442d4

Fix folding long word

515c535

ts1 force-pushed the master branch from e2e100e to 515c535 Compare February 22, 2023 06:16

ts1 closed this Feb 23, 2023
