Thunderbird Addon 'ImportExport NG' - special characters after Export #565

Open
MTubifex opened this issue May 5, 2024 · 47 comments

@MTubifex

MTubifex commented May 5, 2024

Dear Christopher,

Thank you very much for developing and providing this great add-on for Thunderbird.
I've been looking for something like this for a long time and it works really well.

I would still like to ask whether there is a way to remove the cryptic special characters that appear in the file name when exporting emails whose subject line contains special characters. These characters (maybe from UTF-16) in file names cause significant problems when the files are processed in Windows.

I use the following format formula under options to export emails:
${date_custom}[I]{${sender_email}}{${recipient_email}} ${subject}

which, for example, delivers this result when exported:
20240425_2117_[I]{ebay@ebay.com}{mymail@gmx.de} ??Text from the subject line??.eml

Is it possible to use 'ImportExportTools NG' to filter out these special characters when exporting?

See the attached image.

Best regards, Mr.PT
[Screenshot: Bild6 (special characters in file names)]
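For readers unfamiliar with the naming system: a template like the one above is expanded by substituting each ${name} token with a value taken from the message. The sketch below is a minimal, hypothetical illustration of that substitution (expandTemplate is not an IETNG function, and the field values are made up to reproduce the example). It shows that the subject is inserted verbatim, which is where any character filtering would have to act.

```javascript
// Hypothetical sketch: expand ${name} tokens in a filename template.
// The field values are illustrative, not taken from IETNG's code.
function expandTemplate(template, fields) {
  // Replace each ${name} with its value; unknown tokens are left as-is.
  return template.replace(/\$\{(\w+)\}/g, (token, name) =>
    Object.prototype.hasOwnProperty.call(fields, name) ? String(fields[name]) : token
  );
}

const template = "${date_custom}[I]{${sender_email}}{${recipient_email}} ${subject}";
const fields = {
  date_custom: "20240425_2117_", // assumed output of the custom date format
  sender_email: "ebay@ebay.com",
  recipient_email: "mymail@gmx.de",
  subject: "Text from the subject line", // the value a filter would act on
};

console.log(expandTemplate(template, fields) + ".eml");
// → 20240425_2117_[I]{ebay@ebay.com}{mymail@gmx.de} Text from the subject line.eml
```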

@cleidigh
Collaborator

cleidigh commented May 6, 2024

Mr.PT
Thanks for your enthusiastic support!
The file naming system is one of my babies I am particularly fond of %-)
After first implementing it in IETNG, I ported it over to my other extension, PrintingTools NG, which does header and other printing enhancements as well as PDF output. There I had a request to do exactly what you are asking for: filter non-ASCII characters outside the base 256 single-byte characters. See below:

[Screenshot: 2024-05-06-14-01-31]

The Windows file system has no issues with UTF-16 characters or emojis; however, many programs, typically older ones, do. I assume this is your issue.

I'm working on a maintenance release that I really don't want to extend; however, I can look at back-porting the Unicode filter with just a preference, without adding it to the UI, which has no room. That also avoids translations etc.

So let me look and I will see if that is doable.
Christopher

@MTubifex
Author

MTubifex commented May 9, 2024 via email

@cleidigh
Collaborator

cleidigh commented May 9, 2024

@MTubifex
Listening to users is the only way I operate!
I totally agree about the insidious nature of emojis. I may be old school, but emojis just don't belong in subjects!!
Scripts like you wrote are great, but not if you need them all the time.

Fortunately, I looked, and back-porting my filterNonASCIICharacters(str) function should not be a problem as long as I don't touch the UI for now. NOTE: I have this as a non-substitution filter (it removes emojis rather than replacing them).
I'll post beta here for you to try.
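To make "non-substitution" concrete, here is a minimal sketch, assuming the filter keeps only 7-bit ASCII and deletes everything else instead of replacing it. stripNonAscii is a hypothetical name; this is not the actual filterNonASCIICharacters code. Note that it removes ü and ß along with the emojis.

```javascript
// Minimal sketch of a non-substituting filter, assuming it keeps only
// 7-bit ASCII (U+0000..U+007F). Not the actual filterNonASCIICharacters code.
function stripNonAscii(str) {
  // Characters outside the range are deleted, not replaced with "_" or "?".
  // The "u" flag matches whole code points, so an emoji (a surrogate pair
  // in UTF-16) is removed as one unit.
  return str.replace(/[^\u0000-\u007F]/gu, "");
}

console.log(JSON.stringify(stripNonAscii("🔥 Sale: Grüße from eBay 🎉")));
// → " Sale: Gre from eBay "
```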

@MTubifex
Author

MTubifex commented May 10, 2024 via email

@cleidigh
Collaborator

cleidigh commented May 10, 2024

Peter
Here is b5 beta for v14.0.3.
https://github.com/thunderbird/import-export-tools-ng/blob/v14.0.3/xpi/beta/import-export-tools-ng-14.0.3-b5-tb.xpi

I have ported the filter from PrintingTools NG and added a boolean preference to enable it, no UI additions.

For the preference, go to the Config Editor at the end of Settings.
Enter the preference below and you can toggle it.

extensions.importexporttoolsng.export.filename_filterUTF16_7bitASCII

Note that, as I had it in PTNG, it also strips upper ASCII to filter out those symbols etc.
This means accented characters like â or German ü are filtered as well. In PTNG I also have a Latinize transform that would turn them into a and u respectively, so you might want to comment on that.

Christopher
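A Latinize transform maps accented letters to their base letter. PTNG uses a transform table; the sketch below swaps in a different, standard technique (Unicode NFD decomposition, then dropping the combining marks) purely to illustrate the idea. latinize is a hypothetical name. Letters without a decomposition, such as ß, pass through unchanged.

```javascript
// Sketch of a "latinize" transform via Unicode canonical decomposition (NFD).
// This swaps in a standard technique for PTNG's transform table, which is
// not shown in this thread.
function latinize(str) {
  return str
    .normalize("NFD")                // "ü" -> "u" + U+0308 COMBINING DIAERESIS
    .replace(/[\u0300-\u036F]/g, "") // drop marks from the Combining Diacritical Marks block
    .normalize("NFC");               // recompose whatever is left
}

console.log(latinize("Küchenschlacht, â, ß")); // → Kuchenschlacht, a, ß
```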

@MTubifex
Author

Dear Christopher,

Thank you for your quick answers and the beta version of IENG with filter.

The filterUTF16_7bitASCII filter works great and the cryptic characters disappear. That is already progress.

[Screenshot: 20240511_Filter_Result]

But as you already mentioned, the German umlauts ä ü ö are also removed. This is of course not so good for Germans, as it makes the text very hard to read for us, and unfortunately we have a lot of these umlauts.

I think that the 'Latinize transformation' (ä > a, ö > o, ü > u) might be a bit better. Can I already try this transformation with a configuration switch?

Best regards - Peter
[Sat.11.May.2024 03:59]

@cleidigh
Collaborator

cleidigh commented May 11, 2024 via email

@MTubifex
Author

Dear Christopher,
Many thanks for your active support with 'ImportExport NG'
(what does 'NG' actually mean?)

Unfortunately, I had to deactivate the beta version v14.0.3 again.
The reason: Email > Context > Export Messages > Copy to Clipboard > Message
no longer works. The content of the email is not copied to the clipboard, for whatever reason.

[Screenshot: 20240512_ImportExportAddon No more Copy to Clipboard]

After I reinstalled version 'importexporttools_ng-14.0.1-tb.xpi', it works fine as before. Unfortunately, this means the 'filterUTF16_7bitASCII' filter is no longer available :(

Is there a bug here or am I doing something wrong?

Kind regards - Mr.PT

[Sun.12.May.2024 8:09 p.m.]

@cleidigh
Collaborator

Hi Peter, guten Tag!
I'm working on the latinize transform port and may get it out today.
I just verified that some debug code for the export methods clobbered the clipboard function. I'll fix that for the next beta. I have not done full regression testing yet, so this slipped through, sorry.
Christopher

@cleidigh
Collaborator

Peter
b6 has the clipboard functionality fixed by removing the conflicting debug code.
I also have the latinize transform ported in. Use the following pref:

extensions.importexporttoolsng.export.filename_latinize

Hopefully this will now create filenames acceptable to all of your target programs.

https://github.com/thunderbird/import-export-tools-ng/blob/v14.0.3/xpi/beta/import-export-tools-ng-14.0.3-b6-tb.xpi
Christopher

@MTubifex
Author

Hi Christopher,
You're just faster than the police allow - great!

I tested the 'import-export-tools-ng-14.0.3-b6-tb.xpi' version.

Result:
extensions.importexporttoolsng.export.filename_filterUTF16_7bitASCII = True: special characters are removed. Top!
extensions.importexporttoolsng.export.filename_latinize = True: umlauts ä/ö/ü become a/o/u. OK.
Would it be very time-consuming to convert the umlauts as ä > 'ae', ö > 'oe', ü > 'ue' instead?
I mean, that would probably be the best option for German-syntax.

Unfortunately, exporting email to the clipboard doesn't work :(
Maybe in version 14.1.0.xpi ?

Best regards - Peter
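Peter's ae/oe/ue request amounts to a small lookup table applied after normalizing to NFC. A hypothetical sketch (the table and the function name are illustrative, not IETNG code, and the ß entry is an extra assumption):

```javascript
// Hypothetical two-character transliteration table for German umlauts.
// The ß -> ss entry is an extra assumption; Peter only asked about ä/ö/ü.
const GERMAN_MAP = {
  "ä": "ae", "ö": "oe", "ü": "ue",
  "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
  "ß": "ss",
};

function transliterateGerman(str) {
  // NFC first, so a decomposed "u" + U+0308 becomes the single precomposed
  // "ü" (U+00FC) that the table is keyed on.
  return str.normalize("NFC").replace(/[äöüÄÖÜß]/g, (ch) => GERMAN_MAP[ch]);
}

console.log(transliterateGerman("Bereit für die Küchenschlacht"));
// → Bereit fuer die Kuechenschlacht
```

Mapping Ü to "Ue" reads well in mixed-case text but gives "UeBER" for an all-caps "ÜBER"; handling that would need context-sensitive rules.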

@cleidigh
Collaborator

cleidigh commented May 13, 2024 via email

@cleidigh
Collaborator

Peter
The one "view" I missed testing is the message in a separate window. This does have an error. The other views all do copyToClipboard correctly. Message list, message pane and message in a tab all work for me.
Can you verify that you see the same behavior?

Regarding the umlauts: there are a lot of Unicode characters in the transform table. Currently all forms get changed to a single Latin character; however, I have no problem changing them to two-character transforms. The trick is identifying the exact source codes.
Can you send me an email with a subject containing each of these characters? Then I can compare them against the table. My test account:
test1@kokkini.net

The other question is whether these characters MUST be transformed for your Windows programs at all. The UTF16 filter does not have to filter down to 7-bit ASCII; perhaps that's just creating an unnecessary issue?
Christopher
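On "identifying the exact source codes": the same visible ü can arrive as one precomposed code point (U+00FC) or as u followed by a combining diaeresis (U+0308), and a table keyed on one form misses the other. A small helper like this hypothetical one lists what a subject actually contains:

```javascript
// List a string's code points as U+XXXX. Array.from iterates by code point,
// so a surrogate pair such as an emoji shows up as one entry.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codePoints("\u00FC"));  // precomposed ü → [ 'U+00FC' ]
console.log(codePoints("u\u0308")); // decomposed ü  → [ 'U+0075', 'U+0308' ]
console.log(codePoints("😃"));      // emoji         → [ 'U+1F603' ]
```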

@cleidigh
Collaborator

Don't bother doing any captures, I am dealing with a bunch of spaghetti code %-(

@MTubifex
Author

Hi Christopher,

sorry for my absence - I'm still involved in other projects in the meantime.

How can I help with development?
What do you need to unravel the spaghetti code?

Attached is an Excel list (and PDF) with all the required characters and their ASCII codes.
Let me know if I can assist or support.

Best regards, Peter

Attachments (see also test1@kokkini.net):
Excel_All characters for german filenames.pdf
Excel_All characters for german filenames.xls


@cleidigh
Collaborator

Peter
I am still working on the copyToClipboard. It's working, but it throws a non-fatal error that doesn't happen with the plaintext converter, go figure.
Looking at your chart makes me think the following:

  • you can use several upper ASCII characters that are filtered along with the UTF16 filter
  • the latinize transform doesn't do anything you really need, as far as I can tell

I should change the filter to include the whole 256 range (illegal characters are always filtered)
From your list that leaves the " character to be filtered.
PrintingTools also has a filter characters pref where you enter characters to be filtered. You could enter the " there.
This keeps the filters and transforms general while I think meeting your needs. What do you think?
Christopher

@cleidigh
Collaborator

Sorry, the illegal characters including " are already transformed into _

So if I just change the filter to only filter UTF-16 characters, you should be good, yes?
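The "illegal characters become _" step can be sketched as a single regular-expression replace. This assumes the set is the characters Windows reserves in file names plus ASCII control characters; the exact list IETNG uses is not given in this thread, and replaceIllegalChars is a hypothetical name.

```javascript
// Sketch of the "illegal characters become _" step. Assumption: the set is the
// characters Windows reserves in file names (\ / : * ? " < > |) plus ASCII
// control characters; IETNG's exact list is not shown in this thread.
function replaceIllegalChars(name) {
  return name.replace(/[\\\/:*?"<>|\u0000-\u001F]/g, "_");
}

console.log(replaceIllegalChars("Bereit für die Küchenschlacht?"));
// → Bereit für die Küchenschlacht_
```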

@MTubifex
Author

Good evening Christopher,

If you ask me, I would describe the filter like this:

The filter is intended to filter (remove) the following from the subject line:

  • All characters that cannot be found on a standard keyboard, such as ý þ ÿ Ý Þ etc.
  • All characters that are not allowed in file names: " * / : < > ? | \

The rest should be allowed through unchanged.

Or described differently:
Only pass characters that match these ASCII codes:
32, 33, 35, 36, 37, 38, 39, 40, 41, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 59, 61, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 125, 126, 128, 167, 196, 214, 220, 223, 228, 246, 252

Does that help?

Best regards, PT
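Peter's code list, read literally, is printable ASCII minus the characters Windows forbids in file names, plus € § Ä Ö Ü ß ä ö ü. A hypothetical sketch that pins down that reading (not IETNG code; his 128 is € only in Windows-1252, while in Unicode € is U+20AC, so the extras are written as characters rather than codes):

```javascript
// Sketch of Peter's allowlist (not IETNG code): printable ASCII (32..126)
// minus the characters Windows forbids in file names, plus € § Ä Ö Ü ß ä ö ü.
const FORBIDDEN = new Set(['"', "*", "/", ":", "<", ">", "?", "\\", "|"]);
const EXTRAS = new Set(Array.from("€§ÄÖÜßäöü"));

function isAllowed(ch) {
  const cp = ch.codePointAt(0);
  if (cp >= 32 && cp <= 126) return !FORBIDDEN.has(ch); // printable ASCII
  return EXTRAS.has(ch);
}

function allowlistFilter(str) {
  // NFC first so a decomposed "u" + U+0308 is kept as "ü" instead of losing
  // its diaeresis; then keep only allowed code points.
  return Array.from(str.normalize("NFC")).filter(isAllowed).join("");
}

console.log(JSON.stringify(allowlistFilter("🔥 Schau mal, was da blüht?")));
// → " Schau mal, was da blüht"
```

The thread goes on to use two general filters instead of a fixed allowlist; the sketch only makes the list above unambiguous.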

@cleidigh
Collaborator

OK, but that's a little specific concerning the non-keyboard characters.
That's what the open character filter is for.
I will make a pure UTF-16 filter and port the character list filter over.
Again this keeps things general.
That will do exactly what you describe.
Christopher

@cleidigh
Collaborator

cleidigh commented May 16, 2024

Peter
I have b8 which takes care of a variety of issues related to my restructure.
copyToClipboard in all views looks good. If you can do about four tries and then check the console for errors: I have NO_INTERFACE errors which only appear in one profile. They're non-fatal, but I am curious whether you see them.

For the filtering I have done the following:

  • Changed the filter to be a pure Unicode filter; it leaves all ASCII intact
  • The new filter has a changed preference name:
extensions.importexporttoolsng.export.filename_filterUTF16
  • I ported the open character filter, where you can set a string of characters to filter:
extensions.importexporttoolsng.export.filename_filter_characters

So in your case, you use the UTF16 filter for emojis and then the character filter for any ASCII characters you want to filter.
Should do everything you need in a general fashion.

https://github.com/thunderbird/import-export-tools-ng/blob/v14.0.3/xpi/beta/import-export-tools-ng-14.0.3-b8-tb.xpi

Cheers
Christopher
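A sketch of the pure UTF16 filter, assuming (from the earlier "whole 256 range" remark and from Peter's later results, where ü survives) that it keeps U+0000 to U+00FF and deletes everything above. filterAboveLatin1 is a hypothetical name, not the IETNG function.

```javascript
// Sketch of the pure "UTF16" filter, under the assumption that it keeps
// U+0000..U+00FF (ASCII plus Latin-1, so ä ö ü ß survive) and deletes
// every code point above that, including emojis.
function filterAboveLatin1(str) {
  return str.replace(/[^\u0000-\u00FF]/gu, "");
}

console.log(JSON.stringify(filterAboveLatin1("Schau mal, was da blüht, Peter! 🌸")));
// → "Schau mal, was da blüht, Peter! "
```

Under that rule, € (U+20AC) is dropped too, and a subject written entirely in Greek or Cyrillic would come out empty.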

@MTubifex
Author

Hi again dear Christopher,

I've tested the new filter and it works fine:

[Screenshot: 20240517_Result Filter filename_filterUTF16_mark]

Copying the email text to the clipboard also works, but all 'CrLf' (line breaks) are removed, so the email text becomes one long spaghetti text. However, ng-14.0.3-b8-tb is close to the optimal version. If you can still solve the line-break problem, the result would be perfect.

Question: How do I set the 'extensions.importexporttoolsng.export.filename_filter_characters' parameter? It is apparently not a Boolean value. Does it have to be set to True/False manually to activate it?

Thank you so much - your power beta tester - Peter
[Fri.17.May.2024 01:18]

@cleidigh
Collaborator

Mr. Beta tester, you are a night owl! (unless you are secretly in Asia or the West coast)

Something odd: I've done a hundred or so clipboard operations and the CRLF line endings are intact???
B8 has debug output before and after the conversion to text. Can you capture the debug output for a clip?
Also paste into Notepad, save as .txt and send me the file, or paste the file contents here.
Fun never ends...

The filter characters pref is a Unicode string. When empty, it does nothing. You set the string by clicking the edit pencil, then enter the characters you want filtered, with no separators. The cool thing is you can enter Unicode characters, emojis as well.

Christopher
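The filter characters pref can be modeled as "delete every character that appears in this string". A hypothetical sketch (filterCharacters is not IETNG's function) using Peter's own example string; the key detail is splitting by code point so emojis are matched whole rather than as two UTF-16 surrogates.

```javascript
// Sketch of the filter-characters pref (hypothetical function name): the pref
// value is a plain string and every character in it is deleted from the name.
function filterCharacters(str, filterChars) {
  if (!filterChars) return str; // empty pref: nothing to do
  // Array.from splits by code point, so an emoji like 😃 (two UTF-16 code
  // units) is compared whole instead of as two separate surrogates.
  const remove = new Set(Array.from(filterChars));
  return Array.from(str).filter((ch) => !remove.has(ch)).join("");
}

console.log(JSON.stringify(filterCharacters("Lunch 😃🍔 and music 🎹", "😃🍔🎹")));
// → "Lunch  and music "
```

Emoji made of several code points (skin-tone modifiers, ZWJ sequences) are covered when the whole emoji is pasted into the pref, since that puts all of its parts into the set.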

@MTubifex
Author

Hi Chris,

Yes, 'night owl' is what I was born as... ;)

First: 'extensions.importexporttoolsng.export.filename_filter_characters' as a direct filter is a cool feature!! I'll definitely use it!!
So, when I find an unwanted character in the subject line or somewhere else: can I copy & paste this character directly into a text string and then enter it as the string value of the .filename_filter_characters parameter? Or must I find the ASCII code or something else?
Can you give me an example of how to set a character to be filtered? Perhaps 3 chars like this: "😃🍔🎹"

The Clipboard-Problem:
Here comes a screenshot collection as a PDF file to verify the clipboard behaviour under importexporttools_ng-14.0.1-tb.xpi and import-export-tools-ng-14.0.3-b8-tb.xpi with the same email:

Attachment: Copy_Clipboard_ImpExpNG.pdf

@cleidigh
Collaborator

Peter
I see the results in Notepad; however, what I need is the debug output so I can see the data before it goes to the clipboard.
The other test would be to select the message and export it to plaintext. That uses a similar transform.

I will answer your filter questions in a moment.
Christopher

@MTubifex
Author

Ok, how can I generate a debug output, or where can I find it?
(Exporting the selected message to plaintext leads to the same result: all text on one line.)

Greetings from Germany, PT

@cleidigh
Collaborator

OK, so the consistency between the clipboard and plaintext export makes it more likely that something is wrong with the HTML => plaintext converter. I am using the new Thunderbird converter in v14.0.3, which works fine on my Windows 10 dev machine. I saw your tests were on Windows 7. That might be the issue. Do you have a newer Windows 10/11 setup?

For the debug output, open the console with Control-Shift-J
You can clear with the trashcan.
Do a clip, return to the console.
Right click and copy all messages.
Paste into text file and post.

For the filter: yes, you cut and paste or type the characters; no codes needed.
Christopher

@cleidigh
Collaborator

Peter
If I do a beta showing the two different converters side by side, will you be available to do a quick test in half an hour?
Christopher

@MTubifex
Author

MTubifex commented May 19, 2024 via email

@MTubifex
Author

MTubifex commented May 20, 2024 via email

@MTubifex
Author

Good morning Christopher,

Going to bed at 6 a.m., calls again at 11 a.m. - night owls have a hard time ;(

But good news: after restarting the computer, exporting email text works with the Beta9 version!!!! That's fixed... :)))
I didn't realize that I would have to restart after changing the add-on; maybe just restarting Thunderbird would be enough. I'll test again.

The bug report still shows "tb.ui.interaction.message_display - The key length must be limited to 72 characters." when copying text to the clipboard; anyway, it seems to work...

========================
Test2: Export mails with 'extensions.importexporttoolsng.export.filename_filterUTF16 = True'

Result:
20240506-Schau mal, was da blüht, Peter!-1877.eml
20240508-Bereit für die Küchenschlacht_ -1896.eml

Everything is good here too: the special characters are gone and the umlauts are still there!
Conclusion: From the current perspective, Beta9 could be a permanent upgrade.

So much for now
Quick greetings and I'll lie down again chrzzz chrzzz chrzzz ...

[Mon.20.May.2024 12:20]
Peter

@cleidigh
Collaborator

Peter
Thanks.
Yes, get some sleep.
Recent Thunderbird versions seem to cache code, which prevents restartless add-on installation. This is something I definitely have to investigate.
Knowing this, would you mind trying b8 again with a Thunderbird restart, after your zzzs? If the new converter works on Win7, that would be better; if not, I will add a Windows version switch to choose the converter.

That sounds good about the filters; now you don't have to do your secondary operations. As I said before, the filename system is one of my big additions to the original author's code. I hope I can now release and start my major rewrite.
Cheers from Providence.
Christopher

@cleidigh
Collaborator

Any update on trying b8 with a Thunderbird restart?
Christopher

@MTubifex
Author

MTubifex commented May 22, 2024 via email

@cleidigh
Collaborator

I am interested in whether b8, with a Thunderbird restart after installing it, produces a proper copyToClipboard.
Nothing to do with the filters; b9 has what we want.
Thanks Christopher

@MTubifex
Author

MTubifex commented May 22, 2024 via email

@cleidigh
Collaborator

Peter
My apologies for that mistake. I have deleted that post.

I don't understand what is causing the b8 inconsistencies. I will do a beta that uses the old converter for Windows 7 and the new converter for Windows 10-11.
That should be safe.
I'm on multiple things today so not sure when.
Christopher

@MTubifex
Author

MTubifex commented May 22, 2024 via email

@cleidigh
Collaborator

ok
I also realized I have to determine how to distinguish between Windows versions. Since I don't have win7 we may need to test.

@cleidigh
Collaborator

OK, I think I found the method:
if you go to the console on your Windows 7 machine and type
navigator.userAgent
it should give us what we need.
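With navigator.userAgent, the Windows version can be read from the "Windows NT x.y" token: Windows 7 reports 6.1, while Windows 10 and 11 both report 10.0, which fits the Windows 7 versus 10/11 split described above. A hypothetical sketch of the switch (the "legacy"/"new" labels, function names and sample strings are illustrative, not IETNG code):

```javascript
// Sketch: pick a converter from the user agent string. Windows 7 reports
// "Windows NT 6.1"; Windows 10 and 11 both report "Windows NT 10.0".
// "legacy"/"new" are hypothetical labels, not IETNG identifiers.
function windowsNtVersion(userAgent) {
  const match = /Windows NT (\d+\.\d+)/.exec(userAgent);
  return match ? match[1] : null; // null on macOS and Linux
}

function pickConverter(userAgent) {
  return windowsNtVersion(userAgent) === "6.1" ? "legacy" : "new";
}

// Illustrative user agent strings; in Thunderbird you would pass navigator.userAgent.
const win7 = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:115.0) Gecko/20100101 Thunderbird/115.10.1";
const win10 = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Thunderbird/115.10.1";
console.log(pickConverter(win7), pickConverter(win10)); // → legacy new
```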

@cleidigh
Collaborator

Peter
I was able to get b10 out.
It has all the filters and uses the old converter for Windows 7 and the new converter for everything else. Do a Thunderbird restart to be safe. If you can verify on 7 and 10, that would be great. Hopefully this gets us there.
Christopher

https://github.com/thunderbird/import-export-tools-ng/blob/v14.0.3/xpi/beta/import-export-tools-ng-14.0.3-b10-tb.xpi

@MTubifex
Author

MTubifex commented May 23, 2024 via email

@MTubifex
Author

MTubifex commented May 24, 2024 via email

@cleidigh
Collaborator

Peter
Yahoo!
Now I can cleanup
Thanks master tester!!
Christopher

@cleidigh
Collaborator

cleidigh commented May 24, 2024

Master Tester:

In the process of cleaning up, I discovered a redundant call to the converter function. I don't think this was an issue; however, I don't want to chance it, and I can't verify the Windows 7 path myself.

Can I bother you for what I hope is final on the copyToClipboard issue?
You only need to check copyToClipboard on Windows 7.
Nothing has changed with the filters.

Something about the code and myself:

As I have mentioned before, the codebase includes a lot of the original author's code, with his and my additions along the way. While I have managed to do significant complete rewrites of the mbox import/export and other things, much of the code remains to be done. While you may have called me speedy once, that's the exception. I am really both slow and limited: I have ALS and have to program by eye gaze (when my eyes are good).

I only mention it because we have developed a very nice rapport. For me, who used to travel the world, my relationships with great users like yourself are my vicarious travel.
I appreciate it quite a lot.

Christopher

@cleidigh
Collaborator

cleidigh commented May 24, 2024

Don't bother, I just found another error...
sigh %-{

Ironically, what I found by chance is that UTF-16 messages, like Greek ones (my wife is Greek), were broken. I'm kind of exhausted by this, but I'm going back to v14.0.1 to try to figure it out. I'm already in bed, but it's my bedtime job.
Christopher (NOT master programmer)

@MTubifex
Author

MTubifex commented May 25, 2024 via email

@cleidigh
Collaborator

Peter
The error I found was with plaintext export, so it should not affect the clipboard or the filters.
So b12 should be it. If you can verify the b12 clipboard operation on Win7 & 10, that would be great.
I am desperately hoping to release in the next couple of days.
Christopher

https://github.com/thunderbird/import-export-tools-ng/blob/v14.0.3/xpi/beta/import-export-tools-ng-14.0.3-b12-tb.xpi

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants