Issue-185 changes #695

sterlingpickens · 2021-04-23T22:13:49Z

These changes should resolve #185.
Please test and let me know if any other changes need to be made.
It is my intent not to break things for other people.

vapier · 2021-04-23T23:05:55Z

please undo all the whitespace changes. they don't look necessary or related and make it very hard to understand what the patch is actually doing.

sterlingpickens · 2021-04-23T23:27:11Z

I completely replaced 1 function with 3, but I can add the extra whitespace everywhere if you like (maintaining the style used in the replaced function).
The file as a whole is already mixed style.

sterlingpickens · 2021-04-24T00:33:52Z

I'll try to clarify.
entities.h was updated to include all current html5 entities instead of only html4.
The gdTcl_UtfToUniChar function was removed. The comp_entities function was removed.
gdTcl_UtfToUniChar became gd_Entity_To_Unicode (no longer requires comp_entities), gd_JISX0208_To_Unicode (2-byte encoding not utf-8), and gd_UTF8_To_Unicode (now accepts 4-byte utf8 for the official unicode limit).
The reason for splitting the functions is that JISX0208 and UTF-8 are mutually exclusive encodings whose byte ranges overlap.
The entities all start with &, so entity handling piggybacks on whichever encoding is in use; I've maintained that behavior.
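
A sketch of what accepting 4-byte UTF-8 involves, written in Python for brevity rather than in the C of gdft.c. The name `utf8_to_unicode` and its return convention are illustrative assumptions, not the code in this patch.

```python
def utf8_to_unicode(data: bytes):
    """Decode one code point from the start of `data`.

    Returns (codepoint, bytes_consumed). Malformed input yields (None, 1)
    so a caller can skip one byte and resynchronise; empty input yields
    (None, 0). Hypothetical helper, not the function from the patch.
    """
    if not data:
        return None, 0
    b0 = data[0]
    if b0 < 0x80:                      # 1 byte: plain ASCII
        return b0, 1
    if 0xC2 <= b0 <= 0xDF:             # 2 bytes: U+0080..U+07FF
        need, cp, lowest = 1, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:           # 3 bytes: U+0800..U+FFFF
        need, cp, lowest = 2, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:           # 4 bytes: U+10000..U+10FFFF
        need, cp, lowest = 3, b0 & 0x07, 0x10000
    else:                              # stray continuation or invalid lead byte
        return None, 1
    if len(data) < need + 1:           # truncated sequence
        return None, 1
    for b in data[1:need + 1]:
        if b & 0xC0 != 0x80:           # continuation bytes must look like 10xxxxxx
            return None, 1
        cp = (cp << 6) | (b & 0x3F)
    # Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
    if cp < lowest or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return None, 1
    return cp, need + 1
```

The 4-byte branch is the part that lifts the ceiling from U+FFFF to U+10FFFF, the upper limit of Unicode.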

I considered adding JISX0213, but that's a different matter.

vapier · 2021-04-24T00:49:27Z

the indentation in src/entities.h is incorrect.

it's unclear how you're generating this, but it isn't with the entities.tcl that we've used historically. however it's done, entities.tcl needs to be updated or replaced.

sterlingpickens · 2021-04-24T00:53:33Z

I wrote a C program to do it, as the format of the website changed and entities.tcl no longer works.
I can remove the extra tab, but I figured it wouldn't matter.
http://sterlingdesktops.com/pub/test/entities.c

In an ideal situation there would be an entities.h with the prototypes and an entities.c containing the function and the lookup table. jisx0208.h as well. I didn't want to modify the build system or take things that far.

We're kinda just putting a band-aid on things, without a complete rewrite of gdft.c. There are a lot of things that could be improved.

vapier · 2021-04-24T01:11:26Z

we want that logic in the repo so we can verify & keep in sync easier. and we don't want to keep around old scripts that are known to be outdated.

sterlingpickens · 2021-04-24T01:13:43Z

I'd have to add the header portion of the printing and make it output the file as a whole.
Right now it is just outputting the relevant lines.
I suppose I can do that. I never really used tcl, so I can't be certain of how to make that perfect.

I was wondering if there was an irc channel for dev discussion ?
Is everything done here on github ?

sterlingpickens · 2021-04-24T01:25:07Z

There is another way I can write gd_UTF8_To_Unicode that uses half as many lines of code as well.

sterlingpickens · 2021-04-24T20:07:30Z

Ok, i've got the entities_gen.c to replace entities.tcl.
What else should I do ?

My use of FT_Select_Charmap(face, charmap->encoding); is not ideal.
When the user calls the gdImageStringFTEx function with a selected encoding, would the use of FT_Select_Charmap(face, FT_ENCODING_UNICODE); break things for anyone ?

It might be best to rewrite that entire section to map the selected encoding to an FT_Select_Charmap call,
e.g. FT_Select_Charmap(face, FT_ENCODING_ADOBE_CUSTOM);
Then we can check the return value for an error (maybe something for a future pull).

I think only FT_Set_Charmap existed when gdft.c was written.

sterlingpickens · 2021-04-26T02:20:55Z

If you want to cancel this pull request I don't mind. I can reopen again later once i'm done with all the other changes I need to make. It'll likely be in a broken state for a few days.

vapier · 2021-04-26T02:54:17Z

shrug it doesn't cost us anything to leave open

sterlingpickens · 2021-04-26T11:58:24Z

I think it works now. I'll have to do some extensive testing and look everything over, but i'm close.

sterlingpickens · 2021-04-28T21:12:34Z

As far as I know this is done, unless someone else sees a problem.

sterlingpickens · 2021-05-02T03:50:57Z

Should we remove the old "* gdTcl_UtfToUniChar is borrowed from Tcl ..." comment as well ?
That is 41 lines.
Or are we still sharing enough to mention it ?

vapier · 2021-05-02T15:55:18Z

> Should we remove the old "* gdTcl_UtfToUniChar is borrowed from Tcl ..." comment as well ?

i'm indifferent. if you think the amount of borrowed code is still significant, leaving the comment sounds reasonable.

sterlingpickens · 2021-05-04T06:42:58Z

Is this done now ?

pierrejoye · 2021-09-03T04:52:23Z

What is the status on this PR?

sterlingpickens · 2021-09-03T05:13:01Z

It works for me, although there have been a handful of changes in master since April.
I don't know if there are issues for other people.

pierrejoye · 2021-09-03T10:29:42Z

@sterlingpickens could you push a change (anything, whitespace etc.) to run the CI again? The last run was broken. We have also improved CI, which now uses GitHub Actions on Windows, Mac and Linux, Intel and ARM. Thanks :)

sterlingpickens · 2021-09-03T18:57:41Z

I tried to sync with main, hopefully that didn't break it.
I really don't understand this CI stuff.
A problem with gd_entities ?
Maybe something that got merged wasn't supposed to happen.
This entire PR has dragged out too long and i'm lost. It might have to be started over from scratch.

sterlingpickens · 2021-09-03T19:30:42Z

Ok, it looks like adding entities.c to CMakeLists.txt was needed.

vapier · 2021-09-03T20:38:57Z

src/entities.py

+    for key in entities:
+        if name_matcher.match(key):
+            string = "\t" + "{\"" + key.replace("&", "").replace(";", "") + "\", "
+            codepoints = entities[key]["codepoints"]


pad codepoints out in place:

    codepoints = [str(x) for x in entities[key]["codepoints"]]
    if len(codepoints) == 1: codepoints.append("0")

sterlingpickens · 2021-09-03

I don't understand this.

pierrejoye · 2021-09-07

@vapier I think it is OK as it is now. This script should be run by maintainers only anyway, before a release etc.

If you don't see anything else, are you good to go with this PR? (I am :)
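
To unpack the padding suggestion: each entity carries a list of one or two integer codepoints, while the C table wants exactly two numeric fields per row. A minimal sketch, assuming the input has the shape of the WHATWG entities.json file; `pad_codepoints` is a hypothetical helper name, not something in entities.py.

```python
def pad_codepoints(raw):
    """Return the codepoints as strings, padded to two fields with "0".

    `raw` is the list of integers stored under "codepoints" for one entity.
    Hypothetical helper for illustration only.
    """
    codepoints = [str(x) for x in raw]
    if len(codepoints) == 1:
        codepoints.append("0")   # placeholder for the unused second slot
    return codepoints


# Assumed input shape: one single-codepoint and one two-codepoint entity.
sample = {
    "&amp;": {"codepoints": [38]},
    "&NotEqualTilde;": {"codepoints": [8770, 824]},
}
padded = {key: pad_codepoints(info["codepoints"]) for key, info in sample.items()}
```

Converting to strings up front is what lets the later output step join the fields directly instead of branching on the list length.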

vapier · 2021-09-03T20:43:00Z

src/entities.py

+                string = string + "0}"
+            if len(codepoints) > 2:
+                print("Warning: entity with >2 codepoints detected")
+            file_ent_c.write(string + ",\n")


once you do the above tweaks, the final code gets a lot simpler and easier to read. the string concats you've written are very hard to follow, but now you can write:

file_ent_c.write("\t{\"" + key + "\", " + ",".join(codepoints) + "},\n")

sterlingpickens · 2021-09-03

I'll have to think about this for a few hours. Everything else is done.
I'm not a Python coder. Everything I do has been in C for the past ~10 years.
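
For comparison, the single-expression form suggested above could look like the sketch below. It assumes the codepoints have already been converted to strings and padded to two fields; `format_row` is a hypothetical name, and the stripping of `&` and `;` follows the diff quoted in the earlier review comment.

```python
def format_row(key, codepoints):
    """Build one line of the generated C table.

    `key` is the entity as spelled in the source data (e.g. "&amp;") and
    `codepoints` is a list of two strings. Hypothetical helper.
    """
    name = key.replace("&", "").replace(";", "")
    return "\t{\"" + name + "\", " + ", ".join(codepoints) + "},\n"


row = format_row("&amp;", ["38", "0"])
```

With the join in place, the one-codepoint and two-codepoint cases share a single code path.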

pierrejoye · 2021-09-04T01:26:56Z

Thanks @sterlingpickens :)

About the CI: on every new PR, and on every push to master or to a PR, the builds and tests are run on the various configurations. You can see the result in the PR or directly here.

pierrejoye · 2021-09-04T01:28:45Z

@sterlingpickens And my apologies, it should have been merged long ago. I can give you a hand to get it done.

@vapier what are the remaining issues? The Python script, as long as it works, is not very important. It can always be improved later or use jq to create the header file from Json :)

pierrejoye · 2021-09-04T01:30:04Z

I approved it for now so the tests will be executed :)

sterlingpickens · 2021-09-04T01:35:42Z

Had some mistakes in that last commit.

sterlingpickens · 2021-09-04T01:43:49Z

I have to insert a 0 as a placeholder for the entities with only one codepoint; that likely contributes to the "hard to read" aspect of the entities.py main loop.

pierrejoye · 2021-09-04T01:44:14Z

Well done!

The only thing I would like is some tests for the multibyte entities, then we are good to go I think. Thoughts?

sterlingpickens · 2021-09-04T01:50:04Z

Do you mean inside entities.py ?

pierrejoye · 2021-09-04T01:54:39Z

> Do you mean inside entities.py ?

The resulting C entities as used by the FreeType functions. I suppose some entities will never disappear, so we can add a test to ensure we don't break them in future releases. Does that make sense?

sterlingpickens · 2021-09-04T01:58:43Z

Yes, if entities.py hits the assert with an error, then anything it generates will be garbage.
The way I had it before today was to just continue and warn, taking only the 1- and 2-codepoint entities to generate the C file. EDIT: It still functions this way; I'll have to think about this some more.

sterlingpickens · 2021-09-04T02:00:37Z

So, maybe create a backup, and/or refuse to overwrite, the .c/.h entities if any problem is detected.
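
A rough sketch of the "refuse to overwrite" idea: validate everything first, write to a temporary file in the same directory, and only then swap it into place. The function name, the row format, and the two-codepoint limit as a parameter are assumptions for illustration, not what entities.py does.

```python
import os
import tempfile


def write_if_valid(path, rows, max_codepoints=2):
    """Write the table to `path` only if every row is acceptable.

    `rows` is a list of (name, [codepoints]) pairs. Returns False and
    leaves `path` untouched if any row has too many (or zero) codepoints.
    Hypothetical helper.
    """
    for _name, codepoints in rows:
        if not 1 <= len(codepoints) <= max_codepoints:
            return False
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as out:
            for name, codepoints in rows:
                # Pad short rows with 0 so every row has the same width.
                padded = list(codepoints) + [0] * (max_codepoints - len(codepoints))
                fields = ", ".join(str(c) for c in padded)
                out.write('\t{"%s", %s},\n' % (name, fields))
        os.replace(tmp, path)   # replaces the old file in a single step
    except Exception:
        os.unlink(tmp)          # never leave a half-written temp file behind
        raise
    return True
```

Because the old file is replaced in a single step, a failed run leaves the previously generated file intact.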

pierrejoye · 2021-09-04T02:10:29Z

Backup is in git so we don't need it, no? The tests should ensure two things:
- the generated entities C file is correct and builds
- the GD functions handle multibyte input correctly

sterlingpickens · 2021-09-04T03:05:10Z

I don't think FreeType will have an issue with any codepoints we give it (0 to UINT32_MAX); it'll just print a placeholder glyph if one is not valid. For entities.c, any entity of the form {char *, uint32_t, uint32_t} should be fine.
A row like {char *, uint32_t, uint32_t, uint32_t} would be a problem, as would NR_OF_ENTITIES being wrong. We shouldn't have to worry about that if entities.py works right.

I guess you would like something in the tests directory which validates entities.c/.h ?
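
One way such a check could look, as a sketch only: scan the generated rows, confirm each has exactly a name plus two unsigned 32-bit values, and compare the row count to the declared NR_OF_ENTITIES. The row layout matched by `ROW_RE` and the function `check_table` are assumptions about the generated file, not taken from the patch.

```python
import re

# Assumed row layout: {"name", <uint>, <uint>} with an optional trailing comma.
ROW_RE = re.compile(r'^\s*\{"([^"]+)",\s*(\d+),\s*(\d+)\},?\s*$')


def check_table(c_lines, declared_count):
    """Return True if every table row is well formed and the count matches.

    `c_lines` are lines of the generated C file; `declared_count` is the
    value of NR_OF_ENTITIES. Hypothetical helper.
    """
    rows = 0
    for line in c_lines:
        if not line.strip().startswith("{"):
            continue                      # skip anything that is not a table row
        match = ROW_RE.match(line)
        if match is None:
            return False                  # wrong number of fields, bad syntax
        if any(int(value) > 0xFFFFFFFF for value in match.group(2, 3)):
            return False                  # would not fit in uint32_t
        rows += 1
    return rows == declared_count
```

A row with a third codepoint fails the pattern, and a stale NR_OF_ENTITIES fails the final comparison, which are the two problems named above.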

pierrejoye · 2021-09-04T04:21:05Z

yes :)

pierrejoye · 2021-09-04T04:21:56Z

And also the code that handles the input text and passes the right codepoints to FreeType.

sterlingpickens · 2021-09-05T19:10:02Z

Ok, I wrote several of those little test programs back in April. I'll try to work up the energy to repurpose and clean some code up for a tests subdir.
For example:

- Extract all supported glyphs from 1 or 2 common fonts and run them through libgd.
- Create an entities.txt containing all currently known entities (each represented in the 4 notations), and run that through.
- Cycle through all supported Unicode ranges, just to verify there are no segfaults or other problems.
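
A sketch of how the lines of such an entities.txt could be produced. The thread does not spell out the four notations, so the four used here (named, decimal reference, hex reference, raw characters) are a guess, and `entity_test_lines` is a hypothetical name.

```python
def entity_test_lines(entities):
    """Yield several spellings of each entity for use as rendering input.

    `entities` maps names like "&amp;" to {"codepoints": [...]}, the shape
    assumed for the source data. The choice of notations is an assumption.
    """
    for key, info in entities.items():
        codepoints = info["codepoints"]
        yield key                                           # named: &amp;
        yield "".join("&#%d;" % cp for cp in codepoints)    # decimal: &#38;
        yield "".join("&#x%X;" % cp for cp in codepoints)   # hex: &#x26;
        yield "".join(chr(cp) for cp in codepoints)         # raw characters


lines = list(entity_test_lines({"&amp;": {"codepoints": [38]}}))
```

Each line could then be fed to the string-rendering function, and the spellings of one entity would be expected to produce the same output.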

pierrejoye · 2021-09-07T03:54:06Z

Thank you @sterlingpickens !! Awesome work!

I will do a 2.3.3 release today or later this week, then master will become 2.4 and we can merge this PR. finally, thank you for the hard work and patience :)

maximepvrt · 2023-03-22T11:14:28Z

@pierrejoye Is merging this pull request still part of the project?

Issue-185 changes

00e3fc9

sterlingpickens added 4 commits April 23, 2021 19:46

add entity_gen.c and entities.html with updates

9cd8da8

add entity_gen.c and entities.html with updates

6ac0b30

delete a couple extra lines

c8a561b

dont conflict with define

4223cda

sterlingpickens added 2 commits April 24, 2021 18:52

dont ship entities.h

ca919bb

alpha entities.py work

7aaa95e

sterlingpickens added 5 commits April 25, 2021 19:54

sync with upstream

f534d28

make gdImageStringFTEx ready for dual codepage entities

3849d5b

entities/malloc cleanups

c4b7630

typo and initialize ch_entity

2faaefe

use uint32_t in entities_s

6e734ac

sterlingpickens added 2 commits April 26, 2021 05:28

ensure all hex/dec entities are 1 codepoint

82b1fcc

gdRealloc members vs. size error

3a76ca4

sterlingpickens mentioned this pull request Apr 28, 2021

Add support for 4 byte UTF-8 characters and HTML entities over  and 𘚟 #185

Open

eliminate an extra if

e4952cc

vapier requested changes Apr 29, 2021

Merge branch 'master' of https://github.com/libgd/libgd

5c50ba3

CMakeLists.txt add entities.c

f485a61

vapier requested changes Sep 3, 2021

Requests-01

c0486d3

Requests-01b

2b762a2
