Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue-185 changes #695

Open
wants to merge 41 commits into
base: master
Choose a base branch
from
Open

Conversation

sterlingpickens
Copy link

These changes should resolve #185
Please, test and let me know if any other changes need to be made.
It is my intent not to break things for other people.

@vapier
Copy link
Member

vapier commented Apr 23, 2021

please undo all the whitespace changes. they don't look necessary or related and make it very hard to understand what the patch is actually doing.

@sterlingpickens
Copy link
Author

sterlingpickens commented Apr 23, 2021

I completely replaced 1 function with 3, but I can add the extra whitespaces everywhere if you like (maintaining the style used in the replaced function) ?
The file as a whole is already mixed style.

@sterlingpickens
Copy link
Author

I'll try to clarify.
entities.h was updated to include all current html5 entities instead of only html4.
The gdTcl_UtfToUniChar function was removed. The comp_entities function was removed.
gdTcl_UtfToUniChar became gd_Entity_To_Unicode (no longer requires comp_entities), gd_JISX0208_To_Unicode (2-byte encoding not utf-8), and gd_UTF8_To_Unicode (now accepts 4-byte utf8 for the official unicode limit).
The reasoning behind splitting the functions is because JISX0208 and UTF8 are mutually exclusive and overlap.
The entities all start with &, so it kind of piggy backs regardless of the encoding, i've maintained that behavior.

I considered adding JISX0213, but that's a different matter.

@vapier
Copy link
Member

vapier commented Apr 24, 2021

the indentation in src/entities.h is incorrect.

it's unclear how you're generating this, but it isn't with the entities.tcl that we've used historically. however it's done, entities.tcl needs to be updated or replaced.

@sterlingpickens
Copy link
Author

sterlingpickens commented Apr 24, 2021

I wrote a C program to do it, as the format of the website changed and it no longer works.
I can remove the extra tab, but I figured it wouldn't matter.
http://sterlingdesktops.com/pub/test/entities.c

In an ideal situation there would be an entities.h with the prototypes and an entities.c containing the function and the lookup table. jisx0208.h as well. I didn't want to modify the build system or take things that far.

We're kinda just putting a bandade on things, without a complete rewrite of gdft.c. There are a lot of things that could be improved.

@vapier
Copy link
Member

vapier commented Apr 24, 2021

we want that logic in the repo so we can verify & keep in sync easier. and we don't want to keep around old scripts that are known to be outdated.

@sterlingpickens
Copy link
Author

sterlingpickens commented Apr 24, 2021

I'd have to add the header portion of the printing and make it output the file as a whole.
Right now it is just outputting the relevant lines.
I suppose I can do that. I never really used tcl, so I can't be certain of how to make that perfect.

I was wondering if there was an irc channel for dev discussion ?
Is everything done here on github ?

@sterlingpickens
Copy link
Author

There is another way I can write gd_UTF8_To_Unicode that uses half as many lines of code as well.

@sterlingpickens
Copy link
Author

sterlingpickens commented Apr 24, 2021

Ok, i've got the entities_gen.c to replace entities.tcl.
What else should I do ?

My use of FT_Select_Charmap(face, charmap->encoding); is not ideal.
When the user calls the gdImageStringFTEx function with a selected encoding, would the use of FT_Select_Charmap(face, FT_ENCODING_UNICODE); break things for anyone ?

It might be best to rewrite that entire section to map the selected encoding to an FT_Select_Charmap call
ie: FT_Select_Charmap(face, FT_ENCODING_ADOBE_CUSTOM);
Then we can check the return for error (maybe something for a future pull).

I think only FT_Set_Charmap existed when gdft.c was written.

@sterlingpickens
Copy link
Author

sterlingpickens commented Apr 26, 2021

If you want to cancel this pull request I don't mind. I can reopen again later once i'm done with all the other changes I need to make. It'll likely be in a broken state for a few days.

@vapier
Copy link
Member

vapier commented Apr 26, 2021

shrug it doesn't cost us anything to leave open

@sterlingpickens
Copy link
Author

I think it works now. I'll have to do some extensive testing and look everything over, but i'm close.

@sterlingpickens
Copy link
Author

As far as I know this is done, unless someone else sees a problem.

src/entities.py Outdated Show resolved Hide resolved
src/entities.py Outdated Show resolved Hide resolved
src/Makefile.am Outdated Show resolved Hide resolved
src/entities.py Show resolved Hide resolved
src/entities.py Outdated Show resolved Hide resolved
src/gdft.c Outdated Show resolved Hide resolved
src/gdft.c Outdated Show resolved Hide resolved
src/gdft.c Show resolved Hide resolved
src/gdft.c Outdated Show resolved Hide resolved
src/gdft.c Outdated Show resolved Hide resolved
@sterlingpickens
Copy link
Author

Should we remove the old "* gdTcl_UtfToUniChar is borrowed from Tcl ..." comment as well ?
That is 41 lines.
Or are we still sharing enough to mention it ?

@vapier
Copy link
Member

vapier commented May 2, 2021

Should we remove the old "* gdTcl_UtfToUniChar is borrowed from Tcl ..." comment as well ?

i'm indifferent. if you think the amount of borrowed code is still significant, leaving the comment sounds reasonable.

@sterlingpickens
Copy link
Author

Is this done now ?

@pierrejoye
Copy link
Contributor

What is the status on this PR?

@sterlingpickens
Copy link
Author

It works for me, although there have been a handfull of changes in master since april.
I don't know if there are issues for other people.

@pierrejoye
Copy link
Contributor

@sterlingpickens could you apply a change (whatever, white space etc) to run the CI again? Last one it was broken. We also improved CI now using github actions on Windows, Mac and linux, intel and ARM. Thanks :)

@sterlingpickens
Copy link
Author

I tried to sync with main, hopefully that didn't break it.
I really don't understand this CI stuff.
A problem with gd_entities ?
Maybe something that got merged wasn't supposed to happen.
This entire PR has dragged out too long and i'm lost. It might have to be started over from scratch.

@sterlingpickens
Copy link
Author

Ok, it looks like adding entities.c to CMakeLists.txt was needed.

src/entities.py Outdated Show resolved Hide resolved
src/entities.py Outdated Show resolved Hide resolved
src/entities.py Outdated Show resolved Hide resolved
src/entities.py Outdated Show resolved Hide resolved
src/entities.py Outdated Show resolved Hide resolved
src/entities.py Outdated Show resolved Hide resolved
for key in entities:
if name_matcher.match(key):
string = "\t" + "{\"" + key.replace("&", "").replace(";", "") + "\", "
codepoints = entities[key]["codepoints"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pad codepoints out in place:

codepoints = [str(x) for x in entities[key]["codepoints"]]
if len(codepoints) == 1:
    codepoints.append("0")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vapier I think it is ok as it is now. This script should be ran by maintainers only anyway, before a release etc.

If you don't see any other thing, are you good to go with this PR? (I am :)

string = string + "0}"
if len(codepoints) > 2:
print("Warning: entity with >2 codepoints detected")
file_ent_c.write(string + ",\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once you do the above tweaks, the final code gets a lot simpler and easier to read. the string concats you've written are very hard to follow, but now you can write:

file_ent_c.write("\t{\"" + key + "\", " + ",".join(codepoints) + "},\n")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to think about this for a few hours. Everything else is done.
I'm not a python coder. Everything I do has been is in C for the past ~10yrs.

src/entities.py Outdated Show resolved Hide resolved
.gitignore Show resolved Hide resolved
@pierrejoye
Copy link
Contributor

Thanks @sterlingpickens :)

About the CI, ono every new PR, or push to master or a PR, the builds&tests are ran on the various configurations. You can see the result in the PR or directly here.

@pierrejoye
Copy link
Contributor

@sterlingpickens And my apologies, it should have been merged long ago. I can give you a hand to get it done.

@vapier what are the remaining issues? The Python script, as long as it works, is not very important. It can always be improved later or use jq to create the header file from Json :)

@pierrejoye
Copy link
Contributor

I approved it for now so the tests will be executed :)

@sterlingpickens
Copy link
Author

Had some mistakes in that last commit.

@sterlingpickens
Copy link
Author

I have to insert a 0 as a placeholder for the ones with only one codepage, that likely contributes to the "hard to read" aspect of the entities.py main loop.

@pierrejoye
Copy link
Contributor

image

Well done!

The only thing I would like is some tests for the multibyte entities, then we are good to go I think. Thoughts?

@sterlingpickens
Copy link
Author

Do you mean inside entities.py ?

@pierrejoye
Copy link
Contributor

Do you mean inside entities.py ?

the resulting C entities used from freetype functions. I suppose some will never disappear, so we can add a test to ensure we don't break it in future releases. Does it make sense?

@sterlingpickens
Copy link
Author

sterlingpickens commented Sep 4, 2021

Yes, if the entites.py hits the assert with an error. Then anything it generates will be garbage.
The way I had it before today was to just continue and warn, only taking the 1 and 2 codepage entities to generate the C file. EDIT: It still functions this way, i'll have to think about this some more.

@sterlingpickens
Copy link
Author

So, maybe create a backup, and/or refuse to overwrite, the .c/.h entities if any problem is detected.

@pierrejoye
Copy link
Contributor

pierrejoye commented Sep 4, 2021 via email

@sterlingpickens
Copy link
Author

I don't think freetype will have an issue with any codepages we give it(0 to UINT32_MAX), it'll just print a placeholder glyph if it's not valid. For the entities.c any entity of the form {char *, uint32_t, uint32_t} should be fine.
ie: {char *, uint32_t, uint32_t, uint32_t} would be a problem or NR_OF_ENTITIES being wrong. We shouldn't have to worry about that if entities.py works right.

I guess you would like something in the tests directory which validates entities.c/.h ?

@pierrejoye
Copy link
Contributor

pierrejoye commented Sep 4, 2021 via email

@pierrejoye
Copy link
Contributor

pierrejoye commented Sep 4, 2021 via email

@sterlingpickens
Copy link
Author

sterlingpickens commented Sep 5, 2021

Ok, I wrote several of those little test programs back in april. I'll try to work up the energy to repurpose and clean some code up for a tests subdir.
ie:

  1. Extract all supported glyphs from 1 or 2 common fonts and run it through libgd.
  2. Create a entities.txt, containing all current known entities(represented for each of the 4 notations), and run that through.
  3. Cycle through all supported unicode ranges, just to verify there are no segfaults or other problems.

@pierrejoye
Copy link
Contributor

Thank you @sterlingpickens !! Awesome work!

I will do a 2.3.3 release today or later this week, then master will become 2.4 and we can merge this PR. finally, thank you for the hard work and patience :)

@maximepvrt
Copy link

@pierrejoye Is merging this pull request still part of the project?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for 4 byte UTF-8 characters and HTML entities over  and 𘚟
4 participants