Improved support for non-Latin language inclusion in report PDFs, and validate translations for new phrases and new languages #417
…an, Hindi, Tamil script etc) via uharfbuzz cpython module; had to install gcc and g++ for this to work on the arm64 build; also updated Jupyter Lab start alias as per #399. Also added new languages and auto-translations of these in anticipation of translation validation, in support of #367. This image hasn't been fully tested yet, and more language features require implementation (e.g. right to left template support for Arabic and Persian; needs to be added as new issue)
After apparently successfully installing uharfbuzz, I timed report generation both without and with text shaping enabled. The image resources had already been generated, so those steps were skipped; I was only interested in the impact on PDF generation for three languages where shaping wouldn't necessarily be expected to make a major difference (English, Spanish, and Chinese - Simplified). For the change, I added global-indicators/process/subprocesses/_utils.py Lines 1128 to 1137 in f8789eb
From the GHSCI console I ran Before:
After:
So, it took 4 seconds longer to produce 9 reports in 3 languages with text shaping than without. That's a pretty negligible difference in the scheme of things.
Was there an aesthetic difference? I didn't really expect to see one with these fonts; I just wanted to confirm there were no adverse impacts. I couldn't really notice any. On the left here is before, on the right is after (fwiw, no obvious at-a-glance change in Chinese or Latin script):
So, what if we tried this with Hindi text? I tried a few fonts, but ended up with one recommended by fpdf2 itself, which apparently mostly works. Comparing before and after, the text-shaped version is different, but is it more correct? Fingers crossed!
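For anyone wanting to repeat the comparison outside the GHSCI console, here is a minimal sketch of the timing approach described above. It is not the project's code: `time_report_runs` and `render_with_fpdf2` are hypothetical names, and the font path and sample text are placeholders you would need to supply. The only fpdf2-specific step is `set_text_shaping()`, which assumes a recent fpdf2 release with uharfbuzz installed.

```python
import time
from typing import Callable, Dict, Iterable


def time_report_runs(
    render: Callable[[str, bool], None],
    languages: Iterable[str],
) -> Dict[bool, float]:
    """Total wall-clock seconds to render every language, keyed by shaping flag."""
    languages = list(languages)
    totals: Dict[bool, float] = {}
    for shaping in (False, True):  # "before" (no shaping), then "after"
        start = time.perf_counter()
        for language in languages:
            render(language, shaping)
        totals[shaping] = time.perf_counter() - start
    return totals


def render_with_fpdf2(font_path: str, text: str, out_path: str, shaping: bool) -> None:
    """Hypothetical renderer; assumes fpdf2 with text shaping support and uharfbuzz."""
    from fpdf import FPDF  # lazy import so the timing helper runs without fpdf2

    pdf = FPDF()
    pdf.add_page()
    pdf.add_font("ReportFont", fname=font_path)  # a TTF covering the target script
    pdf.set_font("ReportFont", size=14)
    pdf.set_text_shaping(shaping)  # the toggle being compared
    pdf.write(8, text)
    pdf.output(out_path)
```

In use, `render` would wrap whatever actually generates a report for one language, for example a lambda that calls `render_with_fpdf2` with a font and phrase for that language. The difference `totals[True] - totals[False]` then corresponds to the overhead discussed above.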
… reports in general with additional template refinements following feedback; also implemented a 'download_file()' function, currently only applied for fonts, it could be extended to be used for other dataset types, as per #418
…ally verbose phrases that frequently fail, for #417
When producing our 25 city reports, we experienced issues supporting some non-Latin scripts like Tamil:
global-healthy-liveable-cities/global_scorecards#7
However, the PDF templating software we use (fpdf2) has recently implemented changes that should provide better support for non-Latin languages:
https://py-pdf.github.io/fpdf2/Unicode.html#note-on-non-latin-languages
To take advantage of this though, we will need to implement and test changes to our software: