twitter: replace unicodeit with unicodeitplus #664

Open
GraemeWatt opened this issue Jun 16, 2023 · 1 comment

@GraemeWatt
Member

The new unicodeitplus package looks better suited than unicodeit for converting LaTeX expressions in paper titles to Unicode for the purpose of tweeting. It overcomes some of the limitations of UnicodeIt mentioned in svenkreiss/unicodeit#25. Switching over is a simple matter of replacing `unicodeit.replace` with `unicodeitplus.parse`. Most of the cleanup operations in the `cleanup_latex` function will no longer be needed, although unicodeitplus does not yet handle `~` or `\rm`. Before making the switch, it would be good to run tests over all (or at least many) existing paper titles to identify the remaining limitations of `unicodeitplus.parse`. I've already identified some problems with `\sqrt` when its argument is complicated, which I'll raise in a separate issue.
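The issue does not show the body of `cleanup_latex`. As a hypothetical sketch only, assuming `~` and `\rm` are the only constructs still needing pre-processing, the slimmed-down cleanup could look like this. The names `pre_clean` and `to_unicode` are illustrative, and the converter is passed in as a callable (e.g. `unicodeitplus.parse`) so the sketch runs without either package installed.

```python
import re
from typing import Callable

# Hypothetical pre-processing for the two constructs the issue says
# unicodeitplus does not yet handle: "~" (a LaTeX non-breaking space)
# and the old-style font switch "\rm".
_TILDE = re.compile(r"(?<!\\)~")   # unescaped "~" only; leaves "\~" alone
_RM = re.compile(r"\\rm\b\s*")     # "\rm" plus trailing whitespace, not "\rmfamily"


def pre_clean(title: str) -> str:
    """Rewrite the constructs the converter is assumed not to handle."""
    title = _TILDE.sub(" ", title)
    title = _RM.sub("", title)
    return title


def to_unicode(title: str, convert: Callable[[str], str]) -> str:
    """Pre-clean a title, then hand it to a LaTeX-to-Unicode converter.

    `convert` would be unicodeitplus.parse after the switch (or
    unicodeit.replace before it).
    """
    return convert(pre_clean(title))


if __name__ == "__main__":
    # Identity converter, just to show what the pre-cleaning step does.
    print(to_unicode(r"Measurement at $\sqrt{s}=13$~TeV in {\rm pp}", lambda s: s))
```

Whether any leftover braces (e.g. `{pp}`) are handled well is up to the converter itself, which is what the proposed tests over existing titles would reveal.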

@GraemeWatt
Member Author

I just wrote a Jupyter notebook that gets the titles of all HEPData records (almost 10,000) and compares the output from latex2text, unicodeit and unicodeitplus. I'll wait for a future release of unicodeitplus that addresses the remaining limitations before switching from unicodeit.
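The notebook itself is not included here. A minimal sketch of the comparison step, assuming the titles have already been fetched into a list, might run each converter on every title and keep only those where the outputs differ. The function name `compare_converters` is hypothetical; in practice the dict could map names to, for example, `LatexNodes2Text().latex_to_text` (assuming latex2text refers to the pylatexenc module), `unicodeit.replace` and `unicodeitplus.parse`.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Converter = Callable[[str], str]


def compare_converters(
    titles: Iterable[str], converters: Dict[str, Converter]
) -> List[Tuple[str, Dict[str, str]]]:
    """Run every converter on every title and keep the titles whose
    outputs disagree, so they can be inspected by hand.

    A converter that raises is recorded as "ERROR: <exception type>"
    instead of aborting a run over thousands of titles.
    """
    disagreements = []
    for title in titles:
        outputs = {}
        for name, convert in converters.items():
            try:
                outputs[name] = convert(title)
            except Exception as exc:
                outputs[name] = f"ERROR: {type(exc).__name__}"
        # More than one distinct output means the converters disagree.
        if len(set(outputs.values())) > 1:
            disagreements.append((title, outputs))
    return disagreements
```

Filtering to disagreements keeps the list to review short, since titles without LaTeX should come out identical from all three converters.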

Projects
Status: On Hold
Development

No branches or pull requests

1 participant