Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reverse display of trees for RTL text #4

Open
mr-martian opened this issue Jul 21, 2022 · 5 comments
Open

Reverse display of trees for RTL text #4

mr-martian opened this issue Jul 21, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@mr-martian
Copy link

The first sentence of Exodus (with random head service):
image

Text: וְאֵ֗לֶּה שְׁמֹות֙ בְּנֵ֣י יִשְׂרָאֵ֔ל הַבָּאִ֖ים מִצְרָ֑יְמָה אֵ֣ת יַעֲקֹ֔ב אִ֥ישׁ וּבֵיתֹ֖ו בָּֽאוּ׃

It would be nice if the tree would display RTL to match the text rather than LTR as it is currently.

Also, is there a reason for never displaying supertokens?

@amir-zeldes
Copy link
Contributor

I'm not sure how complex RTL support might be, I'll leave that to @lgessler and @lauren-lizzy-levine to answer, but regarding super tokens, something looks weird here - the tokens that are part of MWTs are not rendered at all and instead we see underscores...

I think the datamodel does actually support MWT, so this may be a bug. But aside from the underscores, I don't think that MWTs should necessarily be visualized in this view, since they can't be annotated and would mainly be distracting to visualize. If we end up implementing support for word tokenization in the segmentation module, it might make more sense to figure out MWT rendering there (since segmentation is the basic annotation operation with which you would edit MWTs - they have no POS, deprel or lemma)

@mr-martian
Copy link
Author

The underscores in this case are because I initially didn't have surface forms for components of MWTs in Ancient_Hebrew-PTNK, so that's an issue with my data rather than your code.

@amir-zeldes
Copy link
Contributor

Oh, phew, that's a relief to hear, then the system is working as expected. We should leave this issue open for the RTL thing, but for MWTs I don't think we'll add them in this particular view in the future (the segmenter may be a good place to represent them at some point).

@lgessler lgessler added the bug Something isn't working label Aug 6, 2022
@lgessler
Copy link
Collaborator

lgessler commented Aug 6, 2022

Hmm, suspect this would require medium effort. The fix would require detecting when a sentence is RTL and modifying the SVG rendering to accommodate. I unfortunately don't think we'll get to this soon, so in the meantime a transcription into an LTR script could be a workaround.

@amir-zeldes
Copy link
Contributor

Agreed, this is not a priority while there are a lot of other things to do on the system. For posterity I'll just add that I think RTL is implemented in Arborator now, and I remember dimly implementing it once before it was supported in the official arborator code, so if the algorithm is the same as Arborator's I probably have written down somewhere what code to change (it wasn't tons).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants