Emojis/Grapheme clusters seem to be broken in pyte #131

Open
chubin opened this issue Apr 3, 2020 · 2 comments

chubin commented Apr 3, 2020

Consider this Python 3 code:

# -*- coding: utf-8 -*-

from __future__ import print_function, unicode_literals

import pyte

if __name__ == "__main__":
    emoji_string = "☁️"
    print(emoji_string.encode("utf-8").hex())
    print("---")

    screen = pyte.Screen(80, 24)
    stream = pyte.Stream(screen)
    stream.feed(emoji_string)
    for character in screen.display[0][:3]:
        print(character.encode("utf-8").hex())

emoji_string contains a single grapheme cluster,
which is displayed like this in a terminal, editor, etc.:

[Screenshot_2020-04-03_14-39-04: the emoji rendered as a single glyph]

This emoji is displayed as a single glyph, but it consists of two code points: U+2601 (e29881 in UTF-8) and U+FE0F (efb88f in UTF-8).
Pyte seems to drop the second part of the cluster (or everything after the first part?),
so the output of the program looks like this:

e29881efb88f
---
e29881
20
20

We see that efb88f was dropped, and immediately after e29881, spaces follow (20).
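
To make the two halves of the cluster easier to identify, the string can be broken down with the standard library alone. The following is a minimal sketch; the helper name `describe_code_points` is made up for illustration and is not part of pyte.

```python
import unicodedata


def describe_code_points(text):
    """Return (code point, Unicode name, general category) for each character.

    Hypothetical helper for illustration only; it is not part of pyte.
    """
    return [
        (
            "U+{:04X}".format(ord(ch)),
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
        )
        for ch in text
    ]


if __name__ == "__main__":
    # The same two code points as emoji_string above, written as escapes.
    for entry in describe_code_points("\u2601\ufe0f"):
        print(entry)
```

For this string it reports U+2601 (CLOUD, category So) followed by U+FE0F (VARIATION SELECTOR-16, category Mn). The second entry corresponds to the efb88f bytes that are missing from the screen output.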

Is this a bug in pyte, or is it expected behaviour?
Or have I missed some configuration mode?

superbobry (Collaborator) commented

This is very likely a bug. Feel free to submit a PR ;)

chubin commented Apr 12, 2020

I have written a small workaround for this problem. It works fine for me, but I don't think it is a good solution for this bug.

This is how I do it:

import grapheme  # third-party package: pip install grapheme


def _fix_graphemes(text):
    """
    Extract multi-code-point grapheme clusters that pyte cannot
    handle correctly because of the bug pyte#131.
    Each such cluster is omitted from the text, replaced with a
    placeholder, and collected in a list.

    Return:
        text_without_graphemes, graphemes
    """

    output = ""
    graphemes = []

    for gra in grapheme.graphemes(text):
        if len(gra) > 1:
            # Multi-code-point cluster: store it and emit a placeholder.
            character = "!"
            graphemes.append(gra)
        else:
            character = gra
        output += character

    return output, graphemes

I extract the graphemes before rendering, like this:

text, graphemes = _fix_graphemes(text)

and then after rendering I put them back.

It works as it should, but I am not sure that this method is (1) general enough or (2) good for pyte, because it introduces a new dependency: grapheme.
