Idea: Produce intermediate representation from parsing #147

Open
abingham opened this issue Jul 7, 2021 · 6 comments

@abingham

abingham commented Jul 7, 2021

The core of this proposal is to introduce an intermediate representation (IR) of parsed data between the stream and the screen. Rather than the stream feeding its parsed results directly to the screen, it would generate a stream of objects representing the parsed data, and these could be forwarded to the Screen API and potentially to other clients. This IR could also be stored, analyzed, replayed, etc.

This idea came out of some work I was doing to learn more about control codes. In particular, I borrowed heavily (stole) from pyte's Stream class in my Parser implementation. I think this kind of thing could be introduced to pyte with full backwards compatibility, and it would mean I wouldn't need to duplicate Stream. I know this by itself isn't a very compelling argument for modifying pyte, but it might be useful in pyte as well (e.g. I saw some issues related to improving debugging).

In any event, I thought I'd float the idea and see what you thought. I should be able to do most of the coding, though of course I'd appreciate any guidance you've got.
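As a rough illustration of the proposal, the sketch below models one IR record as a handler name plus its arguments and replays a stored list of records against any object that exposes matching methods. `ParsedEvent`, `replay`, and `CollectingTarget` are hypothetical names invented for this example; they are not part of pyte, and the events are written by hand rather than produced by a parser.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ParsedEvent:
    """One hypothetical IR record: a handler name plus its arguments."""
    name: str
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


def replay(events, target):
    """Forward each stored event to the method of the same name on `target`."""
    for event in events:
        getattr(target, event.name)(*event.args, **event.kwargs)


class CollectingTarget:
    """Stand-in for a screen; it only remembers what it was given."""

    def __init__(self):
        self.text = []
        self.title = None

    def draw(self, data):
        self.text.append(data)

    def set_title(self, title):
        self.title = title


# Events as a parser might emit them for b"\x1b]2;new title\x07red"
# (assumption: written by hand here, not generated by pyte).
recorded = [
    ParsedEvent("set_title", ("new title",)),
    ParsedEvent("draw", ("red",)),
]

target = CollectingTarget()
replay(recorded, target)
```

Because the records are plain data, the same list could in principle be stored, inspected, or replayed against several targets.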

@huangyunict

+1 to this idea. In the current pyte implementation, the stream and screen are tightly coupled, so it is difficult to inject a custom processor between them.

@milahu

milahu commented Apr 10, 2022

already possible: use a custom screen object

# pyte/test_parser.py
# python3 -m pyte.test_parser

# ansi color codes https://gist.github.com/Prakasaka/219fe5695beeb4d6311583e79933a009

#from pyte.screens import Screen, DiffScreen, HistoryScreen, DebugScreen
from .screens import Screen, DiffScreen, HistoryScreen, DebugScreen

#from pyte.streams import Stream, ByteStream
from .streams import Stream, ByteStream

terminal_width = 40
terminal_height = 4

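class CustomScreen(Screen):
    # Each override logs the event it receives from the stream.
    def draw(self, *args):
        print("custom listener: draw", repr(args))
        super().draw(*args)  # still write the text into the screen buffer
    # The two handlers below only log; they do not call super(),
    # so the title and the colors are not applied to the screen.
    def set_title(self, *args):
        print("custom listener: set_title", repr(args))
    def select_graphic_rendition(self, *args):
        print("custom listener: select_graphic_rendition", repr(args))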

screen = CustomScreen(terminal_width, terminal_height)
stream = ByteStream(screen)

stream.feed(b"".join([
    b"\x1b", # esc = \e
    b"]", # osc
    b"2;new title", # params: 2, "new title"
    b"\x07", # bel = \a -> end of string

    b"\x1b", # esc
    b"[", # csi
    b"0;31", # params: 0, 31 -> red
    b"m", # select_graphic_rendition

    b"red", # text

    b"\x1b[0;32m", # esc csi green

    b"green", # text

    b"\x1b[0m", # reset style

    b"default", # text
]))

term_lines = screen.display[:] # copy array
for line_idx, line in enumerate(term_lines):
    print(f"{line_idx:4d} {line} ¶")

output

custom listener: set_title ('new title',)
custom listener: select_graphic_rendition (0, 31)
custom listener: draw ('red',)
custom listener: select_graphic_rendition (0, 32)
custom listener: draw ('green',)
custom listener: select_graphic_rendition (0,)
custom listener: draw ('default',)
   0 redgreendefault                          ¶
   1                                          ¶
   2                                          ¶
   3                                          ¶

@superbobry
Collaborator

As @milahu points out, this should be doable without any changes to pyte.

The coupling between Stream and Screen is tight in the sense that the names of event handlers are fixed, but Stream does not assume anything about the implementation of Screen. So, you could have a custom Screen class which emits IR instructions instead of doing buffer manipulations. pyte.DebugScreen already does something like that, except that it logs the intercepted events to stderr.
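A minimal sketch of that idea, assuming only that the stream calls handlers by name: the class below answers any handler name and records the call instead of touching a buffer. `RecordingScreen` is a hypothetical name, and the handlers are called by hand here to keep the example free of dependencies; whether it can be passed to pyte's `Stream` unchanged has not been checked.

```python
class RecordingScreen:
    """Records handler calls as (name, args, kwargs) tuples."""

    def __init__(self):
        self.events = []

    def __getattr__(self, name):
        # __getattr__ only runs for attributes that do not exist, so
        # `events` is unaffected. Private names are rejected so that
        # copy/pickle style lookups fail normally.
        if name.startswith("_"):
            raise AttributeError(name)

        def handler(*args, **kwargs):
            self.events.append((name, args, kwargs))

        return handler


screen = RecordingScreen()

# Calls in the order a stream would make them for the example input above.
screen.set_title("new title")
screen.select_graphic_rendition(0, 31)
screen.draw("red")

for name, args, kwargs in screen.events:
    print(name, args, kwargs)
```

Catching every name with `__getattr__` avoids listing the handlers one by one, at the cost of also accepting misspelled or unknown names silently.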

@abingham
Author

> So, you could have a custom Screen class

This was exactly the approach I took at first. It turned out that didn’t give me everything I needed, though. In particular, the information about precisely which bytes were parsed for each call to a Screen method was lost. I suspect that pyte itself wouldn’t benefit greatly from providing this kind of information, though, so there may not be a compelling argument for making it here.

@milahu

milahu commented Apr 10, 2022

> precisely which bytes were parsed for each call to a Screen method

doable with near-zero overhead

https://github.com/milahu/pyte/tree/parser-pass-token-source

edit: fixed an edge case where a token spans two data buffers

$ git checkout master
$ BENCHMARK=tests/captured/htop.input python benchmark.py
htop.input->Screen: Mean +- std dev: 144 ms +- 5 ms
htop.input->DiffScreen: Mean +- std dev: 145 ms +- 5 ms
htop.input->HistoryScreen: Mean +- std dev: 378 ms +- 9 ms

$ git checkout parser-pass-token-source
$ BENCHMARK=tests/captured/htop.input python benchmark.py
htop.input->Screen: Mean +- std dev: 144 ms +- 5 ms
htop.input->DiffScreen: Mean +- std dev: 145 ms +- 4 ms
htop.input->HistoryScreen: Mean +- std dev: 379 ms +- 11 ms

example use

class CustomScreen(Screen):
    last_offset = 0
    def draw(self, *args, source=""):
        print("custom listener: draw", repr(args)) # source == args[0]
        super().draw(*args)
    def set_title(self, *args, source=""):
        print("custom listener: set_title", repr(args), "source", repr(source))
    def select_graphic_rendition(self, *args, source=""):
        print("custom listener: select_graphic_rendition", repr(args), "source", repr(source))

screen = CustomScreen(terminal_width, terminal_height)
stream = ByteStream(screen)

# ...
# same code as above

output

custom listener: set_title ('new title',) source '\x1b]2;new title\x07'
custom listener: select_graphic_rendition (0, 31) source '\x1b[0;31m'
custom listener: draw ('red',)
custom listener: select_graphic_rendition (0, 32) source '\x1b[0;32m'
custom listener: draw ('green',)
custom listener: select_graphic_rendition (0,) source '\x1b[0m'
custom listener: draw ('default',)
   0 redgreendefault                          ¶
   1                                          ¶
   2                                          ¶
   3                                          ¶
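One possible use of the `source` keyword shown above is to map each handler call back to a span of the input. The sketch below is an assumption-laden illustration: `SpanRecorder` is a hypothetical class, the handlers are called by hand with the same `source` strings as in the output above, and plain text is assumed to arrive through `draw` with its first argument equal to the source text. Offsets count characters of `source`, which matches byte offsets only for ASCII input.

```python
class SpanRecorder:
    """Records (handler name, start offset, end offset) for each call."""

    def __init__(self):
        self.offset = 0
        self.spans = []

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)

        def handler(*args, source="", **kwargs):
            # Assumption: draw() may be called without `source`, in which
            # case its first argument is the source text itself.
            if not source and name == "draw" and args:
                source = args[0]
            start = self.offset
            self.offset += len(source)
            self.spans.append((name, start, self.offset))

        return handler


rec = SpanRecorder()
rec.set_title("new title", source="\x1b]2;new title\x07")
rec.select_graphic_rendition(0, 31, source="\x1b[0;31m")
rec.draw("red")

for span in rec.spans:
    print(span)
```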

> I suspect that pyte itself wouldn’t benefit greatly from providing this kind of information, though, so there may not be a compelling argument for making it here.

yepp, for pyte this is just wasted cpu time
but it would be nice to use the pyte source to compile such a parser
https://stackoverflow.com/questions/56487216/how-can-i-convert-python-code-into-a-parse-tree-and-back-into-the-original-code
