Streaming-friendly format #2

Open
Yoric opened this issue Aug 31, 2018 · 0 comments

Yoric commented Aug 31, 2018

Since we intend to favor streaming parsing, we need to consider a format suited for streaming.

Strings + Lazy parsing

One of the problems we are going to encounter is the combination of strings and lazy parsing:

  • consider two independent lazy functions foo and bar, where bar is somewhere further down the stream from foo;
  • assume that foo defines a literal string s that does not show up in our AOT dictionary;
  • how should bar refer to s in such a way that we do not first need to parse foo?

One way to do this is the following:

  • divide the stream into packets;
  • each packet starts with a table of strings, which may then be used by that packet and by every packet further down the line.

If we do so, the packet containing foo will define literal string s. The packet containing bar will either be the same packet or a packet further down the line, and will be able to access s.
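As a rough illustration of this scheme, here is a minimal Python sketch. All names (`Packet`, `StreamDecoder`, `read_table`, `resolve`) are hypothetical and not part of any existing format or implementation; the sketch assumes that strings are referenced by their index in the cumulative table built from the packets seen so far.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    # Table of strings placed at the start of the packet (assumed layout).
    strings: List[str]
    # Opaque, still-encoded body of the packet; never decoded in this sketch.
    body: bytes


class StreamDecoder:
    """Accumulates string tables as packets arrive, without decoding bodies."""

    def __init__(self) -> None:
        self.strings: List[str] = []

    def read_table(self, packet: Packet) -> None:
        # Only the table is read; lazy functions in the body stay unparsed.
        self.strings.extend(packet.strings)

    def resolve(self, index: int) -> str:
        # Index into the cumulative table of all packets read so far.
        return self.strings[index]


decoder = StreamDecoder()
# The packet containing foo defines the literal string "s".
decoder.read_table(Packet(strings=["s"], body=b"<encoded foo>"))
# A later packet contains bar, which refers to "s" by index 0.
decoder.read_table(Packet(strings=["t"], body=b"<encoded bar>"))
s_seen_from_bar = decoder.resolve(0)
```

In this sketch, bar can resolve s after only the tables have been read, so the body of foo never needs to be parsed first.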

As a bonus, this will let us compress these string tables using a well-known algorithm, such as brotli.

Model State + Lazy Parsing

We will need to adapt our models to restart from a well-specified state whenever parsing a lazy function.

(TBD)

Offsets + Entropy + Streaming

We need the ability to tell the decoder where to fetch a lazy function. In non-entropy-coding versions, we could reference the actual offset at which a lazy function was encoded. With entropy coding, an offset into the encoded stream is not meaningful on its own.

A partial solution would be the following:

  • each packet may contain a number of (aligned) lazy declarations;
  • each packet's header declares the lazy declarations included in this packet, identified by keys (the value of a key is an arbitrary string), along with each declaration's starting offset within the packet;
  • when encoding a [lazy] field, we specify the key at which to find the content of the field;
  • note that a lazy declaration could span several packets.
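
The lookup described in this list could be sketched as follows. This is only an illustration under stated assumptions: `Packet`, `LazyIndex`, `add_packet` and `locate` are hypothetical names, keys are assumed to be unique across the stream, and the sketch does not handle a declaration that spans several packets.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Packet:
    # Header: key of each lazy declaration -> starting offset in this packet.
    lazy_offsets: Dict[str, int]
    # Encoded payload; declarations are assumed to start at aligned offsets.
    payload: bytes


class LazyIndex:
    """Maps the key of a lazy declaration to (packet number, offset in packet)."""

    def __init__(self) -> None:
        self.packets: List[Packet] = []
        self.locations: Dict[str, Tuple[int, int]] = {}

    def add_packet(self, packet: Packet) -> None:
        packet_number = len(self.packets)
        self.packets.append(packet)
        # Only the header is consulted; the payload is not decoded here.
        for key, offset in packet.lazy_offsets.items():
            self.locations[key] = (packet_number, offset)

    def locate(self, key: str) -> Tuple[int, int]:
        # A [lazy] field stores the key; the decoder looks it up when needed.
        return self.locations[key]


index = LazyIndex()
index.add_packet(Packet(lazy_offsets={"foo": 0}, payload=b"\x00" * 16))
index.add_packet(Packet(lazy_offsets={"bar": 0, "baz": 8}, payload=b"\x00" * 16))
baz_location = index.locate("baz")
```

Because a [lazy] field carries a key instead of a global offset, the decoder only needs the packet headers to find where a declaration starts.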